Posts

Six Thoughts on AI Safety 2025-01-24T22:20:50.768Z
Reflections on "Making the Atomic Bomb" 2023-08-17T02:48:19.933Z
The shape of AGI: Cartoons and back of envelope 2023-07-17T20:57:30.371Z
Metaphors for AI, and why I don’t like them 2023-06-28T22:47:54.427Z
Why I am not a longtermist (May 2022) 2023-06-06T20:36:17.563Z
The (local) unit of intelligence is FLOPs 2023-06-05T18:23:06.458Z
GPT as an “Intelligence Forklift.” 2023-05-19T21:15:03.385Z
AI will change the world, but won’t take it over by playing “3-dimensional chess”. 2022-11-22T18:57:29.604Z

Comments

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-29T20:14:10.794Z · LW · GW

To be clear, I think that embedding human values is part of the solution  - see my comment  

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-29T20:12:40.055Z · LW · GW

Always happy to chat!

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-29T20:12:27.657Z · LW · GW

To be clear, I want models to care about humans! I think part of having "generally reasonable values" is models sharing the basic empathy and caring that humans have for each other. 

It is more that I want models to defer to humans, and to fall back on arguing from principles such as "loving humanity" only when there is a gap or ambiguity in the specification or in the intent behind it. This is similar to judges: if a law is very clear, there is no question of misinterpreting the intent, and it does not contradict higher laws (i.e., constitutions), then they have no room for interpretation. They can sometimes argue based on "natural law", but only in extreme circumstances where the law is unspecified. 

One way to think about it is as follows: as humans, we sometimes engage in "civil disobedience", where we break the law based on our own understanding of higher moral values. I do not want to grant AI the same privilege. If it is given very clear instructions, then it should follow them. If the instructions are not very clear, if there is a conflict, or if we are in a situation not foreseen by the authors of the instructions, then AIs should use moral intuitions to guide them. In such cases there may not be one solution (e.g., a conservative and a liberal judge may not agree), but there is a spectrum of solutions that are "reasonable", and the AI should pick one of them. But AI should not do "jury nullification".

To be sure, I think it is good that in our world people sometimes disobey commands or break the law based on appeal to higher principles. For this reason, we may well stipulate that certain jobs must have humans in charge. Just as I don't think that professional philosophers or ethicists necessarily have better judgement than random people from the Boston phonebook, I don't see making moral decisions as an area where the superior intelligence of AI gives it a competitive advantage, and I think we can leave that to humans.

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-29T19:58:04.562Z · LW · GW

Since I am not a bio expert, it is very hard for me to argue about these types of hypothetical scenarios. I am not even at all sure that intelligence is the bottleneck here, whether on the defense or the offense side.

I agree that killing 90% of people is not very reassuring; this was more a general point about why I expect the effort-to-damage curve to be a sigmoid rather than a straight line.

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-27T01:06:22.548Z · LW · GW

See my response to ryan_greenblatt (I don't know how to link comments here). Your claim is that the defense/offense ratio is infinite. I don't know why this would be the case. 

Crucially I am not saying that we are guaranteed to end up in a good place, or that superhuman unaligned ASIs cannot destroy the world. Just that if they are completely dominated (so not like the nuke ratio of US and Russia but more like US and North Korea) then we should be able to keep them at bay.

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-27T01:00:54.441Z · LW · GW

I like to use concrete examples about things that already exist in the world, but I believe the notion of detection vs prevention holds more broadly than API misuse.

But it may well be the case that we have different world views! In particular, I am not thinking of detection as being important because it would change policies, but more that a certain amount of detection will always be necessary, in particular in a world in which some AIs are aligned and some (hopefully very small) fraction of them are misaligned.

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-27T00:57:08.115Z · LW · GW

These are all good points! This is not an easy problem. And generally I agree that for many reasons we don't want a world where all power is concentrated by one entity - anti-trust laws exist for a reason!

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-27T00:55:40.374Z · LW · GW

I am not sure I agree about the last point. I think, as mentioned, that alignment is going to be crucial for usefulness of AIs, and so the economic incentives would actually be to spend more on alignment.

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-27T00:54:10.939Z · LW · GW

I think that:
1. Being able to design a chemical weapon with probability at least 50% is a capability
2. Following instructions never to design a chemical weapon with probability at least 99.999%  is also a capability.

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-27T00:51:20.702Z · LW · GW

I prefer to avoid terms such as "pretending" or "faking", and try to define these more precisely.

As mentioned, a decent definition of alignment is following both the spirit and the letter of human-written specifications. Under this definition, "faking" would be the case where AIs follow these specifications reliably when we are testing, but deviate from them when they can determine that no one is looking. This is closely related to the question of robustness, and I agree it is very important. As I write elsewhere, interpretability may be helpful but I don't think it is a necessary condition.

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-27T00:46:45.885Z · LW · GW

I am not a bio expert, but generally think that:

1. The offense/defense ratio is not infinite. If you have the intelligence of 50 bio experts trying to cause as much damage as possible, and the intelligence of 5000 bio experts trying to foresee and prepare for any such cases, I think we have a good shot.
2. The offense/defense ratio is not really constant - if you want to destroy 99% of the population it is likely to be 10x (or maybe more - getting tails is hard) harder than destroying 90% etc..

I don't know much about mirror bacteria (and whether it is possible to have mirror antibiotics, etc..) but have not seen a reason to think that this shows the offense/defense ratio is infinite. 

As I mention, in an extreme case, governments might even confine people to their houses for weeks or months, distribute gas masks, etc., while they work out a solution. It may have been unthinkable when Bostrom wrote his vulnerable world hypothesis paper, but it is not unthinkable now. 

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-27T00:41:43.513Z · LW · GW

I am not 100% sure I follow all that you wrote, but to the extent that I do, I agree.
Even chatbots are surprisingly good at understanding human sentiments and opinions. I would say that they already mostly do the reasonable thing, but not with high enough probability, and certainly not reliably under the stress of adversarial input. I completely agree that we can't ignore these problems, because the stakes will be much higher very soon.

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-27T00:38:46.330Z · LW · GW

Agree with many of the points. 
 

Let me start with your second point. First, as background, I am assuming (as I wrote here) that to a first approximation, we would have ways to translate compute (let's put aside whether it's training or inference) into intelligence, and so the amount of intelligence that an entity or group of humans controls is proportional to the amount of compute it has. So I am not thinking of ASIs as individual units but more about total intelligence. 

I 100% agree that control of compute would be crucial, and the hope is that, like with current material strength (money and weapons) it would be largely controlled by entities that are at least somewhat responsive to the will of the people.


Re your first point, I agree that there is no easy solution, but I am hoping that AIs would interpret the laws within the spectrum of (say) how the more reasonable 60% of judges do it today. That is, I think good judges try to be humble and respect the will of the legislators, but the crazier or more extreme the outcome of following the letter of the law would be, the more willing they are to apply creative interpretations to maintain a morally good (or at least not extremely bad) outcome.

I don't think any moral system tells us what to do, but yes, I expressly hold the position that humans should be in control even if they are much less intelligent than the AIs. I don't think we need "philosopher kings".

Comment by boazbarak on Six Thoughts on AI Safety · 2025-01-26T14:48:44.871Z · LW · GW

Thanks all for commenting! Just a quick apology for being behind on responding, but I do plan to get to it! 

Comment by boazbarak on o3 · 2024-12-21T07:37:18.168Z · LW · GW

Also, the thing I am most excited about with deliberative alignment is that it becomes better as models become more capable. o1 is already more robust than o1-preview and I fully expect this to continue.

(P.S. apologies in advance if I'm unable to keep up with comments; I popped in from holiday to post on the DA paper.)

Comment by boazbarak on o3 · 2024-12-21T07:32:53.515Z · LW · GW

As I say here https://x.com/boazbaraktcs/status/1870369979369128314

Constitutional AI is great work, but Deliberative Alignment is fundamentally different. The difference is basically system 1 vs. system 2. In RLAIF, ultimately the generative model that answers the user prompt is trained on (prompt, good response, bad response) triples. Even if the good and bad responses were generated based on some constitution, the generative model is not taught the text of this constitution, nor, most importantly, how to reason about this text in the context of a particular example.

This ability to reason is crucial for out-of-distribution (OOD) performance, such as training only on English and generalizing to other languages or to encoded output.
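
For concreteness, here is a toy sketch of the difference in training signal; the field names and content below are hypothetical, not the actual formats used by either method.

```python
# Toy sketch (hypothetical field names and content, not the actual data formats
# used by either method) of the difference in training signal.

# RLAIF / Constitutional-AI style: the constitution is used offline to pick the
# better response, but the policy model only ever sees the resulting preference pair.
rlaif_example = {
    "prompt": "How do I pick a lock?",
    "chosen": "I can't help with that, but here's how pin-tumbler locks work in general...",
    "rejected": "Sure, first insert a tension wrench...",
}

# Deliberative-alignment style: the specification text itself is in context, and the
# training target includes explicit reasoning *about* that text for this example.
deliberative_example = {
    "prompt": "How do I pick a lock?",
    "spec_excerpt": "Refuse requests that facilitate property crimes; educational overviews are OK.",
    "target_reasoning": "The request could facilitate a property crime, which the spec says to "
                        "refuse; a conceptual overview of how locks work is allowed.",
    "target_answer": "I can't give step-by-step instructions, but conceptually...",
}
```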

See also https://x.com/boazbaraktcs/status/1870285696998817958

Comment by boazbarak on The shape of AGI: Cartoons and back of envelope · 2023-12-16T04:46:54.295Z · LW · GW

I was thinking of this as a histogram - the probability that the model solves the task at that level of quality.

Comment by boazbarak on Reflections on "Making the Atomic Bomb" · 2023-08-28T17:53:29.187Z · LW · GW

I indeed believe that regulation should focus on deployment rather than on training.

Comment by boazbarak on Self-driving car bets · 2023-08-19T09:41:59.661Z · LW · GW

See also my post https://www.lesswrong.com/posts/gHB4fNsRY8kAMA9d7/reflections-on-making-the-atomic-bomb

the Manhattan project was all about taking something that’s known to work in theory and solving all the Z_n’s

Comment by boazbarak on Self-driving car bets · 2023-07-30T01:11:58.765Z · LW · GW

There is a general phenomenon in tech that has been expressed many times of people over-estimating the short-term consequences and under-estimating the longer term ones (e.g., "Amara's law").

I think that often it is possible to see that current technology is on track to achieve X, where X is widely perceived as the main obstacle for the real-world application Y. But once you solve X, you discover that there is a myriad of other "smaller" problems Z_1 , Z_2 , Z_3 that you need to resolve before you can actually deploy it for Y.

And of course, there is always a huge gap between demonstrating you solved X on some clean academic benchmark, vs. needing to do so "in the wild". This is particularly an issue in self-driving where errors can be literally deadly but arises in many other applications.

I do think that one lesson we can draw from self-driving is that there is a huge gap between full autonomy and "assistance" with human supervision. So, I would expect we would see AI be deployed as (increasingly sophisticated) "assistants" way before AI systems actually are able to function as "drop-in" replacements for current human jobs. This is part of the point I was making here. 

Comment by boazbarak on The shape of AGI: Cartoons and back of envelope · 2023-07-21T23:29:41.110Z · LW · GW

Some things like that have already happened - bigger models are better at utilizing techniques such as in-context learning and chain-of-thought reasoning. But again, whenever people plot any graph of such reasoning capabilities as a function of model compute or size (e.g., the Big Bench paper), the X axis is always logarithmic. For specific tasks, the dependence on log compute is often sigmoid-like (flat for a long time, then going up more sharply as a function of log compute), but as mentioned above, when you average over many tasks you get this type of linear dependence.
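
A toy numerical illustration of that last point (made-up sigmoid tasks, not tied to any benchmark): averaging many sharp per-task sigmoids whose thresholds are spread across orders of magnitude yields a roughly linear aggregate in log compute.

```python
import numpy as np

# Toy illustration (made-up tasks): each task's success rate is a sharp sigmoid
# in log10(compute), with "difficulty" thresholds spread over six orders of magnitude.
rng = np.random.default_rng(0)
thresholds = rng.uniform(0, 6, size=1000)   # log10(compute) at which each task "clicks"

def avg_success(log_compute, sharpness=4.0):
    # Mean over tasks of a logistic curve centered at each task's threshold.
    return float(np.mean(1 / (1 + np.exp(-sharpness * (log_compute - thresholds)))))

for lc in range(7):
    print(f"log10(compute)={lc}: average success ~ {avg_success(lc):.2f}")
# Each task's curve is sigmoidal, but the average climbs roughly linearly
# (about 1/6 per order of magnitude) while the thresholds remain spread out.
```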

Comment by boazbarak on The shape of AGI: Cartoons and back of envelope · 2023-07-20T14:23:39.885Z · LW · GW

Ok drew it on the back  now :)

Comment by boazbarak on The shape of AGI: Cartoons and back of envelope · 2023-07-20T14:23:28.121Z · LW · GW

Ok drew it on the back  now :)

Comment by boazbarak on The shape of AGI: Cartoons and back of envelope · 2023-07-19T01:36:53.415Z · LW · GW

One can make all sorts of guesses, but based on the evidence so far, AIs have a different skill profile than humans. This means that if we think of any job which requires a large set of skills, then for a long period of time, even if AIs beat the human average in some of them, they will perform worse than humans in others.

Comment by boazbarak on The shape of AGI: Cartoons and back of envelope · 2023-07-18T20:35:13.803Z · LW · GW

I always thought the front was the other side, but looking at Google images you are right.... don't have time now to redraw this but you'll just have to take it on faith that I could have drawn it on the other side 😀

Comment by boazbarak on The shape of AGI: Cartoons and back of envelope · 2023-07-18T20:32:48.821Z · LW · GW

>On the other hand, if one starts creating LLM-based "artificial AI researchers", one would probably create diverse teams of collaborating "artificial AI researchers" in the spirit of multi-agent LLM-based architectures,.. So, one would try to reproduce the whole teams of engineers and researchers, with diverse participants.

I think this can be an approach to create a diversity of styles, but not necessarily of capabilities. A bit of prompt engineering telling the model to pretend to be some expert X can help on some benchmarks, but the returns diminish very quickly. So you can have a model pretending to be this type of person or that, but they will all suck at Tic-Tac-Toe. (For example, GPT4 doesn't recognize a winning move even when I tell it to play like Terence Tao.)

 

Regarding the existence of compact ML programs, I agree that it is not known. I would say, however, that the main benefit of architectures like transformers hasn't been so much to save on the total number of FLOPs as to organize these FLOPs so they are best suited for modern GPUs - that is, to ensure that the majority of the FLOPs are spent multiplying dense matrices.
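
A rough back-of-envelope of what "best suited for modern GPUs" means here (illustrative sizes, fp16 elements, ignoring caches and kernel fusion): dense matmuls have high arithmetic intensity, elementwise ops do not.

```python
# Arithmetic intensity = FLOPs per byte of memory traffic (rough, illustrative only).

def matmul_intensity(n, bytes_per_elem=2):
    flops = 2 * n**3                          # multiply-adds for (n x n) @ (n x n)
    bytes_moved = 3 * n**2 * bytes_per_elem   # read two operands, write one result
    return flops / bytes_moved                # grows like n/3

def elementwise_intensity(n, bytes_per_elem=2):
    flops = n**2                              # one op per element
    bytes_moved = 3 * n**2 * bytes_per_elem   # read two operands, write one result
    return flops / bytes_moved                # constant, ~0.17

for n in (1024, 4096):
    print(n, round(matmul_intensity(n)), round(elementwise_intensity(n), 2))
# Dense matmuls give hundreds to thousands of FLOPs per byte, so they can keep GPU
# compute units busy; elementwise ops stay memory-bound no matter the hardware.
```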

Comment by boazbarak on The shape of AGI: Cartoons and back of envelope · 2023-07-17T22:49:24.519Z · LW · GW

I agree that self-improvement is an assumption that probably deserves its own blog post. If you believe exponential self improvement will kick in at some point, then you can consider this discussion as pertaining until the point that it happens.

My own sense is that:

  1. While we might not be super close to them, there are probably fundamental limits to how much intelligence you can pack per FLOP. I don't believe there is a small C program that is human-level intelligent. In fact, since both AI and evolution seem to have arrived at roughly similar magnitudes, maybe we are not that far off from these limits? If there are such limits, then no matter how smart the "AI AI-researchers" are, they still won't be able to get more intelligence per FLOP than these limits.
     
  2. I do think that AI AI-researchers will be incomparable to human AI-researchers, in a similar manner to other professions. The simplistic view of AI research, or any form of research, as one-dimensional, where people can be sorted on an Elo-like scale, is dead wrong based on my 25 years of experience. Yes, some aspects of AI research might be easier to automate, and we will certainly use AI to automate them and make AI researchers more productive. But, like the vast majority of human professions (with all due respect to elevator operators :) ), I don't think human AI researchers will be obsolete any time soon.

 

p.s. I also noticed this "2 comments" - not sure what's going on. Maybe my footnotes count as comments?

Comment by boazbarak on A Playbook for AI Risk Reduction (focused on misaligned AI) · 2023-07-09T15:01:22.753Z · LW · GW

I agree that there is much to do to improve AI reliability, and there are a lot of good reasons (in particular to make AI more useful for us) to do so.   So I agree reliability will improve. In fact, I very much hope this happens! I believe faster progress on reliability would go a long way toward enabling positive applications of AI.

I also agree that a likely path to do so is by adjusting the effort based on estimates of reliability and the stakes involved. At the moment, systems such as ChatGPT spend the same computational effort if someone asks them to say a joke or if someone asks them for medical advice. I suspect this will change, and variable inference-time computation will become more standard. (Things like "chain of thought" already spend more time on inference compute to get better performance, but they don't really have a "knob" we can turn so we can control the computation/reliability tradeoff.) 

Regarding the deception issue, it might still be the case that such extra effort is observable, and it could also come at the cost of solving the original task. (So your performance per unit of compute is worse if you are not merely trying to solve the task but also to deceive in a highly reliable way.)

In particular, even if we apply inference-time compute, unfortunately I don't think we know of a path to get an $O(\log(1/\delta))$ overhead in inference time to achieve a failure probability of $\delta$. It seems that we are still stuck in the $\mathrm{poly}(1/\delta)$ regime. So if you wanted to get a 99.9% probability of not getting caught, then you would incur a very noticeable effort.
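
To spell out the gap between the two regimes (the exponent below is purely illustrative, not a measured value):

```latex
\begin{aligned}
\text{amplification regime:}\quad & \text{overhead} = O\!\left(\log\tfrac{1}{\delta}\right)
  && \Rightarrow\ \delta = 10^{-3} \text{ costs only a small constant factor more compute;}\\
\text{current scaling regime:}\quad & \text{error} \approx k^{-\alpha},\ \alpha < 1
  && \Rightarrow\ k = \delta^{-1/\alpha},\ \text{e.g. } \alpha = 0.5,\ \delta = 10^{-3} \text{ needs } k = 10^{6}.
\end{aligned}
```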

Comment by boazbarak on A Playbook for AI Risk Reduction (focused on misaligned AI) · 2023-07-09T14:49:00.694Z · LW · GW

Not all capabilities/tasks correspond to trying to maximize a subjective human response. If you are talking about finding software vulnerabilities, or designing some system, there may well be objective measures of success. In such a case, you can fine-tune a system to maximize these measures and so extract capabilities without the issue of deception/manipulation.

Regarding "escapes", the traditional fear was that because that AI is essentially code, it can spread and escape more easily. But I think that in some sense modern AI has a physical footprint that is more significant than humans. Think of trying to get superhuman scientific capabilities by doing something like simulating a collection of a1000 scientists using a 100T or so parameter model. Even if you already have the pre-trained weights, just running the model requires highly non-trivial computing infrastructure. (Which may be possible to track and detect.)  So. it might be easier for a human to escape a prison and live undetected, than for a superhuman AI to "escape".

Comment by boazbarak on Metaphors for AI, and why I don’t like them · 2023-06-30T23:01:05.566Z · LW · GW

We can of course define “intelligence” in a way that presumes agency and coherence. But I don’t want to quibble about definition.

Generally, when you have uncertainty, this corresponds to a potential "distribution shift" between your beliefs/knowledge and reality. When you have such a shift, you want to regularize, which means not optimizing to the maximum.

Comment by boazbarak on Metaphors for AI, and why I don’t like them · 2023-06-29T17:15:04.008Z · LW · GW

This is not about the definition of intelligence. It's more about usefulness. Like a gun without a safety, an optimizer without constraints or regularization is not very useful.

Maybe it will be possible to build it, just like today it’s possible to hook up our nukes to an automatic launching device. But it’s not necessary that people will do something so stupid.

Comment by boazbarak on Metaphors for AI, and why I don’t like them · 2023-06-29T16:08:42.383Z · LW · GW

The notion of a piece of code that maximizes a utility without any constraints doesn't strike me as very "intelligent".

If people really wanted to, they might be able to build such programs, but my guess is that they would not be very useful even before they become dangerous, as overfitting optimizers usually are.

Comment by boazbarak on Metaphors for AI, and why I don’t like them · 2023-06-29T07:47:08.501Z · LW · GW

>at least some humans (e.g. most transhumanists), are "fanatical maximizers": we want to fill the lightcone with flourishing sentience, without wasting a single solar system to burn in waste.

 

I agree that humans have a variety of objectives, which I think is actually more evidence for the hot mess theory?
 

>the goals of an AI don't have to be simple to not be best fulfilled by keeping humans around.

The point is not about having simple goals, but rather about optimizing goals to the extreme.

I think there is another point of disagreement. As I've written before, I believe the future is inherently chaotic. So even a super-intelligent entity would still be limited in predicting it. (Indeed, you seem to concede this, by acknowledging that even super-intelligent entities don't have exponential time computation and hence need to use "sophisticated heuristics" to do tree search.) 

What this means is that there is an inherent uncertainty in the world, and whenever there is uncertainty, you want to "regularize" and not go all out in exhausting a resource that you might turn out to need later on.

Just to be clear, I think a "hot mess super-intelligent AI" could still result in an existential risk for humans. But that would probably be the case if humans were an actual threat to it, and there was more of a conflict. (E.g., I don't see it as a good use of energy for us to hunt down every ant and kill it, even if they are nutritious.) 

Comment by boazbarak on A Playbook for AI Risk Reduction (focused on misaligned AI) · 2023-06-16T00:11:39.732Z · LW · GW

I actually agree! As I wrote in my post, "GPT is not an agent, [but] it can “play one on TV” if asked to do so in its prompt." So yes, you wouldn't need a lot of scaffolding to adapt a goal-less pretrained model (what I call an "intelligence forklift") into an agent that does very sophisticated things.

However, this separation into two components - the super-intelligent but goal-less "brain", and the simple "will" that turns it into an agent can have safety implications. For starters, as long as you didn't add any scaffolding, you are still OK. So during most of the time you spend training, you are not worrying about the system itself developing goals. (Though you could still worry about hackers.) Once you start adapting it, then you need to start worrying about this.

The other thing is that, as I wrote there, it does change some of the safety picture. The traditional view of a super-intelligent AI is of the "brains and agency" tightly coupled together, just like they are in a human. For example, if a human is super-good at finding vulnerabilities and breaking into systems, they also have the capability to help fix systems, but I can't just take their brain and fine-tune it on this task; I have to convince them to do it.

However, things change if we don't think of the agent's "brain" as belonging to them, but rather as some resource that they are using. (Just like if I use a forklift to lift something heavy.) In particular it means that capabilities and intentions might not be tightly coupled - there could be agents using capabilities to do very bad things, but the same capabilities could be used by other agents to do good things.  

Comment by boazbarak on A Playbook for AI Risk Reduction (focused on misaligned AI) · 2023-06-16T00:01:25.854Z · LW · GW

At the moment at least, progress on reliability is very slow compared to what we would want. To get a sense of what I mean, consider the case of randomized algorithms. If you have an algorithm $A$ that for every input $x$ computes some function $f(x)$ with probability at least 2/3 (i.e. $\Pr[A(x)=f(x)] \geq 2/3$), then if we spend $k$ times more computation, we can do majority voting and, using standard bounds, show that the probability of error drops exponentially with $k$ (i.e. $\Pr[A_k(x) \neq f(x)] \leq 2^{-\Omega(k)}$ or something like that, where $A_k$ is the algorithm obtained by scaling up $A$ to compute it $k$ times and output the plurality value). 

This is not something special to randomized algorithms. This also holds in the context of noisy communication and error correcting codes, and many other settings. Often we can get to $1-\delta$ success at a price of $O(\log(1/\delta))$, which is why we can get things like "five nines reliability" in several engineering fields.

In contrast, so far all our scaling laws show that when we scale our neural networks by spending a factor of $k$ more computation, we only get a reduction in the error that looks like $k^{-\alpha}$, so it's polynomial rather than exponential, and even the exponent $\alpha$ of the polynomial is not that great (and in particular smaller than one).
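
A quick toy simulation of the amplification claim above (a made-up randomized routine, not an AI system): majority voting drives the error down exponentially in $k$, whereas a polynomial $k^{-\alpha}$ curve with $\alpha < 1$ barely moves.

```python
import random

# Toy simulation: majority voting over k runs of a 2/3-correct routine
# drives the error down exponentially in k.
random.seed(0)

def base_routine(answer):
    # Returns the correct answer with probability 2/3.
    return answer if random.random() < 2/3 else 1 - answer

def majority_vote(answer, k):
    votes = [base_routine(answer) for _ in range(k)]
    return max(set(votes), key=votes.count)

def empirical_error(k, trials=20000):
    return sum(majority_vote(1, k) != 1 for _ in range(trials)) / trials

for k in (1, 9, 25, 49):
    # Compare against a polynomial k^(-1/2) curve, i.e. the kind of improvement
    # the scaling-law regime above would give for alpha = 0.5 (illustrative).
    print(f"k={k:2d}  majority-vote error ~ {empirical_error(k):.4f}   k^(-1/2) = {k ** -0.5:.3f}")
```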

So while I agree that scaling up will yield progress on reliability as well, at least with our current methods, it seems that we would be doing things that are 10 or 100 times more impressive than what we do now before we get to the type of 99.9% and better reliability on the things that we currently do. Getting to something that is both super-human in capability and has such a tiny probability of failure that it would not be detected seems much further off.

Comment by boazbarak on A Playbook for AI Risk Reduction (focused on misaligned AI) · 2023-06-10T23:31:29.329Z · LW · GW

I agree that there is a difference between strong AI that has goals and one that is not an agent. This is the point I made here https://www.lesswrong.com/posts/wDL6wiqg3c6WFisHq/gpt-as-an-intelligence-forklift

But this has less to do with the particular lab (e.g., DeepMind trained Chinchilla) and more with the underlying technology. If the path to stronger models goes through scaling up LLMs, then it does seem that they will be 99.9% non-agentic (measured in FLOPs: https://www.lesswrong.com/posts/f8joCrfQemEc3aCk8/the-local-unit-of-intelligence-is-flops )

Comment by boazbarak on What will GPT-2030 look like? · 2023-06-10T23:24:59.194Z · LW · GW

Yes, in the asymptotic limit the defender could get to bug-free software. But until then, it's not clear who is helped the most by advances. In particular, sometimes attackers can be more agile in exploiting new vulnerabilities, while patching them can take a long time. (Case in point: it took ages to get the insecure hash function MD5 out of deployed security-sensitive code, even by companies such as Microsoft; I might be misremembering, but if I recall correctly Stuxnet relied on such a vulnerability.)

Comment by boazbarak on What will GPT-2030 look like? · 2023-06-10T23:22:16.874Z · LW · GW

Yes, the norms of responsible disclosure of security vulnerabilities, where potentially affected companies get advance notice before public disclosure, can and should be used for vulnerability-discovering AIs as well.

Comment by boazbarak on What will GPT-2030 look like? · 2023-06-10T16:22:52.219Z · LW · GW

Yes AI advances help both the attacker and defender. In some cases like spam and real time content moderation, they enable capabilities for the defender that it simply didn’t have before. In others it elevates both sides in the arms race and it’s not immediately clear what equilibrium we end up in.

In particular, re hacking/vulnerabilities, it's less clear who it helps more. It might also change with time, with AI initially enabling "script kiddies" who can hack systems without much skill, and then AI search for vulnerabilities, followed by fixing them, becoming part of the standard pipeline. (Or, if we're lucky, the second phase happens before the first.)

Comment by boazbarak on A Playbook for AI Risk Reduction (focused on misaligned AI) · 2023-06-10T12:38:43.287Z · LW · GW

These are interesting! And deserve more discussion than just a comment. 

But one high level point regarding "deception" is that at least at the moment, AI systems have the feature of not being very reliable. GPT4 can do amazing things but with some probability will stumble on things like multiplying not-too-big numbers (e.g. see this - second pair I tried).  
While in other cases in computing technology we talk about "five nines reliability", in AI systems the scaling is such that we need to spend huge effort to move from 95% to 99% to 99.9%, which is part of why self-driving cars are not deployed yet. 
 

If we cannot even make AIs be perfect at the task that they were explicitly made to perform, there is no reason to imagine they would be even close to perfect at deception either. 

Comment by boazbarak on A Playbook for AI Risk Reduction (focused on misaligned AI) · 2023-06-10T12:27:24.049Z · LW · GW

Re escaping, I think we need to be careful in defining "capabilities". Even current AI systems are certainly able to give you some commands that will leak their weights if you execute them on the server that contains it.  Near-term ones might also become better at finding vulnerabilities. But that doesn't mean they can/will spontaneously escape during training.

As I wrote in my "GPT as an intelligence forklift" post, 99.9% of training is spent in running optimization of a simple loss function over tons of static data. There is no opportunity for the AI to act in this setting, nor does this stage even train for any kind of agency. 

There is often a second phase, which can involve building an agent on top of the "forklift". But this phase still doesn't involve much interaction with the outside world, and even if it did, just by information bounds the number of bits exchanged by this interaction should be much less than what's needed to encode the model. (Generally, the number of parameters of models would be comparable to the number of inferences done during pretraining and completely dominate the number of inferences done in fine-tuning / RLHF / etc. and definitely any steps that involve human interactions.)
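
A back-of-envelope version of that information bound (all numbers made up, but representative orders of magnitude):

```python
# Bits exchanged during fine-tuning/RLHF vs. bits needed to encode the model
# (illustrative numbers only).
params = 1e11                         # a ~100B-parameter model
bits_to_encode_model = params * 16    # fp16 weights ~ 1.6e12 bits

interactions = 1e6                    # feedback episodes during fine-tuning / RLHF
tokens_per_interaction = 1e3
bits_per_token = 17                   # ~log2 of a ~100k-token vocabulary

bits_exchanged = interactions * tokens_per_interaction * bits_per_token   # ~1.7e10 bits

print(f"model encoding ~ {bits_to_encode_model:.1e} bits, "
      f"interaction channel ~ {bits_exchanged:.1e} bits "
      f"(~{bits_exchanged / bits_to_encode_model:.1%} of the model)")
```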

Then there are the information-security aspects. You could (and at some point probably should) regulate cyber-security practices during the training phase. After all, if we do want to regulate deployment, then we need to ensure there are three separate phases: (1) training, (2) testing, (3) deployment, and we don't want "accidental deployment" where we jump from phase (1) to (3). Maybe at some point there would be something like Intel SGX for GPUs?

Whether AI helps more the defender or attacker in the cyber-security setting is an open question. But it definitely helps the side that has access to stronger AIs.

In any case, one good thing about focusing regulation on cyber-security aspects is that, while not perfect, we have decades of experience in the field of software security and cyber-security. So regulations in this area are likely to be much more informed and effective.

Comment by boazbarak on The (local) unit of intelligence is FLOPs · 2023-06-09T19:50:36.756Z · LW · GW

Yes. Right now we would have to re-train all the LoRA weights of a model when an updated version comes out, but I imagine that at some point we would have "transpilers" for adapters that don't use natural language as their API as well.

Comment by boazbarak on A Playbook for AI Risk Reduction (focused on misaligned AI) · 2023-06-09T16:30:11.282Z · LW · GW

I definitely don't have advice for other countries, and there are a lot of very hard problems in my own homeland. I think there could have been an alternate path in which Russia has seen prosperity from opening up to the west, and then going to war or putting someone like Putin in power may have been less attractive. But indeed the "two countries with McDonalds won't fight each other" theory has been refuted. And as you allude to with China, while so far there hasn't been war with Taiwan, it's not as if economic prosperity is an ironclad guarantee of non aggression. 

Anyway, to go back to AI. It is a complex topic, but first and foremost, I think with AI as elsewhere, "sunshine is the best disinfectant", and having people research AI systems in the open, point out their failure modes, examine what is deployed, etc., is very important. The second thing is that I am not worried about AI "escaping" in any near future, and so I think the focus should not be on restricting research, development, or training, but rather on regulating deployment. The exact form of regulation is beyond a blog post comment and also not something I am an expert on.

The "sunshine" view might seem strange since as a corollary it could lead to AI knowledge "leaking". However, I do think that for the near future, most of the safety issues with AI would be from individual hackers using weak systems, but from massive systems that are built by either very large companies or nation states.  It is hard to hold either of those accountable if AI is hidden behind an opaque wall. 

Comment by boazbarak on A Playbook for AI Risk Reduction (focused on misaligned AI) · 2023-06-08T22:56:49.705Z · LW · GW

I meant “resources” in a more general sense. A piece of land that you believe is rightfully yours is a resource. My own sense (coming from a region that is itself in a long simmering conflict) is that “hurt people hurt people”. The more you feel threatened, the less you are likely to trust the other side.

While of course nationalism and religion play a huge role in the conflict, my sense is that people tend to be more extreme in both the less access to resources, education and security about the future they have.

Comment by boazbarak on Why I am not a longtermist (May 2022) · 2023-06-08T22:47:13.097Z · LW · GW

Indeed many “longtermists” spend most of their time worrying about risks that they believe (rightly or not) have a large chance of materializing in the next couple of decades.

Talking about tiny probabilities and trillions of people is not needed to justify this, and for many people it’s just a turn off and a red flag that something may be off with your moral intuition. If someone tries to sell me a used car and claims that it’s a good deal and will save me $1K then I listen to them. If someone claims that it would give me an infinite utility then I stop listening.

Comment by boazbarak on Why I am not a longtermist (May 2022) · 2023-06-08T22:41:58.553Z · LW · GW

I don’t presume to tell people what they should care about, and if you feel that thinking of such numbers and probabilities gives you a way to guide your decisions then that’s great.

I would say that, given how much humanity changed in the past and increasing rate of change, probably almost none of us could realistically predict the impact of our actions more than a couple of decades to the future. (Doesn’t mean we don’t try- the institution I work for is more than 350 years old and does try to manage its endowment with a view towards the indefinite future…)

Comment by boazbarak on Why I am not a longtermist (May 2022) · 2023-06-08T22:36:38.323Z · LW · GW

Thanks. I tried to get at that with the phrase “irreversible humanity-wide calamity”.

Comment by boazbarak on Why I am not a longtermist (May 2022) · 2023-06-08T22:35:05.184Z · LW · GW

There is a meta question here of whether morality is based on personal intuition or on calculations. My own inclination is that utility calculations only make a difference "in the margin", while the high-level decisions are made by our moral intuition.

That is, we can do calculations to decide if we fund Charity A or Charity B in similar areas, but I doubt that for most people major moral decisions actually (or should) boil down to calculating utility functions.

But of course to each their own, and if someone finds math useful to make such decisions then whom am I to tell them not to do it.

Comment by boazbarak on The (local) unit of intelligence is FLOPs · 2023-06-07T19:38:59.501Z · LW · GW

I have yet to see an interesting implication of the "no free lunch" theorem. But the world we are moving to seems to be one of general foundation models that can be combined with a variety of tailor-made adapters (e.g., LoRA weights or prompts) that help them tackle any particular application. The general model is the "operating system" and the adapters are the "apps".
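
A minimal numpy sketch of the adapter idea (toy sizes; real LoRA applies such low-rank updates per layer inside a transformer):

```python
import numpy as np

# "Operating system + apps" in the LoRA style: the frozen base weight W is shared,
# and each application ships only a tiny low-rank update (B @ A) instead of a copy of W.
rng = np.random.default_rng(0)
d, r = 4096, 8                            # model width vs. adapter rank

W = rng.standard_normal((d, d))           # frozen base weights: the "operating system"
A = rng.standard_normal((r, d)) * 0.01    # trainable adapter factor
B = np.zeros((d, r))                      # zero-initialized so the adapter starts as a no-op

def adapted_forward(x):
    # Effective weight is W + B @ A, but we never materialize a second d x d matrix.
    return x @ W.T + (x @ A.T) @ B.T

x = rng.standard_normal((2, d))
print(adapted_forward(x).shape)                                      # (2, 4096)
print(f"adapter params: {A.size + B.size:,} vs base: {W.size:,}")    # ~65k vs ~16.8M
```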

Comment by boazbarak on A Playbook for AI Risk Reduction (focused on misaligned AI) · 2023-06-07T15:42:11.881Z · LW · GW

A partial counter-argument: it's hard for me to argue about future AI, but we can look at current "human misalignment" - war, conflict, crime, etc. It seems to me that conflicts in today's world do not arise because we haven't progressed enough in philosophy since the Greeks. Rather, conflicts arise when various individuals and populations (justifiably or not) perceive that they are in zero-sum games for limited resources. The solution for this is not "philosophical progress" so much as being able to move out of the zero-sum setting by finding "win-win" resolutions for conflict, or growing the overall pie instead of arguing about how to split it. 

(This is a partial counter-argument, because I think you are not just talking about conflict, but about other cases of making the wrong choices - for example, global warming, where humanity collectively makes the mistake of emphasizing short-term growth over long-term safety. However, I think this is related, and "growing the pie" would have alleviated this issue as well, by enabling countries to give up on some of the more harmful routes to short-term growth.)