Does the AI control agenda broadly rely on no FOOM being possible?
post by Noosphere89 (sharmake-farah) · 2025-03-29T19:38:23.971Z · LW · GW · No comments
This is a question post.
Contents

Answers
  4  Zach Stein-Perlman
  3  Mis-Understandings

No comments
For the purposes of this question, I'm defining FOOM as a situation in which, once an AI is capable enough to automate all AI R&D, progress starts exploding hyper-exponentially for a period, because the returns to better software are larger than 1: AI labor quality improves faster than the problem of finding new algorithms gets harder. Combined with the potentially high limits on how efficient software can get, this means the AI gets OOMs smarter on a fixed compute budget within weeks or months.
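The "returns larger than 1" condition can be illustrated with a deliberately toy model (my sketch, not anything from the linked posts): let software efficiency E improve at a rate proportional to E**r, so r captures whether better AI labor speeds research up faster than further algorithmic progress gets harder. With r below 1, growth stays tame; with r above 1, growth is hyper-exponential and blows up in finite time.

```python
def simulate(r, steps=100, dt=0.05):
    """Toy feedback model: software efficiency E improves at a rate
    proportional to E**r (better AI labor feeds back into faster
    algorithmic progress). r > 1 means each capability gain speeds
    research by more than the next gain gets harder."""
    E = 1.0
    for _ in range(steps):
        E += dt * E**r          # crude Euler step of dE/dt = E**r
        if E > 1e12:            # treat this as "exploded" (FOOM)
            return float("inf")
    return E

# Diminishing returns (r < 1): modest growth over the whole run.
print(simulate(0.8))
# Increasing returns (r > 1): finite-time blowup, i.e. FOOM.
print(simulate(1.5))
```

This is only meant to show the qualitative difference the r > 1 threshold makes on a fixed compute budget; the real dynamics are of course far messier.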
These articles can help explain what I mean better:
https://www.forethought.org/research/will-the-need-to-retrain-ai-models (an auxiliary post)
https://www.forethought.org/research/how-far-can-ai-progress-before-hitting-effective-physical-limits (where they estimate that about 5 OOMs of progress could be gotten for free, because the compute a human uses to "pretrain" itself is about 10^24 FLOP, whereas the AI pretraining compute assumed for automating AI R&D is about 10^29 FLOP. Their median estimate is that 8 OOMs more software efficiency is possible, though with very large uncertainty: their error bars run from 4 to 12 OOMs.)
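As a sanity check on the "5 OOMs for free" arithmetic, using the rough figures above (both numbers are the post's estimates, not measurements):

```python
import math

# Rough figures from the Forethought post:
human_pretraining_flops = 1e24  # estimated compute a human brain uses "pretraining" itself
ai_pretraining_flops = 1e29     # assumed pretraining compute for an AI that automates AI R&D

# Orders of magnitude of software efficiency gains available "for free",
# i.e. just from matching the human brain's compute budget.
free_ooms = math.log10(ai_pretraining_flops / human_pretraining_flops)
print(free_ooms)  # 5.0
```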
Also note that we are only talking about training compute efficiency, not runtime/inference efficiency; for the purposes of this discussion, a software intelligence explosion means training efficiency improving.
Now, I don't want to debate whether the scenario is true (though for those who want my probability of something like a FOOM/software intelligence explosion: it's currently in the 40-50% range, conditional on AI R&D being automated). Rather, the question is: given that a software explosion is possible, could we adapt AI control to that case? Or is AI basically uncontrollable if a software intelligence explosion does happen and there are many OOMs of headroom before the physical limits of software intelligence?
I'd be especially interested in responses from @Buck [LW · GW] or @ryan_greenblatt [LW · GW], but anyone with a relevant insight is welcome to answer.
Answers
answer by Zach Stein-Perlman

It is often said that control is for safely getting useful work out of early powerful AI, not for making arbitrarily powerful AI safe.
If it turns out that a large, rapid, local capabilities increase is possible, the leading developer could still opt to spend some inference compute on safety research rather than all of it on capabilities research.
↑ comment by Noosphere89 (sharmake-farah) · 2025-03-29T23:54:56.392Z · LW(p) · GW(p)
I agree that some inference compute can be shifted from capabilities to safety, and that this would work just as well during a software intelligence explosion.
My worry was more that much of the control agenda, and threat models like rogue internal deployments to get more compute, would be fundamentally threatened if the assumption that gaining power requires acquiring more hardware compute turned out to be wrong. If a software intelligence explosion can in principle run on fixed computing power, then catastrophic actions to disempower humanity or defeat control defenses become much easier for the model.
I'm not saying control is automatically doomed under FOOM/a software intelligence explosion, but I wanted to make sure the assumption of FOOM being true wouldn't break a lot of control techniques/defenses/hopes.
answer by Mis-Understandings

An AI control solution is, by definition, a way to control what an AI is doing. If you have AI control, you have the option to tell your AI "don't go FOOM" and have that work.
You would not expect a control measure to continue to work if you told an AI under an AI control protocol to go FOOM.
Improvements in training efficiency are only realized if you actually train the model, and AI control allows you to keep the decision to realize those gains (by training a model to a higher level of performance) away from the AI being controlled.
FOOM for software requires that that decision is always yes (either because people keep pushing or because the model is in the driver's seat).
So, put broadly, the AI control agenda's answer to "what should you do with an AI system that could go FOOM?" is: don't let it try. Before it goes FOOM, the model is not able to beat the controls; and since going FOOM takes time, during which the model is working on improving itself rather than trying hard not to get violently disassembled, an AI control protocol is supposed to be able to turn an AI that tries to FOOM past the explicit controls, over the course of hours, weeks, or months, into a deactivated machine.
AI control protocols want to fail loudly for this reason. (But a breakout will involve trying to produce a silent failure, for the same reason.)