Preparing for the Intelligence Explosion

post by fin, wdmacaskill · 2025-03-11T15:38:29.524Z · 8 comments

This is a link post for https://www.forethought.org/research/preparing-for-the-intelligence-explosion

This is a linkpost for a new paper called Preparing for the Intelligence Explosion, by Will MacAskill and Fin Moorhouse. It sets the high-level agenda for the sort of work that Forethought is likely to focus on.

Some of the areas in the paper that we expect to be of most interest to EA Forum or LessWrong readers are:

Here’s the abstract:

AI that can accelerate research could drive a century of technological progress over just a few years. During such a period, new technological or political developments will raise consequential and hard-to-reverse decisions, in rapid succession. We call these developments grand challenges. 

These challenges include new weapons of mass destruction, AI-enabled autocracies, races to grab offworld resources, and digital beings worthy of moral consideration, as well as opportunities to dramatically improve quality of life and collective decision-making.

We argue that these challenges cannot always be delegated to future AI systems, and suggest things we can do today to meaningfully improve our prospects. AGI preparedness is therefore not just about ensuring that advanced AI systems are aligned: we should be preparing, now, for the disorienting range of developments an intelligence explosion would bring.

8 comments

comment by habryka (habryka4) · 2025-03-11T20:03:31.346Z

My guess is you know this, but the sidenote implementation appears to be broken. When clicking on the footnote labeled "1", it opens up a footnote labeled "2", and the footnotes also overlap on the right in very broken-looking ways.

Replies from: wdmacaskill, max-dalton
comment by wdmacaskill · 2025-03-12T08:49:21.909Z

Thanks - appreciate that! It comes up a little differently for me, but it's still an issue - we've asked the devs to fix it.

comment by Max Dalton (max-dalton) · 2025-03-12T08:40:44.034Z

Thanks, Oli! I think the clustering issue is fixed now; I'm looking into what's going on with the numbers.

comment by Julian Bradshaw · 2025-03-12T02:59:29.825Z

Meta: I'm kind of weirded out by how apparently everyone is making their own high-effort custom-website whitepapers? Is this something that's just easier with LLMs now? Did Situational Awareness create a trend? I can't read all this stuff, man.

In general, there seems to be way more high-effort work coming out since reasoning models were released. Maybe it's just crunch time.

Replies from: wdmacaskill
comment by wdmacaskill · 2025-03-12T08:53:37.073Z

There's definitely a new trend towards custom-website essays. Forethought, though, is a website hosting lots of research content (like Epoch), not just PrepIE.

And I don't think it's because people are getting more productive thanks to reasoning models - AI was helpful for PrepIE, but more like a 10-20% productivity boost than a 100% boost, and I don't think AI was used much for SA either.

comment by Immanuel Jankvist (emanueljankvist) · 2025-03-11T22:46:40.791Z

The following seems a bit unclear to me, and might warrant an update, if I am not alone in this assessment:

Section 3 finds that even without a software feedback loop (i.e. “recursive self-improvement”), [...], then we should still expect very rapid technological development [...] once AI meaningfully substitutes for human researchers.

I might just be taking issue with the word "without" and taking it in a very literal sense, but to me "AI meaningfully substituting for human researchers" implies at least a weak form of recursive self-improvement.
That is, I would be quite surprised if the world allowed AI to become as smart as human researchers but no smarter afterwards.

Replies from: wdmacaskill, caleb-biddulph
comment by wdmacaskill · 2025-03-12T09:00:42.455Z

Ah, by the "software feedback loop" I mean: "At the point in time at which AI has automated AI R&D, does a doubling of cognitive effort result in more than a doubling of output? If yes, there's a software feedback loop - you get (for a time, at least) accelerating rates of algorithmic efficiency progress, rather than just a one-off gain from automation."

I see now why you could understand "RSI" to mean "AI improves itself at all over time". But even so, the claim would still hold - even if (implausibly) AI gets no smarter than human-level, you'd still get accelerated tech development, because the quantity of AI research effort would grow much faster than the quantity of human research effort.
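
A toy model may make the distinction concrete (this sketch is not from the paper or the comments; the growth law and parameter values are purely illustrative assumptions). Suppose cognitive effort is proportional to the current algorithmic efficiency A (on fixed hardware), and the rate of efficiency progress scales as A to some power r. Doubling effort then multiplies the rate of progress by 2^r, so r > 1 corresponds to the "more than a doubling of output" case:

```python
# Illustrative toy model of the "software feedback loop" condition.
# Assumptions (not from the paper): cognitive effort is proportional to the
# current algorithmic efficiency A (fixed hardware), and efficiency grows as
# dA/dt = A**r. Doubling effort multiplies the rate of progress by 2**r,
# so r > 1 corresponds to "more than a doubling of output".

def simulate(r, steps=50, dt=0.05, a0=1.0):
    """Integrate dA/dt = A**r with simple Euler steps; return the trajectory."""
    a = a0
    trajectory = [a]
    for _ in range(steps):
        a += (a ** r) * dt
        trajectory.append(a)
    return trajectory

for r in (0.7, 1.0, 1.3):
    traj = simulate(r)
    # Compare the proportional growth rate at the start and at the end to see
    # whether progress is accelerating or settling down.
    early = (traj[1] - traj[0]) / traj[0]
    late = (traj[-1] - traj[-2]) / traj[-2]
    trend = "accelerating" if late > early * 1.01 else "not accelerating"
    print(f"r = {r}: growth rate per step {early:.3f} -> {late:.3f} ({trend})")
```

In this toy setup, r > 1 gives a proportional growth rate that keeps rising (the feedback loop), while r <= 1 still yields fast progress from automation, but with a rate that stays flat or declines rather than accelerating.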

comment by CBiddulph (caleb-biddulph) · 2025-03-12T00:38:58.493Z

I interpreted this as "even without a software feedback loop, there will be very rapid technological development; this gives a lower bound on the actual pace of technological development, since there will almost certainly be some feedback loop".