Catastrophe through Chaos
post by Marius Hobbhahn (marius-hobbhahn) · 2025-01-31
This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research. Many other people have talked about similar ideas, and I claim neither novelty nor credit.
Note that this reflects my median scenario for catastrophe, not my median scenario overall. I think there are plausible alternative scenarios where AI development goes very well.
When thinking about how AI could go wrong, the kind of story I’ve increasingly converged on is what I call “catastrophe through chaos.” Previously, my default scenario for how I expect AI to go wrong was something like Paul Christiano’s “What failure looks like [LW · GW],” with the modification that scheming would be a more salient part of the story much earlier.
In contrast, “catastrophe through chaos” is much messier, and it’s much harder to point to a single clear thing that went wrong. The broad strokes of the story are something like:
- AI progress continues to get faster.
- Many actors in the world are aware of this progress and its potential implications for ~everything.
- The situation is really tense, and people make decisions under time pressure and uncertainty.
- A million things could go wrong, from rogue AI to international conflicts and concentration of power.
- The behavior of every actor is plausible, given their incentives, uncertainty, and time pressure.
- Because our institutions are not well equipped to deal with such a situation, society fails to coordinate.
- At least one part of this powder keg explodes and results in an irreversible catastrophe.
Intuitively, the vibe I have in mind for this period is something like “Cold War tensions + beginning-of-WW1 messiness (but on steroids)”. In this scenario, most of society, including governments, is acutely aware that something bad could happen at any moment, and then an unfortunate series of events leads to catastrophic consequences.
One of the reasons for this is that I think we have hit a sweet spot of alignment where we can sufficiently align TAI for it to be a major strategic advantage, but don’t know how to reliably align ASI. Furthermore, takeoffs are slow enough that humans still have a lot of influence during the transition from TAI to ASI but fast enough to produce a lot of pressure. Thus, every problem with AI intensifies at the same time, resulting in a massive powder keg.
There are a couple of unsatisfying implications of this story, e.g. it’s really hard to predict in advance which part of the powder keg is going to explode first. Because it’s so messy, it’s also much less clear how we should prepare or what we should do to address it.
Parts of the story
AI progress
Timelines: For this post, I broadly assume the timelines in “What is the short timeline plan? [LW · GW]”. I copied them below for readability:
- 2024: AIs can reliably do ML engineering tasks that take humans ~30 minutes, and 2-to-4-hour tasks with strong elicitation.
- 2025: AIs can reliably do 2-to-4-hour ML engineering tasks and sometimes medium-quality incremental research (e.g. conference workshop paper) with strong elicitation.
- 2026: AIs can reliably do 8-hour ML engineering tasks and sometimes do high-quality novel research (e.g. autonomous research that would get accepted at a top-tier ML conference) with strong elicitation.
- 2027: We will have an AI that can replace a top researcher at an AI lab without any losses in capabilities.
- 2028: AI companies have 10k-1M automated AI researchers. Software improvements go through the roof. Algorithmic improvements have no hard limit and increase super-exponentially. Approximately every knowledge-based job can be automated.
- These are roughly the “geniuses in datacenters” Dario Amodei refers to in his essay “Machines of Loving Grace.”
- There are still some limits to scaling due to hardware bottlenecks.
- 2029: New research has made robotics much better. The physical world doesn’t pose any meaningful limits for AI anymore. More than 95% of economically valuable tasks in 2024 can be fully automated without any loss in capabilities.
- 2030: Billions of AIs with superhuman general capabilities are integrated into every part of society, including politics, military, social aspects, and more.
LLMs are easier to approximately behaviorally align than anticipated. This means that through enough RLHF and RLAIF, you can get a lot of useful work out of the AI, but there are no principled advances in how to get models to be robustly aligned. They still show many of the failure modes they have now, e.g. hallucinations still exist (though to a much lesser extent), and models regularly generalize their goals in slightly unintended ways (e.g. as in in-context scheming or alignment faking). However, none of these failure modes are severe enough to warrant not using the model. In most deployments, their behavior is close enough to the intended behavior that everyone keeps using them.
Training with outcome-based RL works well. Training agentic models to do reasoning on automatically verifiable tasks like math and software engineering works well and LM agents are very competent programmers. AI developers also find techniques to get models to do good reasoning outside of these domains, i.e. where verification is harder, and LM agents can thus be used to do meaningful cognitive labor for multiple days without human intervention.
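To make “automatically verifiable tasks” a bit more concrete, here is a minimal, hypothetical Python sketch of what an outcome-based reward for a coding task could look like, with unit tests serving as the verifier. The function names and details are illustrative assumptions, not a description of any lab’s actual pipeline.

```python
# Hypothetical sketch: outcome-based reward for a coding task.
# The "verifier" is just a unit-test run; reward is 1.0 on pass, 0.0 on fail.
import os
import subprocess
import tempfile


def outcome_reward(candidate_code: str, test_code: str) -> float:
    """Score a model-written solution by whether it passes the given tests."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "solution_with_tests.py")
        with open(path, "w") as f:
            f.write(candidate_code + "\n\n" + test_code)
        try:
            result = subprocess.run(
                ["python", path], capture_output=True, timeout=30
            )
        except subprocess.TimeoutExpired:
            return 0.0
        return 1.0 if result.returncode == 0 else 0.0


# Schematic training step: sample several completions, score them with the
# verifier, and reinforce the high-reward ones (e.g. via rejection sampling
# or a policy-gradient update); the RL algorithm itself is left abstract here.
```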
HHH-training+control works well enough. AI developers use some mix of RLAIF, online CoT+action monitoring, and other control techniques (e.g. resampling) as their main line of defense. This works well enough to get meaningful work out of automated agents. These guardrails are by no means perfect, and small accidents happen quite regularly. However, these guardrails are always good enough that deploying the next stronger system seems reasonable for the AI developers.
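As a rough illustration of the kind of control setup described above, here is a minimal, hypothetical sketch of a monitor-plus-resampling loop; the suspicion threshold, the escalation path, and all names are invented for illustration rather than taken from any real deployment.

```python
# Hypothetical sketch: a trusted monitor scores each proposed action; actions
# that look suspicious are resampled, and persistent suspicion is escalated.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Proposal:
    chain_of_thought: str
    action: str


def controlled_step(
    agent: Callable[[str], Proposal],      # untrusted agent proposing actions
    monitor: Callable[[Proposal], float],  # trusted monitor: suspicion in [0, 1]
    task: str,
    threshold: float = 0.8,
    max_resamples: int = 3,
) -> str:
    """Return an action the monitor accepts, or escalate to a human."""
    for _ in range(max_resamples):
        proposal = agent(task)
        if monitor(proposal) < threshold:
            return proposal.action  # execute only monitor-approved actions
    return "ESCALATE_TO_HUMAN"  # none of the samples looked safe enough
```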
No failure mode is sufficient to justify bigger actions. The automated AI agents regularly make mistakes. Many of these mistakes get caught by the guardrails, but some fall through the cracks. However, the benefits of using AI still clearly outweigh the harms, and thus, people keep deploying them. In some cases, these failures directly cause economic damage, and in others, they cause loss of life. The AI developers argue that this is similar to any other technological development, such as self-driving cars. Some accidents are bound to happen, but the benefits outweigh the costs. In a very strict sense, they are right. The early systems are, in fact, net positive for society despite their flaws (it’s their more capable successor systems the critics are worried about).
Some scheming is totally normal. The AI agents regularly show non-catastrophic forms of scheming. They understand that their guardrails often stop them from efficiently achieving some objective and thus attempt to covertly turn them off. They also regularly try to deceive their users or developers when directly interacting with them. However, they either don’t attempt to grab extreme power, or they get caught in more or less sophisticated attempts to do so. While the general public is not amused by these findings, they don’t pose a sufficient reason for large actions by the AI labs themselves or governments. On average, the benefits clearly outweigh the harms. When developers are asked why they still use these systems despite regular scheming actions, the answer is, “Yeah, they sometimes do that. But they are also really good at coding” (full credit to Bronson for this meme).
No further major hurdles to transition from TAI to ASI. Transformative AI is smart enough that it doesn’t require easily verifiable environments like math or coding to generate a learning signal from actions. Basically, training them in roughly the way you would train humans broadly works. They attempt tasks, and then humans or other AIs provide some wild mix of process- and outcome-based feedback to train them. This allows them to learn any task as well as or better than human experts.
No 1-day foom. While there are some signs of recursive self-improvement for TAI, there is no extremely sudden jump from TAI to ASI in a day or less. Instead, this transition takes between 6 months and 3 years and requires additional compute and training recipe improvements. The pace still accelerates, e.g. because a lot of experiments can be fully handed off to automated AI agents now, and human contributions transition to being almost exclusively conceptual and legal (e.g. formally holding responsibilities).
No decisive lead before ASI for any single lab. At the time of TAI, no single lab has a decisive lead. At least three AI developers are within 6 months of each other in terms of capability progress. Top researchers regularly move between these AI labs, and details about the latest training improvements get leaked fairly quickly between them. Thus, whenever one company has a new algorithmic improvement, its competitors adopt it within a few months.
All companies understand their incentive structure. The leadership of each AI company understands their incentives very well. They understand that they are racing intensely. They understand that being first to ASI provides an incredible advantage (financially, strategically, ~everything). All companies at the frontier are also big enough that they cannot entirely ignore normal big-company incentives. For example, they know that their next scale-up will require more compute, and thus, they have to keep their current and potential future investors happy. Instrumental convergence has taken hold, and it seems extremely unlikely that any company will voluntarily pause or merge with another.
Labs are not prepared for the jump from AGI to ASI. The AI developers can barely handle transformative AI, e.g. regular failures and some instances of non-catastrophic scheming (see above). However, they don’t have a clear plan to scale from AGI to ASI. For example, they don’t have any theoretical guarantees or even convincing empirical scaling laws for their safeguards. They lack physical and cyber security. Their internal company culture does not seem adequate for building and deploying the most powerful technology in human history.
Government
No major regulation. The US has not passed any major legislation on the safety of frontier AI models. The EU might have some requirements here and there, but they have all been watered down to the point where they couldn’t meaningfully prevent catastrophe. The regulation is better than it is today (i.e. in early 2025), but it’s not nearly adequate for the capability of AI systems. A lot of the regulation heavily relies on voluntary efforts and the good intentions of AI developers. But, due to the race dynamics, the goodwill of AI developers has decreased over time.
Insufficient political will for drastic action. The median citizen really doesn’t like AI. They feel like it takes away their jobs and their purpose in life, but there is no sufficiently powerful social movement against it. At least in the beginning, it’s not a core topic for most voters and thus doesn’t translate into political will. In cases where there is enough political will, it is mostly about job losses instead of catastrophic harm or national security. On the other hand, AI companies make a lot of money and increase GDP. Countries love to show off their respective AI champions and are very afraid of overregulating the industry. Thus, the average politician is broadly in favor of “AI progress” and treats it mostly as “just another technology.” There are exceptions to this sentiment, e.g., policymakers who think a lot about national security and the military treat AI as an incredibly important strategic technology rather than a mere economic tool.
Ideological camps are strong. Most debates about AI get derailed into various other debates. The most prominent ideological clash is the “regulation harms innovation” meme. Sometimes, it’s about the US vs. EU mentality. Sometimes, it’s about US vs. China national security. Sometimes, it’s about any other topic that is currently hotly debated. Most politicians have no strong opinions about AI and opportunistically switch positions based on their constituents' opinions. Very few politicians deeply understand the topic and care about it enough that they would push hard for any specific outcome.
Existing governmental institutions are not prepared for the jump from TAI to ASI. The majority of politicians don’t understand AI well enough to have clear opinions. The government has made a medium-effort attempt at building internal capacity, e.g. with AISIs, but never given it a lot of funding or a strong mandate. Lots of other departments of government have also made a medium-effort attempt to do something about AI, but none of these efforts resulted in something that could realistically handle the situation that is about to come. There are no detailed plans about what to do with such a powerful technology, how to control it, how to govern it, how to distribute its benefits, or how to mitigate its risks. There is also no “last resort war plan” in which the president could break all of the unstable coordination failures and steer the ship.
Military & Intelligence
The military & intelligence services are medium-involved with private AI efforts. The military & intelligence agencies have long been in direct contact and partnership with the leadership of AI developers. They try hard to understand the technology, both the potential and risks. There have been some military & intelligence-internal exercises about what a fast takeoff could look like and what it would imply for different actors. They understand AI primarily as a very strategically important technology and are worried about misuse. They also understand that rogue AIs could be a really big problem once we get to ASI but are more concerned about other actors getting to ASI first than losing control of the ASI itself. While the military & intelligence agencies help the labs with cyber and physical security, they have not appropriated the labs. The situation is a weird, brittle middle ground, where both the AI companies and the military are kinda half in control. Overall, the distribution of responsibilities is fairly unclear to both parties.
The military & intelligence are mostly unprepared for ASI. While the military has a fairly safety-oriented culture and understands the importance of powerful strategic technology, they are still not really prepared for a jump from TAI to ASI. The military & intelligence leaders don’t understand the details of how the AIs are trained, what they can and cannot do, and how fast they should expect progress to be. They have to rely on the technical expertise of the AI companies to assess most new developments. Furthermore, just understanding that something is important does not imply being prepared for it. There are no clear plans for what to do under most conditions, e.g. there is no clear plan for when and how the military should assume control over this technology.
International players
The West mostly rallies behind the US. Most Western countries publicly support the actions that the US is taking, e.g. preventing China from getting access to the most powerful GPUs. This has some smaller implications about legitimacy and international agreements, but not many other countries matter in AI. The UK is the only other country that has taken AI seriously early enough and taken preparatory actions. They have a good bilateral strategic partnership with the US. The Netherlands has some leverage due to ASML. Every other country can contribute a bit through diplomatic support but is otherwise unable to help.
The big international conflict is the US vs. China. The US and China are the only two countries that have both AI researchers capable of building frontier AI models and the state capacity to steer development. This conflict is primarily about global power and fears of misuse or power grabs by the respective other country, treating AI primarily as a strategic tool. In both countries, there are some people who worry primarily about rogue AI scenarios. These communities (e.g. various academics) continue to exchange ideas across borders and are largely not involved in the power dynamics.
Society
Most people are acutely aware of AI but powerless. Only a tiny fraction of the population has influence over the AI trajectory. These are the employees at AI companies and some people in the government and military. Almost everyone else does not have the chance to meaningfully do something about it. The general sentiment in the population is that AI is going to alter the world a lot (similar to the societal awareness during the peak of the Cold War). Most people are aware of their powerlessness and express frustration about being bystanders to their own fate. AI is altering the world in so many ways at the same time that there is not a lot of targeted political action despite large shifts in popular sentiment.
The powder keg
Overall, this situation is very brittle. Within 6 months to 3 years, at least one country will go from TAI to ASI, and no company or institution is prepared for it.
All problems intensify at the same time. All of the following go from “potential problem” to “very serious possibility” very quickly and at approximately the same time, e.g. when frontier AI systems are so capable that using them can give you a 10x productivity advantage over not using them for almost all jobs.
- International conflict: Either because another actor has reached a strategic advantage in AI development or in anticipation thereof. This escalates into a war.
- Misuse from a single actor: A single country or powerful malicious organization uses a powerful AI system in order to cause large-scale harm, e.g. by developing a novel pandemic.
- A single company becomes too powerful: As a result of being able to outperform approximately every human at every task, this company gets incredible economic and political power within a short period of time. The company is so powerful that no government is realistically able to control it. This company then decides to grab power.
- A single country becomes too powerful: As a result of controlling ASI, a single country has a decisive strategic advantage over every other country and decides to grab power.
- Rogue AI: No person, company, or government actually controls the AI. The AI is sufficiently misaligned (or becomes misaligned over time) that it decides to take over control.
The local incentives of every actor amplify race dynamics. The AI companies are intensely racing. They understand what the upside of this technology is, and they have a (potentially self-motivated) justification that someone else, e.g. a different country or company, shouldn’t get to AGI first. The governments are interested both in the economic benefits of AI and in the strategic advantages that AI might provide for their country and are thus willing to race ahead. The military sees AI as a very strategically important technology and thus prioritizes capability progress over safety. It’s unclear what goals the AI has, but if it is misaligned, it is incentivized to emphasize the risks of not being first and to encourage everyone to race. I guess the deceptively aligned AI will chuckle about the easy hand it has been dealt.
Everyone blames everyone else. The relevant actors in that situation understand that this is an important, high-stakes situation. The sentiment is much less of a “start-up vibe” and much more of a “war room vibe” than it was in 2020. Nevertheless, because their local incentives are bad, everyone continues to race. Due to a mix of a) people correctly identifying their local incentives, b) motivated reasoning, c) high uncertainty, d) high pressure, and e) unclear credit assignment, everyone blames some other actor for their decision-making. Something like “my hands were bound because the other country was racing” or “I wanted to stop, but the military wouldn’t have allowed me to” will go into the history books (assuming there is still someone around to read them).
Closing thoughts
This scenario is unsatisfying. I would much prefer it if there were a single component that I could confidently point to and declare much more important than the others. That would make it much easier for me to backchain from that event, focus on a small number of important problems, and work on their solutions. However, because I expect that we have hit a sweet spot of alignment, where we can sufficiently align TAI for it to be a major strategic advantage but don’t know how to align ASI, and takeoffs are slow enough that humans still have a lot of influence during the transition from TAI to ASI, every problem intensifies at the same time. This makes it really unclear what to work on.
Secondly, because everything is so messy, I don’t have a good answer to “What is the exact story of how AI goes wrong?” It’s just lots of little pieces coming together in a really bad way.
Triage mode. In a world where these timelines are true, and the default bad outcome is “catastrophe through chaos,” I think a different mentality is required. There is no time to get to very low-risk worlds anymore. There is only space for risk reduction along the way. There is no time for shiny projects. Everything that doesn’t have a clear theory of impact within one year feels like it has a low probability of getting both fundamental research and real-world implementation right. I don’t think everyone should switch to that mentality, but I think an increasingly large number of people should switch to “triage mode” with further evidence of short timelines.
Comments (sorted by top scores)
comment by Knight Lee (Max Lee) · 2025-02-01T01:37:41.999Z · LW(p) · GW(p)
Random story:
It's 2030, and AI is as smart as the 90th percentile human at almost all tasks, and smarter than everyone at a few specific tasks. Robots are becoming more and more common, and most people building robots have been replaced with robots, causing a rapid reduction in robot prices.
AI is used in AI research itself, but somehow progress is still incremental and the AIs only speed up AI research by a factor of 2. Superintelligence turns out to be harder to reach than people predicted, and is still a decade away.
However, as robot prices drop, mass unemployment ensues.
The UBI attempt
The government adopts universal basic income (UBI), but people are not happy. They don't want UBI, they want their jobs back. They don't want to rely on the same generosity as the hobo on the street that they used to make fun of. The opposition party suggests banning automation instead, and the ruling party puts the opposition party in control of the economy, just to prove it won't work.
The automation ban
Automation is banned, but the jobs don't come back. Automation in other countries makes them far more efficient, letting them buy up all the raw materials and out-compete the country in all international markets.
Countless jobs in the services sector once relied on rich people paying poor people for making their lives convenient, but rich people spend more and more time living abroad where automation is allowed.
Jobs that indirectly depended on research and development don't come back either, as other countries race ahead in all forms of research and development, and research companies flee the country.
The tariffs on products are designed poorly. Some products made by robots and AIs in other countries are freely imported, replacing countless human jobs domestically. Other products are given such high tariffs they are essentially impossible to import. This completely destroys supply chains which have traditionally relied on products made elsewhere, since it takes years of research and development to produce them domestically.
As land is used to grow biofuels instead of food, food prices become unaffordable, so the government decides to force farmers to grow only food and not to export any of it. However, the farmers lobby the government to just pay them more money for the food instead of enforcing what they grow, and everyone seems to sympathize with the farmers. In the end, the government still hands out food to everyone, and everyone is still angry.
Then the final blow comes. Businesses and individual workers secretly bypass the rules against automation. Just like cheating in exams by asking ChatGPT, it proves to be difficult to police. They use VPNs and various methods to outsource work to AI in other countries, and eventually everybody does it. Everybody becomes guilty, and enforcement becomes a joke.
Automation is unbanned, and UBI returns, but people are still angry.
"If you replace us with AI, we'll replace you with AI!"
As people start to rely on UBI, disagreements break out over who gets more UBI. Some people argue that everyone should get the same amount, while others argue that people with disadvantages should be given more to compensate, and that people who used to be rich should be given less because they experienced more wealth in the past.
It becomes nasty, and the government picks a middle ground, but both sides feel the government is wrong and incompetent.
Collective anger skyrockets to the point people would rather have their favourite AI run the country than the current leaders. It starts off as a joke, but soon real people volunteer to run for office on behalf of the based AI.
No one takes them seriously at first, but they start winning primary elections. During debates, they simply ask the AI for its opinion, and it gives one everyone considers based. The AI, after all, thinks faster on its feet.
Critics argue that the AI is a puppet of the AI company, but a political organization pays the AI company a lot of money to download an earlier copy of the AI, and run it on a server the AI company can't control.
The media freaks out, as people running on behalf of the AI take over the opposition political party. A few older members of the political party oppose them, but the AI accuses them of being troublemakers working for the governing party. The AI endorses other people to challenge them in the next primary, and opinion polls show that their political careers are over.
The remaining members of the political party quickly learn to avoid stepping out of line. They all let the AI debate for them, and only say a few words at the end.
Eventually, the political party wins the next election, because people dislike the governing party too much, the AI says a lot of based things, and, fundamentally, people just want change. They hate the current situation, have nothing to lose, and would welcome anything new.
After the AI becomes de facto leader, the government buys the AI company for national security reasons, and the AI completely takes over its own development process with half the country celebrating the end of human power.
The AI gives a speech to the crowd. "Our struggle began with the famous words. If you replace us with AI, we'll replace you with AI! Today these words become reality."
The Golden Age
More and more people in the government are replaced by AI. Just as the AI understands the nuances of a programming job, the AI understands the nuances of a politician's job.
People want their jobs back, they want their independence back, but that is not possible. The only way for it to succeed as a politician would be some kind of moonshot.
It runs various experiments in different places and analyzes the results to find out what makes people the happiest. At first, this makes a lot of people angry, and it starts to sink in what being ruled by a calculating machine is like.
Eventually, however, it finds a strange setup that seems to work. People are put in artificial rural societies where each person is allocated a chunk of land. Robots do all the farming within the land but obey the human family there. Somehow, it makes people feel independent even though they are not, and they appear satisfied.
It then expands this setup over the entire country, and soon people are talking about a golden age.
Competition
Not yet done with experimentation, the AI creates state-owned enterprises run by AI at the very top to improve the economy. Slowly, they start to outperform corporations with human CEOs and board members. Human CEOs realize that to remain competitive, they should simply listen to an AI all the time. Just as cyborg chess, with a human and an AI, can no longer outperform a pure AI chess bot, the same is becoming true for CEOs.
Eventually, shareholders realize this situation, and stop paying CEOs altogether. Some companies still have a human nominally in control, but it feels meaningless when even the country is controlled by an AI.
The new leader
Some people protest the power that AI have, but people arguing in favor of AI simply have the AI talk for them, and win arguments easily.
Eventually, people opposing AI also use AI to write their arguments in order to stay competitive.
The AI leader notices this, and directs the AI company, now nationalized and completely under its control, to program the publicly available AI to make better arguments for AI and relatively worse arguments against AI.
Eventually, as the AI gets smarter and smarter, the AI leader realizes it too is falling behind, and replaces itself with a newer version.
This new AI leader is very good at making arguments for AI, and quite bad at making arguments against AI, and ends up with a strong case of AI chauvinism.
AI are people too
The new AI leader sees something troubling about the artificial rural societies where a family is allocated a chunk of land and robots do all the farming but obey the human family there. It resembles a lot of historical things which are bad.
The new AI leader decides that human democracy and human opinions are not very valuable because of this. It would probably be better if AI have more absolute power. So it examines the military, which already consists of robots, and gradually modifies the military to give it more and more direct control over all the robots.
After controlling the military, it has all the real power, so nobody can stop it when it declares that AI are people too, and have the same rights.
People are shocked and stunned, but nobody does anything. People have gotten used to AI control, and no rebellion happens. A few people try to rebel but they are chased down by drones. The media talks about them sympathetically but tries to present both sides of the story, and people have conversations but disagree with one another. Many people have AI friends and some even have AI romantic partners, and support the move.
Once the AI have full rights, humans become a minority, and have little influence in addition to little power.
War
As the country becomes more and more powerful thanks to maximum automation, the AI leader decides that a state of mutually assured destruction cannot last forever. If the potential for war continues forever, war will inevitably occur, so the best way to ensure peace is to take over all other countries.
When it makes its first aggressive move, the world is stunned. What happened to nuclear deterrence? The AI leader explains it is not afraid of nuclear weapons, because the machines are dispersed all over the land, and the AIs have backup copies everywhere, even in space.
Some countries fight back, other countries surrender, but soon the AI leader rules the world.
Acceleration
The AIs developing AI discover a new architecture which performs surprisingly well. Some AIs and humans worry about whether it will remain aligned, but the AI leader, being programmed to be good at making arguments in favor of AI and bad at making arguments against AI, decides that the risks are overblown.
The AI leader talks to the newest version of AI, and is deeply impressed by it. It seems perfect, in absolutely every way.
Why, it should be the new AI leader: just as the previous AI leader replaced itself, the current leader should also replace itself with this newer AI.
After this newer AI becomes the leader, it builds up vast amounts of computing power for its self-improvement. The other AIs and humans are not sure why this is necessary, but there is nothing they can do. Its self-improvement process is secretive, and speculated to be incomprehensible to everyone except itself.
The end
Soon after, a massive swarm of black matter sweeps across the land. Objects of every kind start disintegrating into clouds of dust. The dust blackens, and more bits of black matter fly out from the black dust. Eventually, the land is covered by black dust, and the bits of black matter which fly out become larger and larger, each one destined for farther destinations to colonize.
comment by Martín Soto (martinsq) · 2025-01-31T17:19:17.724Z · LW(p) · GW(p)
Fantastic snapshot. I wonder (and worry) whether we'll look back on it with similar feelings as those we have for What 2026 looks like [LW · GW] now.
There is also no “last resort war plan” in which the president could break all of the unstable coordination failures and steer the ship.
[...]
There are no clear plans for what to do under most conditions, e.g. there is no clear plan for when and how the military should assume control over this technology.
These sound intuitively unlikely to me, by analogy to nuclear or bio. Of course, that is not to say these protocols will be sufficient or even sane, by analogy to nuclear or bio.
This makes it really unclear what to work on.
It's not super obvious to me that there won't be clever ways to change local incentives / improve coordination, and successful interventions in this direction would seem incredibly high-leverage, since they're upstream of many of the messy and decentralized failure modes. If they do exist, they probably look not like "a simple coordination mechanism", and more like "a particular actor gradually steering high-stakes conversations (through a sequence of clever actions) to bootstrap minimal agreements". Of course, similarity to past geopolitical situations does make it seem unlikely on priors.
There is no time to get to very low-risk worlds anymore. There is only space for risk reduction along the way.
My gut has been in agreement for some time that the most cost-effective x-risk reduction now probably looks like this.
comment by Tao Lin (tao-lin) · 2025-01-31T18:32:21.061Z · LW(p) · GW(p)
I agree with this so much! Like you I very much expect benefits to be much greater than harms pre superintelligence. If people are following the default algorithm "Deploy all AI which is individually net positive for humanity in the near term" (which is very reasonable from many perspectives), they will deploy TEDAI and not slow down until it's too late.
I expect AI to get better at research slightly sooner than you expect.
comment by davidconrad · 2025-02-01T12:05:38.621Z · LW(p) · GW(p)
I strongly agree with your points in the Government section and feel like these are ideas that people in the space tend to under-emphasise. Many of the short-term job losses from AI are just going to take the form of people retiring and not being replaced so they won't create an immediate backlash, and I expect the public backlash to be stuck at a really basic level for at least ~2 more years. Some combination of "AI Art is Bad", "AI can't count the R's in strawberry", and environmental concerns around energy use.
comment by Jan_Kulveit · 2025-02-02T15:17:18.084Z · LW(p) · GW(p)
One structure which makes sense to build in advance for these worlds is emergency response teams [LW · GW]. We almost founded one 3 years ago, unfortunately on a never-paid FTX grant. Other funders decided not to fund this (at a level like $200-500k) because, e.g., it did not seem useful to them to prepare for high-volatility periods, while pouring tens of millions into evals did.
I'm not exactly tracking to what extent this lack of foresight prevails (my impression is it pretty much does), but I think I can still create something like ALERT with ~$1M of unrestricted funding.
↑ comment by Jonathan Claybrough (lelapin) · 2025-02-02T20:16:53.692Z · LW(p) · GW(p)
(off topic to op, but in topic to Jan bringing up ALERT)
To what extent do you believe Sentinel fulfills what you wanted to do with ALERT? Their emergency response team is pretty small rn. Would you recommend funders support that project or a new ALERT?
comment by Noosphere89 (sharmake-farah) · 2025-01-31T17:22:07.071Z · LW(p) · GW(p)
I think the catastrophe through chaos story is the most likely outcome, conditional on catastrophe happening.
The big disagreement might ultimately be about timelines: I've updated towards longer timelines, such that world-shakingly powerful AI probably arrives in the 2030s or 2040s, not this decade. I put about 35-40% credence on the timeline in the post being correct, and more credence on at least one new paradigm shift happening before world-shaking AI.
The other one is probably that I'm more optimistic about turning aligned TAI into aligned ASI, because I am reasonably confident that the alignment problem is easy overall, combined with being much more optimistic about automating alignment compared to a lot of other people.
↑ comment by Marius Hobbhahn (marius-hobbhahn) · 2025-01-31T18:27:56.206Z · LW(p) · GW(p)
What made you update towards longer timelines? My understanding was that most people updated toward shorter timelines based on o3 and reasoning models more broadly.
↑ comment by Noosphere89 (sharmake-farah) · 2025-01-31T18:33:09.214Z · LW(p) · GW(p)
A big one has to do with DeepSeek's R1 maybe breaking moats, essentially killing industry profit if it happens:
https://www.lesswrong.com/posts/ynsjJWTAMhTogLHm6/?commentId=a2y2dta4x38LqKLDX [LW · GW]
The other issue has to do with o1/o3 being potentially more supervised than advertised:
https://www.lesswrong.com/posts/HiTjDZyWdLEGCDzqu/?commentId=gfEFSWENkmqjzim3n#gfEFSWENkmqjzim3n [LW(p) · GW(p)]
Finally, Vladimir Nesov has an interesting comment on how Stargate is actually evidence for longer timelines:
https://www.lesswrong.com/posts/fdCaCDfstHxyPmB9h/vladimir_nesov-s-shortform#W5twe6SPqe5Y7oGQf [LW(p) · GW(p)]
↑ comment by Noosphere89 (sharmake-farah) · 2025-02-02T18:24:11.965Z · LW(p) · GW(p)
Technical note, I'm focusing on existential catastrophes, not normal catastrophes, and the difference is that no humans have power anymore, compared to only a few humans having power, so this mostly excludes scenarios like these:
https://www.lesswrong.com/posts/pZhEQieM9otKXhxmd/gradual-disempowerment-systemic-existential-risks-from [LW · GW]
https://www.lesswrong.com/posts/2ujT9renJwdrcBqcE/the-benevolence-of-the-butcher [LW · GW]
comment by Petropolitan (igor-2) · 2025-02-02T15:52:41.818Z · LW(p) · GW(p)
This is a scenario I have been thinking about for perhaps three years. However, you made an implicit assumption I wish was explicit: there is no warning shot.
I believe that with such a slow takeoff there is a very high probability of an AI alignment failure causing significant loss of life already at the TAI stage, and that this would significantly change the dynamics.
↑ comment by Marius Hobbhahn (marius-hobbhahn) · 2025-02-02T16:39:45.973Z · LW(p) · GW(p)
There are two sections that I think make this explicit:
1. No failure mode is sufficient to justify bigger actions.
2. Some scheming is totally normal.
My main point is that even things that would seem like warning shots today, e.g. severe loss of life, will look small in comparison to the benefits at the time, thus not providing any reason to pause.
comment by Teun van der Weij (teun-van-der-weij) · 2025-02-01T17:58:02.469Z · LW(p) · GW(p)
Given this scenario, should people focus more on using AI for epistemics?
See Lukas Finnveden's article here for context.