Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development
post by Jan_Kulveit, Raymond D, Nora_Ammann, Deger Turan, David Scott Krueger (formerly: capybaralet), David Duvenaud · 2025-01-30T17:03:45.545Z · LW · GW · 4 comments
This is a link post for https://gradual-disempowerment.ai/
Full version on arXiv | X
Executive summary
AI risk scenarios usually portray a relatively sudden loss of human control to AIs outmaneuvering individual humans and human institutions, brought about by a sudden increase in AI capabilities or a coordinated betrayal. However, we argue that even an incremental increase in AI capabilities, without any coordinated power-seeking, poses a substantial risk of eventual human disempowerment. This loss of human influence will be centrally driven by the emergence of more competitive machine alternatives to humans in almost all societal functions, such as economic labor, decision making, artistic creation, and even companionship.
A gradual loss of control of our own civilization might sound implausible. Hasn't technological disruption usually improved aggregate human welfare? We argue that the alignment of societal systems with human interests has been stable only because of the necessity of human participation for thriving economies, states, and cultures. Once this human participation gets displaced by more competitive machine alternatives, our institutions' incentives for growth will be untethered from a need to ensure human flourishing. Decision-makers at all levels will soon face pressures to reduce human involvement across labor markets, governance structures, cultural production, and even social interactions. Those who resist these pressures will eventually be displaced by those who do not.
Still, wouldn't humans notice what's happening and coordinate to stop it? Not necessarily. What makes this transition particularly hard to resist is that pressures on each societal system bleed into the others. For example, we might attempt to use state power and cultural attitudes to preserve human economic power. However, the economic incentives for companies to replace humans with AI will also push them to influence states and culture to support this change, using their growing economic power to shape both policy and public opinion, which will in turn allow those companies to accrue even greater economic power.
Once AI has begun to displace humans, existing feedback mechanisms that encourage human influence and flourishing will begin to break down. For example, states funded mainly by taxes on AI profits instead of their citizens' labor will have little incentive to ensure citizens' representation. This could occur at the same time as AI provides states with unprecedented influence over human culture and behavior, which might make coordination amongst humans more difficult, thereby further reducing humans' ability to resist such pressures. We describe these and other mechanisms and feedback loops in more detail in this work.
Though we provide some proposals for slowing or averting this process, and survey related discussions, we emphasize that no one has a concrete plausible plan for stopping gradual human disempowerment, and that methods of aligning individual AI systems with their designers' intentions are not sufficient to prevent it. Because this disempowerment would be global and permanent, and because human flourishing requires substantial resources in global terms, it could plausibly lead to human extinction or similar outcomes.
4 comments
comment by ryan_greenblatt · 2025-01-30T23:32:20.867Z · LW(p) · GW(p)
I (remain) skeptical that the sort of failure mode described here is plausible if we solve the problem of aligning individual AI systems with their designers' intentions without this alignment requiring any substantial additional costs (that is, we solve single-single alignment with minimal alignment tax).
This has previously been argued by Vanessa here [LW(p) · GW(p)] and Paul here [LW(p) · GW(p)] in response to a post making a similar claim.
I do worry about human power grabs: some humans obtaining greatly more power as enabled by AI (even if we have no serious alignment issues). However, I don't think this matches the story you describe and the mitigations seem substantially different than what you seem to be imagining.
I'm also somewhat skeptical of the threat model you describe in the case where alignment isn't solved. I think the difference between the story you tell and something more like We get what we measure [AF · GW] is important.
I worry I'm misunderstanding something because I haven't read the paper in detail.
comment by ryan_greenblatt · 2025-01-30T23:42:48.704Z · LW(p) · GW(p)
The paper says:
Christiano (2019) makes the case that sudden disempowerment is unlikely,
This isn't accurate. The post What failure looks like [LW · GW] includes a scenario involving sudden disempowerment [LW · GW]!
The post does say:
The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity.
I think this is probably not what failure will look like,
But I think it is mostly arguing against threat models involving fast AI capability takeoff (where the level of capabilities takes its creators and others by surprise, and fast capabilities progress allows AIs to suddenly become powerful enough to take over), rather than threat models involving sudden disempowerment from a point where AIs are already well known to be extremely powerful.
comment by Davidmanheim · 2025-01-30T21:26:15.491Z · LW(p) · GW(p)
I think this is correct, but it doesn't seem to note the broader trend towards human disempowerment in favor of bureaucratic and corporate systems, which this gradual disempowerment would continue, and hence it elides or ignores why AI risk is distinct.
comment by Martin Fell (martin-fell) · 2025-01-30T19:53:45.303Z · LW(p) · GW(p)
In my opinion this kind of scenario is very plausible and deserves a lot more attention than it seems to get.