How can humanity survive a multipolar AGI scenario?

post by Leonard Holloway (literally-best) · 2025-01-09T20:17:40.143Z · LW · GW · No comments

This is a question post.

Short introduction

The multipolar scenarios I will be talking about are scenarios in which multiple unrelated actors have access to their own personal AGIs. For the sake of discussion, assume that alignment is solved and that each AGI will follow the orders of its owner.

A few ways we might arrive at a multipolar AGI scenario

Potential catastrophes that can be caused by multiple actors having access to AGI

1) Everyone dies directly

2) Everyone dies indirectly

3) Totalitarian dictatorship

What are our best plans for preventing catastrophes like those outlined above, in a multipolar AGI scenario?

Answers

answer by Nathan Helm-Burger · 2025-01-09T21:02:30.735Z · LW(p) · GW(p)

My current best guess: Subsidiarity

I've been thinking along these lines for the past few years, but I feel like my thinking was clarified and boosted by Allison's recent series: Gaming the Future [LW · GW].

The gist of the idea is to create clever systems of decentralized control and voluntary interaction that can still coordinate on difficult, risky tasks (such as enforcing defensive laws against weapons of mass destruction). Such systems could shift humanity out of the Pareto-suboptimal lose-lose traps and races we are stuck in. Win-win solutions to our biggest current problems seem possible, and coordination seems like the biggest blocker.
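
As a toy illustration of why conditional, voluntary commitments can dissolve such traps (my sketch, not part of the original answer; all payoffs and names are made-up assumptions):

```python
# Toy model of a voluntary coordination contract ("I'll cooperate iff you
# do") in a prisoner's-dilemma-shaped race. Payoffs are illustrative only.

# (row player's payoff, column player's payoff)
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),  # both slow down: win-win
    ("cooperate", "defect"):    (0, 5),  # sucker vs. racer
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),  # the lose-lose trap
}

def best_response(opponent_action: str) -> str:
    """Without any contract, defecting is the dominant strategy."""
    return max(("cooperate", "defect"),
               key=lambda a: PAYOFFS[(a, opponent_action)][0])

def signing_is_rational() -> bool:
    """Both parties may sign 'I cooperate iff the other signs'.
    If you sign and the other signs, you get 3; if you refuse, the
    other's contract is void, both defect, and you get 1. Refusing
    never helps, so mutual signing is a stable win-win."""
    payoff_sign = PAYOFFS[("cooperate", "cooperate")][0]
    payoff_refuse = PAYOFFS[("defect", "defect")][0]
    return payoff_sign > payoff_refuse

print(best_response("cooperate"))  # 'defect' -- the race dynamic
print(best_response("defect"))     # 'defect'
print(signing_is_rational())       # True -- the contract escapes the trap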

I am hopeful that one of the things we can do with just-before-the-brink AI will be to accelerate the design and deployment of such voluntary coordination contracts.

answer by quetzal_rainbow · 2025-01-10T05:09:33.017Z · LW(p) · GW(p)

I think a lot of thinking around multipolar scenarios suffers from the heuristic of seeking a "solution in the shape of the problem", i.e., "a multipolar scenario is one where we have kinda-aligned AI but still die due to coordination failures; therefore, the solution for multipolar scenarios should be about coordination."

I think the correct solution is to leverage available superintelligence in a nice, unilateral way:

  1. D/acc - use superintelligence to put up as much defence as you can, starting from formal software verification and ending with spreading biodefence nanotech (see the sketch after this list);
  2. Running away - if you set up a Moon/Mars/Jovian colony of nanotech-upgraded humans/uploads and pour the available resources into defence, then even if Earth explodes, humanity survives as a species.
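
As a tiny, concrete taste of the "formal software verification" item above (my addition, not quetzal_rainbow's; it assumes the `z3-solver` Python package), here is a sketch that proves a property of a function for all inputs, rather than just testing a few:

```python
# Formal verification in miniature: use the Z3 SMT solver to prove that a
# symbolic model of abs() satisfies its spec for EVERY integer input.
# pip install z3-solver
from z3 import Int, If, Implies, prove

x = Int("x")
abs_x = If(x >= 0, x, -x)  # symbolic model of abs()

prove(abs_x >= 0)                   # prints "proved": abs is never negative
prove(Implies(abs_x == 0, x == 0))  # prints "proved": abs(x) == 0 only at 0
```
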
answer by sweenesm · 2025-01-09T21:18:05.889Z · LW(p) · GW(p)

Here's what Claude 3.5 Sonnet said in response to this prompt:

Me:

What are the best plans for preventing catastrophes such as everyone dying or a totalitarian government taking over in multipolar AGI scenarios, i.e., ones in which AGIs are aligned to what their human users want them to do, and at least one human user orders them to do bad things?

Claude:

Let me share some thoughts on AGI governance and safety in multipolar scenarios, while noting this is a complex topic that reasonable people disagree on:

A key challenge in multipolar scenarios is that even if individual AI systems are aligned with their users' values and intentions, some users may have harmful goals. Some potential approaches to address this include:

  1. International coordination and governance frameworks that place limits on allowable AI system capabilities and uses, similar to arms control treaties. However, enforcement would be challenging.
  2. Technical measures like capability limitations built into AI systems' training that prevent them from being used for certain catastrophic actions, regardless of user intent. The challenge is defining and implementing such limitations without preventing beneficial uses.
  3. Focused work on improving human values and cooperation through better institutions, education, and mechanisms for resolving conflicts peacefully. This could reduce the likelihood of actors wanting to cause catastrophic harm.
  4. Developing AI systems with some form of common good or humanitarian principles that apply regardless of user instructions, though this involves difficult philosophical and technical questions about how to define and implement such principles.

What are your thoughts on which approaches seem most promising or practical? I'm particularly curious about your perspective on the balance between technical and social/institutional solutions.

___________________________________

I would add to this Defensive acceleration (d/acc), as described by Vitalik Buterin (https://vitalik.eth.limo/general/2025/01/05/dacc2.html and https://vitalik.eth.limo/general/2023/11/27/techno_optimism.html).

comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-10T04:15:04.358Z · LW(p) · GW(p)

I too like talking things through with Claude, but I don't recommend taking Claude's initial suggestions at face value.

Try following up with a question like:

"Yes, those all sound nice, but do they comprehensively patch all the security holes? What if someone really evil fine-tuned a model to be evil or simply obedient, and then used it as a tool for making weapons of mass destruction?

Education to improve human values seems unlikely to have a 100% success rate. Some people will still do bad things, especially in the very near future.

Fine-tuning the AI can overcome its ethical principles, add in the necessary technical information about weapon design, and strip out any capability limitations we currently know how to instill (and in any case, such limitations cannot be applied retroactively to pre-existing open-weights models).

If someone is determined to cause great harm through terrorist actions, it is unlikely that a patchy enforcement system could notice and stop them anywhere in the world. If the model is sufficiently powerful that it makes massive terrorist actions very easy, then even a small failure rate of enforcement would result in catastrophe."
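
To put a rough number on that last point (illustrative figures of my own, not from the comment): if enforcement misses any given attempt with probability p, the chance that at least one of n independent attempts slips through is 1 - (1 - p)^n, which climbs toward certainty surprisingly fast:

```python
# Back-of-the-envelope: probability that at least one of n independent
# attempts evades an enforcement system that misses each attempt with
# probability p_miss. All numbers below are assumptions for illustration.

def p_at_least_one_breach(p_miss: float, n_attempts: int) -> float:
    return 1 - (1 - p_miss) ** n_attempts

for p_miss in (0.01, 0.001):
    for n in (100, 1000):
        print(f"miss rate {p_miss:.1%}, {n} attempts -> "
              f"{p_at_least_one_breach(p_miss, n):.1%} chance of a breach")
# miss rate 1.0%, 100 attempts  -> 63.4% chance of a breach
# miss rate 1.0%, 1000 attempts -> ~100.0% chance of a breach
# miss rate 0.1%, 100 attempts  -> 9.5% chance of a breach
# miss rate 0.1%, 1000 attempts -> 63.2% chance of a breach
```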
