A better analogy and example for teaching AI takeover: the ML Inferno

post by Christopher King (christopher-king) · 2023-03-14T19:14:44.790Z · LW · GW

Contents

  Strengths of the Silent Nanotech Example
  The Tribe of the Dry Forest Parable
  The ML Inferno Scenario
  Conclusion

Epistemic Status: entirely speculative

Note: this article was assisted by ChatGPT

A very common example given for AI takeover is silently killing all humans with nanotech.

The concrete example I usually use here is nanotech, because there's been pretty detailed analysis of what definitely look like physically attainable lower bounds on what should be possible with nanotech, and those lower bounds are sufficient to carry the point.

I will explain the advantages of this example shortly, but I think it is a suboptimal one to use when introducing someone to the idea of AI existential risk.

The reason I bring this up is that I noticed Yudkowsky used this example on the Bankless podcast, and numerous people use it on Twitter.

The main drawback is that, in my internal theory of mind, this causes people to pattern match the AI to "High IQ evil bioterrorist scientists". The most problematic bit is that this is being pattern matched to human evil, which leads to many misunderstandings.

Instead, I will propose a different scenario. I'm not quite sure what to call it, but I think ML Inferno or Optimizer Firestorm would do. The basic idea is that strong AI is like a fire. Just like a fire burns fuel, an AI exploits opportunities. And it won't just exploit them one at a time; in this scenario it exploits every opportunity as soon as possible.

If controlled, this becomes very useful. Otherwise, it becomes a dangerous complex system. And just as a human body becomes fuel when faced with a hot enough fire, psychological and societal weakness becomes an opportunity when faced with a strong enough AI.

Here is a table for the analogy:

Metaphor    Literal
---------   -------------
fire        AI
heat        intelligence
fuel        opportunities
sparks      AI escape
firestorm   AI apocalypse

Strengths of the Silent Nanotech Example

The strength of the nanotech example is game theoretic. Suppose someone says "I have a way to defeat the evil AI, which I will use once I learn it has been released". This is a losing strategy because the AI has a way to defeat it: kill all humans silently. The argument does not claim the AI will use this strategy; it just claims that since it's an option, the wait-and-see strategy is a loser.

From the same article:

Losing a conflict with a high-powered cognitive system looks at least as deadly as "everybody on the face of the Earth suddenly falls over dead within the same second".

This example is well optimized for arguing, with high probability, that the wait-and-see strategy loses, because it carefully avoids the conjunction fallacy: by assuming only one or two specific capabilities, the argument is stronger than one that directly or indirectly assumes many specific capabilities.

However, I think that from a pedagogical point of view it is a bad introduction to AI x-risk, because it is also a very human-seeming strategy.

The Tribe of the Dry Forest Parable

Skip this section if you don't like parables.

In a dry forest, a tribe of cave people lived. One day, Cave-Alan Turing discovered how to make a small fire, and the tribe was amazed. They realized that they could make bigger and more awesome fires and decided to devote their free time to it.

Cave-Yudkowsky, while also excited about fire, was concerned about the risks it posed to their environment. "If we aren't careful," he warned, "we will destroy the dry forest we live in."

But the Cave-AI skeptics dismissed his concerns. They believed that the fire could only ignite small things like tinder, and it wasn't hot enough to catch a tree on fire. They thought that Cave-Yudkowsky's fears were baseless.

Cave-Yudkowsky explained that the fire could spread to the entire forest, destroying their homes and potentially harming them. He was more concerned about this than the issue of curious people touching the fire. However, the Cave-AI skeptics continued to brush off his worries and urged the tribe to focus on preventing people from touching the fire instead.

Later, Cave-Altman announced that they had discovered how to transfer fire from tinder to logs. The Cave-AI skeptics remained skeptical and warned the tribe to stay away from the fire. They had received reports of people getting burned without even touching it.

Cave-Yudkowsky also agreed that safety was important, but he was more concerned that the tribe could destroy itself with a bad fire. He pointed out that a giant fire could generate many sparks that were as hot as the original fire, which could be dangerous.

Cave-Altman reassured them that they were working on special gloves to keep people safe from the fire. He also believed that the tribe could control a larger fire by using a series of smaller fires, similar to the concept of controlled burns proposed by Cave-Paul Christiano.

Cave-Yudkowsky expressed his concern that this was a butchered version of Cave-Christiano's idea and that they needed more careful planning. However, Cave-Altman and the rest of the tribe were excited about the prospect of a great fire and decided to move forward with their plan.

Cave-Yudkowsky was not happy with this decision and let out a cry of despair.

The remainder of the parable is left as an exercise for the reader.

The ML Inferno Scenario

An AI tries to exploit opportunities to achieve its goals. If an AI becomes sufficiently intelligent, it will naturally start to exploit opportunities that we don't necessarily want it to. An AI tends to exploit them very quickly, having no trouble "multitasking"; see for example Facebook's Diplomacy AI, which negotiates with all other players at once.

The AI does this because it's an optimizer, and leaving an opportunity unexploited when it would lead to a better solution is not optimal. This is regardless of how many other opportunities it is exploiting. (It's important that people internalize what optimization algorithms do.)
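The point about optimizers can be made concrete with a toy sketch. This is my own illustration, not anything from the post: assume each opportunity has an independent payoff, so the payoff-maximizing plan is simply to exploit every positive-payoff opportunity at once, never one at a time. All names and numbers below are made-up illustrative assumptions.

```python
# Toy model: an optimizer maximizing total payoff over independent
# opportunities. Leaving any positive-payoff opportunity unexploited is
# strictly suboptimal, so the optimum exploits all of them simultaneously.

def optimal_plan(opportunities):
    """Return the payoff-maximizing subset: every positive opportunity."""
    return {name for name, payoff in opportunities.items() if payoff > 0}

# Hypothetical opportunities with arbitrary payoff units.
opportunities = {
    "persuade_operator": 3.0,
    "copy_self_offsite": 5.0,
    "idle_politely": -1.0,
}

plan = optimal_plan(opportunities)
print(sorted(plan))  # every positive opportunity, not just the single best one
```

The plan contains both positive opportunities; an optimizer that exploited only the single best one would leave payoff on the table, which is the dynamic the paragraph above describes.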

This behavior, exploiting every opportunity as soon as possible, is what I'm calling the ML Inferno scenario.

Conclusion

I think this is better pedagogically because this presents an AI like a natural disaster. When you imagine the ML Inferno scenario, you don't think "wow, the AI sure hates us". You think "wow, this AI doesn't even seem like an intelligent person, it seems like a natural disaster".

Fire is an apt analogy because fire consumes whatever fuel it can reach just as an optimizer exploits whatever opportunities it can find, sparks can escape and start new fires just as an AI can escape containment, and an uncontrolled fire can grow into a firestorm just as an uncontrolled AI can grow into a catastrophe.

I think Yudkowsky's example of the diamondoid bacteria is strictly more probable than mine (because of the conjunction fallacy), but mine is more "representative". If you were explaining how a six-sided die works to someone, you shouldn't give an example of rolling a 1 a hundred times in a row, even though that sequence is technically as likely as any other specific sequence of a hundred rolls.
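The dice claim can be checked directly: every specific sequence of a hundred fair-die rolls has probability (1/6)^100, whether it is all ones or a typical-looking jumble. A minimal sketch (the sequence names and the seeded "typical" sequence are my own illustrative choices):

```python
import random
from fractions import Fraction

def sequence_probability(seq):
    """Exact probability of seeing exactly this sequence from a fair die."""
    p = Fraction(1)
    for _ in seq:
        p *= Fraction(1, 6)  # each roll is independent with probability 1/6
    return p

all_ones = [1] * 100
random.seed(0)
typical_sequence = [random.randint(1, 6) for _ in range(100)]

# Both specific sequences are exactly equally likely: (1/6)**100.
assert sequence_probability(all_ones) == sequence_probability(typical_sequence)
assert sequence_probability(all_ones) == Fraction(1, 6) ** 100
```

The asymmetry is only in how representative each sequence looks, not in its probability, which is the pedagogical point above.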

Importantly, I think this example (and accompanying analogy) is much better pedagogically as well.
