Poorly-Aimed Death Rays

post by Thane Ruthenis · 2022-06-11T18:29:55.430Z · 5 comments

Alternate framing: Optimality is the tiger, and agents are its teeth.

Tonally relevant: Godzilla Strategies.


It's a problem when people think that a superintelligent AI will be just a volitionless tool that will do as told. But it's also a problem when people focus too much on the story of "agency": when they imagine that all of the problems come from the AI "wanting" things, "thinking" things, and consequentializing all over the place about it. If only we could make it more of a volitionless tool! Then all of our problems would be solved. Because the problem is the AI using its power in clever ways with the deliberate intent to hurt us, right?

This, I feel, fails entirely to appreciate the sheer power of optimization, and how even the slightest failure to aim it properly, the slightest leakage of its energy in the wrong direction, for the briefest of moments, will be sufficient to wash us all away.

The problem isn't making a superintelligent system that wouldn't positively want to kill us. Accidentally killing us all is a natural property of superintelligence. The problem is making an AI that will deliberately spend a lot of effort on ensuring it's not killing us.

I find planet-destroying Death Rays to be a good analogy. Think the Death Star. Think—


Imagine that you're an engineer employed by an... eccentric fellow. The guy has a volcano lair, weird aesthetic tastes, and a tendency to put words like "world" and "domination" one after another. You know the type.

One of his latest schemes is to blow up Jupiter. To that end, he's had a giant cavern excavated underneath his volcano lair, had a long cylindrical tunnel dug from that cavern to the surface, and ordered your team to build a beam weapon in that cavern and shoot it through the tunnel at Jupiter.

You're getting paid literal tons of money, so you don't complain (except about the payment logistics). You have a pretty good idea of how to do that project, too. There are these weird crystal things your team found lying around. If you poke one in a particular way, it releases a narrow energy beam which blows up anything it touches. The power of the beam scales superexponentially with the strength of the poke; you're pretty sure shooting one with a rifle will do the Jupiter-vanishing trick.

There's just one problem: aim. You can never quite predict which part of the crystal will emit the beam. It depends on where you poke it, but also on how hard you poke, with seemingly random results. And your employer is insistent that the Death Ray be fired from the cavern through the tunnel, not from space where it's less likely to hit important things, or something practical like that.

If you say that can't be done, your employer will just replace you with someone less... pessimistic.

So, here's your problem. How do you build a machine that uses one or more of these crystals in such a way that they fire a Death Ray through the tunnel at Jupiter, without hitting Earth and killing everyone?[1]



This analogy can be nitpicked endlessly, of course. By no means does anything here prove that it's a valid one. You can argue that just a wee bit of misalignment won't destroy the world, or that the AI doesn't need to be dangerous in this way for us to do interesting things with it, or that intelligence isn't really quite that powerful, et cetera.

This post isn't aimed at convincing anyone of the premise; there are plenty of posts that do that already. But if you broadly agree with the premise and have some difficulty sorting out the exact problems with any given containment scenario, this analogy might help.

Any sufficiently powerful AI system holds a terrifying core of optimization: the ability to implacably rewrite some part of the world according to some specification. It doesn't matter how that power is represented, what wrapper it's in, where specifically it is aimed, or whether it's controlled by an alien sapient entity. As long as it's not aimed exactly where we want it to be, with no leakage, from the very beginning, it will kill us all.

That is an intrinsic property of such power.

[1] Also, Earth has no atmosphere in that scenario. Probably your employer's fault too. But at least that means a well-aimed beam wouldn't hit the air and explode everything anyway.

5 comments


comment by Towards_Keeperhood (Simon Skade) · 2022-06-18T21:05:30.934Z

I really like this analogy!

Also worth noting that some idiot may just play around with death ray technology without aiming it...

comment by Jeff Rose · 2022-06-11T20:26:17.721Z

An AGI that lacks volition is incomparably safer. In particular, it is highly unlikely to render humanity extinct. In addition, absent volition, it is possible to prevent an AGI from doing too much harm by moving more slowly and carefully, having breakpoints, having advanced modeled outputs, etc.

It will still be dangerous in the sense that many powerful technologies are dangerous, but not uniquely so.

Replies from: ZT5, Richard_Kennaway
comment by ZT5 · 2022-06-12T00:00:16.682Z

I think the point of this post is "a powerful enough optimization process kills you (and everyone else) anyway".

As soon as you give it a command, the AI has "volition" in the sense that it is optimizing some output that affects the world.

comment by Richard_Kennaway · 2022-06-24T16:18:04.890Z

Here is a breakdown of deaths by causes, worldwide, for every year from 1990 to 2019. The overwhelming majority do not involve volition. The first category that does, suicide, accounted for less than 2% of all deaths in each of those years. Homicide was always under 1%. Conflict and Terrorism together are below 1% in every year but one. Alcohol disorders and Drug disorders might be regarded as having volitional causes, but their contribution is similarly insignificant.

So at least 97% of all deaths do not happen because of anyone's volition. I am not seeing in this an argument for the safety of excluding volition, whatever that is, from a system.

I say "whatever that is," because while it should be clear what I mean in using the word above about people, it is not clear what it means when applied to an artificial system. We do not have a gears-level model that we can use to impute it or not to any given system.