The map of "Levels of defence" in AI safety

post by turchin · 2017-12-12T10:44:17.968Z · LW · GW · Legacy · 5 comments


One of the main principles of engineering safety is multilevel defence. When a nuclear bomb accidentally fell from the sky over the US in 1961, three of its four defence levels failed; the last one prevented a nuclear explosion: https://en.wikipedia.org/wiki/1961_Goldsboro_B-52_crash

Multilevel defence is used extensively in the nuclear industry and includes different systems of passive and active safety, ranging from the reliance on delayed neutrons to keep the chain reaction controllable, to control rods, containment buildings and exclusion zones.

Here I present a look at AI safety from the point of view of multilevel defence. It is mainly based on two of my as-yet-unpublished articles: “Global and local solutions to AI safety” and “Catching treacherous turn: multilevel AI containment system”.

The special property of multilevel defence in the case of AI is that most of the protection comes from the first level alone, which is AI alignment. Each subsequent level has a progressively smaller chance of providing any protection, because the power of a self-improving AI will grow as it breaks through each successive level. So we might be tempted to ignore all levels after AI alignment, but, Houston, we have a problem: judging by the current speed of AI development, powerful and dangerous AI could appear within several years, while AI safety theory may need several decades to be created.

The map is intended to demonstrate a general classification of the defence levels in AI safety, not to list all known ideas on the topic. I have marked in yellow the boxes which, to my understanding, are part of MIRI's plan.

I have also added my personal probability estimates of whether each level will work (conditional on AI risk being the only global risk, and on all previous levels having failed).
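To make explicit how such conditional estimates combine, here is a minimal sketch with made-up numbers (not the estimates from the map): a catastrophe requires every level to fail in turn, so the overall risk is the product of the per-level failure probabilities.

```python
# Minimal sketch with hypothetical numbers (not the map's actual estimates).
# Each level is reached only if all previous levels have failed, so the overall
# probability of catastrophe is the product of the per-level failure probabilities.
p_level_works = [0.8, 0.3, 0.1, 0.05]  # conditional probability that each level works

p_catastrophe = 1.0
for p in p_level_works:
    p_catastrophe *= (1.0 - p)  # this level must also fail

print(f"P(all levels fail) = {p_catastrophe:.4f}")            # 0.2*0.7*0.9*0.95 = 0.1197
print(f"P(at least one level works) = {1 - p_catastrophe:.4f}")
```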

The map is constructed on the same principles as my “plan of x-risks prevention” map and my “immortality map”, which are also built around the idea of multilevel defence.

pdf: https://goo.gl/XH3WgK 

 

5 comments


comment by UNVRSLWSDM · 2017-12-19T20:31:31.028Z · LW(p) · GW(p)

Lovely... Catch Me If You Can, AI style... How about adding a box for "too much power in the wrong human hands" (if it's not already there somehow)? Yes, AI power too.

Because this is by far the greatest problem of human civilization. Everything (staying just in the tech domain) is created by a very small fraction of the population (who possess that superpower called INTELLIGENCE), but society is ruled by a completely different tiny group that possesses various predatory superpowers for assuming control and power (over the herd of mental herbivores they "consume").

This is not going to end well; such a system does not scale and has imbalances and systemic risks practically everywhere. Remember, nuclear weapons were already "too much", and we are only lucky that bio$hit is not really effective as a weapon (gas was already nothing much).

We are simply too much animal and too little of that "intelligent" side we often worship. And the coming nanobots will be far worse than nukes, and far less controllable.

comment by RedMan · 2018-01-04T02:00:03.061Z · LW(p) · GW(p)

Rules for an AI:

1. If an action it takes results in more than N logs of $ worth of damage to humans, or kills more than N logs of humans, transfer control of all systems it can provide control inputs to a designated backup (a human, a formally proven safe algorithmic system, etc.) and power down.

2. When choosing among actions which affect a system external to it, calculate the probable effect on human lives. If the probability of exceeding the N assigned in rule 1 is greater than some threshold Z, ignore that option; if no options are available, loop.

Most systems would be set to N = 1, Z = 1/10,000, giving us four 9s of certainty that the AI won't kill anyone. Some systems (weapons, climate management, emergency management dispatch systems) will need higher N values and lower Z thresholds to maintain effectiveness.
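(A minimal sketch of how these two rules might be wired up; the function names, the harm estimator and the fail-over hook are hypothetical illustrations, not anything specified above.)

```python
# Hypothetical sketch of the two rules above; names and structure are illustrative only.

N_LOGS = 1        # rule-1 limit: 10**1 = 10 "units" of harm (lives, or logs of $ damage)
Z = 1.0 / 10_000  # rule-2 threshold on the probability of exceeding that limit


def rule_1_tripwire(observed_harm, hand_over_control, power_down):
    """Rule 1: if realized harm exceeds the limit, hand off to the backup and shut down."""
    if observed_harm > 10 ** N_LOGS:
        hand_over_control()  # designated backup: human, formally proven safe controller, etc.
        power_down()


def rule_2_filter(options, p_harm_exceeds_limit):
    """Rule 2: drop options too likely to exceed the limit; an empty result means 'loop'."""
    return [o for o in options if p_harm_exceeds_limit(o) <= Z]


# Toy usage with a made-up probability model:
p_model = {"deliver toast": 1e-7, "overclock heating element": 3e-3}
print(rule_2_filter(list(p_model), lambda o: p_model[o]))  # -> ['deliver toast']
```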

JFK had an N of like 9 and a Z score of 'something kind of high', and passed control to Lyndon B Johnson, of 'I have a minibar and a shotgun in the car I keep on my farm so I can drive and shoot while intoxicated' fame. We survived that; we will be fine.

Are we done?

comment by Lumifer · 2018-01-04T15:47:35.431Z · LW(p) · GW(p)

Are you reinventing Asimov's Three Laws of Robotics?

comment by RedMan · 2018-01-04T19:17:21.489Z · LW(p) · GW(p)

I hadn't thought about it that way.

I do think that either compile-time flags for the AI system, or a second 'monitor' system chained to the AI system to enforce the named rules, would probably limit the damage.

The broader point is that probabilistic AI safety is probably a much more tractable problem than absolute AI safety, for a lot of reasons. To further the nuclear analogy, emergency shutdown is probably a viable safety measure for many of the plausible 'paperclip maximizer turns us into paperclips' scenarios.
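(A minimal sketch of the 'chained monitor' idea under those assumptions: a separate watchdog process that polls an independent harm estimate and performs an emergency shutdown of the AI process if the rule-1 limit is exceeded. The harm estimator and process handling here are hypothetical placeholders.)

```python
import subprocess
import time

HARM_LIMIT = 10      # rule-1 limit, e.g. 10**N with N = 1
POLL_SECONDS = 1.0


def estimate_harm() -> float:
    """Placeholder for whatever independent telemetry the monitor trusts."""
    return 0.0


def watchdog(ai_process: subprocess.Popen) -> None:
    """Kill the monitored AI process if the estimated harm crosses the limit."""
    while ai_process.poll() is None:       # while the AI process is still running
        if estimate_harm() > HARM_LIMIT:
            ai_process.kill()              # emergency shutdown
            break
        time.sleep(POLL_SECONDS)
```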

"I need to disconnect the AI safety monitoring robot from my AI-enabled nanotoaster robot prototype because it keeps deactivating it" might still be the last words a human ever speaks, but hey, we tried.

comment by Lumifer · 2018-01-05T16:35:27.114Z · LW(p) · GW(p)

There seems to be a complexity limit to what humans can build. A full GAI is likely to be somewhere beyond that limit.

The usual solution to that problem -- see EY's fooming scenario -- is to make the process recursive: let a mediocre AI improve itself, and as it gets better, it can improve itself more rapidly. Exponential growth can go fast and far.

This, of course, gives rise to another problem: you have no idea what the end product is going to look like. If you're looking at the gazillionth iteration, your compiler flags were probably lost around the thousandth iteration and your chained monitor system mutated into a cute puppy around the millionth iteration...

Probabilistic safety systems are indeed more tractable, but that's not the question. The question is whether they are good enough.