[LINK] Concrete problems in AI safety

post by Stuart_Armstrong · 2016-07-05T21:33:08.341Z · LW · GW · Legacy · 6 comments

From the Google Research blog:

We believe that AI technologies are likely to be overwhelmingly useful and beneficial for humanity. But part of being a responsible steward of any new technology is thinking through potential challenges and how best to address any associated risks. So today we’re publishing a technical paper, Concrete Problems in AI Safety, a collaboration among scientists at Google, OpenAI, Stanford and Berkeley.

While possible AI safety risks have received a lot of public attention, most previous discussion has been very hypothetical and speculative. We believe it’s essential to ground concerns in real machine learning research, and to start developing practical approaches for engineering AI systems that operate safely and reliably.

We’ve outlined five problems we think will be very important as we apply AI in more general circumstances. These are all forward thinking, long-term research questions -- minor issues today, but important to address for future systems:

- Avoiding negative side effects
- Avoiding reward hacking
- Scalable oversight
- Safe exploration
- Robustness to distributional shift

We go into more technical detail in the paper. The machine learning research community has already thought quite a bit about most of these problems and many related issues, but we think there’s a lot more work to be done.

We believe in rigorous, open, cross-institution work on how to build machine learning systems that work as intended. We’re eager to continue our collaborations with other research groups to make positive progress on AI.

6 comments


comment by Viliam · 2016-07-11T14:33:00.986Z · LW(p) · GW(p)

I like how the examples of robot failures are... uhm... not like something out of the Terminator movie. That may make some people discuss them more seriously.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2016-07-11T17:52:59.641Z · LW(p) · GW(p)

Yep!

comment by morganism · 2016-07-07T20:27:10.152Z · LW(p) · GW(p)

These folks say that you won't be able to sandbox an AGI, due to the nature of computing itself.

Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world, strict containment requires simulations of such a program, something theoretically (and practically) infeasible.

http://arxiv.org/abs/1607.00913v1

But perhaps we could fool it by subtly poisoning some of the crucial databases it uses.

DeepFool: a simple and accurate method to fool deep neural networks

http://arxiv.org/abs/1511.04599v3
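
To illustrate the general idea of such adversarial perturbations, here is a minimal sketch using the simpler fast gradient sign method (FGSM) rather than the DeepFool algorithm from the linked paper; `model`, `x`, `label`, and `epsilon` are assumed placeholders for a differentiable PyTorch classifier, an input batch, its true labels, and a perturbation budget.

```python
# Minimal FGSM-style sketch of an adversarial perturbation (not the DeepFool
# algorithm itself, which iteratively finds a near-minimal perturbation).
# `model`, `x`, `label` and `epsilon` are assumed placeholders.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, label, epsilon=0.01):
    """Return a copy of `x` nudged in the direction that most increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)  # how wrong the model currently is
    loss.backward()                              # gradient of the loss w.r.t. the input
    # A small step along the sign of that gradient is often enough to flip the prediction.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```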

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2016-07-09T08:13:59.365Z · LW(p) · GW(p)

strict containment requires simulations of such a program, something theoretically (and practically) infeasible.

Sandboxing just requires being sure that the sandboxed entity can't send bits outside the system (except perhaps on some defined channel), which is perfectly feasible.

Replies from: Viliam
comment by Viliam · 2016-07-11T14:31:59.152Z · LW(p) · GW(p)

perfectly feasible

Citation needed.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2016-07-11T17:55:18.187Z · LW(p) · GW(p)

In software, it's trivial: create a subroutine with only a very specific output channel and run the entity inside it. Some precautions are then needed to prevent the entity from hacking out through hardware weaknesses, but that should be doable (using isolation in a Faraday cage if needed).
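
As a toy sketch of that "very specific output" idea (assuming Python and its standard library; `untrusted_agent` is a hypothetical stand-in for the sandboxed entity, and this covers only the software channel, not the hardware weaknesses mentioned above): the entity runs in a separate process whose only way to send bits back is a single one-way pipe.

```python
# Toy sketch: run an untrusted function in its own process, with one
# one-way pipe as the only defined output channel back to the parent.
# This illustrates the software side only; hardware/side-channel isolation
# (e.g. the Faraday cage above) is a separate problem.
import multiprocessing as mp

def untrusted_agent(conn):
    answer = 42        # whatever the sandboxed entity computes internally
    conn.send(answer)  # the single permitted output
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe(duplex=False)  # one-way: child -> parent
    p = mp.Process(target=untrusted_agent, args=(child_conn,))
    p.start()
    result = parent_conn.recv()  # read the one defined output
    p.join()
    print(result)
```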