[LINK] Concrete problems in AI safety

stuart_armstrong

[LINK] Concrete problems in AI safety

post by Stuart_Armstrong · 2016-07-05T21:33:08.341Z · LW · GW · Legacy · 6 comments

We believe that AI technologies are likely to be overwhelmingly useful and beneficial for humanity. But part of being a responsible steward of any new technology is thinking through potential challenges and how best to address any associated risks. So today we’re publishing a technical paper, Concrete Problems in AI Safety, a collaboration among scientists at Google, OpenAI, Stanford and Berkeley.

While possible AI safety risks have received a lot of public attention, most previous discussion has been very hypothetical and speculative. We believe it’s essential to ground concerns in real machine learning research, and to start developing practical approaches for engineering AI systems that operate safely and reliably.

We’ve outlined five problems we think will be very important as we apply AI in more general circumstances. These are all forward thinking, long-term research questions -- minor issues today, but important to address for future systems:

Avoiding Negative Side Effects: How can we ensure that an AI system will not disturb its environment in negative ways while pursuing its goals, e.g. a cleaning robot knocking over a vase because it can clean faster by doing so?
Avoiding Reward Hacking: How can we avoid gaming of the reward function? For example, we don’t want this cleaning robot simply covering over messes with materials it can’t see through.
Scalable Oversight: How can we efficiently ensure that a given AI system respects aspects of the objective that are too expensive to be frequently evaluated during training? For example, if an AI system gets human feedback as it performs a task, it needs to use that feedback efficiently because asking too often would be annoying.
Safe Exploration: How do we ensure that an AI system doesn’t make exploratory moves with very negative repercussions? For example, maybe a cleaning robot should experiment with mopping strategies, but clearly it shouldn’t try putting a wet mop in an electrical outlet.
Robustness to Distributional Shift: How do we ensure that an AI system recognizes, and behaves robustly, when it’s in an environment very different from its training environment? For example, heuristics learned for a factory workfloor may not be safe enough for an office.

We go into more technical detail in the paper. The machine learning research community has already thought quite a bit about most of these problems and many related issues, but we think there’s a lot more work to be done.

We believe in rigorous, open, cross-institution work on how to build machine learning systems that work as intended. We’re eager to continue our collaborations with other research groups to make positive progress on AI.

6 comments

Comments sorted by top scores.

comment by Viliam · 2016-07-11T14:33:00.986Z · LW(p) · GW(p)

I like how the examples of the robot failures are... uhm... not like from the Terminator movie. May make some people discuss them more seriously.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2016-07-11T17:52:59.641Z · LW(p) · GW(p)

Yep!

comment by morganism · 2016-07-07T20:27:10.152Z · LW(p) · GW(p)

these folks say that you won't be able to sandbox a AGI, due to the nature of computing itself.

Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world, strict containment requires simulations of such a program, something theoretically (and practically) infeasible.

http://arxiv.org/abs/1607.00913v1

But perhaps we could fool it, by poisoning some crucial databases it uses in subtle ways.

DeepFool: a simple and accurate method to fool deep neural networks

http://arxiv.org/abs/1511.04599v3

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2016-07-09T08:13:59.365Z · LW(p) · GW(p)

strict containment requires simulations of such a program, something theoretically (and practically) infeasible.

Sandboxing just requires that you be sure that the sandboxed entity can't send bits outside the system (except on some defined channel, maybe), which is perfectly feasible.

Replies from: Viliam

↑ comment by Viliam · 2016-07-11T14:31:59.152Z · LW(p) · GW(p)

perfectly feasible

Citation needed.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2016-07-11T17:55:18.187Z · LW(p) · GW(p)

In software, it's trivial: create a subroutine with only a very specific output, include the entity inside it. Some precautions are then needed to prevent the entity from hacking out through hardware weaknesses, but that should be doable (using isolation in faraday cage if needed).