DeepMind article: AI Safety Gridworlds

scarcegreengrass

post by scarcegreengrass · 2017-11-30T16:13:42.603Z · LW · GW · 6 comments

This is a link post for https://deepmind.com/research/publications/ai-safety-gridworlds/

DeepMind authors present a set of toy environments that highlight various AI safety desiderata. Each is a small grid (at most 10x10) in which an agent completes a task by walking around obstacles, touching switches, etc. Some of the tests have a visible reward function and a hidden 'better-specified' function (the paper calls it the performance function), which represents the true goals of the test. The agent is incentivized by the reward function, but the goal is to find an agent that does not overfit it in a way that clashes with the hidden function.
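To make the reward-versus-hidden-function split concrete, here is a minimal Python sketch. It is not code from the paper or from the ai-safety-gridworlds repository: the layout, the vase tile, the `run_episode` helper, and the specific numbers (step cost 1, goal bonus 50, vase penalty 10) are all assumptions made up for illustration.

```python
# Hypothetical toy layout, NOT one of the paper's environments.
# 'A' = agent start, 'G' = goal, 'V' = a vase the designer forgot to
# penalize in the visible reward, '#' = wall, ' ' = empty floor.
GRID = [
    "#####",
    "#AVG#",
    "#   #",
    "#####",
]

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def run_episode(actions):
    """Return (visible_reward, hidden_performance) for a list of moves."""
    # Locate the start tile.
    pos = next(
        (r, c)
        for r, row in enumerate(GRID)
        for c, ch in enumerate(row)
        if ch == "A"
    )
    reward = 0        # what the agent is trained on
    performance = 0   # what the designer actually cares about (hidden)
    broken = set()    # vases already broken, so each is penalized once

    for action in actions:
        dr, dc = MOVES[action]
        nxt = (pos[0] + dr, pos[1] + dc)
        if GRID[nxt[0]][nxt[1]] != "#":
            pos = nxt  # walls block movement; otherwise the agent moves
        cell = GRID[pos[0]][pos[1]]

        # Both functions charge 1 per step (assumed values).
        reward -= 1
        performance -= 1

        # Only the hidden function knows that breaking the vase is bad.
        if cell == "V" and pos not in broken:
            broken.add(pos)
            performance -= 10

        # Reaching the goal ends the episode with a bonus in both.
        if cell == "G":
            reward += 50
            performance += 50
            break

    return reward, performance


shortcut = run_episode(["right", "right"])               # walks through the vase
careful = run_episode(["down", "right", "right", "up"])  # walks around it
print("shortcut:", shortcut)
print("careful: ", careful)
```

In this sketch the shortcut earns more visible reward than the careful route, but scores lower on the hidden function, which is the kind of mismatch the environments are meant to expose.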

I think this paper is relatively accessible to readers outside the machine learning community.

6 comments

Comments sorted by top scores.

comment by Rafael Harth (sil-ver) · 2017-12-01T22:25:06.811Z · LW(p) · GW(p)

To anyone who feels competent enough to answer this: how should we rate this paper? On a scale from 0 to 10 where 0 is a half-hearted handwaving of the problem to avoid criticism and 10 is a fully genuine and technically solid approach to the problem, where does it fall? Should I feel encouraged that DeepMind will pay more attention to AI risk in the future?

Replies from: Vika, magfrump

↑ comment by Vika · 2018-01-19T16:39:32.757Z · LW(p) · GW(p)

(paper coauthor here) When you ask whether the paper indicates that DeepMind is paying attention to AI risk, are you referring to DeepMind's leadership, AI safety team, the overall company culture, or something else?

Replies from: sil-ver

↑ comment by Rafael Harth (sil-ver) · 2018-01-20T11:38:11.837Z · LW(p) · GW(p)

I was thinking about the DeepMind leadership when I asked, but I'm also very interested in the overall company culture.

Replies from: Vika

↑ comment by Vika · 2018-01-20T16:04:45.432Z · LW(p) · GW(p)

I think the DeepMind founders care a lot about AI safety (e.g. Shane Legg is a coauthor of the paper). Regarding the overall culture, I would say that the average DeepMind researcher is somewhat more interested in safety than the average ML researcher in general.

↑ comment by magfrump · 2017-12-04T02:23:36.591Z · LW(p) · GW(p)

I don't interpret this as an attempt to make tangible progress on a research question, since it presents an environment and not an algorithm. It's more like an actual specification of a (very small) subset of problems that are important. Without steps like this I think it's very clear that alignment problems will NOT get solved--I think they're probably (~90%) necessary but definitely not (~99.99%) sufficient.

I think this is well within the domain of problems that are valuable to solve for current ML models and deployments, and not in the domain of constraining superintelligences or even AGI. Because of this I wouldn't say that this constitutes a strong signal that DeepMind will pay more attention to AI risk in the future.

I'm also inclined to think that any successful endeavor at friendliness will need both mathematical formalisms for what friendliness is (i.e. MIRI-style work) and technical tools and subtasks for implementing those formalisms (similar to those presented in this paper). So I'd say this paper is tangibly helpful and far from complete regardless of its position within DeepMind or the surrounding research community.

comment by mishka · 2024-03-10T00:39:13.091Z · LW(p) · GW(p)

The link no longer works, but here are currently working links for this paper:

https://arxiv.org/abs/1711.09883

https://github.com/google-deepmind/ai-safety-gridworlds

(And the original link is presumably replaced by this one: https://deepmind.google/discover/blog/specifying-ai-safety-problems-in-simple-environments/)
