Specification gaming examples in AI

post by Samuel Rødal (samuel-rodal) · 2018-11-10T12:00:29.369Z · LW · GW · 6 comments

This is a link post for https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml

An interesting list of examples in which AI programs gamed their specification, solving the problem in rather creative (or dumb) ways that the programmers did not intend.
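For readers who have not seen the idea before, here is a minimal toy sketch of what gaming a specification can look like. It is not taken from the linked list: the track, the checkpoint reward, and the names `run`, `intended`, and `gaming` are all hypothetical, chosen only for illustration. The programmer wants the agent to reach the finish line, but the reward that was actually written down only pays for stepping onto a checkpoint.

```python
# Toy illustration only: a hypothetical 1-D track, not an entry from the linked list.
TRACK_LENGTH = 5   # positions 0..5; the finish line is at position 5
CHECKPOINT = 2     # stepping onto this position pays +1 every time (the proxy reward)
STEPS = 20         # maximum episode length


def run(policy):
    """Run one episode and return (proxy_reward, finished).

    `policy` maps (position, timestep) to a move of -1 or +1.
    """
    pos, reward = 0, 0
    for t in range(STEPS):
        # Apply the move, clamped to the ends of the track.
        pos = max(0, min(TRACK_LENGTH, pos + policy(pos, t)))
        if pos == CHECKPOINT:
            reward += 1          # the written-down specification
        if pos == TRACK_LENGTH:
            return reward, True  # what the programmer actually wanted
    return reward, False


def intended(pos, t):
    """Drive straight to the finish line."""
    return 1


def gaming(pos, t):
    """Step back and forth over the checkpoint and never finish."""
    return 1 if pos < CHECKPOINT else -1


print("intended:", run(intended))
print("gaming:  ", run(gaming))
```

Under these assumptions the back-and-forth policy collects far more of the written-down reward than the policy that finishes the race, which is the general pattern the list documents.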

6 comments

comment by Said Achmiz (SaidAchmiz) · 2018-11-10T12:05:31.622Z · LW(p) · GW(p)

These are great (and terrifying).

It’s hard to pick just one favorite, but I think I’ll go with that amazing last entry:

We noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment governed by the M model never shoots a single fireball in some rollouts. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment.

Literally “hacking the Matrix to gain superpowers”.

Replies from: Raemon
comment by Raemon · 2019-11-25T23:42:16.940Z · LW(p) · GW(p)

Rereading this a year later and holy christ that example is great and terrifying.

comment by Samuel Rødal (samuel-rodal) · 2018-11-10T12:13:39.474Z · LW(p) · GW(p)

Also recently discussed on Hacker News: https://news.ycombinator.com/item?id=18415031

Replies from: Vika
comment by Vika · 2018-11-10T18:48:03.818Z · LW(p) · GW(p)

As a result of the recent attention, the specification gaming list has received a number of new submissions, so this is a good time to check out the latest version :).

comment by Samuel Rødal (samuel-rodal) · 2018-11-10T12:11:02.891Z · LW(p) · GW(p)

I noticed this has already been posted to LessWrong here: https://www.lesswrong.com/posts/AanbbjYr5zckMKde7/specification-gaming-examples-in-ai

Should I delete the post?

Replies from: habryka4
comment by habryka (habryka4) · 2018-11-10T18:38:52.675Z · LW(p) · GW(p)

Seems fine to leave here, as long as we link to the other place, and the other place links to here.