Specification gaming examples in AI

post by Samuel Rødal (samuel-rodal) · 2018-11-10T12:00:29.369Z · LW · GW · 6 comments

This is a link post for https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml

An interesting list of examples in which AI programs gamed their specifications, solving the problem in creative (or dumb) ways that the programmers did not intend.


Comments sorted by top scores.

comment by Said Achmiz (SaidAchmiz) · 2018-11-10T12:05:31.622Z · LW(p) · GW(p)

These are great (and terrifying).

It’s hard to pick just one favorite, but I think I’ll go with that amazing last entry:

“We noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment governed by the M model never shoots a single fireball in some rollouts. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment.”

Literally “hacking the Matrix to gain superpowers”.

comment by Raemon · 2019-11-25T23:42:16.940Z · LW(p) · GW(p)

Rereading this a year later and holy christ that example is great and terrifying.

comment by Samuel Rødal (samuel-rodal) · 2018-11-10T12:13:39.474Z · LW(p) · GW(p)

Also recently discussed on Hacker News: https://news.ycombinator.com/item?id=18415031

comment by Vika · 2018-11-10T18:48:03.818Z · LW(p) · GW(p)

As a result of the recent attention, the specification gaming list has received a number of new submissions, so this is a good time to check out the latest version :).

comment by Samuel Rødal (samuel-rodal) · 2018-11-10T12:11:02.891Z · LW(p) · GW(p)

I noticed this has already been posted to LessWrong here: https://www.lesswrong.com/posts/AanbbjYr5zckMKde7/specification-gaming-examples-in-ai

Should I delete the post?

comment by habryka (habryka4) · 2018-11-10T18:38:52.675Z · LW(p) · GW(p)

Seems fine to leave here, as long as we link to the other place, and the other place links to here.