Specification gaming examples in AI

post by Samuel Rødal (samuel-rodal) · 2018-11-10T12:00:29.369Z · score: 28 (9 votes) · LW · GW · 5 comments

This is a link post for https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml

An interesting list of examples where AI programs gamed the specification, solving the problem in creative (or dumb) ways the programmers did not intend.

5 comments

comment by Said Achmiz (SaidAchmiz) · 2018-11-10T12:05:31.622Z · score: 12 (6 votes) · LW · GW

These are great (and terrifying).

It’s hard to pick just one favorite, but I think I’ll go with that amazing last entry:

We noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment governed by the M model never shoots a single fireball in some rollouts. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment.

Literally “hacking the Matrix to gain superpowers”.

comment by Samuel Rødal (samuel-rodal) · 2018-11-10T12:13:39.474Z · score: 6 (3 votes) · LW · GW

Also recently discussed on Hacker News: https://news.ycombinator.com/item?id=18415031

comment by Vika · 2018-11-10T18:48:03.818Z · score: 4 (2 votes) · LW · GW

As a result of the recent attention, the specification gaming list has received a number of new submissions, so this is a good time to check out the latest version :).

comment by Samuel Rødal (samuel-rodal) · 2018-11-10T12:11:02.891Z · score: 3 (2 votes) · LW · GW

I noticed this has already been posted to LessWrong here: https://www.lesswrong.com/posts/AanbbjYr5zckMKde7/specification-gaming-examples-in-ai

Should I delete the post?

comment by habryka (habryka4) · 2018-11-10T18:38:52.675Z · score: 4 (2 votes) · LW · GW

Seems fine to leave it here, as long as we link to the other post and the other post links back here.