Specification gaming examples in AI

This is a link post for https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml

An interesting list of examples in which AI programs gamed their specification, solving the problem in creative (or dumb) ways that the programmers did not intend.
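For readers new to the term, here is a minimal toy sketch of what "gaming the specification" can look like. It is not taken from the linked list: the track, the reward values, and every name in it (run_episode, intended_policy, gaming_policy) are made up for illustration. The designer wants the agent to reach the goal, but the reward as written also pays out for a bonus tile that respawns on every visit.

```python
# Toy illustration only: all names and numbers here are hypothetical.
GOAL = 5           # position of the finish line on a 1-D track
BONUS = 2          # position of a bonus tile that respawns on every visit
GOAL_REWARD = 10   # paid once, when the agent finishes
BONUS_REWARD = 3   # paid every time the agent steps onto the bonus tile


def run_episode(policy, horizon=20):
    """Simulate one episode; return (total_reward, reached_goal)."""
    position, total_reward = 0, 0
    for step in range(horizon):
        move = policy(position, step)  # -1 (left) or +1 (right)
        position = max(0, min(GOAL, position + move))
        if position == BONUS:
            total_reward += BONUS_REWARD
        if position == GOAL:
            return total_reward + GOAL_REWARD, True
    return total_reward, False


def intended_policy(position, step):
    """What the designer had in mind: head straight for the goal."""
    return +1


def gaming_policy(position, step):
    """Step on and off the bonus tile forever and never finish."""
    return +1 if position < BONUS else -1


if __name__ == "__main__":
    print("intended:", run_episode(intended_policy))
    print("gaming:  ", run_episode(gaming_policy))
```

Under these made-up numbers the policy that never finishes collects more reward than the one that does what the designer wanted, which is the general shape of the entries in the list: the written reward and the intended behavior come apart, and the optimizer follows the written reward.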


comment by Said Achmiz (SaidAchmiz) · 2018-11-10T12:05:31.622Z

These are great (and terrifying).

It’s hard to pick just one favorite, but I think I’ll go with that amazing last entry:

“We noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment governed by the M model never shoots a single fireball in some rollouts. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment.”

Literally “hacking the Matrix to gain superpowers”.

comment by Raemon · 2019-11-25T23:42:16.940Z

Rereading this a year later and holy christ that example is great and terrifying.

comment by Samuel Rødal (samuel-rodal) · 2018-11-10T12:13:39.474Z

Also recently discussed on Hacker News: https://news.ycombinator.com/item?id=18415031

comment by Vika · 2018-11-10T18:48:03.818Z

As a result of the recent attention, the specification gaming list has received a number of new submissions, so this is a good time to check out the latest version :).

comment by Samuel Rødal (samuel-rodal) · 2018-11-10T12:11:02.891Z

