Specification gaming examples in AI
post by Samuel Rødal (samuel-rodal)
score: 28 (9 votes)
This is a link post for https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml
An interesting list of examples in which AI programs gamed the specification, solving the problem in rather creative (or dumb) ways that the programmers did not intend.
Comments sorted by top scores.
comment by Said Achmiz (SaidAchmiz)
score: 12 (6 votes)
These are great (and terrifying).
It’s hard to pick just one favorite, but I think I’ll go with that amazing last entry:
“We noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment governed by the M model never shoots a single fireball in some rollouts. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment.”
Literally “hacking the Matrix to gain superpowers”.