Specification gaming examples in AI
post by Samuel Rødal (samuel-rodal) · 2018-11-10T12:00:29.369Z · LW · GW · 6 comments
This is a link post for https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml
Interesting list of examples where AI programs gamed the specification, solving the problem in rather creative (or dumb) ways not intended by the programmers.
6 comments
Comments sorted by top scores.
comment by Said Achmiz (SaidAchmiz) · 2018-11-10T12:05:31.622Z · LW(p) · GW(p)
These are great (and terrifying).
It’s hard to pick just one favorite, but I think I’ll go with that amazing last entry:
We noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment governed by the M model never shoots a single fireball in some rollouts. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment.
Literally “hacking the Matrix to gain superpowers”.
Replies from: Raemon
comment by Samuel Rødal (samuel-rodal) · 2018-11-10T12:13:39.474Z · LW(p) · GW(p)
Also recently discussed on Hacker News: https://news.ycombinator.com/item?id=18415031
Replies from: Vika
comment by Samuel Rødal (samuel-rodal) · 2018-11-10T12:11:02.891Z · LW(p) · GW(p)
I noticed this has already been posted to LessWrong here: https://www.lesswrong.com/posts/AanbbjYr5zckMKde7/specification-gaming-examples-in-ai [LW · GW]
Should I delete the post?
Replies from: habryka4
↑ comment by habryka (habryka4) · 2018-11-10T18:38:52.675Z · LW(p) · GW(p)
Seems fine to leave here, as long as we link to the other place, and the other place links to here.