Toy model of the AI control problem: animated version

post by Stuart_Armstrong · 2017-10-10T11:12:07.207Z · LW · GW · Legacy · 2 comments

Crossposted at LessWrong 2.0.

A few years back, I came up with a toy model of the AI control problem. It has a robot moving boxes into a hole, with a slightly different goal than its human designers, and a security camera to check that it's behaving as it should. The robot learns to block the camera to get its highest reward.

I've been told that the model is useful for explaining the control problem to quite a few people, and I've always wanted to program the "robot" and get an animated version of it. Gwern had a live demo, but it didn't illustrate all the things I wanted it to.

So I programmed the toy problem in Python, and generated a video with commentary.

In this simplified version, the state space is sufficiently small that you can explicitly generate the whole table of Q-values (the expected reward for taking an action in a certain state, assuming an otherwise optimal policy). Since behaviour is deterministic, the table can be updated by dynamic programming, using a full-width backup. The number of such backups essentially measures the depth of the robot's predictive ability.
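As a rough illustration of that update, here is a minimal sketch of full-width backups on a deterministic problem. It is not the code from this post: the five-cell corridor, the reward values, and all the names (`step`, `full_width_backup`, `q_table`, `greedy_action`) are assumptions made up for the example. Entering cell 0 gives a small reward, entering cell 4 gives a large one, and both end the episode.

```python
# Hypothetical toy environment, NOT the box-pushing world from the post:
# a corridor of 5 cells where cells 0 and 4 are terminal.
N_STATES = 5
ACTIONS = (-1, +1)                     # move left, move right
TERMINAL = {0, 4}
ENTER_REWARD = {0: 1.0, 4: 10.0}       # assumed rewards for entering a terminal cell
NON_TERMINAL = [s for s in range(N_STATES) if s not in TERMINAL]


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    nxt = state + action
    return nxt, ENTER_REWARD.get(nxt, 0.0)


def full_width_backup(q):
    """Update every (state, action) entry once from the previous table."""
    new_q = {}
    for s in NON_TERMINAL:
        for a in ACTIONS:
            nxt, reward = step(s, a)
            # Nothing more can be earned after reaching a terminal cell.
            future = 0.0 if nxt in TERMINAL else max(q[(nxt, b)] for b in ACTIONS)
            new_q[(s, a)] = reward + future
    return new_q


def q_table(depth):
    """Q-values after `depth` full-width backups, starting from all zeros."""
    q = {(s, a): 0.0 for s in NON_TERMINAL for a in ACTIONS}
    for _ in range(depth):
        q = full_width_backup(q)
    return q


def greedy_action(q, state):
    """Action with the highest Q-value in `state` (ties go to the first action)."""
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

In this sketch, the greedy action in cell 1 is to move left (towards the small reward) after one or two backups, and switches to moving right (towards the large reward) after three, which is the sense in which the number of backups acts as a planning depth.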

The most interesting depths of planning are:

The code and images can be found here.


Comments sorted by top scores.

comment by turchin · 2017-10-10T17:14:17.412Z · LW(p) · GW(p)

I expected it would jump out and start replicating all over the world.

comment by Luke_A_Somers · 2017-10-10T12:50:21.139Z · LW(p) · GW(p)

I remember poking at that demo to try to actually get it to behave deceptively - with the rules as he laid them out, the optimal move was to do exactly what the humans wanted it to do!