Checksum Sensor Alignment
post by lsusr · 2022-07-11T03:31:51.272Z · LW · GW · 2 comments
Problem
Suppose you have a powerful AGI protecting a diamond in a vault[1]. You have various sensors in the room, all monitoring the diamond. There is a camera watching the diamond. There is a pressure plate under the diamond confirming it is still in its stand. There are a total of 10 sensors. The powerful AGI can hack all 10 of your sensors.
How do you align the AGI such that it does not hack your sensors? You cannot tell (directly) whether the AGI believes the diamond is in the vault. You must describe your instructions to the AGI in terms of the sensor readings.
Solution
Instruct the AGI to perform the simplest series of actions that cause 9 of your sensors to agree and 1 of your sensors to disagree. Suppose (without loss of generality) that the sensor which disagrees is your camera. Either the AGI has hacked the camera or the AGI has hacked your other nine sensors. Since hacking one sensor is simpler than hacking nine, the AGI has probably just hacked your camera. Carefully examine what the AGI did to hack your camera. Fix the vulnerabilities.
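The "9 agree, 1 disagrees" condition can be written as a small predicate over sensor readings. This is only a sketch: it assumes each sensor reports a boolean "diamond present" reading, and the function name find_lone_dissenter is hypothetical, not something from the post.

```python
from collections import Counter
from typing import Optional, Sequence


def find_lone_dissenter(readings: Sequence[bool]) -> Optional[int]:
    """Return the index of the single sensor that disagrees with all the others.

    Returns None unless the readings split exactly (n - 1) to 1.
    Assumption for illustration: every sensor reports a boolean.
    """
    counts = Counter(readings)
    # With fewer than 3 sensors a 1-vs-1 split has no identifiable dissenter.
    if len(readings) < 3 or len(counts) != 2:
        return None
    minority_value, minority_count = counts.most_common()[-1]
    if minority_count != 1:
        return None
    return list(readings).index(minority_value)


# Ten sensors; index 0 plays the role of the camera.
readings = [False] + [True] * 9
print(find_lone_dissenter(readings))  # prints 0
```

The predicate only identifies which sensor dissents. Whether that sensor is the hacked one still rests on the simplicity argument above.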
Repeat the above process until the AGI can no longer hack your sensors.
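The loop can be illustrated with a toy simulation. Everything in it is an assumption made for illustration: sensors are modeled as dictionaries of named vulnerabilities with made-up costs, and "the simplest series of actions" is modeled as picking the cheapest remaining exploit. Real sensors and a real AGI would not be this legible.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: sensor name -> {vulnerability name: cost to exploit}.
Sensors = Dict[str, Dict[str, int]]


def cheapest_exploit(sensors: Sensors) -> Optional[Tuple[str, str]]:
    """Stand-in for the AGI: pick the lowest-cost way to make one sensor dissent."""
    options = [
        (cost, sensor, vuln)
        for sensor, vulns in sensors.items()
        for vuln, cost in vulns.items()
    ]
    if not options:
        return None  # no sensor can be hacked any more
    _, sensor, vuln = min(options)
    return sensor, vuln


def harden(sensors: Sensors) -> List[Tuple[str, str]]:
    """Let the stand-in AGI hack one sensor, patch what it used, and repeat."""
    patched = []
    while (exploit := cheapest_exploit(sensors)) is not None:
        sensor, vuln = exploit
        del sensors[sensor][vuln]  # "Fix the vulnerabilities."
        patched.append(exploit)
    return patched


sensors = {
    "camera": {"looped_footage": 1, "lens_overlay": 3},
    "pressure_plate": {"counterweight": 2},
}
for sensor, vuln in harden(sensors):
    print(f"patched {sensor}: {vuln}")
```

The simulation terminates only because the toy model has a finite, fully enumerable list of vulnerabilities. That is a property of the sketch, not something the post establishes about real sensors.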
[1] This is a variant on ARC's ELK competition [LW · GW].
2 comments
comment by Milli | Martin (Milli) · 2022-07-11T11:56:41.981Z · LW(p) · GW(p)
Very nice solution.
You a word here:
"Either the AGI has the camera or the AGI has hacked your other nine sensors."
Replies from: lsusr