One weird trick to turn maximisers into minimisers
post by Stuart_Armstrong · 2016-04-22T16:45:02.000Z
A putative new idea for AI control; index here.
A simple and easy design for a $u$-maximising agent that turns into a $u$-minimising one.
Let $X$ be some boolean random variable outside the agent's control, that will be determined at some future time $t$ (based on a cosmic event, maybe?). Set it up so that $P(X=1) = 2/3$, and for a given utility $u$ consider the utility:
- $u^\# = u\left(3 I_{X=0} - I_{X=1}\right)$, where $I$ denotes the indicator function.
Before $t$, the expected value of $3 I_{X=0} - I_{X=1}$ is $3(1/3) - 2/3 = 1/3$, so $E(u^\#) = u/3$. Hence the agent is a $u$-maximiser. After $t$, the most likely option is $X=1$, hence a little bit of evidence to that effect is enough to make the agent into a $u$-minimiser: once its credence in $X=1$ rises above $3/4$, the $-u I_{X=1}$ term dominates, and maximising $u^\#$ means minimising $u$.
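To make the switch concrete, here is a minimal numerical sketch (a hypothetical illustration, not code from the post; `u_sharp_coefficient` is an invented helper) of the effective multiplier on $u$ as the agent's credence in $X=1$ varies:

```python
# Sketch: effective multiplier on u under u# = u * (3*I[X=0] - I[X=1]),
# given the agent's credence q = P(X=1).
def u_sharp_coefficient(q, lam=3.0):
    """Expected multiplier on u: lam * P(X=0) - P(X=1)."""
    return lam * (1.0 - q) - q

# The agent maximises u while the multiplier is positive, and
# minimises u once it turns negative (threshold at q = 3/4 here).
for q in [2/3, 0.70, 0.75, 0.80, 0.95]:
    c = u_sharp_coefficient(q)
    stance = "maximise u" if c > 0 else ("indifferent" if c == 0 else "minimise u")
    print(f"P(X=1) = {q:.2f} -> multiplier {c:+.2f} ({stance})")
```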
This isn't perfect corrigibility: the agent would be willing to sacrifice a bit of $u$-value (before $t$) in order to maintain its flexibility after $t$. To combat this effect, we could instead use:
- $u^\# = u\left(\lambda I_{X=0} - I_{X=1}\right)$, for some constant $\lambda > 2$.
If $\lambda$ is large, then the agent is willing to pay very little $u$-value to maintain flexibility. However, the amount of evidence of $X=1$ that it needs to become a $u$-minimiser grows proportionally with $\lambda$ (its posterior odds in favour of $X=1$ must exceed $\lambda$), so $X$ had better be a clear and convincing event.
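As a rough illustration of that scaling (again a hypothetical sketch, reusing the $P(X=1) = 2/3$ setup above), here is the Bayes factor the evidence must supply before the agent flips:

```python
import math

# The agent flips to u-minimisation once lam * P(X=0|e) < P(X=1|e),
# i.e. once its posterior odds of X=1 exceed lam.  From prior odds
# 2:1 (P(X=1) = 2/3), the evidence must supply a Bayes factor > lam / 2.
prior_odds = (2 / 3) / (1 / 3)  # = 2

for lam in [3, 10, 100, 10**6]:
    bayes_factor = lam / prior_odds
    bits = math.log2(bayes_factor)  # evidence needed, in bits
    print(f"lambda = {lam:>7}: likelihood ratio > {bayes_factor:g} "
          f"(~{bits:.1f} bits of evidence for X=1)")
```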