Comment by michael amir (michael-amir) on Corrigibility · 2022-06-24T23:20:48.579Z · LW · GW
Consider the current state of the world A and a "bad" state of the world B (e.g., one where humans have all become paperclips). For a benign act-based agent to be safe, it seems you need to prove that there is no sequence of steps A_2, A_3, ..., A_n, B (writing A_1 = A) such that each A_i is preferable given world state A_{i-1}, and B is preferable given world state A_n. I don't think this is realistically the case.
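
To make the worry concrete, here is a minimal Python sketch of the condition as a reachability check. It rests on assumptions that are not in the comment itself: that world states can be enumerated, and that preferences can be written as a hypothetical predicate `prefers(current, candidate)` which is true when the candidate state is preferable given the current one. The function name `find_drift_path` and the toy preference table are also made up for illustration.

```python
from collections import deque


def find_drift_path(start, bad, states, prefers):
    """Breadth-first search for a chain start -> ... -> bad in which every
    step is preferable given the state immediately before it.

    Returns the chain as a list of states, or None if no such chain exists.
    """
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == bad:
            # Walk the parent links back to the start to rebuild the chain.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for candidate in states:
            if candidate not in parent and prefers(current, candidate):
                parent[candidate] = current
                queue.append(candidate)
    return None


# Toy preferences (hypothetical): A does not endorse B directly,
# but each intermediate step is endorsed by the state before it.
toy_preferences = {
    "A": {"A2"},
    "A2": {"A3"},
    "A3": {"B"},
}
states = ["A", "A2", "A3", "B"]


def toy_prefers(current, candidate):
    return candidate in toy_preferences.get(current, set())


path = find_drift_path("A", "B", states, toy_prefers)
print(path)  # ['A', 'A2', 'A3', 'B']
```

In this toy setup the safety condition fails even though B is never preferable from A's own standpoint, which is the point of the comment: proving safety means ruling out every such chain, not just the direct step, and in a realistic state space the search cannot be enumerated like this.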