Posts

Comments

Comment by Rafael Cosman (rafael-cosman) on What is wrong with this approach to corrigibility? · 2022-07-17T01:12:41.279Z · LW · GW

I really appreciate all the thoughtful and substantive comments! Thanks very much; honestly, this was exactly what I was hoping for from posting.

Comment by Rafael Cosman (rafael-cosman) on What is wrong with this approach to corrigibility? · 2022-07-17T01:08:52.640Z · LW · GW

If implemented as described, the AI should be exactly indifferent to pushing the button, right? I guess the AI’s behavior in that situation is not well defined… and if we make the button give a reward of expected value minus epsilon, then the AI might kill you to stop you from pressing the button (because it wants to keep that epsilon of reward!)

So overall I suppose this is a fair criticism of the approach and is possibly what Paul means by issues with precisely balancing!
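
To make the balancing issue concrete, here is a toy sketch in Python. It assumes the AI is a plain expected-value maximizer comparing two numbers: the value it expects from continuing (`v_continue`) and the reward it gets when the button is pressed (`button_reward`). The function name, the numbers, and the plus-epsilon case are all my own illustrative assumptions, not part of the original proposal.

```python
def button_preference(v_continue: float, button_reward: float) -> str:
    """Toy model: what a pure expected-value maximizer prefers about the button.

    v_continue    -- hypothetical expected value if the button is never pressed
    button_reward -- hypothetical reward paid out when the button is pressed
    """
    gap = button_reward - v_continue
    if gap > 0:
        # Button pays more than continuing: the AI wants it pressed.
        return "wants the button pressed"
    if gap < 0:
        # Button pays less than continuing: the AI wants to stop the press.
        return "wants to prevent the press"
    # Exact tie: nothing in the objective picks a side.
    return "indifferent"


v = 10.0    # illustrative value only
eps = 0.01  # illustrative epsilon only

print(button_preference(v, v))        # exactly balanced
print(button_preference(v, v - eps))  # button pays slightly less
print(button_preference(v, v + eps))  # button pays slightly more (my added case)
```

Only the exactly balanced case leaves the AI with no incentive either way, and in that case the objective does not say what it does. Any nonzero gap, however small, gives it a strict preference, which is how I read the "precisely balancing" concern.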