Posts

Immunity to perverse manipulation, self-scepticism, bounded impact 2018-02-01T09:59:44.000Z

Comments

Comment by IAFF-User-256 (Imported-IAFF-User-256) on Corrigibility through stratified indifference and learning · 2017-09-24T23:10:22.000Z · LW · GW

Edit: found the new post and it doesn't suffer from any of these =P

There are a few calculation errors:

One is in the paragraph "Half a chance of winning the 1:9 lottery": the v utility is calculated as 0.525 but should be 0.275: 50% chance that v is chosen as the utility function × ( 10% chance of a lottery win × v utility set to 1 with the lottery money + 90% chance of no lottery win × v utility staying at 0.5 ) = 0.5 × ( 0.1 × 1 + 0.9 × 0.5 ) = 0.275
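To make that arithmetic easy to check, here is a minimal sketch in Python, using only the probabilities and utility values from the example as I read them:

```python
# Expected v-utility before the lottery resolves, as I read the example:
# 50% chance v is picked as the utility function; within that branch,
# a 10% lottery win pushes v to 1, otherwise v stays at 0.5.
p_v_chosen = 0.5
p_win = 0.1
v_if_win = 1.0
v_if_lose = 0.5

expected_v = p_v_chosen * (p_win * v_if_win + (1 - p_win) * v_if_lose)
print(expected_v)  # 0.275, not 0.525
```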

This doesn't change anything for the argument, but the other error actually works against the conclusion that "the probability flows from u to v". If you win the lottery (10% chance), then you set the choice up to be u, which increases the probability of u by 5%. If you lose (90% chance), whether you get v depends only on the human decision, which is 50/50, so p(u) = 0.1 × 1 + 0.9 × 0.5 = 0.55 and p(v) = 0.1 × 0 + 0.9 × 0.5 = 0.45, and the probability flows from v to u instead.
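The same kind of check for the probability flow, again assuming only the numbers from the post (a 10% lottery win forcing u, otherwise a 50/50 human decision):

```python
# If the lottery is won (10%), the choice is set up to be u with certainty;
# if it is lost (90%), u vs. v comes down to the 50/50 human decision.
p_win = 0.1
p_u = p_win * 1.0 + (1 - p_win) * 0.5
p_v = p_win * 0.0 + (1 - p_win) * 0.5
print(p_u, p_v)  # 0.55 0.45 -- probability flows from v to u
```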

One other thing seems strange: the notion that "An AI is currently hesitating between utilities u and v." If its utility function is currently undefined, then why would it want anything, including wanting to optimize for any future utility function? It would help to clarify the AI's motivations by stating its starting utility function, because isn't that what ultimately determines the indifference compensation required to move from it to a new utility function, whether u or v?