post by [deleted] · · ? · GW · 0 comments

comment by Vladimir_Nesov · 2021-10-14T09:54:21.445Z · LW(p) · GW(p)

When a preference makes references to self, copying that doesn't also edit these references changes the meaning of the preference rather than preserving it. So if you are expecting copying, reflective consistency could be ensured by reformulating the preference to avoid explicit references to self, such as by replacing them with references to a particular person, or to a reference class of people, whether they are yourself or not.
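
A minimal toy sketch of the point above, under loud assumptions: the `Agent` class, the `World` dictionary, and the names `indexical`, `make_rigid`, and `copy_agent` are all hypothetical illustrations, not anything from the original post. A preference that resolves "self" through whoever holds it changes meaning under a naive copy, while one rewritten to name a particular person does not.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict

# Hypothetical toy world: maps an agent's name to the resources it holds.
World = Dict[str, int]


@dataclass(frozen=True)
class Agent:
    name: str
    # The preference takes its holder as an argument, so it can refer to "self".
    utility: Callable[["Agent", World], int]

    def value(self, world: World) -> int:
        return self.utility(self, world)


def indexical(agent: Agent, world: World) -> int:
    # "I want *me* to have resources": the referent depends on the holder.
    return world[agent.name]


def make_rigid(name: str) -> Callable[[Agent, World], int]:
    # "I want <name> to have resources": the referent is fixed when rewritten.
    def utility(agent: Agent, world: World) -> int:
        return world[name]
    return utility


def copy_agent(agent: Agent, new_name: str) -> Agent:
    # Naive copy: duplicates the preference without editing references to self.
    return replace(agent, name=new_name)


world = {"alice": 3, "alice_copy": 7}

original = Agent("alice", indexical)
clone = copy_agent(original, "alice_copy")

rigid_original = Agent("alice", make_rigid("alice"))
rigid_clone = copy_agent(rigid_original, "alice_copy")

# The indexical preference now ranks worlds differently for the copy,
# while the rigid one still tracks the same person.
print(original.value(world), clone.value(world))
print(rigid_original.value(world), rigid_clone.value(world))
```

In this sketch the rewrite from `indexical` to `make_rigid("alice")` leaves the original agent's evaluations unchanged, which is the sense in which the reformulation would not alter what the preference means before any copying happens.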

Replies from: None
comment by [deleted] · 2021-10-14T11:14:18.137Z · LW(p) · GW(p)

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2021-10-14T12:31:59.575Z · LW(p) · GW(p)

The reformulation of preference to replace references to self with specific people it already references doesn't change its meaning, so semantically such rewriting doesn't affect alignment. It only affects copying, which doesn't respect semantics of preferences. Other procedures that meddle with minds can disrupt semantics of preference in a way that can't be worked around.

(All this only makes sense for toy agent models that furthermore have a clear notion of references to self, not for literal humans. Humans don't have preferences in this sense; human preference is a theoretical construct that needs something like CEV to access, the outcome of a properly set up long reflection.)

Replies from: None
comment by [deleted] · 2021-10-15T15:16:17.638Z · LW(p) · GW(p)
comment by JBlack · 2021-10-14T05:38:18.340Z · LW(p) · GW(p)

I'd be more interested in how I can get out of the locked room. If the only way to do that is for one of us to press the button, one of us might eventually press it.

If we cared more about the future of humanity, then we'd probably have to stage a hunger strike (possibly to death) instead, and that would be really unpleasant and still no guarantee that whoever stuck us in this room wouldn't just go and pick someone else to do it.

I know that if I pressed the button I'd be an awful world dictator and wouldn't even enjoy it, and I'd probably be even worse at finding someone else who would be any better. In any event, the world would have to immediately deal with the fact that there's a shitload of new tech around that seems to have the primary function of instilling totalitarian control over a civilization. The world would be fucked. If there's some super-intelligence out there planning to lock someone in a room with a clone and a world domination button, don't pick me.

Replies from: None
comment by [deleted] · 2021-10-14T07:57:24.749Z · LW(p) · GW(p)