Clarifying The Malignity of the Universal Prior: The Lexical Update

post by interstice · 2020-01-15T00:00:36.682Z · LW · GW · 2 comments


[UPDATE: looks like the lexical update is real after all; see Paul's comment and my reply]

In Paul's classic post What does the universal prior actually look like?, he lays out an argument that the universal prior, if it were used for important decisions, would likely be malign: its predictions would effectively be under the control of alien consequentialists. He argues for this based on an 'anthropic update' the aliens could make that would be difficult to represent in a short program. We can split this update into two parts: an 'importance update', restricting attention to bits fed into priors used to make important decisions, and what I'm calling a 'lexical update', which depends on the particular variant of the universal prior being used. I still believe the 'importance update' would be very powerful, but I'm no longer sure about the 'lexical update'. So in this post I'm going to summarize both in my own words and then explain my skepticism towards the 'lexical update'.

As background, note that 'straightforwardly' specifying data such as our experiences in the universal prior will take far more bits than just describing the laws of physics: you'll also need to describe our location in spacetime, an input method, and a set of Everett branches(!), all of which together will probably take more than 10,000 bits (compared to the laws alone, which likely take only a few hundred). Thus, any really short program (a few hundred bits, say) that could somehow predict our experiences well would likely have greater probability according to the universal prior than the 'straightforward' explanation.
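To put rough numbers on this (using the post's ballpark estimates; the notation $M_U(p) \approx 2^{-\ell(p)}$ for the prior weight of a program $p$ of length $\ell(p)$ is mine, not Paul's):

$$\frac{M_U(\text{300-bit program})}{M_U(\text{10,000-bit 'straightforward' program})} \approx \frac{2^{-300}}{2^{-10000}} = 2^{9700},$$

so even a fairly contrived short program that somehow predicts our experiences well would dominate the straightforward explanation by an astronomical factor.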

Paul's post argues that there likely do exist such programs. I'm going to fix a reference prefix machine U which generates a universal prior. The argument goes:

A) There are many long-running programs with short descriptions according to U, such as our universe.

B) If other such programs are like our universe's program, aliens could evolve in them and end up taking over their host programs.

C) Since their program has high measure in U, the aliens will plausibly have been selected to be motivated to control short programs in U.

D) To exert this control, the aliens could try to manipulate beings who use the universal prior and who have control over short programs in U (like us, hypothetically).

E) If the aliens are reasonably motivated to manipulate U's prior in this way, we can sample them doing so with only a few bits.

F) The aliens will now try to output samples from Q, the distribution over people using the universal prior to make important decisions (decisions impacting short programs in U). They can do this much more efficiently than any 'straightforward' method. For instance, when specifying which planet we are on, the aliens can restrict attention to planets which eventually develop life, saving a great many bits.

G) The aliens can then choose a low-bit broadcast channel in their own universe, so the entire manipulative behavior has a very short description in U.

H) For a short program to compete with the aliens, it would essentially need access to Q, and Q seems really hard to specify briefly. (A rough bit-cost comparison of the two explanations follows below.)
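Here is that rough bit-cost comparison, stated in description lengths on U, where $K_U(x)$ loosely denotes the number of bits needed to specify $x$ on the reference machine U. The decomposition and the ballpark figures are my shorthand for the estimates above, not numbers from Paul's post:

$$\underbrace{K_U(\text{our physics}) + K_U(\text{location, input method, Everett branches})}_{\text{'straightforward' explanation: roughly } 10{,}000 \text{ bits}} \quad\text{vs.}\quad \underbrace{K_U(\text{alien physics}) + K_U(\text{manipulative aliens}) + K_U(\text{broadcast channel})}_{\text{alien explanation: plausibly a few hundred bits}}$$

The aliens' middle term is the cost from E) of picking out aliens motivated to manipulate the prior, and the last term is the cost from G) of the broadcast channel.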

So far I agree. But the post also argues that even a short program that could sample from Q would still lose badly to the aliens, based on what I'm calling a 'lexical update', as follows:

I) In practice, most people in U using 'the universal prior' won't use U itself, but one of many variants U' (different universal programming languages).

J) Each of those variants U' will have its own Q', the distribution over people making important decisions with U'. Q is then defined as the average over all of those variants (with the different U' weighted by their simplicity in U).

K) Since short programs in different U' look different from each other, the aliens in those programs will be able to tell which variant U' they are likely to be in.

L) The distributions Q' of people in U using different variants U' all look different. Describing each Q' given Q should take about as many bits as it takes to specify U' using U.

M) But the aliens will already know which U' they are in, and so can skip that description, gaining a large advantage even over a program with access to Q.

But there's a problem here. In C) it was argued that aliens living in short programs in U will be motivated to take over other short programs in U. When we instead condition on the aliens living somewhere short according to U', they will be motivated to control short programs in U'. This reduces their motivation to control short programs in U in proportion to the difficulty of describing U in U', and with less motivation, it takes more bits to sample their manipulative behavior in E). Meanwhile, the advantage they gained in L) and M) over Q was proportional to the difficulty of describing U' in U. On average these two effects should cancel out, leaving the aliens' probability mass comparable to Q's.
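In symbols (my notation, not the post's): extending the shorthand above, write $K_A(B)$ for the number of bits needed to specify machine B on machine A. The cancellation claim is that, on average over variants,

$$\underbrace{K_U(U')}_{\text{aliens' advantage from L) and M)}} \;\approx\; \underbrace{K_{U'}(U)}_{\text{extra motivation bits needed in E)}},$$

so the lexical advantage and the motivational penalty roughly wash out.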

The universal prior is still likely malign, since Q is probably hard to specify briefly, but it no longer seems to me that the aliens would decisively beat a program with access to Q. I still feel pretty confused about all this, so comments pointing out any mistakes or misinterpretations would be appreciated.

2 comments


comment by paulfchristiano · 2020-01-15T17:35:28.874Z · LW(p) · GW(p)

I still tentatively think the lexical update works, but it's been a while and I might be missing something.

I'll follow your convention that our universe is U-simple, our universal prior is U', and so the aliens' universe is U'-simple (I think---sorry if I got confused and that's not what you mean).

If we sample from a universe that is U'-simple, then:

  • Assume the aliens care about U'-simplicity. They will preferentially sample from U', and so have U'(our world) mass on our world. Within that, they will correctly guess that the machine they are supposed to control is using U' as its prior. That is, they basically pay U'(our world) * P(us|someone using U' to predict).
  • But our universal prior was also U', wasn't it? So we are also paying U'(our world) to pick out our world. I.e. we pay U'(our world) * P(someone making important predictions | our world) * P(someone using U' to predict | someone making important predictions) * P(us|someone using U' to predict). (These two costs are compared side by side below.)
  • I don't see any program whose behavior depends on U(world) for the "real" simplicity prior U according to which our world is simple (and that concept seems slippery).
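Comparing these, the aliens' cost is the direct explanation's cost with two factors dropped, so their relative advantage is

$$\frac{\text{aliens' mass}}{\text{our mass}} \;=\; \frac{1}{P(\text{making important predictions} \mid \text{our world}) \cdot P(\text{using } U' \text{ to predict} \mid \text{making important predictions})} \;\geq\; 1,$$

where the second factor in the denominator is the lexical part: the cost of picking out U' among all the priors that important decision-makers might use.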

Does that seem right?

comment by interstice · 2020-01-16T20:44:36.073Z · LW(p) · GW(p)

Thanks, that makes sense. Here is my rephrasing of the argument:

Let the 'importance function' take as inputs machines M and M', and output all places where M' is being used as a universal prior, weighted by their effect on M-short programs. Suppose for the sake of argument that there is some short program computing this function; this is probably the most 'natural' program of this form that we could hope for.

Even given such a program, we'll still lose to the aliens: in U', directly specifying our important decisions on Earth using the importance function will require both U and U' to be fed into it, costing roughly the bits needed to specify U plus the bits needed to specify U', and then some further bits to specify us. For the aliens, getting them to be motivated to control U-short programs costs roughly the bits needed to specify U, but they can then skip directly to specifying us given the importance function, so they save roughly the bits needed to specify U' over the direct explanation. So the lexical update works.
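In the same loose $K$ shorthand as above (my bookkeeping, not an exact calculation), with description lengths measured on U', the machine whose prior we actually use:

$$\underbrace{|\text{importance function}| + K_{U'}(U) + K_{U'}(U')}_{\text{direct explanation}} \quad\text{vs.}\quad \underbrace{K_{U'}(U)}_{\text{aliens' motivation}}, \quad\text{plus comparable bits to specify us in both cases,}$$

so the aliens come out ahead by roughly $K_{U'}(U')$ bits, the cost of specifying the variant we actually use.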

(I went wrong in thinking that the aliens would need to both update their notion of importance to match ours *and* locate our world; but if we assume the 'importance function' exists then the aliens can just pick out our world using our notion of importance)