Am I confused about the "malign universal prior" argument?

post by nostalgebraist · 2024-08-27T23:17:22.779Z · LW · GW · 5 comments

This is a question post.

Contents

  UP-using "universes" and simulatable "universes"
  Some thoughts that one might have
    On slowdown
    On efficiency
  The common thread
  Answers
    15 Jeremy Gillen
    11 evhub
    10 Thane Ruthenis
    4 justinpombrio
    3 Cole Wyeth
    1 philip_b
    1 Signer
5 comments

In a 2016 blog post, Paul Christiano argued that the universal prior (hereafter "UP") may be "malign."  His argument has received a lot of follow-up discussion in a number of later posts.

This argument never made sense to me.  The reason it doesn't make sense to me is pretty simple, but I haven't seen it mentioned explicitly in any of the ensuing discussion.

This leaves me feeling like either I am misunderstanding the argument in a pretty fundamental way, or that there is a problem with the argument that has gotten little attention from the argument's critics (in which case I don't understand why).

I would like to know which of these is the case, and correct my misunderstanding if it exists, hence this post.

(Note: In 2018 I wrote a comment on the original post where I tried to state one of my objections to the argument, though I don't feel I expressed myself especially well there.)

UP-using "universes" and simulatable "universes"

The argument for malignity involves reasoning beings, instantiated in Turing machines (TMs), which try to influence the content of the UP in order to affect other beings who are making decisions using the UP.

Famously, the UP is uncomputable.
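For reference, here is one standard way of writing it down (a sketch; conventions differ, e.g. prefix-free vs. monotone machines).  The prior weight of a bitstring $x$ is the total weight of all programs that make a fixed universal machine $U$ produce output beginning with $x$:

$$M(x) \;=\; \sum_{p \,:\, U(p) \text{ outputs a string starting with } x} 2^{-\ell(p)},$$

where $\ell(p)$ is the length of program $p$ in bits.  Evaluating this sum requires knowing what every program does, including the ones that never halt, which is where the uncomputability comes from.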

This means the TMs (and reasoning beings inside the TMs) will not be able to use[1] the UP themselves, or simulate anyone else using the UP.  At least not if we take "using the UP" in a strict and literal sense.

Thus, I am unsure how to interpret claims (which are common in presentations of the argument) about TMs "searching for universes where the UP is used" or the like.

For example, from Mark Xu's "The Solomonoff Prior is Malign [LW · GW]":

In particular, this suggests a good strategy for consequentialists: find a universe that is using a version of the Solomonoff prior that has a very short description of the particular universe the consequentialists find themselves in.

Or, from Christiano's original post:

So the first step is getting our foot in the door—having control over the parts of the universal prior that are being used to make important decisions.

This means looking across the universes we care about, and searching for spots within those universe where someone is using the universal prior to make important decisions. In particular, we want to find places where someone is using a version of the universal prior that puts a lot of mass on the particular universe that we are living in, because those are the places where we have the most leverage.

Then the strategy is to implement a distribution over all of those spots, weighted by something like their importance to us (times the fraction of mass they give to the particular universe we are in and the particular channel we are using). That is, we pick one of those spots at random and then read off our subjective distribution over the sequence of bits that will be observed at that spot (which is likely to involve running actual simulations).

What exactly are these "universes" that are being searched over?  We have two options:

  1. They are not computable universes.  They permit hypercomputation that can leverage the "actual" UP, in its full uncomputable glory, without approximation.
  2. They are computable universes.  Thus the UP cannot be used in them.  But maybe there is some computable thing that resembles or approximates the UP, and gets used in these universes.

Option 1 seems hard to square with the talk about TMs "searching for" universes or "simulating" universes.  A TM can't do such things to the universes of option 1.

Hence, the argument is presumably about option 2.

That is, although we are trying to reason about the content of the UP itself, the TMs are not "searching over" or "simulating" or "reasoning about" the UP or things containing the UP.  They are only doing these things to some other object, which has some (as-yet unspecified) connection to the UP, such as "approximating" the UP in some sense.

But now we face some challenges, which are never addressed in presentations of the argument:

Some thoughts that one might have

What sort of thing is this not-UP -- the thing that the TMs can simulate and search over?

I don't know; I have never seen any discussion of the topic, and haven't thought about it for very long.  That said, here are a few seemingly obvious points about it.

On slowdown

Suppose that we have a TM, with a whole world inside it, and some reasoning beings inside that world.

These beings are aware of some computable, but vaguely "UP-like," reasoning procedure that they think is really great.

In order to be "UP-like" in a relevant way, this procedure will have to involve running TMs, and the set of TMs that might be run needs to include the same TM that implements our beings and their world.

(This procedure needs to differ from the UP by using a computable weighting function for the TMs.  It should also be able to return results without having to wait for eternity as the non-halting TMs do their not-halting.  The next section will say more about the latter condition.)

Now they want to search through computable universes (by simulation) to look for ones where the UP-esque procedure is being used.

What does it look like when they find one?  At this point, we have several levels of nesting: the outer TM, the world of our beings inside it, the universe they are simulating, the UP-like procedure being used within that simulated universe, and the TMs run by that procedure, which include a copy of the outer TM itself.

Each level of nesting incurs some slowdown relative to just running the "relevant" part of the thing that is being nested, because some irrelevant stuff has to come along for the ride.

It takes many many clock-ticks of the outer TM to advance the copy of it several levels down, because we have to spend a lot of time on irrelevant galaxies and on other TMs involved in the procedure.

(There is also an extra "constant factor" from the fact that we have to wait for the outer TM to evolve life, etc., before we get to the point where it starts containing a copy at all.)

So I don't see how the guys in the outer TM would be able to advance their simulation up to the point where something they can control is being "read off," without finding that in fact this read-off event occurred in their own distant past, and hence is no longer under their control.

To riff on this: the malignity argument involves the fact that the UP puts high weight on simple TMs, but doesn't care about speed, so it may put high weight on TMs that do very long-running things like simulating universes that simulate other universes.

Fine -- but once we start talking about a universe that is simulating itself (in order to reason about UP-like objects that involve it), speed starts to matter for a different reason.  If you are simulating yourself, it is always with some slowdown, since you contain parts other than the simulator.  You'll never be able to "catch up with yourself" and, e.g., read your own next action off of the simulation rather than choosing it in the ordinary manner.
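To put a number on the "never catch up" point (my own toy accounting, not something from the original argument): suppose each level of nesting runs with slowdown factor $k > 1$ relative to the level above it.  Then after the outer TM has run for $T$ steps, the copy of it sitting $n$ levels down has advanced only about

$$T_n \approx \frac{T}{k^{\,n}} \ll T$$

steps, so everything the inner copy has reached lies strictly in the outer TM's past, and the gap grows without bound as $T$ increases.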

It's possible that there are ways around this objection, even if it's valid in principle.  For instance, maybe the reasoning beings can make inferences about the future behavior of the procedure-users, jumping ahead of the slow simulation.

It's easy to imagine how this might work for "finding the output channel," since you can just guess that a channel used once will be re-used again.  But it would be much harder to decide what one's preferred output actually is at "future" points not yet reached in the simulation; here one would effectively need to do futurism about the world in which the procedure is being used, probably on an extremely long time horizon.

On efficiency

There are results showing that the UP (or Solomonoff induction) is in some sense optimal.  So it is easy to wind up thinking that, if some procedure is a good idea, it must be (in some sense) an "approximation of" these things.

But the kind of "approximation" involved does not look (in hand-wavey terms) like the ideal thing (UP or SI), plus some unbiased "approximation noise." 

The ways that one would deviate from the ideal, when making a practically useful procedure, have certain properties that the ideal itself lacks.  In the hand-wavey statistical analogy, the "noise" is not zero-mean.


I noted above that the "UP-like procedure" will need to use a computable weighting function.  So, this function can't be Kolmogorov complexity.

And indeed, if one is designing a procedure for practical use, one probably wouldn't want anything like Kolmogorov complexity.  All else being equal, one doesn't want to sit around for ages waiting for a TM to simulate a whole universe, even if that TM is "simple."  One probably wants to prioritize TMs that can yield answers more quickly.

As noted above, in practice one never has an infinite amount of time to sit around waiting for TMs to (not) halt, so any method that returns results in finite time will have to involve some kind of effective penalty on long-running TMs.

But one may wish to be even more aggressive about speed than simply saying "I'm only willing to wait this long, ignore any TM that doesn't halt before then."  One might want one's prior to actively prefer fast TMs over slow ones, even within the range of TMs fast enough that you're willing to wait for them.  That way, if at any point you need to truncate the distribution and only look at the really high-mass TMs, the TMs you are spared from running due to the truncation are preferentially selected to be ones you don't want to run (because they're slow).

These points are not original, of course.  Everyone talks about the speed prior.
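As a concrete illustration of what such a "practical, speed-weighted, UP-flavored" procedure might look like, here is a minimal toy sketch.  Everything in it is invented for illustration: the hypothesis class (repeat-a-pattern "programs"), the cost model, and the runtime penalty are stand-ins, and real proposals such as Schmidhuber's speed prior are defined differently.

```python
import itertools

def toy_programs(max_len):
    """Enumerate toy 'programs': each is a nonempty bit pattern, repeated forever as output."""
    for length in range(1, max_len + 1):
        for bits in itertools.product("01", repeat=length):
            yield "".join(bits)

def run(program, n_bits):
    """Produce the first n_bits of output, plus a made-up runtime cost.
    Longer patterns are 'slower' here, standing in for simple-but-slow TMs."""
    output = (program * (n_bits // len(program) + 1))[:n_bits]
    cost = len(program) * n_bits
    return output, cost

def predict_next(observed, max_len=8, budget=200, speed_penalty=True):
    """A computable, UP-flavored mixture: weight each program by 2**-length
    (standing in for the UP's complexity weighting), drop programs that exceed
    the time budget, optionally penalize runtime, keep only programs consistent
    with the observed bits, and return P(next bit == '1')."""
    total = mass_on_one = 0.0
    n = len(observed)
    for p in toy_programs(max_len):
        output, cost = run(p, n + 1)
        if cost > budget or output[:n] != observed:
            continue  # too slow to wait for, or inconsistent with observations
        weight = 2.0 ** (-len(p))
        if speed_penalty:
            weight /= cost  # actively prefer fast programs, speed-prior style
        total += weight
        if output[n] == "1":
            mass_on_one += weight
    return mass_on_one / total if total > 0 else 0.5

print(predict_next("101010"))  # close to 1: the shortest, cheapest consistent programs predict '1'
```

The details are all toy, but the shape is the point: a computable weighting, a hard time budget, and an active preference for fast programs, which is exactly what makes the procedure practical and also what makes it stop caring about simple-but-slow TMs.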

But now, return to our reasoning beings in a TM, simulating a universe, which in turn uses a procedure that's great for practical purposes.

The fact that the procedure is "great for practical purposes" is crucial to the beings' motivation, here; they expect the procedure to actually get used in practice, in the world they're simulating.  They expect this because they think it actually is a great idea -- for practical purposes -- and they expect the inner creatures of the simulation to notice this too.

Since the procedure is great for practical purposes, we should expect that it prioritizes efficiently computable TMs, like the speed prior does.

But this means that TMs like the "outer TM" in which our beings live -- which are simple (hence UP cares about them) but slow, having to simulate whole universes with irrelevant galaxies and all before they can get to the point -- are not what the "great for practical purposes" procedure cares about.

Once again: the malignity argument involves the fact that the UP puts high weight on simple TMs, but doesn't care about speed.  This is true of the UP.  But it is a count against using the UP, or anything like it, for practical purposes.

And so we should not expect the UP, or anything like it, to get used in practice by the kinds of entities we can simulate and reason about.

We (i.e. "reasoning beings in computable universes") can influence the UP, but we can't reason about it well enough to use that influence.  Meanwhile, we can reason about things that are more like the speed prior -- but we can't influence them.

The common thread

It feels like there is a more general idea linking the two considerations above.

It's closely related to the idea I presented in When does rationality-as-search have nontrivial implications? [LW · GW].

Suppose that there is some search process that is looking through a collection of things, and you are an element of the collection.  Then, in general, it's difficult to imagine how you (just you) can reason about the whole search in such a way as to "steer it around" in your preferred direction.

If you are powerful enough to reason about the search (and do this well enough for steering), then in some sense the search is unnecessary -- one could delete all the other elements of the search space, and just consult you about what the search might have done.

As stated this seems not quite right, since you might have some approximate knowledge of the search that suffices for your control purposes, yet is "less powerful" than the search as a whole.

For anything like the malignity argument to work, we need this kind of "gap" to exist -- a gap between the power needed to actually use the UP (or the speed prior, or whatever), and the power needed to merely "understand them well enough for control purposes."

Maybe such a gap is possible!  It would be very interesting if so.

But this question -- which seems like the question on which the whole thing turns -- is not addressed in any of the treatments I've seen of the malignity argument.  Instead, these treatments speak casually of TMs "simulating universes" in which someone is "using" the UP, without addressing where in the picture we are to put the "slack" -- the use of merely-approximate reasoning -- that is necessary for the picture to describe something possible at all.

What am I missing?

 

  1. ^

    For simplicity, I mostly avoid mentioning Solomonoff Induction in this post, and refer more broadly to "uses" of the UP, whatever these may be.

Answers

answer by Jeremy Gillen · 2024-08-28T14:59:04.831Z · LW(p) · GW(p)

(This reminded me of a couple of arguments I've had in the past, in person. I think you are missing something. But I've previously failed to communicate this thing. I hope I'm not misinterpreting your point, and sorry if this comment comes across as frustrated at some points.)

For anything like the malignity argument to work, we need this kind of "gap" to exist -- a gap between the power needed to actually use the UP (or the speed prior, or whatever), and the power needed to merely "understand them well enough for control purposes."

Maybe such a gap is possible!  It would be very interesting if so.

Such a gap is so common that I'm worried I've missed your point. There are ~always a range of algorithms that solve the same problem and have very different levels of efficiency. This is clearly true of algorithms that predict the next bit. You are correct that a malign hypothesis needs to be running an algorithm that is more efficient than the outer algorithm.

Suppose that there is some search process that is looking through a collection of things, and you are an element of the collection.  Then, in general, it's difficult to imagine how you (just you) can reason about the whole search in such a way as to "steer it around" in your preferred direction.

I think this is easy to imagine. I'm an expert who is among 10 experts recruited to advise some government on making a decision. I can guess some of the signals that the government will use to choose who among us to trust most. I can guess some of the relative weaknesses of fellow experts. I can try to use this to manipulate the government into taking my opinion more seriously. I don't need to create a clone government and hire 10 expert clones in order to do this.

If you are powerful enough to reason about the search (and do this well enough for steering), then in some sense the search is unnecessary -- one could delete all the other elements of the search space, and just consult you about what the search might have done.

It's true that if an induction algorithm is maximally compute efficient, then it shouldn't have daemon problems. Because there is no way for a daemon to do better prediction than alternative hypotheses. But... actual algorithms that we build aren't usually compute optimal, so there's a risk they will find more compute efficient algorithms internally. (I'm not sure I understood what you're saying here, so tell me if this is a non sequitur).

The argument is about the content of the "actual" UP, not the content of some computable approximation.

If the reasoning beings are considering -- and trying to influence -- some computable thing that isn't the UP, we need to determine whether this thing has the right kind of relationship to the UP (whatever that means) for the influences upon it to "bubble up to" the UP itself.

You seem to be saying the behavior of a computable approximation is unrelated to the behavior of the idealization? Like, of course there are differences between reality and idealizations. Paul mentions this a few times in the original post, that the connection to real world algorithms and consequences isn't clear. But I think you're missing the whole point of doing theory work about idealized models. Theorizing about idealizations is easier (and possible). A common pattern for theorists is to work out the consequences of an idealized theory, then to apply this theory to reality, they try to approximately adjust for the most important differences between the theory and reality.

A good example is worst case runtime analysis. It's very useful for predicting real world runtime. In some situations, it's far too pessimistic (or optimistic!). But in those situations, there's always a reason, there's some factor that the worst case analysis isn't taking into account. And with some familiarity with the field, you get to know these factors and how and when we can correct for them when transferring your knowledge to the real world.

Back to induction, specifically this line:

some computable thing that isn't the UP, we need to determine whether this thing has the right kind of relationship to the UP

The reasoning beings inside the hypothesis are trying to make their computable approximation as similar to the UP as possible. Sure, they might make mistakes, or be limited in some way by what algorithms are possible. But, in our lack of knowledge about the details, we don't have to get stuck in confusion about how exactly they might create an approximation. We can just assume they did a good job (as an idealization/approximation). This is the standard approximate way of predicting the consequences of competent agents. This is part of making an idealized theory.

If we later discover that all algorithms that are designed to predict the next bit in the real world have a particular property (i.e. are biased toward fast computations), then we can redo the malign prior theory in light of that knowledge. Maybe you think "biased toward fast computations" is obviously true of all induction algorithms? (it's definitely not true, consider the runtime of theories in physics).


The UP is malign idea is the idea of optimization daemons applied directly to Solomonoff inductors. (Note the line: "When heavy optimization pressure on a system crystallizes it into an optimizer—especially one that’s powerful, or more powerful than the previous system, or misaligned with the previous system—we could term the crystallized optimizer a “daemon” of the previous system"). 

If you wanted to see how optimization daemons could show up in more practical algorithms, you'll probably end up at RFLO [LW · GW] (although there are other situations where optimization daemons can show up that don't quite fit the RFLO description).

comment by nostalgebraist · 2024-08-29T19:35:12.554Z · LW(p) · GW(p)

I hope I'm not misinterpreting your point, and sorry if this comment comes across as frustrated at some points.

I'm not sure you're misinterpreting me per se, but there are some tacit premises in the background of my argument that you don't seem to hold.  Rather than responding point-by-point, I'll just say some more stuff about where I'm coming from, and we'll see if it clarifies things.

You talk a lot about "idealized theories."  These can of course be useful.  But not all idealizations are created equal.  You have to actually check that your idealization is good enough, in the right ways, for the sorts of things you're asking it to do.

In physics and applied mathematics, one often finds oneself considering a system that looks like 

  • some "base" system that's well-understood and easy to analyze, plus
  • some additional nuance or dynamic that makes things much harder to analyze in general -- but which we can safely assume has much smaller effects than the rest of the system

We quantify the size of the additional nuance with a small parameter $\epsilon$.  If $\epsilon$ is literally 0, that's just the base system, but we want to go a step further: we want to understand what happens when the nuance is present, just very small.  So, we do something like formulating the solution as a power series in $\epsilon$, and truncating to first order.  (This is perturbation theory, or more generally asymptotic analysis.)

This sort of approximation gets better and better as $\epsilon$ gets closer to 0, because this magnifies the difference in size between the truncated terms (of size $O(\epsilon^2)$ and smaller) and the retained $O(\epsilon)$ term.  In some sense, we are studying the $\epsilon \to 0$ limit.

But we're specifically interested in the behavior of the system given a nonzero, but arbitrarily small, value of $\epsilon$.  We want an approximation that works well if $\epsilon = 0.1$, and even better if $\epsilon = 0.01$, and so on.  We don't especially care about the literal $\epsilon = 0$ case, except insofar as it sheds light on the very-small-but-nonzero behavior.

Now, sometimes the $\epsilon \to 0$ limit of the "very-small-but-nonzero behavior" simply is the $\epsilon = 0$ behavior, the base system.  That is, what you get at very small $\epsilon$ looks just like the base system, plus some little $O(\epsilon)$-sized wrinkle.

But sometimes – in so-called "singular perturbation" problems – it doesn't.  Here the system has qualitatively different behavior from the base system given any nonzero $\epsilon$, no matter how small.

Typically what happens is that $\epsilon$ ends up determining, not the magnitude of the deviation from the base system, but the "scale" of that deviation in space and/or time.  So that in the limit, you get behavior with an $O(1)$-sized difference from the base system's behavior that's constrained to a tiny region of space and/or oscillating very quickly.

Boundary layers in fluids are a classic example.  Boundary layers are tiny pockets of distinct behavior, occurring only in small $\epsilon$-sized regions and not in most of the fluid.  But they make a big difference just by being present at all.  Knowing that there's a boundary layer around a human body, or an airplane wing, is crucial for predicting the thermal and mechanical interactions of those objects with the air around them, even though it takes up a tiny fraction of the available space, the rest of which is filled by the object and by non-boundary-layer air.  (Meanwhile, the planetary boundary layer is tiny relative to the earth's full atmosphere, but, uh, we live in it.)

In the former case ("regular perturbation problems"), "idealized" reasoning about the $\epsilon = 0$ case provides a reliable guide to the small-but-nonzero behavior.  We want to go further, and understand the small-but-nonzero effects too, but we know they won't make a qualitative difference.

In the singular case, though, the $\epsilon = 0$ "idealization" is qualitatively, catastrophically wrong.  If you make an idealization that assumes away the possibility of boundary layers, then you're going to be wrong about what happens in a fluid – even about the big, qualitative, $O(1)$ stuff.

You need to know which kind of case you're in.  You need to know whether you're assuming away irrelevant wrinkles, or whether you're assuming away the mechanisms that determine the high-level, qualitative, $O(1)$ stuff.
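A standard textbook illustration of the two cases (my example, not from the comment above) is how the roots of a polynomial behave as $\epsilon \to 0$:

$$x^2 + \epsilon x - 1 = 0 \;\;\Longrightarrow\;\; x = \pm 1 + O(\epsilon) \qquad \text{(regular: small deformations of the } \epsilon = 0 \text{ roots)}$$

$$\epsilon x^2 + x - 1 = 0 \;\;\Longrightarrow\;\; x \approx 1 - \epsilon \;\;\text{or}\;\; x \approx -\tfrac{1}{\epsilon} \qquad \text{(singular: one root escapes to infinity)}$$

In the second case, setting $\epsilon = 0$ first leaves only $x - 1 = 0$ and silently deletes half the solution set.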


Back to the situation at hand.

In reality, TMs can only do computable stuff. But for simplicity, as an "idealization," we are considering a model where we pretend they have a UP oracle, and can exactly compute the UP.

We are justifying this by saying that the TMs will try to approximate the UP, and that this approximation will be very good.  So, the approximation error is an $\epsilon$-sized "additional nuance" in the problem.

Is this more like a regular perturbation problem, or more like a singular one?  Singular, I think.

The $\epsilon = 0$ case, where the TMs can exactly compute the UP, is a problem involving self-reference.  We have a UP containing TMs, which in turn contain the very same UP.

Self-referential systems have a certain flavor, a certain "rigidity."  (I realize this is vague, sorry, I hope it's clear enough what I mean.)  If we have some possible behavior of the system $x$, most ways of modifying $x$ (even slightly) will not produce behaviors which are themselves possible.  The effect of the modification as it "goes out" along the self-referential path has to precisely match the "incoming" difference that would be needed to cause exactly this modification in the first place.

"Stable time loop"-style time travel in science fiction is an example of this; it's difficult to write, in part because of this "rigidity."  (As I know from experience :)

On the other hand, the situation with a small-but-nonzero $\epsilon$ is quite different.

With literal self-reference, one might say that "the loop only happens once": we have to precisely match up the outgoing effects ("UP inside a TM") with the incoming causes ("UP[1] with TMs inside"), but then we're done.  There's no need to dive inside the UP that happens within a TM and study it, because we're already studying it, it's the same UP we already have at the outermost layer.

But if the UP inside a given TM is merely an approximation, then what happens inside it is not the same as the UP we have at the outermost layer.  It does not contain the same TMs we already have.

It contains some approximate thing, which (and this is the key point) might need to contain an even more coarsely approximated UP inside of its approximated TMs.  (Our original argument for why approximation is needed might hold, again and equally well, at this level.)  And the next level inside might be even more coarsely approximated, and so on.

To determine the behavior of the outermost layer, we now need to understand the behavior of this whole series, because each layer determines what the next one up will observe.

Does the series tend toward some asymptote?  Does it reach a fixed point and then stay there?  What do these asymptotes, or fixed points, actually look like?  Can we avoid ever reaching a level of approximation that's no longer $O(\epsilon)$ but $O(1)$, even as we descend through an $O(1/\epsilon)$ number of series iterations?
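Just to make the shape of these questions concrete (a toy framing of mine, not a claim about the answer): write $\delta_n$ for the approximation error at nesting depth $n$, with $\delta_0 = \epsilon$.  If the errors don't compound, $\delta_{n+1} = \delta_n$, every level stays $O(\epsilon)$.  But if each level multiplies the error, $\delta_{n+1} = k\,\delta_n$ with $k > 1$, then

$$\delta_n = \epsilon\, k^{\,n}, \quad \text{which reaches } O(1) \text{ after only } n \sim \log_k(1/\epsilon) \text{ levels.}$$

Which of these regimes (or something in between) actually describes the construction is the question.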

I have no idea!  I have not thought about it much.  My point is simply that you have to consider the fact that approximation is involved in order to even ask the right questions, about asymptotes and fixed points and such.  Once we acknowledge that approximation is involved, we get this series structure and care about its limiting behavior; this qualitative structure is not present at all in the idealized case where we imagine the TMs have UP oracles.


I also want to say something about the size of the approximations involved.

Above, I casually described the approximation errors as $O(\epsilon)$, and imagined an $\epsilon \to 0$ limit.

But in fact, we should not imagine that these errors can come as close to zero as we like.  The UP is uncomputable, and involves running every TM at once[2].  Why would we imagine that a single TM can approximate this arbitrarily well?[3]

Like the gap between the finite and the infinite, or between polynomial and exponential runtime, the gap between the uncomputable and the computable is not to be trifled with.

Finally: the thing we get when we equip all the TMs with UP oracles isn't the UP, it's something else.  (As far as I know, anyway.)  That is, the self-referential quality of this system is itself only approximate (and it is by no means clear that the approximation error is small – why would it be?).  If we have the UP at the bottom, inside the TMs, then we don't have it at the outermost layer.  Ignoring this distinction is, I guess, part of the "idealization," but it is not clear to me why we should feel safe doing so.

  1. ^

    The thing outside the TMs here can't really be the UP, but I'll ignore this now and bring it up again at the end.

  2. ^

    In particular, running them all at once and actually using the outputs, at some ("finite") time at which one needs the outputs for making a decision.  It's possible to run every TM inside of a single TM, but only by incurring slowdowns that grow without bound across the series of TMs; this approach won't get you all the information you need, at once, at any finite time.

  3. ^

    There may be some result along these lines that I'm unaware of.  I know there are results showing that the UP and SI perform well relative to the best computable prior/predictor, but that's not the same thing.  Any given computable prior/predictor won't "know" whether or not it's the best out of the multitude, or how to correct itself if it isn't; that's the value added by UP / SI.

Replies from: jeremy-gillen
comment by Jeremy Gillen (jeremy-gillen) · 2024-08-30T13:07:26.311Z · LW(p) · GW(p)

Great explanation, you have found the crux. I didn't know such problems were called singular perturbation problems.

If I thought that reasoning about the UP was definitely a singular perturbation problem in the relevant sense, then I would agree with you (that the malign prior argument doesn't really work). I think it's probably not, but I'm not extremely confident.

Your argument that it is a singular perturbation problem is that it involves self reference. I agree that self-reference is kinda special and can make it difficult to formally model things, but I will argue that it is often reasonable to just treat the inner approximations as exact.

The reason is: Problems that involve self reference are often easy to approximate by using more coarse-grained models as you move deeper. 

One example as an intuition pump is an MCTS chess bot. In order to find a good move, it needs to think about its opponent thinking about itself, etc. We can't compute this (because it's exponential, not because it's non-computable), but if we approximate the deeper layers by pretending they move randomly (!), it works quite well. Having a better move distribution works even better.
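A minimal sketch of that "pretend the deeper levels are random" trick, in code (the game interface names here are hypothetical, and real MCTS adds tree search and better move priors on top of plain rollouts):

```python
import random

def rollout_value(state, legal_moves, apply_move, is_terminal, score, n_rollouts=200):
    """Estimate how good `state` is by playing it out with uniformly random moves
    for both players, instead of recursing into the full 'me modeling you modeling
    me...' game tree. `legal_moves`, `apply_move`, `is_terminal`, and `score` are
    whatever game interface the caller supplies."""
    total = 0.0
    for _ in range(n_rollouts):
        s = state
        while not is_terminal(s):
            s = apply_move(s, random.choice(legal_moves(s)))
        total += score(s)  # e.g. +1 win, 0 draw, -1 loss, from the root player's view
    return total / n_rollouts
```

The coarse random-play model of the opponent stands in for the infinite regress of modeling the opponent modeling you.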

Maybe you'll object that this example isn't precisely self-reference. But the same algorithm (usually) works for finding a nash equilibria on simultaneous move games, which do involve infinitely deep self reference.

And another more general way of doing essentially the same thing is using a reflective oracle. Which I believe can also be used to describe a UP that can contain infinitely deep self-reference (see the last paragraph of the conclusion).[1] I think the fact that Paul worked on this suggests that he did see the potential issues with self-reference and wanted better ways to reason formally about such systems.

To be clear, I don't think any of these examples tells us that the problem is definitely a regular perturbation problem. But I think these examples do suggest that assuming that it is regular is a very reasonable place to start, and probably tells us a lot about similar, more realistic, systems.


On the gap between the computable and uncomputable: It's not so bad to trifle a little. Diagonalization arguments can often be avoided with small changes to the setup, and a few of Paul's papers are about doing exactly this. 

And the same argument works for a computable prior. E.g. we could make a prior over a finite set of total Turing machines, such that it still contained universes with clever agents.

Why would we imagine that a single TM can approximate this arbitrarily well?

If I remember correctly, a single TM definitely can't approximate it arbitrarily well. But my argument doesn't depend on this.

  1. ^

    Don't trust me on this though, my understanding of reflective oracles is very limited. 

Replies from: LGS
comment by LGS · 2024-08-30T20:55:26.631Z · LW(p) · GW(p)

Thanks for the link to reflective oracles!

On the gap between the computable and uncomputable: It's not so bad to trifle a little. Diagonalization arguments can often be avoided with small changes to the setup, and a few of Paul's papers are about doing exactly this. 


I strongly disagree with this: diagonalization arguments often cannot be avoided at all, no matter how you change the setup. This is what vexed logicians in the early 20th century: no matter how you change your formal system, you won't be able to avoid Gödel's incompleteness theorems.

There is a trick that reliably gets you out of such paradoxes, however: switch to probabilistic mixtures. This is easily seen in a game setting: in rock-paper-scissors, there is no deterministic Nash equilibrium. Switch to mixed strategies, however, and suddenly there is always a Nash equilibrium.
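(For concreteness, this is the standard fact rather than anything specific to Paul's construction: in rock-paper-scissors every pure strategy profile gives some player a profitable deviation, but the mixed profile

$$\sigma^*_{\text{row}} = \sigma^*_{\text{col}} = \left(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)$$

is a Nash equilibrium, with expected payoff $0$ for both players.)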

This is the trick that Paul is using: he is switching from deterministic Turing machines to randomized ones. That's fine as far as it goes, but it has some weird side effects. One of them is that if a civilization is trying to predict the universal prior that is simulating itself, and tries to send a message, then it is likely that with "reflective oracles" in place, the only message it can send is random noise. That is, Paul shows reflective oracles exist in the same way that Nash equilibria exist; but there is no control over what the reflective oracle actually is, and in paradoxical situations (like rock-paper-scissors) the Nash equilibrium is the boring "mix everything together uniformly".

The underlying issue is that a universe that can predict the universal prior, which in turn simulates the universe itself, can encounter a grandfather paradox. It can see its own future by looking at the simulation, and then it can do the opposite. The grandfather paradox is where the universe decides to kill the grandfather of a child that the simulation predicts.

Paul solves this by only letting it see its own future using a "reflective oracle" which essentially finds a fixed point (which is a probability distribution). The fixed point of a grandfather paradox is something like "half the time the simulation shows the grandchild alive, causing the real universe to kill the grandfather; the other half the time, the simulation shows the grandfather dead and the grandchild not existing". Such a fixed point exists even when the universe tries to do the opposite of the prediction.

The thing is, this fixed point is boring! Repeat this enough times, and it eventually just says "well my prediction about your future is random noise that doesn't have to actually come true in your own future". I suspect that if you tried to send a message through the universal prior in this setting, the message would consist of essentially uniformly random bits. This would depend on the details of the setup, I guess.

Replies from: jeremy-gillen
comment by Jeremy Gillen (jeremy-gillen) · 2024-08-30T21:08:40.554Z · LW(p) · GW(p)

I strongly disagree with this: diagonalization arguments often cannot be avoided at all, no matter how you change the setup. ...

There is a trick that reliably gets you out of such paradoxes, however: switch to probabilistic mixtures.

Fair enough, the probabilistic mixtures thing was what I was thinking of as a change of setup, but reasonable to not consider it such.

the message would consist of essentially uniformly random bits

I don't see how this is implied. If a fact is consistent across levels, and determined in a non-paradoxical way, can't this become a natural fixed point that can be "transmitted" across levels? And isn't this kind of knowledge all that is required for the malign prior argument to work?

Replies from: LGS
comment by LGS · 2024-08-30T21:21:24.385Z · LW(p) · GW(p)

The problem is that the act of leaving the message depends on the output of the oracle (otherwise you wouldn't need the oracle at all, but you also would not know how to leave a message). If the behavior of the machine depends on the oracle's actions, then we have to be careful with what the fixed point will be.

For example, if we try to fight the oracle and do the opposite, we get the "noise" situation from the grandfather paradox.

But if we try to cooperate with the oracle and do what it predicts, then there are many different fixed points and no telling which the oracle would choose (this is not specified in the setting).

It would be great to see a formal model of the situation. I think any model in which such message transmission would work is likely to require some heroic assumptions which don't correspond much to real life.

Replies from: jeremy-gillen
comment by Jeremy Gillen (jeremy-gillen) · 2024-08-30T23:00:39.842Z · LW(p) · GW(p)

If the only transmissible message is essentially uniformly random bits, then of what value is the oracle?

I claim the message can contain lots of information. E.g. if there are 2^100 potential actions, but only 2 fixed points, then 99 bits have been transmitted (relative to uniform).
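Spelled out, just restating the arithmetic of that claim: narrowing a uniform distribution over $2^{100}$ actions down to $2$ remaining possibilities conveys

$$\log_2 2^{100} - \log_2 2 = 100 - 1 = 99 \text{ bits}$$

of information about the action.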

The rock-paper-scissors example is relatively special, in that the oracle can't narrow down the space of actions at all. 

The UP situation looks to me to be more like the first situation than the second.

Replies from: LGS
comment by LGS · 2024-08-31T01:48:39.070Z · LW(p) · GW(p)

It would help to have a more formal model, but as far as I can tell the oracle can only narrow down its predictions of the future to the extent that those predictions are independent of the oracle's output. That is to say, if the people in the universe ignore what the oracle says, then the oracle can give an informative prediction.

This would seem to exactly rule out any type of signal which depends on the oracle's output, which is precisely the types of signals that nostalgebraist was concerned about.

Replies from: jeremy-gillen
comment by Jeremy Gillen (jeremy-gillen) · 2024-08-31T18:08:43.784Z · LW(p) · GW(p)

That can't be right in general. Normal Nash equilibria can narrow down predictions of actions, e.g. in a competition game. This is despite each player's decision being dependent on the other player's action.

Replies from: LGS
comment by LGS · 2024-08-31T19:17:35.412Z · LW(p) · GW(p)

That's fair, yeah

We need a proper mathematical model to study this further. I expect it to be difficult to set up because the situation is so unrealistic/impossible as to be hard to model. But if you do have a model in mind I'll take a look

comment by faul_sname · 2024-08-29T23:05:58.909Z · LW(p) · GW(p)

Suppose that there is some search process that is looking through a collection of things, and you are an element of the collection. Then, in general, it's difficult to imagine how you (just you) can reason about the whole search in such a way as to "steer it around" in your preferred direction.

I think this is easy to imagine. I'm an expert who is among 10 experts recruited to advise some government on making a decision. I can guess some of the signals that the government will use to choose who among us to trust most. I can guess some of the relative weaknesses of fellow experts. I can try to use this to manipulate the government into taking my opinion more seriously. I don't need to create a clone government and hire 10 expert clones in order to do this.

The other 9 experts can also make guesses about which signals the government will use and what the relative weaknesses of their fellow experts are, and the other 9 experts can also act on those guesses. So in order to reason about what the outcome of the search will be, you have to reason about both yourself and also about the other 9 experts, unless you somehow know that you are much better than the other 9 experts at steering the outcome of the search as a whole. But in that case only you can steer the search. The other 9 experts would fail if they tried to use the same strategy you're using.

Replies from: jeremy-gillen, Thane Ruthenis
comment by Jeremy Gillen (jeremy-gillen) · 2024-08-30T13:38:55.305Z · LW(p) · GW(p)

unless you somehow know that you are much better than the other 9 experts at steering the outcome of the search as a whole. But in that case only you can steer the search. The other 9 experts would fail if they tried to use the same strategy you're using.

Okay, if you accept this modified scenario where one expert knows they are much better than the other 9, then this is exactly the kind of scenario that nostalgebraist claimed was difficult to imagine. So that's enough to prove the point I was trying to make.

But the original example works too. It's just a simultaneous move game. It'll be won by whichever player is best at playing the game. It's clearly possible to play the game well, despite the self-reference involved with thinking about how to play better.

comment by Thane Ruthenis · 2024-08-30T03:04:22.820Z · LW(p) · GW(p)

Consider a different problem: a group of people are posed some technical or mathematical challenge. Each individual person is given a different subset of the information about the problem, and each person knows what type of information every other participant gets.

Trivial example: you're supposed to find the volume of a pyramid, you (participant 1) are given its height and the apex angles for two triangular faces, participant 2 is given the radius of the sphere on which all of the pyramid's vertices lie and all angles of the triangular faces, participant 3 is given the areas of all faces, et cetera.

Given this setup, if you're skilled at geometry, you can likely figure out which of the participants can solve the problem exactly, which can only put upper and lower bounds on the volume, and what those upper/lower bounds are for each participant. You don't need to model your competitors' mental states: all you need to do is reason about the object-level domain, plus take into account what information they have. No infinite recursion happens, because you can abstract out the particulars of how others' minds work.

This works assuming that everyone involved is perfectly skilled at geometry: that you don't need to predict what mistakes the others would make (which would depend on the messy details of their minds).

Speculatively, this would apply to deception as well. You don't necessarily need to model others' brain states directly. If they're all perfectly skilled at deception, you can predict what deceptions they'd try to use and how effective they'd be based on purely objective information: the sociopolitical landscape, their individual skills and comparative advantages, et cetera. You can "skip to the end": predict everyone playing their best-move-in-circumstances-where-everyone-else-plays-their-best-move-too.

Objectively, the distribution of comparative advantages is likely very different, so even if everyone makes their best move, some would hopelessly lose. (E. g., imagine if one of the experts is a close friend of a government official and the other is a controversial figure who'd been previously judged guilty of fraud.)

Speculatively, similar works for the MUP stuff. You don't actually need to model the individual details of other universes. You can just use abstract reasoning to figure out what kinds of universes are dense across Tegmark IV, figure out what (distributions over) entities inhabit them, figure out (distributions over) how they'd reason, and what (distributions over) simulations they'd run, and to what (distribution over the) output this process converges given the objective material constraints involved. Then take actions that skew said distribution-over-the-output in a way you want.

Again, this is speculative: I don't know that there are any math proofs that this is possible. But it seems plausible enough that something-like-this might work, and my understanding is that the MUP argument (and other kinds of acausal-trade setups) indeed uses this as a foundational assumption. (I. e., it assumes that the problem is isomorphic (in a relevant sense) to my pyramid challenge above.)

(IIRC, the Acausal Normalcy [LW · GW] post outlines some of the relevant insights, though I think it doesn't precisely focus on the topic at hand.)

answer by evhub · 2024-08-28T01:53:27.223Z · LW(p) · GW(p)

Fwiw I think this is basically correct, though I would phrase the critique as "the hypothetical is confused" rather than "the argument is wrong." My sense is that arguments for the malignity of uncomputable priors just really depend on the structure of the hypothetical: how is it that you actually have access to this uncomputable prior, if it's an approximation what sort of approximation is it, and to what extent will others care about influencing your decisions in situations where you're using it?

answer by Thane Ruthenis · 2024-08-29T09:59:17.193Z · LW(p) · GW(p)

Here's my understanding of the whole thing:

  • "Malign universal prior" arguments basically assume a setup in which we have an agent with a big dumb hard-coded module whose goal is to find this agent's location in Tegmark IV. (Or maybe perform some other important task that requires reasoning about Tegmark IV, but let's run with that as the example.)
  • The agent might be generally intelligent, the Solomonoff-induction-approximating module might be sophisticated in all kind of ways, but it's "dumb" or "naive" in an important sense: it's just trying to generate the best-guess distribution over the universes the agent is in, no matter their contents, then blindly acts on it.
  • Importantly, this process doesn't necessarily involve actually running any low-level simulations of other universes. Generally intelligent/abstract reasoning, some steps of which might literally replicate the reasoning steps of Paul's post, would also fit the bill.
  • The MUP argument is that this is sufficient for alien consequentialists to take advantage. The agent is asking, "where am I most likely to be?", and the alien consequentialists are skewing the distribution such that the most likely correct answer is "simulation-captured by acausal aliens" or whatever.
    • (And then the malign output is producing "predictions" about the future of the agent's universe like "the false vacuum collapse is going to spontaneously trigger in the next five minutes unless you perform this specific sequence of actions that happen to rewrite your utility function in such-and-such ways", and our big dumb agent is gormlessly buying this, and its "real" non-simulation-captured instance rewrites itself accordingly.)
  • Speed prior vs. complexity prior: a common guess regarding the structure of Tegmark IV is that this is how it works, it penalizes K-complexity but doesn't care how much memory/compute it needs to allocate to run a universe. If that is true, then any sufficiently good approximation of Solomonoff induction – any sufficiently good procedure for getting an answer to "where am I most likely to be?", including abstract reasoning – would take this principle into account, and bump up the probability of being in low-complexity universes.

This all seems to check out to me. Admittedly I didn't actually confirm this with any proponents of the argument, though.

(Probably also worth stating that I don't think the MUP is in any way relevant to real life. AI progress doesn't seem to be on the track where it features AGIs that use big dumb "where am I?" modules. E. g., if an AGI is born of anything like an RL-trained LLM, seems unlikely that its "where am I?" reasoning would be naive in the relevant sense. It'd be able to "manually" filter out universes with malign consequentialists, given good decision theory. You know, like we can.

The MUP specifically applies to highly abstract agent-foundations designs where we hand-code each piece, that currently don't seem practically tractable at all.)

comment by nostalgebraist · 2024-08-30T23:22:50.008Z · LW(p) · GW(p)

Thanks.

I admit I'm not closely familiar with Tegmark's views, but I know he has considered two distinct things that might be called "the Level IV multiverse":

  • a "mathematical universe" in which all mathematical constructs exist
  • a more restrictive "computable universe" in which only computable things exist

(I'm getting this from his paper here.)

In particular, Tegmark speculates that the computable universe is "distributed" following the UP (as you say in your final bullet point).  This would mean e.g. that one shouldn't be too surprised to find oneself living in a TM of any given K-complexity, despite the fact that "almost all" TMs have higher complexity (in the same sense that "almost all" natural numbers are greater than any given number $n$).

When you say "Tegmark IV," I assume you mean the computable version -- right?  That's the thing which Tegmark says might be distributed like the UP.  If we're in some uncomputable world, the UP won't help us "locate" ourselves, but if the world has to be computable then we're good[1].


With that out of the way, here is why this argument feels off to me.

First, Tegmark IV is an ontological idea, about what exists at the "outermost layer of reality."  There's no one outside of Tegmark IV who's using it to predict something else; indeed, there's no one outside of it at all; it is everything that exists, full stop.

"Okay," you might say, "but wait -- we are all somewhere inside Tegmark IV, and trying to figure out just which part of it we're in.  That is, we are all attempting to answer the question, 'what happens when you update the UP on my own observations?'  So we are all effectively trying to 'make decisions on the basis of the UP,' and vulnerable to its weirdness, insofar as it is weird."

Sure.  But in this picture, "we" (the UP-using dupes) and "the consequentialists" are on an even footing: we are both living in some TM or other, and trying to figure out which one.

In which case we have to ask: why would such entities ever come to such a destructive, bad-for-everyone (acausal) agreement?

Presumably the consequentialists don't want to be duped; they would prefer to be able to locate themselves in Tegmark IV, and make decisions accordingly, without facing such irritating complications.

But, by writing to "output channels"[2] in the malign manner, the consequentialists are simply causing the multiverse to be the sort of place where those irritating complications happen to beings like them (beings in TMs trying to figure out which TM they're in) -- and what's more, they're expending time and scarce resources to "purchase" this undesirable state of affairs!

In order for malignity to be worth it, we need something to break the symmetry between "dupes" (UP users) and "con men" (consequentialists), separating the world into two classes, so that the would-be con men can plausibly reason, "I may act in a malign way without the consequences raining down directly on my head."

We have this sort of symmetry-breaker in the version of the argument that postulates, by fiat, a "UP-using dupe" somewhere, for some reason, and then proceeds to reason about the properties of the (potentially very different, not "UP-using"?) guys inside the TMs.  A sort of struggle between conniving, computable mortals and overly-innocent, uncomputable angels.  Here we might argue that things really will go wrong for the angels, that they will be the "dupes" of the mortals, who are not like them and who do not themselves get duped.  (But I think this form of the argument has other problems, the ones I described in the OP.)

But if the reason we care about the UP is simply that we're all in TMs, trying to find our location within Tegmark IV, then we're all in this together.  We can just notice that we'd all be better off if no one did the malign thing, and then no one will do it[3].

In other words, in your picture (and Paul's), we are asked to imagine that the computable world abounds with malign, wised-up consequentialist con men, who've "read Paul's post" (i.e. re-derived similar arguments) and who appreciate the implications.  But if so, then where are the marks?  If we're not postulating some mysterious UP-using angel outside of the computable universe, then who is there to deceive?  And if there's no one to deceive, why go to the trouble?

 

  1. ^

    I don't think this distinction actually matters for what's below, I just mention it to make sure I'm following you.

  2. ^

    I'm picturing a sort of acausal I'm-thinking-about-you-thinking-about-me situation in which, although I might never actually read what's written on those channels (after all I am not "outside" Tegmark IV looking in), nonetheless I can reason about what someone might write there, and thus it matters what is actually written there.  I'll only conclude "yeah that's what I'd actually see if I looked" if the consequentialists convince me they'd really pull the trigger, even if they're only pulling the trigger for the sake of convincing me, and we both know I'll never really look.

  3. ^

    Note that, in the version of this picture that involves abstract generalized reasoning rather than simulation of specific worlds, defection is fruitless: if you are trying to manipulate someone who is just thinking about whether beings will do X as a general rule, you don't get anything out of raising your hand and saying "well, in reality, I will!"  No one will notice; they aren't actually looking at you, ever, just at the general trend.  And of course "they" know all this, which raises "their" confidence that no one will raise their hand; and "you" know that "they" know, which makes "you" less interested in raising that same hand; and so forth.

Replies from: Thane Ruthenis
comment by Thane Ruthenis · 2024-08-31T04:07:09.731Z · LW(p) · GW(p)

When you say "Tegmark IV," I assume you mean the computable version -- right?

Yep.

We have this sort of symmetry-breaker in the version of the argument that postulates, by fiat, a "UP-using dupe" somewhere, for some reason

Correction: on my model, the dupe is also using an approximation of the UP, not the UP itself. I. e., it doesn't need to be uncomputable. The difference between it and the con men is just the naivety of the design. It generates guesses regarding what universes it's most likely to be in (potentially using abstract reasoning), but then doesn't "filter" these universes; doesn't actually "look inside" and determine if it's a good idea to use a specific universe as a model. It doesn't consider the possibility of being manipulated through it; doesn't consider the possibility that it contains daemons.

I. e.: the real difference is that the "dupe" is using causal decision theory, not functional decision theory.

We can just notice that we'd all be better off if no one did the malign thing, and then no one will do it

I think that's plausible: that there aren't actually that many "UP-using dupes" in existence, so the con men don't actually care to stage these acausal attacks.

But: if that is the case, it's because the entities designing/becoming powerful agents considered the possibility of con men manipulating the UP, and so made sure that they're not just naively using the unfiltered (approximation of the) UP.

That is: yes, it seems likely that the equilibrium state of affairs here is "nobody is actually messing with the UP". But it's because everyone knows the UP could be messed with in this manner, so no-one is using it (nor its computationally tractable approximations).

It might also not be the case, however. Maybe there are large swathes of reality populated by powerful yet naive agents, such that whatever process constructs them (some alien evolution analogue?), it doesn't teach them good decision theory at all. So when they figure out Tegmark IV and the possibility of acausal attacks/being simulation-captured, they give in to whatever "demands" are posed them. (I. e., there might be entire "worlds of dupes", somewhere out there among the mathematically possible.)

That said, the "dupe" label actually does apply to a lot of humans, I think. I expect that a lot of people, if they ended up believing that they're in a simulation and that the simulators would do bad things to them unless they do X, would do X. The acausal con men would only care to actually do it, however, if a given person is (1) in the position where they could do something with large-scale consequences, (2) smart enough to consider the possibility of simulation-capture, (3) not smart enough to ignore blackmail.

Replies from: nostalgebraist
comment by nostalgebraist · 2024-08-31T18:15:47.051Z · LW(p) · GW(p)

Cool, it sounds we basically agree!

But: if that is the case, it's because the entities designing/becoming powerful agents considered the possibility of con men manipulating the UP, and so made sure that they're not just naively using the unfiltered (approximation of the) UP.

I'm not sure of this.  It seems at least possible that we could get an equilibrium where everyone does use the unfiltered UP (in some part of their reasoning process), trusting that no one will manipulate them because (a) manipulative behavior is costly and (b) no one has any reason to expect anyone else will reason differently from them, so if you choose to manipulate someone else you're effectively choosing that someone else will manipulate you.

Perhaps I'm misunderstanding you.  I'm imagining something like choosing one's own decision procedure in TDT, where one ends up choosing a procedure that involves "the unfiltered UP" somewhere, and which doesn't do manipulation.  (If your procedure involved manipulation, so would your copy's procedure, and you would get manipulated; you don't want this, so you don't manipulate, nor does your copy.)  But you write

the real difference is that the "dupe" is using causal decision theory, not functional decision theory

whereas it seems to me that TDT/FDT-style reasoning is precisely what allows us to "naively" trust the UP, here, without having to do the hard work of "filtering."  That is: this kind of reasoning tells us to behave so that the UP won't be malign; hence, the UP isn't malign; hence, we can "naively" trust it, as though it weren't malign (because it isn't).

More broadly, though -- we are now talking about something that I feel like I basically understand and basically agree with, and just arguing over the details, which is very much not the case with standard presentations of the malignity argument.  So, thanks for that.

Replies from: Thane Ruthenis
comment by Thane Ruthenis · 2024-08-31T19:17:40.063Z · LW(p) · GW(p)

I'm not sure of this.  It seems at least possible that we could get an equilibrium where everyone does use the unfiltered UP (in some part of their reasoning process), trusting that no one will manipulate them because (a) manipulative behavior is costly and (b) no one has any reason to expect anyone else will reason differently from them, so if you choose to manipulate someone else you're effectively choosing that someone else will manipulate you.

Fair point! I agree.

comment by Jeremy Gillen (jeremy-gillen) · 2024-08-30T13:52:40.672Z · LW(p) · GW(p)

Probably also worth stating that I don't think the MUP is in any way relevant to real life.

I think it's relevant because it illustrates an extreme variant of a very common problem, where "incorrectly specified" priors can cause unexpected behavior. It also illustrates the daemon problem, which I expect to be very relevant to real life.

A more realistic and straightforward example of the "incorrectly specified prior" problem: If the prior on an MCTS value head isn't strong enough, it can overfit, assigning too much value to local instrumental goals. Now your overall search process will only consider strategies that involve lots of this instrumental goal. So you end up with an agent that looks like it terminally values e.g. money, even though the goal in the "goal slot" is exactly correct and doesn't include money.
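
To make that failure mode concrete, here's a minimal toy sketch (my own construction, not taken from any real MCTS codebase; the state space, action names, and the money_bonus parameter are all made up). The "overfit value head" is hand-coded as a bonus for money-holding states rather than actually trained, and the "search" is a brute-force depth-limited plan enumeration standing in for MCTS.

```python
from itertools import product

# Tiny deterministic world: states are (location, has_money).
# The true reward cares only about reaching "goal"; money is purely instrumental.
ACTIONS = ["to_bank", "to_goal", "wait"]

def step(state, action):
    loc, has_money = state
    if action == "to_bank":
        return ("bank", True)
    if action == "to_goal":
        return ("goal", has_money)
    return state

def true_reward(state):
    return 1.0 if state[0] == "goal" else 0.0

def overfit_value(state, money_bonus=0.6):
    # Hand-coded stand-in for an overfit value head: during "training", money
    # correlated with reward, so the head inflates money-holding states.
    return true_reward(state) + (money_bonus if state[1] else 0.0)

def best_plan(start, depth, value_fn, move_cost=0.1):
    # Brute-force depth-limited plan search (standing in for MCTS), scoring
    # leaf states with the value function minus a small per-move cost.
    best, best_score = None, float("-inf")
    for plan in product(ACTIONS, repeat=depth):
        s = start
        for a in plan:
            s = step(s, a)
        score = value_fn(s) - move_cost * sum(a != "wait" for a in plan)
        if score > best_score:
            best, best_score = plan, score
    return best, round(best_score, 3)

start = ("start", False)
print(best_plan(start, 2, true_reward))    # (('to_goal', 'wait'), 0.9): straight to the goal
print(best_plan(start, 2, overfit_value))  # (('to_bank', 'to_goal'), 1.4): grabs the money first
```

Both searches face the same environment, and the reward function never mentions money; only the badly regularized value estimate does, and that's enough to make every selected plan route through the bank.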

answer by justinpombrio · 2024-08-28T15:01:46.180Z · LW(p) · GW(p)

Here's a simple argument that simulating universes based on Turing machine number can give manipulated results.

Say we lived in a universe much like this one, except that:

  • The universe is deterministic
  • It's simulated by a very short Turing machine
  • It has a center, and
  • That center is actually nearby! We can send a rocket to it.

So we send a rocket to the center of the universe and leave a plaque saying "the answer to all your questions is Spongebob". Now any aliens in other universes that simulate our universe and ask "what's in the center of that universe at time step 10^1000?" will see the plaque, search elsewhere in our universe for the reference, and watch Spongebob. We've managed to get aliens outside our universe to watch Spongebob.


I feel like it would be helpful to speak precisely about the universal prior. Here's my understanding.

It's a partial probability distribution over bit strings. It gives a non-zero probability to every bit string, but these probabilities add up to strictly less than 1. It's defined as follows:

$$\mathrm{UP}(x) \;=\; \sum_{T\,:\,T \text{ halts with output } x} 2^{-|\mathrm{code}(T)|}$$

That is, describe Turing machines by a binary code, and assign each one a probability based on the length of its code, such that those probabilities add up to exactly 1. Then magically run all Turing machines "to completion". For those that halt leaving a bitstring on their tape, attribute the probability of that Turing machine to that bitstring. Now we have a probability distribution over bitstrings, though the probabilities add up to less than one because not all of the Turing machines halted.

You cannot compute this probability distribution, but you can compute lower bounds on the probabilities of its bitstrings. (The Nth lower bound is the probability distribution you get from running the first N TMs for N steps.)
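
As a toy illustration of that dovetailing scheme (my sketch, not from the comment above): run_program below is a stand-in for a universal prefix machine, not the real thing. It treats "k ones, a zero, then k data bits" as a program that halts outputting the data bits, and everything else as non-halting. The accumulation logic is the part that matters.

```python
from collections import defaultdict
from itertools import product

def run_program(code, max_steps):
    # Placeholder "universal machine" (it isn't one!): a valid program is
    # k ones, then a zero, then exactly k data bits, and it halts immediately
    # with the data bits as output. Everything else never halts. max_steps is
    # unused here, but with a real universal machine it's where the budget bites.
    if "0" not in code:
        return None
    k = code.index("0")
    data = code[k + 1:]
    return data if len(data) == k else None

def up_lower_bound(n):
    """Nth lower bound: run every program of length <= n for n steps and credit
    2^-len(program) to whatever bitstring it halts with."""
    mass = defaultdict(float)
    for length in range(1, n + 1):
        for bits in product("01", repeat=length):
            code = "".join(bits)
            output = run_program(code, max_steps=n)
            if output is not None:
                mass[output] += 2.0 ** (-len(code))
    return dict(mass)

print(up_lower_bound(1))  # {'': 0.5}
print(up_lower_bound(3))  # {'': 0.5, '0': 0.125, '1': 0.125}
```

Each entry in the returned dictionary can only grow as n increases, but nothing computable tells you how far a given entry still is from its limiting value, which is the non-"estimability" issue discussed further down the thread.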

Call a TM that halts poisoned if its output is determined as follows:

  • The TM simulates a complex universe full of intelligent life, then selects a tiny portion of that universe to output, erasing the rest.
  • That intelligent life realizes this might happen, and writes messages in many places that could plausibly be selected.
  • It works, and the TM's output is determined by what the intelligent life it simulated chose to leave behind.

If we approximate the universal prior, the probability contribution of poisoned TMs will be precisely zero, because we don't have nearly enough compute to simulate a poisoned TM until it halts. However, if there's an outer universe with dramatically more compute available, and it's approximating the universal prior using enough computational power to actually run the poisoned TMs, they'll affect the probability distribution of the bitstrings, making bitstrings with the messages they choose to leave behind more likely.

So I think Paul's right, actually (not what I expected when I started writing this). If you approximate the UP well enough, the distribution you see will have been manipulated.

comment by justinpombrio · 2024-08-28T20:06:54.932Z · LW(p) · GW(p)

Very curious what part of this people think is wrong.

Replies from: hairyfigment
comment by hairyfigment · 2024-08-29T01:31:37.611Z · LW(p) · GW(p)

I don't see how any of it can be right. Getting one algorithm to output Spongebob wouldn't cause the SI to watch Spongebob; even a less silly claim in that vein would still be false. The Platonic agent would know the plan wouldn't work, and thus wouldn't do it.

Since no individual Platonic agent could do anything meaningful alone, and they plainly can't communicate with each other, they can only coordinate by means of reflective decision theory. That's fine, we'll just assume that's the obvious way for intelligent minds to behave. But then the SI works the same way, and knows the Platonic agents will think that way, and per RDT it refuses to change its behavior based on attempts to game the system. So none of this ever happens in the first place.

(This is without even considering the serious problems with assuming Platonic agents would share a goal to coordinate on. I don't think I buy it. You can't evolve a desire to come into existence, nor does an arbitrary goal seem to require it. Let me assure you, there can exist intelligent minds which don't want worlds like ours to exist.)

answer by Cole Wyeth · 2024-08-31T14:54:30.261Z · LW(p) · GW(p)

The universal distribution/prior is lower semi computable, meaning there is one Turing machine that can approximate it from below, converging to it in the limit. Also, there is a probabilistic Turing machine that induces the universal distribution. So there is a rather clear sense in which one can “use the universal distribution.” Of course in practice different universes would use more or less accurate versions with more or less compromises for efficiency - I think your basic argument holds up insofar as there isn’t a clear mechanism for precise manipulation through the universal distribution. It’s conceivable that some high level actions such as “make it very clear that we prefer this set of moral standards in case anyone with cosmopolitan values simulates our universe” would be preferred based on the malign universal prior argument.

comment by nostalgebraist · 2024-08-31T17:46:16.904Z · LW(p) · GW(p)

The universal distribution/prior is lower semi computable, meaning there is one Turing machine that can approximate it from below, converging to it in the limit. Also, there is a probabilistic Turing machine that induces the universal distribution. So there is a rather clear sense in which one can “use the universal distribution.”

Thanks for bringing this up.

However, I'm skeptical that lower semi-computability really gets us much.  While there is a TM that converges to the UP, we have no (computable) way of knowing how close the approximation is at any given time.  That is, the UP is not "estimable" as defined in Hutter 2005.
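
(Rough paraphrase of the definition I mean here, my gloss rather than Hutter's exact wording: a function f is "estimable" if it can be computably approximated to any requested accuracy, i.e.

$$\exists \text{ computable } \varphi \;\; \forall x \; \forall \varepsilon > 0: \;\; |\varphi(x, \varepsilon) - f(x)| < \varepsilon,$$

whereas lower semi-computability only gives a computable sequence converging to f from below, with no computable bound on the remaining gap.)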

So if someone hands you a TM and tells you it lower semi-computes the UP, this TM is not going to be of much use to you: its output at any finite time could be arbitrarily bad for whatever calculation you're trying to do, and you'll have no way of knowing.

In other words, while you may know the limiting behavior, this information tells you nothing about what you should expect to observe, because you can never know whether you're "in the limit yet," so to speak.  (If this language seems confusing, feel free to ignore the previous sentence -- for some reason it clarifies things for me.)

(It's possible that this non-"estimability" property is less bad than I think for some reason, IDK.)

I'm less familiar with the probabilistic Turing machine result, but my hunch is that one would face a similar obstruction with that TM as well.

Replies from: Amyr
comment by Cole Wyeth (Amyr) · 2024-08-31T18:02:12.498Z · LW(p) · GW(p)

Yes, I mostly agree with everything you said - the limitation with the probabilistic Turing machine approach (it's usually described equivalently as the a priori probability, in terms of monotone TMs) is that you can get samples, but you can't use those to estimate conditionals. This is connected to the typical problem of computing the normalization factor in Bayesian statistics. It's possible that these approximations would be good enough in practice, though.
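
A tiny illustration of the samples-versus-conditionals point (everything here is a toy stand-in: the "prior" is just i.i.d. biased coin flips rather than the actual a priori distribution, and the function names are made up): estimating a conditional by filtering prior samples works when the conditioning prefix is likely, and breaks down when it isn't, which is exactly where you'd need the normalization constant you can't compute.

```python
import random

random.seed(0)

def sample_sequence(length):
    # Toy stand-in for sampling from the a priori distribution:
    # i.i.d. bits, slightly biased toward 0.
    return "".join("1" if random.random() < 0.4 else "0" for _ in range(length))

def estimate_conditional(prefix, n_samples=50_000, length=40):
    """Monte Carlo estimate of P(next bit = 1 | sequence starts with prefix),
    using nothing but samples from the prior."""
    hits = ones = 0
    for _ in range(n_samples):
        s = sample_sequence(length)
        if s.startswith(prefix):
            hits += 1
            ones += int(s[len(prefix)] == "1")
    return (ones / hits if hits else None), hits

print(estimate_conditional("01"))        # ~0.4, thousands of matching samples
print(estimate_conditional("0110" * 8))  # (None, 0): no sample ever matches the prefix
```

With a short, likely prefix the filtered estimate is fine; with a long prefix essentially no samples survive the filter, so sampling access alone tells you nothing about the conditional.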

answer by philip_b · 2024-08-28T09:40:12.967Z · LW(p) · GW(p)

Instead of inspecting all programs in the UP, just inspect all programs with length less than n. As n becomes larger and larger, this covers more and more of the total probability mass in the UP, and the mass covered this way approaches 1. What to do about the non-halting programs? Well, just run all the programs for m steps, I guess. I think this is the approximation of the UP that is implied.

answer by Signer · 2024-08-28T06:01:07.310Z · LW(p) · GW(p)

In order to be “UP-like” in a relevant way, this procedure will have to involve running TMs, and the set of TMs that might be run needs to include the same TM that implements our beings and their world.

Why? The procedure just needs to do some reasoning, constrained by the UP and the outer TM. And then UP-beings can just simulate this fast reasoning without the problems of self-simulation.

Yes, an AI that uses the UP in practice may fail to predict whether UP-beings simulate it in the center of their universe or on the boundary. But the point is that the more correct the AI is in its reasoning, the more control UP-beings have.

Or you can just not create an AI that thinks about the UP. But that's denying the assumption.

5 comments

Comments sorted by top scores.

comment by Measure · 2024-08-28T13:49:49.995Z · LW(p) · GW(p)

We (i.e. "reasoning beings in computable universes") can influence the UP, but we can't reason about it well enough to use that influence.  Meanwhile, we can reason about things that are more like the speed prior -- but we can't influence them.

Did one of these can/can't pairs get flipped? 

comment by Jonas Hallgren · 2024-08-28T12:19:08.109Z · LW(p) · GW(p)

I have actually never properly understood the universal prior argument in the first place, and just seeing this post helped me understand parts of it, so thank you for writing it!

comment by Charlie Steiner · 2024-08-28T02:25:46.608Z · LW(p) · GW(p)

I'll admit, my mental image for "our universe + hypercomputation" is a sort of webnovel premise, where we're living in a normal computable universe until one day by fiat an app poofs into existence on your phone that lets you enter a binary string or file and instantaneously get the next bit with minimum description length in binary lambda calculus. Aside from the initial poofing and every usage of the app, the universe continues by its normal rules.

But there are probably simpler universes (by some handwavy standard) out there that allow enough hypercomputation that they can have agents querying minimum description length oracles, but not so much that agents querying MDL oracles can no longer be assigned short codes.