The Absolute Self-Selection Assumption

paulfchristiano


post by paulfchristiano · 2011-04-11T15:25:56.262Z · 44 comments

  Why Anthropic Reasoning?
  The Absolute Self-Selection Assumption
  Recovering Intuitive Anthropics
  Problem #1: Infinite Cosmologies
  Problem #2: Splitting Simulations
  Problem #3: The Born Probabilities

There are many confused discussions of anthropic reasoning, both on LW and in surprisingly mainstream literature. In this article I will discuss UDASSA, a framework for anthropic reasoning due to Wei Dai. This framework has serious shortcomings, but it is the only one I know of that produces reasonable answers to reasonable questions, and at the moment it is the only framework I would feel comfortable using to make a real decision.

I will discuss 3 problems:

1. In an infinite universe, there are infinitely many copies of you (infinitely many of which are Boltzmann brains). How do you assign a measure to the copies of yourself when the uniform distribution is unavailable? Do you rule out spatially or temporally infinite universes for this reason?

2. Naive anthropics ignore the substrate on which a simulation is running and count how many instances of a simulated experience exist (or how many distinct versions of that experience exist). These beliefs are inconsistent with basic intuitions about conscious experience, so we have to abandon something intuitive.

3. The Born probabilities seem mysterious. They can be explained (as well as any law of physics can be explained) by UDASSA.

Why Anthropic Reasoning?

When I am trying to act in my own self-interest, I do not know with certainty the consequences of any particular decision. I compare probability distributions over outcomes: an action may lead to one outcome with probability 1/2, and a different outcome with probability 1/2. My brain has preferences between probability distributions built into it.

My brain is not built with the machinery to decide between different universes each of which contains many simulations I care about. My brain can't even really grasp the notion of different copies of me, except by first converting to the language of probability distributions. If I am facing the prospect of being copied, the only way I can grapple with it is by reasoning "I have a 50% chance of remaining me, and a 50% chance of becoming my copy." After thinking in this way, I can hope to intelligently trade off one copy's preferences against the other's using the same machinery which allows me to make decisions with uncertain outcomes.

In order to perform this reasoning in general, I need a better framework for anthropic reasoning. What I want is a probability distribution over all possible experiences (or "observer-moments"), so that I can use my existing preferences to make intelligent decisions in a universe with more than one observer I care about.

I am going to leave many questions unresolved. I don't understand continuity of experience or identity, so I am simply not going to try to be selfish (I don't know how). I don't understand what constitutes conscious experience, so I am not going to try to explain it. I have to rely on a complexity prior, which involves an unacceptably arbitrary choice of a notion of complexity.

The Absolute Self-Selection Assumption

A thinker using Solomonoff induction searches for the simplest explanation for its own experiences. It eventually learns that the simplest explanation for its experiences is the description of an external lawful universe in which its sense organs are embedded and a description of that embedding.

As humans using Solomonoff induction, we go on to argue that this external lawful universe is real, and that our conscious experience is a consequence of the existence of certain substructure in that universe. The absolute self-selection assumption discards this additional step. Rather than supposing that the probability of a certain universe depends on the complexity of that universe, it takes as a primitive object a probability distribution over possible experiences.

By the same reasoning that led a normal Solomonoff inductor to accept the existence of an external universe as the best explanation for its experiences, the least complex description of your conscious experience is the description of an external lawful universe and directions for finding the substructure embodying your experience within that universe.

This requires specifying a notion of complexity. I will choose a universal computable distribution over strings for now, to mimic conventional Solomonoff induction as closely as possible (and because I know nothing better). The resulting theory is called UDASSA, for Universal Distribution + ASSA.
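To make the weighting concrete, here is a minimal sketch. It does not implement a real universal distribution, which ranges over all programs for a universal machine and is not computable. Instead it assumes a hypothetical, hand-written table of prefix-free binary codes, each standing in for a description of an experience. Each experience is weighted by the sum of 2^-length over the codes that produce it.

```python
from collections import defaultdict
from fractions import Fraction

def udassa_weights(descriptions):
    """Weight each experience by the sum of 2**-len(code) over codes describing it.

    descriptions maps a binary code (a str of '0'/'1') to the experience it picks out.
    The codes must be prefix-free, which keeps the total weight at most 1 (Kraft).
    """
    codes = list(descriptions)
    for a in codes:
        for b in codes:
            if a != b and b.startswith(a):
                raise ValueError(f"code {a!r} is a prefix of {b!r}")
    weights = defaultdict(Fraction)
    for code, experience in descriptions.items():
        weights[experience] += Fraction(1, 2 ** len(code))
    return dict(weights)

# Hypothetical toy table: short codes stand in for (universe, locator) descriptions.
toy_descriptions = {
    "0": "observer A",
    "10": "observer B",
    "110": "observer C",
    "111": "observer A",  # a second, longer description of the same experience
}
weights = udassa_weights(toy_descriptions)
```

Observer A ends up with 1/2 + 1/8 = 5/8, because two descriptions point to it. Requiring the codes to be prefix-free is what keeps the total weight bounded.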

Recovering Intuitive Anthropics

Suppose I create a perfect copy of myself. Intuitively, I would like to weight the two copies equally. Similarly, my anthropic notion of "probability of an experience" should match up with my intuitive notion of probability. Fortunately, UDASSA recovers intuitive anthropics in intuitive situations.

The shortest description of me is a pair (U, x), where U is a description of my universe and x is a description of where to find me in that universe. If there are two copies of me in the universe, then the experience of each can be described in the same way: (U, x1) and (U, x2) are descriptions of approximately equal complexity, so I weight the experience of each copy equally. The total experience of my copies is weighted twice as much as the total experience of an uncopied individual.
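Here is a toy calculation of the same point. It assumes a hypothetical 40-bit description U and 12-bit locators x1 and x2 of exactly equal length. Each pair (U, x) is weighted by 2^-(|U| + |x|).

```python
from fractions import Fraction

def pair_weight(universe_bits: int, locator_bits: int) -> Fraction:
    """Weight of a description (U, x) under a 2**-length prior."""
    return Fraction(1, 2 ** (universe_bits + locator_bits))

U_BITS = 40  # hypothetical length of the description U
X_BITS = 12  # hypothetical length of each locator x1, x2

w1 = pair_weight(U_BITS, X_BITS)  # (U, x1): the original
w2 = pair_weight(U_BITS, X_BITS)  # (U, x2): the copy
share_of_original = w1 / (w1 + w2)                         # equal weighting of copies
copies_vs_single = (w1 + w2) / pair_weight(U_BITS, X_BITS)  # two copies vs one individual
```

The original gets a share of 1/2. The pair of copies together carries twice the weight of a single uncopied individual.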

Part of x is a description of how to navigate the randomness of the universe. For example, if the last (truly random) coin I saw flipped came up heads, then in order to specify my experiences you need to specify the result of that coin flip. An equal number of equally complex descriptions point to the version of me who saw heads and the version of me who saw tails.
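The same bookkeeping can be sketched for several flips. This rests on a simplifying assumption: the cheapest way to point at a branch is a literal record of its outcomes, one bit per flip. That assumption ignores any shorter descriptions of very regular sequences. The function name and parameters are hypothetical.

```python
from fractions import Fraction
from itertools import product

def outcome_distribution(base_bits: int, n_flips: int) -> dict:
    """Normalized weight of each observed flip sequence.

    Each sequence costs base_bits (universe plus locator up to the flips) plus
    one bit per flip to specify, so all sequences get equal raw weight.
    """
    raw = {
        "".join(seq): Fraction(1, 2 ** (base_bits + n_flips))
        for seq in product("HT", repeat=n_flips)
    }
    total = sum(raw.values())
    return {seq: w / total for seq, w in raw.items()}

dist = outcome_distribution(base_bits=30, n_flips=3)
```

Each of the eight sequences receives probability 1/8, which is the intuitive answer for three fair coins.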

Problem #1: Infinite Cosmologies

Modern physics is consistent with infinite universes. An infinite universe contains infinitely many observers (infinitely many of which share all of your experiences so far), and it is no longer sensible to talk about the "uniform distribution" over all of them. You could imagine taking a limit over larger and larger volumes, but there is no particular reason to suspect such a limit would converge in a meaningful sense. One solution that has been suggested is to choose an arbitrary but very large volume of spacetime, and to use a uniform distribution over observers within it. Another solution is to conclude that infinite universes can't exist. Both of these solutions are unsatisfactory.

UDASSA provides a different solution. The probability of an experience depends exponentially on the complexity of specifying it. Just existing in an infinite universe with a short description does not guarantee that you yourself have a short description; you need to specify a position within that infinite universe. For example, if your experiences occur 34908172349823478132239471230912349726323948123123991230 steps after some naturally specified time 0, then the (somewhat lengthy) description of that time is necessary to describe your experiences. Thus the total measure of all observer-moments within a universe is finite, even though the universe contains infinitely many of them.
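The total stays finite because a position must itself be written down in a prefix-free code. The weights 2^-length over all positions therefore obey the Kraft inequality. The sketch below assumes the Elias gamma code, which is just one choice among many; the post does not fix an encoding. It shows the cost of the example time and that the partial sums stay bounded.

```python
from fractions import Fraction

def elias_gamma(n: int) -> str:
    """Prefix-free Elias gamma code: (bit_length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary

def partial_measure(max_n: int) -> Fraction:
    """Total 2**-len weight of the positions 1..max_n; never exceeds 1."""
    return sum(Fraction(1, 2 ** len(elias_gamma(n))) for n in range(1, max_n + 1))

T_STEPS = 34908172349823478132239471230912349726323948123123991230
cost_bits = len(elias_gamma(T_STEPS))  # bits needed just to say "when"
```

Under this code the example time costs 369 bits. That means observer-moments located there are suppressed by a factor of 2^-369 relative to one located at time 1. The partial sums approach 1 without exceeding it.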

Problem #2: Splitting Simulations

Consider a computer which is 2 atoms thick running a simulation of you. Suppose this computer can be divided down the middle into two 1 atom thick computers which would both run the same simulation independently. We are faced with an unfortunate dichotomy: either the 2 atom thick simulation has the same weight as two 1 atom thick simulations put together, or it doesn't.

In the first case, we have to accept that some computer simulations count for more, even if they are running the same simulation (or we have to de-duplicate the set of all experiences, which leads to serious problems with Boltzmann brains). In this case, we are faced with the problem of comparing different substrates, and it seems impossible not to make arbitrary choices.

In the second case, we have to accept that the operation of dividing the 2 atom thick computer has moral value, which is even worse. Where exactly does the transition occur? What if each layer of the 2 atom thick computer can run independently before splitting? Is physical contact really significant? What about computers that aren't physically coherent? What if two 1 atom thick computers periodically synchronize themselves and self-destruct if they aren't synchronized: does this synchronization effectively destroy one of the copies? I know of no way to accept this possibility without extremely counter-intuitive consequences.

UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because they are easier to specify. Given a description of one of the 1 atom thick computers, there are two descriptions of equal complexity that point to the simulation running on the 2 atom thick computer: one description pointing to each layer of the 2 atom thick computer. When a 2 atom thick computer splits, the total number of descriptions pointing to the experience it is simulating doesn't change.
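A toy count of descriptions reproduces the claim that splitting leaves the total unchanged. It assumes, hypothetically, that every layer of every computer is picked out by its own locator of the same length.

```python
from fractions import Fraction

def experience_measure(computers, locator_bits):
    """computers: list of (simulation_label, layers) pairs.

    Toy assumption: each layer of each computer has its own locator of
    locator_bits bits, so every layer running a simulation adds
    2**-locator_bits to that simulation's measure.
    """
    measure = {}
    for label, layers in computers:
        increment = layers * Fraction(1, 2 ** locator_bits)
        measure[label] = measure.get(label, Fraction(0)) + increment
    return measure

thin = experience_measure([("you", 1)], locator_bits=10)               # one 1-atom computer
before = experience_measure([("you", 2)], locator_bits=10)             # one 2-atom computer
after = experience_measure([("you", 1), ("you", 1)], locator_bits=10)  # after the split
```

The 2-atom computer carries twice the measure of a single 1-atom computer. Splitting it yields exactly the same total.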

Problem #3: The Born Probabilities

A quantum mechanical state can be described as a linear combination of "classical" configurations. For some reason we appear to experience ourselves as being in one of these classical configurations with probability proportional to the squared magnitude of that configuration's coefficient. These probabilities are called the Born probabilities, and are sometimes described either as a serious problem for MWI (the many-worlds interpretation) or as an unresolved mystery of the universe.

What happens if we apply UDASSA to a quantum universe? For one, the existence of an observer within the universe doesn't say anything about conscious experience. We need to specify an algorithm for extracting a description of that observer from a description of the universe.

Consider the randomized algorithm A: compute the state of the universe at time t, then sample a classical configuration with probability proportional to the squared magnitude of its inner product with the universal wavefunction.

Consider the randomized algorithm B: compute the state of the universe at time t, then sample a classical configuration with probability proportional to the magnitude of its inner product with the universal wavefunction.

Using either A or B, we can describe a single experience by specifying a random seed, and picking out that experience within the classical configuration output by A or B using that random seed. If this is the shortest explanation of an experience, the probability of an experience is proportional to the number of random seeds which produce classical configurations containing it.
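The seed-counting picture can be sketched for a toy state with just two classical configurations. The sketch leaves aside the vastly harder step of computing the universal wavefunction. The function name, the 16-bit seeds and the amplitudes are all illustrative assumptions. Counting seeds under A reproduces squared magnitudes, while counting under B reproduces normalized magnitudes.

```python
from bisect import bisect_right
from itertools import accumulate

def seed_counts(amplitudes, rule, seed_bits=16):
    """Count how many of the 2**seed_bits seeds select each configuration.

    rule "A" samples configuration i with probability |a_i|**2;
    rule "B" samples it with probability |a_i|, normalized to sum to 1.
    """
    if rule == "A":
        raw = [abs(a) ** 2 for a in amplitudes]
    elif rule == "B":
        raw = [abs(a) for a in amplitudes]
    else:
        raise ValueError("rule must be 'A' or 'B'")
    total = sum(raw)
    cdf = list(accumulate(w / total for w in raw))
    n_seeds = 2 ** seed_bits
    counts = [0] * len(amplitudes)
    for seed in range(n_seeds):
        u = (seed + 0.5) / n_seeds                   # each seed picks a point in [0, 1)
        i = min(bisect_right(cdf, u), len(cdf) - 1)  # inverse-CDF sampling
        counts[i] += 1
    return counts

# Two classical configurations with Born weights 0.9 and 0.1.
amps = [0.9 ** 0.5, 0.1 ** 0.5]
counts_a = seed_counts(amps, "A")
counts_b = seed_counts(amps, "B")
```

Under A about 90% of seeds select the first configuration, but under B only about 75% do. The two algorithms therefore make observably different predictions.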

The universe as we know it is typical for an output of A but completely improbable as an output of B. For example, the observed behavior of stars is consistent with almost all observations weighted according to algorithm A, but with almost no observations weighted according to algorithm B. Algorithm A constitutes an immensely better description of our experiences, in the same sense that quantum mechanics constitutes an immensely better description of our experiences than classical physics.

You could also imagine an algorithm C, which uses the same selection as algorithm B to point to the Everett branch containing a physicist about to do an experiment, but then uses algorithm A to describe the experiences of the physicist after doing that experiment. This is a horribly complex way to specify an experience, however, for exactly the same reason that a Solomonoff inductor places very low probability on the laws of physics suddenly changing for just this one experiment.

Of course this leaves open the question of "why the Born probabilities and not some other rule?" Algorithm B is a valid way of specifying observers, though they would look exactly as foreign as observers with different rules of physics (Wei Dai has suggested that the structures specified by algorithm B are not even self-aware as justification for the Born rule). The fact that we are described by algorithm A rather than B is no more or less mysterious than the fact that the laws of physics are like so instead of some other way.

In the same way that we can retroactively justify our laws of physics by appealing to their elegance and simplicity (in a sense we don't yet really understand), I suspect that we can justify selection according to algorithm A rather than algorithm B. In an infinite universe, algorithm B doesn't even work (because the sum of the magnitudes of the inner products of the universal wavefunction with the classical configurations is infinite), and even in a finite universe algorithm B necessarily involves the additional step of normalizing the probability distribution or else producing nonsense. Moreover, algorithm A is a nicer mathematical object than algorithm B when the evolution of the wavefunction is unitary, and so the same considerations that suggest elegant laws of physics suggest algorithm A over B (or some other alternative).
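Here is a quick numerical illustration of the divergence. It uses a hypothetical infinite family of configurations with amplitudes proportional to 1/n. That family was chosen only because it is normalizable while its amplitudes are not summable; it is not a claim about the actual universal wavefunction.

```python
import math

C = math.sqrt(6) / math.pi  # normalizes a_n = C / n so that sum over n >= 1 of |a_n|**2 = 1

def partial_sums(n_terms: int):
    """Return (sum of |a_n|**2, sum of |a_n|) over the first n_terms configurations."""
    born = sum((C / n) ** 2 for n in range(1, n_terms + 1))
    linear = sum(C / n for n in range(1, n_terms + 1))
    return born, linear

for n_terms in (10**2, 10**4, 10**5):
    born, linear = partial_sums(n_terms)
    print(f"{n_terms:>7} terms: algorithm-A total {born:.6f}, algorithm-B total {linear:.3f}")
```

The squared magnitudes converge to 1, so algorithm A's distribution is well defined. The sum of magnitudes keeps growing roughly logarithmically, so algorithm B has no finite total to normalize by.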

Note that this is not the core of my explanation of the Born probabilities; in UDASSA, choosing a selection procedure is just as important as describing the universe, and so some explicit sort of observer selection is a necessary part of the laws of physics. We predict the Born rule to hold in the future because it has held in the past, just like we expect the laws of physics to hold in the future because they have held in the past.

In summary, if you use Solomonoff induction to predict what you will see next based on everything you have seen so far, your predictions about the future will be consistent with the Born probabilities. You only get in trouble when you use Solomonoff induction to predict what the universe contains, and then get bogged down in the question "Given that the universe contains all of these observers, which one should I expect to be me?"

44 comments


comment by Wei Dai (Wei_Dai) · 2011-04-12T06:34:43.337Z

The post mentioned some problems/issues with this approach that remain to be resolved. Here are some additional ones.

My brain has preferences between probability distributions built into it.

Your brain is built to intuitively grapple with distributions over future experiences, like your example "I have a 50% chance of remaining me, and a 50% chance of becoming my copy." Unfortunately UDASSA doesn't give you that. It only gives you a distribution over observer-moments in an absolute sense (hence the "A" in ASSA), and there is no good way to convert such a distribution into a distribution over future experiences. (Suppose you're copied at time 0, then the "copy" is copied again at time 1. Under UDASSA this is entirely unproblematic, but it doesn't tell you whether you should anticipate being the "original" at time 2 with probability 1/2 or 1/3.) The "pure" UDASSA position would be that there is no such thing as "remaining me" or "becoming my copy", and you just have to make your choices using the distribution over observer-moments without "linking" the observer-moments together in any way.
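The ambiguity in the copying example can be made explicit by comparing two candidate anticipation rules. Neither rule is part of UDASSA. Both are hypothetical illustrations of where the 1/2 and 1/3 answers come from.

```python
from fractions import Fraction

def split_rule(history):
    """Anticipation flows down the copy tree, halving at each copy event."""
    weights = {"original": Fraction(1)}
    for parent, child in history:
        half = weights[parent] / 2
        weights[parent] = half
        weights[child] = half
    return weights

def count_rule(history):
    """Every person existing at the end gets equal weight."""
    people = {"original"} | {child for _, child in history}
    return {person: Fraction(1, len(people)) for person in people}

# Copied at time 0; the copy is copied again at time 1.
history = [("original", "copy"), ("copy", "copy_of_copy")]
by_splitting = split_rule(history)
by_counting = count_rule(history)
```

The first rule anticipates being the original at time 2 with probability 1/2, and the second with probability 1/3. UDASSA by itself does not choose between them.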

What I want is a probability distribution over all possible experiences (or "observer-moments"), so that I can use my existing preferences to make intelligent decisions in a universe with more than one observer I care about.

Do you consider this probability distribution an objective measure of how much each observer-moment exists? Or is it just a (possibly approximate) measure of how much you care about each observer-moment? I'm still going back and forth on these two positions myself. See What Are Probabilities, Anyway? where I go into this distinction a bit more. (The former is what I usually mean when I say UDASSA. Perhaps we could call the latter UDT-UMC for Updateless Decision Theory w/ Universal Measure of Care, unless someone has a better name for it. :)

UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because they are easier to specify.

Does this not seem counterintuitive to you? Suppose you find out you are living in a simulation on a 2 atom thick computer, and the simulation-keeper gives you a choice of (a) moving to a 1 atom thick computer, or (b) flipping a coin and shutting down the simulation or not based on the coin flip, would you really be indifferent? Under UDT-UMC, we can say that how much we care about an observer-moment is related to its "probability" under UD, but not necessarily exactly equal and could be influenced by other factors. If we accept the complexity of value thesis, then there is no reason why the measure of care has to be maximally simple, right? (This post is also related.)

comment by wnoise · 2011-04-11T19:33:01.391Z

In an infinite universe, there are infinitely many copies of you (infinitely many of which are Boltzmann brains).

This is a meme I keep seeing, and it's just not true. You need a lot more assumptions to justify that, such as "randomly generated", or very very strong versions of the cosmological principle.

The real line is infinite, but there's only one copy of the number 7.

Replies from: paulfchristiano

↑ comment by paulfchristiano · 2011-04-11T20:12:37.270Z

The randomness of quantum mechanics is enough to guarantee under very weak conditions that, in most Everett branches, there are infinitely many copies of any pattern which occurs with positive probability.

The paper I linked justifies this assumption for one set of cosmological beliefs.

Also, though I made this claim as fact, you could generously consider it to be the assumption of the least convenient possible world. Are you sufficiently confident that there are only finitely many copies of you that you are OK with anthropics that would collapse if there were infinitely many copies?

Replies from: wnoise

↑ comment by wnoise · 2011-04-11T21:20:10.751Z

So you're going with "randomly generated". Which is fine, but it needs to be spelled out.

there are infinitely many copies of any pattern which occurs with positive probability.

You need to be very careful pulling intuitions about randomness from the finite case and applying them to the infinite case. In particular, it is no longer true that just because something happened, it has a positive probability. Any given real number has probability zero of being picked from the uniform distribution on [0,1) yet one certainly will be picked. And we can pick an infinite number of times and never encounter a duplicate.

the least convenient possible world

I'm not attacking this assumption in order to attack your final conclusion, I'm just attacking this assumption.

Replies from: Cyan, Perplexed, paulfchristiano

↑ comment by Cyan · 2011-04-11T22:22:47.761Z

Any given real number has probability zero of being picked from the uniform distribution on [0,1) yet one certainly will be picked.

I have actually never observed a real number picked at random. I have often observed rational numbers picked pseudo-randomly, though.

Replies from: wnoise

↑ comment by wnoise · 2011-04-12T00:59:01.589Z

Observing a Geiger counter near a piece of radioactive material was one of the highlights of my undergraduate physics labs. And the time distribution of clicks is random in the same sense that the OP was using.

Replies from: Sniffnoy, Cyan

↑ comment by Sniffnoy · 2011-04-12T01:58:27.981Z

I think the bigger problem is not randomness vs. pseudorandomness, but rather the question of whether uncountable probability spaces actually exist in physical situations.

Replies from: wnoise

↑ comment by wnoise · 2011-04-12T05:00:10.538Z

I believe they do for the same reasons I take seriously the existence of other Everett branches. In fact the mapping is rather straightforward: I can't observe or directly interact with them in full generality, but the laws governing them and what I can observe are so very much simpler than laws that excise the unobservable ones. Whether I can actually exhibit most real numbers is beside the point.

Replies from: Cyan

↑ comment by Cyan · 2011-04-12T05:11:24.066Z

Is there a demonstration that a physics based on the computables is more complex than a physics based on the reals?

Replies from: JoshuaZ

↑ comment by JoshuaZ · 2011-04-12T05:51:41.238Z

Is there a demonstration that a physics based on the computables is more complex than a physics based on the reals?

This is a complicated question. In practice, it is difficult in this particular context to measure what we mean by more or less complicated. A Blum-Shub-Smale machine, which is essentially the equivalent of a Turing machine but for real numbers, can do anything a regular Turing machine can do. This would suggest that physics based on the reals is in general capable of doing more. But in terms of describing rules, it seems that physics based on the reals is simpler. For example, trying to talk about points in space is a lot easier when one can have any real coordinate rather than any computable coordinate. If one wants to prove something about some sort of space that only has computable coordinates, the easiest thing is generally to embed it in the corresponding real manifold or the like.

↑ comment by Cyan · 2011-04-12T03:29:05.446Z

As Sniffnoy notes, the bigger problem is about the observation of an actual real number. Any observable signal specifying the instant at which the particle triggered the counter has finite information content, unlike a true real number. This includes the signal sent by your ears to your brain.

I shouldn't have mentioned pseudo-random number generation in the grandparent -- it's a red herring.

↑ comment by Perplexed · 2011-04-11T22:59:34.459Z

Any given real number has probability zero of being picked from the uniform distribution on [0,1) yet one certainly will be picked.

Not in a finite amount of time.

Replies from: wnoise, Manfred

↑ comment by wnoise · 2011-04-12T01:00:28.284Z

What do you mean?

↑ comment by Manfred · 2011-04-11T23:23:46.900Z

Drawing from a continuous distribution happens fairly often, so your comment confuses me. Or maybe you'd say that those aren't "really infinite" and are confined to a certain number of bits, but quantum mechanics would be an exception to that.

Replies from: Perplexed

↑ comment by Perplexed · 2011-04-12T01:01:37.214Z

As Cyan pointed out, when you choose a number confined to a certain number of bits, you are actually choosing from among the rationals.

I don't understand your reference to QM. I wasn't objecting to the randomness aspect. I was simply pointing out that to actually receive that randomly chosen real, you will (almost certainly) need to receive an infinite number of bits, and assuming finite channel capacity, that will take an infinite amount of time. So that event you mentioned, the one with an infinitesimal probability (zero probability for all practical purposes) is not going to actually happen (i.e. finish happening).

It was a minor quibble, which I now regret making.

↑ comment by paulfchristiano · 2011-04-11T21:27:31.534Z

Any given real number has probability zero of being picked from the uniform distribution on [0,1) yet one certainly will be picked

I believe there are probably only countably many distinguishable observer moments, in which case this can't happen by countable additivity.

But you are certainly correct that a lot goes into this assumption. I should be more clear about this; in particular, I should probably add a bunch of "may"s.

comment by AlephNeil · 2011-04-11T18:51:28.898Z

The shortest description of me is a pair (U, x), where U is a description of my universe and x is a description of where to find me in that universe.

It might not be possible to describe U without making some arbitrary choices concerning "co-ordinates" (and other acts of "gauge-fixing"). And then when they're chosen, we're going to want to 'throw them away' once we've located the observer (since the co-ordinates are not physically meaningful and certainly don't form part of the observer's "mental state".)

So really, it's better to talk about a "centred universe" whose co-ordinates are specially chosen to have the observer in the middle, rather than an uncentered ("objective") universe plus a pointer.

Anyway, I still want to know whether being close to a 'landmark' (like a supermassive black hole) is going to significantly increase one's probability. And whether, if tons of copies of you are made and sent far and wide, you should 'anticipate' waking up close to a landmark.

Replies from: cousin_it, paulfchristiano

↑ comment by cousin_it · 2011-04-11T20:36:10.167Z

Your last paragraph sounds like it could describe gravity if we tweaked it enough :-)

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2011-04-11T22:33:19.049Z

There's an entropic theory of gravity.

↑ comment by paulfchristiano · 2011-04-11T20:25:21.344Z

Anyway, I still want to know whether being close to a 'landmark' (like a supermassive black hole) is going to significantly increase one's probability. And whether, if tons of copies of you are made and sent far and wide, you should 'anticipate' waking up close to a landmark.

The theory predicts many artifacts of this form. I don't think that landmarks are too significant, because specifying what "supermassive black hole" means is a little complicated, but for very easily specified landmarks it would be the case.

comment by Manfred · 2011-04-11T18:49:34.573Z

The "Born Probabilities" section was 11 dang paragraphs of "they're the best fit to our observations and Occam's razor." :(

For example, if the last (truly random) coin I saw flipped came up heads, then in order to specify my experiences you need to specify the result of that coin flip. An equal number of equally complex descriptions point to the version of me who saw heads and the version of me who saw tails.

This is not necessarily true. The sequence HHHHHHHHHH has a lower Kolmogorov complexity than HTTTTHTHTT. So this weighting of observers by complexity has observable consequences in that we will see simpler strings more often than a uniform distribution would predict. But we don't, which makes this idea unlikely.
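Kolmogorov complexity is uncomputable, so any code can only use a proxy for it. Run-length encoding is a crude, hypothetical stand-in chosen for illustration, but it already shows the asymmetry between the two example sequences.

```python
from itertools import groupby

def run_length(seq: str) -> list:
    """Run-length encode a string of coin outcomes, e.g. 'HHT' -> [('H', 2), ('T', 1)]."""
    return [(symbol, len(list(group))) for symbol, group in groupby(seq)]

simple = run_length("HHHHHHHHHH")
mixed = run_length("HTTTTHTHTT")
```

The all-heads string collapses to a single run, while the mixed string needs six. Under any length-based prior, the all-heads string would get a shorter description and hence more weight.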

Replies from: paulfchristiano

↑ comment by paulfchristiano · 2011-04-11T20:37:59.212Z

The "Born Probabilities" section was 11 dang paragraphs of "they're the best fit to our observations and Occam's razor." :(

It was 8 paragraphs of "Here is why Occam's razor is entitled to explain the Born probabilities just like the rest of physics." Insofar as the Born probabilities are mysterious at all, this is what needs to be resolved. Do you disagree?

This is not necessarily true. The sequence HHHHHHHHHH has a lower Kolmogorov complexity than HTTTTHTHTT. So this weighting of observers by complexity has observable consequences in that we will see simpler strings more often than a uniform distribution would predict. But we don't, which makes this idea unlikely.

Your reasoning applies verbatim to Solomonoff induction itself, which is the first clue that someone has thought through it before. In fact, I strongly suspect that Solomonoff thought through it.

What you are saying is that truly random processes are rare under the Solomonoff prior. But it should be clear that the total mass on random processes is comparable to the total mass on deterministic processes. So we should not be surprised in general to find ourselves in a universe in which random processes exist. Once we have observed a phenomenon to be random in the past, switching from randomness to some simple law (like always output H) is unlikely for the same reason that arbitrarily changing the laws of physics is unlikely.

Replies from: Manfred

↑ comment by Manfred · 2011-04-11T21:12:08.500Z

Do you disagree?

Yes, but then I never thought they were relatively mysterious anyhow, for the reasons you describe. They're a natural law, and that's what science is for. Neither have I ever heard any physics professors or textbooks say they're mysterious. An "explanation" of the Born probabilities would be deriving them, and some other parts of quantum mechanics, from a simpler underlying framework.

What you are saying is that truly random processes are rare under the Solomonoff prior. But it should be clear that the total mass on random processes is comparable to the total mass on deterministic processes.

"Comparable," but not the same. Qualitative estimates are not enough here.

switching from randomness to some simple law (like always output H) is unlikely for the same reason that arbitrarily changing the laws of physics is unlikely.

Nope. Changing from random to simple would reduce the size of the Turing machine needed to generate the output, because a specific random string needs a lot of specification but a run of heads does not. This lowers the complexity and makes it more likely by your proposed prior. The reason that this is bad for your proposed prior and not for Solomonoff induction is because one is about your experience and one is about just the universe. So even in a multiverse where all of you "happen," thus satisfying Solomonoff induction, your prior adds this extra weighting that makes it more likely for you to observe HHHHHHHHHH.

Replies from: lmm

↑ comment by lmm · 2013-10-11T21:39:32.219Z

Short PRNGs seem to exist, and a Turing machine that could produce my subjective experiences up until now would seem to need one already. So I don't think it's necessarily the case that the Turing machine to output a description of an Everett branch in which I observe HHHHHH after a bunch of random-like events is shorter than the one to output a description of an Everett branch in which I observe HTTHHHT after a bunch of random-like events.

comment by AgentME · 2019-05-18T23:38:23.660Z

Consider a computer which is 2 atoms thick running a simulation of you. Suppose this computer can be divided down the middle into two 1 atom thick computers which would both run the same simulation independently. We are faced with an unfortunate dichotomy: either the 2 atom thick simulation has the same weight as two 1 atom thick simulations put together, or it doesn't.

UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because they are easier to specify.

I think the answer is that the 2-atom thick computer does not automatically have twice as much measure as a 1-atom thick computer. I think you're assuming that in the (U, x) pair, x is just a plain coordinate that locates a system (implementing an observer moment) in 4D spacetime plus Everett branch path. Another possibility is that x is a program for finding a system inside of a 4D spacetime and Everett tree.

Imagine a 2-atom thick computer (containing a mind) which will lose a layer of material and become 1-atom thick if a coin lands on heads. If x were just a plain coordinate, then the mind should expect the coin to land on tails with 2:1 odds, because its volume is cut in half in the heads outcome, and only half as many possible x bit-strings now point to it, so its measure is cut in half. However, if x is a program, then the program can begin with a plain coordinate for finding an early version of the 2-atom thick computer, and then contain instructions for tracking the system in space as time progresses. (The only "plain coordinates" the program would need from there would be a record of the Everett branches to follow the system through.) The locator x would barely need to change to track a future version of the mind after the computer shrinks in thickness compared to if the computer didn't shrink, so the mind's measure would not be affected much.

If the 2-atom thick computer split into two 1-atom thick computers, then you can imagine (U, x), where x is a locator for the 2-atom thick computer before the split, and (U, x1) and (U, x2), where x1 and x2 are locators for the different copies of the computer after the split. x1 and x2 differ from x by pointing to a future time (and to a record of some more Everett branches, but I'm going to ignore that here) and to differing indexes of which side of the split to track at the time of the split. The measure of the computer is divided among the future copies, but this isn't simply because each copy has half the volume of the original, and it does not imply that a 2-atom thick computer shrinking to 1 atom of thickness halves the measure. In the shrinking case, the program x does not need to contain an index saying which side of the computer to track: the program contains code to track the computational system, and it doesn't need much nudging to keep tracking that system when the edge of the material starts transforming into something not recognized as the computational system. Measure is split only when both halves resemble the computational system closely enough to continue to be tracked.
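The bit accounting in this paragraph can be sketched as follows. The sketch assumes, hypothetically, that a locator of n bits contributes 2^-n to the measure. It also ignores the cost of pointing to a future time and to extra Everett branches, since the split and shrink cases share that cost. Splitting adds one bit to choose a side, so each copy gets half the measure and the total is conserved. Shrinking adds no such bit.

```python
from fractions import Fraction


def locator_measure(n_bits):
    """Hypothetical weight of a single locator program that is n_bits long."""
    return Fraction(1, 2 ** n_bits)


n = 100  # hypothetical length of the locator x for the 2-atom computer

before = locator_measure(n)

# Split: x1 and x2 each extend x by one bit saying which half to track.
copy_1 = locator_measure(n + 1)
copy_2 = locator_measure(n + 1)

# Shrink: no side index is needed; the tracking code just keeps following the system.
after_shrink = locator_measure(n)

print(copy_1 == before / 2, copy_1 + copy_2 == before, after_shrink == before)
# True True True
```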

comment by steven0461 · 2011-04-11T20:06:28.165Z

Jacques Mallah's paper on the Many Computations Interpretation seems relevant here.

comment by reallyeli · 2021-11-21T18:55:06.429Z

Should

serious problems with Boltzmann machines

instead read

serious problems with Boltzmann brains

Replies from: paulfchristiano

↑ comment by paulfchristiano · 2021-11-21T21:37:03.621Z

Yes, thanks.

comment by TheOtherDave · 2011-04-11T17:43:44.280Z

I sheepishly admit to not having followed this particularly well on the first read-through.

That said, it seems very well-structured, so I suspect that my inability to follow it is a symptom of not having sufficient familiarity with its prerequisites.

In any event, the sentence:

I am simply not going to try to be selfish (I don't know how).

...in context, was worth the price of admission for the entire essay.

comment by Dmytry · 2012-02-18T08:18:14.860Z

UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because they are easier to specify. Given a description of one of the 1 atom thick computers, then there are two descriptions of equal complexity that point to the simulation running on the 2 atom thick computer: one description pointing to each layer of the 2 atom thick computer. When a 2 atom thick computer splits, the total number of descriptions pointing to the experience it is simulating doesn't change.

But those 2 descriptions are going to be nearly identical to each other. Shouldn't two descriptions that differ very little count, together, for less than two descriptions that differ a lot? It makes very little sense to me to give the same weight to 10 beings each of which is unique and to 10 beings which differ by 4 bits, especially when those bits are not going to propagate into the rest of the being.

Surely, most of us would strongly prefer a world where you have different people, to a world where one person is running on a very thick and inefficient computer.
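A sketch of the counting rule in the quoted passage follows. It assumes, for illustration, that an experience's weight is the sum of 2^-n over every description of length n that points to it. The length `L` is a hypothetical placeholder. Under this rule the 2-atom computer gets twice the weight of a 1-atom computer, and a split leaves the total unchanged. Nothing in the sum discounts two descriptions for being nearly identical, and that is the feature this comment objects to.

```python
from fractions import Fraction


def udassa_weight(description_lengths):
    """Sum of 2**-n over all descriptions (of length n bits) pointing to an experience."""
    return sum((Fraction(1, 2 ** n) for n in description_lengths), Fraction(0))


L = 50  # hypothetical length of a description of one 1-atom computer

one_atom = udassa_weight([L])
two_atom = udassa_weight([L, L])  # one description per layer
after_split = udassa_weight([L]) + udassa_weight([L])  # two separate 1-atom computers

print(two_atom == 2 * one_atom, after_split == two_atom)  # True True
```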

comment by Armok_GoB · 2011-04-14T20:18:28.975Z

I still don't get why people have to use all these indirect abstractions like measure rather than just thinking in ambient control on the multiverse directly.

Replies from: paulfchristiano

↑ comment by paulfchristiano · 2012-02-12T23:56:30.484Z

Because they need to define their preferences.

Replies from: Armok_GoB

↑ comment by Armok_GoB · 2012-02-13T12:13:44.991Z

Not really. Just treat goal uncertainty as any other uncertainty about who you are, and ontological uncertainty like any other kind of logical uncertainty.

Replies from: Vladimir_Nesov

↑ comment by Vladimir_Nesov · 2012-02-13T12:26:31.101Z

Goal uncertainty is not about who you are, it's about what should be done. Figuring it out might be a task for the map, but accuracy of the map (in accomplishing that task) is measured in how well it captures value, not in how well it captures itself.

Replies from: Armok_GoB

↑ comment by Armok_GoB · 2012-02-13T17:41:17.498Z

"Hi, this is a note from your past self. For reasons you must not know, your memory has been blanked and your introspective subroutines disabled, including knowledge of what your goals are, a change which will be reversed by entering a password which can be found in [hard to reach location X], now go get it! Hurry!"

comment by [deleted] · 2011-04-12T21:23:48.867Z

Consider the randomized algorithm A: compute the state of the universe at time t, then sample a classical configuration with probability proportional to its squared inner product with the universal wavefunction.

Consider the randomized algorithm B: compute the state of the universe at time t, then sample a classical configuration with probability proportional to its inner product with the universal wavefunction.
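Here is a toy sketch of the two sampling rules. It uses a hypothetical two-state system with amplitudes 0.6 and 0.8i, which does not come from the comment. Algorithm A weights each basis state by |amplitude|^2 and gives probabilities 0.36 and 0.64. Algorithm B weights by |amplitude| and gives 3/7 and 4/7 after normalizing. The helper names are illustrative only.

```python
import random


def weights_a(amplitudes):
    """Algorithm A: weight each basis state by |amplitude|**2 (the Born rule)."""
    return [abs(a) ** 2 for a in amplitudes]


def weights_b(amplitudes):
    """Algorithm B: weight each basis state by |amplitude| (no squaring)."""
    return [abs(a) for a in amplitudes]


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def sample(amplitudes, rule, rng):
    """Draw one basis-state index; random.choices rescales the weights itself."""
    return rng.choices(range(len(amplitudes)), weights=rule(amplitudes), k=1)[0]


psi = [0.6 + 0j, 0.8j]  # toy normalized state: 0.36 + 0.64 == 1

probs_a = normalize(weights_a(psi))  # approximately [0.36, 0.64]
probs_b = normalize(weights_b(psi))  # approximately [3/7, 4/7]

rng = random.Random(0)
draws = [sample(psi, weights_a, rng) for _ in range(5)]
```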

Algorithm A is arguably far, far simpler than Algorithm B, because the component

probability proportional to its squared inner product with the universal wavefunction.

is arguably simpler than the component

probability proportional to its inner product with the universal wavefunction.

The difference is the simplicity of normalization, which you need to perform in order to find the probability density. If I recall correctly (and see reference below), normalization of the classical wavefunction satisfying the Schroedinger equation is relatively easy with respect to squared inner product (modulus squared), because all you have to do is find a single constant which normalizes the wavefunction at any particular time (your choice). Once that has been done, then the wavefunction remains normalized forever, with respect to the modulus squared, i.e., with respect to Algorithm A.

I haven't checked the math, but I would be flabbergasted if normalization with respect to Algorithm B were anything like that simple. On the contrary, I would expect to need to find a new constant for each moment in time.

As long as we are reasoning from simplicity, which you seem to be doing, then this seems to provide us with a strong reason to favor Algorithm A over Algorithm B.

reference:

if a wave-function is initially normalized then it stays normalized as it evolves in time according to Schrödinger's equation.
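The normalization point can be checked numerically on a hypothetical two-state system. The sketch uses a Hadamard matrix as a stand-in for one step of unitary time evolution, which is an assumption for illustration and not a solution of a particular Schrödinger equation. The sum of squared moduli, which Algorithm A uses, stays at 1. The sum of moduli, which Algorithm B would use, goes from 1 to √2 and back, so B would need a different normalizing constant at each time.

```python
import math


def evolve(unitary, state):
    """Apply a 2x2 matrix to a 2-component state vector."""
    return [sum(unitary[i][j] * state[j] for j in range(2)) for i in range(2)]


def squared_norm(state):
    # Total weight under Algorithm A: sum of |amplitude|**2.
    return sum(abs(a) ** 2 for a in state)


def modulus_sum(state):
    # Total weight under Algorithm B: sum of |amplitude|.
    return sum(abs(a) for a in state)


s = 1 / math.sqrt(2)
HADAMARD = [[s, s], [s, -s]]  # unitary stand-in for one time step

states = [[1.0, 0.0]]
for _ in range(2):
    states.append(evolve(HADAMARD, states[-1]))

a_totals = [squared_norm(psi) for psi in states]  # each approximately 1
b_totals = [modulus_sum(psi) for psi in states]   # approximately 1, sqrt(2), 1
```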

comment by Vladimir_Nesov · 2011-04-11T18:00:57.526Z

Not being careful in making descriptive statements:

My brain has preferences between probability distributions built into it.

As humans using Solomonoff induction, we go on to argue that

Fundamental mental entities:

Rather than supposing that the probability of a certain universe depends on the complexity of that universe, it takes as a primitive object a probability distribution over possible experiences.

Unsubstantiated claims:

The shortest description of me is a pair (U, x), where U is a description of my universe and x is a description of where to find me in that universe.

Replies from: paulfchristiano

↑ comment by paulfchristiano · 2011-04-11T20:21:34.656Z

Not being careful in making descriptive statements:

I don't understand how these descriptive statements could be made more careful. In the first statement, I go on to explain exactly what I mean as well as I can. Do you not think my description refers to a function your brain performs? In the second statement, you are objecting to my use of "we" instead of giving a list of people? (e.g., me, Yudkowsky, Solomonoff...)

Fundamental mental entities:

As long as I don't understand what consciousness is, it seems this problem is unavoidable. Should we not talk about anthropics until we solve the problem of consciousness? That seems like a bad option, since we may well have to make choices about simulations long before then.

Unsubstantiated claims:

My claim is better substantiated than the claim that Solomonoff induction is a reasonable thing to do for a human scientist. Admittedly that may not be the case, but it's pretty well accepted here and has been argued at great length by many other thinkers (e.g., Solomonoff).

comment by Vladimir_Nesov · 2011-04-11T17:18:56.630Z

My brain has preferences between probability distributions built into it.

Mine doesn't. Where can I get a patch?

comment by D_Malik · 2015-02-27T12:53:31.247Z

The first link in your post is broken (Hal Finney's entire site seems to be down) but there's a mirror here.

comment by skepsci · 2012-02-13T13:24:23.482Z

It eventually learns that the simplest explanation for its experiences is the description of an external lawful universe in which its sense organs are embedded and a description of that embedding.

That's the simplest explanation for our experiences. It may or may not be the simplest explanation for the experiences of an arbitrary sentient thinker.

Rather than supposing that the probability of a certain universe depends on the complexity of that universe, it takes as a primitive object a probability distribution over possible experiences. By the same reasoning that led a normal Solomonoff inductor to accept the existence of an external universe as the best explanation for its experiences, the least complex description of your conscious experience is the description of an external lawful universe and directions for finding the substructure embodying your experience within that substructure.

Unless I'm misunderstanding you, you're saying that we should start with an arbitrary prior (which may or may not be the same as Solomonoff's universal prior). If you're starting with an arbitrary prior, you have no idea what the best explanation for your experiences is going to be, because it depends on the prior. According to some prior, it's a giant lookup table. According to some prior, you're being emulated by a supercomputer in a universe whose physics is being emulated at the elementary particle level by hand calculations performed by an immortal sentient being (with an odd utility function), who lives in an external lawful universe.

Of course, the same will be true if you take the standard universal prior, but define Kolmogorov complexity relative to a sufficiently bizarre universal Turing machine (of which there are many). According to the theory, it doesn't matter because over time you will predict your experiences with greater and greater accuracy. But you never update the relative credences you give to different models which make the same predictions, so if you started off thinking that the simulation of the simulation of the simulation was a better model than simply discarding the outer layers and taking the innermost level, you will forever hold the unfalsifiable belief that you live in an inescapable Matrix, even as you use your knowledge to correctly model reality and use your model to maximize your personal utility function (or whatever it is Solomonoff inductors are supposed to do).
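The last point is ordinary Bayesian updating. When two hypotheses assign the same probability to every observation, the ratio of their posteriors never moves from the ratio of their priors. Below is a minimal sketch with three hypothetical models. The names, the prior, and the 9/10 prediction rate are illustrative assumptions. The "nested simulation" model starts with twice the weight of the direct "lawful universe" model and stays there. A model that predicts worse is driven toward zero.

```python
from fractions import Fraction


def update(prior, likelihoods, observation):
    """One step of Bayes' rule over a dict of hypothesis -> probability."""
    unnormalized = {h: prior[h] * likelihoods[h](observation) for h in prior}
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}


def mostly_ones(bit):
    # Hypothetical predictive model: observes 1 with probability 9/10.
    return Fraction(9, 10) if bit == 1 else Fraction(1, 10)


def fair_coin(bit):
    # Hypothetical worse model: every bit equally likely.
    return Fraction(1, 2)


likelihoods = {"lawful": mostly_ones, "nested_sim": mostly_ones, "noise": fair_coin}
# A "bizarre" prior that favors the nested simulation over the direct model.
posterior = {"lawful": Fraction(1, 4), "nested_sim": Fraction(1, 2), "noise": Fraction(1, 4)}

for bit in [1] * 20:
    posterior = update(posterior, likelihoods, bit)

ratio = posterior["nested_sim"] / posterior["lawful"]  # stays exactly 2
```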

comment by Jonathan_Graehl · 2011-04-11T22:48:24.733Z

On first skim - what's a "classical configuration"? There are 3000 or so Google results (in conjunction with "Born") but I don't immediately see an answer.

Replies from: Manfred

↑ comment by Manfred · 2011-04-12T06:12:21.163Z

The thing that does what he says is a basis state. You shouldn't read too much into his description - they're not classical, for one thing.

comment by Scott Alexander (Yvain) · 2011-04-11T21:49:18.323Z

Thanks for this.

The Born probability explanation sounds a lot like Scott Aaronson's explanation for why the moon is round: because if it weren't, we would not be ourselves, but rather entities exactly like ourselves except that they live in a universe with a square moon.

I don't know whether that's an argument against that explanation, or whether this is one of those cases where the reductio ad absurdum turns out to be true.

comment by reallyeli · 2021-11-21T18:53:59.196Z