# The Born Probabilities

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-05-01T05:50:53.000Z · LW · GW · Legacy · 83 comments

**Previously in series**: Decoherence is Pointless

**Followup to**: Where Experience Confuses Physicists

One serious mystery of decoherence is where the Born probabilities come from, or even what they are probabilities *of.* What does the integral over the squared modulus of the amplitude density have to do with anything?

This was discussed by analogy in "Where Experience Confuses Physicists", and I won't repeat arguments already covered there. I will, however, try to convey exactly what the puzzle *is,* in the real framework of quantum mechanics.

A professor teaching undergraduates might say: "The probability of finding a particle in a particular position is given by the squared modulus of the amplitude at that position."

This is oversimplified in several ways.

First, for continuous variables like position, amplitude is a density, not a point mass. You integrate over it. The integral over a single point is zero.

(Historical note: If "observing a particle's position" invoked a mysterious event that squeezed the amplitude distribution down to a delta point, or flattened it in one subspace, this would give us a different future amplitude distribution from what decoherence would predict. All interpretations of QM that involve quantum systems jumping into a point/flat state, which are both testable and have been tested, have been falsified. The universe does not have a "classical mode" to jump into; it's all amplitudes, all the time.)

Second, a single observed particle doesn't *have* an amplitude distribution. Rather the system containing yourself, plus the particle, plus the rest of the universe, may approximately *factor* into the multiplicative product of (1) a sub-distribution over the particle position and (2) a sub-distribution over the rest of the universe. Or rather, the particular blob of amplitude that you happen to be in, can factor that way.

So what could it mean, to associate a "subjective probability" with a component of one *factor* of a combined amplitude distribution that happens to factorize?

Recall the physics for:

(Human-BLANK * Sensor-BLANK) * (Atom-LEFT + Atom-RIGHT)

=>

(Human-LEFT * Sensor-LEFT * Atom-LEFT) + (Human-RIGHT * Sensor-RIGHT * Atom-RIGHT)

Think of the whole process as reflecting the good-old-fashioned distributive rule of algebra. The initial state can be decomposed—note that this is an *identity*, not an evolution—into:

(Human-BLANK * Sensor-BLANK) * (Atom-LEFT + Atom-RIGHT)

=

(Human-BLANK * Sensor-BLANK * Atom-LEFT) + (Human-BLANK * Sensor-BLANK * Atom-RIGHT)

We assume that the distribution factorizes. It follows that the term on the left, and the term on the right, initially differ only by a multiplicative factor of Atom-LEFT vs. Atom-RIGHT.

If you were to *immediately* take the multi-dimensional integral over the squared modulus of the amplitude density of that whole system,

Then the *ratio* of the all-dimensional integral of the squared modulus over the left-side term, *to* the all-dimensional integral over the squared modulus of the right-side term,

Would equal the *ratio* of the lower-dimensional integral over the squared modulus of the Atom-LEFT, *to* the lower-dimensional integral over the squared modulus of Atom-RIGHT,

For essentially the same reason that if you've got (2 * 3) * (5 + 7), the ratio of (2 * 3 * 5) to (2 * 3 * 7) is the same as the ratio of 5 to 7.

Doing an integral over the squared modulus of a complex amplitude distribution in N dimensions doesn't change that.
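The ratio argument can be checked numerically. The following is only a minimal sketch under stated assumptions: the integrals are replaced by sums over a small grid, and the amplitude values in `rest`, `atom_left`, and `atom_right` are made up for illustration rather than taken from any physical system.

```python
# Discrete stand-in for "the integral over the squared modulus":
# a sum of |amplitude|^2 over grid points.
def squared_norm(amplitudes):
    return sum(abs(a) ** 2 for a in amplitudes)

# Joint distribution of two independent factors: the product of their
# amplitudes at every pair of grid points.
def joint(factor_a, factor_b):
    return [a * b for a in factor_a for b in factor_b]

# Hypothetical amplitudes (assumptions, chosen arbitrarily).
rest = [0.3 + 0.4j, -0.5j, 0.2]   # stands in for Human-BLANK * Sensor-BLANK
atom_left = [0.6, 0.6j]           # stands in for Atom-LEFT
atom_right = [0.3 - 0.3j, 0.1]    # stands in for Atom-RIGHT

# Ratio computed over the whole joint space...
joint_ratio = squared_norm(joint(rest, atom_left)) / squared_norm(joint(rest, atom_right))
# ...versus the ratio computed over the atom factor alone.
atom_ratio = squared_norm(atom_left) / squared_norm(atom_right)

print(joint_ratio, atom_ratio)  # the two agree up to floating-point rounding
```

The shared factor `rest` contributes the same multiplier to both joint sums, so it drops out of the ratio, just as (2 * 3) drops out of the arithmetic example.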

There's also a rule called "unitary evolution" in quantum mechanics, which says that quantum evolution never changes the *total* integral over the squared modulus of the amplitude density.

So if you assume that the initial left term and the initial right term evolve, without overlapping each other, into the final LEFT term and the final RIGHT term, they'll have the same ratio of integrals over etcetera as before.
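As a rough illustration of this step, the sketch below models "evolving without overlapping" as a separate unitary map acting on each blob. That modeling choice, the two-component blobs, and all the numbers are assumptions made for the example; real evolution acts on a vastly larger configuration space.

```python
import cmath
import math

def unitary_2x2(theta, phi):
    """A 2x2 unitary: a real rotation by theta times a global phase e^{i*phi}."""
    phase = cmath.exp(1j * phi)
    c, s = math.cos(theta), math.sin(theta)
    return [[phase * c, -phase * s],
            [phase * s, phase * c]]

def apply(matrix, vector):
    # Ordinary matrix-vector product.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def squared_norm(vector):
    return sum(abs(a) ** 2 for a in vector)

# Hypothetical initial blobs (assumed values).
left_blob = [0.8, 0.4j]
right_blob = [0.3, -0.3j]

# Each blob evolves under its own unitary, so the blobs never mix.
left_after = apply(unitary_2x2(0.7, 1.1), left_blob)
right_after = apply(unitary_2x2(-2.0, 0.3), right_blob)

ratio_before = squared_norm(left_blob) / squared_norm(right_blob)
ratio_after = squared_norm(left_after) / squared_norm(right_after)

print(ratio_before, ratio_after)  # equal up to floating-point rounding
```

Because each unitary preserves the squared norm of the blob it acts on, the ratio between the blobs is preserved as well.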

What all this says is that,

If some roughly independent Atom has got a blob of amplitude on the left of its factor, and a blob of amplitude on the right,

Then, after the Sensor senses the atom, and *you* look at the Sensor,

The integrated squared modulus of the whole LEFT blob, and the integrated squared modulus of the whole RIGHT blob,

Will have the same ratio,

As the ratio of the squared moduli of the original Atom-LEFT and Atom-RIGHT components.

This is why it's important to remember that apparently individual particles have amplitude distributions that are *multiplicative factors* within the total *joint* distribution over *all* the particles.

If a whole gigantic human experimenter made up of quintillions of particles,

Interacts with one teensy little atom whose amplitude *factor* has a big bulge on the left and a small bulge on the right,

Then the resulting amplitude distribution, in the *joint* configuration space,

Has a big amplitude blob for "human sees atom on the left", and a small amplitude blob of "human sees atom on the right".

And what *that* means, is that the Born probabilities seem to be about *finding yourself in a particular blob,* not *the particle being in a particular place.*

But what does the integral over squared moduli have to do with anything? On a straight reading of the data, you would always find yourself in both blobs, every time. How can you find yourself in one blob with greater probability? What are the Born probabilities, probabilities *of*? Here's the map—where's the territory?

I don't know. It's an open problem. Try not to go funny in the head about it.

This problem is even worse than it looks, because the squared-modulus business is *the only non-linear rule in all of quantum mechanics.* Everything else—*everything* else—obeys the linear rule that the evolution of amplitude distribution A, plus the evolution of the amplitude distribution B, equals the evolution of the amplitude distribution A + B.

When you think about the weather in terms of clouds and flapping butterflies, it may not *look* linear on that higher level. But the amplitude distribution for weather (plus the rest of the universe) is linear on the only level that's fundamentally real.
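To make the contrast concrete, here is a small sketch, assuming a two-component toy distribution and one arbitrary fixed unitary as the "evolution". It checks that evolving A + B gives the same result as evolving A and B separately and adding, while the squared modulus of A + B is not the sum of the squared moduli.

```python
import cmath
import math

def evolve(amplitudes, theta=0.7, phi=1.1):
    """Apply one fixed linear (here unitary) map to a two-component distribution.

    theta and phi are arbitrary illustrative parameters, not physical constants.
    """
    phase = cmath.exp(1j * phi)
    c, s = math.cos(theta), math.sin(theta)
    a0, a1 = amplitudes
    return [phase * (c * a0 - s * a1), phase * (s * a0 + c * a1)]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def squared_norm(amplitudes):
    return sum(abs(a) ** 2 for a in amplitudes)

# Two hypothetical amplitude distributions (assumed values).
A = [0.5, 0.2j]
B = [0.1 - 0.3j, 0.4]

# Linearity: evolve(A + B) matches evolve(A) + evolve(B).
sum_then_evolve = evolve(add(A, B))
evolve_then_sum = add(evolve(A), evolve(B))
linearity_gap = max(abs(p - q) for p, q in zip(sum_then_evolve, evolve_then_sum))

# Non-linearity: the squared modulus of the sum is not the sum of squared moduli.
norm_of_sum = squared_norm(add(A, B))
sum_of_norms = squared_norm(A) + squared_norm(B)

print(linearity_gap)              # zero up to floating-point rounding
print(norm_of_sum, sum_of_norms)  # these differ
```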

Does this mean that the squared-modulus business *must* require additional physics beyond the linear laws we know—that it's *necessarily* futile to try to derive it on any higher level of organization?

But even this doesn't follow.

Let's say I have a computer program which computes a sequence of positive integers that encode the successive states of a sentient being. For example, the positive integers might describe a Conway's-Game-of-Life universe containing sentient beings (Life is Turing-complete) or some other cellular automaton.

Regardless, this sequence of positive integers represents the time series of a discrete universe containing conscious entities. Call this sequence Sentient(n).

Now consider another computer program, which computes the negative of the first sequence: -Sentient(n). If the computer running Sentient(n) instantiates conscious entities, then so too should a program that computes Sentient(n) and then negates the output.

Now I write a computer program that computes the sequence {0, 0, 0...} in the obvious fashion.

This sequence happens to be equal to the sequence Sentient(n) + -Sentient(n).

So does a program that computes {0, 0, 0...} necessarily instantiate as many conscious beings as both Sentient programs put together?

Admittedly, this isn't an exact analogy for "two universes add linearly and cancel out". For that, you would have to talk about a universe with linear physics, which excludes Conway's Life. And then in this linear universe, two states of the world both containing conscious observers—world-states equal but for their opposite sign—would have to cancel out.

It doesn't work in Conway's Life, but it works in our own universe! Two quantum amplitude distributions can contain components that *cancel each other out,* and this demonstrates that the number of conscious observers in *the sum of two distributions*, need not equal the sum of conscious observers *in each distribution separately.*
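The cancellation itself is easy to exhibit with toy numbers. In this sketch, which assumes two made-up distributions over just two configurations, each distribution separately puts squared modulus on the second configuration, yet their sum puts none there.

```python
def add(x, y):
    return [a + b for a, b in zip(x, y)]

def squared_moduli(amplitudes):
    # Squared modulus at each configuration separately.
    return [abs(a) ** 2 for a in amplitudes]

# Hypothetical distributions over two configurations (assumed values).
dist_a = [0.6, 0.5]
dist_b = [0.6, -0.5]   # same as dist_a except for the sign of the second component

combined = add(dist_a, dist_b)

print(squared_moduli(dist_a))    # nonzero at both configurations
print(squared_moduli(dist_b))    # nonzero at both configurations
print(squared_moduli(combined))  # second configuration has cancelled to zero
```

Whatever one wants to count at the second configuration, it is present in each distribution taken alone and absent from the sum, which is the sense in which the count need not add linearly.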

So it actually *is* possible that we could pawn off *the only non-linear phenomenon in all of quantum physics* onto a better understanding of consciousness. The question "How many conscious observers are contained in an evolving amplitude distribution?" has obvious reasons to be non-linear.

(!)

Robin Hanson has made a suggestion along these lines.

(!!)

Decoherence is a physically continuous process, and the interaction between LEFT and RIGHT blobs may never actually become *zero.*

So, Robin suggests, any blob of amplitude which gets small enough, becomes dominated by stray flows of amplitude from many larger worlds.

A blob which gets too small, cannot sustain coherent inner interactions—an internally driven chain of cause and effect—because the amplitude flows are dominated from outside. Too-small worlds fail to support computation and consciousness, or are ground up into chaos, or merge into larger worlds.

Hence Robin's cheery phrase, "mangled worlds".

The cutoff point will be a function of the squared modulus, because unitary physics preserves the squared modulus under evolution; if a blob has a certain total squared modulus, future evolution will preserve that integrated squared modulus so long as the blob doesn't split further. You can think of the squared modulus as the amount of amplitude available to internal flows of causality, as opposed to outside impositions.

The seductive aspect of Robin's theory is that quantum physics wouldn't need *interpreting.* You wouldn't have to stand off beside the mathematical structure of the universe, and say, "Okay, now that you're finished computing all the mere numbers, I'm furthermore telling you that the squared modulus is the 'degree of existence'." Instead, when you run any program that computes the *mere numbers,* the program *automatically* contains people who experience the same physics we do, with the same probabilities.

A major problem with Robin's theory is that it seems to predict things like, "We should find ourselves in a universe in which ~~lots of~~ very few decoherence events have already taken place," which tendency does not seem especially apparent.

The main thing that would support Robin's theory would be if you could show from first principles that mangling does happen; and that the cutoff point is somewhere around the median amplitude density (the point where half the total amplitude density is in worlds above the point, and half beneath it), which is apparently what it takes to reproduce the Born probabilities in any particular experiment.

What's the probability that Hanson's suggestion is right? I'd put it under fifty percent, which I don't think Hanson would disagree with. It would be much lower if I knew of a single alternative that seemed equally... reductionist.

But *even if* Hanson is wrong about what causes the Born probabilities, I would guess that the final answer still comes out *equally non-mysterious*. Which would make me feel very silly, if I'd embraced a more mysterious-seeming "answer" up until then. As a general rule, it is questions that are mysterious, not answers.

When I began reading Hanson's paper, my initial thought was: *The math isn't beautiful enough to be true.*

By the time I finished processing the paper, I was thinking: *I don't know if this is the real answer, but the real answer has got to be at least this normal.*

This is still my position today.

Part of *The Quantum Physics Sequence*

Next post: "Decoherence as Projection"

Previous post: "Decoherent Essences"

## 83 comments

Comments sorted by oldest first, as this post is from before comment nesting was available (around 2009-02-27).

## comment by steven · 2008-05-01T11:05:48.000Z · LW(p) · GW(p)

*You wouldn't have to stand off beside the mathematical structure of the universe, and say, "Okay, now that you're finished computing all the mere numbers, I'm furthermore telling you that the squared modulus is the 'degree of existence'."*

Instead, you'd have to stand off beside the mathematical structure of the universe, and say, "Okay, now that you're finished computing all the mere numbers, I'm furthermore telling you that the world count is the 'degree of existence'."

## comment by Paul_Crowley · 2008-05-01T11:48:04.000Z · LW(p) · GW(p)

Roland: yes, at least one. Where did you give up and why?

## comment by RobinHanson · 2008-05-01T12:03:20.000Z · LW(p) · GW(p)

*A major problem with Robin's theory is that it seems to predict things like, "We should find ourselves in a universe in which lots of decoherence events have already taken place," which tendency does not seem especially apparent.*

Actually the theory suggests we should find ourselves in a state with near the *least* feasible number of past decoherence events. Yes, it is not clear if this in fact holds, and yes I'd put the chance of something like mangled worlds being right as more like 1/4 or 1/3.

## comment by eddie · 2008-05-01T13:53:32.000Z · LW(p) · GW(p)

Thanks to Eliezer's QM series, I'm starting to have enough background to understand Robin's paper (kind of, maybe). And now that I do (kind of, maybe), it seems to me that Robin's point is completely demolished by Wallace's points about decoherence being continuous rather than discrete and therefore there being no such thing as a number of discrete worlds to count.

There seems to be nothing to resolve between the probabilities given by measure and the probabilities implied by world count if you simply say that measure *is* probability.

Eliezer objects. We're interpreting. We're adding something outside the mathematics.

I fail to see the problem.

If we're to accept that particles moving like billiard balls are an illusion, and configuration space is real, and blobs of amplitude are real, and time evolution of amplitude within configuration space according to the wave equations is real, and that configurations and amplitude and wave equations are fundamental parts of reality, because that's the best model we've come up with that agrees with experimental observation... *why not accept that the modulus-squared law is real and fundamental, too?*

It certainly agrees with experimental observations, and doesn't seem any less desirable a part of our model of reality than configurations, amplitude blobs, and wave equations.

I wish someone would explain the problem more clearly, although if Eliezer's explanations so far haven't cleared it up for me yet, perhaps nothing will.

Replies from: jschulter

## ↑ comment by jschulter · 2010-10-08T06:15:32.269Z · LW(p) · GW(p)

*why not accept that the modulus-squared law is real and fundamental, too?*

Reading through this, and Hanson's quick overview page of mangled worlds, I was wondering the same thing myself. For some reason though, seeing you ask the question I hadn't quite verbalized put the answer right on the tip of my tongue: for the same reason Einstein was so sure of General Relativity. The modulus squared law conflicts with a regularity in the form that the fundamental laws seem to take, specifically their linear evolution, and Eliezer puts stock in that regularity. In fact, he does so sufficiently to let him elevate any theory which accounts for the data while holding the regularity far above those that don't, similar to how Einstein picked GR out of hypothesis space.

The benefit of the mangled worlds interpretation is that while the universe-amplitude-blobs do have measure (a non-linear element), it is irrelevant to what actually happens. It really only comes into play when trying to *understand* the interaction between the universe-amplitude-blobs, but it doesn't play a part in actually *describing* that interaction. For example, the possible mangling of a world of small measure would be described by normal linear quantum evolution, but since the calculations are not very nice, we can determine whether it would be mangled using that measure. Thus, we are using the measure as a mathematical shortcut to determine generalized behavior, but all evolution is linear, and observations can be explained without the **extra** hypothesis that "measure *is* probability".

## comment by Stephen · 2008-05-01T14:37:45.000Z · LW(p) · GW(p)

Eddie,

My understanding of Eli's beef with the Born rule is this (he can correct me if I'm wrong): the Born rule appears to be a bridging rule in fundamental physics that directly tells us something about how qualia bind to the universe. This seems odd. Furthermore, if the binding of qualia to the universe is given by a separate fundamental bridging rule independent of the other laws of physics, then the zombie world really is logically possible, or in other words epiphenomenalism is true. (Just postulate a universe with all the laws of physics except Born's bridging rule. Such a universe is, as far as we know, logically consistent.) Eli argues against epiphenomenalism on the grounds that if epiphenomenalism is true, then the correlation between beliefs (which are qualia) with our statements and actions (which are physical processes) is just a miraculous coincidence.

What follows are my own comments as opposed to a summary of what I believe Eli thinks:

Why can't the correlation between physical states and beliefs arise by an arrow of causation that goes from the physical states to the beliefs? In this case epiphenomenalism would be true (since qualia have no effect on the physical world), but the correlation would not be a coincidence (since the physical world directly causes qualia). I think the objection to this is that if there really is a bridging law, then the coincidence remains that it is such a reasonable bridging law. That is, what we say we experience and physically act as though we experience actually matches (usually) what we do experience, as opposed to relating to what we do experience in some arbitrarily scrambled way. If qualia bind to some higher emergent level having to do with information processing, then it seems non-coincidental that the bridging law is reasonable. (Because the things it is mapping between seem to have a close and clear relationship.) However, the Born rule seems to suggest that the bridging rule is at the level of fundamental physics.

Maybe if we could derive the Born rule as a property of the information processing performed by a quantum universe the mystery would go away.

Replies from: diegocaleiro

## ↑ comment by diegocaleiro · 2009-07-27T10:44:26.113Z · LW(p) · GW(p)

"Eli argues against epiphenomenalism on the grounds that if epiphenomenalism is true, then the correlation between beliefs (which are qualia) with our statements and actions (which are physical processes) is just a miraculous coincidence."

Supposing he does, I must point out that it is false to say that beliefs are qualia. In fact, beliefs are part of the intentional stance. That is well worked out in Dennett's book by the same name.

The intentional level can be accounted for in physical terms. (See for instance "Kinds of Minds" by Dennett to see how intentionality unfolds from genes to amoebas to Karl Popper.)

One could insist on being a phenomenal realist, and say that beliefs are both an intentional interpretation of a physical system that can be accounted for without the aid of qualia, and furthermore that there was another aspect of beliefs that is the experiential aspect, the qualia-ness of them.

Even holding such a position, one needs only to explain our beliefs as long as they are physically causally effective upon the world (for instance causing us to talk about qualia, beliefs, etc.).

So if there are beliefs as intentional descriptions of organisms, AND in addition beliefs as qualia, the second kind is UTTERLY unexplainable by its very nature.

There is no need to account for them, because we have no reason to believe they exist, since if they did, they would not figure in our theories, being causally inefficient.

## comment by Nick_Tarleton · 2008-05-01T14:44:32.000Z · LW(p) · GW(p)

None of the confusion over duplication and quantum measures seems unique to beings with qualia; any Bayesian system capable of anthropic reasoning, it would seem, should be surprised the universe is orderly. So maybe either the confusion is separate from and deeper than experience, or AIXItl has qualia.

## comment by ME3 · 2008-05-01T15:13:35.000Z · LW(p) · GW(p)

As I understand it (someone correct me if I'm wrong), there are two problems with the Born rule: 1) It is non-linear, which suggests that it's not fundamental, since other fundamental laws seem to be linear.

2) From my reading of Robin's article, I gather that the problem with the many-worlds interpretation is: let's say a world is created for each possible outcome (countable or uncountable). In that case, the vast majority of worlds should end up away from the peaks of the distribution, just because the peaks only occupy a small part of any distribution.

Robin's solution seems to me equivalent to the Quantum Spaghetti Monster eating the unlikely worlds that we find ourselves not to end up in. The key line is "sudden and thermodynamically irreversible." Actually, that should be enough to bury the theory since aren't fundamental physical laws thermodynamically neutral?

We could probably eliminate this distraction of consciousness, couldn't we? I mean, let's say that Mathematica version 5000 comes out in a few centuries and in addition to its other symbolic algebra capabilities, it comes with a physical-law-prover: you ask it questions and it sets up experiments to answer those questions. So you ask it about quantum mechanics, it does a bunch of double-slit-experiments in a robotic lab, and gives you the answer, which includes the Born rule. Consciousness was never involved.

Actually it seems to me like this whole business of quantum probabilities is way overrated (for the non-physicist), because it only really manifests itself in cleverly constructed experiments . . . right? I mean, setting aside exactly how Born's rule derives from the underlying physics, is there any reason to believe that we would learn anything new by finding out?

Replies from: AgentME

## ↑ comment by AgentME · 2018-09-09T03:59:40.761Z · LW(p) · GW(p)

The observer's consciousness is still involved. Imagine that the Born rule isn't a law of the universe itself, but of consciousness. The universe evaluates all branches. Consciousness follows the branches in weights following the Born rule. The conscious observer always finds themselves down a series of branches that were selected by the Born rule, and it's easy for them to take measurements to confirm this. The Mathematica 5000 machine that's come down this series of branches has made measurements from experiments and has found that the Born rule has held. It only comes up with this result because this is the version of the machine that has followed the observer's consciousness through the branches. In the raw universe, most worlds have the Mathematica 5000 machine finding that Born's rule does not hold; these aren't the worlds that conscious observers usually find themselves in though.

## comment by Stephen · 2008-05-01T15:19:32.000Z · LW(p) · GW(p)

Nick: I don't understand the connection to quantum mechanics.

The argument that I commonly see relating quantum mechanics to anthropic reasoning is deeply flawed. Some people seem to think that many worlds means there are many "branches" of the wavefunction and we find ourselves in them with equal probability. In this case, they argue, we should expect to find ourselves in a disorderly universe. However, this is exactly what the Born rule (and experiment!) does not say. Rather, the Born rule says that we are only likely to find ourselves in states with large amplitude. Also, standard quantum mechanics allows the probabilities to fall on a continuum. They aren't arrived at by counting, so the whole concept of counting branches is not standard QM anyway.

(I don't know whether you hold this view, but it is a common misconception that should be addressed at some point anyway.)

## comment by Caledonian2 · 2008-05-01T15:21:35.000Z · LW(p) · GW(p)

*In this case epiphenomenalism would be true (since qualia have no effect on the physical world), but the correlation would not be a coincidence (since the physical world directly causes qualia).*

But the nature of the experiences we claimed to have would not depend in any way on the properties of these hypothetical 'qualia'. There would be no event in the physical world that would be affected by them - they would not, in fact, exist.

Epiphenomenalism is never true, because it contains a contradiction in terms.

## comment by Psy-Kosh · 2008-05-01T18:45:04.000Z · LW(p) · GW(p)

Here's a different question which may be relevant: why unitary transforms?

That is, if you didn't in the first place know about the Born rule, what would be a (even semi) intuitive justification for the restriction that all "reasonable" transforms/time evolution operators have to conserve the squared magnitude?

Given the Born rule, it seems rather obvious, but the Born rule itself is what currently appears to be suspiciously out of place. So, if that arises out of something more basic, then why the unitary rule in the first place?

## comment by eddie · 2008-05-01T18:57:16.000Z · LW(p) · GW(p)

Stephen, thanks for your thoughts on Eli's thoughts. I'm going to have to think on them further - after all these helpful posts I can pretend I understand quantum mechanics, but pretending to understand how conscious minds perceive a single point in configuration space instead of blobs of amplitude is going to take more work.

I will point out, though, that the question of how consciousness is bound to a particular branch (and thus why the Born rule works like it does) doesn't seem that much different from how consciousness is tied to a particular point in time or to a particular brain when the Spaghetti Monster can see all brains in all times and would have to be given extra information to know that my consciousness seems to be living in *this* particular brain at *this* particular time.

Finally: *"it is a common misconception that should be addressed at some point anyway"* - it appears to me that Robin's paper is based on this same misconception, or something like it: the Born rule (and experiment!) give one result while counting worlds gives another, therefore we have to add a new rule ("worlds that are too small get mangled") in order to make counting worlds match experiment. Whereas without the misconception we wouldn't be counting worlds in the first place. Do you think I'm understanding Robin's position and/or QM correctly?

## comment by Stephen · 2008-05-01T19:08:37.000Z · LW(p) · GW(p)

"Given the Born rule, it seems rather obvious, but the Born rule itself is what currently appears to be suspiciously out of place. So, if that arises out of something more basic, then why the unitary rule in the first place?"

While not an answer, I know of a relevant comment. Suppose you assume that a theory is linear and preserves some norm. What norm might it be? Before addressing this, let's say what a norm is. In mathematics a norm is defined to be some function on vectors that is only zero for the all zeros vector, and obeys the triangle inequality: the norm of a+b is no more than the norm of a plus the norm of b. The functions satisfying these axioms seem to capture everything that we would intuitively regard as some sort of length or magnitude.

The Euclidean norm is obtained by summing the squares of the absolute values of the vector components, and then taking the square root of the result. The other norms that arise in mathematics are usually of the type where you raise each of the absolute values of the vector components to some power p, then sum them up, and then take the pth root. The corresponding norm is called the p-norm. (Does somebody know: are all the norms invariant under permutation of the indices p-norms?) Scott Aaronson proved that for any p other than 1 or 2, the only norm-preserving linear transformations are the permutations of the components. If you choose the 1-norm, then the sum of the absolute values of the components is preserved, and the norm-preserving transformations correspond to the stochastic matrices. This is essentially probability theory. If you choose the 2-norm then the Euclidean length of the vectors is preserved, and the allowed linear transformations correspond to the unitary matrices. This is essentially quantum mechanics. (Scott always hastens to add that his theorem about p-norms and permutations was probably known by mathematicians for a long time. The new part is the application to foundations of QM.)
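A quick numerical sketch of the p-norm picture, assuming a hand-picked 2x2 column-stochastic matrix and a 2x2 rotation (a real special case of a unitary matrix); all values are illustrative only. Note that the stochastic matrix preserves the 1-norm on nonnegative vectors, which is the probability-theory setting, and the last lines show it failing on a vector with mixed signs.

```python
import math

def p_norm(vector, p):
    return sum(abs(x) ** p for x in vector) ** (1.0 / p)

def apply(matrix, vector):
    # Ordinary matrix-vector product.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hypothetical column-stochastic matrix: nonnegative entries, each column sums to 1.
stochastic = [[0.9, 0.2],
              [0.1, 0.8]]

# Rotation by an arbitrary angle.
theta = 0.5
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

probs = [0.25, 0.75]   # a probability vector, 1-norm equal to 1
amps = [0.6, 0.8]      # an amplitude vector, 2-norm equal to 1

one_norm_before = p_norm(probs, 1)
one_norm_after = p_norm(apply(stochastic, probs), 1)

two_norm_before = p_norm(amps, 2)
two_norm_after = p_norm(apply(rotation, amps), 2)

# The same rotation does not preserve the 3-norm of this vector.
three_norm_before = p_norm(amps, 3)
three_norm_after = p_norm(apply(rotation, amps), 3)

# With mixed signs, the stochastic matrix no longer preserves the 1-norm.
signed = [0.5, -0.5]
signed_one_norm_before = p_norm(signed, 1)
signed_one_norm_after = p_norm(apply(stochastic, signed), 1)

print(one_norm_before, one_norm_after)
print(two_norm_before, two_norm_after)
print(three_norm_before, three_norm_after)
print(signed_one_norm_before, signed_one_norm_after)
```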

Replies from: flexive

## ↑ comment by flexive · 2015-08-24T22:43:31.933Z · LW(p) · GW(p)

*Scott Aaronson proved that for any p other than 1 or 2, the only norm-preserving linear transformations are the permutations of the components.*

This seems to be true, but with the small note that you should add multiplication of the coordinates by -1 [by any number from the unit circle if the space is taken over complex numbers] and their compositions with permutations to the allowed isomorphisms. Never heard about this though, interesting.

However this does not generalize to all the norms. As Douglas noted below one can imagine norm simply as a central-symmetric convex body. And there are plenty of those. Now if we can fix a finite subgroup of space rotations and symmetries that strictly contains all the coordinate permutations and central-symmetry then we are done, since one can simply take convex hull of the orbit of some point as your desired norm. Symmetries and rotations of regular 100-gon on the plane would work for example.

*If you choose the 1-norm, then the sum of the absolute values of the components is preserved, and the norm-preserving transformations correspond to the stochastic matrices.*

Hmm, something fishy is going on with signs in the whole argument and here I am completely lost. What if I take a 2x2 matrix with all entries equal to 1/2 and a vector (1/2, -1/2)? Probably the full formulation by Scott would help. Does anybody have a link?

Replies from: RichardKennaway

## ↑ comment by RichardKennaway · 2015-08-24T23:15:45.615Z · LW(p) · GW(p)

*Probably the full formulation by Scott would help. Does anybody have a link?*

This.

Replies from: flexive

## ↑ comment by flexive · 2015-08-27T19:37:50.706Z · LW(p) · GW(p)

Thank you.

Nice paper. Signs are treated accurately there of course. However, the call to "formal functions" at the end of the proof seems wacky at best. Formalizing it looks harder to me than the initial statement. At this point it should be easier to just look at the smoothness degrees of the norm on x_i = 0 hyperplanes.

If anybody knows what was meant, however, please clarify.

## comment by Stephen · 2008-05-01T19:22:47.000Z · LW(p) · GW(p)

"I will point out, though, that the question of how consciousness is bound to a particular branch (and thus why the Born rule works like it does) doesn't seem that much different from how consciousness is tied to a particular point in time or to a particular brain when the Spaghetti Monster can see all brains in all times and would have to be given extra information to know that my consciousness seems to be living in *this* particular brain at *this* particular time."

Agreed!

More generally, it seems to me that many objections people raise about the foundations of QM apply equally well to classical physics when you really think about it.

However, I think Eli's objection to the Born rule is different. The special weird thing about quantum mechanics as currently understood is that Born's rule seems to suggest that the binding of qualia is a separate rule in fundamental physics.

## comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-05-01T19:28:56.000Z · LW(p) · GW(p)

Psy-Kosh, the amplitudes of everything everywhere could be changing by a constant modulus and phase, without it being noticed. But if it were possible for you to carry out some physical process that changed the squared modulus of the LEFT blob as a whole, without splitting it and without changing the squared modulus of the RIGHT blob, then you would be able to use this physical process to change the ratio of the squared moduli of LEFT and RIGHT, hence control the outcome of arbitrary quantum experiments by invoking it selectively.

It would be an Outcome Pump.

Controllable unitarity violation wouldn't just let you win the lottery, it would let you communicate faster than light, by forcing a particular outcome in a quantum entanglement, Bell's Inequality type situation.
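To make the arithmetic concrete, here is a small sketch in Python. It assumes a toy state with just two blobs, LEFT and RIGHT, each summarized by a single complex amplitude; the function name `born_probabilities` and the factor of 2 are illustrative choices, not anything from the post.

```python
def born_probabilities(amplitudes):
    """Normalized squared moduli of a list of complex amplitudes."""
    weights = [abs(a) ** 2 for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]


left, right = complex(1, 0), complex(0, 1)  # equal squared moduli

# Multiplying everything by one constant (modulus 3, some phase)
# leaves the ratio of squared moduli unchanged.
k = 3 * complex(0.6, 0.8)
before = born_probabilities([left, right])
rescaled = born_probabilities([k * left, k * right])

# Hypothetical non-unitary process that doubles LEFT only.
pumped = born_probabilities([2 * left, right])

print(before, rescaled, pumped)
```

The global rescaling leaves the two probabilities equal, while the selective rescaling shifts them to roughly 4:1 in favor of LEFT, which is the sense in which such a process could steer outcomes.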

## comment by Psy-Kosh · 2008-05-01T20:33:58.000Z · LW(p) · GW(p)

Stephen: Thanks. First, not everything corresponding to a length or such obeys that particular rule... consider the Lorentz metric... any "lightlike" vector has a norm of zero, for instance, and yet that particular metric is rather useful physically. :) (admittedly, you get that via the minus sign, and if your norm is such that it treats all the components in some sense equivalently, you don't get that... well, what about norms involving cross terms?)

More to the subject... why is any norm preserved? That is, why only allow norm preserving transforms?

Which brings me to Eliezer:

So? Why does the universe "choose" rules that say "no outcome pump"? That's way up the ladder of stuff built out of other stuff. (as far as communicating faster than light, I'd think "outcome pump" type things are the main 'crazy' result of FTL in the first place)

Actually, I think I didn't communicate my question accurately. You derived it would be an outcome pump by noting it would change the Born derived probabilities (At least, that's my understanding of the significance of you noting that the ratios of the squared magnitudes changing.) But the Born probabilities are already the "odd rule out"... so I wanted to know if there was any other reason/argument you could think of as to why we have norm preservation without appealing to the Born rule. (Does that clarify my question?)

I mean, if I was letting myself use the Born rule, I could just say that the probabilities have to sum to 1, and that hands me the unitaryness. But my whole point was "the restriction to unitary transforms *itself* seems to be related to squared magnitude stuff. So by understanding why that restriction exists in reality, maybe I'd have a better idea where the Born rule is coming from"

## comment by Stephen · 2008-05-01T20:51:42.000Z · LW(p) · GW(p)

Psy-Kosh:

Good example with the Lorentz metric.

Invariance of norm under permutations seems a reasonable assumption for state spaces. On the other hand, I now realize the answer to my question about whether permutation invariance narrows things down to p-norms is no. A simple counterexample is a linear combination of two different p-norms.

I think there might be a good reason to think in terms of norm-preserving maps. Namely, suppose the norms can be anything but the individual amplitudes don't matter, only their ratios do. That is, states are identified not with vectors in the Hilbert space, but rays in the Hilbert space. This is the way von Neumann formulated QM, and it is equivalent to the now more common norm=1 formulation. This also seems to be the formulation Eli was implicitly using in some of his previous posts.

The usual way to formulate QM these days is different: rather than ignoring the normalizations of the state vectors, one decrees that the norms must always have a certain value (specifically, 1). Then we can assign meaning to the individual amplitudes rather than only their ratios. It seems likely to me that theories where only the ratios of the "amplitudes" matter can generically be reformulated as theories with fixed norm. Thinking that only ratios matter seems a more intuitive starting point.

## comment by Stephen · 2008-05-01T21:14:12.000Z · LW(p) · GW(p)

I'm struck by guilt for having spoken of "ratios of amplitudes". It makes the proposal sound more specific and fully worked-out than it is. Let me just replace that phrase in my previous post with the vaguer notion of "relative amplitudes".

## comment by Psy-Kosh · 2008-05-01T21:36:19.000Z · LW(p) · GW(p)

Stephen: Is the point you're making basically along the lines of "vector as geometric object rather than list of numbers"?

Sure, I buy that. Heck, I'm naturally inclined toward that perspective at this time. (In part because I have been studying GR lately)

Aaanyways, so I guess basically what you're saying is that all operators corresponding to time evolution or whatever are just rotations or such in the space? And why the 2-norm instead of, say, the 1-norm? Why would the universe "prefer" to preserve the sum of the squared magnitudes rather than the sum of the magnitudes? I.e., why is the rule "unitary" rather than "stochastic", for instance? (Well, I have a partial answer for that myself... reversibility. Stochastic isn't necessarily reversible, right? Unitary is though, so there is that...)

If I'm understanding what you're trying to say, basically you're saying "it's as if you use any ole transform, then just divide by the factor the norm's been changed by, so you may as well have that 'already in' the transform"... But if the transform isn't some multiple of a unitary transform, then there won't be any single scalar value that takes care of that, right? Why instead of "norm preserving" isn't the rule "any invertible linear transform"?
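The point about there being no single scalar can be illustrated with a tiny sketch. It assumes a two-dimensional real state space and uses two hand-picked matrices, a rotation and a diagonal "squeeze"; all names here (`stretch_factor`, `rotation`, `squeeze`) are made up for the illustration.

```python
import math


def two_norm(v):
    """Square root of the sum of squared magnitudes."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def apply(matrix, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def stretch_factor(matrix, v):
    """Factor by which the matrix changes the 2-norm of v."""
    return two_norm(apply(matrix, v)) / two_norm(v)


s = 1 / math.sqrt(2)
rotation = [[s, -s], [s, s]]        # unitary: a 45 degree rotation
squeeze = [[2.0, 0.0], [0.0, 1.0]]  # invertible, linear, not unitary

for v in ([1.0, 0.0], [0.0, 1.0], [3.0, 4.0]):
    print(v, stretch_factor(rotation, v), stretch_factor(squeeze, v))
```

The rotation stretches every vector by a factor of 1, while the squeeze stretches different vectors by different factors, so dividing by "the" factor afterwards has to depend on the state.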

Or did I completely and utterly misunderstand what you were trying to say?

## comment by Recovering_irrationalist · 2008-05-01T21:39:13.000Z · LW(p) · GW(p)

@Roland: My physics and maths is patchy but I'm still just about following (the posts - some comments are way too advanced) though it is hard work for some bits. Lots of slow re-reading, looking things up and revising old posts, but it's worth it.

If you're determined enough, try reading the posts a few at a time (instead of one a day) starting a few posts before where you got stuck, and *make sure* you "get" each one before you move on, even if it means an hour on another web source studying the thing you don't understand in Eliezer's explanation.

## comment by Stephen · 2008-05-01T21:56:53.000Z · LW(p) · GW(p)

Psy-Kosh:

"Or did I completely and utterly misunderstand what you were trying to say?"

No, you are correctly interpreting me and noticing a gap in the reasoning of my preceding post. Sorry about that. I re-looked-up Scott's paper to see what he actually said. If, as you propose, you allow invertible but non-norm-preserving time evolutions and just re-adjust the norm afterwards, then you get FTL signalling, as well as obscene computational power. The paper is here.

## comment by Peter_Mexbacher · 2008-05-01T22:16:53.000Z · LW(p) · GW(p)

*A major problem with Robin's theory is that it seems to predict things like, "We should find ourselves in a universe in which lots of decoherence events have already taken place," which tendency does not seem especially apparent.*

Actually the theory suggests we should find ourselves in a state with near the least feasible number of past decoherence events

I don't understand this - doesn't decoherence occur *all* the time, in every quantum interaction between all amplitudes all the time? So, like for every amplitude separate enough to be a "particle" in the universe (=factor) every Planck time it will decohere with other factors?

Or did I misunderstand something big time here?

Cheers, Peter

Replies from: ramana-kumar

## ↑ comment by Ramana Kumar (ramana-kumar) · 2009-10-31T23:46:58.291Z · LW(p) · GW(p)

I'd also love to know the answer to Peter's question... A similar question is whether we should expect all worlds to eventually become mangled (assuming the "mangled worlds" model). I understand "world" to mean "somewhat isolated blob of amplitude in an amplitude distribution" - is that right?

Replies from: Douglas_Knight

## ↑ comment by Douglas_Knight · 2009-11-01T00:26:02.870Z · LW(p) · GW(p)

The answer to Peter's question is: no, decoherence doesn't happen with a constant rate and it certainly doesn't happen on the Planck time scale.

The answer to your question is that "mangled worlds" is a collapse theory: some worlds get mangled and go away, leaving other worlds.

Replies from: ramana-kumar

## ↑ comment by Ramana Kumar (ramana-kumar) · 2009-11-01T05:14:46.733Z · LW(p) · GW(p)

Then I'm still unclear about what a world is. Care to explain?

Replies from: ramana-kumar

## ↑ comment by Ramana Kumar (ramana-kumar) · 2009-11-01T08:28:40.294Z · LW(p) · GW(p)

Eliezer gave a simpler answer to my question: "yes". (I'm still not sure what yours means.)

Back to Peter's question. What makes you say decoherence doesn't happen on the Planck time scale? Can you explain that further?

Replies from: pengvado

## ↑ comment by pengvado · 2009-11-01T14:47:08.914Z · LW(p) · GW(p)

Any given instance of decoherence is an interaction between two or more particles. And all known interactions take rather longer than Planck time.

There probably are enough decoherence events in the universe that at least one occurs somewhere in each Planck time unit. But that doesn't instantly decohere everything. Other objects remain coherent until they interact with the decohered system, which is limited by the rate at which information propagates (both latency and bandwidth) (unless of course they decohere on their own). I.e., after a blob of amplitude has split, the sub-blobs are only separated along some dimensions of configuration space, and retain the same cross-section along the rest of the dimensions (hence "factors").

Replies from: jschulter

## ↑ comment by jschulter · 2010-10-08T06:53:37.331Z · LW(p) · GW(p)

Okay, given one sub-decoherence event per Planck time, somewhere in the universe, propagating throughout it at some rate less than or equal to the speed of light... we either have constant (one per Planck time or less) full decoherence events after some fixed time as each finishes propagating sufficiently, or we have *no* full decoherence events at all as the sub-decoherences fail to decohere the whole sufficiently.

The latter seems more realistic, especially given the light speed limit, as the expansion of space can completely causally isolate two parts of the universe preventing the propagation of the decoherence.

So, with this understood, we're left to determine how large a portion of the universe has to be decohered to qualify as a "decoherence event" in terms of the many-worlds theories which rely on the term. I honestly doubt that, once a suitable determination has been made, the events will be infrequent in almost any sense of the word. It really does seem, given the massive quantities of interactions in our universe (even just the causally linked subspace of it we inhabit), that the frequency of decoherence events should be ridiculously high. And given some basic uniformity assumptions, the rate should be quite regular too.

## comment by Psy-Kosh · 2008-05-01T22:18:22.000Z · LW(p) · GW(p)

Stephen: I don't have a postscript viewer.

Wait, I thought the superpower stuff only happens if you allow nonlinear transforms, not just nonunitary. Let's add an additional restriction: let's actually throw in some notion of locality, but even with the locality, abandon unitaryness. So our rules are "linear, local, invertible" (no rescaling afterwards... not defining a norm to preserve in the first place)... or does locality necessitate unitarity? (is unitarity a word? Well, you know what I mean. Maybe I should say orthogonality instead?)

Well, actually, also same question here I asked Eliezer. If you *didn't* know squared amplitudes corresponded to probability of experiencing a state, would you still be able to derive "nonunitary operator -> superpowers?"

Anyways, let's turn it around again. Let's say we didn't know the Born rule, but we did already know some other way that all state vectors must evolve via a unitary operator.

So from there we may notice sum/integral of squared amplitude is conserved, and that by appropriate scaling, total squared amplitude = 1 always.

Looks like we may even notice that it happens to obey the axioms of probability. (it *looks* like the quantity in question does automatically do so, given only unitary transforms are allowed.)

Does the mere fact that the quantity does "just happen" to obey the axioms of probability, on its own, help us here? Would that at least help answer the "why" for the Born rule? I'd think it would be relevant, but, thinking about it, I don't see any obvious way to go from there to "therefore it's the probability we'll experience something..."

Yep, my confusion is definitely shuffled.

hrgflargh... (That's the noise of frustrated curiosity. :D)

## comment by Stephen · 2008-05-01T23:21:14.000Z · LW(p) · GW(p)

"If you *didn't* know squared amplitudes corresponded to probability of experiencing a state, would you still be able to derive "nonunitary operator -> superpowers?""

Scott looks at a specific class of models where you assume that your state is a vector of amplitudes, and then you use a p-norm to get the corresponding probabilities. If you demand that the time evolutions be norm-preserving then (for p other than 1 or 2) you're stuck with permutations. If you allow non-norm-preserving time evolution, then you have to readjust the normalization before calculating the probabilities in order to make them add up to 1. This readjustment of the norm is nonlinear. It results in superpowers. The paper in pdf and other formats is here.
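The claim that the readjustment is nonlinear can be seen in a two-component toy case. The sketch below is only an illustration under stated assumptions: it uses the 2-norm, and a made-up non-norm-preserving step that doubles the first component. It shows the nonlinearity itself, not the superpowers that the paper derives from it.

```python
import math


def normalize(v):
    """Rescale v to unit 2-norm."""
    n = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / n for x in v]


def evolve(v):
    """Hypothetical non-unitary step (double the first component),
    followed by readjusting the norm back to 1."""
    return normalize([2 * v[0], v[1]])


a, b = [1.0, 0.0], [0.0, 1.0]

# A linear map would give the same direction either way.
sum_then_evolve = evolve([x + y for x, y in zip(a, b)])
evolve_then_sum = [x + y for x, y in zip(evolve(a), evolve(b))]

print(sum_then_evolve)  # components in ratio 2 : 1
print(evolve_then_sum)  # components in ratio 1 : 1
```

Evolving the superposition is not the same as superposing the evolved pieces, even up to an overall scale, so the combined "evolve then renormalize" rule is not a linear map on states.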

## comment by Psy-Kosh · 2008-05-02T00:37:39.000Z · LW(p) · GW(p)

Stephen: Aaah, okay. And yeah, that's why I said no rescaling.

I mean, if one didn't already have the "probability of experiencing something is linear in p-norm..." thing, would one still be able to argue superpowers?

From your description, it looks like he still has to use the principle of "probability of experiencing something proportional to p-norm" to justify the superpowers thing.

Browsed through the paper, and, if I interpreted it right, that is kinda what it was doing... Assume there's some p-norm corresponding to probability. But maybe I misunderstood.

Eliezer: oh, mind elaborating on 'Historical note: If "observing a particle's position" invoked a mysterious event that squeezed the amplitude distribution down to a delta point, or flattened it in one subspace, this would give us a different future amplitude distribution from what decoherence would predict. All interpretations of QM that involve quantum systems jumping into a point/flat state, which are both testable and have been tested, have been falsified.'? Thanks.

## comment by Douglas_Knight3 · 2008-05-02T05:26:18.000Z · LW(p) · GW(p)

*are all the norms invariant under permutation of the indices p-norms?*

Well, you answered that exact question, but here's a description of all norms (on a finite dimensional real vector space): a norm determines the set of all vectors of norm less than or equal to 1. This is convex and symmetric under inverting sign (if you wanted complex, you'd have to allow multiplication by complex units). It determines the norm: the norm of a vector is the amount you have to scale the set to envelop the vector. Any set satisfying those conditions determines a norm.

So there are a lot of norms out there. eg, you can take a cylinder in 3-space (one of your examples). You could take a hexagon in the plane. This norm allows the interchange of coordinates, but it has a bigger symmetry group, though still finite. (I guess one could write this as max(|x|,|y|,|x-y|))
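The hexagon example can be checked by brute force on a small integer grid (integers keep every comparison exact). This is a sketch with illustrative names: `hexagon_norm` is the formula guessed above, and `turn` is one candidate for an extra linear symmetry, chosen here because it cycles the six vertices (1,0), (1,1), (0,1), (-1,0), (-1,-1), (0,-1).

```python
def hexagon_norm(x, y):
    """Norm whose unit ball is the hexagon |x| <= 1, |y| <= 1, |x - y| <= 1."""
    return max(abs(x), abs(y), abs(x - y))


def turn(x, y):
    """Linear map that cycles the six vertices of the unit hexagon."""
    return (x - y, x)


points = [(x, y) for x in range(-4, 5) for y in range(-4, 5)]

for (x, y) in points:
    n = hexagon_norm(x, y)
    assert n == hexagon_norm(y, x)              # interchange of coordinates
    assert n == hexagon_norm(-x, -y)            # sign inversion
    assert n == hexagon_norm(*turn(x, y))       # symmetry beyond the swap
    assert hexagon_norm(3 * x, 3 * y) == 3 * n  # homogeneity

# Triangle inequality on all pairs of grid points.
assert all(
    hexagon_norm(x1 + x2, y1 + y2)
    <= hexagon_norm(x1, y1) + hexagon_norm(x2, y2)
    for (x1, y1) in points
    for (x2, y2) in points
)

print(hexagon_norm(1, 1), hexagon_norm(1, -1))
```

The last line also shows why this is not a p-norm: (1, 1) and (1, -1) get different lengths here, whereas every p-norm ignores the signs of individual coordinates.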

## comment by Tim_Tyler · 2008-05-15T20:38:27.000Z · LW(p) · GW(p)

Weren't the Born probabilities successfully derived from decision theory for the MWI in 2007 by Deutsch: "Probabilities used to be regarded as the biggest problem for Everett, but ironically, they are now its most powerful success" - http://forum.astroversum.nl/viewtopic.php?p=1649

Replies from: Wei_Dai

## ↑ comment by Wei_Dai · 2009-09-17T18:20:56.556Z · LW(p) · GW(p)

There are a couple of recent papers on this topic:

- A formal proof of the Born rule from decision-theoretic assumptions by David Wallace
- Has the Born rule been proven? by J. Finkelstein

I personally find Finkelstein's response/counterargument convincing.

Replies from: Will_Newsome

## ↑ comment by Will_Newsome · 2012-05-05T10:51:52.536Z · LW(p) · GW(p)

Hm, Wei_Dai(2009) seems to have a notion of rationality that is quite permissive if he's *convinced* by Finkelstein. If rationality isn't in fact permissive and instead stringently requires diachronic consistency (exceptionlessness, updatelessness, pre-rational priors) then I don't think Finkelstein's arguments are convincing. And there are positive arguments, e.g. by Derek Parfit, that rationality *is* normatively "thick".

## comment by Dihymo · 2008-06-01T04:24:51.000Z · LW(p) · GW(p)

If anyone can produce a cellular automata model that can create circles like those which relate to the inverse square of distance or the stuff of early wave mechanics, I think I can bridge the MWI view and the one universe of many fidgetings view that I cling to. I know of one other person who has a similar idea, unfortunately his idea has a bizarre quantity which is the square root of a meter.

## comment by Neil_B. · 2008-06-09T02:07:36.000Z · LW(p) · GW(p)

Consider for example what "scattering experiments" show, in a context of imagining that the universe is made of fields and that only "observation" makes a manifestation in a small region of space? I mean, suppose we think of the "observations" as being our detecting the impacts of the "scattered" electrons rather than the scatterings themselves. (IOW, we don't consider "mere" interactions to be observations - whatever that means.) But then why and how did the waves representing the electrons scatter as if off little concentrations when they were interpenetrating? And, what of the finding that electrons are "points" as far as we can tell, from scattering experiments? Note that the scattering is based on imagining one charge "source" being affected by another source's central inverse-square field, nothing that makes a lot of sense in terms of spread-out waves. Note also that the scattering is not a specific "impact" like that of billiard balls, since it is a matter of degree (how close one electron approaches another, still not touching since they don't have extensions with a discontinuity like a hard ball - and the very term "how close" betrays an existing pointness.) And so on ... IOW, it's worse than you think.

On a different note, it is supposed to be impossible to find out certain things about the wave function, like its particular shape. We are supposed to only be able to find out whether it passed or failed to pass the test for chance of a particular eigenstate (like, a linear polarized photon having a greater chance of passing a linear filter of similar orientation, but we wouldn't be able to find out directly it had been produced with a 20 degree orientation of polarization.) However, I thought of a way to perhaps do such a thing. It involves passing a polarized photon through two half-wave plates over and over, say with reflections. The first plate collects a little bit of average spin from each pass of the photon, due to the inverting of photon spin by such a HWP. The second HWP reverts the photon's spin (superposed value, the "circularity") back to its original value so it will reenter the first HWP with the same value of circularity each time.

After many passes, angular momentum transfer S should accumulate in the first plate along a range of values. S = 2nC hbar, where n is number of passes, and C is the "circularity" based on how much RH and LH is superposed in that photon. So for example, a photon that came out of a linear pol. filter would show zero net spin in such a device, elliptical photons would show intermediate spin, and CP photons would show full spin of S = 2n hbar. It isn't at all like having eigenstate filters. Having an indication along a range is not supposed to be possible (projection postulate), and is reminiscent of Y. Aharonov's "weak measurement" ideas.

Replies from: timtyler

## ↑ comment by timtyler · 2009-08-18T18:56:01.644Z · LW(p) · GW(p)

Re: "If anyone can produce a cellular automata model that can create circles like those which relate to the inverse square of distance"

Producing such a cellular automaton model is trivial. See my:

Gallery:

http://finitenature.com/interference_gallery/

Java CA program that made the images:

## comment by Wei_Dai · 2009-09-17T07:13:16.171Z · LW(p) · GW(p)

My guess is that Born's Rule is related to the Solomonoff Prior. Consider a program P that takes 4 inputs:

- boundary conditions for a wavefunction
- a time coordinate T
- a spatial region R
- a random string

What P does is take the boundary conditions, use Schrödinger's equation to compute the wavefunction at time T, then sample the wavefunction using the Born probabilities and the random input string, and finally output the particles in the region R and their relative positions.
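The sampling step of P can be sketched on its own. The code below is a heavily simplified illustration, not the program described above: it assumes the wavefunction at time T has already been computed and discretized into a finite list of complex amplitudes, one per configuration, and it treats the random string as a sequence of '0'/'1' characters. The name `born_sample` is made up for this example.

```python
def born_sample(amplitudes, random_bits):
    """Pick an index with probability proportional to |amplitude|^2,
    using a string of '0'/'1' characters as the only source of randomness."""
    # Read the bits as a binary fraction u in [0, 1).
    u = sum(int(b) / 2 ** (i + 1) for i, b in enumerate(random_bits))

    weights = [abs(a) ** 2 for a in amplitudes]
    total = sum(weights)

    # Walk the cumulative distribution until it passes u.
    cumulative = 0.0
    for index, w in enumerate(weights):
        cumulative += w / total
        if u < cumulative:
            return index
    return len(weights) - 1  # guard against rounding at the top end


# Two configurations with squared moduli 0.36 and 0.64.
psi = [0.6, 0.8j]
print(born_sample(psi, "01"), born_sample(psi, "1"))
```

With uniformly random bits, the first configuration comes out about 36% of the time and the second about 64%; swapping the exponent 2 for anything else in `weights` is the kind of modification to Born's rule that the conjecture is about.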

Suppose this program, along with the inputs that cause it to output the description of a given human brain, is what makes the largest contribution to the probability mass of the bitstring representing that brain in the Solomonoff Prior. This seems like a plausible conjecture (putting aside the fact that quantum mechanics isn't actually the TOE of this universe).

(Does anyone think this is *not* true, or if it is true, has nothing to do with the answer to the mystery of "why squared amplitudes"?)

This idea seems fairly obvious, but I don't recall seeing it proposed by anyone yet. One possible direction to explore is to try to prove that any modification to Born's rule would cause a drastic decrease in the probability that P, given random inputs, would output the description of a sentient being. But I have no idea how to go about doing this. I'm also not sure how to develop this observation/conjecture into a full answer of the mystery.

Replies from: cousin_it, pengvado, Mitchell_Porter, Vladimir_Nesov, Vladimir_Nesov, Wei_Dai

## ↑ comment by cousin_it · 2009-09-17T07:24:03.248Z · LW(p) · GW(p)

The Solomonoff prior depends on the encoding of algorithms, the Born rule doesn't. Or am I missing anything?

Replies from: Wei_Dai

## ↑ comment by Wei_Dai · 2009-09-17T17:18:10.914Z · LW(p) · GW(p)

That seems like a general argument against the whole Solomonoff Induction approach. I'd be happy to see the dependence on an encoding of algorithms removed, but until someone finds a way to do so, it doesn't seem to be a deal-breaker. I think my claim should apply to any encoding of algorithms one might use that isn't contrived specifically to make it false.

## ↑ comment by pengvado · 2009-09-17T07:41:35.100Z · LW(p) · GW(p)

Is it possible (I'm not sure it makes sense to ask about easy) under our physics to build an intelligence that optimizes (or at least a structure that propagates itself) according to some metric other than the Born Rule? If not, then it should be anthropically unsurprising that we perceive probability as squared amplitude, even if there is no law of physics to that effect. Otoh if it *is* possible, then you could have a TOE from which you can't derive how to compute probability, and there's *nothing wrong with that*, because then there really is another way to interpret probability that other people in the universe (though of course not in our Everett branch) may be using.

Fair rephrasing?

## ↑ comment by Mitchell_Porter · 2009-09-17T07:44:15.893Z · LW(p) · GW(p)

Hello Wei Dai. Your paradigm is a bit opaque to me. There's a cosmology here which involves programs, program outputs, and probability distributions over each, but I can't tell what's supposed to exist. Just the program outputs? The program outputs *and* the programs? Does the program correspond to "basic physical law", and program output to "the physical world"?

If I try to abstract away from the metaphysical idiosyncrasies, the idea seems to be that Born's rule is true because the worlds which function according to Born's rule are the majority of the worlds in which sentient beings show up. Well, it could be true. But here's an interesting Bohmian fact: if you start out with an ensemble of Bohmian worlds deviating from the Born distribution, they will actually converge on it, solely due to Bohmian dynamics. (See quant-ph/0403034.) So something like the Bohmian equation of motion may actually be the more fundamental fact.

Replies from: Wei_Dai

## ↑ comment by Wei_Dai · 2009-09-18T20:17:52.439Z · LW(p) · GW(p)

In general, I think what exists are mathematical structures, which include computations as a subclass.

But here's an interesting Bohmian fact: if you start out with an ensemble of Bohmian worlds deviating from the Born distribution, they will actually converge on it, solely due to Bohmian dynamics.

Thanks for the link. That looks interesting, and I have a couple of questions that maybe you can help me with.

- *Why* do they converge to the Born distribution? The authors make an analogy with thermal relaxation, but there is a standard explanation of the second law of thermodynamics in terms of sizes of macrostates in configuration space, and I don't see what the equivalent explanation is for Bohmian relaxation.
- What about decoherence? Suppose you have a wavefunction that has decohered into two approximately non-interacting branches occupying different parts of configuration space. If you start with a Bohmian world that belongs to one branch, then in all likelihood its future evolution will stay within that branch, right? Now if you take an ensemble of Bohmian worlds that all belong to that branch, how will it converge to the Born distribution, which occupies both branches?
- This is more of an objection to the Bohmian ontology than a question. If you look at Bohmian Mechanics as a computation, it consists of two parts: (1) evolution of the wavefunction, and (2) evolution of a point in configuration space, guided by the wavefunction. But it seems like all of the real work is being done in part 1. If you wanted to simulate a quantum system, for example, it seems sufficient to just do part 1, and then sample the resulting wavefunction according to Born's rule, and part 2 adds more complexity and computational burden without any apparent benefit.

## ↑ comment by Mitchell_Porter · 2009-09-21T05:21:48.324Z · LW(p) · GW(p)

"*Why* do they converge to the Born distribution?"

Let's distinguish two versions of this question. First version: why does a generic non-Born ensemble of Bohmian worlds tend to become Born-like? I think the technical answer is to be found in footnote 9 and the discussion around equation 20. But ultimately I think it will come back to a Liouville theorem *in the space of distributions*. There is some natural metric under which the Born-like distributions are the majority. (Or perhaps it is that non-Born regions are traversed relatively quickly.)

Second version: why does an individual Bohmian world contain a Born distribution of outcomes? This follows from the first part. An individual Bohmian world consists of a universal wavefunction and a quasiclassical trajectory. If you pick just a few of the classical variables, you can construct a corresponding reduced density matrix in the usual fashion, and a reduced Bohmian equation of motion in which the evolution of those variables depends on that density matrix and on influences coming from all the degrees of freedom that were traced over. So when you look at all the instances, within a single Bohmian history, of a particular physical process, you are looking at an ensemble of noisy Bohmian microhistories. The argument above suggests that even if this starts as a non-Born ensemble, it will evolve into a Born-like ensemble. The only complication is the noise factor. But it is at least plausible that in the majority of Bohmian worlds, this nonlocal noise is just noise and does not introduce an anti-Born tendency.

From an all-worlds-exist perspective, which we both favor, I would summarize as follows: (1) the Born distribution is the natural measure on the subset of worlds consisting of the Bohmian worlds (2) most Bohmian worlds will exhibit an *internal* Born distribution of physical outcomes. At present these are conjectures rather than theorems, but I would consider them plausible conjectures in the light of Valentini's work.

"What about decoherence?"

As we've just discussed, Bohmian dynamics both preserves exact Born distributions *and* evolves non-Born distributions towards Born-like distributions (and this is true for *subsystems* of a Bohmian world as well as for the whole). So the sub-ensembles in the decohered branches will preserve or evolve towards Born.

"part 2 adds more complexity and computational burden without any apparent benefit"

This is a complicated matter to discuss, not least because there is an interpretation of Bohmian mechanics, the nomological interpretation, according to which the "wavefunction" is a law of motion and not a thing. In nomological Bohmian mechanics, the configuration is all that exists, evolving according to a nonlocal potential.

## ↑ comment by Vladimir_Nesov · 2009-09-17T08:10:23.443Z · LW(p) · GW(p)

Epistemic hygiene alert!

## ↑ comment by Vladimir_Nesov · 2009-09-17T14:50:13.685Z · LW(p) · GW(p)

Suppose this program, along with the inputs that cause it to output the description of a given human brain, is what makes the largest contribution to the probability mass of the bitstring representing that brain in the Solomonoff Prior.

More specifically, to replace my previous summary comment: the above statement sounds kinda redeemable, but it's so vague and common-sensually absurd that I think it makes a negative contribution. Things like this need to be said clearly, or not at all. It invites all sorts of kookery, not just with the format of presentation, but in one's own mind as well.

Replies from: Wei_Dai

## ↑ comment by Wei_Dai · 2009-09-17T16:51:21.505Z · LW(p) · GW(p)

Huh, that's a surprising response. I thought that at least the intended meaning would be obvious for someone familiar with the Solomonoff Prior. I guess "vague" I can address by making my claim mathematically precise, but why "common-sensually absurd"?

Replies from: Vladimir_Nesov

## ↑ comment by Vladimir_Nesov · 2009-09-17T17:13:40.291Z · LW(p) · GW(p)

Re absurd: It's not clear *why* you would say something like the quote.

## ↑ comment by Wei_Dai · 2009-09-17T17:44:40.717Z · LW(p) · GW(p)

I was hoping that it would trigger an insight in someone who might solve this mystery for me. As I said, I'm not sure how to develop it into a full answer myself (but it might be related to this other vague/possibly-absurd idea).

Perhaps I'm abusing this community by presenting ideas that are half-formed and "epistemically unhygienic", but I expect that's not a serious danger. It seems like a promising direction to explore, that I don't see anyone else exploring (kind of like UDT until recently). I have too many questions I'd like to see answered, and not enough time and ability to answer them all myself.

## ↑ comment by Wei_Dai · 2009-09-20T01:24:11.376Z · LW(p) · GW(p)

I just read in Scott Aaronson's Quantum Computing, Postselection, and Probabilistic Polynomial-Time that if the exponent in the probability rule was anything other than 2, then we'd be able to do postselection without quantum suicide and solve problems in PP. (See Page 6, Theorem 6.) The same is true if quantum mechanics was non-linear.

Given that, my conjecture is implied by one that says "sentience is unlikely to evolve in a world where problems in PP (which is probably strictly harder than PH, which is probably strictly harder than NP) can be easily solved" (presumably because intelligence wouldn't be useful in such a world).
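A small numerical aside on why the exponent 2 is special. This is only an illustration of normalization, not of Aaronson's theorem, and the helper names `hadamard` and `total_weight` are made up for this sketch. A unitary map such as the Hadamard gate preserves the sum of |amplitude|^p for p = 2 but not for other exponents, so a rule with a different exponent would need some extra renormalization step.

```python
import math

def hadamard(state):
    """Apply the 2x2 Hadamard matrix to a pair of amplitudes (a, b)."""
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

def total_weight(state, p):
    """Sum of |amplitude|**p over the components of the state."""
    return sum(abs(x) ** p for x in state)

state = (1.0, 0.0)          # all amplitude on the first configuration
evolved = hadamard(state)   # equal-magnitude superposition of both

# Compare the total weight before and after the unitary for several exponents.
for p in (1, 2, 3):
    print(p, total_weight(state, p), total_weight(evolved, p))
```

Only the p = 2 row shows the same total before and after, up to floating-point rounding.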

Replies from: Jordan

## ↑ comment by Jordan · 2009-09-29T07:04:53.531Z · LW(p) · GW(p)

Interesting. What would such a world look like? I imagine instead of a selection pressure for intelligence there would be a selection pressure for raw memory, so that you could perfectly model any creature with less memory than yourself. It seems that this would be a very intense pressure, since the upper hand is essentially guaranteed superiority, and you would ultimately wind up with galaxy sized computers running through all possible simulations of other galaxy sized computers.

I never put much stock in the simulation hypothesis, because I couldn't see why an entity capable of simulating our universe would derive any value from doing it. This scenario makes me rethink that a little.

In any case, while this is another potential reason why the rule must be 2 in our universe, it still doesn't shed any light on the *mechanism* by which our subjective experience follows this rule.

## ↑ comment by Wei_Dai · 2009-09-29T08:17:45.480Z · LW(p) · GW(p)

> What would such a world look like?

I don't know. I don't have a very good understanding of regular quantum computing, much less the non-Born "fantasy" quantum computers that Aaronson used in his paper. But I'm going to guess that your speculation is probably wrong, unless you happen to be an expert in this area. These things tend not to be very intuitive at all.

Replies from: Jordan

## comment by ksvanhorn · 2011-02-07T08:19:10.119Z · LW(p) · GW(p)

The Transactional Interpretation of QM resolves the mystery of where this nonlinear squared modulus comes from quite neatly. On that basis alone, I'm surprised that Eliezer doesn't even mention it as a serious rival to MWI.

See http://www.npl.washington.edu/npl/int_rep/tiqm/TI_toc.html

Replies from: endoself

## ↑ comment by endoself · 2011-02-07T08:30:46.459Z · LW(p) · GW(p)

Don't the transactional interpretation's followers claim that standard QM gives the wrong result on the Afshar experiment? Or is that not all of them?

Replies from: ksvanhorn

## ↑ comment by ksvanhorn · 2011-02-07T08:39:28.126Z · LW(p) · GW(p)

Cramer argues that both Copenhagen and MWI are inconsistent with the results of the Afshar experiment.

Replies from: endoself

## ↑ comment by endoself · 2011-02-07T08:51:36.556Z · LW(p) · GW(p)

Yeah, but he's wrong. Almost no physicists accept his argument as mathematically valid. If the transactional interpretation *does* give different results, then it is incompatible with experiment.

## ↑ comment by AlephNeil · 2011-03-25T14:49:18.549Z · LW(p) · GW(p)

> Almost no physicists accept his argument as mathematically valid.

If you're talking about the Afshar experiment, Unruh demolished that convincingly. We don't need to take it on trust that Afshar is wrong.

*However*, Afshar and Cramer were only ever arguing about the interpretation of the results of Afshar's experiment, not what those results would be. It would be most unwise to rule out the transactional interpretation just because its inventor subsequently said something foolish.

## ↑ comment by endoself · 2011-03-27T04:18:11.994Z · LW(p) · GW(p)

See the grandparent; Cramer justified the transactional interpretation by saying that it was the only interpretation able to give the correct result for the Afshar experiment. This being wrong removes much of the claimed evidence.

Replies from: AlephNeil

## comment by Viktor · 2011-04-05T10:22:16.374Z · LW(p) · GW(p)

First of all - great sequence! I had a lot of 'I see!'-moments reading it. I study physics, but often the clear picture gets lost in the standard approach and one is left with a lot of calculating techniques without any intuitive grasp of the subject. After reading this I became very fond of tutoring the course on quantum mechanics and always tried to give some deeper insight (much of which was taken from here) in addition to just explaining the exercises. If I am correct, the world mangling theory just tries to explain some anomalies, but the rule of squared moduli is well established and can be derived. Let me try an easy explanation:

The basic principle is that if one defines how the measurement equipment reacts to all pure states (amplitude 1 for one configuration, 0 for all else), one has no freedom left to define how it reacts to superpositions of them (what I will loosely call mixed states). I think the only prerequisite is that time evolution is linear. From here one can derive the No-Cloning theorem: Suppose you have two systems, one being in the 'ready to store a copy' state |0> and one having the two possibilities |1> and |2> (and of course every linear combination of those, so a combination of a|1>+b|2> will have an amplitude of a for the configuration |1> and b for |2>). Now you set up some interaction which tries to copy the state of the second system onto the first. So:

- |0>|1> evolves into |1>|1>.
- |0>|2> evolves into |2>|2>.

But if we have a combination

- |0>(a|1>+b|2>)=a|0>|1>+b|0>|2>,

this will be mapped onto

- a|1>|1>+b|2>|2>

and not just clone the state, which would give

- (a|1>+b|2>)(a|1>+b|2>)=a²|1>|1>+ab|1>|2>+ab|2>|1>+b²|2>|2>.

So it is not possible to copy the whole state of a system, but it is possible to choose a basis and then copy the state if it is one of the basis vectors. So the basic measurement process would just copy the state of the system onto another system as well as possible (hence the so-called Heisenberg Uncertainty Principle - one has to choose according to which basis the measurement is coupled to the system). From the basis states of the composite system (|0>|x>, |x>|x>, x=1,2) one can construct a scalar product such that every vector has length 1 and they are orthogonal to each other:

- (⟨0|⟨x|)(|0>|x>) = 1, (⟨x|⟨x|)(|0>|x>) = 0 etc.

So the time evolution obviously conserves the length of the basis vectors - but since we could also have chosen another basis, it has to conserve also the length of mixed states (this step may be not so rigorous but at least makes the square rule much more plausible than any other). So the state (a|1>|1>+b|2>|2>) has to have length 1 and if we compute it we get

- 1 = (a\*⟨1|⟨1| + b\*⟨2|⟨2|)(a|1>|1> + b|2>|2>) = |a|²⟨1|⟨1|1>|1> + |b|²⟨2|⟨2|2>|2> + 0 = |a|² + |b|².

So the squared moduli add to 1 (Pythagoras sends his regards). Furthermore, if the 'original' system had three possibilities, but the copy process mapped

- |0>|1> onto |1>|1>
- |0>|2> onto |2>|2>
- |0>|3> onto |1>|3> (!),

we had

- |0>(a|1>+b|2>+c|3>) --> |1>(a|1>+c|3>)+b|2>|2>.

Mathematically, one can 'trace out' the influence of the original system - graphically one just sees that the squared length of the part with the copied system in |1> is the squared length of the vector a|1>+c|3>, namely |a|²+|c|², while the other part has the squared length |b|². Thus the Born probabilities are added when grouping states together in the process of copying them - which could be responsible for the connection of the Born rule to the process of creating anticipations and so forth. Of course a measurement and the coupling of our brains to a system is not just copying the states - but the same argumentation holds since every sensitive coupling of another system to the original system can only be defined on some basis - the way the measurement reacts to combinations of states is determined from there and is not open to manipulation. So the Born rule is not a great mystery - although some of the steps may lack some rigor, it is far more plausible than for example just the modulus or some other power of it.
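The grouping argument can be checked numerically. The sketch below is an illustration under assumptions that are not in the comment: states are stored as Python dicts from (copy label, original label) pairs to amplitudes, the amplitudes a, b, c are arbitrary example values, and `copy_map` and `squared_length` are hypothetical helper names. It extends the three-outcome copy rule linearly and reports the squared length of each branch of the copy.

```python
def copy_map(original):
    """Linearly extend |0>|1> -> |1>|1>, |0>|2> -> |2>|2>, |0>|3> -> |1>|3>.

    `original` maps the labels 1, 2, 3 to amplitudes; the result maps
    (copy_label, original_label) pairs to amplitudes.
    """
    copy_of = {1: 1, 2: 2, 3: 1}  # which copy state each basis state produces
    return {(copy_of[x], x): amp for x, amp in original.items()}

def squared_length(state):
    """Sum of squared moduli of the amplitudes in a state."""
    return sum(abs(amp) ** 2 for amp in state.values())

# Example amplitudes (assumed): 0.36 + 0.2304 + 0.4096 = 1
a, b, c = 0.6, 0.48j, 0.64
joint = copy_map({1: a, 2: b, 3: c})

# Group the joint state by what the copy reads.
copy_in_1 = {k: v for k, v in joint.items() if k[0] == 1}
copy_in_2 = {k: v for k, v in joint.items() if k[0] == 2}

print(squared_length(joint))      # total squared length after copying
print(squared_length(copy_in_1))  # branch where the copy reads 1: |a|^2 + |c|^2
print(squared_length(copy_in_2))  # branch where the copy reads 2: |b|^2
```

The two branch weights add up to the total, which is the sense in which squared moduli behave additively when outcomes are grouped.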

I hope this clears up some confusion,

Viktor

Replies from: Viktor

## ↑ comment by Viktor · 2011-04-10T17:41:40.819Z · LW(p) · GW(p)

Hm, I just read the article again and saw that much of this was already explained there. But the essential point is that although the full information of a system is given by the amplitude distribution over all possible configurations, this information is not accessible to another system. When we try to couple the system to another (for example, by copying the state), this only respects the pure 'classical' states as described above. Thus it is possible to ask the question 'how much do these two states have in common', where one classical state compared with itself gives 1 and compared with a different one gives 0. If we want to also be able to compare mixed states, the notion of a scalar product comes in. The squared modulus is just the comparison of a state with itself, which is constantly 1 - obviously, the state has a hell of a lot in common with itself.

## comment by paulfchristiano · 2011-04-05T12:51:41.408Z · LW(p) · GW(p)

Suppose that the probability of an observer-moment is determined by its complexity, instead of the probability of a universe being determined by its complexity and the probability of an observation within that universe being described by some different anthropic selection.

You can specify a particular human's brain by describing the universal wave function and then pointing to a brain within that wave function. Now the mere "physical existence" of the brain is not relevant to experience; it is necessary to describe precisely how to extract a description of their thoughts from the universal wave function. The significance of the observer moment depends on the complexity of this specification.

How might you specify a brain within the universal wavefunction? The details are slightly technical, but intuitively: describe the universe, specify a random seed to an algorithm which samples classical configurations with probability proportional to the amplitude squared, and then point to the brain within the resulting configuration.

Of course, you could also write down the algorithm which samples classical configurations with probability proportional to the amplitude, or the amplitude cubed, etc. and I would have to predict that all of the observer-moments generated in this way also exist. In the same sense, I would have to predict that all of the observer-moments generated by other laws of physics also exist, with probability decaying exponentially with the complexity of those laws (and notice that observer moments generated according to QM with non-Born probabilities are just as foreign as observer moments generated with wildly different physical theories).
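As a toy illustration of the sampling step, here is a minimal sketch. Nothing in it comes from the comment itself: the three-configuration "wavefunction", the function names `sample_configuration` and `distribution`, and the `power` parameter are all assumptions made for the example. It draws a configuration with probability proportional to |amplitude| raised to a chosen power, where power=2 is the Born rule and power=1 is one of the alternatives mentioned above.

```python
import random

def sample_configuration(amplitudes, power=2, rng=random):
    """Pick a configuration with probability proportional to |amplitude|**power."""
    labels = list(amplitudes)
    weights = [abs(amplitudes[label]) ** power for label in labels]
    return rng.choices(labels, weights=weights, k=1)[0]

def distribution(amplitudes, power=2):
    """Exact normalized probabilities for the same sampling rule."""
    weights = {label: abs(amp) ** power for label, amp in amplitudes.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# A made-up wavefunction over three classical configurations.
toy_wavefunction = {"x": 0.8, "y": 0.6j, "z": 0.0}

rng = random.Random(0)  # plays the role of the "random seed" in the description
print(sample_configuration(toy_wavefunction, power=2, rng=rng))
print(distribution(toy_wavefunction, power=2))  # Born weights
print(distribution(toy_wavefunction, power=1))  # modulus-only weights
```

The two rules differ only in the `power` argument, which is one way of seeing why the comment treats them as comparably simple to write down.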

Why do we expect the Born rules to hold when we perform an experiment today? The same reason we expect the same laws of physics that created our universe to continue to apply in our labs. More precisely:

In order to find the blob of amplitude which corresponds to Earth as we know it, you have to use the Born probabilities to sample. If you use some significantly different distribution then physics looks *completely* different. There are probably no stars behaving like we expect stars to behave, atoms don't behave reasonably, etc. So in order to pick out our Earth you need to use the Born probabilities.

You could describe a brain by saying "Use the Born probabilities to find human society, and then use this other sampling method to find a brain" or maybe "Use the Born probabilities everywhere except for this experimental outcome." But this is only true in the same sense that you could specify a configuration for the universe by saying "Use these laws of physics for a while, and then switch to these other laws." We don't expect it because non-uniformity significantly increases complexity.

As far as I can tell, the remaining mystery is the same as "why these laws of physics?" An observation like "If you use the probabilities cubed, you get one messed up universe." would be helpful to this question, as would an observation like "it turns out that there is a simple way to sample configurations with probability proportional to amplitude squared, but not amplitude," but neither observation is any more useful or necessary than "If you used classical probabilities instead of quantum probabilities, you wouldn't have life" or "it turns out that there is a very simple way to describe quantum mechanics, but not classical probabilities."

This question no longer seems mysterious to me; someone would have to give a convincing argument for me to keep thinking about it.

Replies from: cousin_it

## comment by AnthonyC · 2011-04-06T17:57:16.345Z · LW(p) · GW(p)

Could the flow of amplitude between blobs we normally think of as separated following a measurement possibly explain the quantum field theory prediction/phenomenon of vacuum fluctuations?

Replies from: Manfred

## ↑ comment by Manfred · 2011-04-06T18:37:27.665Z · LW(p) · GW(p)

Nope. Vacuum fluctuations happen because the field that tells you whether there's a particle there or not behaves like a quantum thing and not a classical thing, and you end up with a non-boring vacuum state for the same reason atoms have non-boring ground states rather than collapsing in on themselves. Weird as all get out, but not quantum-mechanics-breaking, and measured reasonably well by the Casimir effect (though also *horribly wrong* because of the cosmological constant problem, but that's a problem for quantum gravity to sort out, not one that can be solved by big changes to already-tested parts of quantum mechanics).

## comment by drnickbone · 2012-02-15T01:26:57.535Z · LW(p) · GW(p)

I'm a bit puzzled by the problem here. What's wrong with the interpretation that the Born probabilities just *are* the limiting frequencies in infinite independent repetitions of the same experiment? Further, that these limiting frequencies really are defined because the universe really is spatially infinite, with infinitely many causally isolated regions. There is nothing hypothetical at all about the infinite repetition - it actually happens.

My understanding is that in such a universe model, the Everett-Wheeler version of quantum theory makes a precise prediction: the limiting frequencies will *with certainty* correspond to the Born probabilities because the amplitude vanishes completely over the subspace of Hilbert space where they don't. More formally, the wave function of the universe is in an eigenstate of the relative frequency operator with the eigenvalue equal to the Born probability. Job done, surely?

Is the objection here just that we don't want to believe that the universe is spatially infinite?

Well why on Earth(s) would a MWI fan have any problem with that at all? Is it really any harder to believe that each branch of the wave function describes a strictly infinite universe (but that these infinite universes are all essentially identical, because they all have the correct frequency limits) than to believe that each branch describes a finite universe, and that while some of the branches get the frequency limits right, most of them don't?

Replies from: Viktor

## ↑ comment by Viktor · 2012-03-17T23:18:17.752Z · LW(p) · GW(p)

That gave me, if I am not mistaken, the last piece of the puzzle. Let's just take the naive definition of probability - the relative frequency of outcomes as N goes to infinity. Now prepare N systems independently in the state a|0>+b|1>. Now measure one after another - couple the measurement device to the system. At first we have
(a|0>+b|1>)^N * |0>.
Now the first one is measured:
(a|0>+b|1>)^(N-1) * (a|0,0>+b|1,1>)
where the number after the comma denotes the state of the measuring device, which just counts the number of measured ones. After the second measurement we have
(a|0>+b|1>)^(N-2) * (a²|00,0>+ab|01,1>+ab|10,1>+b²|11,2>)
Since the two states ab|01,1> and ab|10,1> are not distinguished by the measurement, the basis should be changed - and this is the crucial point: |01>+|10> has a length of sqrt(2), so if we change the basis to |+>=(|01>+|10>)/sqrt(2), we have
(a|0>+b|1>)^(N-2) * (a²|00,0>+ab*sqrt(2)*|+,1>+b²|11,2>).

The coefficients are like in the binomial theorem, but note the square root!

Continuing, we will get something similar to a binomial distribution:

sum(k=0..N: sqrt(N!/(k!(N-k)!)) * a^(N-k) * b^k |...,k>),

where k is the number of ones counted by the measuring device.

Now it remains to prove that for k/N not equal to |b|² the amplitudes go to zero as N goes to infinity. This is equivalent to the squared modulus of the amplitude going to zero (this is just to make the calculation easier, it does not have anything to do with the Born rule). It is, for |...,k>,

|c_k|² = N!/(k!(N-k)!) * (|a|²)^(N-k) * (|b|²)^k

which becomes a Gaussian distribution for large N, with mean at k=N|b|² and variance N|a|²|b|². So at k/N=|b|²+d it has a value proportional to exp(-(Nd)²/(2N|a|²|b|²))=exp(-Nd²/(2|a|²|b|²)) --> 0 as N --> inf.

So a time capsule where the records indicate that some quantum experiment has been performed a great number of times and the Born rule is broken will have an amplitude that goes to zero (yeah, I just read Barbour's book).
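The concentration claim can be checked numerically with a short sketch. It assumes the squared amplitude of the record "k counted outcomes out of N" is the binomial term, where p stands for the squared modulus of the amplitude of whichever outcome the device counts. The values p = 0.3 and d = 0.05 are arbitrary, and `record_weight` and `weight_outside` are hypothetical names.

```python
import math

def record_weight(N, k, p):
    """Squared amplitude of the record 'k counted outcomes out of N'.

    p is the squared modulus of the amplitude of the counted outcome.
    """
    return math.comb(N, k) * p ** k * (1 - p) ** (N - k)

def weight_outside(N, p, d):
    """Total squared amplitude of records whose frequency k/N misses p by more than d."""
    return sum(record_weight(N, k, p) for k in range(N + 1) if abs(k / N - p) > d)

p, d = 0.3, 0.05  # assumed example values
for N in (10, 100, 1000):
    print(N, weight_outside(N, p, d))
```

The printed totals shrink as N grows, which is the finite-N shadow of the statement that records violating the expected frequency have vanishing amplitude in the limit.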

Replies from: drnickbone

## ↑ comment by drnickbone · 2012-03-20T00:28:51.216Z · LW(p) · GW(p)

Yes, this is called the Finkelstein-Hartle theorem (D. Finkelstein, Transactions of the New York Academy of Sciences 25, 621 (1963); J. B. Hartle, Am. J. Phys. 36, 704 (1968)).

This theorem is the basis for constructing a limit operator for the relative frequency when there are *infinitely* many independent repetitions of a measurement, and showing that the product wave-function is an exact eigenstate of the relative frequency operator. Unfortunately, it seems that Hartle's construction of the frequency operator wasn't quite right, and needed to be generalized. (E. Farhi, J. Goldstone, and S. Gutmann, Ann. Phys. 192, 368 (1989)).

Even so, the critics are still picky about the construction. There is a line of criticism that infinite frequency operators can be constructed arbitrarily as functions over Hilbert space, and unless you already know the Born rule, you won't know how to construct one sensibly (so that the Hartle derivation is circular). However, this seems unfair, because if you want the relative frequency operator to obey the Kolmogorov axioms of probability then it has to coincide with the Born rule, something which is another long-standing result called Gleason's theorem. (The squared modulus of the amplitude is the *only* function of the amplitude that gives a measure following the axioms of probability.) Hence the full derivation is:

1) (Postulate) If the wavefunction is in an eigenstate of a measurement operator, then the measurement will with certainty have the corresponding eigenvalue.

2) (Postulate) Probability is relative frequency over infinitely many independent repetitions.

3) (Postulate) Relative frequency follows the Kolmogorov axioms of probability.

4) (Gleason's theorem) Relative frequency must converge to the Born rule (squared modulus of amplitude) over infinitely many repetitions, or it won't be able to follow the Kolmogorov axioms.

5) (Hartle's theorem, as strengthened by Farhi et al) There is a unique definition of the relative frequency operator over infinite repetitions, and such that the infinite product state is an eigenstate of the relative frequency operator.

6) (Conclusion) The relative frequency over infinitely many measurements is with certainty the Born probability.

It seems pretty clean to me.

## comment by Alsadius · 2012-07-01T21:42:10.797Z · LW(p) · GW(p)

Perhaps I'm being too simplistic, but I see a decent explanation that doesn't get as far into the weeds as some of the others. It's proportional to the square because both the event being observed *and the observer* need to be in the same universe. If the particle can be in A or B, the odds are:

P(A)&O(A) = A^2

P(B)&O(B) = B^2

P(A)&O(B) = Would be AB, but this is physically impossible.

P(B)&O(A) = Would be AB, but this is physically impossible.

Squares fall out naturally.

Replies from: pragmatist

## ↑ comment by pragmatist · 2012-07-01T22:06:45.895Z · LW(p) · GW(p)

There are a number of reasons this solution does not work. Here is one problem with the solution that does not require any discussion of the formalism or interpretation of quantum theory:

According to you, the location of the particle and the location of the observer are correlated (this follows from the fact that some combinations are physically impossible). If that's the case, you can't calculate the probability of the conjunction by multiplying the probabilities of the conjuncts. That only works if the conjuncts are uncorrelated.
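A tiny worked example makes the objection concrete. The joint distribution below is hypothetical (equal weight 1/2 on the two "agreeing" outcomes, chosen only for illustration), and `marginal` is a helper invented for this sketch. With perfectly correlated variables, the probability of the conjunction is not the product of the two marginals.

```python
from fractions import Fraction

# Hypothetical joint distribution over (particle location, observer location).
# The two always agree, as in the proposal being criticized.
joint = {
    ("A", "A"): Fraction(1, 2),
    ("B", "B"): Fraction(1, 2),
    ("A", "B"): Fraction(0),
    ("B", "A"): Fraction(0),
}

def marginal(joint, index, value):
    """Probability that component `index` of the outcome equals `value`."""
    return sum(p for outcome, p in joint.items() if outcome[index] == value)

p_particle_A = marginal(joint, 0, "A")
p_observer_A = marginal(joint, 1, "A")

print(joint[("A", "A")])            # actual probability of the conjunction
print(p_particle_A * p_observer_A)  # what multiplying the marginals would give
```

The two printed values differ, so multiplying the individual probabilities only recovers the joint probability when the variables are independent.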

More broadly, based on what you propose here I don't think you have sufficient understanding of quantum mechanics to fully appreciate the nature of the problem or the kind of solution that would be required. Your comment suggests several fairly fundamental misunderstandings about the theory. I hope this doesn't come off as impolite or condescending. It's the kind of thing I'd want someone to say to me if they genuinely believed it (although that in itself doesn't entail that it isn't impolite or condescending).

Replies from: Alsadius

## ↑ comment by Alsadius · 2012-07-04T22:48:01.179Z · LW(p) · GW(p)

I didn't expect something that simple had escaped everyone's notice (though I suppose I should have said that more explicitly in my post) - I threw it out there because it made sense at first glance and had no immediately obvious problems, not because I figured I had definitely cracked the problem. Easier to see if there's a known response than to try to figure it out myself. So no, I'm not annoyed by your response.

And I do think I see what you're getting at. Oh well, it was worth a shot.