# A note on the description complexity of physical theories

post by cousin_it · 2010-11-09T16:25:21.186Z · score: 19 (30 votes) · LW · GW · Legacy · 184 comments

**Followup to:** The prior of a hypothesis does not depend on its complexity

Eliezer wrote:

In physics, you can get absolutely clear-cut issues. Not in the sense that the issues are trivial to explain. [...] But when I say "macroscopic decoherence is simpler than collapse" it is actually *strict* simplicity; you could write the two hypotheses out as computer programs and count the lines of code.

Every once in a while I come across some belief in my mind that clearly originated from someone smart, like Eliezer, and stayed unexamined because after you hear and check 100 correct statements from someone, you're not about to check the 101st quite as thoroughly. The above quote is one of those beliefs. In this post I'll try to look at it closer and see what it really means.

Imagine you have a physical theory, expressed as a computer program that generates predictions. A natural way to define the Kolmogorov complexity of that theory is to find the length of the shortest computer program that *generates* your program, as a string of bits. Under this very natural definition, the many-worlds interpretation of quantum mechanics is almost certainly simpler than the Copenhagen interpretation.

But imagine you refactor your prediction-generating program and make it shorter; does this mean the physical theory has become simpler? Note that after some innocuous refactorings of a program expressing some physical theory in a recognizable form, you may end up with a program that expresses a *different* set of physical concepts. For example, if you take a program that calculates classical mechanics in the Lagrangian formalism, and apply multiple behavior-preserving changes, you may end up with a program whose internal structures look distinctly Hamiltonian.
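A minimal sketch of this refactoring point, under loud assumptions: the system is a toy unit-mass harmonic oscillator, and the function names `lagrangian_style` and `hamiltonian_style` are labels invented for this illustration. The first routine tracks positions only, through the second-order recurrence that a discrete Euler-Lagrange equation gives; the second tracks a phase-space pair `(q, p)` with kick-drift-kick updates. Exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction


def force(x):
    # Assumed toy system: unit mass, unit stiffness, so F(x) = -x.
    return -x


def lagrangian_style(x0, v0, dt, steps):
    """Positions only: second-order recurrence in x (discrete Euler-Lagrange form)."""
    xs = [x0, x0 + dt * v0 + dt * dt * force(x0) / 2]
    for _ in range(steps - 1):
        xs.append(2 * xs[-1] - xs[-2] + dt * dt * force(xs[-1]))
    return xs


def hamiltonian_style(q0, p0, dt, steps):
    """Phase-space state (q, p): half kick, drift, half kick."""
    q, p, qs = q0, p0, [q0]
    for _ in range(steps):
        p_half = p + dt * force(q) / 2
        q = q + dt * p_half
        p = p_half + dt * force(q) / 2
        qs.append(q)
    return qs


dt = Fraction(1, 10)
positions_l = lagrangian_style(Fraction(1), Fraction(0), dt, 20)
positions_h = hamiltonian_style(Fraction(1), Fraction(0), dt, 20)
identical_predictions = positions_l == positions_h
```

The two routines emit the same position sequence even though one has no momentum variable at all, which is the sense in which a behavior-preserving rewrite can change the concepts a program appears to express.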

Therein lies the rub. Do we really want a definition of "complexity of physical theories" that tells apart theories making the same predictions? If our formalism says Hamiltonian mechanics has a higher prior probability than Lagrangian mechanics, which is demonstrably mathematically equivalent to it, something's gone horribly wrong somewhere. And do we even want to define "complexity" for physical theories that don't make any predictions at all, like "glarble flargle" or "there's a cake just outside the universe"?

At this point, the required fix to our original definition should be obvious: cut out the middleman! Instead of finding the shortest algorithm that writes your *algorithm* for you, find the shortest algorithm that outputs the same *predictions*. This new definition has many desirable properties: it's invariant to refactorings, doesn't discriminate between equivalent formulations of classical mechanics, and refuses to specify a prior for something you can never ever test by observation. Clearly we're on the right track here, and the original definition was just an easily fixable mistake.
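The amended definition can be sketched in a few lines, with a big caveat: true Kolmogorov complexity is uncomputable, so this toy minimises over a small hand-picked pool of candidate programs and measures length in characters of Python source. The pool `CANDIDATES` and the helper names are hypothetical, chosen only for illustration.

```python
def run(source: str) -> tuple:
    """Execute a candidate program and return the predictions it stores in `out`."""
    env: dict = {}
    exec(source, env)
    return tuple(env["out"])


# Hypothetical candidates. The first two are refactorings of each other.
CANDIDATES = {
    "closed_form": "out = [n * n for n in range(8)]",
    "running_sum": (
        "out = []\ns = 0\nfor n in range(8):\n"
        "    out.append(s)\n    s += 2 * n + 1"
    ),
    "other_theory": "out = [2 * n for n in range(8)]",
}


def prediction_complexity(predictions: tuple) -> int:
    """Length of the shortest candidate whose output equals `predictions`."""
    lengths = [len(src) for src in CANDIDATES.values() if run(src) == predictions]
    return min(lengths)


squares = run(CANDIDATES["closed_form"])
same_predictions = run(CANDIDATES["running_sum"]) == squares
```

Because the score is attached to the output rather than to any one source text, `closed_form` and `running_sum` receive the same complexity, and a "theory" that outputs nothing testable never enters the comparison.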

But this easily fixable mistake... was the entire reason for Eliezer "choosing Bayes over Science" and urging us to do the same. The many-worlds interpretation makes the same testable predictions as the Copenhagen interpretation right now. Therefore by the amended definition of "complexity", by the *right and proper* definition, they are equally complex. The truth of the matter is not that they express different hypotheses with equal prior probability - it's that they express the *same* hypothesis. I'll be the first to agree that there are very good reasons to prefer the MWI formulation, like its pedagogical simplicity and beauty, but K-complexity is not one of them. And there may even be good reasons to pledge your allegiance to Bayes over the scientific method, but this is not one of them either.

**ETA:** now I see that, while the post is kinda technically correct, it's horribly confused on some levels. See the comments by Daniel_Burfoot and JGWeissman. I'll write an explanation in the discussion area.

**ETA 2:** done, look here.

## 184 comments


MWI and Copenhagen do not make the same predictions in all cases, just in testable ones. There is a simple program that makes the same predictions as MWI in all cases. There appears to be no comparably simple program that makes the same predictions as Copenhagen in all cases. So, if you gave me some complicated test which could not be carried out today, but on which the predictions of MWI and Copenhagen differed, and asked me to make a prediction about what would happen if the experiment was somehow run (it seems likely that such experiments will be possible at some point in the extremely distant future) I would predict that MWI will be correct with overwhelming probability. I agree that if some other "more complicated" theory made the same predictions as MWI in every case, then K-complexity would not give good grounds to decide between them.

I guess the fundamental disagreement is that you think MWI and Copenhagen are the same theory because discriminating between them is right now far out of reach. But I think the existence of any situation where they make different predictions is precisely sufficient to consider them different theories. I don't know why "testable" (meaning testable in practice, not in theory) was thrown in at the last minute, because it does not seem to appear anywhere in the rest of the post.

If instead you are asserting that MWI and Copenhagen make the same theoretically testable predictions, then I disagree as a matter of fact. MWI asserts that interference should be able to occur on arbitrary scales, in particular on the scale of an entire planet or galaxy (even though such interference is spectacularly difficult to engineer and/or will have a very small effect on probability amplitudes), while Copenhagen seems to imply that it cannot occur on any scale larger than a human observer.

I wouldn't be surprised if I am wrong on that question of fact, and it would certainly be good for me to fix my error now if I am.

Copenhagen seems to imply that it [interference] cannot occur on any scale larger than a human observer.

I'm always skeptical when the opponent of an idea informs me of something ridiculous that the idea "seems to imply". If it were a defender of Copenhagen drawing that implication, I would be more likely to join you in hooting with disdain.

I am far from an expert on fundamental physics, but I seem to recall someone once pooh-poohing the notion that QM and Copenhagen are in any sense tied to *human* observers. After all, said this author, we use QM to explain the big bang and there were no observers there. Sorry, I don't remember who it was that made that point, so I can't give you a quote.

Does anyone else remember this better than me?

I say "seems to imply" because it's not really clear what Copenhagen does or does not imply, because it doesn't really make predictions in every circumstance.

Copenhagen implies that when we make a measurement, the value we measure is in fact the one true value. In particular, there are not other worlds where the measurement returned other values. This is precisely the thing that distinguishes it from many worlds, which suggests that there are other worlds where the measurement returned other values.

By accepting that only one value of the measurement actually happens, you reject the possibility of one human civilization, where one thing was measured, interfering with a different human civilization, where a different thing was measured (because you don't believe the other human civilization even exists).

That is not a prediction, it is a postdiction, and it is a postdiction that Copenhagen and MWI agree on. And that is, when you make a measurement, the result you get is the only result you get in the world that you are in. Copenhagen and MWI agree on this.

MWI implies that you could interact with other worlds where there was a different outcome, at least in principle. This is not a postdiction because we have never engineered such an intricate situation, and MWI and Copenhagen don't agree on it. In fact this was proposed as a test of MWI vs. Copenhagen: build a quantum computer, upload a human into it, and then run the experiment I described above (to my knowledge, this is actually the first proposed use of a quantum computer).

Actually this isn't really an "in principle" thing. If we ignore gravity (I suspect that if we understood gravity correctly we wouldn't have to) and assume the universe has finitely many degrees of freedom, MWI predicts that all of these world lines will eventually converge in the future. It will just be a long time in the future, after all order in the universe has eroded and entropy is decreasing again. This is clearly not what Copenhagen says. In Copenhagen, it is possible that we never again return to the state of the universe at the big bang. In MWI this is not possible, because all systems with finitely many degrees of freedom are periodic.
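The recurrence claim can be illustrated with a toy calculation, assuming units with ħ = 1 and a made-up three-level system; `ENERGIES`, `WEIGHTS`, and `return_probability` are names invented for this sketch. For a state with populations `WEIGHTS` on the energy levels, the probability of finding the system back in its initial state at time `t` is the squared magnitude of the sum of weights times phase factors. With commensurate energies the return is exact; with incommensurate energies a finite-dimensional system returns only arbitrarily close to its starting state, so "periodic" is best read as "quasi-periodic" in general.

```python
import cmath
import math


def return_probability(energies, weights, t):
    """|<psi(0)|psi(t)>|^2 for populations `weights` on levels `energies` (hbar = 1)."""
    amplitude = sum(w * cmath.exp(-1j * e * t) for e, w in zip(energies, weights))
    return abs(amplitude) ** 2


ENERGIES = [1.0, 2.0, 3.0]        # hypothetical, commensurate level energies
WEIGHTS = [1 / 3, 1 / 3, 1 / 3]   # equal populations, summing to 1

at_start = return_probability(ENERGIES, WEIGHTS, 0.0)
at_half_period = return_probability(ENERGIES, WEIGHTS, math.pi)
at_full_period = return_probability(ENERGIES, WEIGHTS, 2 * math.pi)
```

Here the state has mostly left its starting configuration at `t = π` and is back at `t = 2π`, under purely unitary evolution with no collapse step anywhere in the calculation.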

Copenhagen seems to imply that it [interference] cannot occur on any scale larger than a human observer.

[...]

I am far from an expert on fundamental physics, but I seem to recall someone once pooh-poohing the notion that QM and Copenhagen are in any sense tied to human observers

Copenhagen implies that *under some circumstances*, interference stops. That's all that can be meant by "collapse". Maybe above some length scale; maybe above some critical mass; maybe above some number of interacting particles -- it's fuzzy on the details. And of course, if that scale happens to be *larger* than, oh, say, a person, then you are branching *and then having your branches destroyed* all the time.

So yes, if *everything* is allowed to interfere as naively implied by the Schrödinger equation, you're not talking about Copenhagen, you're talking about MWI.

Copenhagen implies that under some circumstances, a non-deterministic and non-continuous process happens. (I find the phrase "interference stops" misleading.) The circumstances aren't defined by some scale, but, as Manfred says in the other reply, by what "observer" means. There is a completely analogous question in MWI, where, under some circumstances, branching occurs.

Another thing is, if *Copenhagen = objective collapse* in the LW parlance, then MWI isn't the only alternative to Copenhagen.

What is the "analogous question" in MWI? I can make predictions in any situation using MWI without answering any such questions, whereas in Copenhagen I make different predictions depending on what notion of "observation" I use. This is one major reason I prefer MWI.

The analogous question is "when does branching occur?" Your predictions in MWI depend on what notion of "observation" you use, it is only less apparent in the formulation. To obtain some meaningful prediction, at least you have to specify what are the observables and what subsystem of the world corresponds to the observer and his mind states.

But I am not completely sure what you are speaking about. Maybe you can give a concrete example where MWI gives a unique answer while collapse formulation doesn't?

Branching is not a physical phenomenon in MWI, it is a way humans talk about normal unitary evolution on large scales. It is not involved in making predictions, just in talking about them.

The typical example distinguishing MWI and Copenhagen is the following. Suppose I build a quantum computer which simulates a human. I then perform the following experiment. I send an electron through a slit, and have the quantum computer measure which slit it went through (that is, I tell the human being simulated which slit it went through). I then stop the electron, and let the simulated human contemplate his observation for a while. Afterwards, it is still possible for me to "uncompute" the simulated human's memory (just as it is in principle possible to uncompute a real human's state) and make him forget which slit the electron went through. The electron then proceeds to the screen and hits it. Is the electron distributed according to an interference pattern, or not?

If you think the answer to that question is obvious in Copenhagen, because the simulated human is obviously not an observer, then suppose instead that I replace the simulated human with a real human, maintained in such a carefully controlled environment that I can uncompute his observation (technically unrealistic, but theoretically perfectly possible).

If the answer to that question is also obvious, suppose I replace the real human with the entire planet earth.

MWI predicts an interference pattern in all of these cases. However, Copenhagen's prediction seems to depend on exactly which of those experiments have "observation" in them. Can a quantum computer observe? Can a single isolated human observe? Can an isolated planet observe? Does collapse occur precisely when "there is no possible way to uncompute the result"? The last would give the same predictions as MWI by design, but is really an astoundingly complex axiom and probably is never satisfied.
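A drastically simplified sketch of the uncompute experiment, assuming a one-qubit "observer": the electron's path is one qubit, the observer's memory is a second qubit, the slits and the recombination at the screen are modelled as Hadamard gates, and recording is a CNOT. All function names are invented for this illustration, and the code performs only the unitary calculation; it does not model a collapse postulate.

```python
import math


def hadamard_on_system(state):
    """Apply a Hadamard gate to the system qubit; index = 2 * system + memory."""
    r = 1 / math.sqrt(2)
    new = [0.0] * 4
    for m in (0, 1):
        a0, a1 = state[m], state[2 + m]
        new[m] = r * (a0 + a1)
        new[2 + m] = r * (a0 - a1)
    return new


def cnot_system_to_memory(state):
    """Flip the memory qubit when the system qubit is 1 (swaps |10> and |11>)."""
    return [state[0], state[1], state[3], state[2]]


def prob_system_zero(record, uncompute):
    """Probability that the system qubit ends in 0 after recombining the paths."""
    state = [1.0, 0.0, 0.0, 0.0]            # system = 0, memory = 0
    state = hadamard_on_system(state)        # superposition over both paths
    if record:
        state = cnot_system_to_memory(state)  # observer learns the path
    if record and uncompute:
        state = cnot_system_to_memory(state)  # observer's memory is erased
    state = hadamard_on_system(state)        # recombine the paths
    return abs(state[0]) ** 2 + abs(state[1]) ** 2
```

In this toy, `prob_system_zero(False, False)` and `prob_system_zero(True, True)` both come out at 1 (full interference), while `prob_system_zero(True, False)` gives 0.5 (no interference). A theory that inserts a collapse at the recording step would have to predict 0.5 even after the attempted erasure, which is where the two accounts would come apart.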

It's not about scale, it's about theory of measurement and what "observer" means. If electron 1 bounces off electron 2 in Copenhagen, electron 1 sees electron 2 as "collapsed" into one eigenstate. If electrons bounce in MWI, they see the entire spectrum of each other. However, this is really just a change in what we're looking at - whether we want to "observe" all the information about the electron (MWI), or just one instance of that information (Copenhagen). The reason to look at only one instance is because this is what corresponds to what people see - it's what quantum physics looks like from inside.

I'm not aware of any examples of interference that are not explainable by the ordinary interpretation that uses collapse - I think it's likely that some people are interchanging the different ideas of observer and not remembering that to describe entangled states in Copenhagen you need to do more work than that. Once a state is entangled you can't describe a single (Copenhagen) observer by a pure state, which is probably what you're thinking when you think of "destroying the other branches."

What you are describing is to first compute the answer using Many Worlds, and then figure out where to apply the collapse in Copenhagen to not affect anything.

No.

What I am saying is to compute the answer using quantum mechanics.

The way to do it correctly, Copenhagen style, is to say "okay, the electron goes through the plate with two holes in it. But since, from the perspective of the electron, it can't go through two holes, the state of the electron on the other side should be entangled something like |10> + |01>. If we fast-forward to the screen, we get an interference pattern."

The way to do it correctly, MW style, is to say "okay, the electron has equal probability of going through each hole, so let's split into two worlds with equal phase. The detector will then observe the superposition of the two worlds, something like |10> + |01>, except fast-forwarded to the screen, so there should be an interference pattern."

If these two approaches look similar, there's a reason. And it's not that one is cribbing off the other! As you can see, introducing entanglement in the Copenhagen interpretation was definitely not arbitrary, but it is *conceptually trickier* than thinking through the same process using the MWI.

Does your understanding of Copenhagen Quantum Mechanics reject the conclusion of Many Worlds, that the universe is in superposition of many states, many of which can contain people, which can't observe each other?

If not, I think this has become an argument about definitions.

I'm actually pretty sure the Copenhagen Interpretation isn't complete/coherent enough to actually be turned into a computer program. It just waves its hands around the Measurement Problem. The Occam's Razor justification people want to make for Many Worlds needs to be made in comparison to the interpretation's viable competitors, like de Broglie-Bohm and company.

I'm actually pretty sure the Copenhagen Interpretation isn't complete/coherent enough to actually be turned into a computer program. It just waves its hands around the Measurement Problem.

I can't argue with that.

The Occam's Razor justification people want to make for Many Worlds needs to be made in comparison to the interpretation's viable competitors, like de Broglie-Bohm and company.

Bohm's theory is one of those hidden variable theories, which according to EPR must have something like faster than light signaling?

Bohm's theory is one of those hidden variable theories, which according to EPR must have something like faster than light signaling?

It is a hidden variable theory, and as such it is non-local, but the non-locality doesn't imply that we can use it for FTL signalling. The main problems are (a) you need to do weird things to it to make it Lorentz invariant and (b) it is less parsimonious than Many Worlds (as all hidden variable theories probably will be, since they add more variables!). On the other hand, it returns the Born probabilities (ED: which I guess I would argue makes it parsimonious in a different way, since it doesn't have this added postulate).

I don't really know enough to make the evaluation for myself. But my sense is that we as a community have done way too much talking about why MWI is better than CI (it obviously is) and not nearly enough thinking about the other alternatives.

Does your understanding of Copenhagen Quantum Mechanics reject the conclusion [...] that the universe is in superposition of many states?

Yes, it does. The Copenhagen interpretation says that when you observe the universe, your observation becomes right, and your model of the world should make a 100% certain retrodiction about what just happened. This is mathematically equivalent to letting an MWI modeler know which world (or set of worlds with the same eigenstate of the observable) they're in at some time.

However, in Copenhagen, the universe you observe is all there "is." If I observe the electron with spin up, there is no other me that observes it with spin down. The probabilities in Copenhagen are more Bayesian than frequentist. Meanwhile in MWI the probabilities are frequencies of "actual" people measuring the electron. But since there is no such thing as an outside observer of the universe (that's the point), the difference here doesn't necessarily mean this *isn't* an argument about definitions. :P

Your Copenhagen Interpretation looks like starting with Many Worlds, and then rejecting the implied invisible worlds as an additional assumption about reality.

My Copenhagen interpretation (the one I use to demonstrate ideas about the Copenhagen interpretation, not necessarily the interpretation I use when thinking about problems) looks like the Copenhagen Interpretation. And yes, it is close to what you said. But it's not quite that simple, since all the math is preserved because of stuff like entanglement.

Oh, I'm all in favor of MWI. I just don't think we should claim that it makes different predictions from Copenhagen based simply on our scorn for Copenhagen.

Surely that isn't the reason:

Very good point about large-scale interference. If it's true, it makes me update in favor of MWI.

while Copenhagen seems to imply that it cannot occur on any scale larger than a human observer

Copenhagen doesn't imply that. The collapse happens as a result of interaction between the observer and the observed system, which can be an atom or an entire galaxy.

Copenhagen doesn't imply that. The collapse happens as a result of interaction between the observer and the observed system, which can be an atom or an entire galaxy.

It has been my experience that there is no consensus amongst professed supporters of the Copenhagen Interpretation about what causes collapse, whether an observer is involved, and what an observer is. Given that, I might handle this "interpretation" by letting it split the probability between the possibilities that result from different concepts of collapse, and then note that it assigns less probability than Many Worlds to the actual outcome.

But, to avoid being unfair to your particular understanding of collapse, what does prase::Copenhagen say an observer is?

what does prase::Copenhagen say an observer is?

prase::Copenhagen::observer is probably a fundamental entity, not definable without use of "observe", "observable" or perhaps "consciousness". (In fact, prase::Copenhagen doesn't necessarily imply that the collapse is real rather than an effective practical way to describe reality; the above holds for those variants of Copenhagen which insist on the existence of an objective collapse.)

If interaction with a human, under the conditions normally present in a laboratory, is sufficient to prevent interference, then I see no sensible interpretation where two worlds full of human observers are not prevented from interfering. Perhaps I should have been more clear---by larger, I meant a scale which includes human observers, such as the entire earth, not just one which is much larger than a human.

I don't understand your first sentence, and can't even say specifically why. What do you mean by an interpretation where human observers are prevented from interfering?

On the other hand, I would agree that if the collapse is objective, then we should be able to detect collapse induced by observers other than ourselves and experimentally tell apart observers from non-observers. But the standard use of "Copenhagen" doesn't imply objective collapse, see e.g. Wikipedia.

Suppose a scientist measures some qubit's state to be 0. My understanding is that, whatever version of Copenhagen you adhere to, you no longer believe that there is another version of the scientist somewhere who has measured 1. Maybe this is wrong because of the distinction between objective and subjective collapse, but then I have absolutely no idea what distinguishes Copenhagen from many worlds. In particular, I assume that Copenhagen implies that there aren't many worlds.

Now, assume that after observing this 0, the scientist told the entire world and influenced the course of world events, having a significant effect on billions of human observers. According to my understanding, Copenhagen says there is only a single version of earth which actually exists---the one in which the scientist observed 0 and told everyone about it.

According to many worlds, there are multiple versions of earth---one in which the scientist observed 0, and one in which the scientist observed 1. Many worlds says that it is possible for these different versions of earth to interfere with each other, in exactly the same way that the worlds where the electron went through the left slit and where the electron went through the right slit can interfere. However, because the earth is chock full of physical observers to measure which state the earth is in, Copenhagen seems to say that there is only one version of the earth and so there certainly can't be any interference.

My understanding is that, whatever version of Copenhagen you adhere to, you no longer believe that there is another version of the scientist somewhere who has measured 1.

Depends. From the outside observer's point of view, he can be in a superposition of [has measured 0] and [has measured 1]. From the scientist's point of view, the collapse has happened.

...but then I have absolutely no idea what distinguishes Copenhagen from many worlds.

That's why it's called "interpretation". It's the way we speak about it, and some untestable statements about consciousness, perhaps with some philosophical implications, which make the whole difference. Of course, an objective collapse is a different thing, but I don't believe many Copenhagenists today believe in an objective collapse.

According to many worlds, there are multiple versions of earth---one in which the scientist observed 0, and one in which the scientist observed 1.

Such a statement can be misleading. There is one version of Earth, but the individual observers see only certain projections. The difference between MWI and single-world interpretations is that MWI says that all projections are experienced.

Do we really want a definition of "complexity of physical theories" that tells apart theories making the same predictions?

Yes. As you said, simpler theories have certain advantages over complex theories, such as the possibility of a deeper understanding of what's going on. Of course, in that case we shouldn't exactly optimize the K-complexity of their presentation; we should optimize an informal notion of simplicity or ease of understanding. But complexity of specification is probably useful evidence for those other metrics that are actually useful.

The error related to your preceding post would be to talk about varying probability of differently presented equivalent theories, but I don't remember that happening.

Yeah, I guess the preceding post needs some obvious amendments in light of this post (though the general point still stands). I hope people are smart enough to see them anyway.

I just don't understand what sense it makes for a perfect Bayesian to distinguish between equivalent theories. Is it still honestly about "degrees of belief", or is it now about those other informal properties that you list?

I just don't understand what sense it makes for a perfect Bayesian to distinguish between equivalent theories.

No sense. It's a correct thing to do if depth of understanding of these theories is valuable and one is not logically omnipotent, but using complexity-leading-to-improbability to justify this principle would be cargo cult Bayesianism.

The prior probability of a simple explanation is inherently greater than the prior probability of a complex explanation.

If all evidence/observation confirms both explanations equally, then the simple explanation is still in the lead, because it started out with a higher prior probability.

Do we really want a definition of "complexity of physical theories" that tells apart theories making the same predictions?

If you look at the definition of the Solomonoff prior, you'll notice that it's actually a weighted sum over many programs that produce the desired output. This means that a potentially large number of programs, corresponding in this case to different formulations of physics, combine to produce the final probability of the data set.

So what's really happening is that all formulations that produce identical predictions are effectively collapsed into an equivalence class, which has a higher probability than any individual formulation.
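A toy version of that weighted sum, as a sketch only: the real Solomonoff prior sums over all programs for a universal prefix machine, whereas this example uses four made-up entries. The program names, bit lengths, and outputs in `PROGRAMS` are hypothetical and are not measurements of any actual physical theory.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical (name, length in bits, output) triples, invented for illustration.
PROGRAMS = [
    ("formulation_A", 10, "data"),
    ("formulation_B", 12, "data"),
    ("formulation_C", 15, "data"),
    ("rival_theory", 11, "other"),
]


def class_weights(programs):
    """Sum 2**-length over all programs that produce the same output."""
    totals = defaultdict(Fraction)
    for _name, length, output in programs:
        totals[output] += Fraction(1, 2 ** length)
    return dict(totals)


weights = class_weights(PROGRAMS)

# Weight of the single shortest program that outputs "data".
best_single = max(
    Fraction(1, 2 ** length) for _name, length, out in PROGRAMS if out == "data"
)

# Share of the "data" class contributed by the shortest formulation.
share_of_shortest = best_single / weights["data"]
```

The class weight exceeds the weight of any one formulation, and the shortest formulation contributes most of it, which is the sense in which one formulation can be said to contribute more probability to the equivalence class than another.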

Yeah, I know. The post was about deconstructing Eliezer's argument in favor of MWI, not about breaking the Solomonoff prior.

Given that I already explained that it makes sense to say the MW formulation contributes more probability to the equivalency class than Collapse formulations, it seems that your deconstruction is deconstructed.

I'll be quite happy if, as a result of my post, popular opinion on LW shifts from thinking that "MWI deserves a higher prior degree of belief because it's simpler, which makes a clear-cut case for Bayes over Science" to thinking that "MWI contributes more probability to the equivalence class than Collapse formulations". Some people already have it clear, like you. Others don't.

OK, as long as we remember that MWI and Collapse Formulations are not really in the same equivalence class, and that the implied invisible can have implications on utility.

A note: I am pretty sure that paul's claim (that MWI predicts more interference than typical QM) is false, and comes from not considering entanglement (which is understandable, because entanglement is hard). For example, if you collapsed the wavefunction of an electron in the two slit experiment improperly (by not keeping the state entangled after going through the slits), you would predict no interference.

This "entanglement" is just Many Worlds applied to a subsystem rather than the whole universe. If you allow the entire universe to be involved in the entanglement, you are really talking about Many Worlds by another name. If you only allow subsystems to be entangled, you will make different predictions than Many Worlds.

If you allow the entire universe to be involved in the entanglement, you are really talking about Many Worlds by another name.

The other name being "quantum mechanics." :D

Yes, if the typical interpretation of QM said anything about not allowing n-particle entangled states, it would be inconsistent with the math of quantum mechanics. But it doesn't, so it isn't. (Note that some people have made their own interpretations that violate this, e.g. consciousness causes collapse. They were wrong.)

Your "fix" seems problematic too, if it doesn't allow belief in the implied invisible

Took me more than a day to parse your objection, and it seems to be valid and interesting. Thanks.

I'm not sure yet.

What does cousin_it mean by

find the shortest algorithm that outputs the same predictions.

Does this algorithm necessarily model the predictions (in any fashion), or just list them? If the predictions are being modeled -- then they'll either predict or not predict the implied invisible.

If the predictions are not being modeled -- I just don't see how you can get an algorithm to output the right list without an internal model.

This comment on this page is relevant... For example, I think I agree with this:

In this case, the look-up table is essentially the program-that-lists-the-results, and the algorithm is the shortest description of how to get them. The equivalence is because, in some kind of sense, process and results imply each other. In my mind, this is a bit like some kind of space-like-information and time-like-information equivalence, or like that between a hologram and the surface it's projected from.

**[deleted]** · 2010-11-09T17:15:34.674Z · score: 5 (7 votes)

"Therein lies the rub. Do we really want a definition of "complexity of physical theories" that tells apart theories making the same predictions?"

Yes.

"Evolution by natural selection occurs" and "God made the world and everything in it, but did so in such a way as to make it look *exactly* as if evolution by natural selection occurred" make the same predictions in all situations.

You can do perfectly good science with either hypothesis, but the latter postulates an extra entity - it's a less useful way of thinking about things precisely because it's more complex. It adds an extra cognitive load.

Selecting theories by their Kolmogorov complexity is just another way of saying we're using Occam's Razor. If you have two theories with the same explanatory power and making the same predictions, then you want to use the simpler one - not because it's more likely to be 'true', but because it allows you to think more clearly.

then you want to use the simpler one - not because it's more likely to be 'true', but because it allows you to think more clearly.

Congratulations, you have now officially broken with Bayesianism and become a heretic. Your degree of belief in (prior probability of) a hypothesis should not depend on how clearly it allows you to think. Surely you can imagine all manner of ugly scenarios if that were the case.

then you want to use the simpler one - not because it's more likely to be 'true', but because it allows you to think more clearly.

Congratulations, you have now officially broken with Bayesianism and become a heretic. Your degree of belief in (prior probability of) a hypothesis should not depend on how clearly it allows you to think. Surely you can imagine all manner of ugly scenarios if that were the case.

Preferring to *use* a simpler theory doesn't require believing it to be more probable than it is. Expected utility maximization to the rescue.

Never mind usefulness, it seems to me that "Evolution by natural selection occurs" and "God made the world and everything in it, but did so in such a way as to make it look exactly as if evolution by natural selection occurred" are *not* the same hypothesis, that one of them is true and one of them is false, that it is simplicity that leads us to say which is which, and that we do, indeed, prefer the simpler of two theories that make the same predictions, rather than calling them the same theory.

While my post was pretty misguided (I even wrote an apology for it), your comment looks even more misguided to me. In effect, you're saying that between Lagrangian and Hamiltonian mechanics, at most one can be "true". And you're also saying that which of them is "true" depends on the programming language we use to encode them. Are you sure you want to go there?

In effect, you're saying that between Lagrangian and Hamiltonian mechanics, at most one can be "true".

We may even be able to observe which one. Actually, I am pretty sure that if I looked closely at QM and these two formulations, I would go with Hamiltonian mechanics.

Ah, but which Hamiltonian mechanics is the true one: the one that says real numbers are infinite binary expansions, or the one that says real numbers are Dedekind cuts? I dunno, your way of thinking makes me queasy.

Sorry - I wrote an incorrect reply and deleted it. Let me think some more.

That point of view has far-reaching implications that make me uncomfortable. Consider two physical theories that are equivalent in every respect, except they use different definitions of real numbers. So they have a common part C, and theory A is the conjunction of C with "real numbers are Dedekind cuts", while theory B is the conjunction of C with "real numbers are infinite binary expansions". According to your and Eliezer's point of view as I understand it right now, at most one of the two theories can be "true". So if C (the common part) is "true", then ordinary logic tells us that at most one definition of the real numbers can be "true". Are you really, really sure you want to go there?

I think there's a distinction that should be made explicit between "a theory" and "our human mental model of a theory." The theory is the same, but we rightfully try to interpret it in the simplest possible way, to make it clearer to think about.

Usually, two different mental models necessarily imply two different theories, so it's easy to conflate the two, but sometimes (in mathematics especially) that's just not true.

Hmmm. But the very first posting in the sequences says something about "making your beliefs pay rent in expected experience". If you don't expect different experiences in choosing between the theories, it seems that you are making an unfalsifiable claim.

I'm not totally convinced that the two theories *do not* make different predictions in some sense. The evolution theory pretty much predicts that we are not going to see a Rapture any time soon, whereas the God theory leaves the question open. Not exactly "different predictions", but something close.

Both theories are trying to pay rent on the same house; that's the problem here, which is quite distinct from neither theory paying rent at all.

Clever. But ...

If theories A and B pay rent on the same house, then the theory (A OR B) pays enough rent so that the stronger theory A need pay no additional rent at all. Yet you seem to prefer A to B, and also to (A OR B).

(A OR B) is more probable than A, but if A is much more probable than B, then saying "(A OR B)" instead of "A" is leaving out information.

Let's say A = (MWI is correct) and B = (Copenhagen)

The equivalent of "A OR B" is the statement "either Copenhagen or MWI is correct", and I'm sure everyone here assigns "A OR B" a higher prior than either A or B separately.

But that's not really a theory, it's a disjunction between two different theories, so of course we want to understand which of the two is actually the correct one. Not sure what your objection is here.

EDITED to correct a wrong term.

Not sure what your objection is here.

I'm not sure I have one. It is just a little puzzling how we might reconcile two things:

- EY's very attractive intuition that of two theories making the same predictions, one is true and the other ... what? False? Wrong? Well, ... "not quite so true".
- The tradition in Bayesianism and standard rationality (and logical positivism, for that matter) that the *truth* of a statement is to be found through its observable consequences.

ETA: Bayes's rule only deals with the fraction of reality-space spanned by a sentence, never with the number of characters needed to express the sentence.

There's a useful heuristic to solve tricky questions about "truths" and "beliefs": reduce them to questions about decisions and utilities. For example, the Sleeping Beauty problem is very puzzling if you insist on thinking in terms of subjective probabilities, but becomes trivial once you introduce any payoff structure. Maybe we could apply this heuristic here? Believing in one formulation of a theory over a different equivalent formulation isn't likely to win a Bayesian reasoner many dollars, no matter what observations come in.
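To make the payoff heuristic concrete, here is a minimal sketch for Sleeping Beauty. Everything in it is an assumption chosen for illustration and not something specified in the thread: the ticket pays 1 if the coin landed tails, the coin is fair, heads means one awakening and tails means two, and the hypothetical `expected_profit` function compares settling the bet at every awakening with settling it once per experiment.

```python
from fractions import Fraction

def expected_profit(price, settle_every_awakening):
    """Expected profit from buying a ticket that pays 1 if the coin was tails.

    Assumed setup: fair coin, heads -> 1 awakening, tails -> 2 awakenings.
    """
    half = Fraction(1, 2)
    bets_if_heads = 1
    bets_if_tails = 2 if settle_every_awakening else 1
    lose = -price        # ticket bought, coin was heads
    win = 1 - price      # ticket bought, coin was tails
    return half * bets_if_heads * lose + half * bets_if_tails * win

# Break-even ticket prices under the two payoff structures.
per_awakening = expected_profit(Fraction(2, 3), settle_every_awakening=True)
per_experiment = expected_profit(Fraction(1, 2), settle_every_awakening=False)
print(per_awakening, per_experiment)
```

Under these assumptions the break-even price is 2/3 when every awakening is settled and 1/2 when the experiment is settled once, so the "right" number falls out of the payoff structure without any argument about subjective probability.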

Believing in one formulation of a theory over a different equivalent formulation isn't likely to win a Bayesian reasoner many dollars, no matter what observations come in.

Actually, it might help a reasoner saddled with *bounded rationality*. One theory might require less computation to get from theory to prediction, or it might require less memory resources to store. Having a fast, easy-to-use theory can be like money in the bank to someone who needs lots and lots of predictions.

It might be interesting to look at that idea someone here was talking about that merged ideas from Zadeh's fuzzy logic with Bayesianism. Instead of simple Bayesian probabilities which can be updated instantaneously, we may need to think of fuzzy probabilities which grow sharper as we devote cognitive resources to refining them. But with a good, simple theory we can get a sharper picture quicker.

I don't understand your point about bounded rationality. If you know theory X is equivalent to theory Y, you can believe in X more, but use Y for calculations.

That's the definition of a free-floating belief, isn't it? If you only have so much computational resources, even storing theory X in your memory is a waste of space.

I think cousin_it's point was that if you have a preference for both quickly solving problems *and* knowing the true nature of things, then if theory X tells you the true nature of things but theory Y is a hackjob approximation that nevertheless gives you the answer you need much faster (in computer terms, say, a simulation of the actual event vs a monte-carlo run with the probabilities just plugged in) then it might be positive utility even under bounded rationality to keep both theory X and theory Y.

(edit: the assumption is that we have at least mild preferences for both and the bounds on our rationality are sufficiently high that this is the preferred option for most of science.)

It's one thing if you want to calculate a theory that is simpler because you don't have a need for perfect accuracy. Newton is good enough for a large fraction of physics calculations and so even though it is strictly wrong I imagine most reasoners would have need to keep it handy because it is simpler. But if you have two empirically equivalent and complete theories X and Y, and X is computationally simpler so you rely on X for calculating predictions, it seems to me *you believe x*. What would saying "No, actually I believe in Y not X" even mean in this context? The statement is unconnected to anticipated experience and any conceivable payoff structure.

Better yet, taboo "belief". Say you are an agent with a program that allows you to calculate, based on your observations, what your observations will be in the future contingent on various actions. You have another program that ranks those futures according to a utility function. What would it mean to add "belief" to this picture?
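A minimal sketch of the agent described above, with every name and number hypothetical: two predictor programs that are written differently but give the same outputs, one utility program, and a chooser. It is only meant to show that the choice of action cannot depend on which of the two equivalent predictors is plugged in.

```python
def predict_closed_form(observation, action):
    """Predict the next observation with a one-line formula."""
    return observation + 2 * action

def predict_by_iteration(observation, action):
    """Same predictions as above, computed step by step."""
    future = observation
    for _ in range(action):
        future += 2
    return future

def utility(future):
    """Rank futures: closer to 7 is better (an arbitrary toy preference)."""
    return -abs(future - 7)

def choose(predict, observation, actions):
    """Pick the action whose predicted future has the highest utility."""
    return max(actions, key=lambda a: utility(predict(observation, a)))

actions = range(6)
choice_a = choose(predict_closed_form, 1, actions)
choice_b = choose(predict_by_iteration, 1, actions)
print(choice_a, choice_b)
```

Both calls select the same action, so in this toy picture there is no place where "believing" one predictor over the other would show up in behaviour.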

Your first paragraph looks misguided to me: does it imply we should "believe" matrix multiplication is *defined* by the naive algorithm for small n, and the Strassen and Coppersmith-Winograd algorithms for larger values of n? Your second paragraph, on the other hand, makes exactly the point I was trying to make in the original post: we can assign degrees of belief to *equivalence classes* of theories that give the same observable predictions.

For example, the Sleeping Beauty problem is very puzzling if you insist on thinking in terms of subjective probabilities, but becomes completely clear once you introduce a payoff structure.

Heh, I was just working on a post on that point.

Believing in one formulation of a theory over a different equivalent formulation isn't likely to win a Bayesian reasoner many dollars, no matter what observations come in. Therefore the reasoner should assign degrees of belief to equivalence classes of theories rather than individual theories.

I agree that that is true about *equivalent* formulations, literally isomorphic theories (as in this comment), but is that really the case about MWI vs. Copenhagen? Collapse is claimed as something that's actually happening out there in reality, not just as another way of looking at the same thing. Doesn't it have to be evaluated as a hypothesis on its own, such that the conjunction (MWI & Collapse) is necessarily less probable than just MWI?

Except the whole quantum suicide thing does create payoff structures. In determining whether or not to play a game of Quantum Russian Roulette you take your estimated winnings for playing if MWI and Quantum immortality are true and your estimated winnings if MWI or Quantum immortality is false and weigh them according to the probability you assign each theory.

(ETA: But this seems to be a quirky feature of QM interpretation, not a feature of empirically equivalent theories generally.)

(ETA 2: And it is a quirky feature of QM interpretation because MWI+Quantum Immortality is empirically equivalent to single world theories in a really quirky way.)

IMO quantum suicide/immortality is so mysterious that it can't support *any* definite conclusions about the topic we're discussing. I'm beginning to view it as a sort of thread-killer, like "consciousness". See a comment that mentions QI, collapse the whole thread because you know it's not gonna make you happier.

EY's very attractive intuition that of two theories making the same predictions, one is true and the other ... what? False? Wrong? Well, ... "not quite so true".

"More Wrong". :)

I can think of two circumstances under which two theories would make the same predictions (that is, where they'd *systematically* make the same predictions, under all possible circumstances under which they could be called upon to do so):

- They are mathematically isomorphic — in this case I would say they are the same theory.
- They contain isomorphic substructures that are responsible for the identical predictions. In this case, the part outside what's needed to *actually generate* the predictions counts as extra detail, and by the conjunction rule, this reduces the probability of the "outer" hypothesis.

The latter is where collapse vs. MWI falls, and where "we don't know why the fundamental laws of physics are what they are" vs. "God designed the fundamental laws of physics, and we don't know why there's a God" falls, etc.

The tradition in Bayesianism and standard rationality (and logical positivism, for that matter) that the truth of a statement is to be found through its observable consequences.

Since when is that the Bayesian tradition? Citation needed.

the truth of a statement is to be found through its observable consequences.

Since when?

Well, I guess I am taking "observable consequences" to be something closely related to P(E|H)/P(E). And I am taking "the truth of a statement" to have something to do with P(H|E) adjusted for any bias that might have been present in the prior P(H).
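For reference, the relation between the two quantities mentioned above is just Bayes' rule, with the ratio P(E|H)/P(E) acting as the factor that turns the prior into the posterior:

$$
P(H \mid E) = P(H) \cdot \frac{P(E \mid H)}{P(E)}
$$

Two hypotheses that assign the same P(E|H) to every possible observation E get multiplied by the same factor every time, so their posterior ratio stays equal to their prior ratio.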

I'm afraid this explanation is all the citation I can offer. I would be happy to hear your opinion along the lines of "That ain't 'truth'. 'Truth' is to a Bayesian"

Observable consequences are part of what controls the *plausibility* of a statement, but not its truth. An unobservable truth can still be a truth. Things outside our past light cone exist despite being unobservable. Asking about a claim about some unobservable "Then how can we know whether it's true?" is irrelevant to evaluating whether it is the *sort of thing that could be a truth* because we're not talking about ourselves. Confusing truths with beliefs — even carefully-acquired accurate beliefs — is mind projection.

I'm afraid this explanation is all the citation I can offer. I would be happy to hear your opinion along the lines of "That ain't 'truth'. 'Truth' is to a Bayesian"

I can't speak for everyone who'd call themselves Bayesians, but I would say: There is a thing called reality, which causes our experiences and a lot of other things, characterized by its ability to not always do what we want or expect. A statement is true to the extent that it mirrors some aspect of reality (or some other structure if specified).

Observable consequences are part of what controls the plausibility of a statement, but not its truth. An unobservable truth can still be a truth.

...

There is a thing called reality, which causes our experiences and a lot of other things, characterized by its ability to not always do what we want or expect.

If we're going to distinguish 'truth' from our 'observations' then we need to be able to define 'reality' as something other than 'experience generator' (or else decouple truth and reality).

Personally, I suspect that we really need to think of reality as something other than an experience generator. What we can extract out of reality is only half of the story. The other half is the stuff we *put in* so as to *create* reality.

This is not a fully worked out philosophical position, but I do have some slogans:

- You can't do QM with only kets and no bras.
- You can't do Gentzen natural deduction with rules of elimination, but no rules of introduction.
- You can't write a program with GOTOs, but no COMEFROMs.

(That last slogan probably needs some work. Maybe I'll try something involving causes and effects.)

Well the second of those things already has very serious problems. See for example Quine's Confirmation Holism. We've known for a long time that our theories are under-determined by our observations and that we need some other way of adjudicating empirically equivalent theories. This was our basis for preferring Special Relativity over Lorentz Ether Theory. Parsimony seems like one important criterion but involves two questions:

One man's simple seems like another man's complex. How do you rigorously identify the more parsimonious of two hypotheses? Lots of people think God is a very simple hypothesis. The most seemingly productive approach that I know of is the algorithmic complexity one that is popular here.

Is parsimony important because parsimonious theories are more likely to be 'real', or is the issue really one of developing clear and helpful prediction-generating devices?

The way the algorithmic probability stuff has been leveraged is by building candidates for universal priors. But this doesn't seem like the right way to do it. Beliefs are about anticipating future experience so they should take the form of 'Sensory experience x will occur at time t' (or something reducible to this). Theories aren't like this. Theories are frameworks that let us take some sensory experience and generate beliefs about our future sensory experiences.

So I'm not sure it makes sense to have beliefs distinguishing empirically identical theories. That seems like a kind of category error- a map-territory confusion. The question is, what do we do with this algorithmic complexity stuff that was so promising. I think we still have good reasons to be thinking cleanly about complicated science- the QM interpretation debate isn't totally irrelevant. But it isn't obvious algorithmic simplicity is what we want out of our theories (nor is it clear that what we want is the same thing other agents might want out of their theories). (ETA: Though of course K-complexity might still be helpful in making predictions between two possible futures that are empirically distinct. For example, we can assign a low probability to finding evidence of a moon landing conspiracy since the theory that would predict discovering such evidence is unparsimonious. But if that is the case, if theories can be ruled improbable on the basis of the structure of the theory alone why can we only do this with empirically distinct theories? Shouldn't all theories be understandable in this way?)

"Bayes's rule only deals with the fraction of reality-space spanned by a sentence"

Well, that's the thing: reality-space doesn't concern just our observations of the universe. If two different theories make the same predictions about our observations but disagree about which *mechanism* produces those events we observe, those are two different slices of reality-space.

Thanks, your comment is a very clear formulation of the reason why I wrote the post. Probably even better than the post itself.

I'm halfway tempted to write yet *another* post about complexity (maybe in the discussion area), summarizing all the different positions expressed here in the comments and bringing out the key questions. The last 24 hours have been a very educational experience for me. Or maybe let someone else do it, because I don't want to spam LW.

But that's not really a theory, it's a conjunction between two different theories,

It's actually the disjunction.

Yes, apologies. Fixed above.

Making the same predictions means making the same assignments of probabilities to outcomes.

Which brings us back to an issue which I was debating here a couple of weeks ago: Is there a difference between an event being impossible, and an event being of measure zero?

Orthodox Bayesianism says there is no difference and strongly advises against thinking either to be the case. I'm wondering whether there isn't some way to make the idea work that there is a distinction to be made - that some things are completely impossible given a theory, while other things are merely of infinitesimal probability.

There's a proposal to use surreal numbers for utilities. Such an approach was used for Go by Conway.

**[deleted]**· 2010-11-09T17:35:26.748Z · score: 4 (6 votes) · LW(p) · GW(p)

When there is a testable physical difference between hypotheses, we want the one that makes the correct prediction.

When there is *no* testable physical difference between hypotheses, we want to use the one that makes it *easiest* to make the correct prediction. By definition, we can never get a prediction that wouldn't have happened were we using the other hypothesis, but we'll get that prediction quicker. Neither hypothesis can be said to be 'the way the world really is' because there's no way to distinguish between them, but the simpler hypothesis is more *useful*.

Wha? Then you must order the equivalent theories by running time, not code length. The two are frequently opposed: for example, the fastest known algorithm for matrix multiplication (in the big-O sense) is very complex compared to the naive one. In short, I feel you're only digging yourself deeper into the hole of heresy.
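As an illustration of code length and running time pulling in opposite directions, here is a sketch that puts the naive algorithm next to Strassen's, a simpler relative of the asymptotically fastest known methods. It assumes square matrices whose size is a power of two, stored as lists of lists; the helper names are made up for this example.

```python
def naive_matmul(a, b):
    """Textbook O(n^3) product: a few lines of code."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def _add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]

def _sub(x, y):
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]

def _split(m):
    """Split a matrix into four equal quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])

def strassen(a, b):
    """Strassen's algorithm: 7 recursive products instead of 8."""
    if len(a) == 1:
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = _split(a)
    b11, b12, b21, b22 = _split(b)
    m1 = strassen(_add(a11, a22), _add(b11, b22))
    m2 = strassen(_add(a21, a22), b11)
    m3 = strassen(a11, _sub(b12, b22))
    m4 = strassen(a22, _sub(b21, b11))
    m5 = strassen(_add(a11, a12), b22)
    m6 = strassen(_sub(a21, a11), _add(b11, b12))
    m7 = strassen(_sub(a12, a22), _add(b21, b22))
    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)
    top = [l + r for l, r in zip(c11, c12)]
    bottom = [l + r for l, r in zip(c21, c22)]
    return top + bottom

a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 4, 1, 1], [3, 2, 2, 0]]
print(strassen(a, b) == naive_matmul(a, b))
```

The two functions compute the same product, but the asymptotically faster one needs several times as much code, which is the tension being pointed at. For small matrices in pure Python the naive version will usually also be the quicker one in practice; the advantage is only asymptotic.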

I think there’s a difference between looking at a theory as *data* versus looking at it as *code*.

You look at a theory as *code* when you need to use the theory to predict the future of something it describes. (E.g., will it rain.) For this purpose, theories that generate the same predictions *are* equivalent, you don’t care about their size. In fact, even theories with *different* predictions can be considered equivalent, as long as their predictions are close enough for your purpose. (See Newtonian vs. relativistic physics applied to predicting kitchen-sink performance.) You *do* care about how fast you can run them, though.

However, you look at a theory as *data* when you need to *reason about theories*, and “make predictions” about them, particularly unknown theories related to known ones. As long as two theories make *exactly* the same predictions, you don’t have much reason to reason about them. However, if they predict differently for something *you haven’t tested yet, but will test in the future*, and you need to take an action *now* that has different outcomes depending on the result of the *future* test (simple example: a bet), then you need to try to guess which is more likely.

You need something like a meta-theory that predicts which of the two is more likely to be true. Occam’s razor is one of those meta-theories.

Thinking about it more, this isn’t quite a disagreement to the post immediately above; it’s not immediately obvious to me that a simpler theory is easier to reason about (though intuition says it should be). But I don’t think Occam’s razor is about how easy it is to reason about theories, it just claims simpler ones are more likely. (Although one could justify it like this: take an incomplete theory; add *one* detail; add *another* detail; on each step you have to pick between many details you might add, so the more details you add you’re more likely to pick the wrong one (remember you haven’t tested the successive theories yet); thus, the more complex your theory the likelier you are to be wrong.)

**[deleted]**· 2010-11-09T19:32:16.597Z · score: 2 (2 votes) · LW(p) · GW(p)

Well, firstly, who said I cared at all about 'heresy'? I'm not replying here in order to demonstrate my adherence to the First Church Of Bayes or something...

And while there are, obviously, occasions where ordering by running time and code length are opposed, in general when comparing two arbitrary programs which generate the same output, the longer one will also take longer. This is obvious when you consider it - if you have an arbitrary program X from the space of all programs that generate an output Y, there can only be a finite number of programs that generate that output more quickly. However, there are an infinite number of programs in the sample space that will generate the output more slowly, and that are also longer than X - just keep adding an extra 'sleep 1' before it prints the output, to take a trivial example.

In general, the longer the program, the more operations it performs and the longer it takes, when you're sampling from the space of all possible programs. So while run time and code length aren't perfectly correlated, they're a very decent proxy for each other.

if you have an arbitrary program X from the space of all programs that generate an output Y, there can only be a finite number of programs that generate that output more quickly.

Amusingly, this statement is false. If a program Z is faster than X, then there exist infinitely many versions of Z that also run faster than X: just add some never-executed code under an if(false) branch. I'm not sure whether your overall argument can be salvaged.
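The padding trick can be sketched directly. This is a toy, with a made-up one-line "program" and hypothetical helper names: it prepends never-executed `if False:` branches to a piece of source text and checks that the output is unchanged while the source grows without bound.

```python
def pad_with_dead_code(source, n):
    """Return `source` preceded by n branches that are never executed."""
    dead = "".join(f"if False:\n    unused_{i} = {i}\n" for i in range(n))
    return dead + source

def run(source):
    """Execute program text and return the namespace it leaves behind."""
    namespace = {}
    exec(source, namespace)
    return namespace

base_program = "result = sum(range(10))\n"
variants = [pad_with_dead_code(base_program, n) for n in range(4)]
outputs = [run(v)["result"] for v in variants]
lengths = [len(v) for v in variants]
print(outputs, lengths)
```

Every variant produces the same result and executes the same live code path, yet each is longer than the last, so for any program there are infinitely many longer programs with the same output and essentially the same running time.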

**[deleted]**· 2010-11-09T22:21:50.003Z · score: 2 (2 votes) · LW(p) · GW(p)

You're quite correct, there. I was only including code paths that can ever actually be executed, in the same way I wouldn't count comments as part of the program. This seems to me to be the correct thing to do, and I believe one could come up with some more rigorous reasoning along the lines of my previous comment, but I'm too tired right now to do so. I'll think about this...

I was only including code paths that can ever actually be executed...

Wouldn't a meta-algorithm that determines which paths are executable in a given algorithm necessarily not be able to do so for every possible algorithm unless it was functionally equivalent to a halting oracle?

I'm not sure how problematic this is to your idea, but it's one advantage that the simpler system of just counting total lines has.

The length of the program description is not really the measure of how easy it is to make a correct prediction. In fact, the shortest program for predicting is almost never the one you should use to make predictions in practice, precisely because it is normally quite slow. It is also very rarely the program which is easiest to manipulate mentally, since short programs tend to be very hard for humans to reason about.

Like PaulFChistiano said, the shortest accurate program isn't particularly *useful*, but its predictive model is more *a priori probable* according to the universal / Occamian prior.

It's really hard (and uncomputable) to discover, understand, and verify the shortest program that computes a certain input->prediction mapping. But we use the "shortest equivalent program" concept to judge which human-understandable program is more a priori probable.

When there is no testable physical difference between hypotheses, we want to use the one that makes it easiest to make the correct prediction.

Yes, we want to use the hypothesis that is easiest to use. But if we use it, does that commit us to 'believing' in it? In the case of no testable physical difference between hypotheses, I propose that someone has no obligation to believe (or admit they believe) that particular theory instead of another one with the same predictions.

I enthusiastically propose that we say we 'have' a belief only when we use or apply a belief for which there is an empirical difference in the predictions of the belief compared to the non-belief. Alternatively, we can use some other word instead of belief, that will serve to carry this more relevant distinction.

(Later: I realize this comment is actually directed at cousin_it, since he was the one that wrote, 'your degree of **belief** in (prior probability of) a hypothesis should not depend on how clearly it allows you to think'. I also think I may have reiterated what Vladimir_Nesov wrote here.)

"Evolution by natural selection occurs" and "God made the world and everything in it, but did so in such a way as to make it look exactly as if evolution by natural selection occurred" make the same predictions in all situations.

I just wanted to make a comment here that the latter hypothesis is more complex because of the extra things that are packaged into the word "God".

"Something" making the world and everything in it and making it look like evolution isn't a hypothesis of higher complexity ... it's just the same hypothesis again, right? I feel like they're the same hypothesis to a large extent because the predictions are the same, and also because "something", "making" and "making it look like" are all vague enough to fill in with whatever is actually the case.

Two arguments - or maybe two formulations of the one argument - for complexity reducing probability, and I think the juxtaposition explains why it doesn't feel like complexity should be a straight-up penalty for a theory.

The *human-level* argument for complexity reducing probability is something like this: A∩B is more probable than A∩B∩C because the second has three fault-lines, so to speak, and the first only has two, so the second is more likely to crack. **edit: equally or more likely, not strictly more likely.** (For engineers out there; I have found this metaphor to be invaluable both in spotting this in conversation, and explaining this in conversation to people). As byrnema noted down below, that doesn't seem applicable here, at least not in the direct simpler = better way, especially when having the same predictions seems to indicate that A, B, and C are all right.

The *formal* argument for complexity penalty (and this is philosophy, so bear with me) is that *a priori*, having absolutely no experiences about the universe so that all premises are equally likely (with nothing to privilege any of them, they default... the universal prior, if you like) - the theory with the fewest conjunctions of premises is the most likely by virtue of probability theory.
Now, we are restricted in our observations, because they don't tell us what actually is; they merely tell us that *anything* that predicts the outcome **is**, and *everything* that doesn't predict the outcome, **isn't**. This includes ad hoc theories and overcomplicated theories like "Odin made Horus made God made the universe as we know it." However, we can extend that previous argument: Given that our observations have narrowed the universe as we know it to this section of hypotheses, we have no experiences *that say something* about **any** of the hypotheses in that section. So, *a priori*, all possible premises within that section are equally likely. So we should choose the one with the fewest conjunctions of premises, according to probability theory.

This doesn't really get to the heart of the matter addressed in the post, but it does justify a form of complexity-as-penalty that has some bearing: namely, that if Hamiltonian requires fewer premises than Lagrangian, and predictions bear both of these systems out equally well, Hamiltonian *is* more probable, because it is less likely to be wrong due to a false premise somewhere in the area we haven't yet accessed. (In formal logic, Lagrangian is probably using some premise it doesn't need to).

The human-level argument for complexity reducing probability is something like this: A∩B is more probable than A∩B∩C because the second has three fault-lines, so to speak, and the first only has two, so the second is more likely to crack.

Strictly speaking, Pr(A∩B) ≥ Pr(A∩B∩C), not Pr(A∩B) > Pr(A∩B∩C). Otherwise, excellent post.
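The weak inequality follows in one line from the product rule, assuming Pr(A∩B) > 0 so that the conditional probability is defined:

$$
\Pr(A \cap B \cap C) = \Pr(A \cap B)\,\Pr(C \mid A \cap B) \le \Pr(A \cap B)
$$

Equality holds exactly when Pr(C | A∩B) = 1, that is, when C adds nothing that A∩B did not already guarantee.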

Uuuhhhh, wait, there's something wrong with your post. A simple logical statement can imply a complex-looking logical statement, right? Imagine that C is a very simple statement that implies B which is very complex. Then A∩B∩C is logically equivalent to A∩C, which is simpler than A∩B because C is simpler than B by assumption. Whoops.
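The equivalence claimed here can be checked by brute force over truth assignments. The sketch below uses made-up helper names and treats A, B, C as plain booleans; it compares the two conjunctions on every assignment that satisfies the assumption "C implies B", and again with no assumption at all.

```python
from itertools import product

def equivalent_given(constraint, f, g):
    """True if f and g agree on every (a, b, c) allowed by the constraint."""
    return all(f(a, b, c) == g(a, b, c)
               for a, b, c in product([False, True], repeat=3)
               if constraint(a, b, c))

def c_implies_b(a, b, c):
    return (not c) or b

def no_constraint(a, b, c):
    return True

def a_and_b_and_c(a, b, c):
    return a and b and c

def a_and_c(a, b, c):
    return a and c

same_when_c_implies_b = equivalent_given(c_implies_b, a_and_b_and_c, a_and_c)
same_in_general = equivalent_given(no_constraint, a_and_b_and_c, a_and_c)
print(same_when_c_implies_b, same_in_general)
```

With the implication in force the three-part conjunction collapses to the two-part one, so counting conjuncts on the surface of a statement is not a reliable measure of how much it claims.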

You can make a statement more complex by adding more conjunctions or by adding more disjunctions. In general, the complexity of a statement about the world has no direct bearing on the prior probability we ought to assign to it. My previous post (linked from this one) talks about that.

find the shortest algorithm that outputs the same predictions.

Prediction making is not a fundamental attribute that hypotheses have. What distinguishes hypotheses is what they are saying is really going on. We use that to make predictions.

The waters get muddy when dealing with fundamental theories of the universe. In a more general case: If we have two theories which lead to identical predictions of the behavior of an impenetrable black box, but say different things about the interior, then we should choose the simpler one. If at some point in the future we figure out how to open the black box, then the things you had labeled implementation details might be leading to predictions.

I don't think we should abandon that just because we hit a black box that appears fundamentally impenetrable.

we should choose the simpler one

Why do you use the adjective 'simpler'? I understand that this isn't just you, but the common term for this context. But we really mean 'more probable', correct? In which case, why don't we just say, 'more probable'?

I'm not sure what 'simpler' means but I don't think the relationship between 'simple' and 'probable' is straight-forward -- except when the more complex thing **is a subset of** the more simple thing. That is, in the usual provided example that A∩B is more probable than A∩B∩C.

Simpler is not always more probable, it's just something with which to build your priors.

If you have two theories that make different but similar predictions of noisy data, the one that fits the data better might be the more probable, even if it's vastly more complex.

Suppose, counterfactually, that Many Worlds QM and Collapse QM really always made the same predictions, and so you want to say they are both the same theory QM. It still makes sense to ask what is the complexity of Many Worlds QM and how much probability does it contribute to QM, and what is the complexity of Collapse QM and how much probability does it contribute to QM. It even makes sense to say that Many Worlds QM has a strictly smaller complexity, and contributes more probability, and is the better formulation.

It still makes sense to ask what is the complexity of Many Worlds QM and how much probability does it contribute to QM, and what is the complexity of Collapse QM and how much probability does it contribute to QM.

You can of course introduce the universal prior over equivalent formulations of a given theory, and state which formulations weigh how much according to this prior, but I don't see in what way this is a natural structure to consider, and what questions it allows to understand better.

It seems you want to define the complexity of QM by summing over all algorithms that can generate the predictions of QM, rather than just taking the shortest one. In that case you should probably take the same approach to defining K-complexity of bit strings: sum over all algorithms that print the string, not take the shortest one. Do you subscribe to that point of view?

It seems you want to define the complexity of QM by summing over all algorithms that can generate the predictions of QM, rather than just taking the shortest one.

Yes, though to be clear, it is the prior probability associated with the complexity of the individual algorithm that I would sum over to get the prior probability of that common set of predictions being correct. I don't consider the common set of predictions to have a conceptually useful complexity in the same sense that the algorithms do.

In that case you should probably take the same approach to defining K-complexity of bit strings: sum over all algorithms that print the string, not take the shortest one. Do you subscribe to that point of view?

I would apply the same approach to making predictions about bit strings.
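The difference between "take the shortest program" and "sum over all programs" can be made concrete with a toy sketch. The program names and bit lengths below are made up for illustration; real K-complexity is not computable, so this only shows how the two bookkeeping rules differ once the lengths are given.

```python
from fractions import Fraction

# Hypothetical lengths (in bits) of three programs that all emit the same
# predictions. These numbers are assumptions, not measurements.
program_lengths = {"formulation_a": 10, "formulation_b": 12, "formulation_c": 12}


def weight(length_bits):
    """Prior weight 2^-length assigned to a single program."""
    return Fraction(1, 2 ** length_bits)


def shortest_program_prior(lengths):
    """Prior of the predictions if only the shortest program counts."""
    return weight(min(lengths.values()))


def summed_prior(lengths):
    """Prior of the predictions if every program contributes its weight."""
    return sum(weight(n) for n in lengths.values())


def contributions(lengths):
    """Share of the summed prior contributed by each formulation."""
    total = summed_prior(lengths)
    return {name: weight(n) / total for name, n in lengths.items()}


print(shortest_program_prior(program_lengths))  # 1/1024
print(summed_prior(program_lengths))            # 3/2048
print(contributions(program_lengths))
```

Under these assumed lengths the shortest formulation carries two thirds of the summed prior, which is the sense in which one formulation can "contribute more probability" even when all of them predict the same thing.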

I don't consider the common set of predictions to have a conceptually useful complexity in the same sense that the algorithms do.

Why? Both are bit strings, no?

My computer represents numbers and letters as bit strings. This doesn't mean it makes sense to multiply letters together.

This is related to a point that I attempted to make previously. You can measure complexity, but you must pick the context appropriately.

"But imagine you refactor your prediction-generating program and make it shorter; does this mean the physical theory has become simpler?"

Yeah, (given the caveats already mentioned by Vladimir), as any physical theory *is* a prediction-generating program. A theory that isn't a prediction-generating program isn't a theory at all.

I think that while a sleek decoding algorithm and a massive look-up table might be mathematically equivalent, they differ markedly in what sort of process actually carries them out, at least from the POV of an observer on the same 'metaphysical level' as the process. In this case, the look-up table is essentially the program-that-lists-the-results, and the algorithm is the shortest description of how to get them. The equivalence is because, in some kind of sense, process and results imply each other. In my mind, this a bit like some kind of space-like-information and time-like-information equivalence, or as that between a hologram and the surface it's projected from.

In the end, how are we to ever prefer one kind of description over the other? I can only think that it either comes down to some arbitrary aesthetic appreciation of elegance, or maybe some kind of match between the form of description and how it fits in with our POV; our minds can be described in many ways, but only one corresponds *directly* with how we observe ourselves and reality, and we want any model to describe our minds with as minimal re-framing as possible.

Now, could someone please tell me if what I have just said makes any kind of sense?!

In the end, how are we to ever prefer one kind of description over the other?

The minimum size of an algorithm will depend on the context in which it is represented. To meaningfully compare minimum algorithm sizes we *must* choose a context that represents the essential entities and relationships of the domain in consideration.

The minimum size of an algorithm will depend on the context in which it is represented

Isn't one of the basic results of Kolmogorov complexity/information theory that algorithms/programs can be converted from one formalism/domain to another with a constant-size prefix/penalty and hence there will be only a constant factor penalty in # of bits needed to distinguish the right algorithm in even the most biased formalism?

Isn't one of the basic results of Kolmogorov complexity/information theory that algorithms/programs can be converted from one formalism/domain to another with a constant-size prefix/penalty...

I believe that my point holds.

This constant-size prefix becomes part of the context in which the algorithm is represented. One way to think about it is that the prefix creates an interpretation layer which translates the algorithm from its domain of implementation to the substrate domain.

To restate my point in these new terms, the prefix must be chosen to provide the appropriate model of the domain under consideration, to the algorithms being compared. It does not make sense to consider algorithms implemented under different domain models (different prefixes).

For example if I want to compare the complexity of 3sat expressions, then I shouldn't be considering algorithms in domains that support multiplication.

Another way to think of the constant-size prefix is that one can choose any computer language in which to write the program which outputs the string, and then encode a compiler for that language in the prefix.
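For reference, the result being discussed is usually stated as the invariance theorem. A sketch in standard notation (the symbols $U$, $V$ and $c_{U,V}$ are introduced here and do not appear in the thread): for two universal languages $U$ and $V$,

$$K_U(x) \le K_V(x) + c_{U,V} \quad \text{for all strings } x,$$

where $c_{U,V}$ is the length of a $U$-program that interprets $V$-programs (the "compiler in the prefix") and does not depend on $x$. Applying the bound in both directions gives, for any two strings $x$ and $y$,

$$\bigl| \, (K_U(x) - K_U(y)) - (K_V(x) - K_V(y)) \, \bigr| \le c_{U,V} + c_{V,U}.$$

So a complexity difference between $x$ and $y$ that is smaller than $c_{U,V} + c_{V,U}$ in one language is not guaranteed to keep its sign in the other.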

This works fine for theory: after all, K-complexity is not computable, so we really are in the domain of theory here. For practical situations (even stretching the term "practical" to include QM interpretations!), if the length of the prefix is non-negligible compared to the length of the program, then we can get misleading results. (I would love a correction or some help in supporting my intuition here.)

As a result, I think I agree that the choice of representation matters.

However, I don't agree that there is a principled way of choosing the right representation. There is no such thing as *the* substrate domain. Phrases such as "the essential entities and relationships of the domain" are too subjective.

...if the length of the prefix is non-negligible compared to the length of the program, then we can get misleading results.

For the purposes of complexity comparisons the prefix should be held constant across the algorithms. You should always be comparing algorithms in the same language.

However, I don't agree that there is a principled way of choosing the right representation.

You are correct. I only use phrases such as "the essential entities and relationships of the domain" to outline the nature of the problem.

The problem with comparing the complexity of QM interpretations is that our representation of those interpretations is *arbitrary*. We can only guess at the proper representation of QM. By choosing different representations we could favor one theory or the other as the most simple.

For the purposes of complexity comparisons the prefix should be held constant across the algorithms. You should always be comparing algorithms in the same language.

Oh, that seems sensible. It makes the problem of choosing the language even more acute though, since now we can ignore the description length of the compiler itself, meaning that even crazy languages (such as the language which outputs Encyclopaedia Britannica with a single instruction) are in contention. The point of requiring the language to be encoded in the prefix, and its length added to the description length, is to prevent us from "cheating" in this way.

I had always assumed that it was necessary to allow the prefix to vary. Clearly "abcabcabc" and "aaabbbccc" require different prefixes to express them as succinctly as possible. In principle there's no clear distinction between a prefix which encodes an entire new language and a prefix which just sets up a function to take advantage of the regularities of the string.
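As a rough illustration of the two strings needing different prefixes, here is a sketch in which the "prefix" is a small helper function and the "program" is the call that uses it. The helper names `repeat_block` and `repeat_each` are hypothetical, and Python source length is only a loose stand-in for description length.

```python
def repeat_block(block, times):
    """Prefix suited to strings made of one block repeated end to end."""
    return block * times


def repeat_each(symbols, times):
    """Prefix suited to strings where each symbol is repeated in a run."""
    return "".join(ch * times for ch in symbols)


# The payload ("abc", 3) is identical; only the prefix differs.
s1 = repeat_block("abc", 3)
s2 = repeat_each("abc", 3)

print(s1)  # abcabcabc
print(s2)  # aaabbbccc
```

Neither helper is more of a "language" than the other, which is the point made above: the line between a prefix that defines a language and a prefix that exploits one string's regularities is a matter of where the split is drawn.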

In principle there's no clear distinction between a prefix which encodes an entire new language and a prefix which just sets up a function to take advantage of the regularities of the string.

Yes, and this is important to see. The split between content and context can be made anywhere, but the *meaning* of the content changes depending on where the split is made.

If you allow the prefix to change then you are considering string lengths in terms of the base language. This language can bias the result in relation to the problem domain that you are actually interested in.

As I said above:

For example if I want to compare the complexity of 3sat expressions, then I shouldn't be considering algorithms in domains that support multiplication.

At least Quantum Immortality is something that isn't the same under MWI as under any other interpretation of QM.

There is no QI outside the MWI. Do you postulate any quantum immortal suicider in your MWI branch? No? Why not?

Quantum immortality is not observable. You surviving a quantum suicide is not evidence for MWI - no more than it is for external observers.

What about me surviving a thousand quantum suicides (with negligible odds of survival) in a row?

That only provides evidence that you are determinedly suicidal and that you will eventually succeed.

But I'd have fun with my reality-steering anthropic superpowers in the meantime.

No. You're comparing the likelihoods of two hypotheses. The observation that you survived 1000 good suicide attempts is much more likely under MWI than under Copenhagen. Then you flip it around using Bayes' rule, and believe in MWI.

But other Bayesians around you should not agree with you. This is a case where Bayesians should agree to disagree.

First, not to be nit-picky but MWI != QI. Second, if your suicide attempts are well documented the branch in which you survived would be populated by Bayesians who agreed with you, no?

Flip a quantum coin.

The observation that you survived 1000 good suicide attempts is much more likely under MWI than under Copenhagen.

Isn't that like saying "Under MWI, the observation that the coin came up heads, and the observation that it came up tails, both have probability of 1"?

The observation that I survive 1000 good suicide attempts has a probability of 1, but only if I condition on my being capable of making any observation at all (i.e. alive). In which case it's the same under Copenhagen.

The observation *is* that you're alive. If the Quantum Immortality hypothesis is true you will continue making that observation after an arbitrary number of good suicide attempts. The probability that you will continue making that observation if Quantum Immortality is false is much smaller than one.

The probability that *there exists an Everett branch in which I continue making that observation* is 1. I'm not sure if jumping straight to subjective experience from that is justified:

If P(I survive|MWI) = 1, and P(I survive|Copenhagen) = p, then what is the rest of that probability mass in Copenhagen interpretation? Why is P(~(I survive)|Copenhagen) = 1-p and what does it really describe? It seems to me that calling it "I don't make any observation" is jumping from subjective experiences back to objective. This looks like a confusion of levels.

ETA: And, of course, the problem with "anthropic probabilities" gets even harder when you consider copies and merging, simulations, Tegmark level 4, and Boltzmann brains (The Anthropic Trilemma). I'm not sure if there even *is* a general solution. But I strongly suspect that "you can prove MWI by quantum suicide" is an incorrect usage of probabilities.

It even depends on philosophy. Specifically, on whether the following equality holds.

I survive = There (not necessarily in our universe) exists someone who remembers everything I remember now plus the failed suicide I'm going to conduct now.

or

I survive = There exists someone who doesn't remember everything I remember now, but who acts as I would act if I remembered what he remembers. (I'm not sure whether I expressed the subjunctive mood correctly.)

If P(I survive|MWI) = 1, and P(I survive|Copenhagen) = p, then what is the rest of that probability mass in Copenhagen interpretation?

First, I'm gonna clarify some terms to make this more precise. Let Y be a person psychologically continuous with your present self. P(there is some Y that observes surviving a suicide attempt|Quantum immortality) = 1. Note MWI != QI. But QI entails MWI. P(there is some Y that observes surviving a suicide attempt| ~QI) = p.

It follows from this that P(~(there is some Y that observes surviving a suicide attempt)|~QI) = 1-p.

I don't see a confusion of levels (whatever that means).

ETA: And, of course, the problem with "anthropic probabilities" gets even harder when you consider copies and merging, simulations, Tegmark level 4, and Boltzmann brains (The Anthropic Trilemma). I'm not sure if there even is a general solution. But I strongly suspect that "you can prove MWI by quantum suicide" is an incorrect usage of probabilities.

I don't know if this is the point you meant to make but the existence of these other hypotheses that could imply anthropic immortality definitely *does* get in the way of providing evidence in favor of Many Worlds through suicide. Surviving increases the probability of all of those hypotheses (to different extents but not really enough to distinguish them).

First, I'm gonna clarify some terms to make this more precise. Let Y be a person psychologically continuous with your present self. P(there is some Y that observes surviving a suicide attempt|Quantum immortality) = 1. Note MWI != QI. But QI entails MWI. P(there is some Y that observes surviving a suicide attempt| ~QI) = p.

It follows from this that P(~(there is some Y that observes surviving a suicide attempt)|~QI) = 1-p.

I don't see a confusion of levels (whatever that means).

I still see a problem here. Substitute quantum suicide -> quantum coinflip, and surviving a suicide attempt -> observing the coin turning up heads.

Now we have P(there is some Y that observes coin falling heads|MWI) = 1, and P(there is some Y that observes coin falling heads|Copenhagen) = p.

So *any* specific outcome of a quantum event would be evidence in favor of MWI.

I think that works actually. If you observe 30 quantum heads in a row you have strong evidence in favor of MWI. The quantum suicide thing is just a way of increasing the proportion of future you's that have this information.

If you observe 30 quantum heads in a row you have strong evidence in favor of MWI.

But then if I observed *any string of 30 outcomes* I would have strong evidence for MWI (if the coin is fair, "p" for any specific string would be 2^-30).

You have to specify a particular string to look for before you do the experiment.

Sorry, now I have no idea what we're talking about. If your experiment involves killing yourself after seeing the wrong string, this is close to the standard quantum suicide.

If not, I would have to see the probabilities to understand. My analysis is like this: P(I observe string S | MWI) = P(I observe string S | Copenhagen) = 2^-30, regardless of whether the string S is specified beforehand or not. MWI doesn't mean that my next Everett branch must be S because I say so.

The reason why this doesn't work (for coins) is that (when MWI is true) A="my observation is heads" implies B="some Y observes heads", but not the other way around. So P(B|A)=1, but P(A|B) = p, and after plugging that into the Bayes formula we have P(MWI|A) = P(Copenhagen|A).
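The two candidate updates can be put side by side in a small sketch. It assumes a fair quantum coin, a 30-outcome string, and an even prior between MWI and Copenhagen (the prior is an assumption made only for illustration). It does not settle which event is the right one to condition on, which is exactly what the thread disputes; it only shows what each choice implies.

```python
from fractions import Fraction


def posterior(prior_h1, likelihood_h1, likelihood_h2):
    """P(H1 | observation) for two mutually exclusive, exhaustive hypotheses."""
    prior_h2 = 1 - prior_h1
    numerator = prior_h1 * likelihood_h1
    return numerator / (numerator + prior_h2 * likelihood_h2)


p_string = Fraction(1, 2 ** 30)  # chance of one specific 30-outcome string
prior_mwi = Fraction(1, 2)       # assumed even prior, for illustration only

# Event A: "my observation is S". Same likelihood under both hypotheses.
post_given_a = posterior(prior_mwi, p_string, p_string)

# Event B: "some Y observes S". Likelihood 1 under MWI, p under Copenhagen.
post_given_b = posterior(prior_mwi, Fraction(1), p_string)

print(post_given_a)         # 1/2
print(float(post_given_b))  # very close to 1
```

Conditioning on A leaves the prior untouched, while conditioning on B pushes the posterior almost to 1 for any string at all, which is the oddity pointed out above.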

Can you translate that to the quantum suicide case?

Isn't that like saying "Under MWI, the observation that the coin came up heads, and the observation that it came up tails, both have probability of 1"?

I have no theories about what you're thinking when you say that.

Either you condition the observation (of surviving 1000 attempts) on the observer existing, and you have 1 in both cases, or you don't condition it on the observer and you have p^1000 in both cases. You can't have it both ways.

It convinces you that MWI is true. Due to the nature of quantum suicide, though, you will struggle to share this revelation with anyone else.

That's the problem - it *shouldn't* really convince him. If he shares all the data and priors with external observers, his posterior probability of MWI being true should end up the same as theirs.

It's not very different from surviving a thousand classical Russian roulettes in a row.

ETA: If the chance of survival is p, then in both cases P(I survive) = p, P(I survive | I'm there to observe it) = 1. I think you should use the second one in appraising the MWI...

ETA2: Ok maybe not.

If he shares all the data and priors with external observers, his posterior probability of MWI being true should end up the same as theirs.

No; I think you're using the Aumann agreement theorem, which can't be used in real life. It has many exceedingly unrealistic assumptions, including that all Bayesians agree completely on all definitions and all category judgements, and all their knowledge about the world (their partition functions) is mutual knowledge.

In particular, to deal with the quantum suicide problem, the reasoner has to use an indexical representation, meaning this is knowledge expressed by a proposition containing the term "me", where *me* is defined as "the agent doing the reasoning". A proposition that contains an indexical can't be mutual knowledge. You can transform it into a different form in someone else's brain that will have the same extensional meaning, but that person will not be able to derive the same conclusions from it, because some of their knowledge is also in indexical form.

(There's a more basic problem with the Aumann agreement theorem - when it says, "To say that 1 knows that 2 knows E means that E includes all P2 in N2 that intersect P1," that's an incorrect usage of the word "knows". 1 knows that E includes P1(w), and that E includes P2(w). 1 concludes that E includes P1 union P2, for *some* P2 that intersects P1. Not for *all* P2 that intersect P1. In other words, the theorem is mathematically correct, but semantically incorrect; because the things it's talking about aren't the things that the English gloss says it's talking about.)

There are indeed many cases where Aumann's agreement theorem seems to apply semantically, but in fact doesn't apply mathematically. Would there be interest in a top-level post about how Aumann's agreement theorem can be used in real life, centering mostly around learning from disagreements rather than forcing agreements?

I'd be interested, but I'll probably disagree. I don't think Aumann's agreement theorem can ever be used in real life. There are several reasons, but the simplest is that it requires the people involved share the same partition function over possible worlds. If I recall correctly, this means that they have the same function describing how different observations would restrict the possible worlds they are in. This means that the proof *assumes* that these two rational agents would agree on the implications of any shared observation - which is almost equivalent to what it is trying to prove!

I will include this in the post, if and when I can produce one I think is up to scratch.

What if you represented those disagreements over implications as coming from agents having different logical information?

I don't really see what is the problem with Aumann's in that situation. If X commits suicide and Y watches, are there any factors (like P(MWI), or P(X dies|MWI)) that X and Y *necessarily* disagree on (or them agreeing would be completely unrealistic)?

If joe tries and fails to commit suicide, joe will have the proposition (in SNActor-like syntax)

action(agent(me), act(suicide)) survives(me, suicide)

while jack will have the propositions

action(agent(joe), act(suicide)) survives(joe, suicide)

They both have a rule something like

MWI => for every X, act(X) => P(survives(me, X)) = 1

but only joe can apply this rule. For jack, the rule doesn't match the data. This means that joe and jack have different partition functions regarding the extensional observation survives(joe, X), which joe represents as survives(me, X).

If joe and jack both use an extensional representation, as the theorem would require, then neither joe nor jack can understand quantum immortality.

So you're saying that the knowledge "I survive X with probability 1" can in no way be translated into objective rule without losing some information?

I assume the rules speak about subjective experience, not about "some Everett branch existing" (so if I flip a coin, P(I observe heads) = 0.5, not 1). (What do probabilities of possible, mutually exclusive outcomes of given action sum to in your system?)

Isn't the translation a matter of applying conditional probability? i.e. P(survives(me, X)) = 1 <=> P(survives(joe, X) | joe's experience continues) = 1

I was actually going off the idea that in the vast majority of worlds (100% minus pr(survive all suicides)) the subject would be dead at some point, so nobody in those worlds would be convinced. Sure, people in your branch might believe you, but in all but a fraction of about 9.3x10^-302 of the branches, you aren't there to prove that quantum suicide works. This means, I think, that the chance of you existing to prove to the rest of the world that quantum suicide proves MWI is equal to the chance of you surviving in a non-MWI universe.
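The figure 9.3x10^-302 matches the chance of surviving a thousand independent attempts at 50% each. That reading is an assumption, since the comment does not state the per-attempt odds. A quick check:

```python
from fractions import Fraction


def survival_probability(p_per_attempt, attempts):
    """Chance of surviving every one of `attempts` independent attempts."""
    return p_per_attempt ** attempts


# Assumed setup: 1000 attempts, each survived with probability 1/2.
p_all = survival_probability(Fraction(1, 2), 1000)
as_float = float(p_all)

print(as_float)  # roughly 9.33e-302
```

As a percentage this is about 9.3x10^-300 percent, so the number is best read as a fraction of branches rather than a percentage.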

I was going to say well, if you had a test with a 1% chance of confirming X and a 99% chance of disconfirming X, and you ran it a thousand times and *made sure* you presented only the confirmations, you would be laughed at to suggest that X is confirmed - but it is MWI that predicts every quantum event comes out every result, so only under MWI could you run the test a thousand times - so that would indeed be pretty convincing evidence that MWI is true.

Also: I only have a passing familiarity with Robin's mangled worlds, but at the power of negative three hundred, it feels like a small enough 'world' to get absorbed into the mass of worlds where it works a few times and then they actually do die.

Sure, people in your branch might believe you

The problem I have with that is that from my perspective as an external observer it looks no different than someone flipping a coin (appropriately weighted) a thousand times and getting a thousand heads. It's quite improbable, but the fact that someone's life depends on the coin shouldn't make any difference for me - the universe doesn't care.

Of course it also doesn't convince me that the coin will fall heads for the 1001-st time.

(That's only if I consider MWI and Copenhagen here. In reality after 1000 coin flips/suicides I would start to strongly suspect some alternative hypotheses. But even then it shouldn't change my confidence of MWI relative to my confidence of Copenhagen).

If the chance of survival is p, then in both cases P(I survive) = p, P(I survive | I'm there to observe it) = 1.

Indeed, the anthropic principle explains the result of quantum suicide, whether or not you subscribe to the MWI. The real question is *whether you ought* to commit quantum suicide (and harness its anthropic superpowers for good). It's a question of morality.

I would say quantum suiciding is not "harnessing its anthropic superpowers for good", it's just conveniently excluding yourself from the branches where your superpowers don't work. So it has no more positive impact on the universe than you dying has.

Related (somewhat): The Hero With A Thousand Chances.

If it's not observable, what difference then does it make?

It makes no difference. It's a thought experiment about the consequences of MWI, but it isn't a testable prediction.

Of course it is testable. Just do some 30 quantum coin flips in a row. If any of them comes up heads, knock yourself out into a deep sleep (with anesthesia) for 24 hours.

If you are still awake one hour after the last coin flip, QI is probably true.

Nah. QI relies on your "subjective thread" coming to an end in some worlds and continuing in others. In your experiment I'd be pretty certain to get knocked out and wake up after 24 hours.

How does the Multiverse know that I am just sleeping for 24 (or 24000) hours? How does the Multiverse know that I won't be rescued after a real suicide attempt, once a quantum coin has come up heads?

Or resurrected by some ultratech?

Where is the fine red line such that Quantum Immortality is possible, but the Quantum Awakening described above isn't?

How does the Multiverse know

It doesn't, not right now in the present moment. But there's no reason why "subjective threads" and "subjective probabilities" should depend on physical laws *only locally*. Imagine you're an algorithm running on a computer. If someone pauses the computer for a thousand years, afterwards you go on running like nothing happened, even though at the moment of pausing nobody "knew" when/if you'd be restarted again.

If someone pauses the computer for a thousand years, afterwards you go on running like nothing happened, even though at the moment of pausing nobody "knew" when/if you'd be restarted again.

But what if a new computer arises every time and an instance of this algorithm starts there?

As it allegedly does in MW?

How does the Multiverse know that I am just sleeping for 24 (or 24000) hours? How does the Multiverse know that I won't be rescued after a real suicide attempt, once a quantum coin has come up heads?

Because you won't be back. The universe has all of eternity to wait for you to come back. If you don't, the only remaining ones that keep on experiencing from where you left off are the branches where the coin didn't come up heads.

I see. The MW has a book of those who will wake up and those who will not?

And acts accordingly. Splits or not.

I do not buy this, of course.

It's a good thought to reject.

In fact, quantum immortality has little to do with the actual properties of the universe, as long as it's probabilistic. It's just what happens when you arbitrarily (well, anthropically) decide to stop counting certain possibilities.

No, it *always* splits into two Everett branches. It's just that if you do in fact wake up in the distant future, that version of you that wakes up will be a successor of the you that is awake now, as is the version of you that never went to sleep in the next microsecond (or whatever). And you should anticipate either's experiences equally.

Or at least that's how I think it works (this assumes timeless physics, which I think is what Jonii assumed).

There are two problems with this test.

First, the result of a coin flip is almost certainly determined by starting conditions. With enough knowledge of those conditions you could predict the result. Instead you should make a measurement on a quantum system, such as measuring the spin of an electron.

Second the result of this test does not distinguish between QI and not-QI. The probability of being knocked out or left awake is the same in both cases.

I suppose you could be assuming that your consciousness can jump arbitrarily between universes to follow a conscious version of you.... but no that would just be silly.

Instead you should make a measurement on a quantum system, such as measuring the spin of an electron.

This is probably what Thomas meant by "quantum" coin flip.

You are right, I missed that. I probably shouldn't post comments when I'm hungry, I've got a few other comments like this to account for as well. :)

You might have missed the part where Thomas made it a "quantum coin flip". The problem with the test is that by definition it can't be replicated successfully by the scientific community and that even if QI is true you will get dis-confirming evidence in most Everett branches.

I don't postulate anything that is not already postulated in the so-called Quantum Suicide thought experiment.

I just apply this to the sleeping/coma case. Should work the same.

But I don't think it works in either case.

I don't postulate anything that is not already postulated in the so-called Quantum Suicide thought experiment.

The test you proposed does not distinguish between QI and not-QI. I don't think that the current formulation of MWI even allows this to be tested.

I just apply this to the sleeping/coma case. Should work the same.

Not a factor in my argument; both are untestable. You are arguing this point against others, not me.

First, the result of a coin flip is almost certainly determined by starting conditions. With enough knowledge of those conditions you could predict the result.

If that's a valid objection, then quantum suicide won't work either. In fact, if that's a valid objection, then many-worlds is impossible, since everything is deterministic with no possible alternatives.

Many-worlds *is* a deterministic theory, as it says that the split configurations both occur.

Quantum immortality, mind you, is a very silly idea for a variety of other reasons -- foremost of which is that a googolplex of universes still doesn't ensure that there exists one of them in which a recognizable "you" survives next week, let alone to the end of time.