The Strong Occam's Razor

cousin_it

post by cousin_it · 2010-11-11T17:28:21.338Z · LW · GW · Legacy · 74 comments

This post is a summary of the different positions expressed in the comments to my previous post and elsewhere on LW. The central issue turned out to be assigning "probabilities" to individual theories within an equivalence class of theories that yield identical predictions. Presumably we must prefer shorter theories to their longer versions even when they are equivalent. For example, is "physics as we know it" more probable than "Odin created physics as we know it"? Is the Hamiltonian formulation of classical mechanics apriori more probable than the Lagrangian formulation? Is the definition of reals via Dedekind cuts "truer" than the definition via binary expansions? And are these all really the same question in disguise?

One attractive answer, given by shokwave, says that our intuitive concept of "complexity penalty" for theories is really an incomplete formalization of "conjunction penalty". Theories that require additional premises are less likely to be true, according to the eternal laws of probability. Adding premises like "Odin created everything" makes a theory less probable and also happens to make it longer; this is the entire reason why we intuitively agree with Occam's Razor in penalizing longer theories. Unfortunately, this answer seems to be based on a concept of "truth" granted from above - but what do differing degrees of truth actually mean, when two theories make exactly the same predictions?

Another intriguing answer came from JGWeissman. Apparently, as we learn new physics, we tend to discard inconvenient versions of old formalisms. So electromagnetic potentials turn out to be "more true" than electromagnetic fields because they carry over to quantum mechanics much better. I like this answer because it seems to be very well-informed! But what shall we do after we discover all of physics, and still have multiple equivalent formalisms - do we have any reason to believe simplicity will still work as a deciding factor? And the question remains, which definition of real numbers is "correct" after all?

Eliezer, bless him, decided to take a more naive view. He merely pointed out that our intuitive concept of "truth" does seem to distinguish between "physics" and "God created physics", so if our current formalization of "truth" fails to tell them apart, the flaw lies with the formalism rather than with us. I have a lot of sympathy for this answer as well, but it looks rather like a mystery to be solved. I never expected to become entangled in a controversy over the notion of truth on LW, of all places!

A final answer, the most intriguing of all, came from saturn, who alluded to a position held by Eliezer and sharpened by Nesov. After thinking it over for a while, I generated a good contender for the most confused argument ever expressed on LW. Namely, I'm going to completely ignore the is-ought distinction and use morality to prove the "strong" version of Occam's Razor - that shorter theories are more "likely" than equivalent longer versions. You ready? Here goes:

Imagine you have the option to put a human being in a sealed box where they will be tortured for 50 years and then incinerated. No observational evidence will ever leave the box. (For added certainty, fling the box away at near lightspeed and let the expansion of the universe ensure that you can never reach it.) Now consider the following physical theory: as soon as you seal the box, our laws of physics will make a localized exception and the victim will spontaneously vanish from the box. This theory makes exactly the same observational predictions as your current best theory of physics, so it lies in the same equivalence class and you should give it the same credence. If you're still reluctant to push the button, it looks like you already are a believer in the "strong Occam's Razor" saying simpler theories without local exceptions are "more true". QED.

It's not clear what, if anything, the above argument proves. It probably has no consequences in reality, because no matter how seductive it sounds, skipping over the is-ought distinction is not permitted. But it makes for a nice koan to meditate on weird matters like "probability as preference" (due to Nesov and Wei Dai) and other mysteries we haven't solved yet.

ETA: Hal Finney pointed out that the UDT approach - assuming that you live in many branches of the "Solomonoff multiverse" at once, weighted by simplicity, and reducing everything to decision problems in the obvious way - dissolves our mystery nicely and logically, at the cost of abandoning approximate concepts like "truth" and "degree of belief". It agrees with our intuition in advising you to avoid torturing people in closed boxes, and more generally in all questions about moral consequences of the "implied invisible". And it nicely skips over all the tangled issues of "actual" vs "potential" predictions, etc. I'm a little embarrassed at not having noticed the connection earlier. Now can we find any other good solutions, or is Wei's idea the only game in town?

Comments sorted by top scores.

comment by PlaidX · 2010-11-11T18:34:45.134Z · LW(p) · GW(p)

gets out the ladder and climbs up to the scoreboard

5 posts without a tasteless and unnecessary torture reference

replaces the 5 with a 0

climbs back down

comment by HalFinney · 2010-11-12T00:11:28.872Z · LW(p) · GW(p)

Years ago, before coming up with even crazier ideas, Wei Dai invented a concept that I named UDASSA. One way to think of the idea is that the universe actually consists of an infinite number of Universal Turing Machines running all possible programs. Some of these programs "simulate" or even "create" virtual universes with conscious entities in them. We are those entities.

Generally, different programs can produce the same output; and even programs that produce different output can have identical subsets of their output that may include conscious entities. So we live in more than one program's output. There is no meaning to the question of what program our observable universe is actually running. We are present in the outputs of all programs that can produce our experiences, including the Odin one.

Probability enters the picture if we consider that a UTM program of n bits is being run in 1/2^n of the UTMs (because 1/2^n of all infinite bit strings will start with that n bit string). That means that most of our instances are present in the outputs of relatively short programs. The Odin program is much longer (we will assume) than one without him, so the overwhelming majority of our copies are in universes without Odin. Probabilistically, we can bet that it's overwhelmingly likely that Odin does not exist.
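A minimal sketch of this counting argument, assuming two made-up program lengths (100 bits for plain physics, 150 bits for the Odin version). Only the 1/2^n weighting comes from the comment above; the lengths and the names `utm_weight`, `PHYSICS_BITS` and `ODIN_BITS` are hypothetical, and the sketch pretends these two programs are the only ones that produce our experiences.

```python
from fractions import Fraction

def utm_weight(n_bits: int) -> Fraction:
    """Fraction of infinite bit strings that start with one given n-bit program."""
    return Fraction(1, 2 ** n_bits)

# Hypothetical program lengths, chosen only for illustration.
PHYSICS_BITS = 100
ODIN_BITS = 150

w_physics = utm_weight(PHYSICS_BITS)
w_odin = utm_weight(ODIN_BITS)

# Share of "our copies" that live in the Odin program's output,
# under the simplifying assumption that only these two programs matter.
odin_share = w_odin / (w_physics + w_odin)
print(float(odin_share))
```

Every extra bit in the longer program halves its share, so a 50-bit difference already leaves the Odin version with a share of 1/(2^50 + 1).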

Replies from: DanielVarga, cousin_it, red75

↑ comment by DanielVarga · 2010-11-12T00:45:18.753Z · LW(p) · GW(p)

This is a cool theory, but it is probably equivalent to another, less cool theory that yields identical predictions and does not reference infinite virtual universes. :)

Replies from: paulfchristiano

↑ comment by paulfchristiano · 2010-11-12T01:25:47.666Z · LW(p) · GW(p)

Although it postulates the existence of infinitely many inaccessible universes, it may be simpler than equivalent theories which imply only a single universe.

I feel like this is an argument we've seen before, but with more hilarious self-referentiality.

Replies from: khafra, DanielVarga

↑ comment by khafra · 2010-11-12T16:39:22.314Z · LW(p) · GW(p)

Perhaps in The Finale of the Ultimate Meta Mega Crossover?

↑ comment by DanielVarga · 2010-11-12T09:33:48.326Z · LW(p) · GW(p)

I feel like this is an argument we've seen before [...]

If I am not mistaken, it is a bit more formalized version of Greg Egan's Dust Theory.

Replies from: paulfchristiano

↑ comment by paulfchristiano · 2010-11-12T15:33:54.932Z · LW(p) · GW(p)

I was actually referring to the (slightly superficial) similarity to the MWI vs. collapse discussion that indirectly prompted this post.

↑ comment by cousin_it · 2010-11-12T02:47:38.136Z · LW(p) · GW(p)

Yep, I already arrived at that answer elsewhere in the thread. It's very nice and consistent and fits very well with UDT (Wei Dai's current "crazy" idea). There still remains the mystery of where our "subjective" probabilities come from, and the mystery why everything doesn't explode into chaos, but our current mystery becomes solved IMO. To give a recent quote from Wei, "There are copies of me all over math".

↑ comment by red75 · 2010-11-12T00:52:25.745Z · LW(p) · GW(p)

Should we stop at UDASSA? Can we consider a universe that consists of a continuum of UDASSAs, each running some (infinite) subset of the set of all possible programs?

Replies from: red75

↑ comment by red75 · 2010-11-12T08:26:50.836Z · LW(p) · GW(p)

In case anyone is interested: this extension doesn't seem to lead to anything of interest.

If we map the continuum of UDASSA multiverses into [0;1), then the Lebesgue measure of the set of multiverses which run a particular program is 1/2.

Let the binary number 0.b1 b2 ... bn ... be the representation of multiverse M if for all n: (bn=1 iff M runs program number n, and bn=0 otherwise).

It is easy to see that the image of the set of multiverses which run program number n is a collection of intervals [(2i-1)/2^n;2i/2^n) for i=1..2^(n-1). Each interval has length 1/2^n, so the Lebesgue measure of the set is 2^(n-1)/2^n=1/2.
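The measure claim can be checked by brute force for small n. This is only a sketch: it assumes the set in question is the set of points in [0;1) whose n-th binary digit is 1, which is the union of the intervals [(2i-1)/2^n; 2i/2^n) for i=1..2^(n-1). The function names are invented for the example.

```python
from fractions import Fraction

def intervals_running(n: int):
    """Half-open intervals of [0, 1) whose n-th binary digit is 1."""
    step = Fraction(1, 2 ** n)
    return [((2 * i - 1) * step, 2 * i * step)
            for i in range(1, 2 ** (n - 1) + 1)]

def measure(intervals) -> Fraction:
    """Total length of a list of disjoint (lo, hi) intervals."""
    return sum((hi - lo for lo, hi in intervals), Fraction(0))

def nth_bit(x: Fraction, n: int) -> int:
    """n-th digit after the binary point of x in [0, 1)."""
    return int(x * 2 ** n) % 2

for n in range(1, 8):
    ivs = intervals_running(n)
    # Every interval midpoint really has its n-th bit set ...
    assert all(nth_bit((lo + hi) / 2, n) == 1 for lo, hi in ivs)
    # ... and the total length is 1/2 regardless of n.
    assert measure(ivs) == Fraction(1, 2)
```

Exact rational arithmetic is used so that the equality with 1/2 is not blurred by floating-point rounding.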

comment by Vladimir_Nesov · 2010-11-11T23:02:50.142Z · LW(p) · GW(p)

If two theories imply different invisibles, they shouldn't be considered equivalent. That no evidence can tell them apart, and still they are not equal, is explained by them having different priors. But if two theories are logically (agent-provably, rather) equivalent, this is different, as the invisibles they imply and priors measuring them are also the same.

Replies from: Will_Sawin

↑ comment by Will_Sawin · 2010-11-12T15:00:37.724Z · LW(p) · GW(p)

Can a theory be proved logically equivalent to a theory with more, or fewer, morally valuable agents?

comment by steven0461 · 2010-11-11T21:16:34.999Z · LW(p) · GW(p)

The thing about positivism is it pretends to be a down-to-earth common-sense philosophy, and then the more you think about it the more it turns into this crazy surrealist madhouse. So we can't measure parallel universes and there's no fact of the matter as to whether they exist. But people in parallel universes can measure them, but there's no fact of the matter whether these people exist, and there's a fact of the matter whether these universes exist if and only if these people exist to measure them, so there's no fact of the matter whether there is a fact of the matter whether these universes exist. And meanwhile to the extent that these people exist, some of them claim there is no fact of the matter as to whether we exist, so really there's not one positivism, but there's a positivism for every quantum world. And in each of these positivisms, which worlds it's meaningful to talk about the existence of gets determined by some random process many times each second. And the question of whether other people in the same universe as you exist is meaningless too, because it makes no predictions that differ from those of the interpretation that other people are all well-disguised walruses who act exactly like people. And if you make an identical copy of yourself, you could end up being either one of them, so there's a 50% chance that there's a fact of the matter whether each continuation is a walrus. Etc.

Replies from: cousin_it

↑ comment by cousin_it · 2010-11-11T21:27:43.556Z · LW(p) · GW(p)

I'm fine with throwing away positivism, as long as we find something viable to replace it with. If you think yielding identical observations does not make two theories equivalent, then what is your criterion for equivalence of theories? Or are all theories different and incompatible, so only one definition of real numbers can ever be "true"? This looks like replacing one surrealist madhouse with another.

Replies from: ata

↑ comment by ata · 2010-11-11T21:40:52.870Z · LW(p) · GW(p)

If you think yielding identical observations does not make two theories equivalent, then what is your criterion for equivalence of theories?

I could accept that two theories are equivalent if they yield identical observations to every possible observer, everywhere, or better yet, if they yield identical output for any given input if implemented as programs. If you write a program which simulates the laws of physics, and then you write another program which simulates "Odin" calling a function that simulates the laws of physics and doing nothing else, then I would accept that they represent equivalent theories, if they really do always result in the exact same (or isomorphic) output under every circumstance. (Though an Odin that impotent or constrained would be more of a weird programming mistake than a god.) But if the two programs don't systematically produce equivalent output for equivalent input, then they are not equivalent programs, even if none of the agents being simulated can tell the difference.
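As a toy illustration of this criterion (a sketch, not anything from the thread): the update rule, the 17-state world, and all function names below are invented. `odin_physics` is the impotent Odin who only calls the physics function; `odin_meddles` is an Odin who intervenes in one state that, by assumption, no simulated agent ever visits.

```python
def physics(state: int) -> int:
    """Toy stand-in for the laws of physics: one deterministic update step."""
    return (3 * state + 1) % 17

def odin_physics(state: int) -> int:
    """'Odin' calls the physics function and does nothing else."""
    return physics(state)

def odin_meddles(state: int) -> int:
    """An Odin who overrides physics in a single state (16)."""
    if state == 16:
        return 0
    return physics(state)

def same_output(f, g, inputs) -> bool:
    """True if the two programs agree on every input in `inputs`."""
    return all(f(x) == g(x) for x in inputs)

ALL_STATES = range(17)       # every possible input
OBSERVED_STATES = range(16)  # hypothetical: state 16 is never observed

print(same_output(physics, odin_physics, ALL_STATES))
print(same_output(physics, odin_meddles, OBSERVED_STATES))
print(same_output(physics, odin_meddles, ALL_STATES))
```

The first two comparisons agree and the third does not: the meddling Odin matches physics on everything the agents can see, yet it is a different program by the input-output test.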

comment by Daniel_Burfoot · 2010-11-12T12:49:57.965Z · LW(p) · GW(p)

This theory makes exactly the same observational predictions as your current best theory of physics, so it lies in the same equivalence class and you should give it the same credence.

You're blurring an important distinction between two types of equivalence:

Empirical equivalence, where two program-theories give the same predictions on all currently known empirical observations.
Formal equivalence, where two program-theories give identical predictions on all theoretically possible configurations, and this can be proved mathematically.

If two theories are only empirically equivalent, you use the complexity penalty and prefer the simpler one. If the theories are formally equivalent, you don't bother trying to tell them apart. If you don't know which equivalence relation holds, you sit down and start doing math.

comment by NihilCredo · 2010-11-11T20:01:16.276Z · LW(p) · GW(p)

Boxes proofed against all direct and indirect observation, potential for observation mixed with concrete practicality of such observation, strictly-worse choices, morality... one would be hard-pressed to muddle your thought experiment more than that.

Let's try to make it a little more straightforward: assume that there exists a certain amount of physical space which falls outside our past light cone. Do you think it is equally likely that it contains galaxies and that it contains unicorns? More importantly, do you think the preceding question means anything?

Replies from: cousin_it

↑ comment by cousin_it · 2010-11-11T21:02:29.015Z · LW(p) · GW(p)

In my example, as in many others, morality/utility is necessary for reducing questions about "beliefs" to questions about decisions. (Similar to how adding payoffs to the Sleeping Beauty problem clarifies matters a lot, and how naively talking about probabilities in the Absent-Minded Driver introduces a time loop.) In your formulation I may legitimately withhold judgment about unicorns - say the question is as meaningless as asking whether integers "are" a subset of reals, or a distinct set - because it doesn't affect my future utility either way. In my formulation you can't wiggle out as easily.

Replies from: NihilCredo

↑ comment by NihilCredo · 2010-11-11T21:15:59.973Z · LW(p) · GW(p)

[Edited out - I need to think this over a little longer]

Replies from: cousin_it

↑ comment by cousin_it · 2010-11-11T22:09:06.475Z · LW(p) · GW(p)

I thought about your questions some more, and stumbled upon a perspective that makes them all meaningful - yes, even the one about defining the real numbers. You have to imagine yourself living in a sort of "Solomonoff multiverse" that runs a weighted mix of all possible programs, and act as if to maximize your expected utility over that whole multiverse. Never mind "truth" or "degrees of belief" at all! If Omega comes to you and asks whether an inaccessible region of space contains galaxies or unicorns, bravely answer "galaxies" because it wins you more cookies weighted by universe-weight - simpler programs have more of it. This seems to be the coherent position that many commenters seem to be groping toward...
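A rough sketch of that decision rule, with invented numbers: the two program lengths (100 and 140 bits), the one-cookie payoff, and the names `WORLDS` and `expected_cookies` are all assumptions made for the example, and a real Solomonoff mixture would range over all programs rather than two.

```python
from fractions import Fraction

# Hypothetical worlds: (universe-weight 2^-length, what the hidden region contains).
WORLDS = [
    (Fraction(1, 2 ** 100), "galaxies"),  # assumed shorter program
    (Fraction(1, 2 ** 140), "unicorns"),  # assumed longer program
]

def expected_cookies(answer: str, worlds) -> Fraction:
    """Weight-summed payoff: one cookie in every world where the answer is right."""
    return sum((w for w, truth in worlds if truth == answer), Fraction(0))

best_answer = max(["galaxies", "unicorns"],
                  key=lambda a: expected_cookies(a, WORLDS))
print(best_answer)
```

No credence is ever assigned to either theory here; the weights only enter through the payoff sum, which is the point of treating the question as a decision problem.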

comment by ata · 2010-11-11T19:52:56.903Z · LW(p) · GW(p)

I'm skeptical of the idea that the hypothesis "Odin created physics as we know it" would actually make no additional predictions over the hypothesis "physics as we know it". I'm tempted to say that as a last resort we could distinguish between them by generating situations like "Omega asks you to bet on whether Odin exists, and will evaluate you using a logarithmic scoring rule and will penalize you that many utilons", though at this point maybe it is unjustified to invoke Omega without explaining how she knows these things.

But what do you think of some of the examples given in "No Logical Positivist I"?

On August 1st 2008 at midnight Greenwich time, a one-foot sphere of chocolate cake spontaneously formed in the center of the Sun; and then, in the natural course of events, this Boltzmann Cake almost instantly dissolved.

I would say that this hypothesis is meaningful and almost certainly false. Not that it is "meaningless". Even though I cannot think of any possible experimental test that would discriminate between its being true, and its being false.

Would that be in the same equivalence class as its negation?
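The Omega bet with a logarithmic scoring rule, mentioned earlier in this comment, can be sketched as follows. The 1% credence in Odin and the grid of candidate reports are made up for the example; the only fact relied on is that the expected log score is maximized by reporting your actual credence.

```python
import math

def expected_log_score(report: float, belief: float) -> float:
    """Expected log score if you report `report` while believing `belief`."""
    return belief * math.log(report) + (1 - belief) * math.log(1 - report)

belief_in_odin = 0.01  # hypothetical credence, for illustration only

# Try every report from 0.001 to 0.999 and keep the one with the best expected score.
candidates = [i / 1000 for i in range(1, 1000)]
best_report = max(candidates,
                  key=lambda r: expected_log_score(r, belief_in_odin))
print(best_report)
```

Because shading the report in either direction lowers the expected score, such a bet forces you to state a definite credence even for a hypothesis you cannot test.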

Replies from: shokwave

↑ comment by shokwave · 2010-11-12T04:00:29.794Z · LW(p) · GW(p)

I'm skeptical of the idea that the hypothesis "Odin created physics as we know it" would actually make no additional predictions over the hypothesis "physics as we know it".

As the originator of that hypothesis, the idea I had in mind was that there are two theories: physics as we know it, and Odin created physics as we know it. Scientists hold the first theory, and Odinists hold the second. The theories predict exactly the same things, so that Odinists and scientists have the same answers to problems, but the Odinists' theory lets them say that Odin exists - and they happen to have a book that starts with "If Odin exists, then..." and goes on to detail how they should live their lives. The scientists have a great interest in showing that the second theory is wrong, because the Odinists are otherwise justified in their pillaging of the scientists' home towns. But the Odinists are clever folk, and say we shouldn't expect to see anything different from the world as we know it because Odin is all-powerful at world-creation.

Honestly, I should have picked Loki.

Replies from: ata

↑ comment by ata · 2010-11-12T04:21:06.294Z · LW(p) · GW(p)

Are you defining Odin's role and behaviour such that he is guaranteed not to actually do anything that impinges on reality beyond creating the laws of physics that we already know? Or is it just that he hasn't interfered with anything so far, or hasn't interfered with anything in such a way that anything we can observe is different?

(Edit: I ask because any claims about metaethics that depend on Odin's existence would seem to require raising him to the level of a causally-active component of the theory rather than an arbitrary and unfalsifiable epiphenomenon.)

Replies from: shokwave

↑ comment by shokwave · 2010-11-13T06:21:27.971Z · LW(p) · GW(p)

I am defining him as being an arbitrary and unfalsifiable epiphenomenon everywhere excepting that he was causally active in the creation of the book that details the ethical lives Odinists ought to live. Basically, he hasn't interfered with anything in such a way that anything we could ever observe is different, except he wrote a book about it.

It's clear to me that anyone could choose to reject Odinism, but it's not clear what arguments other than a strong Occam's razor could convince a sufficiently reasonable and de-biased (ie genuinely truth-seeking) Odinist to give up their belief.

comment by thomblake · 2010-11-11T22:57:11.436Z · LW(p) · GW(p)

no matter how seductive it sounds, skipping over the is-ought distinction is not permitted

Yeah, some of us are still not convinced on that one.

Speaking of which, does anyone actually have something resembling a proof of this? People just seem to cast it about flippantly.

Replies from: Jack

↑ comment by Jack · 2010-11-12T01:50:26.822Z · LW(p) · GW(p)

So what Hume was talking about when he addressed this is just that people sometimes come to "is" conclusions on the basis of "ought" statements and "ought" statements on the basis of "is" statements. Hume makes the point that no rule in deductive logic renders this move valid. You would have to defend some introduction rule for "ought". Or, I guess, throw out deductive logic.

That said, cousin_it's argument can be saved with a rather uncontroversial premise: the reason we don't want to send this person adrift is that we believe "he will continue to be tortured even though we aren't observing him." The premise seems uncontroversial; my problem with the argument is that a) I'm not sure the hypothetical successfully renders the case "unobservable" and b) I'm not sure our evolved moral intuitions are equipped to rule meaningfully on such events.

Replies from: Perplexed

↑ comment by Perplexed · 2010-11-12T03:31:13.284Z · LW(p) · GW(p)

people sometimes come to "is" conclusions on the basis of "ought" statements and "ought" statements on the basis of "is" statements. Hume makes the point that no rule in deductive logic renders this move valid. You would have to defend some introduction rule for "ought".

That is the first (Hume's) half of the argument. The second half is G.E. Moore's "open question" argument which tries to show that you can't come up with a valid introduction rule for ought by the obvious trick of defining "ought" in terms of simple concepts that don't already involve morality.

The irony here is that Hume is remembered for the "is/ought" thing even though he immediately proceeded to provide an account of "ought" in terms of "is". The way he did it is to break morality into two parts. The first part might be called the "moral instinct". But this is a real feature of human nature; it exists; it can be examined; it is something that lives entirely in the world of "is".

Of course, no one who thinks that there is something "spiritual" or "supernatural" about morality is particularly bothered by the fact that moral instincts are completely natural entities made out of "is" stuff. They maintain that there is a second part to morality - call it "true morality" - and that the "moral instinct" is just an imperfect guide to "true morality". It is the "true morality" that owns the verb "ought" and hence it cannot be reduced to "is".

Hume is perfectly happy to have the distinction made between "moral instincts" and "true morality". He just disagrees that "true morality" is on any kind of higher plane. According to Hume, when you look closely, you will find that true morality, the ideal toward which our moral instincts tend, is nothing other than enlightened rational self interest, together with a certain amount of social convention - both of which can quite easily be reduced to "is".

So, I'm claiming that Hume made the first part of the argument precisely because he intended to define "ought" in terms of "is". But Moore came along later, didn't buy Hume's definition, and came up with the "open question" argument to 'prove' that no one else could define "ought" either.

Replies from: Will_Sawin

↑ comment by Will_Sawin · 2010-11-12T15:09:13.349Z · LW(p) · GW(p)

Isn't the problem that ought already has a definition?

"ought" is defined as "that stuff that you should do"

This definition sounds circular because it is. I can't physically point to an ought like I can an apple, but "ought" is a concept all human beings have, separate from learning language.

"is" is actually another example of this.

So the reason you can't define ought is the same reason that you can't define an apple as those red roundish things and then define an apple as a being capable of flight.

We can define new words, like Hume-ought, Utilitarian-ought, Eliezer-ought, based on what various people or schools of thought say those words mean. But "ought=Hume-ought" or whatever is not a definition, it's a statement of moral fact, and you can't prove it unless you take a statement of moral fact as an assumption.

Replies from: Perplexed

↑ comment by Perplexed · 2010-11-12T15:54:21.624Z · LW(p) · GW(p)

Isn't the problem that ought already has a definition?

"ought" is defined as "that stuff that you should do"

In a sense, that is exactly the point that Moore is making with the "open question" argument.

But the situation is a bit more complicated. The stuff you should do can be further broken down into "stuff you should do for your own sake" and "stuff you should do for moral reasons". I.e. "ought" splits into two words - a practical-ought and a moral-ought.

Now, one way of looking at what Hume did is to say that he simply defined moral-ought as practical ought. A dubious procedure, as you point out. But another way of looking at what he did is that he analyzed the concept of 'moral-ought' and discovered a piece of it that seems to have been misclassified. That piece really should be classified as a variety of 'practical-ought'. And then, having gotten away with it once, he goes on to do it again and again until there is nothing left of independent 'moral-ought'. Dissolved away. What's more, if you are not strongly pre-committed to defending the notion of an independent moral 'ought', the argument can be rather convincing.

And as a supplementary incentive, notice that by dissolving and relocating the moral 'ought' in this way, Hume has solved the second key question about morality: "Now that I know how I morally ought to behave, what reason do I have to behave as I morally ought to behave?" Hume's answer: "Because 'moral ought' is just a special case of 'practical ought'."

Replies from: thomblake

↑ comment by thomblake · 2010-12-02T16:52:47.596Z · LW(p) · GW(p)

And as a supplementary incentive, notice that by dissolving and relocating the moral 'ought' in this way, Hume has solved the second key question about morality: "Now that I know how I morally ought to behave, what reason do I have to behave as I morally ought to behave?" Hume's answer: "Because 'moral ought' is just a special case of 'practical ought'."

Despite being a fellow-traveler in these areas, I had no idea Hume actually laid out all these pieces. I'll have to go read some more Hume. I tend to defend it as straightforward application of Sidgwick's definition of ethics coupled with the actual English meaning of 'should', but clearly a good argument preceding that by a century or two would be even better.

Replies from: Perplexed

↑ comment by Perplexed · 2010-12-02T17:14:33.946Z · LW(p) · GW(p)

I'll have to go read some more Hume.

Try this

And, indeed, to drop all figurative expression, what hopes can we ever have of engaging mankind to a practice, which we confess full of austerity and rigour? Or what theory of morals can ever serve any useful purpose, unless it can show, by a particular detail, that all the duties, which it recommends, are also the true interest of each individual? The peculiar advantage of the foregoing system seems to be, that it furnishes proper mediums for that purpose.

comment by Thomas · 2010-11-11T19:21:29.949Z · LW(p) · GW(p)

Is the Hamiltonian formulation of classical mechanics apriori more probable than the Lagrangian formulation?

They are both derivable from the same source, Newtonian mechanics plus the ZF Set theory. They are equivalent and therefore equally probable.

The length of the shortest possible version among all these mutually equivalent theories is the measure of how (equally) probable they are.

comment by prase · 2010-11-12T19:07:30.812Z · LW(p) · GW(p)

My favourite justification of Occam's razor is that even if two theories are equivalent in their explicit predictions, the simpler one is usually more likely to inspire correct generalisations. The reason may be that the more complicated the theory is, the more arbitrary constraints it puts on our thinking, and those constraints can prevent us from seeing the correct more general theory. For example, some versions of aether theory can be made equivalent to special relativity, but the assumptions of absolute space and time make it nearly impossible to discover something equivalent to general relativity, starting from aether.

comment by DuncanS · 2010-11-12T00:26:04.222Z · LW(p) · GW(p)

I personally think Occam's razor is more about describing what you know. If two theories are equally good in their explanatory value, but one has some extra bells and whistles added on, you have to ask what basis you have for deciding to prefer the bells and whistles over the no bells and whistles version.

Since both theories are in fact equally good in their predictions, you have no grounds for preferring one over the other. You are in fact ignorant of which theory is the correct one. However, the simplest one is the one that comes closest to describing the state of your knowledge. The more complicated ones add extra bits that can only really be described as speculations, not knowledge at all, because all the extra bits of 'information' in that theory are not based on any data at all.

Perhaps a more complicated theory is true? Perhaps. But which one of the many many many more complicated theories should you pick? You have no evidence on which to make this choice.

Equally, one shouldn't be too doctrinaire about it. We don't know the simplest explanation is correct - we simply know it's the best way of describing what we know so far. If there are several similar theories of almost equal explanatory weight, there are grounds for reasonable agnosticism even if there is one that's a little 'lighter' than the others.

comment by Matt_Simpson · 2010-11-11T19:32:59.055Z · LW(p) · GW(p)

Adding premises like "Odin created everything" makes a theory less probable and also happens to make it longer; this is the entire reason why we intuitively agree with Occam's Razor in penalizing longer theories. Unfortunately, this answer seems to be based on a concept of "truth" granted from above - but what do differing degrees of truth actually mean, when two theories make exactly the same predictions?

and

Imagine you have the option to put a human being in a sealed box where they will be tortured for 50 years and then incinerated. No observational evidence will ever leave the box... Now consider the following physical theory: as soon as you seal the box, our laws of physics will make a localized exception and the human will spontaneously vanish. This theory makes exactly the same observational predictions as your current best theory of physics, so it lies in the same equivalence class and you should give them the same credence.

Why do we only care about observational predictions and not all "in principle" predictions? (Serious question, not rhetorical). My intuition is that in the first quote "Physics" and "Odin created physics" aren't in the same equivalence class because the latter makes an additional prediction: the existence of Odin. Similarly in the second quote, there are differing predictions about what is happening inside the box even if they are physically impossible to test, so I would put the two theories in different equivalence classes. I would do the same for MWI and Copenhagen quantum mechanics.

This is pure intuition on my part. I haven't had a chance to take a look at the math of K-complexity and the like, so I might just be missing something relatively basic.

Replies from: cousin_it

↑ comment by cousin_it · 2010-11-11T20:00:50.136Z · LW(p) · GW(p)

A prediction that's impossible to test is a contradiction in terms. Show me any unfalsifiable theory, and I'll invent some predictions that follow from it, they will just be "impossible to test".

Replies from: Matt_Simpson, komponisto

↑ comment by Matt_Simpson · 2010-11-11T21:00:16.374Z · LW(p) · GW(p)

Ok, so don't call the existence of Odin or what's happening inside the box "predictions." Then I'll rephrase my question:

Why do we only care about "predictions" and not "everything a theory says about reality?" Clearly all three pairs of theories I mentioned above say different things about reality even if it is impossible in some sense to observe this difference.

(I'll add to this later, but I'm pressed for time currently) edit: nothing to add, actually

Replies from: cousin_it

↑ comment by cousin_it · 2010-11-11T21:13:55.192Z · LW(p) · GW(p)

How can we distinguish statements that are "about reality" from statements that aren't, if we just threw away the criteria of predictive power and verification?

Replies from: Matt_Simpson

↑ comment by Matt_Simpson · 2010-11-11T21:57:26.570Z · LW(p) · GW(p)

How about counterfactual predictive power and verification? If I could observe the inside of that box, then I could see a difference between the two theories.

I realize this opens a potential can of worms, i.e., what sort of counterfactuals are we allowed to consider? But in any case, this is how I've understood the basic idea of falsifiability. Compare to Yvain's logs of the universe idea. (He's doing something different with it, I know)

↑ comment by komponisto · 2010-11-11T20:49:34.798Z · LW(p) · GW(p)

...and this is why Popperian falsificationism is wrong!

There aren't any "unfalsifiable" theories, though there may be unintelligible theories.

Replies from: None

↑ comment by [deleted] · 2010-11-11T20:57:49.080Z · LW(p) · GW(p)

I disagree, since prediction != theory. It is certainly possible to have a theory (e.g. Freud's ideas about the ego and superego) that make no predictions. In the comment above, cousin_it is correct in that "unfalsifiable prediction" is a contradiction, but "unfalsifiable theory" is not. It just means that the theory is not well-formed and does not pay rent.

Replies from: komponisto

↑ comment by komponisto · 2010-11-11T22:50:26.941Z · LW(p) · GW(p)

It is certainly possible to have a theory (e.g. Freud's ideas about the ego and superego) that make[s] no predictions.

Though cousin_it will have to speak for himself, I believe he was specifically disagreeing with this when he wrote:

Show me any unfalsifiable theory, and I'll invent some predictions that follow from it, they will just be "impossible to test".

comment by shokwave · 2010-11-12T05:26:15.103Z · LW(p) · GW(p)

Theories that require additional premises are less likely to be true, according to the eternal laws of probability ... Unfortunately, this answer seems to be based on a concept of "truth" granted from above - but what do differing degrees of truth actually mean, when two theories make exactly the same predictions?

Reading this and going back to my post to work out what I was thinking, I have a sort-of clarification for the issue in the quote. The original argument was that, before experiencing the universe, all premises are a priori equally likely, so we can generate as many hypotheses as we like out of them. Then, after experience (a posteriori) some premises are extremely likely (shorthand: true) and some are extremely unlikely (shorthand: false). Now, in our advanced position, there are a large number of false premises, a small number of true premises, and an unknown number of premises we haven't had experiences about. We can now generate as many "prediction-equivalent" theories as we like by combining the unknown premises with the true ones. As long as we avoid the false premises, all our hypotheses will be based on true premises, and premises which we have not yet checked. To refine that argument, it is the conjunction of specifically these unknown premises that might weaken the hypothesis. Therefore, we ought to include as few of these as-yet-untested premises in our hypothesis, in order to reduce the chances of it being wrong.

Now, in the case of two theories making the same prediction: I suggest that it is possible to look at an unknown premise and decide whether we can check it. In this sense, if it is checkable, we can view it as a prediction: the hypothesis includes premise C such that if C is true then the hypothesis is true and if C is false the hypothesis is false. In other words, the hypothesis makes a prediction that C is true. If it is uncheckable, though, we don't use the word prediction. This is the discussion that Matt_Simpson and cousin_it are having down below. If both theories make the same predictions because one says A B C and D, and the other says A B C E F G and H (A and B are true, C is unknown-testable, D through H are unknown-untestable), then the theories are still distinguishable, still different, but they make the same predictions. In this specific case, we should in principle prefer the first theory, because it has one fault-line and the second has four: even though we don't think we can test any of these fault-lines, the laws of probability still apply. So this is what degrees of truth mean when all theories make the same predictions.
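As a rough numerical illustration of the fault-line argument, here is a minimal sketch. It assumes, purely for illustration, that the untestable premises are independent and that each has the same probability of being true; neither assumption comes from the argument itself, and the function name and the value 0.5 are hypothetical.

```python
def conjunction_probability(num_untested_premises, p_each=0.5):
    """Probability that every untested premise holds, assuming independence.

    p_each is an illustrative assumption, not a value from the discussion.
    """
    return p_each ** num_untested_premises


# Theory 1: A, B (true), C (testable), D (untestable) -> one untestable fault-line.
# Theory 2: A, B (true), C (testable), E, F, G, H (untestable) -> four fault-lines.
theory_1 = conjunction_probability(1)
theory_2 = conjunction_probability(4)

print(theory_1, theory_2)  # 0.5 0.0625
```

For any per-premise probability strictly between 0 and 1, the theory with fewer untested premises comes out ahead; the exact numbers depend entirely on the assumed values.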

Now, what about two different theories that say A B C and D? I'm not familiar with the physics, but it appears that Hamiltonian and Lagrangian systems are in this scenario: both say A B C and D, in different but equivalent ways. I haven't had enough time to think to bake up an answer for this, but I suspect it is similar to how you can express the same truth-table with different combinations of premises and operators. The question that stumps me is: in logic, we don't care about the operators except for how they twist the premises. In physics, we actually do care about the operators, to an extent: we give them names like "the mechanism of x" and "the underlying reality". So it seems to me that saying Lagrangian and Hamiltonian are different but equivalent is saying that two different logical formulations with the same truth-table are different but equivalent, except in physics we feel the difference actually matters.

comment by Zetetic · 2010-11-12T01:27:41.863Z · LW(p) · GW(p)

I wonder if this can't be considered more pragmatically? There was a passage in the MIT Encyclopedia of Cognitive Sciences in the Logic entry that seems relevant:

Johnson-Laird and Byrne (1991) have argued that postulating more imagelike MENTAL MODELS make better predictions about the way people actually reason. Their proposal, applied to our sample argument, might well help to explain the difference in difficulty in the various inferences mentioned earlier, because it is easier to visualize “some people” and “at least three people” than it is to visualize “most people.” Cognitive scientists have recently been exploring computational models of reasoning with diagrams. Logicians, with the notable exceptions of Euler, Venn, and Peirce, have until the past decade paid scant attention to spatial forms of representation, but this is beginning to change (Hammer 1995).

This made me think a bit differently about how we might choose between two abstract models with the same explanatory power. It seems that the rational thing to do is to choose the one that allows you to reason the most fluently so as to minimize the likelihood of fallacious reasoning.

In fact, it seems that we should expect the cognitive sciences to provide clues about how we could adjust formal systems with a view to ease of understanding and technical fluency when reasoning about/with them.

Taking this view, and assuming we had finished physics, all the future work would be about tweaking the formalisms toward the most intuitive possible ones with respect to the knowledge we have of human reasoning. What would be important is that they be as easy to understand as possible. That way we could hope to ensure more efficiency in technological development as well as better general understanding among the public.

Replies from: gimpf

↑ comment by gimpf · 2010-11-12T13:14:49.937Z · LW(p) · GW(p)

I was thinking on a similar line:

Given that computation has costs and memory is limited, making the best possible predictions with given resources requires using the computationally least expensive method.

Assuming that generating a mathematical model is (at least on average) more difficult for more complex theories, wasting time on creating (in the end equivalent) models that have to incorporate epiphenomenal concepts leads to worse predictions in practice.

So not using the strong Occam's razor would lead to worse results.

And because we have brought moral issues into this: not using the best possible way would even be morally bad, as we would lose important information for optimizing our moral behavior, since we could not look as far into the future and would have less accurate predictions at our disposal due to our limited resources.

ETA: The main difference from your post above is that this still holds true for a perfect Bayesian superintelligence, and should be invariant to the computational substrate.

comment by humpolec · 2010-11-11T23:06:12.013Z · LW(p) · GW(p)

What if Tegmark's multiverse is true? All the equivalent formulations of reality would "exist" as mathematical structures, and if there's nothing to differentiate between them, it seems that all we can do is point to the appropriate equivalence class in which "we" exist.

However, the unreachable tortured man scenario suggests that it may be useful to split that class anyway. I don't know much about the Solomonoff prior - does it make sense now to build a probability distribution over the equivalence class and say what is the probability mass of its part that contains the man?
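One way to make the question concrete is a toy calculation. The sketch below is not the real Solomonoff prior (which sums over all programs for a universal machine and is uncomputable); it assumes a hypothetical pair of observationally equivalent descriptions with made-up bit lengths, weights each by 2^-length, and normalizes within the class.

```python
from fractions import Fraction

# Hypothetical descriptions in one equivalence class: (name, length in bits,
# whether the man in the box still exists according to that description).
# The lengths are invented for illustration only.
descriptions = [
    ("plain physics", 100, True),
    ("physics + local exception that removes the man", 110, False),
]


def mass_containing_man(descs):
    """Share of the class's 2**-length weight on descriptions where the man exists."""
    weights = [(Fraction(1, 2 ** length), exists) for _, length, exists in descs]
    total = sum(w for w, _ in weights)
    with_man = sum(w for w, exists in weights if exists)
    return with_man / total


share = mass_containing_man(descriptions)
print(share)  # 1024/1025
```

With these invented lengths, a 10-bit overhead for the exception leaves almost all of the class's mass on the part that contains the man; whether such a split is meaningful is exactly the open question.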

comment by hairyfigment · 2010-11-14T07:57:23.686Z · LW(p) · GW(p)

Theories that require additional premises are less likely to be true, according to the eternal laws of probability. Adding premises like "Odin created everything" makes a theory less probable and also happens to make it longer; this is the entire reason why we intuitively agree with Occam's Razor in penalizing longer theories. Unfortunately, this answer seems to be based on a concept of "truth" granted from above -

Not to me it doesn't. (Though I may not understand what you mean by "truth" here.) Bayesian probability theory as I've come to understand it deals with maps directly and with the territory only indirectly. It purports to describe how logic, or the laws of thought, apply to uncertainty. So we can describe in some detail how these laws demand a lesser probability for a compound hypothesis, without ever mentioning the reality you want this hypothesis to address. In that sense the math doesn't care about the content of your theories.

(I started to add a silly technical nitpick for something else on the site. Suffice it to say that numerical probabilities serve as maps of other maps, or measures of the trust that a 'rational' mind would have in those maps.)

So should we trust our 'logical' map-evaluating software in this respect? Well, it seems to work so far. As I understand it our mindless evolution created the basics by trial and error, after which we created more mathematics by similar methods. (We twisted our mathematical intuitions into strange shapes like "the square root of minus one" and kept whatever we found a use for.) Bayes' Theorem as we know it emerged from this process. So we can imagine discovering that probability or logic in general has misled us in some fundamental way. Perhaps the most complex possible theory is always correct and we just can't imagine said theory (or, indeed, a way for it to exist). But our internal software tells us not to expect this. ^_^

comment by Vaniver · 2010-11-13T19:58:25.902Z · LW(p) · GW(p)

Another intriguing answer came from JGWeissman. Apparently, as we learn new physics, we tend to discard inconvenient versions of old formalisms. So electromagnetic potentials turn out to be "more true" than electromagnetic fields because they carry over to quantum mechanics much better. I like this answer because it seems to be very well-informed!

I don't like this explanation- while potentials are useful calculation tools both macroscopically and quantum mechanically, fields have unique values whereas potentials have non-unique values. It's not clear to me how to compare those two benefits and decide if one is "more true."

The alternative way to look at it: if you only knew E&M, would you talk in terms of four-vector potentials or in terms of fields? Most of the calculations for complicated problems are easier with potentials (particularly for magnetism), but the target is generally coming up with the fields from those potentials. Similarly, most calculations in QM are easier with the potentials (I've never seen them done with fields, but I imagine it must be possible- you can do classical mechanics with or without Hamiltonians), but the target is wavefunctions or expectation values.

So it's not clear to me what it means to choose potentials over fields, or vice versa. The potentials are a calculation trick, the fields are real, just like in QM the potentials are a calculation trick, and the wavefunction is real. They're complementary, not competing.

Replies from: wnoise, Sniffnoy, Perplexed

↑ comment by wnoise · 2010-11-15T08:38:30.626Z · LW(p) · GW(p)

I don't like this explanation- while potentials are useful calculation tools both macroscopically and quantum mechanically, fields have unique values whereas potentials have non-unique values. It's not clear to me how to compare those two benefits and decide if one is "more true."

You can just as easily move to a different mathematical structure where the gauge is "modded out", a "torsor". Similarly, in quantum mechanics where the phase of the wavefunction has no physical significance, rather than working with the vectors of a Hilbert space, we work with rays (though calculational rules in practice reduce to vectors).

There are methods of gaugeless quantization but I'm not familiar with them, though I'd definitely like to learn. (I'd hope they'd get around some of the problems I've had with QFT foundations, though that's probably a forlorn hope.)

↑ comment by Sniffnoy · 2010-11-13T21:40:50.094Z · LW(p) · GW(p)

I don't like this explanation- while potentials are useful calculation tools both macroscopically and quantum mechanically, fields have unique values whereas potentials have non-unique values. It's not clear to me how to compare those two benefits and decide if one is "more true."

Immediate thought: Why not just regard the potentials as actual elements of a quotient space? :)

↑ comment by Perplexed · 2010-11-13T21:24:49.498Z · LW(p) · GW(p)

So it's not clear to me what it means to choose potentials over fields, or vice versa. The potentials are a calculation trick, the fields are real, just like in QM the potentials are a calculation trick, and the wavefunction is real. They're complementary, not competing.

Are you familiar with the Aharonov-Bohm effect? My understanding is that it is a phenomenon which, in some sense, shows that the EM potential is a "real thing", not just a mathematical artifact.

Replies from: Vaniver

↑ comment by Vaniver · 2010-11-13T22:34:52.048Z · LW(p) · GW(p)

I am and your understanding is correct for most applications. I don't think it matters for this question, as my understanding is that the operative factor behind the Aharonov-Bohm effect is the nonlocality of wavefunctions.* Because wavefunctions are nonlocal, the potential formulation is staggeringly simpler than a force formulation. (The potentials are more real in the sense that the only people who do calculations with forces are imaginary!)

You still have gauge freedom with the Aharonov-Bohm effect- if you adjust the four-potential everywhere, all it does is adjust the phase everywhere, and all you can measure are phase differences.
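For concreteness, the standard textbook form of this statement can be written out. The symbols are introduced here and are not from the comment itself: SI units, a particle of charge q, an arbitrary smooth single-valued function χ, and Φ_B for the magnetic flux enclosed by the loop.

```latex
\begin{aligned}
\mathbf{A} &\to \mathbf{A} + \nabla\chi, &
\phi &\to \phi - \frac{\partial \chi}{\partial t}, &
\psi &\to e^{i q \chi/\hbar}\,\psi, \\
\Delta\varphi_{\mathrm{AB}} &= \frac{q}{\hbar}\oint \mathbf{A}\cdot d\boldsymbol{\ell}
 = \frac{q\,\Phi_B}{\hbar}, &
\oint \nabla\chi\cdot d\boldsymbol{\ell} &= 0 .
\end{aligned}
```

The gauge function shifts the local phase of ψ everywhere, but its gradient integrates to zero around a closed loop, so the measurable phase difference depends only on the enclosed flux.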

Although, that highlights an inconsistency: if I'm willing to accept wavefunctions as real, despite their phase freedom, then I should be willing to accept potentials as real, despite their gauge freedom. I'm going to think this one over, but barring any further thoughts it looks like that's enough to change my mind.

*I could be wrong: I have enough physics training to speculate on these issues, but not to conclude.

[edit] It also helps that Feynman, who certainly knows more about this than I do, sees the potentials as more real (I suppose this means 'fundamental'?) than the fields.

Replies from: wnoise

↑ comment by wnoise · 2010-11-15T08:41:51.010Z · LW(p) · GW(p)

wavefunctions as real, despite their phase freedom,

Heh. It gets worse. Typically one is taught that the wavefunction is defined up to a global constant. You might have thought that the difference in phase between two places would at least be well defined. This is true, so long as you stick to one reference frame. A Galilean boost will preserve the magnitude everywhere, but add a different phase everywhere.

comment by XiXiDu · 2010-11-12T14:01:14.078Z · LW(p) · GW(p)

Isn't this comment a shorter version of this post Belief in the Implied Invisible?

comment by cousin_it · 2010-11-12T06:02:34.766Z · LW(p) · GW(p)

I just added to the post.

comment by anonym · 2010-11-12T04:30:34.030Z · LW(p) · GW(p)

Your thought experiment of the person in the sealed torture box ignores the question of what evidence I have to believe that such a box exists and what evidence I have that the physical theory you've outlined is true (in the thought experiment).

The fact that a theory makes the same predictions as some other theory is irrelevant if I don't have good reason for thinking the theory might be true in the first place. The problem with "Odin created physics" is that I have no good reasons to believe in the existence of Norse gods and that the universe was created by one of them, just like I have no good reasons to believe that a localized exception to physics will occur for some particular box.

To phrase my point a little differently, I think we have to consider the genesis of the theory. If the theory exists because somebody appended "because God made it so" to some other existing theory, then it doesn't matter that it makes the same predictions -- it fails because there are no good reasons for thinking it might be the case. We must consider why the theory makes the predictions it does. "Odin created physics" makes the particular predictions it does for no other reason than that it was specifically designed to make exactly the same predictions (and no other predictions).

Replies from: khafra

↑ comment by khafra · 2010-11-12T16:30:28.109Z · LW(p) · GW(p)

Counting the genesis of the theory into its likelihood sounds a lot like counting the stopping condition of repeated trials.

Replies from: anonym

↑ comment by anonym · 2010-11-13T03:30:35.350Z · LW(p) · GW(p)

I meant something closer to determining whether the process by which a theory was created was a rational process based on evidence. "Odin created physics" is clearly not in that category, and neither is the torture box hypothesis.

comment by Nisan · 2010-11-12T04:06:20.135Z · LW(p) · GW(p)

I like this argument. But in this case I think there's another argument that doesn't rely on morality so much.

Your belief that the two theories in question will always make the same predictions is conditional on the box being perfectly sealed, and the universe continuing to expand forever. There's a small chance that these things are not true, and if that turns out to be the case, you may or may not expect to see the guy again, depending on what physical theory you believe in.

I think Matt Simpson is getting at this when he talks about counterfactual predictions.

You could counter this by inventing the physical theory that says that the guy in the box disappears, but if somehow you ever recover the box and open it, then the guy will reappear as if he had been there all along. But if we're going that far we may as well posit the theory that the Moon doesn't exist even though we can see it. The definition of existence should involve morality.

comment by Thomas · 2010-11-11T21:42:58.164Z · LW(p) · GW(p)

In other words, you can't make a theory less/more probable just by expressing it differently, with more/fewer words.

Only the shortest known formulation counts.

Replies from: cousin_it

↑ comment by cousin_it · 2010-11-11T21:44:20.493Z · LW(p) · GW(p)

I wasn't trying to make this point. Since the last post I've updated my position and got rid of most of my certainty. Now it's all a mystery to me.

Replies from: Thomas

↑ comment by Thomas · 2010-11-12T07:53:09.475Z · LW(p) · GW(p)

An intellectually honest man. Enough for the beginning.

comment by AndyCossyleon · 2010-11-11T21:24:07.472Z · LW(p) · GW(p)

Doesn't the human inside qualify as an observer? For all we know, WE outside the box could be the ones tortured for 50 years and then incinerated once the button is pushed.

comment by ata · 2010-11-11T19:36:29.148Z · LW(p) · GW(p)

It probably has no consequences in reality, because no matter how seductive it sounds, skipping over the is-ought distinction is not permitted.

What, we can't just assume for now that torture is bad without getting into metaethics?

Replies from: cousin_it

↑ comment by cousin_it · 2010-11-11T19:52:46.044Z · LW(p) · GW(p)

We can assume that, but we can't make the conclusion about Occam's Razor that my argument makes. There's a mistake in it somewhere. A statement like "torture is bad" can never imply a statement like "this physical or mathematical theory is true"; the world doesn't work like that.

Replies from: ata

↑ comment by ata · 2010-11-11T20:11:37.603Z · LW(p) · GW(p)

A statement like "torture is bad" can never imply a statement like "this physical or mathematical theory is true"; the world doesn't work like that.

Of course it can't imply it, but it can test whether you actually believe it. The bit that says "If you're still reluctant to push the button, it looks like you already are a believer in the 'strong Occam's Razor' saying simpler theories without local exceptions are 'more true'" sounds fine to me. Then the only question is, in the long run and outside the context of weird hypotheticals, whether this kind of thinking wins more than it loses.

But we can reframe it not to talk about morality, and keep things on the "is" side of the divide.

Suppose you are a paperclip maximizer, and imagine you have a sealed box with 50 paperclips in it. You have a machine with a button which, if pressed, will create 5 paperclips and give them to you, and will vaporize the contents of the box, while not visibly affecting the box or anything external to it. Consider the following physical theory: Right after you seal the box, our laws of physics will make a temporary exception and will immediately teleport the paperclips to the core of a distant planet, where they will be safe and intact indefinitely. Given that this makes the same observational predictions as our current understanding of the laws of physics, would pressing the button be the paperclip-maximizing thing to do?

If I were a paperclip maximizer, I would not press the button. If that means accepting the "strong Occam's Razor", so be it.
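The decision can be sketched as an expected-count comparison. This is a minimal sketch under stated assumptions: the maximizer weighs the two theories by a credence `p_teleport` (a hypothetical parameter), it counts paperclips it can never observe, and the function names are invented here rather than taken from any actual maximizer design.

```python
def expected_paperclips(press, p_teleport):
    """Expected total paperclips, counting ones the agent can never observe.

    p_teleport: credence that the sealed paperclips were teleported to safety.
    """
    boxed = 50   # paperclips sealed in the box
    reward = 5   # paperclips created by pressing the button
    if not press:
        # Under either theory, 50 paperclips remain somewhere in the world.
        return boxed
    # Pressing vaporizes the box contents; the 50 survive only if teleported.
    return reward + p_teleport * boxed


def should_press(p_teleport):
    """True if pressing yields more expected paperclips than not pressing."""
    return expected_paperclips(True, p_teleport) > expected_paperclips(False, p_teleport)


print(should_press(0.5))   # False: 30 expected vs 50
print(should_press(0.95))  # True: 52.5 expected vs 50
```

Under these assumptions, pressing only wins when the credence in the teleport theory exceeds 0.9. A maximizer that scored only observable paperclips would always press, so the answer depends on how the utility function and prior are implemented.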

Replies from: cousin_it, cousin_it

↑ comment by cousin_it · 2010-11-11T21:12:25.989Z · LW(p) · GW(p)

If I were a paperclip maximizer, I would not press the button.

This is begging the question. The answer depends on the implementation of the maximizer. Of course, if you have a "strong Occamian" prior, you imagine a paperclip maximizer based on that!

Replies from: ata, JGWeissman

↑ comment by ata · 2010-11-11T21:24:36.343Z · LW(p) · GW(p)

Okay, but... what decision actually maximizes paperclips? The world where the 50 paperclips have been teleported to safety may be indistinguishable, from the agent's perspective, from the world where the laws of physics went on working as they usually do, but... I guess I'm having trouble imagining holding an epistemology where those are considered equivalent worlds rather than just equivalent states of knowledge. That seems like it's starting to get into ontological relativism.

Suppose you've just pressed the button. You're you, not a paperclip maximizer; you don't care about paperclips, you just wanted to see what happens, because you have another device: it has one button, and an LED. If you press the button, the LED will light up if and only if the paperclips were teleported to safety due to a previously unknown law of physics. You press the button. The light turns on. How surprised are you?

↑ comment by JGWeissman · 2010-11-11T21:19:31.866Z · LW(p) · GW(p)

And a paperclipper with an anti-Occamian prior that does push the button is revealing a different answer to the supposedly meaningless question.

Either way, it is assigning utility to stuff it cannot observe, and this shows that questions about the implied invisible, about the differences in theories with no observable differences, can be important.

↑ comment by cousin_it · 2010-11-11T21:05:03.550Z · LW(p) · GW(p)

If I were a paperclip maximizer, I would not press the button.

With all due respect, you don't know that. It depends on the implementation of the paperclip maximizer, and how to "properly" implement it is exactly the issue we're discussing here.

comment by inklesspen · 2010-11-11T18:39:19.401Z · LW(p) · GW(p)

I don't think that argument is even valid. After all, I have the option of putting a human in a box. If I do, one hypothesis states that the human will be tortured and then killed. The other hypothesis states that the human will "vanish"; it's not precisely clear what "vanish" means here, but I'm going to assume that since this state is supposed to be identical in my experience to the state in the first hypothesis, the human will no longer exist. (Alternative explanations, such as the human being transported to another universe which I can never reach, are even more outlandish.)

In either case, I am permanently removing a human from our society. On that basis alone, in the absence of more specific information, I choose not to take this option.

I think you will have to come up with a scenario where 'the action coupled with the more complicated explanation' is more attractive than both 'the action with the simpler explanation' and 'no action' in order to make this argument.

Replies from: cousin_it

↑ comment by cousin_it · 2010-11-11T19:56:05.346Z · LW(p) · GW(p)

I don't think you're addressing the core of the argument. Even if you don't actually press the button, how much disutility you assign to pressing it depends on your beliefs. If you think the action will cause 50 years of torture, you're a believer in the "strong Occam's Razor" and the proof is complete.

Replies from: Will_Sawin

↑ comment by Will_Sawin · 2010-11-12T15:17:52.850Z · LW(p) · GW(p)

A simple fix is to have the button-pressing also prevent, say, 45 years of observable torture. That gets you more complicated ethics, but that may be a sacrifice worth making to put the zero point between the two.

comment by Academian · 2010-11-11T17:38:59.113Z · LW(p) · GW(p)

Very nice summary!
