Occam alternatives

post by selylindi · 2012-01-25T07:50:07.841Z · LW · GW · Legacy · 51 comments

One of the most delightful things I learned while on LessWrong was the Solomonoff/Kolmogorov formalization of Occam's Razor.  Added to what had previously been only an aesthetic heuristic to me were mathematical rigor, proofs of optimality of certain kinds, and demonstrations of utility.  For several months I was quite taken with it in what now appears to me to be a rather uncritical way.  In doing some personal research (comparing and contrasting Marian apparitions with UFO sightings), I encountered for the first time people who explicitly rejected Occam's Razor.  They didn't have anything to replace it with, but it set off a search for me to find some justification for Occam's Razor beyond aesthetics.  What I found wasn't particularly convincing, and in discussion with a friend, we concluded that Occam's Razor feels conceptually wrong to us.

First, some alternatives for perspective (a short code sketch after the list compares several of these weightings numerically):

Occam's Razor: Avoid needlessly multiplying entities.

All else being equal, the simplest explanation is usually correct.

(Solomonoff prior) The likelihood of a hypothesis that explains the data is proportional to 2^(-L) for L, the length of the shortest code that produces a description of at least that hypothesis.

(speed prior) The likelihood of a hypothesis that explains the data is proportional to 2^(-L-N) for L, the length of the shortest code that produces a description of at least that hypothesis, and N, the number of calculations to get from the code to the description.

Lovejoy's Cornucopia:  Expect everything.

If you consider it creatively enough, all else is always equal.
 
(ignorance prior)  Equally weight all hypotheses that explain the data.
 
Crabapple's Bludgeon:  Don't demand it makes sense.
 
No set of mutually inconsistent observations can exist for which some human intellect cannot conceive a coherent explanation, however complicated.  The world may be not only stranger than you know, but stranger than you can know.
 
(skeptics' prior)  The likelihood of a hypothesis is inversely proportional to the number of observations it purports to explain.
 
Pascal's Goldpan:  Make your beliefs pay rent.
 
All else being equal, the most useful explanation is usually correct.

(utilitarian prior) The likelihood of a hypothesis is proportional to the expected net utility of the agent believing it.

Burke's Dinghy:  Never lose sight of the shore.
 
All else being equal, the nearest explanation is usually correct.
 
(conservative prior) The likelihood of a new hypothesis that explains the data is proportional to the Solomonoff prior for the Kolmogorov complexity of the code that transforms the previously accepted hypothesis into the new hypothesis.

Orwell's Applecart:  Don't upset the applecart.
 
Your leaders will let you know which explanation is correct.
 
(social prior) The likelihood of a hypothesis is proportional to how recently and how often it has been proposed and to the social status of its proponents.
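
To make the contrast concrete, here is a rough Python sketch that scores one small pool of candidate hypotheses under four of the weightings above. The hypothesis names, code lengths, step counts, and utilities are invented purely for illustration.

```python
# Toy comparison of four of the weighting schemes above on one invented
# pool of hypotheses.  All numbers are made up for illustration only.

toy_hypotheses = {
    # name: (L = shortest-code length in bits,
    #        N = steps needed to run that code,
    #        U = expected utility of believing the hypothesis)
    "simple_fast":  (10,  50,  5.0),
    "simple_slow":  (12, 400,  5.0),
    "complex_fast": (40,  60,  6.0),
    "wishful":      (35,  80, 50.0),
}

def normalize(weights):
    """Scale nonnegative weights so they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def solomonoff_prior(hyps):
    # weight proportional to 2^(-L)
    return normalize({name: 2.0 ** (-L) for name, (L, N, U) in hyps.items()})

def speed_prior(hyps):
    # weight proportional to 2^(-L-N), penalizing slow programs as well
    return normalize({name: 2.0 ** (-(L + N)) for name, (L, N, U) in hyps.items()})

def ignorance_prior(hyps):
    # equal weight for every hypothesis that explains the data
    return normalize({name: 1.0 for name in hyps})

def utilitarian_prior(hyps):
    # weight proportional to the expected utility of believing the hypothesis
    return normalize({name: U for name, (L, N, U) in hyps.items()})

if __name__ == "__main__":
    for scheme in (solomonoff_prior, speed_prior, ignorance_prior, utilitarian_prior):
        weights = scheme(toy_hypotheses)
        print(scheme.__name__, {k: round(v, 4) for k, v in weights.items()})
```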

 

Obviously, some of those are more realistic than others.  The one that initially leapt out at me was what I'll call Pascal's Goldpan.  Granted, a human trying to understand the world and using the Goldpan would likely settle on largely the same theories as a human using the Razor since simple theories have practical advantages for our limited mental resources.  But ideally, it seems to me that a rational agent trying to maximize its utility only cares about the truth insofar as truth helps it maximize its utility.

The illustration that immediately sprung to my mind was of the characters Samantha Carter and Jack O'Neill in the television sci-fi show Stargate SG-1.  Rather frequently in the series, these two characters became stuck in a situation of impending doom and they played out the same stock responses.  Carter, the scientist, quickly realized and lucidly explained just how screwed they were according to the simplest explanation, and so optimized her utility under the circumstances and essentially began preparing for a good death.  O'Neill, the headstrong leader, dismissed her reasoning out of hand and instead pursued the most practical course of action with a chance of getting them rescued.  My aesthetics prefers the O'Neill way over the Carter way: the Goldpan over the Razor.

Though it is no evidence at all, it is also aesthetically pleasing to me that the Goldpan unifies the Platonic values of truth, goodness, and beauty into a single primitive.  I also like that it suggests an alternative to Tarski's definition of "truth".  A proposition is true if the use of its content would be beneficial in all relevant utility functions.  A proposition is false if the use of its content would be harmful in all relevant utility functions.  A proposition is partly true and partly false if the use of its content would be beneficial for some relevant utility functions and harmful for others.  A proposition can be neutral by being inapplicable or irrelevant to all relevant utility functions.

Critiques encouraged; I've no special commitment to the Goldpan.  Are there good reasons to prefer Occam's Razor?  Are there other hypothesis weighting heuristics with practical or theoretical interest?

51 comments

Comments sorted by top scores.

comment by Spurlock · 2012-01-25T13:07:21.411Z · LW(p) · GW(p)

The most notable problem with Pascal's Goldpan is that when you calculate the utility of believing a particular hypothesis, you'll find that there is a term in that equation for "is this hypothesis actually true?"

That is, suppose you are considering whether or not to believe that you can fly by leaping off a cliff and flapping your arms. What is the expected utility of holding this belief?

Well, if the belief is correct, there's a large utility to be gained: you can fly, and you're a scientific marvel. But if it's false, you may experience a tremendous disutility in the form of a gruesome death.

The point is that deciding you're just going to believe whatever is most useful doesn't even solve the problem of deciding what to believe. You still need a way of evaluating what is true. It may be that there are situations where one can expect a higher utility for believing something falsely, but as EY has touched on before, if you know you believe falsely, then you don't actually believe. Human psychology does seem to contain an element of Pascal's Goldpan, but that doesn't make it rational (at least not in the sense of "optimal"; it does seem to imply that at some point in our evolution such a system tended to win in some sense).

At present the best we can do seems to be keeping our truth-determining and our utility-maximizing systems mostly separate (though there may be room for improvement on this), and Occam's Razor is one of our tried-and-true principles for the truth-determining part.

Replies from: selylindi
comment by selylindi · 2012-01-25T16:11:10.211Z · LW(p) · GW(p)

That is, suppose you are considering whether or not to believe that you can fly by leaping off a cliff and flapping your arms. What is the expected utility of holding this belief?

I completely grant that this scheme can have disastrous consequences for a utility function that discounts consistency with past evidence, has short time horizons, considers only direct consequences, fails to consider alternatives, or is in any other way poorly chosen. Part of the point in naming it Pascal's Goldpan was as a reminder of how naive utility functions using it will be excessively susceptible to wagers, muggings, and so on. Although I expect that highly weighting consistency with past evidence, long time horizons, considering direct and indirect consequences, considering all alternative hypotheses, and so on would prevent the obvious failure modes, it may nevertheless be that there exists no satisfactory utility function that would be safe using the Goldpan. That would certainly be compelling reason to abandon it.

Replies from: alexflint
comment by Alex Flint (alexflint) · 2012-01-29T10:10:39.570Z · LW(p) · GW(p)

The point is that to evaluate the utility of holding a belief, you need to have already decided upon a scheme to set your beliefs.

Replies from: selylindi
comment by selylindi · 2012-01-31T20:16:07.526Z · LW(p) · GW(p)

That's a little too vaguely stated for me to interpret. Can you give an illustration? For comparison, here's one of how I assumed it would work:

A paperclip-making AI is given a piece of black-box machinery and given specifications for two possible control schemes for it. It calculates that if scheme A is true, it can make 700 paperclips per second, and if scheme B is true, only 300 per second. As a Bayesian AI using Pascal's Goldpan formalized as a utilitarian prior, it assigns a prior probability of 0.7 for A and 0.3 for B. Then it either acts based on a weighted sum of models (0.7A+0.3B) or runs some experiments until it reaches a satisfactory posterior probability.

That doesn't seem intractably circular.
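
Concretely, a minimal numeric sketch of that procedure (the black-box experiment likelihoods below are invented for illustration):

```python
# Sketch of the paperclipper example above: turn expected paperclip rates into
# a utilitarian prior, then update on an (invented) experimental observation.

priors = {"A": 700, "B": 300}                       # paperclips/sec if the scheme is true
total = sum(priors.values())
priors = {k: v / total for k, v in priors.items()}  # -> {"A": 0.7, "B": 0.3}

# Invented likelihoods: probability of observing a particular sensor reading
# from the black box under each control scheme.
likelihood = {"A": 0.2, "B": 0.9}

# Bayesian update after seeing that reading once.
unnorm = {k: priors[k] * likelihood[k] for k in priors}
evidence = sum(unnorm.values())
posterior = {k: v / evidence for k, v in unnorm.items()}

print(posterior)   # roughly {"A": 0.34, "B": 0.66} -- evidence can overturn the utilitarian prior
```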

Replies from: alexflint
comment by Alex Flint (alexflint) · 2012-02-01T21:27:36.696Z · LW(p) · GW(p)

Occam's razor is the basis for believing that those experiments tell us anything whatsoever about the future. Without it, there is no way to assign the probabilities you mention.

Replies from: selylindi
comment by selylindi · 2012-02-06T21:28:59.215Z · LW(p) · GW(p)

Clearly people who don't know about Occam's Razor, and people who explicitly reject it, still believe in the future. Just as clearly, we can use Occam's Razor or other principles in evaluating theories about what happened in the past. Your claim appears wholly unjustified. Was it just a vague hifalutin' metaphysical claim, or are there some underlying points that you're not bringing out?

Replies from: alexflint
comment by Alex Flint (alexflint) · 2012-02-08T22:41:00.273Z · LW(p) · GW(p)

People who don't know about Newtonian mechanics still believe that rocks fall downwards, but people who reject it explicitly will have a harder time reconciling their beliefs with the continued falling of rocks. It would be a mistake to reject Newtonian mechanics, then say "people who reject Newtonian mechanics clearly still believe that rocks fall", then to conclude that there is no problem in rejecting Newtonian mechanics. Similarly, if you reject Occam's razor then you need to replace it with something that actually fills the explanatory gap -- it's not good enough to say "well people who reject Occam's razor clearly still believe Occam's razor", and then just carry right on.

comment by lavalamp · 2012-01-25T14:47:27.319Z · LW(p) · GW(p)

The illustration that immediately sprung to my mind was of the characters Samantha Carter and Jack O'Neill in the television sci-fi show...

Beware fictional evidence! Main characters in serial TV shows really should follow "Pascal's Goldpan", because that's the way their universe (usually) works! The episode wouldn't have been written if there wasn't a way out of the problem. I suspect that watching just two or three such "insoluble" problems get resolved ought to make a proper rationalist wonder if they were living in a fictional universe.

But our universe doesn't seem to have that property. (Or perhaps I'm just not a main character.) What is true seems to be true independently of how much utility I get from believing it.

BTW, that isn't keeping me from loving your coined expressions!

Replies from: Will_Newsome, faul_sname, Armok_GoB
comment by Will_Newsome · 2012-01-26T00:45:33.441Z · LW(p) · GW(p)

It's worked for me many times in the past, but thus far I've refused to use it as a prior for future events simply because I am afraid of jinxing it. Which means yes, I've explicitly held an anti-inductive prior because pride comes before the fall, and holding this anti-inductive prior has resulted in continued positive utility. (I would say I was a main character, or at least that some unseen agenty process wanted me to believe I was a main character, but that would be like asking to be deceived. Or does even recognizing the possibility count as asking to be deceived?) Note that if my sub-conscious inductive biases are non-truth-trackingly schizophrenic then holding an explicit meta-level anti-inductive interpretation scheme is what saves me from insanity. I would hold many traditionally schizophrenic beliefs if I was using a truly inductive meta-level interpretation scheme, and I'm not actually sure I shouldn't be using such an inductive interpretation scheme. Given my meta-level uncertainty I refuse to throw away evidence that does not corroborate my anti-schizophrenic/anti-inductive prior. It's a tricky epistemic and moral situation to be in. "To not forget scenarios consistent with the evidence, even at the cost of overweighting them."

Replies from: lavalamp
comment by lavalamp · 2012-01-26T01:07:50.186Z · LW(p) · GW(p)

I've refused to use it as a prior for future events simply because I am afraid of jinxing it.

Error: Conflict between belief module and anticipation module detected!

If the universe really followed Pascal's Goldpan, it seems like there ought to be some way to reliably turn that into a large amount of money...

Replies from: Will_Newsome, army1987
comment by Will_Newsome · 2012-01-26T01:32:56.533Z · LW(p) · GW(p)

Yes, like I said, it's a tricky epistemic situation to be in.

Utility is more valuable than money. And the universe doesn't have to follow Pascal's Goldpan for you or for most people. It happens to for me, or so I anticipate but do not believe.

comment by A1987dM (army1987) · 2012-02-08T23:42:12.547Z · LW(p) · GW(p)

If the universe really followed Pascal's Goldpan

Prior probabilities are a feature of maps, not of territories... or am I missing something?

comment by faul_sname · 2012-01-25T22:52:54.906Z · LW(p) · GW(p)

Malthusian crisis: solved for the foreseeable future. Cold war: solved (mostly). Global warming: looked unsolvable, now appears to have feasible solutions.

It would appear that quite a number of problems that seemed unsolvable have been solved in the past. Of course, that could just be the anthropic principle talking.

comment by Armok_GoB · 2012-02-12T11:37:10.708Z · LW(p) · GW(p)

It certainly does seem to have that property to me. Although I'd guess Eliezer is the main character and I'm (currently the backstory of) the sidekick or villain or something.

comment by [deleted] · 2012-01-25T07:57:59.369Z · LW(p) · GW(p)

It looks to me like the goldpan is circular: you need a probability to calculate expected utility, which you need to calculate the prior probability.

Replies from: selylindi, atucker, lavalamp
comment by selylindi · 2012-01-25T15:35:25.185Z · LW(p) · GW(p)

Consider this ugly ASCII version of the expression for AIXI found in this paper by Marcus Hutter,

a_k := arg max[a_k, SUM[o_k*r_k ... max[a_m, SUM[o_m*r_m, (r_k +...+ r_m) SUM[q:U(q,a_1...a_m) = o_1*r_1..o_m*r_m, 2^-l(q)] ]]...]] .

What I was thinking was to replace the inner sum for the Solomonoff prior, SUM[q:..., 2^-l(q)], with a repeat of the interleaved maxes and SUMs.

SUM[q:U(q,a_1...a_m)=o_1*r_1..o_m*r_m, max[a_k, SUM[o_k*r_k ... max[a_m, SUM[o_m*r_m, (r_k + ... + r_m)]]...]] ] .

Now that I write it out explicitly, I see that, while it isn't circular, it's definitely double-counting. I'm not sure that's a problem, though. Initially, for all deterministic programs q that model the environment, it calculates its expected reward assuming each q one at a time. Then it weights all q by the rewards and acts to maximize the expected reward for that weighted combination of all q.
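
For readability, the same two expressions transcribed into cleaner notation (this is only a transcription of the ASCII above; trust Hutter's paper over it):

```latex
% Transcription of the two ASCII expressions above into standard notation.
% (Any divergence from Hutter's paper is a transcription slip.)

% Standard AIXI action selection, with the Solomonoff weight 2^{-\ell(q)} innermost:
\[
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
       (r_k + \cdots + r_m)
       \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
\]

% Proposed modification: each environment program q is weighted by the expected
% reward computed under q, rather than by 2^{-\ell(q)}:
\[
\sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m}
    \max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} (r_k + \cdots + r_m)
\]
```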

comment by atucker · 2012-01-25T14:10:32.545Z · LW(p) · GW(p)

"I believe that I will get nigh-unbounded utility after I die, but I can't cause my death. "

comment by lavalamp · 2012-01-25T15:00:04.691Z · LW(p) · GW(p)

Oh, I thought the point was that in universes where the goldpan is true, the universe is determined by the set of things that would give you high utility if you believed them. Which doesn't sound coherent to me (for one thing, there are probably multiple sets of such beliefs), so maybe you understood the meaning better than I...

comment by djcb · 2012-01-25T17:12:08.106Z · LW(p) · GW(p)

• Diax's Rake: Never believe a thing simply because you want it to be true.

(the non-sentimental prior-- the universe doesn't care about you)

Replies from: WrongBot, CharlesR
comment by WrongBot · 2012-01-25T18:14:09.296Z · LW(p) · GW(p)

Though note that this is compatible with Gardan's Steelyard (a.k.a. Occam's Razor).

comment by CharlesR · 2012-01-25T19:45:01.146Z · LW(p) · GW(p)

Upvoted purely for the Anathem reference!

comment by kilobug · 2012-01-25T11:17:37.850Z · LW(p) · GW(p)

Interesting thought, but I fear it's not practical for the same reason you can't doublethink.

To evaluate how useful your explanation is, you need an accurate model of the world, in which you'll "run" your explanation and evaluate its consequences. Consider the belief in UFOs (to change from the usual belief in God). You can't evaluate how useful that explanation is regardless of whether it is true or not. If there really are UFOs, the explanation will have a different utility than if there aren't. So the only way to compute the utility of the belief depends on the probability of UFOs existing. And on many other assumptions. Once you have the knowledge required to evaluate how useful having the belief will be, it's too late. You can't forget what you know, you can't doublethink.

comment by DanielLC · 2012-01-26T01:10:21.363Z · LW(p) · GW(p)

Are there good reasons to prefer Occam's Razor?

The probability must add to one. There are exponentially more possibilities with linearly more complexity, so the probability would have to decrease exponentially on average. You can base it on something other than complexity, but whatever you use will still correlate to complexity.
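
A toy illustration of that counting argument, assuming hypotheses are encoded as binary strings:

```python
# If hypotheses are binary programs, there are 2^L of them at each length L,
# so any prior that sums to 1 must give the *average* length-L hypothesis a
# weight that shrinks exponentially in L.

for L in range(1, 11):
    n_hypotheses_up_to_L = sum(2 ** k for k in range(1, L + 1))   # 2 + 4 + ... + 2^L
    max_average_weight = 1.0 / n_hypotheses_up_to_L               # even if all mass went to lengths <= L
    print(L, n_hypotheses_up_to_L, round(max_average_weight, 6))
# The last column roughly halves each time L grows by one: the average weight
# has to decay exponentially in L just to keep the total at or below 1.
```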

Equally weight all hypotheses that explain the data.

There are infinitely many of them. Now what?

The likelihood of a hypothesis is inversely proportional to the number of observations it purports to explain.

How does that work? Saying the coin will land on heads or tails explains two observations. Saying it will land on heads explains one. Is it twice as likely to land on heads as to land on heads or tails?

The likelihood of a new hypothesis that explains the data is proportional to the Solomonoff prior for the Kolmogorov complexity of the code that transforms the previously accepted hypothesis into the new hypothesis.

I don't think you could do this without violating the axioms of probability. Also, what's your first hypothesis?

If your first hypothesis is nothing, then until your first observation, you have the Solomonoff prior. If you do manage to obey the axioms of probability, you will have to use Solomonoff induction.

My aesthetics prefers the O'Neill way over the Carter way: the Goldpan over the Razor.

If that's your razor, you're doing it wrong. Everything is possible. The simplest explanation is just most likely. Preparing for your death may be the most likely action to be relevant, but it also doesn't do much to help.

In addition, the show skews the results by making O'Neill right much more often than reasonable. Of course it will look like it's a good idea to assume you have a good chance of survival if they only ever show the characters surviving.

comment by lavalamp · 2012-01-25T14:58:15.011Z · LW(p) · GW(p)

... instead pursued the most practical course of action with a chance of getting them rescued...

One more thing; this is actually rational, and doesn't violate occam's razor. You should take the course of action with the highest P(this is true) * P(I can be rescued|this is true).

Replies from: selylindi
comment by selylindi · 2012-01-25T16:31:02.436Z · LW(p) · GW(p)

You're right, of course. The complication is that P(this is true) will be different for Occam's Razor than for the other heuristics.

comment by Manfred · 2012-01-25T14:44:55.810Z · LW(p) · GW(p)

(ignorance prior) Equally weight all hypotheses that explain the data.

This prior is impossible to implement for almost all situations, because there is no way to assign uniform probabilities over an infinite number of things - you just end up with 0 everywhere, which isn't normalized, and hence is not a probability distribution.

(utilitarian prior) The likelihood of a hypothesis is proportional to the expected net utility of the agent believing it.

So problem one is that this is circular if someone values the truth.

Then if this is ignored, there's still a problem that it's unstable - someone who thinks a stock market crash is likely will think that it is high-utility to believe this, and someone who thinks a stock market crash is unlikely will think that their thought is the high-utility one, and so you have a feedback loop, naturally driving people to extremes but not favoring one extreme over another. That's bad, since usually either A or ~A is true, but not both.

The third problem is sentences like "it is high utility to believe that this sentence is true and that Santa Claus exists," (or "you will go to heaven if you believe this sentence and God exists") though this can be solved by including a full zeroth-order theory of human utility inside what used to look like a one-sentence maxim.

However, it does have an important thing going for it - it's a first step towards thinking about what a self-modifying agent with a utility function would choose to use as a prior.

My aesthetics prefers the O'Neill way over the Carter way: the Goldpan over the Razor.

Is this because the show repeatedly portrays one as working and the other as not working? This sounds like the common "Hollywood rationality" trope.

A proposition is true if the use of its content would be beneficial in all relevant utility functions.

There is no such proposition, if we don't allow the Santa Claus sentences.

(social prior) The likelihood of a hypothesis is proportional to how recently and how often it has been proposed and to the social status of its proponents.

Yeah that's just a bad plan. "It is the highest-utility option for you to grovel before me, peons." And it has no reason to exist, since what are the people making the propositions using to guess at what's true?

~~

The others didn't have any really killer (e.g. "the things this outputs aren't probabilities") flaws I could see - they just end you up with different sets of beliefs. That can still be bad, if the belief "that's a tiger about to leap at me" gets rated at low probability when it should have been high, but oh well.

Replies from: selylindi, army1987
comment by selylindi · 2012-01-25T17:44:59.152Z · LW(p) · GW(p)

This prior is impossible to implement for almost all situations, because there is no way to assign uniform probabilities over an infinite number of things - you just end up with 0 everywhere, which isn't normalized, and hence is not a probability distribution.

Is there some reason we can't bypass this problem by using, say, the surreal numbers instead of the real numbers? This is a genuine question, not a rhetorical one.

So problem one is that this is circular if someone values the truth.

The same criticism applies to Occam's Razor and all a priori hypothesis weighting heuristics. A critic of Solomonoff induction would say "I value truth, not simplicity per se".

For your stock market example, note that any theory which lets you predict the stock market would be high-utility. Some theories are complete opposites, like "the stock market will crash" and "the stock market will boom". By symmetry I suspect that their weighted contributions to the sum of models will cancel out. The same goes for Santa sentences, "it is high utility to believe that this sentence is true and that Santa Claus exists" and "it is high utility to believe that this sentence is true and that Santa Claus does not exist" simply cancel out. Weighting hypotheses by utility doesn't pull them up by their own bootstraps. It's only a prior probability -- actually making decisions will depend on posterior probabilities.

Is this because the show repeatedly portrays one as working and the other as not working? This sounds like the common "Hollywood rationality" trope.

Of course. I assumed that readers of this site would know to take a fictional example as an illustration of a principle but not as evidence.

Yeah that's just a bad plan. "It is the highest-utility option for you to grovel before me, peons."

Haha, obviously so. But you and I probably both know people whom it describes, so it was amusing to include.

Replies from: Manfred
comment by Manfred · 2012-01-25T20:18:29.045Z · LW(p) · GW(p)

Is there some reason we can't bypass this problem by using, say, the surreal numbers instead of the real numbers? This is a genuine question, not a rhetorical one.

I dunno, but I don't see any reason why that should be the case - especially on continuous domains.

So problem one is that this is circular if someone values the truth.

The same criticism applies to Occam's Razor

Not quite - Occam's razor is arbitrary, but it isn't circular. The circularity comes in from thinking both "it is good to believe true things" and "things that are good to believe are more likely to be true."

For your stock market example, note that any theory which lets you predict the stock market would be high-utility. Some theories are complete opposites, like "the stock market will crash" and "the stock market will boom". By symmetry I suspect that their weighted contributions to the sum of models will cancel out.

"By symmetry" seems to be misapplied in this particular case. Think of a single person with a single belief. What does that person do when faced with a choice of how to update their belief?

The same goes for Santa sentences, "it is high utility to believe that this sentence is true and that Santa Claus exists" and "it is high utility to believe that this sentence is true and that Santa Claus does not exist" simply cancel out.

There's still a problematic contradiction between "it is high utility to believe that this sentence is true and that Santa Claus exists" and "Santa Claus exists."

Replies from: selylindi
comment by selylindi · 2012-01-26T15:11:13.541Z · LW(p) · GW(p)

I dunno, but I don't see any reason why that should be the case - especially on continuous domains.

Now that I consider it a bit more, since the number of deterministic programs for modeling an environment is countably infinite, it should only require a hyperreal infinitesimal weight to maintain conservation of probability. The surreals are completely overkill. And furthermore, that's only in the ideal case - a practical implementation would only examine a finite subset of programs, in which case the theoretical difficulty doesn't even arise.

Not quite - Occam's razor is arbitrary, but it isn't circular. The circularity comes in from thinking both "it is good to believe true things" and "things that are good to believe are more likely to be true."

It doesn't look circular to me. It just looks like a restatement. If you consider a specific model of an environment, the way you evaluate its posterior probability (i.e. whether it's true) is from its predictions, and the way you get utility from the model is also by acting on its predictions. The truth and the goodness of a belief end up being perfectly dependent factors when your posterior probability is dominated by evidence, so it doesn't seem problematic to me to also have the truth and goodness of a belief unified for evaluation of the prior probability.

"By symmetry" seems to be misapplied in this particular case. Think of a single person with a single belief.

Hm, I think you and I are viewing these differently. I had in mind an analogy to the AIXI model of AI: it's a single entity, but it doesn't have just a single belief. AIXI keeps all the beliefs that fit and weights them according to the Solomonoff prior, then it acts based on the weighted combination of all those beliefs. Now obviously I haven't done the math so I could be way off here, but I suspect that the appeal to symmetry works in the case of equal-and-opposite high-utility beliefs like the ones mentioned for the stock market and Santa Claus precisely because the analogous AI model with the Goldpan would keep all beliefs (in a weighted combination) instead of choosing just one.

Replies from: pengvado
comment by pengvado · 2012-01-28T03:21:09.950Z · LW(p) · GW(p)

It should only require a hyperreal infinitesimal weight to maintain conservation of probability.

Doing arithmetic on infinities is not the same as doing infinite sequences of arithmetic. You can talk about a hyperreal-valued uniform prior, but can you actually do anything with it that you couldn't do with an ordinary limit, i.e. the limit as n goes to infinity of (1/n) SUM[i=1..n, P(o|H_i)]?

The reasons that limit doesn't suffice to specify a uniform prior are: (1) The result of the limit depends on the order of the list of hypotheses, which doesn't sound very uniform to me (I don't know if it's worse than the choice of universal Turing machine in Solomonoff, but at least the pre-theoretic notion of simplicity comes with intuitions about which UTMs are simple). (2) For even more perverse orders, the limit doesn't have to converge at all. (Even if utility is bounded, partial sums of EU can bounce around the bounded range forever.)

Hyperreal-valued expected utility doesn't change (1). It does eliminate (2), but I think you have to sacrifice computability to do even that much: Construction of the hyperreals involves the axiom of choice, which prevents you from actually determining which real number is infinitesimally close to the hyperreal encoded by a given divergent sequence.
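
A toy numeric illustration of both points, using an invented set of hypotheses whose utilities are all 0 or 1, listed in two different orders:

```python
# The same 0/1 utilities listed in two different orders give different
# (or no) limiting averages, so a "uniform" expected utility defined by
# this limit depends on how the hypotheses are enumerated.

def running_average(values):
    total = 0.0
    for n, v in enumerate(values, start=1):
        total += v
        yield total / n

def alternating(n_terms):
    # order the hypotheses 0, 1, 0, 1, ...
    return [i % 2 for i in range(n_terms)]

def blocky(n_terms):
    # same values, but in blocks of doubling size: 1, 0,0, 1,1,1,1, 0*8, ...
    out, value, block = [], 1, 1
    while len(out) < n_terms:
        out.extend([value] * block)
        value, block = 1 - value, block * 2
    return out[:n_terms]

for order in (alternating, blocky):
    avgs = list(running_average(order(4096)))
    print(order.__name__, [round(a, 3) for a in avgs[::512]])
# The alternating order settles near 0.5; the blocky order keeps swinging
# between roughly 1/3 and 2/3, so the limit depends on the ordering and
# need not converge at all.
```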

Replies from: selylindi
comment by selylindi · 2012-01-31T20:38:47.500Z · LW(p) · GW(p)

Thanks!

You wrote the above in regards to hyperreal infinities, "the hyperreal encoded by a given divergent sequence". I'm under the impression that hyperreal infinitesimals are encoded by convergent sequences: specifically, sequences that converge to zero. The hyperreal [1, 1/2, 1/3, 1/4, ...] is the one that corresponds to the limit you gave. Does that adequately dispel the computability issue you raised?

In any case, non-computability isn't a major defect of the utilitarian prior vis-a-vis the also non-computable Solomonoff prior. It is an important caution, however.

Your first objection seems much more damaging to the idea of a utilitarian prior. Indeed, there seems little reason to expect max(U(o|H_i)) to vary in a systematic way with a useful enumeration of the hypotheses.

Replies from: pengvado
comment by pengvado · 2012-02-01T22:20:31.003Z · LW(p) · GW(p)

A non-constant sequence that converges to zero encodes an infinitesimal, and I think any infinitesimal has an encoding of that form. But a sequence that's bounded in absolute value but doesn't converge also encodes some real plus some infinitesimal. It's this latter kind that involves the axiom of choice, to put it in an equivalence class with some convergent sequence.

[1, 1/2, 1/3, 1/4, ...] is the infinitesimal in the proposed definition of a uniform prior, but the hyperreal outcome of the expected utility calculation is [U(o|H_1), (1/2) SUM[i=1..2, U(o|H_i)], (1/3) SUM[i=1..3, U(o|H_i)], ...], which might very well be the divergent kind.

Agreed that my first objection was more important.

comment by A1987dM (army1987) · 2012-02-08T23:50:46.594Z · LW(p) · GW(p)

This prior is impossible to implement for almost all situations, because there is no way to assign uniform probabilities over an infinite number of things - you just end up with 0 everywhere, which isn't normalized, and hence is not a probability distribution.

If you have enough data, you can get a proper posterior from an improper prior.

Replies from: Manfred
comment by Manfred · 2012-02-09T02:22:04.117Z · LW(p) · GW(p)

That's true. I guess it could be your "prior" even if it's not a probability distribution. You don't just need "enough" data, then - you need data that gives you quickly-decreasing likelihoods as a function of some parameter that numbers finite bunches of hypotheses. Which I don't think is all that common.

comment by Dan_Moore · 2012-01-25T14:28:48.785Z · LW(p) · GW(p)

Upvoted for seeking a tiebreaker among tiebreaking heuristics.

comparing and contrasting Marian apparitions with UFO sightings

Any conclusions you care to share?

Replies from: selylindi
comment by selylindi · 2012-01-25T15:55:09.055Z · LW(p) · GW(p)

Marian apparitions have a very low rate of being officially approved by the various Christian groups that do that. Even ignoring Occam's Razor and considering only a person who already believes Marian doctrines, that should give the apparitions very low prior probability estimates. There are putative Marian apparitions with good evidence for something unusual happening, such as Fatima's "Miracle of the Sun", but the evidence seems to be completely independent of Christian belief.

UFO sightings that I've read about always have dramatically simpler explanations than "that's an alien craft defying our known physics". I tried investigating UFO sightings by the same standards used for Marian apparitions, and it was roughly a wash.

comment by Solvent · 2012-01-25T08:57:56.498Z · LW(p) · GW(p)

The likelihood of a hypothesis is proportional to the expected net utility of the agent believing it.

I'm sure there's some way to translate that into expected return from betting on it.

Pascal's Goldpan is an interesting idea. However, it produces all kinds of problems. Over which time period are we defining utility? Surely most pieces of knowledge would have either no effect on utility, regardless of truth, or wildly unpredictable effects, and in some cases negative effects. I don't think it's useful.

Knowledge needs to constrain expectations. Usefulness is a helpful test of whether it does that, but doesn't work as a general definition.

Kudos for originality, though.

comment by prase · 2012-01-25T19:01:02.026Z · LW(p) · GW(p)

The one that initially leapt out at me was what I'll call Pascal's Goldpan. Granted, a human trying to understand the world and using the Goldpan would likely settle on largely the same theories as a human using the Razor since simple theories have practical advantages for our limited mental resources.

It doesn't require much mental effort to postulate the hypothesis "I will certainly become a millionaire tomorrow" and its expected utility is pretty high if I happen to believe it. Usefulness is a good criterion for directing future research, but I can't imagine its sensible use in assigning priors to already stated hypotheses.

comment by timtyler · 2012-01-25T13:41:59.398Z · LW(p) · GW(p)

All else being equal, the most useful explanation is usually correct.

If you are living in an overwhelmingly Christian-dominated society, the most useful explanation for your existence is that god in heaven created you - but that's not the correct explanation.

Replies from: selylindi
comment by selylindi · 2012-01-25T14:13:39.529Z · LW(p) · GW(p)

If your utility function heavily weights "achieve high social status", then I'd agree. In fact, I think this is precisely what most U.S. politicians are doing when professing their religion publicly. You don't see them actually trying to live simply or give everything they own to the poor.

If your utility function puts more weight on almost anything else, such as building reliable technology, expressing yourself in art, or maintaining internal consistency among your beliefs, then this conclusion does not follow.

Replies from: timtyler
comment by timtyler · 2012-01-25T15:38:53.597Z · LW(p) · GW(p)

I was assuming that things like not being ostracised as a heathen, not being burned at the stake for witchcraft, and finding a mate were things of value. If that is not true, then the original point is still valid, but you might like to make up another example to illustrate it.

Replies from: selylindi
comment by selylindi · 2012-01-25T17:10:54.227Z · LW(p) · GW(p)

Unless they have telepathy or brain scanning technology, people would ostracize, burn, or marry you based on your actions, not your beliefs. After Kim Jong Il's death, people had to display grief, but they didn't have to feel grief.

Barring telepathy, brain-scanning tech, or an inability to lie well, I expect that actually believing the zeitgeist of a totalitarian society is of lower utility than just pretending to believe it.

Replies from: timtyler
comment by timtyler · 2012-01-25T17:33:18.784Z · LW(p) · GW(p)

The problem there is that humans are good lie detectors and bad liars. They are built that way: lie detecting is important because our ancestors may have had to deal with the best liars, while they could often get by without being good liars themselves. So: the safest way to make people think that you believe something is often to actually believe it.

comment by Alex Flint (alexflint) · 2012-01-29T10:15:54.248Z · LW(p) · GW(p)

Occam's razor is famously difficult to justify except by circular appeal to itself. It's interesting to think of alternatives, but you should be aware of what you give up when you give up Occam's razor. You can no longer make sensible inferences about the future based on your past experiences. For example, you can no longer have any confidence that the direction of gravity will still point downwards tomorrow, or that the laws of physics won't spontaneously change a minute from now. The experimental method itself no longer makes sense if you have no reason to think that the future will resemble the past.

You should read:

comment by shminux · 2012-01-26T01:23:47.668Z · LW(p) · GW(p)

None of these are worth anything unless you look for testable predictions, not for simply explaining the existing data. The problem with the UFO explanation is not that it has a fantastically low Solomonoff prior, but that it predicts nothing that would differentiate it from an explanation with a better prior.

In that vein, Pascal's Goldpan with a very specific utility function is one way to go: construct (independent) testable predictions and estimate the a priori probability of each one, then add (negative logs of) the probabilities for each model, and call it the utility function. Basically, the more predictions and the unlikelier they are, the higher the utility of a given model. Then test the predictions. Among the models where every prediction is confirmed, pick one with the highest utility.

I'm sure this approach has a name, but google failed me...
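
A minimal sketch of that scoring rule (the models, predictions, and probabilities are invented for illustration):

```python
import math

# Score each model by the summed surprisal (-log2 p) of its predictions,
# then pick the highest-scoring model among those whose predictions all held.

models = {
    # model: list of (prior probability of the prediction, was it confirmed?)
    "mundane":  [(0.6, True), (0.5, True)],
    "daring":   [(0.1, True), (0.05, True), (0.2, True)],
    "crackpot": [(0.001, True), (0.0001, False)],
}

def utility(predictions):
    # more predictions, and unlikelier ones, mean higher utility
    return sum(-math.log2(p) for p, _ in predictions)

surviving = {name: preds for name, preds in models.items()
             if all(confirmed for _, confirmed in preds)}

best = max(surviving, key=lambda name: utility(surviving[name]))
print({name: round(utility(preds), 2) for name, preds in surviving.items()})
print("picked:", best)   # "daring": its confirmed predictions were the most surprising
```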

comment by Peter Wildeford (peter_hurford) · 2012-01-26T01:14:36.574Z · LW(p) · GW(p)

I always liked what I think was Karl Popper's defense of Occam's Razor -- simpler theories are easier to falsify.

Replies from: JoshuaZ
comment by JoshuaZ · 2012-01-28T06:13:07.817Z · LW(p) · GW(p)

I've never really understood this argument. In general, for two separate hypotheses A and B, by most notions of "more complicated", "A and B" will be more complicated but often easier to falsify. This will apply trivially when both hypotheses talk about completely different domains. So for example, "The sun is powered by fusion" and "Obama will be reelected" are both falsifiable, but "The sun is powered by fusion and Obama will be reelected" is at least as easy to falsify. This is connected to the conjunction fallacy. There may be a notion of natural hypotheses that don't just conjoin hypotheses about very separate areas, but I don't know any way of making that precise.

comment by dbaupp · 2012-01-25T12:09:44.777Z · LW(p) · GW(p)

Are there good reasons to prefer Occam's Razor?

An aesthetic (with a hint of rigor, but only a hint) one is: if one is enumerating (and running) Turing machines (using a correspondence between them and the naturals), then ones with a short code (i.e. small description number) will be reached first.

(This argument is working on the premise that a Turing machine could, in theory, simulate a universe such as ours (if not exactly, then at least approximate it, arbitrarily accurately)).

And so, if one then has some situation where the enumeration has some chance of being stopped after each machine, then those with shorter codes have a greater likelihood than those with longer codes. (Or some other situation, like the enumeration occurring in a universe with a finite lifespan.)
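
A toy version of that argument, with an invented per-machine stopping probability:

```python
# Enumerate programs in length order, suppose the enumeration halts with
# probability p after each machine, and look at how much "reach probability"
# codes of each length get on average.

p = 0.01                      # invented per-machine stopping probability
index = 0
for L in range(1, 9):         # code lengths 1..8; there are 2**L codes of length L
    reach = 0.0
    for _ in range(2 ** L):
        reach += (1 - p) ** index   # probability the enumeration gets this far
        index += 1
    print(L, round(reach / 2 ** L, 4))
# The average reach probability per code falls as codes get longer:
# shorter programs are reached earlier, so they end up with more weight.
```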

Replies from: selylindi
comment by selylindi · 2012-01-25T14:20:48.131Z · LW(p) · GW(p)

This paper by Russell Standish expresses a similar notion.

Given that the Solomonoff prior is the main reason for arguing in favor of the Mathematical Universe Hypothesis, it is potentially circular to then use the MUH to justify Occam's Razor.

comment by A1987dM (army1987) · 2012-02-08T23:44:25.879Z · LW(p) · GW(p)

Are you sure that wherever you say “likelihood” you don't mean “prior probability” instead?
