How seriously should I take the supposed problems with Cox's theorem?

post by jsalvatier · 2010-12-06T02:04:43.359Z · LW · GW · Legacy · 10 comments

I had been under the impression that Cox's theorem said something pretty strong about the consistent ways to represent uncertainty, relying on very plausible assumptions. However, I recently found this 1999 paper, which claims that Cox's result actually requires some stronger assumptions. I am curious what people here think of this. Has there been subsequent work which relaxes the stronger assumptions?

comment by Perplexed · 2010-12-06T05:05:56.145Z · LW(p) · GW(p)

The Wikipedia article on Cox's theorem mentions Halpern's 1999 paper and links to some subsequent work that seems to restore something like the status quo. But I haven't yet looked at any of the papers.

ETA: I've looked at the papers. I think I can recommend both the original 1999 paper by Halpern and this 2002 paper by Hardy.

To answer your title question, I would say that you shouldn't take the problems very seriously at all. Cox's theorem basically doesn't work for "small worlds" - i.e. models in which only a finite number of events exist. Cox's theorem does work if your model consists of a small world plus a fair coin which can be flipped an arbitrary number of times.

Somewhere in between those two points (small world and small world + coin), Cox's theorem switches from not working to working. Describing exactly where the switchover takes place may interest mathematicians, but it probably won't interest most Bayesians - or at least not Bayesians who are willing to carry coins in their pockets.

Replies from: jsalvatier
comment by jsalvatier · 2010-12-08T18:35:33.119Z · LW(p) · GW(p)

Interesting. Do they give a good intuition for why this change occurs?

Replies from: Perplexed
comment by Perplexed · 2010-12-08T19:12:12.086Z · LW(p) · GW(p)

The missing ingredient in a "small world" is roughly the continuity conditions that Jaynes calls "qualitative correspondence with common sense" in Chapter 2 of PT:TLoS. In terms of model theory, adding the coin means that the model now "has enough points".

Here is one way to think about it: One of the consequences of Cox's theorem is that

  • P(X) = 1 - P(~X)

Suppose you decided to graph P(X) against P(~X). In a small world there are only finitely many events you can substitute for X, so your graph is just a finite set of collinear points, not a line. Many continuous functions can be made to fit those points. Add a coin to your world, and you can interpolate an event between any two events; you get a dense infinity of points between 0 and 1. And that is all you need: only one continuous function, y = 1 - x, fits that data.

That was hand-waving, but I hope it helped.
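
For concreteness, here is a toy Python sketch of the same point; the atom probabilities and the "impostor" negation function are invented for illustration, not taken from Halpern's paper.

```python
# A toy sketch of the point above (the atom probabilities and the
# "impostor" negation function are invented for illustration).
from itertools import combinations
from math import prod

# "Small world": three atomic outcomes with assumed probabilities.
atoms = [0.2, 0.3, 0.5]

# P(X) for every event X (every union of atoms) -- only finitely many values.
event_probs = sorted({round(sum(s), 10)
                      for r in range(len(atoms) + 1)
                      for s in combinations(atoms, r)})
print(event_probs)  # [0.0, 0.2, 0.3, 0.5, 0.7, 0.8, 1.0]

# An "impostor" negation function: it agrees with 1 - x at every achievable
# P(X) of the small world, yet it is not the function 1 - x.
def impostor(x):
    return (1.0 - x) + 5.0 * prod(x - p for p in event_probs)

# Indistinguishable from 1 - x using only the small world's events:
assert all(abs(impostor(p) - (1 - p)) < 1e-9 for p in event_probs)

# Adjoin a fair coin: "X and the first k flips are heads" has probability
# P(X) / 2**k, so many more values become achievable (densely, in the limit).
coin_probs = [p / 2 ** k for p in event_probs for k in range(1, 6)]
bad = [p for p in coin_probs if abs(impostor(p) - (1 - p)) > 1e-9]
print(len(bad))  # > 0: the impostor no longer fits; only y = 1 - x survives
```

The coin only buys you probabilities of the form p / 2^k here, which already exposes this particular impostor; the full argument needs the achievable values to be dense in [0, 1] so that every continuous impostor is ruled out.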

Replies from: jsalvatier
comment by jsalvatier · 2010-12-08T19:51:58.405Z · LW(p) · GW(p)

It did help. I was expecting something like this. I still have to go look at the paper for some more clarification.

comment by Roko · 2010-12-07T22:37:36.664Z · LW(p) · GW(p)

Failures of Cox's theorem are more likely to come from unstated implicit assumptions than from this kind of mathematical pedantry.

Replies from: jsalvatier
comment by jsalvatier · 2010-12-08T18:31:25.071Z · LW(p) · GW(p)

Unstated implicit assumptions in Cox's theorem? That's exactly what this was about.

Replies from: Roko
comment by Roko · 2010-12-08T19:36:42.956Z · LW(p) · GW(p)

Now that the assumption of an infinite set of events has been made explicit, I don't think it's a problem. Other, subtler violations of the axioms - e.g. likelihoods not always being comparable - would be more of a problem.

I'd like to see an example of a non-Bayesian plausibility function in a finite world, btw.

Replies from: jsalvatier
comment by jsalvatier · 2010-12-08T19:49:40.547Z · LW(p) · GW(p)

OK, fair enough; I guess the value of this paper was making that assumption explicit. Halpern's 1999 paper (which Perplexed links above) constructs such an example.

Replies from: Roko
comment by Roko · 2010-12-08T19:54:58.318Z · LW(p) · GW(p)

And is it in any way interesting? Does it allow you to do great inference beyond the ken of Bayesianism? Or is it just some annoying corner-case?

Replies from: jsalvatier
comment by jsalvatier · 2010-12-08T20:03:50.592Z · LW(p) · GW(p)

I haven't spent time understanding the example, but Perplexed's explanation of the need for an infinite event space suggests it's not very interesting.