In Defense of Objective Bayesianism: MaxEnt Puzzle.

post by Larks · 2011-01-06T00:56:50.739Z · LW · GW · Legacy · 11 comments

In Defense of Objective Bayesianism by Jon Williamson was mentioned recently in a post by lukeprog as the sort of book people on Less Wrong should be reading. I have been reading it, and found some of it quite bizarre; this point in particular seems obviously false. If it’s just me, I’ll be glad to be enlightened as to what was meant. If we collectively don’t understand it, that would be pretty strong evidence that we should read more academic Bayesian stuff.

Williamson advocates the use of the Maximum Entropy Principle. In short, you should take account of the constraints that the empirical evidence places on your probabilities, and then adopt the distribution closest to uniform (i.e. the one with maximum entropy) that satisfies those constraints.

So, if asked to assign a probability to an arbitrary proposition A, you’d say p = 0.5. But if you were given evidence in the form of constraints on p, say that p ≥ 0.8, you’d set p = 0.8, as that is the new entropy-maximising value. Constraints are restricted to affine constraints. I found this somewhat counter-intuitive already, but I do follow what he means.
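
To make the principle concrete, here is a minimal sketch (mine, not Williamson’s) of the binary case, finding the entropy-maximising p by brute-force grid search:

```python
from math import log2

def binary_entropy(p):
    """Entropy (in bits) of a coin that lands heads with probability p."""
    return -sum(x * log2(x) for x in (p, 1 - p) if x > 0)

grid = [i / 10000 for i in range(10001)]

# With no constraints, entropy peaks at the uniform point p = 0.5.
unconstrained = max(grid, key=binary_entropy)

# With the affine constraint p >= 0.8, the maximum-entropy choice is the
# feasible point closest to uniform, i.e. p = 0.8.
constrained = max((p for p in grid if p >= 0.8), key=binary_entropy)

print(unconstrained, constrained)  # 0.5 0.8
```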

But now for the confusing bit. I quote directly:

“Suppose A is ‘Peterson is a Swede’, B is ‘Peterson is a Norwegian’, C is ‘Peterson is a Scandinavian’, and ε is ‘80% of all Scandinavians are Swedes’. Initially, the agent sets P(A) = 0.2, P(B) = 0.8, P(C) = 1, P(ε) = 0.2, P(A & ε) = P(B & ε) = 0.1. All these degrees of belief satisfy the norms of subjectivism. Updating by maxent on learning ε, the agent believes Peterson is a Swede to degree 0.8, which seems quite right. On the other hand, updating by conditionalizing on ε leads to a degree of belief of 0.5 that Peterson is a Swede, which is quite wrong. Thus, we see that maxent is to be preferred to conditionalization in this kind of example because the conditionalization update does not satisfy the new constraints X’, while the maxent update does.”

p. 80, 2010 edition. Note that this example is actually from Bacchus et al. (1990), but Williamson quotes it approvingly.

His calculation for the Bayesian update is correct; you do get 0.5. What’s more, this seems intuitively to be the right answer: the update has caused you to ‘zoom in’ on the probability mass assigned to ε while maintaining the relative proportions inside it.
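
Spelling the conditionalization out in a few lines (variable names are mine):

```python
# Joint prior from the quoted example.
p_eps       = 0.2   # P(ε)
p_A_and_eps = 0.1   # P(A & ε)

# Conditionalizing on ε just renormalises the mass inside ε:
p_A_given_eps = p_A_and_eps / p_eps
print(p_A_given_eps)  # 0.5, not 0.8
```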

As far as I can see, you get 0.8 only if we assume that Peterson is a randomly chosen Scandinavian. But if that were true, the prior given is bizarre: if he were a randomly chosen individual, then P(A | ε) would be 0.8, so the prior should have been something like P(A & ε) = 0.8 × 0.2 = 0.16 and P(B & ε) = 0.2 × 0.2 = 0.04. The only way I can make sense of the prior given is if constraints simply “don’t apply” until they have p = 1.
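
By contrast, here is a sketch of the prior the random-sampling reading would demand; under that prior, ordinary conditionalization already returns 0.8:

```python
# If Peterson were a randomly sampled Scandinavian, ε should pin down
# P(A | ε) = 0.8, and with P(ε) = 0.2 the joint prior would follow:
p_eps       = 0.2
p_A_and_eps = 0.8 * p_eps   # 0.16
p_B_and_eps = 0.2 * p_eps   # 0.04

# Under this prior, conditionalizing on ε gives 0.8 with no need for maxent:
print(round(p_A_and_eps / p_eps, 10))  # 0.8
```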

Can anyone explain the reasoning behind a posterior probability of 0.8?

11 comments

comment by lukeprog · 2011-01-11T16:46:12.221Z · LW(p) · GW(p)

I contacted Williamson about this, and he wrote back:

If I remember rightly, Bacchus et al were suggesting that, if all you learn is that 80% of all Scandinavians are Swedes, then you should by default believe that Peterson is a Swede to degree 0.8 (with no assumption of random sampling). This seems right to me as a default principle, given that you fully believe that Peterson is Scandinavian. But the norms of subjectivism do not require you to set P(Peterson is a Swede|80% of all Scandinavians are Swedes) to be 0.8. Indeed this conditional probability can be more or less anything – it is subjective. So if it isn’t set to be 0.8 there is a mismatch between what Bayesian conditionalisation requires and the rational course of action.

I hope this makes more sense now, and apologise if the presentation in the book was a bit terse!

Replies from: Larks
comment by Larks · 2011-01-12T14:51:02.180Z · LW(p) · GW(p)

Ahhh. Thanks for emailing, and to him for replying.

It seems like you could just say, "The probability of a coin landing heads is 1/2. However, subjectivism allows you to say it's 1 (or .9999 at any rate). Hence, subjectivism is wrong."

comment by Oscar_Cunningham · 2011-01-06T18:39:05.843Z · LW(p) · GW(p)

So, if asked to assign a probability to an arbitrary proposition A, you’d say p = 0.5. But if you were given evidence in the form of constraints on p, say that p ≥ 0.8, you’d set p = 0.8, as that is the new entropy-maximising value. Constraints are restricted to affine constraints. I found this somewhat counter-intuitive already, but I do follow what he means.

This bit doesn't make sense either: what kind of evidence imposes a condition on your personal (subjective) probability?

Replies from: endoself
comment by endoself · 2011-01-06T20:29:05.820Z · LW(p) · GW(p)

A subjective probability is not arbitrary. It is the most accurate estimate possible given the evidence available to the subject. See http://lesswrong.com/lw/s6/probability_is_subjectively_objective .

Replies from: Oscar_Cunningham
comment by Oscar_Cunningham · 2011-01-06T22:35:55.786Z · LW(p) · GW(p)

I think you're misunderstanding me, but I can't think of a way to better phrase what I said, sorry.

Maybe if I put it thus: I can't think of a situation where some evidence E will make my posterior probability P(A|E) greater than 0.8 regardless of my prior P(A).
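
One way to illustrate this point (my sketch, not Oscar_Cunningham's): for any fixed likelihood ratio, the posterior still tracks the prior, so no piece of evidence can force P(A|E) above 0.8 for every prior:

```python
def posterior(prior, likelihood_ratio):
    """P(A|E) from the odds form of Bayes' theorem."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# The same evidence (likelihood ratio 4) lands all over [0, 1]
# depending on which prior it meets:
for prior in (0.01, 0.5, 0.99):
    print(prior, round(posterior(prior, likelihood_ratio=4.0), 3))
# 0.01 0.039
# 0.5 0.8
# 0.99 0.997
```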

Replies from: endoself
comment by endoself · 2011-01-06T23:12:42.454Z · LW(p) · GW(p)

Yeah, re-reading the quote, I see what you mean. He seems to have confused a frequency with a probability distribution over possible values of the frequency. Maybe that's why he made the other error that the post discusses.

comment by Jack · 2011-01-06T05:56:06.048Z · LW(p) · GW(p)

Are there other examples that have the same bizarrely wrong results?

Replies from: Larks
comment by Larks · 2011-01-07T01:40:01.349Z · LW(p) · GW(p)

There are a couple of others, all of which seem to rely on being told probabilities: you assign P(A) = 0.5, and then get told that P(A) = 0.7.

It seems that either P(A) = 0.7 is someone else's degree of belief, in which case Aumann's agreement theorem comes into play, or it is a statement about what your degree of belief should be, given your evidence. But idealised Bayesian agents don't make that sort of mistake!

comment by jsalvatier · 2011-01-06T01:49:15.208Z · LW(p) · GW(p)

The prior given implies there was other evidence that Peterson was a Swede, and Williamson wants to ignore it but doesn't explain why (in what you've quoted, anyway).

Replies from: Larks
comment by Larks · 2011-01-06T01:52:24.180Z · LW(p) · GW(p)

I quoted the entire section. I suppose I could see if there's any more in Bacchus.

comment by endoself · 2011-01-06T04:32:35.036Z · LW(p) · GW(p)

This seems wrong to me too. His prior shows that he has some other information about Peterson, but then he throws it out.

P(A|ε) = P(A & ε)/P(ε) = 0.1/0.2 = 0.5 ≠ 0.8

This failure to use all of the evidence is exactly what Jaynes warns against for much of PT:LoS.