What's going on with this failure of Bayes to converge?

post by Orborde · 2019-12-19T03:58:13.883Z · LW · GW · 3 comments

This is a link post for https://a-point-in-tumblspace.tumblr.com/post/189588000957/bayes-trubs-part-1

There are circumstances (which might only occur with infinitesimal probability, which would be a relief) under which a perfect Bayesian reasoner with an accurate model and reasonable priors – that is to say, somebody doing everything right – will become more and more convinced of a very wrong conclusion, approaching certainty as they gather more data.

(click through the notes on that post to see some previous discussion)
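To make the flavor of the failure concrete, here is a minimal simulation. This is not Freedman's actual construction (which involves a carefully chosen prior over infinite-dimensional distributions); it is the simpler zero-prior-mass version that the comments below dissect, with a grid prior made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# True coin bias: the data are i.i.d. Bernoulli(0.25) flips.
theta_true = 0.25

# A prior that, by construction, puts zero mass on the truth:
# uniform over a grid of candidate biases that skips 0.25.
grid = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
log_post = np.log(np.full(len(grid), 1 / len(grid)))

# Condition on 100,000 flips via the Bernoulli log-likelihood.
n = 100_000
heads = int((rng.random(n) < theta_true).sum())
log_post += heads * np.log(grid) + (n - heads) * np.log(1 - grid)

post = np.exp(log_post - log_post.max())
post /= post.sum()

# The posterior piles up, with near-certainty, on whichever candidate
# is closest in KL divergence to the truth (here 0.3): confident
# convergence to a wrong answer.
for theta, p in zip(grid, post):
    print(f"theta = {theta:.1f}  posterior = {p:.4f}")
```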

I have two major questions:

1. Is this exposition correctly capturing Freedman's counterexample?

2. If using a uniform prior sometimes breaks, what prior should I be using, and, more importantly, how do I arrive at that prior?
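On question 2, one standard move (my tentative suggestion, not something from the linked post) is: if a point hypothesis like theta = 0.25 could literally be true, give it positive prior mass, for example with a spike-and-slab mixture. A sketch under that assumption:

```python
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(0)
theta_true = 0.25
n = 10_000
heads = int((rng.random(n) < theta_true).sum())
tails = n - heads

# Prior: 50% point mass ("spike") at theta = 0.25,
#        50% Uniform(0, 1) = Beta(1, 1) ("slab").
# Log marginal likelihood of the data under each component:
log_m_spike = heads * np.log(0.25) + tails * np.log(0.75)
log_m_slab = betaln(1 + heads, 1 + tails)  # likelihood integrated over theta

# With 1:1 prior odds, the posterior probability of the point hypothesis
# is the Bayes factor pushed through the logistic function.
p_spike = 1 / (1 + np.exp(-(log_m_spike - log_m_slab)))
print(f"P(theta = 0.25 | data) = {p_spike:.3f}")
```

When the data really do come from theta = 0.25, the Bayes factor favours the spike at roughly a sqrt(n) rate and p_spike comes out near 1; rerun with theta_true = 0.3 and it collapses toward 0 instead.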

3 comments


comment by habryka (habryka4) · 2019-12-19T04:13:23.067Z · LW(p) · GW(p)

Tremorbond on Tumblr replies with a point I find reasonably compelling:

The probability that theta is exactly 0.25 is not just practically 0 but actually 0. It doesn’t seem at all a problem to me that a Bayesian is not able to learn that something they assigned a prior of 0 to is true.

If you restrict your problem statement to only talk about ranges, the issue disappears. (I’m quite confident but have not checked the math on this.) It only looks wrong because you assume a point value theta, but test the model on ranges of theta.

In general, if you have a smooth distribution over some range of the real numbers, then any individual point in that range is assigned probability 0, so you can't expect to come to accurate beliefs about point values (I think), but you can expect to have accurate beliefs about arbitrary ranges.
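A quick check of the range version, assuming a coin with true bias 0.25 and a uniform prior:

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
theta_true = 0.25

for n in [100, 10_000, 1_000_000]:
    heads = int((rng.random(n) < theta_true).sum())
    # Conjugate posterior under a Uniform(0, 1) = Beta(1, 1) prior.
    post = beta(1 + heads, 1 + n - heads)
    # P(theta == 0.25 exactly) is 0 under this posterior, but the
    # probability of a small interval around the truth tends to 1.
    p_range = post.cdf(0.26) - post.cdf(0.24)
    print(f"n = {n:>9,}: P(0.24 < theta < 0.26 | data) = {p_range:.4f}")
```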

comment by omegastick (isaac-poulton) · 2019-12-19T08:15:42.632Z · LW(p) · GW(p)

This highlights an interesting case where pure Bayesian reasoning fails. While the chance of it occurring randomly is very low (though that may rise when you consider how many chances it has to occur), it is trivial to construct deliberately. Furthermore, it potentially applies in any case where we have two possibilities, one of which continually becomes more probable while the other shrinks but persistently doesn't disappear.

Suppose you are a police detective investigating a murder. There are two suspects: A and B. A doesn't have an alibi, while B has a strong one (time stamped receipts from a shop on the other side of town). A belonging of A's was found at the crime scene (which he claims was stolen). A has a motive: he had a grudge against the victim, while B was only an acquaintance.

A naive Bayesian (in both senses) would, with each observation, assign higher and higher probabilities to A being the culprit. In the end, though, it turns out that B committed the crime to frame A: he chose a victim A had a grudge against, planted A's belonging, and forged the receipts.

It's worth noting that, assuming your priors (and likelihood model) are accurate, given enough evidence you *will* converge on the correct probabilities. Actually acquiring that much evidence in practice isn't anywhere near guaranteed, however.
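A toy version of the updating in this story (all likelihood ratios below are made-up numbers for illustration) locates the failure in the likelihood model rather than in the update rule:

```python
import numpy as np

def posterior_A(log_likelihood_ratios):
    """P(A is the culprit) from 1:1 prior odds and per-clue log LRs."""
    log_odds = sum(log_likelihood_ratios)
    return 1 / (1 + np.exp(-log_odds))

# Naive model: framing is not in the hypothesis space, so every clue
# (no alibi, the belonging, the grudge, B's receipts) counts against A.
naive = [np.log(2), np.log(10), np.log(3), np.log(5)]
print(f"naive model:         P(A) = {posterior_A(naive):.3f}")

# A model that includes "B framed A": planted belongings and forged
# receipts are *expected* under that hypothesis, so the same clues
# are nearly uninformative between the two suspects.
framing_aware = [np.log(2), np.log(1.2), np.log(1.5), np.log(0.8)]
print(f"framing-aware model: P(A) = {posterior_A(framing_aware):.3f}")
```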

comment by Oskar Mathiasen (oskar-mathiasen) · 2019-12-21T00:09:34.498Z · LW(p) · GW(p)

They seem to forget to first condition on the fact that the threshold must be an integer. This narrows the possibility space to a countably infinite rather than uncountably infinite size, meaning they need to do completely different mathematics, which gives the correct result.
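I haven't checked the original construction, but the general point is sound: once the hypothesis space is countable and the prior gives every hypothesis positive mass, the truth is in the support and the posterior concentrates on it (essentially Doob's consistency theorem). A sketch, assuming for illustration that the integer threshold k parameterizes a coin bias theta = k/100:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup for illustration: theta = k/100 for an integer k, so
# the hypothesis space is countable and a uniform prior gives positive
# mass to the true value (unlike the continuous-prior case above).
ks = np.arange(1, 100)
grid = ks / 100
log_post = np.log(np.full(len(ks), 1 / len(ks)))

theta_true = 0.25  # k = 25: now inside the support of the prior
n = 100_000
heads = int((rng.random(n) < theta_true).sum())
log_post += heads * np.log(grid) + (n - heads) * np.log(1 - grid)

post = np.exp(log_post - log_post.max())
post /= post.sum()
print(f"posterior mode: theta = {grid[post.argmax()]:.2f}, "
      f"mass = {post.max():.4f}")
```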