Statisticsish Question

post by damang · 2011-11-28T16:03:17.389Z · LW · GW · Legacy · 13 comments

This is really a question, not a post; I just can't find a formal answer. Does Laplace's rule of succession work when you are drawing from a finite population without replacement? Suppose I know that some papers in a hat have "yes" on them and the rest don't, that there is a finite number of papers, and that I burn each paper after I take it out, but I have no clue how many papers are in the hat. Should I still use Laplace's rule to figure out how strongly to expect the next paper to have a "yes" on it? Or is there some adjustment to make, since every time I see a yes paper the ratio of yes papers to ~yes papers in the hat goes down?

13 comments

Comments sorted by top scores.

comment by janos · 2011-11-28T17:16:09.283Z · LW(p) · GW(p)

If your prior distribution for "yes" conditional on the number of papers is still uniform, i.e. if the number of papers has nothing to do with whether they're "yes" or not, then the rule still applies.
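
A minimal sketch of this claim, assuming the uniform prior is placed on the number of "yes" papers K among N (one way to read "uniform" here; the function name and the example values are made up for illustration). It computes the exact posterior predictive for drawing without replacement, so the result can be compared across different N:

```python
from fractions import Fraction
from math import comb

def predictive_next_yes(N, n, s):
    """P(next paper is yes | s yeses in the first n draws), drawing without
    replacement from N papers, with a uniform prior on the number of yeses K."""
    if not (0 <= s <= n < N):
        raise ValueError("need 0 <= s <= n < N")
    total = Fraction(0)  # normalizing constant: sum of prior * likelihood
    hit = Fraction(0)    # same sum, weighted by the chance the next draw is yes
    for K in range(N + 1):
        # Hypergeometric likelihood up to the factor 1 / C(N, n); that factor
        # and the uniform prior 1 / (N + 1) both cancel in the final ratio.
        weight = comb(K, s) * comb(N - K, n - s)
        total += weight
        hit += weight * Fraction(K - s, N - n)
    return hit / total

# Same observations (3 yeses in 4 draws), different hat sizes.
for N in (5, 50, 500):
    print(N, predictive_next_yes(N, 4, 3))
```

Under that assumed prior, each hat size gives 2/3, which is (s+1)/(n+2) with s = 3 and n = 4, so the total number of papers drops out.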

Replies from: Manfred
comment by Manfred · 2011-11-28T19:02:08.784Z · LW(p) · GW(p)

Add-on:

You can make the analogy clearer if you imagine, instead of rummaging around in a hat, you lined up all the strips of paper in random order and read them one at a time. Then it makes sense that the total number of slips of paper shouldn't matter.

Replies from: damang
comment by damang · 2011-11-29T11:21:52.163Z · LW(p) · GW(p)

This still doesn't seem right to me. If a paper is the third one drawn, then it is no longer among the n-3 remaining papers, and therefore it is less likely that I will observe whatever the 3rd paper said than it was when I started. In the hat with replacement, I have an even chance of seeing each paper again after I have observed it.

It stands to reason that if there were N papers, Y of them yeses (a fraction Y/N), and I see and remove a yes on the first trial, then P(y_2|y_1) = (Y-1)/(N-1). This now becomes our prior, and we use the same rule if we see another yes; if we see a ~yes, P(y_2|~y_1) = Y/(N-1). Under this reasoning, it is clear that without replacement, as you remove yeses, you should expect nos more often because there are fewer yeses left.

Replies from: Manfred
comment by Manfred · 2011-11-29T20:59:15.651Z · LW(p) · GW(p)

It seems that way because you are imagining holding the number of Ys constant. However, if the number of Ys is unknown, you have to figure out what proportion of the slips say Y as you go along, so you get a different result.

Maybe an analogy will help. Because you draw the slips of paper in random order, they will not be correlated with each other except through the total percentages that say Y and N. Analogously, if you flip a weighted coin, the flips will not be correlated with each other except through the bias of the coin. Drawing a slip of paper follows the exact same mathematical rules as flipping a weighted coin. And so since Laplace's rule of succession works for the weighted coin, it also works for the slips of paper.

Since you're already thinking about keeping the number of Ys fixed, you may object, "but the number of Ys is fixed in the case of the papers and not fixed in the case of the coin, so they must be different." So we can go a step further and imagine someone else flipping the coin and then writing down what they get. Now when we read the papers, there is a fixed number of Ys, but since it's the same coinflips all along, the probability of seeing Y or N is exactly the same. This demonstrates that having a finite amount of stuff doesn't really matter; what matters is the mathematical rules that the stuff follows.
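
The "someone else flips the coin and writes it down" setup can be checked roughly by simulation. This is only a sketch: the uniform coin bias, the hat size of 10, and the function name are assumptions chosen for illustration, and the result is a Monte Carlo estimate rather than an exact value.

```python
import random

def simulate(num_papers=10, seen=3, trials=100_000, seed=0):
    """Estimate P(next slip is yes | the first `seen` slips were all yes)."""
    rng = random.Random(seed)
    matched = 0   # runs where the first `seen` slips were all yes
    next_yes = 0  # of those, runs where the following slip was also yes
    for _ in range(trials):
        bias = rng.random()  # unknown coin bias, assumed uniform on [0, 1)
        # Someone else flips the coin and writes each result on a slip.
        slips = [rng.random() < bias for _ in range(num_papers)]
        rng.shuffle(slips)   # the hat: slips are read in random order
        if all(slips[:seen]):
            matched += 1
            next_yes += slips[seen]
    return next_yes / matched

print(simulate())
```

The estimate should land near the rule-of-succession value (3+1)/(3+2) = 0.8, even though each simulated hat holds a fixed, finite number of yes slips.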

Replies from: damang
comment by damang · 2012-06-06T17:54:05.905Z · LW(p) · GW(p)

Thanks :)

comment by Paul Crowley (ciphergoth) · 2011-11-29T08:22:45.698Z · LW(p) · GW(p)

Yes, Laplace's rule works in this instance. Assume you have a printer that prints out papers that say either "yes" or "no", each independently identically distributed with unknown (and uniformly distributed) p. If you pull the papers directly from the printer you have a classic Laplace's rule situation. If you print out N papers without looking at them, then look at each in turn, the situation is essentially unchanged. Furthermore, the probability that k of the N papers say "yes" is the same for each 0 <= k <= N.
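
The last claim can be checked exactly. A small sketch, assuming p is uniform on [0, 1] as in the comment (the function name is made up): the binomial probability is integrated over p using the Beta-function identity.

```python
from fractions import Fraction
from math import comb, factorial

def prob_k_yes(N, k):
    """P(exactly k of N papers say yes) when each paper is yes independently
    with probability p, and p is uniform on [0, 1]."""
    # The integral of p^k (1-p)^(N-k) over [0, 1] is the Beta function
    # B(k+1, N-k+1) = k! (N-k)! / (N+1)!.
    beta = Fraction(factorial(k) * factorial(N - k), factorial(N + 1))
    return comb(N, k) * beta

print([prob_k_yes(6, k) for k in range(7)])
```

Every entry comes out to 1/7, i.e. 1/(N+1) for N = 6, so a uniform prior on p induces a uniform prior on the number of yes papers.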

comment by DanielLC · 2011-11-29T01:06:39.121Z · LW(p) · GW(p)

If you're trying to find out the probability of the next paper, as opposed to the ratio for all the papers, it works fine.

Suppose you have a printer that randomly prints "yes" and "no" at a certain ratio. Laplace's rule of succession would work fine. If someone decided to turn it off after a certain number of papers, that wouldn't change anything until then.

Technically, you have to adjust for the probability that there is no next paper because the hat is empty.

comment by gwern · 2011-11-28T16:55:48.781Z · LW(p) · GW(p)

This sounds like Bernoulli's urn. If you have N papers/balls, only one of which is Yes, then on each draw your expectation is 1/N, right? And as you keep drawing, N gets smaller by 1 every turn.

In other words, as we keep drawing without hitting Yes, the odds of hitting Yes keep changing and increasing: 1/N, 1/(N-1), 1/(N-2), 1/(N-3)...

But in Laplace's Law, every day that goes by with the sun rising, N gets bigger, since here N is the number of days that have passed, not how many days are left to go; the odds that the sun won't rise keep changing and decreasing: 1/N, 1/(N+1), 1/(N+2), 1/(N+3)...

Unless I am missing something, Laplace's law is not like your papers-in-hat/Bernoulli-urn example.

Replies from: jsteinhardt, damang
comment by jsteinhardt · 2011-11-30T01:00:04.378Z · LW(p) · GW(p)

The difference is that in the urn case you know the exact number of balls of each type, whereas in this case you do not. The difference between Bernoulli and Laplace is not whether N gets bigger or smaller, but whether the number of balls of each type is known or has to be inferred.
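
A short sketch of the contrast, with made-up numbers (a hat of 10 papers, 5 of them yes) and function names chosen for illustration. The second function assumes a uniform prior, as discussed elsewhere in the thread.

```python
from fractions import Fraction

def known_urn(N, Y, n, s):
    """P(next is yes) when the hat is known to hold Y yeses among N papers
    and s yeses have already been removed in n draws."""
    return Fraction(Y - s, N - n)

def rule_of_succession(n, s):
    """Laplace's rule: P(next is yes) after s yeses in n draws, when the
    composition is unknown (uniform prior assumed)."""
    return Fraction(s + 1, n + 2)

# Draw yes after yes and compare the two predictions.
for n in range(4):
    print(n, known_urn(10, 5, n, n), rule_of_succession(n, n))
```

With the composition known, the chance of another yes falls from 1/2 to 2/7 over three yes draws; with it unknown, the same draws raise the estimate from 1/2 to 4/5, because each yes is also evidence about how many yeses the hat held to begin with.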

comment by damang · 2011-11-29T11:05:31.024Z · LW(p) · GW(p)

Yes, that is exactly the paradox I was running into.

(edit):

Actually, Manfred seems to have solved the issue.

comment by HonoreDB · 2011-11-28T16:40:50.636Z · LW(p) · GW(p)

A very similar problem came to me in a dream the night before last. I was just working on it when you posted this. Mostly a coincidence, but with the dice slightly weighted by the zeitgeist, I suppose.

I wrote a story in college about a society that was forced, as a punitive measure after losing a war, to spend some percentage of their lives in suspended animation, thereby ensuring that their civilization would progress slower than everyone else's. A nationalist movement subverted this by secretly ensuring that people dreamed during their suspension.

comment by buybuydandavis · 2011-11-29T03:10:49.531Z · LW(p) · GW(p)

It's been a while, but I'd be surprised if the answer isn't readily apparent from Jaynes' analysis of the problem.

comment by shminux · 2011-11-28T19:21:36.122Z · LW(p) · GW(p)

Unfortunately, the wiki entry on the rule of succession is poorly written.

Simplifying and working backwards are the two standard approaches that can help here.

Simplest possible case: 1 paper, chances of yes are 50% before you start. Next simplest case: 2 papers. 50% for the first one, what are the odds for the second one, given that the first one is yes? Use Bayesian inference to calculate it. Increase the number of papers, repeat a few more times, notice a pattern. Prove the pattern in general or by induction.
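
As a sketch of the two-paper step, assuming a uniform prior over the number K of yes papers (an assumption not stated above, though it reproduces the 50% starting point):

```latex
\begin{align*}
P(K = k) &= \tfrac{1}{3}, \qquad k \in \{0, 1, 2\} \\
P(y_1) &= \sum_{k=0}^{2} P(K = k)\,\frac{k}{2}
        = \tfrac{1}{3}\left(0 + \tfrac{1}{2} + 1\right) = \tfrac{1}{2} \\
P(y_1, y_2) &= P(K = 2) = \tfrac{1}{3} \\
P(y_2 \mid y_1) &= \frac{P(y_1, y_2)}{P(y_1)} = \frac{1/3}{1/2} = \tfrac{2}{3}
\end{align*}
```

Under that prior, the answer 2/3 matches the rule of succession, (s+1)/(n+2), with s = n = 1.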