binomial variance problem

post by nerfhammer · 2012-04-06T22:59:55.015Z · 14 comments

Found in an old Kahneman & Tversky paper:

There are two programs in a high school. Boys are a majority (65%) in program A, and a minority (45%) in program B. There is an equal number of classes in each of the two programs.

You enter a class at random, and observe that 55% of the students are boys. What is your best guess -- does the class belong to program A or to program B?

14 comments


comment by Richard_Kennaway · 2012-04-07T01:11:02.203Z

Um, B, but only by a hair. 55 is equidistant from 45 and 65, but the variance is smaller for A because 65 is farther from 50 than 45 is; so, measured in units of the relevant standard deviations, 55 is closer to 45 than to 65. (Making the obviously obvious assumption that children are assigned to classes independently of gender.)

I had to google up the source to find out why the "obvious" answer is supposed to be A.
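To make the standard-deviation argument concrete, here is a minimal Python sketch that measures how many standard deviations 55% lies from each program's rate. The class size n = 20 is an assumption (the problem doesn't give one), and `distance_in_sds` is a hypothetical helper; since both distances scale with √n, the comparison comes out the same for any n.

```python
import math

def distance_in_sds(p: float, n: int, observed: float = 0.55) -> float:
    """How many standard deviations the observed fraction of boys lies from p,
    treating each of the n students as independently a boy with probability p."""
    sd = math.sqrt(p * (1 - p) / n)
    return abs(observed - p) / sd

n = 20  # hypothetical class size; the problem doesn't give one
for name, p in [("A", 0.65), ("B", 0.45)]:
    print(f"program {name}: 55% is {distance_in_sds(p, n):.3f} sd from {p:.0%}")
```

This is only a rough proxy: it compares distances in standard-deviation units rather than likelihoods.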

Replies from: nerfhammer, Protagoras
comment by nerfhammer · 2012-04-08T03:11:59.268Z

What's the name of the principle that variance increases further from 50%?

Replies from: torekp, othercriteria, Richard_Kennaway
comment by torekp · 2012-04-08T14:07:51.085Z

My weaker reason for concluding B was that, although I hadn't memorized the formula for the variance of a binomial distribution, I intuited that said principle was true.

More saliently, the problem statement contains the gratuitous information that boys are a majority in program A. It's Kahneman and Tversky, for FSM's sake; therefore this information is used to mislead. Therefore, B.

comment by othercriteria · 2012-04-08T14:22:09.159Z

Decreases! Note that there's zero variance when p = 0 versus non-zero variance when p = 0.5.

comment by Richard_Kennaway · 2012-04-08T09:41:29.386Z

No principle, just the fact that the variance of a binomial distribution with n trials is np(1-p), and p(1-p) peaks at p = 0.5.
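A short worked version, writing X for the number of boys in a class of n students (n is not given in the problem) and assuming each student is independently a boy with the program's rate:

```latex
$$
\operatorname{Var}[X] = n\,p(1-p), \qquad
\frac{d}{dp}\bigl[p(1-p)\bigr] = 1 - 2p = 0 \;\Longrightarrow\; p = \tfrac{1}{2}
$$
$$
p_A = 0.65:\ p_A(1-p_A) = 0.2275, \qquad
p_B = 0.45:\ p_B(1-p_B) = 0.2475
$$
```

So for the same class size, the boy count in a program-B class has the larger variance, because 0.45 lies closer to 0.5 than 0.65 does.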

comment by Protagoras · 2012-04-08T00:55:40.208Z

It looks like I approached the problem in exactly the same way you did. I'm very curious as to how common it is for people to think A is more likely; it really doesn't seem obvious to me either.

Replies from: nerfhammer
comment by nerfhammer · 2012-04-08T03:10:48.158Z

75% choose program A

comment by othercriteria · 2012-04-06T23:14:19.812Z

Cute problem. And you can probably go a bit further in assessing how good your best guess is by inferring that the class size is at least 20 (55% is 11/20 in lowest terms, so the class size must be a multiple of 20) and lower bounding your variances.
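The class-size inference is a one-liner with Python's `fractions` module. A minimal sketch, assuming (as the comment implicitly does) that the 55% figure is exact rather than rounded; `smallest_class_size` is a hypothetical helper. The variance of the boy count, np(1 - p), is then at least 20p(1 - p) for each program.

```python
from fractions import Fraction

def smallest_class_size(percent_boys: int) -> int:
    """Smallest class in which exactly `percent_boys` percent of students can be boys."""
    return Fraction(percent_boys, 100).denominator

n_min = smallest_class_size(55)  # 55/100 reduces to 11/20
for name, p in [("A", 0.65), ("B", 0.45)]:
    # Variance of the boy count, n * p * (1 - p), is smallest at the smallest allowed n.
    print(f"program {name}: variance of boy count >= {n_min * p * (1 - p):.2f}")
```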

[Or you can be dickish/clever and claim that the problem is underspecified because you're only given the overall boy/girl percentages for the two programs, and not their distribution across classes. E.g., if every class in program A has exactly 65% boys and every class in program B has exactly 45%, then your observation is consistent with neither program.]

Replies from: Arran_Stirton
comment by Arran_Stirton · 2012-04-07T02:28:08.882Z

[Actually you can't be dickish/clever that way: the problem isn't underspecified, as the goal is to do the best you can with the information you've got. You've got no information/evidence regarding the distribution between classes, so your best bet is to treat it as random. From there you can use Bayes' theorem, blah blah, etc. etc....]

Replies from: faul_sname, othercriteria
comment by faul_sname · 2012-04-07T02:55:32.048Z

Or just change the 45% and 65% to 11% and 99%. That makes the correct answer pretty obvious without changing anything important.
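A quick Python check of how much the exaggerated numbers sharpen the evidence. This sketch assumes a 20-student class with 11 boys and 9 girls (the class size is an assumption, though the direction of the result is the same for any multiple of 20), and `odds_for_b` is a hypothetical helper.

```python
import math

def odds_for_b(p_a: float, p_b: float, boys: int, girls: int) -> float:
    """Likelihood ratio P(class | B) / P(class | A), treating each student
    as independently a boy with the program's rate."""
    log_a = boys * math.log(p_a) + girls * math.log(1 - p_a)
    log_b = boys * math.log(p_b) + girls * math.log(1 - p_b)
    return math.exp(log_b - log_a)

boys, girls = 11, 9  # assumed 20-student class with 55% boys
print(f"65% vs 45%: odds for B = {odds_for_b(0.65, 0.45, boys, girls):.4f}")
print(f"99% vs 11%: odds for B = {odds_for_b(0.99, 0.11, boys, girls):.3g}")
```

In both versions 55% is equidistant from the two rates; only the variances differ, and at 99% boys the nine girls become overwhelming evidence against A.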

comment by othercriteria · 2012-04-08T13:53:33.663Z

Oops, you're right. The variant of the problem I mentioned above got rid of the assumption of binomially distributed boys (equivalently, girls).

The following setup should work, though:

$$
\begin{aligned}
z_i &\sim \text{Bernoulli}(1/2) \\
p_i \mid z_i = 0 &\sim \text{Beta}(a_0, b_0) \\
p_i \mid z_i = 1 &\sim \text{Beta}(a_1, b_1) \\
x_i &\sim \text{Binomial}(n, p_i)
\end{aligned}
$$

In words, this says that to generate the i-th class, you flip a coin ($z_i$) to decide whether it's in program A or program B; conditioned on the program, the proportion of boys $p_i$ is drawn from a program-specific beta distribution; and then the number of boys $x_i$ is drawn from the corresponding binomial distribution. Under the constraints that $a_0/(a_0 + b_0) = 0.65$ and $a_1/(a_1 + b_1) = 0.45$ (the means of the two beta distributions), the average proportion of boys in each program matches the problem.
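The generative story can be simulated directly. A minimal Python sketch using only the standard library; it assumes $z_i = 0$ corresponds to program A, and the values $a_0 = 6.5$, $b_0 = 3.5$, $a_1 = 4.5$, $b_1 = 5.5$ and n = 20 are hypothetical choices that satisfy the mean constraints.

```python
import random

def simulate_class(rng: random.Random, n: int, a0: float, b0: float,
                   a1: float, b1: float) -> tuple[int, int]:
    """Draw one class from the hierarchical model; returns (z, number of boys)."""
    z = rng.randint(0, 1)                    # fair coin: which program
    a, b = (a0, b0) if z == 0 else (a1, b1)
    p = rng.betavariate(a, b)                # class-specific proportion of boys
    boys = sum(rng.random() < p for _ in range(n))  # Binomial(n, p) via n Bernoulli draws
    return z, boys

# Hypothetical parameters: Beta means 6.5/10 = 0.65 (z = 0) and 4.5/10 = 0.45 (z = 1).
a0, b0, a1, b1 = 6.5, 3.5, 4.5, 5.5
n = 20
rng = random.Random(0)
boys_by_z = {0: 0, 1: 0}
students_by_z = {0: 0, 1: 0}
for _ in range(20_000):
    z, boys = simulate_class(rng, n, a0, b0, a1, b1)
    boys_by_z[z] += boys
    students_by_z[z] += n

for z in (0, 1):
    print(f"z = {z}: fraction of boys = {boys_by_z[z] / students_by_z[z]:.3f}")
```

The pooled fractions land near 0.65 and 0.45, even though individual classes vary much more than a plain binomial would allow.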

However, by taking $a_0$ or $a_1$ small (with $b_0$ and $b_1$ adjusted accordingly to maintain the constraints), you can play with the variance so that the observed 55%-boys class is more likely under either of the programs. If you had repeated trials available, you might be able to learn $(a_0, b_0)$ and $(a_1, b_1)$. In a single trial, you can't be sure that your strategy won't do worse than chance.
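This can be checked numerically: integrating out $p_i$ gives a beta-binomial distribution for the number of boys, $P(k) = \binom{n}{k} B(k + a, n - k + b) / B(a, b)$. A minimal sketch, assuming a 20-student class with 11 boys and parameterizing each program's beta distribution by its mean $a/(a+b)$ and its concentration $a + b$ (smaller concentration means larger variance). The function names and the concentrations 2 and 10^6 are hypothetical, illustrative choices.

```python
import math

def log_beta(a: float, b: float) -> float:
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_log_pmf(k: int, n: int, mean: float, concentration: float) -> float:
    """log P(k boys out of n) when p ~ Beta(a, b) with a/(a+b) = mean and
    a + b = concentration, and then k ~ Binomial(n, p)."""
    a = mean * concentration
    b = (1 - mean) * concentration
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)

def likelihood_ratio_a_over_b(k: int, n: int, conc_a: float, conc_b: float) -> float:
    """P(data | A) / P(data | B), with program means fixed at 0.65 and 0.45."""
    return math.exp(beta_binomial_log_pmf(k, n, 0.65, conc_a)
                    - beta_binomial_log_pmf(k, n, 0.45, conc_b))

k, n = 11, 20
for conc_a, conc_b in [(1e6, 1e6), (2.0, 1e6), (1e6, 2.0)]:
    ratio = likelihood_ratio_a_over_b(k, n, conc_a, conc_b)
    winner = "A" if ratio > 1 else "B"
    print(f"a+b for A = {conc_a:g}, for B = {conc_b:g}: L_A/L_B = {ratio:.3f} -> favors {winner}")
```

With both concentrations huge, the model collapses to the plain binomial and narrowly favors B; making B's beta distribution diffuse flips the verdict to A.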

comment by Paul Crowley (ciphergoth) · 2012-04-07T07:53:39.668Z

SPOILER ALERT: solution presented here. Rot-13 would be immensely painful, so I'll just present some facts from which an LW reader can piece together a solution. The probability of drawing 11 blue balls followed by 9 green balls from an urn that's 45% blue is 7.0567033E-7. If the urn is 65% blue, it's 6.8969856E-7. The log-likelihood ratio is 0.099425348 decibans in favor of the 45% urn. So for practical purposes the answer is "my best guess is still 50/50", but the posterior probability of the 45% urn is really 0.50572313. If you draw 100 rather than 20, it's 0.52858571.
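These figures can be reproduced in a few lines. A minimal Python sketch, assuming (as the comment does) independent draws, 11 boys and 9 girls in a class of 20, and a 50/50 prior from the equal number of classes; the helper names are introduced here for illustration. A deciban is ten times the base-10 log of a likelihood ratio.

```python
import math

def sequence_probability(p: float, boys: int, girls: int) -> float:
    """Probability of one particular order of `boys` boys and `girls` girls."""
    return p**boys * (1 - p)**girls

def posterior_b(boys: int, girls: int, p_a: float = 0.65, p_b: float = 0.45) -> float:
    """P(program B | class), with a 50/50 prior from the equal numbers of classes."""
    like_a = sequence_probability(p_a, boys, girls)
    like_b = sequence_probability(p_b, boys, girls)
    return like_b / (like_a + like_b)

def decibans(ratio: float) -> float:
    """Weight of evidence: 10 * log10 of a likelihood ratio."""
    return 10 * math.log10(ratio)

like_b = sequence_probability(0.45, 11, 9)   # the "45% blue" urn, i.e. program B
like_a = sequence_probability(0.65, 11, 9)   # the "65% blue" urn, i.e. program A
print(like_b, like_a)
print(decibans(like_b / like_a))
print(posterior_b(11, 9), posterior_b(55, 45))
```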

I think even if it were a real-life problem I would have correctly guessed, without doing arithmetic, which program the evidence favored; but I was sort of spoilered by knowing that the answer has to be the "counterintuitive" one.

ETA: another way to do the sums is that each boy provides 1.5970084 decibans of evidence for program A, and each girl 1.9629465 for program B.
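The per-child tally, sketched in Python under the same independence assumption (the helper names are hypothetical). It also shows why looking at the girls helps: each girl carries slightly more evidence for B than each boy carries for A, so 9 girls narrowly outweigh 11 boys.

```python
import math

def decibans(ratio: float) -> float:
    """Weight of evidence: 10 * log10 of a likelihood ratio."""
    return 10 * math.log10(ratio)

per_boy_for_a = decibans(0.65 / 0.45)    # each boy favors program A
per_girl_for_b = decibans(0.55 / 0.35)   # each girl favors program B

def net_decibans_for_b(boys: int, girls: int) -> float:
    """Total evidence for B: girls push toward B, boys push toward A."""
    return girls * per_girl_for_b - boys * per_boy_for_a

print(per_boy_for_a, per_girl_for_b)
print(net_decibans_for_b(11, 9))
```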

Replies from: faul_sname
comment by faul_sname · 2012-04-07T08:04:03.863Z

If you look at girls in addition to boys, it's no longer quite so counterintuitive.