A Probability Question

post by JMiller · 2012-12-06T05:29:22.028Z · LW · GW · Legacy · 16 comments

Hi, I am relatively new to this site, and I am not sure whether this is the right place to post.

I am sure many of you are familiar with the following probability riddle:

"Sarah is walking along the street when she encounters a man. With the man is his son. He tells Sarah that he has only one more child at home. She is asked, 'what is the probability that my child is a girl?'"

Since Sarah does not know whether the boy is the elder or younger sibling, she needs to take four possible states into account. The father either had:

1) a boy, then a girl

2) a girl, then a boy

3) two girls

4) two boys

Since 3 is impossible (Sarah knows there is at least one boy) that leaves three options. Two of those options imply a girl, the other implies a boy. Therefore, she can conclude that her probability estimate must be that it is 66.6% likely that there is a girl at home, and 33.3% likely that there is a boy.

Compare this to George's situation.

"George is walking along the street when he encounters a man. With the man is his son. He tells George that the boy with him is his oldest son, and that he has only one more child at home. He is asked, 'What is the probability that my child at home is a girl?'"

George's probability estimate is clear: either the man had a boy then a girl, or he had two boys. Therefore, it is 50% likely that the child at home is a girl.

My problem is this: I understand probability exists in the mind. The actual answer to the question is 100% one way or the other. Still, it seems like Sarah knows more about the situation, where George, by being given more information, knows less. His estimate is as good as knowing nothing other than the fact that the man has a child which could be equally likely to be a boy or a girl. 

If the reply is something like "Well, Sarah actually knows less so her estimate is less likely to be right" then that is something she could have figured out on her own, and then realized that assigning probability .5 is best anyways. That seems wrong.

I know I must be making a mistake somewhere: why does it seem like George learns less by knowing more?

Thank you for your help.

16 comments


comment by jimrandomh · 2012-12-06T07:02:14.543Z · LW(p) · GW(p)

Since 3 is impossible (Sarah knows there is at least one boy) that leaves three options. Two of those options imply a girl, the other implies a boy. Therefore, she can conclude that her probability estimate must be that it is 66.6% likely that there is a girl at home, and 33.3% likely that there is a boy.

You've taken a piece of information - you observe the man with a son - and half-applied it, by crossing off two daughters as a possibility, but you forgot to update the relative probability of having two boys, since you were more likely to see him with a son if he had two sons than if he had a son and a daughter.

Replies from: JMiller
comment by JMiller · 2012-12-07T03:31:08.109Z · LW(p) · GW(p)

Ah I did not see this post last night. Thanks.

comment by wgd · 2012-12-06T06:19:39.748Z · LW(p) · GW(p)

I'll just note in passing that this puzzle is discussed in this post, so you may find it or the associated comments helpful.

I think the specific issue is that in the first case, you're assuming that each of the three possible orderings yields the same chance of your observation (the child out walking with him is a boy). If you assume that his choice of which child to go walking with is random, then the fact that you see a boy makes the (boy, girl) and (girl, boy) possibilities each half as likely as (boy, boy), so together they are exactly as likely as the (boy, boy) one.

Let's define (imagining, for the sake of simplicity, that Omega descended from the heavens and informed you that the man you are about to meet has two children who can both be classified into ordinary gender categories):

h1 = "Boy then Girl"
h2 = "Girl then Boy"
h3 = "Girl then Girl"
h4 = "Boy then Boy"
o = "The man is out walking with a boy child"

Our initial estimates for each should be 25% before we see any evidence. Then if we make the aforementioned assumption that the man doesn't like one child more than the other:

P(o | h1) = 0.5
P(o | h2) = 0.5
P(o | h3) = 0.0
P(o | h4) = 1.0

And then we can apply Bayes' theorem to figure out the posterior probability of each hypothesis:

P(h1 | o) = P(h1) * P(o | h1) / P(o)
P(h2 | o) = P(h2) * P(o | h2) / P(o)
P(h3 | o) = P(h3) * P(o | h3) / P(o)
P(h4 | o) = P(h4) * P(o | h4) / P(o)
(where P(o) = P(o | h1)*P(h1) + P(o | h2)*P(h2) + P(o | h3)*P(h3) + P(o | h4)*P(h4))

The denominator is a constant factor which works out to 0.5 (meaning "before making that observation I would have assigned it 50% probability"), and overall the math works out to:

P(h1 | o) = P(h1) * P(o | h1) / 0.5 = 0.25
P(h2 | o) = P(h2) * P(o | h2) / 0.5 = 0.25
P(h3 | o) = P(h3) * P(o | h3) / 0.5 = 0.0
P(h4 | o) = P(h4) * P(o | h4) / 0.5 = 0.5

So the result in the former case is the same as in the latter: seeing one child offers you no information about the gender of the other (unless you assume that the man hates his daughter and never goes walking with her, in which case you get the original 1/3 chance of it being a boy).
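The arithmetic above can be checked with a minimal Python sketch. The function name `posterior` and the two-letter hypothesis labels are illustrative choices, not part of the puzzle; the likelihoods encode the assumption stated above that the man is equally likely to be out walking with either child.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' theorem over a finite set of hypotheses.

    priors and likelihoods are dicts keyed by hypothesis name.
    """
    # P(o) = sum over hypotheses of P(o | h) * P(h)
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}

# Equal priors over the four birth orders (elder child first).
priors = {h: Fraction(1, 4) for h in ("BG", "GB", "GG", "BB")}

# Assumption: the man picks his walking companion at random.
random_walk = {"BG": Fraction(1, 2), "GB": Fraction(1, 2),
               "GG": Fraction(0), "BB": Fraction(1)}

post = posterior(priors, random_walk)
p_girl_at_home = post["BG"] + post["GB"]

print(post)            # BG and GB get 1/4 each, GG gets 0, BB gets 1/2
print(p_girl_at_home)  # 1/2
```

Swapping in a different likelihood table (for instance, one where the man never walks with a daughter) is the only change needed to explore the other readings of the puzzle.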

The lesson to take away here is the same lesson as the usual Bayesian vs. frequentist debate, writ very small: if you're getting different answers from the two approaches, it's because the frequentist solution is slipping in unstated assumptions which the Bayesian approach forces you to state outright.

Replies from: JMiller
comment by JMiller · 2012-12-06T06:47:20.478Z · LW(p) · GW(p)

Thanks. I see why the probability of H1|o and H2|o need to be taken as 25% each. In that case, it seems like Sarah can say that it is 50% likely a boy and 50% likely a girl (at home). Why is the answer to the question then given as 66%?

Replies from: wgd
comment by wgd · 2012-12-06T06:54:46.448Z · LW(p) · GW(p)

The standard formulation of the problem is such that you are the one creating the bizarre contortions of conditional probabilities, by asking a question. In the standard setup the person you meet has no children with him; he tells you only that he has two children, and you ask him a question rather than him volunteering information. When you ask "Is at least one a boy?", you set up the situation such that the conditional probabilities of the various responses are very different.

In this new experimental setup (which is in very real fact a different problem from either of the ones you posed), we end up with the following situation:

h1 = "Boy then Girl"
h2 = "Girl then Boy"
h3 = "Girl then Girl"
h4 = "Boy then Boy"
o = "The man says yes to your question"

With a different set of conditional probabilities:

P(o | h1) = 1.0
P(o | h2) = 1.0
P(o | h3) = 0.0
P(o | h4) = 1.0

And it's relatively clear just from the conditional probabilities why we should expect to get an answer of 1/3 for two boys in this case (because there are three hypotheses consistent with the observation and they all predict it with equal probability).
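To see the difference between the two setups empirically, here is a rough Monte Carlo sketch. It assumes boys and girls are equally likely and births are independent; the function name `simulate`, the trial count, and the seed are arbitrary illustrative choices.

```python
import random

def simulate(trials=200_000, seed=0):
    """Estimate P(two boys) under the two ways of learning 'there is a boy'."""
    rng = random.Random(seed)
    saw_boy = saw_boy_and_bb = 0      # protocol A: a randomly chosen child is seen
    said_yes = said_yes_and_bb = 0    # protocol B: "Is at least one a boy?" -> yes

    for _ in range(trials):
        kids = [rng.choice("BG"), rng.choice("BG")]  # (elder, younger)
        both_boys = kids == ["B", "B"]

        # Protocol A: the man walks with one child picked at random.
        if rng.choice(kids) == "B":
            saw_boy += 1
            saw_boy_and_bb += both_boys

        # Protocol B: you ask the question and he answers truthfully.
        if "B" in kids:
            said_yes += 1
            said_yes_and_bb += both_boys

    return saw_boy_and_bb / saw_boy, said_yes_and_bb / said_yes

p_walk, p_ask = simulate()
print(f"random walk: P(two boys) ~ {p_walk:.3f}")   # close to 1/2
print(f"you ask:     P(two boys) ~ {p_ask:.3f}")    # close to 1/3
```

The families are identical in both protocols; only the process that produces the observation differs, and that alone moves the estimate from about 1/2 to about 1/3.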

Replies from: JMiller
comment by JMiller · 2012-12-06T07:05:40.531Z · LW(p) · GW(p)

That makes a lot of sense, thank you.

comment by JRMayne · 2012-12-06T18:10:59.975Z · LW(p) · GW(p)

wgd is correct as to the logic, but not as to the biology of the problem. In fact, the other kid is more likely than not to be male.

These problem types tend to assume an equal chance of a boy and a girl being born, which is a false assumption. (See: http://www.infoplease.com/ipa/A0005083.html)

I realize this may seem petty, but this is roughly like calculating the chance of drawing the three of clubs at random from a deck as one in fifty. It's close, but it's wrong. Leaving the equal-odds assumption implicit seems misguided; it should be made explicit (to make this a logic problem rather than a logic and biology problem).

Replies from: JMiller
comment by JMiller · 2012-12-06T18:27:12.076Z · LW(p) · GW(p)

You are right to point that out. I think that the spirit of the question assumes equal probability of 50% B,G for each birth independent of previous births and statistics in order to make it a probability and logic question, and not one of biology.

comment by pragmatist · 2012-12-06T06:26:54.633Z · LW(p) · GW(p)

EDIT: As wgd points out below, my answer here is wrong in its particulars (I didn't take into account all of the information available to Sarah and George in the puzzle as stated). The general principles invoked are sound, though.

George does actually know more. I think you're getting thrown by the fact that his 50-50 probability distribution seems more equivocal (less concentrated) than Sarah's 66-33 distribution. But remember that these distributions are defined over a space of four elements (boy-boy, boy-girl, girl-boy and girl-girl), so the actual distributions are 0.5-0.5-0-0 for George and 0.33-0.33-0.33-0 for Sarah. When you see it this way, it becomes a bit more plausible that Sarah's distribution is actually more "spread out", more equivocal.

To be more precise, suppose you have been given the task of conveying information about the genders of this man's children. You decide that you will transmit a 0 to represent a boy and a 1 to represent a girl. If the receiver has absolutely no information about the man's children, apart from the fact that there are two of them and neither is genderqueer, you will need to send two bits of information -- one for the elder child's gender, and one for the younger one's -- in order to convey full information about the genders. On the other hand, if the receiver already knows the children's genders, you have to send zero bits of information in order to convey full information. So you can think of the number of bits you need to transmit as a measure of the lack of knowledge of the receiver. The fewer bits you need to send, the more the receiver knows.

Now let's compare George and Sarah. George already knows the elder child's gender, so you only need to send one further bit, representing the younger child's gender, in order to convey full information. Sarah's case is trickier. She knows that one of the children is a boy, but she doesn't know which one. If it turns out that both children are boys, then your task is easy: you need to send just one bit of information, a 0, representing the gender of the child she hasn't seen. Once she gets this bit, Sarah will know the genders of both children. But if the other child is a girl, you can't just say that. You will also need to tell Sarah whether the order of birth is boy-girl or girl-boy. So besides sending a 1 to represent a girl, you'll need to send one more bit of information in order to distinguish between boy-girl and girl-boy. This means that the number of bits you will have to send Sarah is either 1 or 2, depending on whether the other child is a boy or a girl. If you did this experiment over and over again, with a bunch of different groups of siblings, the average number of bits you send Sarah will be greater than 1 but less than 2.

So with George you only need 1 bit to convey full information, while with Sarah you need (on average) more than 1 bit. This means Sarah does indeed know less about the situation, and there is no paradox. All of this can be made a lot more rigorous using the concept of Shannon entropy, if you're interested.
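For anyone who wants the numbers, here is a small Python sketch of that entropy comparison. It takes the two distributions exactly as stated in this comment (that is, the reading where Sarah only knows that at least one child is a boy); the helper name `entropy_bits` is an illustrative choice.

```python
from math import log2

def entropy_bits(dist):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * log2(p) for p in dist if p > 0)

# Order: boy-boy, boy-girl, girl-boy, girl-girl (elder child first).
george = [0.5, 0.5, 0.0, 0.0]
sarah = [1/3, 1/3, 1/3, 0.0]

# Expected message length of the coding scheme described above for Sarah:
# 1 bit if both are boys (prob 1/3), 2 bits otherwise (prob 2/3).
sarah_avg_bits = (1/3) * 1 + (2/3) * 2

print(entropy_bits(george))  # 1.0
print(entropy_bits(sarah))   # about 1.585, i.e. log2(3)
print(sarah_avg_bits)        # about 1.667
```

The simple 1-or-2-bit scheme averages slightly more than the entropy, which is the theoretical minimum, but both figures sit between 1 and 2 bits as described.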

Replies from: wgd, JMiller
comment by wgd · 2012-12-06T06:48:36.704Z · LW(p) · GW(p)

I agree that George definitely does know more information overall, since he can concentrate his probability mass more sharply over the 4 hypotheses being considered, but I'm fairly certain you're wrong when you say that Sarah's distribution is 0.33-0.33-0-0.33. I worked out the math (which I hope I did right or I'll be quite embarrassed), and I get 0.25-0.25-0-0.5.

I think your analysis in terms of required message lengths is arguably wrong, because the purpose of the question is to establish the genders of the children and not the order in which they were born. That is, the answer to the question "What gender is the child at home?" can always be communicated in a single bit, and we don't care whether they were born first or second for the purposes of the puzzle. You have to send >1 bit to Sarah only if she actually cares about the order of their births (And specifically, your "1 or 2 bits, depending" result is made by assuming that we don't care about the birth order if they're boys. If we care whether the boy currently out walking is the eldest child regardless of the other child's gender we have to always send Sarah 2 bits).

Another way to look at that result is that when you simply want to ask "What is the probability of a boy or a girl at home?" you are adding up two disjoint ways-the-world-could-be for each case, and this adding operation obscures the difference between Sarah's and George's states of knowledge, leading to them both having the same distribution over that answer.

Replies from: pragmatist
comment by pragmatist · 2012-12-06T07:18:58.358Z · LW(p) · GW(p)

I agree that George definitely does know more information overall, since he can concentrate his probability mass more sharply over the 4 hypotheses being considered, but I'm fairly certain you're wrong when you say that Sarah's distribution is 0.33-0.33-0-0.33. I worked out the math (which I hope I did right or I'll be quite embarrassed), and I get 0.25-0.25-0-0.5.

Good point. I was treating the description of Sarah's encounter with the man as a proxy for "Sarah knows one of the man's children is a boy, but not which one." That seems to be the way it's usually intended when the problem is presented, but you're right that in the problem as described, Sarah has an additional relevant piece of information -- that the man is out with a boy. I think this is an unintended artifact of the way the problem is presented, though. The people presenting the problem are usually trying to get at something different. The usual intent of the puzzle is captured by "Sarah knows that one of Brian's two children is a boy, and George knows that his eldest child is a boy. What are the probabilities according to Sarah and George that Brian's other child is a boy?".

I think your analysis in terms of required message lengths is arguably wrong, because the purpose of the question is to establish the genders of the children and not the order in which they were born. That is, the answer to the question "What gender is the child at home?" can always be communicated in a single bit, and we don't care whether they were born first or second for the purposes of the puzzle.

Again, I think this is an unintended artifact of the way the puzzle is stated. The fact that Sarah sees one of the kids and doesn't see the other one gives her a way of individuating the kids other than their birth order. If we don't assume she has this method of individuation (as in the restated puzzle above) then the birth order is relevant.

Replies from: wgd
comment by wgd · 2012-12-06T07:35:31.032Z · LW(p) · GW(p)

I think we're in agreement then, although I've managed to confuse myself by trying to actually do the Shannon entropy math.

In the event we don't care about birth orders we have two relevant hypotheses which need to be distinguished between (boy-girl at 66% and boy-boy at 33%), so the message length would only need to be 0.9 bits if I'm applying the math correctly for the entropy of a discrete random variable. So in one somewhat odd sense Sarah would actually know more about the gender than George does.

Which, given that the original post said

Still, it seems like Sarah knows more about the situation, where George, by being given more information, knows less. His estimate is as good as knowing nothing other than the fact that the man has a child which could be equally likely to be a boy or a girl.

may not actually be implausible. Huh.
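The 0.9-bit figure can be checked with a short Python sketch. It assumes the 2/3 versus 1/3 split of the restated puzzle (Sarah knows only that at least one child is a boy); `binary_entropy` is an illustrative helper name.

```python
from math import log2

def binary_entropy(p):
    """Entropy in bits of a yes/no variable that is 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

sarah_bits = binary_entropy(2/3)   # girl at home with probability 2/3
george_bits = binary_entropy(1/2)  # girl at home with probability 1/2

print(round(sarah_bits, 3))   # 0.918
print(george_bits)            # 1.0
```

So on the single yes/no question of the other child's gender, the 2/3 estimate leaves slightly less than one bit of uncertainty, while the 1/2 estimate leaves a full bit.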

Replies from: JMiller
comment by JMiller · 2012-12-06T16:35:43.261Z · LW(p) · GW(p)

Pragmatist is correct, I did not realize that the way I stated the problem was different than the original.

I fully understand the solution to this problem.

However, let's look at the original problem. John only knows that one of the man's children is a boy:

1) B, G | 0.33

2) G, B | 0.33

3) G, G | 0.00

4) B, B | 0.33

P(B | 4) = 1, P(G | 1 or 2) = 1

P(B) = 0.33, P(G) = 0.66

So let's say that the man now tells John that the boy is also the eldest:

1) B, G | 0.5

2) G, B | 0.0

3) G, G | 0.0

4) B, B | 0.5

P(B | 4) = 1, P(G | 1) = 1
P(B) = 0.5, P(G) = 0.5

At first I saw a problem because John obviously knows more given the second piece of information, so the fact that his estimate is worse seemed really weird. What I think is going on here is that his learning more really does decrease his ability to predict the gender of the other child: Before, he had 3 options, 2 of which contained a girl-answer. Now, one of those 2 answers is taken away, so he currently has 2 options, 1 of which contains a girl-answer. As he becomes more informed about the total state of the world, his ability to predict this particular piece of information decreases.
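The two tables above can be reproduced by brute-force enumeration. This is only a sketch: it assumes the four birth orders are equally likely beforehand and that John is simply told a fact (so conditioning just discards the excluded rows); the helper names `condition` and `p_other_is_girl` are illustrative.

```python
from fractions import Fraction
from itertools import product

# All (elder, younger) pairs, assumed equally likely a priori.
families = list(product("BG", repeat=2))

def condition(keep):
    """Uniform distribution over the families for which keep(family) is true."""
    kept = [f for f in families if keep(f)]
    return {f: Fraction(1, len(kept)) for f in kept}

def p_other_is_girl(dist):
    """Probability that the family contains a girl, i.e. the 'other' child is one."""
    return sum(p for f, p in dist.items() if "G" in f)

at_least_one_boy = condition(lambda f: "B" in f)
eldest_is_boy = condition(lambda f: f[0] == "B")

print(p_other_is_girl(at_least_one_boy))  # 2/3
print(p_other_is_girl(eldest_is_boy))     # 1/2
```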

Replies from: ChristianKl
comment by ChristianKl · 2012-12-07T18:26:45.261Z · LW(p) · GW(p)

The fact that John predicts 0.5 while Sarah predicts 0.66 doesn't mean that Sarah's prediction is somehow better.

comment by JMiller · 2012-12-06T06:50:54.874Z · LW(p) · GW(p)

Thank you, that is very helpful! If I understand it, according to your analysis, Sarah knows less about the total state of the birth order and gender of the two children. Still, it seems like she knows more about the particular gender of the child at home.

Is that still a problem?

Replies from: Viliam_Bur
comment by Viliam_Bur · 2012-12-06T12:59:58.348Z · LW(p) · GW(p)

I guess the problem is with the words "knows more". It's not just how many bits of information you get, but also how they are related to your question. As a trivial example, it would be better to have 1 relevant bit of information than 1024 bits of irrelevant information. In this example, all information is relevant, but differently.

Imagine the following situation: You have letters "A", "B", "C", "D" and you randomly choose one of them.

You have two participants in the experiment. You tell the first participant that you did not choose "A", and you tell the second participant that you did not choose "B". Each of them has the same amount of information, right?

Then you ask them whether the letter you chose was a consonant. The first one says "Certainly yes." The second one says "I am not sure, but with probability 66% yes."

How is it possible that the same amount of information gives them different certainty? The answer is, the same amount of information in general is not necessarily the same amount of information about the question you gave them.
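The letter example is small enough to enumerate directly. A minimal Python sketch, assuming a uniform choice among the four letters (`p_consonant` is an illustrative helper name):

```python
from fractions import Fraction

LETTERS = "ABCD"
VOWELS = set("A")  # among A, B, C, D only A is a vowel

def p_consonant(excluded):
    """P(chosen letter is a consonant | it is not `excluded`), uniform prior."""
    remaining = [c for c in LETTERS if c != excluded]
    consonants = [c for c in remaining if c not in VOWELS]
    return Fraction(len(consonants), len(remaining))

print(p_consonant("A"))  # 1   (first participant: B, C, D remain)
print(p_consonant("B"))  # 2/3 (second participant: A, C, D remain)
```

Both participants are left with three equally likely letters, so they are equally uncertain about the letter itself; they differ only in how that uncertainty lines up with the consonant question.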