Sleeping Beauty Problem Can Be Explained by Perspective Disagreement (II)

post by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-19T23:00:40.027Z · LW · GW · Legacy · 42 comments

This is the second part of my argument. It mainly involves a counterexample to SIA and Thirdism.

The other parts of the argument can be found here: I, II, III, IV

 

The 81-Day Experiment (81D):

There is a circular corridor connected to 81 rooms with identical doors. At the beginning all rooms have blue walls. A random number R is generated between 1 and 81. Then a painter randomly selects R rooms and paints them red. Beauty is put into a drug-induced sleep lasting 81 days, spending one day in each room. An experimenter wakes her up if the room she currently sleeps in is red and lets her sleep through the day if the room is blue. Her memory of each awakening is wiped at the end of the day. Each time Beauty wakes up she is allowed to exit her room and open some other doors in the corridor to check the colour of those rooms. Now suppose one day, after opening 8 random doors, she sees 2 red rooms and 6 blue rooms. How should Beauty estimate the total number of red rooms (R)?

For halfers, waking up in a red room does not give Beauty any more information except that R>0. Randomly opening 8 doors means she took a simple random sample of size 8 from a population of 80. In the sample 2 rooms (1/4) are red. Therefore the total number of red rooms (R) can be easily estimated as 1/4 of the 80 rooms plus her own room, 21 in total.

For thirders, Beauty's own room is treated differently. As SIA states, finding herself awake is as if she chose a random room from the 81 rooms and found it to be red. Therefore her room and the other 8 rooms she checked are all in the same sample. This means she has a simple random sample of size 9 from a population of 81. 3 out of the 9 rooms in the sample (1/3) are red. The total number of red rooms can be easily estimated as a third of the 81 rooms, 27 in total.
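
The two point estimates can be written out as a short sketch. The variable names are only illustrative; the arithmetic follows the halfer and thirder recipes described above.

```python
from fractions import Fraction

TOTAL_ROOMS = 81
opened_red, opened_blue = 2, 6
opened = opened_red + opened_blue

# Halfer: the 8 opened rooms are a sample of the OTHER 80 rooms;
# Beauty's own (red) room is added afterwards.
halfer_estimate = Fraction(opened_red, opened) * (TOTAL_ROOMS - 1) + 1

# Thirder (SIA): Beauty's own room counts as a ninth randomly drawn room.
thirder_estimate = Fraction(opened_red + 1, opened + 1) * TOTAL_ROOMS

print(halfer_estimate, thirder_estimate)
```

The only difference between the two lines is whether Beauty's own room is counted inside the sample or added after the estimate is made.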

If a Bayesian analysis is performed, R=21 and R=27 would also be the values with the highest credence according to halfers and thirders respectively. It is worth mentioning that if an outside Selector randomly chooses 9 rooms and checks them, and it just so happens that those 9 are the same 9 rooms Beauty saw (her own room plus the 8 randomly chosen rooms), the Selector would estimate R=27 and have the highest credence for R=27. Because he and Beauty have exactly the same information about the rooms, their answers would not change even if they were allowed to communicate. So again, there will be a perspective disagreement according to halfers but not according to thirders, the same as mentioned in part I.
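
As a rough check of the highest-credence values, here is a minimal sketch assuming a uniform prior on R over 1 to 81 (as the setup describes). The function names are hypothetical, and the weights are unnormalised posteriors, which is enough to find the most probable R.

```python
from math import comb

N = 81  # total rooms; R is assumed uniform on 1..81

def thirder_weight(r):
    # SIA: all 9 known rooms form one random sample with 3 red and 6 blue.
    return comb(r, 3) * comb(N - r, 6)

def halfer_weight(r):
    # Own room is red for certain; the 8 opened rooms are drawn from the
    # other 80 rooms, of which r - 1 are red.
    return comb(r - 1, 2) * comb(N - r, 6)

candidates = range(1, N + 1)
thirder_mode = max(candidates, key=thirder_weight)
halfer_mode = max(candidates, key=halfer_weight)
print(halfer_mode, thirder_mode)
```

The Selector's analysis uses the same weight as `thirder_weight`, since he treats all 9 rooms as one random sample.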

However, the thirder's estimate is very problematic. Because Beauty believes the 9 rooms she knows are a fair sample of all 81 rooms (since she used them in statistical estimation), it means red rooms (and blue rooms) are not systematically over- or under-represented. Since Beauty is always going to wake up in a red room, she has to conclude the other 8 rooms are not a fair sample. Red rooms have to be systematically under-represented in those 8 rooms. This means even before Beauty decides which doors she wants to open, we can already predict with some confidence that those 8 rooms are going to contain fewer reds than the average of the 80 suggests. This supernatural predictive power is strong evidence against SIA and thirding.

The argument can also be structured this way. Consider the following three statements:

A: The 9 rooms are an unbiased sample of the 81 rooms.

B: Beauty is guaranteed to wake up in a red room.

C: The 8 rooms Beauty chooses are an unbiased sample of the other 80 rooms.

These statements cannot all be true at the same time. Thirders accept A and B, meaning they must reject C. In fact they must conclude the 8 rooms she chooses would be biased towards blue. This contradicts the fact that the 8 rooms are randomly chosen.

(EDIT Aug 1. I think the best answer thirders can give is to accept C, since it is obviously a simple random sample. Adding another red room to it will make the 9 rooms biased from her perspective. However, they can argue that if a selector saw the same 9 rooms through random selection, then the sample is unbiased from the selector's perspective. Thirders could argue she must answer from the selector's perspective instead of her own, the main reason being that she is undergoing potential memory wipes so her perspective is somewhat "compromised". However, with this explanation thirders must acknowledge the perspective disagreement between Beauty and the selector about whether or not the 9 rooms are biased. It also relies on perspective reasoning. In other words, perspective disagreement is not unique to halfers and should not be treated as a weakness.)

It is also easy to see why Beauty should not estimate R the same way as the selector does. There are about 260 billion distinct combinations of 9 rooms out of 81. The selector has an equal chance to see any one of those 260 billion combinations. Beauty, on the other hand, could only possibly see a subset of the combinations. If a combination does not contain a red room, Beauty would never see it. Furthermore, the more red rooms a combination contains, the more awakenings it has, leading to a greater chance for Beauty to select that combination. Therefore, while the same 9 rooms are an unbiased sample for the selector, they are a sample biased towards red for Beauty.
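
The count of combinations is easy to check. The sketch below also shows, for a purely hypothetical R of 27, what share of the selector's possible samples contain no red room at all and so could never be seen by Beauty.

```python
from math import comb

ROOMS, SAMPLE = 81, 9
n_combinations = comb(ROOMS, SAMPLE)
print(f"{n_combinations:,} possible 9-room combinations")

# Hypothetical illustration: if 27 rooms were red, this many combinations
# contain no red room. The selector can draw them; Beauty, who always
# starts from a red room, never can.
HYPOTHETICAL_R = 27
no_red = comb(ROOMS - HYPOTHETICAL_R, SAMPLE)
print(f"share without any red room: {no_red / n_combinations:.4f}")
```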

(EDIT Aug 1. We can show this another way. Let the selector, halfer Beauty and thirder Beauty do a large number of repeated estimations on the same set of rooms. The selector's and halfer's estimates would be concentrated around the true value of R, whereas the thirder's answers would be concentrated on some larger value.)
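
Instead of simulating many repetitions, the long-run averages can be computed exactly. This is a sketch under stated assumptions: the true R is fixed (27 is just an example value), each estimator uses the sample-proportion rule from the post, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

N = 81

def mean_estimate(population, reds, draws, estimate):
    """Exact expectation of estimate(k), where k is the number of reds
    seen when `draws` rooms are opened at random (hypergeometric)."""
    total = comb(population, draws)
    return sum(
        Fraction(comb(reds, k) * comb(population - reds, draws - k), total)
        * estimate(k)
        for k in range(draws + 1)
    )

def expected_estimates(true_r):
    # Selector: 9 random rooms out of all 81.
    selector = mean_estimate(N, true_r, 9, lambda k: Fraction(k, 9) * N)
    # Beauty wakes in a red room, then opens 8 of the other 80 rooms,
    # which contain true_r - 1 reds.
    halfer = mean_estimate(N - 1, true_r - 1, 8,
                           lambda k: Fraction(k, 8) * (N - 1) + 1)
    thirder = mean_estimate(N - 1, true_r - 1, 8,
                            lambda k: Fraction(k + 1, 9) * N)
    return selector, halfer, thirder

selector, halfer, thirder = expected_estimates(27)
print(selector, halfer, thirder)
```

With these rules the thirder's average works out to 0.9R + 8.1, which exceeds R whenever R is below 81.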

One might want to argue that after the selector learns a Beauty has knowledge of the same 9 rooms, he should lower his estimate of R to match Beauty's. After all, Beauty could only know combinations in a subset biased towards red. The selector should also reason that his sample is biased towards red. This argument is especially tempting for SSA supporters, since if true it means their answer also yields no disagreements. Sadly this notion is wrong; the selector ought to keep his initial estimate. To the selector, a Beauty knowing the same 9 rooms simply means that after waking up in one of the red rooms in his sample, Beauty made a particular set of random choices coinciding with said sample. It offers him no new information about the other rooms. This point can be made clearer if we look at how people reach an agreement in an ordinary problem, which will be shown by another thought experiment in the next part.

Part III can be found here.

42 comments

Comments sorted by top scores.

comment by cousin_it · 2017-07-20T12:03:02.931Z · LW(p) · GW(p)

(I'm not sure this is right anymore, deleted.)

Replies from: entirelyuseless, Xianda_GAO_duplicate0.5321505782395719
comment by entirelyuseless · 2017-07-20T13:40:57.141Z · LW(p) · GW(p)

I think the OP's argument was that "I am currently awake rather than asleep. So there are likely a lot of red rooms," is analogous to "I currently exist rather than not existing. So there are likely a lot of existing people." The first argument is obviously stupid; so the second argument is probably stupid as well. That seems reasonable to me.

Replies from: scarcegreengrass
comment by scarcegreengrass · 2017-07-25T15:20:29.632Z · LW(p) · GW(p)

I'm not an expert, so i might be misunderstanding, but let me try to come up with a rebuttal.

'Obviously' is a strong word here. I think "I am currently awake rather than asleep. So there are likely a lot of red rooms" is pretty intuitive under these rules. After all, red rooms cause wakefulness.

Here's how i look at it: Imagine there is a city where all the hotels have 81 rooms (and exactly the same prices). Some hotels are almost full and some are almost empty. A travel agency books you a random room, distributed such that you are equally likely to be assigned any vacant room in the city. You are more likely to be assigned a room in one of the almost-empty hotels than in one of the almost-full hotels.

(The statement about existing people is more complicated.)

comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-20T14:05:15.690Z · LW(p) · GW(p)

What you said is correct. I'm arguing that because the 8 rooms are obviously an unbiased sample, the 9 rooms cannot be. This means Beauty cannot treat her own room as a randomly selected room from all rooms, as SIA suggests. The entire thought experiment is a reductio against thirders in the sleeping beauty problem. It also argues that Beauty and the selector, even when free to communicate and having identical information, would still disagree on the estimate of R.

Replies from: cousin_it
comment by cousin_it · 2017-07-20T14:33:07.045Z · LW(p) · GW(p)

Can you explain exactly what the supernatural powers are? For simplicity let's assume three rooms, with the number of red rooms uniformly distributed between 1 and 3. After waking up in a red room, and just before opening a random other room, a thirder expects it to be red with probability 2/3. You seem to say that it can't be right, but I can't tell if you consider it too low or too high, and why?

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-20T15:14:47.585Z · LW(p) · GW(p)

Imagine a bag of red and blue beans. You are about to take a random sample by blindly taking a handful out of the bag. All else being equal, one should expect the fraction of red beans in your hand to be the same as their fraction in the bag. Now someone comes along and says that, based on his calculation, you are most likely to have a lower fraction of red beans in your hand than in the bag. He is telling you this even before you decide where in the bag you are going to grab. He either (a) has supernatural predictive power or (b) is wrong in his reasoning.

I think it is safe to say he is wrong in his reasoning.

Replies from: cousin_it
comment by cousin_it · 2017-07-20T15:48:13.587Z · LW(p) · GW(p)

In the example I outlined with three rooms, can you give numerical values for "the fraction in the hand" and "the fraction in the bag"?

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-21T02:16:42.034Z · LW(p) · GW(p)

Sure. Although I want to point out the estimation would be very rough. That is just the nature of statistics with very small sample size.

The "beans in the hand" would be the random other room you open. The "beans in the bag" would be the two other rooms.

Let's say you open another room and find it red. If I'm correct, as a thirder you would give R=3 a probability of 3/4 and R=2 a probability of 1/4. This can be explained by SIA: this is randomly selecting 2 rooms out of the 3 and finding they are both red. There are 3 ways for it to happen if R=3 and 1 way for it to happen if R=2. This is the Bayesian analysis.

A statistical estimate of R is also quite easy. As SIA states, you have 2 randomly selected rooms out of 3. Both of the rooms are red. Simple statistics suggests the unbiased estimate would be that all 3 rooms are red, i.e. R=3. You can also estimate only the number of reds in the other 2 rooms. You have 1 random room chosen out of the 2 and it is red. Simple statistics dictates a fair estimate would be that both rooms are red. In this case we have no problem.

Now suppose you open the other room and it's blue. Now as a thirder your probabilities for R=1 and R=2 are both 1/2. This is explained by SIA: the two randomly selected rooms contain 1 red and 1 blue. There are two possible ways for it to happen for R=1 and two ways for R=2 as well. This is the Bayesian analysis.

Now if we apply a statistical analysis, as SIA dictates, 50% of the rooms in the 2 randomly selected rooms are red. Therefore an unbiased estimate would be R=1.5, which means you are estimating there are 0.5 reds in the other 2 rooms. However, a simple random sample of size 1 is available for those 2 rooms. In that sample (which is just 1 room) there are no reds. So by committing to SIA one has to suggest the simple random sample is not representative of the population it is drawn from, i.e. red is under-represented and the sample is biased towards blue. To make matters worse, unless the rooms you open are all red (as in the first case), you would always conclude the room(s) you randomly chose are biased towards blue. Hence the predictive power: you are more likely to choose a random sample that contains more blue than the population average suggests, even before you decide which rooms you want to open.
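
The thirder's Bayesian numbers in the three-room case can be reproduced with a small sketch. It assumes R is uniform on 1 to 3 and applies SIA by weighting each R by its number of awakenings; `thirder_posterior` is a hypothetical helper name.

```python
from fractions import Fraction

ROOMS = 3

def thirder_posterior(other_room_red=None):
    """Posterior over R for a thirder in the 3-room case.

    other_room_red: None before opening a door, True/False after
    seeing the colour of one randomly chosen other room.
    """
    weights = {}
    for r in range(1, ROOMS + 1):
        w = Fraction(r)                     # SIA: weight by awakenings
        p_red = Fraction(r - 1, ROOMS - 1)  # chance a random other room is red
        if other_room_red is True:
            w *= p_red
        elif other_room_red is False:
            w *= 1 - p_red
        weights[r] = w
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

before_opening = thirder_posterior()
p_next_red = sum(p * Fraction(r - 1, ROOMS - 1)
                 for r, p in before_opening.items())
print(p_next_red, thirder_posterior(True), thirder_posterior(False))
```

The value `p_next_red` is the thirder's probability, before any door is opened, that a randomly chosen other room is red.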

Replies from: cousin_it
comment by cousin_it · 2017-07-21T11:32:55.861Z · LW(p) · GW(p)

as SIA dictates 50% of the rooms in the 2 randomly selected rooms are red

That seems to be our main disagreement. By my calculation, upon waking up, a thirder believes that number to be 66.6%. I'm not sure how you get 50% from SIA.

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-21T19:02:10.259Z · LW(p) · GW(p)

Maybe it is my English. In this case, you wake up in a red room, open another room and find it to be blue. As SIA states, you should treat both rooms as if they were randomly selected from all rooms. So in the 2 randomly selected rooms 1 is red and 1 is blue. Hence 50%.

Replies from: cousin_it, cousin_it
comment by cousin_it · 2017-07-21T19:39:21.583Z · LW(p) · GW(p)

It seems like you're changing the definition of "fraction in the hand" to also include the room you woke up in, but keep the definition of "fraction in the bag" without that room. So now the "hand" contains a bean that didn't come from the "bag". That ain't gonna work.

Maybe let's stick to your old definitions:

The "beans in the hand" would be the random other room you open. The "beans in the bag" would be the two other rooms.

You said there was a difference between "fraction in the hand" and "fraction in the bag", which was predictable before you grab. But to a thirder, before you grab, the expected values of both fractions are 2/3. Can you explain what difference you saw?

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-21T22:46:16.272Z · LW(p) · GW(p)

Ok, let's slow down. First of all there are two types of analysis going on. One is Bayesian analysis, which you are focusing on. The other is simple statistics, which I am saying thirders and SIA have trouble with.

Suppose there are 100 rooms, each either red or blue. You randomly open 10 of them and see 8 red and 2 blue. Here you can start a Bayesian analysis (with a uniform prior, obviously) and construct the pdf. I'm going to skip the calculation and just point out that R=80 would have the highest probability. Now instead of going the Bayesian way you can also just use simple statistics. You have a simple random sample of size 10 with 8 reds. So the unbiased estimate should be 8/10 × 100 = 80. You have applied two different ways of reasoning but got the same result, which is unsurprising since you used a uniform prior in the Bayesian analysis. So far I hope everything is clear.
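
For anyone who wants to see the skipped calculation, here is a minimal sketch. It assumes a uniform prior over every possible count of red rooms from 0 to 100, and `posterior_mode` is a hypothetical helper name.

```python
from math import comb

def posterior_mode(total_rooms, sample_red, sample_blue):
    """Most probable number of red rooms under a uniform prior,
    given a simple random sample with the stated colours."""
    def weight(r):
        # Number of ways to draw this sample if r rooms are red.
        return comb(r, sample_red) * comb(total_rooms - r, sample_blue)
    return max(range(total_rooms + 1), key=weight)

bayes_mode = posterior_mode(100, 8, 2)
simple_estimate = 8 / 10 * 100
print(bayes_mode, simple_estimate)
```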

Now let's consider SIA. It tells you how to interpret the fact that your own room is red. It says you should treat your own room as randomly selected from all rooms and it happens to be red, which is new information. Now if you open another room, then both rooms are randomly selected from all rooms. The thirder's Bayesian reasoning is consistent with this idea, as shown by the calculation in my last reply.

Now apply SIA to statistics. Because it treats both rooms as randomly selected, they form a simple random sample, which is an unbiased sample. I am not supporting that; all I'm saying is that's what SIA suggests. The population for this sample is all the rooms (including your own). Using statistics you can give an estimate of R the same way we gave the estimate of 80 before. Let's call it E1. If thirders think SIA is valid they should stand by this estimate.

But you know you randomly selected a room (from the other 2 rooms), which is a simple random sample of the other 2 rooms. If it helps, the room(s) you randomly selected are "the beans in hand", and all other rooms are "the beans in the bag". Surely you should expect their fractions of red to be about equal, right? Well, as I calculated in my last reply, if you stand by the above estimate E1, then you would always conclude the rest of the rooms have a higher fraction of red, unless all the rooms you randomly opened are red of course. Basically you are already concluding the sample is biased towards blue before the selection is made. Or if you prefer, before you grab you already know you are going to say the hand has a lower fraction of red than the bag does.

In essence you cannot take an unbiased sample and divide it into two parts, claiming one part is biased towards red while the other part is unbiased. The other part must be biased in the opposite direction, i.e. towards blue.

I hope you now see that the probability of 2/3 you calculated is not relevant. It is a probability calculated using Bayesian analysis, not a sample's property or the sample's "fraction" used in statistics. For what it is worth, yes, I agree with your calculation. It is the correct number a thirder would give.

Replies from: cousin_it
comment by cousin_it · 2017-07-22T05:31:41.486Z · LW(p) · GW(p)

If Bayes + SIA gives a consistent answer, while "simple statistics" + SIA gives a contradiction, it looks like "simple statistics" is at fault, not SIA.

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-22T14:09:37.169Z · LW(p) · GW(p)

Both claims are very bold, and both are unsubstantiated.

First of all, SIA in Bayesian analysis is up for debate. That's the whole point of the halfer/thirder disagreement. A "consistent" reasoning is not necessarily correct. Halfers are also consistent.

Second of all, the statistics involved are as basic as it gets. You are saying that with a simple random sample of 9 rooms with 3 reds, it is wrong to estimate that a third of the population is red. Yet no argument is given.

Also please take no offence, but I am not going to continue this discussion. All I have been doing is explaining the same points again and again, while the replies I get are short and effortless. I feel this is no longer productive.

Replies from: cousin_it
comment by cousin_it · 2017-07-22T14:24:12.668Z · LW(p) · GW(p)

My replies to you are short, but they weren't simple to write. Each of them took at least 30 minutes of work, condensing the issues in the most clear way. Apologies if that didn't come across. Maybe a longer explanation would help? Here goes:

In the latest reply I tried to hint that many people use "simple statistics" in a way that disagrees with Bayes, and usually they turn out to be wrong in the end. One example is the boy or girl puzzle, which Eliezer mentioned here. Monty Hall variations are another well known example, they lead to many plausible-sounding frequentist intuitions, which are wrong while Bayes is reliably right. After you've faced enough such puzzles, you learn how to respond. Someone tells me, hey, look at this frequentist argument, it gives a weird result! And I reply, sorry, but if you can't capture the weirdness in a Bayesian way, then no sale. If your ad hoc tools are correct, they should translate to the Bayes language easily. If translating is harder than you thought, you should get worried, not confident.

To put it another way, you've been talking about supernatural predictive power. But if it looks supernatural only to non-Bayesians, while Bayesians see nothing wrong, it must be very supernatural indeed! The best way to make sure it's not an illusion is to try explaining the supernaturalness to a Bayesian. That's what I've been asking you to do.

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-22T15:46:30.228Z · LW(p) · GW(p)

In both the boy or girl puzzle and the Monty Hall problem the main point is "how" the new information is obtained. Is the mathematician randomly picking a child and mentioning its gender, or is he purposely checking for a boy among his children? Does the host know what's behind the door and always reveal a goat, or does he simply open a door at random and it turns out to be a goat? Or in statistical terms: how is the sample drawn? Once that is clear, Bayesian analysis and statistics give the same result. Of course if one starts from a wrong assumption about the sampling process, the conclusion would be wrong. No argument there.

But SIA itself is a statement regarding how the sample is drawn. Why must we check its merit only with Bayesian analysis and not with statistics? And if you are certain the statistical reasoning is wrong, then instead of pointing to different probability puzzles why not point out the mistake?

With all these posts you haven't even mentioned whether you believe the thirder should estimate R=27 or not. While I have been explicit about my positions and have dissected my arguments step by step, I feel you are being very vague about yours. This puts me in a harder and more laborious position to counter-argue. That's why I feel this discussion is no longer about the sleeping beauty problem but more about who's right and who's better at arguing. That's not productive, and I am leaving it.

Replies from: cousin_it
comment by cousin_it · 2017-07-22T23:36:13.262Z · LW(p) · GW(p)

With all these posts you haven't even mentioned whether you believe the thirder should estimate R=27 or not.

If by "estimate" you mean "highest credence", the short answer is that Bayesians usually don't use such tools (maximum likelihood, unbiased estimates, etc.) They use plain old expected values instead.

After waking up in a red room and then opening 2 red and 6 blue rooms, a Bayesian thirder will believe the expected value of R to be 321/11, which is a bit over 29. I calculated it directly and then checked with a numerical simulation.

It's easy to explain why the expected value isn't 27 (proportional to the fraction of red in the sample). Consider the case where all 9 rooms seen are red. Should a Bayesian then believe that the expected value of R is 81? No way! That would imply believing R=81 with probability 100%, because any nonzero credence for R<81 would lead to lower expected value. That's way overconfident after seeing only 9 rooms, so the right expected value must be lower. You can try calculating it, it's a nice exercise.
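
A minimal sketch of the direct calculation, assuming a uniform prior on R over 1 to 81 and the SIA treatment in which all 9 seen rooms count as one random sample. This is not the original code behind the comment; `thirder_expected_r` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

N = 81

def thirder_expected_r(red_seen, blue_seen):
    """Posterior mean of R when all seen rooms (own room included)
    are treated as a random sample from the 81 rooms."""
    weights = {r: comb(r, red_seen) * comb(N - r, blue_seen)
               for r in range(1, N + 1)}
    numerator = sum(r * w for r, w in weights.items())
    return Fraction(numerator, sum(weights.values()))

expected = thirder_expected_r(3, 6)
print(expected, float(expected))
```

Calling `thirder_expected_r(9, 0)` handles the all-red case from the exercise.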

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-24T22:29:30.660Z · LW(p) · GW(p)

Appreciate the effort, especially the calculation part. I am no expert on coding, but from my limited knowledge of Python the calculation looks correct to me. I want to point out that for the direct calculation, a formulation like this: (sum (r * ((r) choose 3) * ((81-r) choose 6)), r=3 to 75) / (sum (((r) choose 3) * ((81-r) choose 6)), r=3 to 75) gives the same answer. I would say it reflects SIA reasoning more and resembles your code better as well. Basically it shows that under SIA Beauty should treat her own room the same way as the other 8 rooms.

The part explaining the relationship between expected value and unbiased estimation (maximum likelihood) is obviously correct, though I wouldn't say it is relevant to the argument.

You claim Bayesians don't usually use maximum likelihood or unbiased estimates. I would say that is a mistake. They are important in decision making. However "usually" is a subjective term, and arguing about how often counts as "usual" is pointless. The bottom line is they are valid questions to ask and Bayesians should have an answer. How should thirders answer them? That is the question.

Replies from: cousin_it
comment by cousin_it · 2017-07-25T08:52:29.302Z · LW(p) · GW(p)

Mathematically, maximum likelihood and unbiased estimates are well defined, but Bayesians don't expect them to always agree with intuition.

For example, imagine you have a coin whose parameter is known to be between 1/3 and 2/3. After seeing one tails, an unbiased estimate of the coin's parameter is 0 (lower than all possible parameter values) and the maximum likelihood estimate is 1/3 (jumping to extremes after seeing a tiny bit of information). Bayesian expected values don't have such problems.
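
The three numbers can be compared in a short sketch. Two things here are assumptions not stated in the comment: the "parameter" is taken to be the probability of heads, and the Bayesian prior is taken to be uniform on [1/3, 2/3].

```python
from fractions import Fraction

LO, HI = Fraction(1, 3), Fraction(2, 3)

def integral(antiderivative):
    # Definite integral over [LO, HI] from a known antiderivative.
    return antiderivative(HI) - antiderivative(LO)

# After one tails the likelihood of parameter p is (1 - p).
evidence = integral(lambda p: p - p**2 / 2)              # integral of (1 - p)
first_moment = integral(lambda p: p**2 / 2 - p**3 / 3)   # integral of p(1 - p)
posterior_mean = first_moment / evidence

unbiased_estimate = Fraction(0)  # observed fraction of heads in one toss
max_likelihood = LO              # (1 - p) is largest at the lower bound

print(unbiased_estimate, max_likelihood, posterior_mean)
```

Under these assumptions the posterior mean stays inside the allowed interval, while the unbiased estimate falls outside it and the maximum likelihood estimate sits on its edge.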

You can stop kicking the sand castle of frequentism+SIA, it never had strong defenders anyway. Bayes+SIA is the strong inconvenient position you should engage with.

Replies from: Lumifer, Xianda_GAO_duplicate0.5321505782395719, IlyaShpitser
comment by Lumifer · 2017-07-25T15:28:00.484Z · LW(p) · GW(p)

Bayesian expected values don't have such problems.

That's an unfair comparison, since you assume a good prior. Screw up the prior and Bayes can be made to look as silly as you like.

Doing frequentist estimation on the basis of one data point is stupid, of course.

comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-25T14:03:24.739Z · LW(p) · GW(p)

Maximum likelihood is indeed 0 or Tails, assuming we start from a uniform prior. 1/3 is the expected value. Ask yourself this, after seeing a tail what should you guess for the next toss result to have maximum likelihood of being correct?

If halfer reasoning applies to both Bayesian and frequentist analysis while SIA is only good in Bayesian analysis, isn't that quite alarming, to say the least?

Replies from: cousin_it
comment by cousin_it · 2017-07-25T14:31:28.436Z · LW(p) · GW(p)

The 0 isn't a prediction of the next coin toss, it's an unbiased estimate of the coin parameter which is guaranteed to lie between 1/3 and 2/3. That's the problem! Depending on the randomness in the sample, an unbiased estimate of unknown parameter X could be smaller or larger than literally all possible values of X. Since in the post you use unbiased estimates and expect them to behave reasonably, I thought this example would be relevant.

Hopefully that makes it clearer why Bayesians wouldn't agree that frequentism+halfism is coherent. They think frequentism is incoherent enough on its own :-)

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-26T01:25:43.621Z · LW(p) · GW(p)

OK, I misunderstood. I interpreted it as the coin being biased 1/3 to 2/3 but we don't know which side it favours. If we start from a uniform prior (1/2 to H and 1/2 to T), then the maximum likelihood is Tails.

Unless I misunderstood again, you mean there is a coin whose natural chance we want to guess (forgive me if I'm misusing terms here). We do know its chance is bounded between 1/3 and 2/3. In this case yes, the statistical estimate is 0 while the maximum likelihood is 1/3. However it is obviously due to the use of an informed prior (that we know it is between 1/3 and 2/3). Hardly a surprise.

Also I want to point out that in your previous reply you said SIA+frequentism never had any strong defenders. That is not true. Until now in the literature thirding has generally been considered a better fit for frequentism than halving, because in the long run Tails awakenings are twice as frequent as Heads awakenings. Such arguments are used by published academics including Elga. Therefore I would consider that my attack from the frequentist angle has some value.

Replies from: cousin_it
comment by cousin_it · 2017-07-26T08:42:42.831Z · LW(p) · GW(p)

Interesting. I guess the right question is, if you insist on a frequentist argument, how simple can you make it? Like I said, I don't expect things like unbiased estimates to behave intuitively. Can you make the argument about long run frequencies only? That would go a long way in convincing me that you found a genuine contradiction.

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-26T16:52:26.200Z · LW(p) · GW(p)

Yes, I have given a long run frequency argument for halving in part I. Sadly that part has not gotten any attention. My entire argument is about the importance of perspective disagreement in SBP. This counterargument is actually the less important part.

comment by IlyaShpitser · 2017-07-25T11:29:37.680Z · LW(p) · GW(p)

Sorry slightly confused here, bias (although an F concept, since it relies on "true parameter value") is sort of orthogonal to B vs F.

Estimates based on either B or F techniques could be biased or unbiased.

Quoth famous Bayesian Andrew Gelman:

"I can’t keep track of what all those Bayesians are doing nowadays—unfortunately, all sorts of people are being seduced by the promises of automatic inference through the “magic of MCMC”—but I wish they would all just stop already and get back to doing statistics the way it should be done, back in the old days when a p-value stood for something, when a confidence interval meant what it said, and statistical bias was something to eliminate, not something to embrace."

(http://www.stat.columbia.edu/~gelman/research/published/badbayesmain.pdf)

Replies from: cousin_it
comment by cousin_it · 2017-07-25T12:56:52.401Z · LW(p) · GW(p)

Heh. I'm not a strong advocate of Bayesianism, but when someone says their estimator is unbiased, that doesn't fill me with trust. There are many problems where the unique unbiased estimator is ridiculous (e.g. negative with high probability when the true parameter is always positive, etc.)

Replies from: IlyaShpitser
comment by IlyaShpitser · 2017-07-25T13:07:47.286Z · LW(p) · GW(p)

Sure, unbiasedness is a weak property:

If you throw a dart either one foot to the left or one foot to the right of the bullseye, you are unbiased wrt the bullseye, but this is stupid.

Consistency is a better property.

comment by cousin_it · 2017-07-21T19:19:18.651Z · LW(p) · GW(p)

Let's say you wake up in room 1 which is red, then open room 2 which is blue, and room 3 stays unopened. Are you using {1,2} as a random sample that predicts the frequency of red in {2,3}? How on Earth is that reasonable?

comment by simon · 2017-07-21T01:28:01.337Z · LW(p) · GW(p)

I don't think it's accurate to say that thirders accept A and B. It seems to me that thirders reject A. Indeed the fact that the thirder agrees with the selector in terms of posterior indicates that they must consider the 9 rooms to be biased because they have a different prior so need to consider the sample biased to come up with the same posterior.

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-21T20:09:12.507Z · LW(p) · GW(p)

Thirder and the selector have the exact same prior and posteriors. Their bayesian analysis are exactly the same.

Think from the selector's perspective. He randomly opens 9 out of the 81 rooms and found 3 red. Say he decided to perform a bayesian analysis. As stated in the question he starts from an uniform prior and updates it with the 9 rooms as new information. I will skip the calculation but in the end he concluded R=27 has the highest probability. Now think from the thirder's perspective. As SIA states she is treating her own room as randomly selected from all the rooms. Therefore she is treating all of the 9 rooms as randomly selected from the 81 rooms. The new information she has is exactly the same as the selector. Starting from an uniform prior and updates on those new information she would get the same pdf as the selector with R=27 has the highest probability. The two of them must agree just as in the original sleeping beauty problem.

Now suppose instead of doing a bayesian analysis, the selector just want to perform statistical analysis. He wants to get a fair estimate of R. It is clear he has a simple random sample with sample size 9 and 3 of which are red. He can estimate the R of the population as 3/9x81=27. It is unsurprising he got the same number as his bayesian analysis since he started from an uniform prior. Until now, all good.

The problem, however, starts when the thirder wants to use her sample in simple statistics. If she uses SIA reasoning as she did in the Bayesian analysis, she would treat the 9 rooms as a simple random sample, the same way the selector did, and give R=27 as the unbiased estimate. By doing so she accepts the 9 rooms as an unbiased sample, which leads to the problems I discussed in the main post.

Replies from: simon
comment by simon · 2017-07-23T00:42:06.077Z · LW(p) · GW(p)

We may just be arguing over definitions.

For the priors, I would consider Beauty's expectations from the problem definition before she takes a look at anything to be a prior, i.e. she expects 81 times higher probability of R=81 than R=1 right from the start.

SIA states that you should expect to be randomly selected from the set of possible observers. That doesn't imply that you are in a position randomly selected from some other set (that holds only if observers are randomly selected from that set). Here, observers start in red rooms only, so clearly you can't expect your room to be a randomly selected colour if you believe in SIA.

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-24T20:46:28.030Z · LW(p) · GW(p)

For the priors, I would consider Beauty's expectations from the problem definition before she takes a look at anything to be a prior, i.e. she expects 81 times higher probability of R=81 than R=1 right from the start.

In the original sleeping beauty problem, what is the prior for H according to a thirder? It must be 1/2. In fact, saying she expects 2 times higher probability of T than H right from the start means she should conclude P(H)=1/3 before going to sleep on Sunday. That is used as a counterargument by halfers. Thirders argue that after waking up in the experiment, beauty should update her probability because waking up is new information. T being 2 times more likely than H is a posterior.

If you think thirders should reject A based on your interpretation of SIA, then what is a fair estimation of R according to thirders? Should they use a biased sample of 9 rooms and estimate 27, or estimate 21 and disagree with the selector having the same information?

Replies from: simon
comment by simon · 2017-07-26T01:36:34.663Z · LW(p) · GW(p)

Well argued, you've convinced me that most people would probably define what's prior and what's posterior the way you say. Nonetheless, I don't agree that what's prior and what's posterior should be defined the way you say. I see this sort of info as better thought of as a prior (precisely because waking up shouldn't be thought of as new info) [edit: clarification below]. I don't regard the mere fact that the brain instantiating the mind having this info is physically continuous with an earlier-in-time brain instantiating a mind with different info as sufficient to not make it better thought of as a prior.

Some clarification on my actual beliefs here: I'm not a conventional thirder believing in the conventional SIA. I prefer, let's call it, "instrumental epistemic rationality". I weight observers, not necessarily equally, but according to how much I care about the accuracy of the relevant beliefs of that potential observer. If I care equally about the beliefs of the different potential observers, then this reduces to SIA. But there are many circumstances where one would not care equally, e.g. one is in a simulation and another is not, or one is a Boltzmann brain and another is not.

Now, I generally think that thirdism is correct, because I think that, given the problem definition, for most purposes it's more reasonable to value the correctness of the observers equally in a sleeping beauty type problem. E.g. if Omega is going to bet with each observer, and beauty's future self collects the sum of the earnings of both observers in the case there are two of them, then 1/3 is correct. But if e.g. the first instance of the two observer case is valued at zero, or if for some bizarre reason you care equally about the average of the correctness of the observers in each universe regardless of differences in numbers, then 1/2 is correct.

Now, I'll deal with your last paragraph from my perspective. The first room isn't a sample, it's guaranteed red. If you do regard it as a sample, it's biased in the red direction (maximally) and so should have zero weight. The prior is that the probability of R is proportional to R. The other 8 rooms are an unbiased sample of the remaining rooms. The set of 9 rooms is a biased sample (biased in the red direction) such that it provides the same information as the set of 8 rooms. So use the red-biased prior and the unbiased (out of the remaining rooms after the first room is removed) 8 room sample to get the posterior estimate. This will result in the same answer the selector gets, because you can imagine the selector found a red room first and then break down the selector's information into that first sample and a second unbiased sample of 8 of the remaining rooms.
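A quick way to check that the two routes give the same answer is to compute both posteriors exactly. The sketch below assumes the numbers from the main post (81 rooms, 8 further rooms opened, 2 red and 6 blue) and a hypergeometric likelihood; the variable names are illustrative only. It also shows, for comparison, what a uniform prior combined with the same 8 room sample gives, which is how the post describes the halfer estimate.

```python
from fractions import Fraction
from math import comb

N = 81  # total rooms

def normalise(weights):
    total = sum(weights.values())
    return {r: Fraction(w, total) for r, w in weights.items()}

def eight_room_likelihood(r):
    # 8 rooms drawn from the other 80, of which r - 1 are red; 2 red and 6 blue seen.
    return comb(r - 1, 2) * comb(N - r, 6)

rooms = range(1, N + 1)
# Prior proportional to R, updated on the unbiased 8 room sample.
r_weighted = normalise({r: r * eight_room_likelihood(r) for r in rooms})
# Uniform prior, updated on a 9 room simple random sample with 3 red (the selector).
selector = normalise({r: comb(r, 3) * comb(N - r, 6) for r in rooms})
# Uniform prior, updated on the 8 room sample only (assumed halfer treatment).
uniform_eight = normalise({r: eight_room_likelihood(r) for r in rooms})

print(r_weighted == selector)                        # True
print(max(selector, key=selector.get))               # 27
print(max(uniform_eight, key=uniform_eight.get))     # 21
```

The first two posteriors are identical because R x C(R-1, 2) = 3 x C(R, 3), so the only difference between the routes is where the factor of R is booked.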

Edit: I didn't explain my concept of prior v. posterior clearly. To me, it's conceptual not time-based in nature. For a set problem like this, what someone knows from the problem definition, from the point of view of their position in the problem, is the prior. What they then observe leads to the posterior. Here, waking sleeping beauty learns nothing on waking up that she does not know from the problem definition, given that she is waking up in the problem. So her beliefs at this point are the prior. Of course, her beliefs are different from sleeping beauty before she went to sleep, due to the new info. That new info told her she is within the problem, when she wasn't before, so she updated her beliefs to new beliefs which would be a posterior belief outside the context of the problem, but within the context of the problem constitute her prior.

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-26T18:49:18.653Z · LW(p) · GW(p)

Very clear argument and many good points. Appreciate the effort.

Regarding your position on thirders vs halfers, I think it is a completely reasonable position, and I agree with the analysis of when halfers are correct and when thirders are correct. However, to me it seems to treat Sleeping Beauty as a decision-making problem rather than a probability problem. Maybe one's credence is not defined without reference to consequences. However, that seems counterintuitive to me. Naturally one should have a belief about the situation, and her decisions should depend on it as well as on her objective (how much beauty cares about other copies) and the payoff structure (does the monetary reward depend only on her own answer, on all correct answers, on the accuracy rate, etc.). If that's the case, there should exist a unique correct answer to the problem.

About how beauty should estimate R and treat the samples, I would say that's the best position for a thirder to take. In fact that's the position I would take too. If I may reword it slightly, see if you agree with this version: The 8 rooms are an unbiased sample for beauty; that is too obvious to argue otherwise. Her own room is always red, so the 9 rooms are obviously biased for her. However, from an (imaginary) selector's perspective, if he finds the same 9 rooms it is an unbiased sample. Thirders think she should answer from the selector's perspective (I think the most likely reason is that being repeatedly memory wiped makes her perspective somewhat "compromised"), therefore she would estimate R to be 27. Is this version something you would agree with?

In this version I highlighted the disagreement between the selector and beauty. The disagreement is not about some numerical value; they disagree on whether a sample is biased. In my 4 posts all I'm trying to do is argue for the validity and importance of perspective disagreement. If we recognize the existence of this disagreement and let each agent answer from her own perspective, we get another system of reasoning, different from SIA or SSA. It provides an argument for double halving, gives a framework where frequentists and Bayesians agree with each other, rejects the Doomsday Argument, disagrees with the Presumptuous Philosopher, and rejects the Simulation Argument. I genuinely think this is the explanation of the sleeping beauty problem as well as many problems related to anthropic reasoning. Sadly, only the part arguing against thirding gets some attention.

Anyway, I digress. The bottom line is, though I do not think it is the best position, I feel your argument is reasonable and well thought out. I can understand it if people want to take it as their position.

Replies from: simon
comment by simon · 2017-07-27T09:48:55.732Z · LW(p) · GW(p)

Thanks for the kind words.

However, I don't agree. The additional 8 rooms is an unbiased sample of the remaining 80 rooms for beauty. The additional 8 rooms is only an unbiased sample of the full set of 81 rooms for beauty if the first room is also an unbiased sample (but I would not consider it a sample but part of the prior).

Actually I found a better argument against your original anti-thirder argument, regardless of where the prior/posterior line is drawn:

Imagine that the selector happened to encounter a red room first, before checking out the other 8 rooms. At this point in time, the selector's state of knowledge about the rooms, regardless of what you consider prior and what posterior, is in the same position as beauty's after she wakes up. (from the thirder perspective, which I generally agree with in this case). Then they both sample 8 more rooms. The selector considers this an unbiased sample of the remaining 80 rooms. After both have taken this additional sample of 8, they again agree. Since they still agree, beauty must also consider the 8 rooms to be an unbiased sample of the remaining 80 rooms. Beauty's reasoning and the selector's are the same regarding the additional 8 rooms, and Beauty has no more "supernatural predicting power" than the selector.

About only thirding getting the attention: my apologies for contributing to this asymmetry. For me, the issue is, I found the perspectivism posts at least initially hard to understand, and since subjectively I feel I already know the correct way to handle this sort of problem, that reduces my motivation to persevere and figure out what you are saying. I'll try to get around to carefully reading them and providing some response eventually (no time right now).

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-27T16:58:15.898Z · LW(p) · GW(p)

OK, I should have used my words more carefully. We meant the same thing. When I say beauty thinks the 8 rooms are an unbiased sample, I mean what I listed as C: it is an unbiased sample of the other 80 rooms. So yes to what you said, sorry for the confusion. It is obvious because it is a simple random sample chosen from the 80 rooms. So on that part there is no disagreement. The disagreement between the two is about whether or not the 9 rooms are an unbiased sample. Beauty as a thirder should not think it is unbiased, but bases her estimate on it anyway to answer the question from the selector's perspective. If she does not answer from the selector's perspective, she would use the 8 rooms to estimate the reds in the other 80 rooms and then add her own room in, as halfers do.

Regarding the selector choosing a room and finding out it is red: again, they agree on whether or not the 8 rooms are unbiased. However, because the first room is always red for beauty but not for the selector, they see the 9 rooms differently. From beauty's perspective, dividing the 9 rooms into 2 parts gives her an unbiased sample (8 rooms) and a red room. It is not so for the selector. We can list the three points from the selector's perspective, and it poses no problem at all.

A: the 9 rooms is an unbiased sample for 81 rooms

B: the first room is randomly selected from all rooms

C: the other 8 rooms is an unbiased sample for other 80 rooms.

Alternatively, we can divide the 9 rooms as follows:

A: the 9 rooms is an unbiased sample for 81 rooms

B: the first red room he saw (if he saw one) is always red

C: the other 8 rooms in the sample is biased towards blue

Either way there is no problem. In terms of the predicting power, think of it this way. Once the selector sees a red room, he knows that if he ignores it and only considers the other 8 rooms, then the sample is biased towards blue; nothing supernatural. However, for beauty, if she thinks the 9 rooms are unbiased, then the 8 rooms she chooses must be biased even though they are selected at random. Hence the "supernatural". It is meant to point out that for beauty the 9 and 8 rooms cannot be unbiased at the same time. Since you already acknowledged that the 9 rooms are biased (from her perspective at least), then yes, of course she does not have supernatural predicting power.

I guess the bottomline is because they acquire their information differently, the selector and thirder beauty must disagree somewhere. Either on the numerical value of estimate, or on if a sample is biased.

About the perspectivism posts. The concept is actually quite simple: each beauty only counts what she experienced/remembered. But I feel maybe I'm not doing a good job explaining it. Anyway, thank you for promising to check it out.

comment by Manfred · 2017-07-20T17:19:04.705Z · LW(p) · GW(p)

Are the rooms biased towards blue? Let's do this with balls from an urn.

Suppose there are four balls in an urn. 1, 2, 3, or 4 of them can be red; I've chosen the number (R) by a die roll. The rest are blue. We draw the balls from the urn in order.

Conditioned on the first ball being red, do we see a similar effect between the first two and last two balls?

Conditioning on the first ball being red means that it's more likely that my die roll was high - the distribution becomes proportional to R. Then if I see the second ball is blue, the probability distribution over R is proportional to R*(4-R), which is symmetrical about 2. So I expect that about one of the other balls will be red. Which means that the "other observed room" (ball 2) is systematically less red than an unknown room.
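For anyone who would rather check this than take it on faith, here is a minimal exact calculation. It assumes the die roll makes R uniform on 1 to 4, which the setup above implies but does not spell out; the function name is illustrative.

```python
from fractions import Fraction

BALLS = 4

def posterior_red_then_blue():
    """Posterior over R after drawing a red ball first and a blue ball second."""
    weights = {}
    for r in range(1, BALLS + 1):
        prior = Fraction(1, BALLS)                       # assumed uniform die roll
        p_first_red = Fraction(r, BALLS)                 # proportional to R
        p_second_blue = Fraction(BALLS - r, BALLS - 1)   # proportional to 4 - R
        weights[r] = prior * p_first_red * p_second_blue
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}

post = posterior_red_then_blue()
# Red balls expected among the two balls not yet drawn (R minus the red one seen).
expected_red_left = sum(p * (r - 1) for r, p in post.items())
print(post[1], post[2], post[3], post[4])  # 3/10 2/5 3/10 0
print(expected_red_left)                   # 1
```

So each of the two undrawn balls is red with probability 1/2, while the second ball we looked at turned out blue.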

So this effect also occurs with drawing balls out of an urn! But what does it mean in this case? Does it mean we made a mistake with our math? No - you can do the experiment yourself easily. Hmm. Let's go back to your three possible desiderata:

A: Are the first two balls an unbiased sample of the balls? No - we conditioned that the first one was red. Even if R=1, we cannot get two blues in the first two balls. That's bias.

Note that when conditioning on the fact that the first ball was red, we used Bayes' rule for conditioning. We didn't assume that someone looked into the urn and pulled out a red ball, instead there was in some sense a fair process that "just happened" to give us a red ball first. But this does not mean that the first two balls are an unbiased sample. The fairness of the hypothetical process that gave us a red ball first does not carry over in any important way once we condition on it giving us a specific result. I think this may be a tricky point.

B: The first ball is guaranteed to be red. Yup.

C: The second ball is an unbiased sample of the other three balls. Yes. Even though it's less red than an unknown ball in this example. But on the other hand, if it were red, it would be more red than an unknown ball.

Wait. I think you're not doing the math properly, which means that last statement doesn't hold in your post.

Back to the rooms and Sleeping Beauty. I think you're claiming that no matter how many red rooms S.B. sees, she expects even more, and therefore expects her surroundings to always be magically blue-biased. But if you do things correctly, then if Sleeping Beauty opens up 8 more rooms and they're all red, she doesn't jump straight to a belief that R=81. Instead, she thinks that R is on average somewhere around 75, and also she got a red-biased batch of rooms. But this only works if you keep track of these possibilities for R.
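The "around 75" figure can be reproduced with a short calculation. This is a sketch under one specific reading: a uniform prior on R from 1 to 81, with Beauty's own room and the 8 opened rooms all treated as a 9 room sample that came up entirely red. Other ways of setting up the prior would shift the number slightly.

```python
from fractions import Fraction
from math import comb

N = 81          # total rooms
ALL_RED = 9     # Beauty's room plus the 8 rooms she opened, all red

# Likelihood of an all-red sample of 9 is proportional to C(R, 9).
weights = {r: comb(r, ALL_RED) for r in range(1, N + 1)}
total = sum(weights.values())

posterior_mean = Fraction(sum(r * w for r, w in weights.items()), total)
most_likely = max(weights, key=weights.get)

print(most_likely)            # 81
print(float(posterior_mean))  # about 74.45
```

The single most likely value is still R=81, but the average over all the possibilities for R sits noticeably lower.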

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-20T23:53:09.629Z · LW(p) · GW(p)

Very clear argument, thank you for the reply.

The question is whether, if we do not use Bayesian reasoning and just use statistical analysis, we can still get an unbiased estimate. The answer is of course yes. Using a fair sample to estimate a population is as standard as it gets. The main argument is of course over what the fair sample is. Depending on the answer we get an estimate of R=21 or R=27 respectively.

SIA states we should treat beauty's own room as randomly selected from all rooms. Applying this idea in Bayesian analysis is how we get thirdism. To oversimplify: we reason as if some selector randomly chose a day and found beauty awake, which in itself is a coincidence. However, there is no reason for SIA to apply only to Bayesian analysis and not to statistical analysis. If we use SIA reasoning in statistical analysis, treating her own room as randomly selected from all 81 rooms, then the 9 rooms are all part of a simple random sample, which by definition is unbiased. There is no Bayes' rule or conditioning involved, because here we are not treating it as a probability problem. Beauty's own red room is just a coincidence, as in the Bayesian analysis; it suggests a larger number of reds the same way the other 2 red rooms do.

If one wants to argue those 9 rooms are biased, why not use the same logic in a Bayesian analysis? Borrowing cousin_it's example: suppose there are 3 rooms, with the number of red rooms uniformly distributed between 1 and 3. If beauty wakes up, opens another door, and sees another red room, what should her credence in R=3 be? If I'm not mistaken, thirders will say 3/4, because when 2 rooms are randomly selected out of 3 and both are red, there are 3 ways for that to happen with R=3 and 1 way with R=2. Here thirders are treating her own room the same way as the second room, and the two rooms are thought to be randomly selected, aka unbiased. If one argues the 2 rooms are biased towards red because her own room is red, then the calculation above is no longer valid.
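To make the 3 room numbers concrete, here is a small sketch comparing the two treatments of her own room. It assumes a uniform prior on R from 1 to 3; the function names are made up for this illustration, and the second treatment is simply what results if her own room is taken as guaranteed red rather than sampled.

```python
from fractions import Fraction
from math import comb

ROOMS = 3

def credence_all_red(likelihood):
    """Posterior probability of R=3 under a uniform prior on 1..3 (assumed)."""
    weights = {r: Fraction(1, ROOMS) * likelihood(r) for r in range(1, ROOMS + 1)}
    return weights[ROOMS] / sum(weights.values())

def both_rooms_random(r):
    # Both rooms treated as a simple random sample of 2 from 3, both red.
    return Fraction(comb(r, 2), comb(ROOMS, 2))

def own_room_given(r):
    # Own room taken as guaranteed red; the second room drawn from the other 2.
    return Fraction(r - 1, ROOMS - 1)

print(credence_all_red(both_rooms_random))  # 3/4
print(credence_all_red(own_room_given))     # 2/3
```

The 3/4 figure only appears when her own room is counted as part of the random sample.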

Even if one takes the unlikely position that SIA is applicable only in Bayesian but not statistical analysis, there are still strange consequences. I might be mistaken, but in problems of simple sampling, in general, ignoring some round-off errors, the statistical estimate would also be the case with the highest probability in a Bayesian analysis with a uniform prior. By using SIA in a Bayesian analysis, we get R=27 as the most likely case. However, statistics gives an estimate of R=21. This difference cannot be easily explained.

To answer the last part of your statement: if beauty randomly opens 8 doors and finds them all red, then she has a sample of pure red. By simple statistics she should give R=81 as the estimate. Halfers and thirders would both agree on that. If they do a Bayesian analysis, R=81 would also be the case with the highest probability. I'm not sure where 75 comes from; I'm assuming it comes from summing the products of the probabilities and the Rs in the Bayesian analysis? But that value does not correspond to the estimate in statistics. Imagine you randomly draw 20 beans from a bag and they are all red; using statistics, obviously you are not going to estimate that the bag contains 90% red beans.

Replies from: Manfred
comment by Manfred · 2017-07-28T19:46:47.861Z · LW(p) · GW(p)

Sorry for the slow reply.

The 8 rooms are definitely the unbiased sample (of your rooms with one red room subtracted).

I think you are making two mistakes:

First, I think you're too focused on the nice properties of an unbiased sample. You can take an unbiased sample all you want, but if we know information in addition to the sample, our best estimate might not be the average of the sample! Suppose we have two urns, urn A has 10 red balls and 10 blue balls, while urn B has 5 red balls and 15 blue balls. We choose an urn by rolling a die, such that we have a 5/6 chance of choosing urn A and a 1/6 chance of choosing urn B. Then we take a fair, unbiased sample of 4 balls from whatever urn we chose. Suppose we draw out 1 red ball and 3 blue balls. Since this is an unbiased sample, does the process that you are calling "statistical analysis" have to estimate that we were drawing from urn B?
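The two urn example is easy to work through exactly. The sketch below uses only the numbers given above (urn contents, the 5/6 and 1/6 chances, and a sample of 1 red and 3 blue drawn without replacement); the helper name is illustrative.

```python
from fractions import Fraction
from math import comb

URNS = {
    # name: (red balls, blue balls, prior probability of choosing this urn)
    "A": (10, 10, Fraction(5, 6)),
    "B": (5, 15, Fraction(1, 6)),
}
RED_DRAWN, BLUE_DRAWN = 1, 3

def likelihood(red, blue):
    """Hypergeometric probability of the observed sample from one urn."""
    drawn = RED_DRAWN + BLUE_DRAWN
    return Fraction(comb(red, RED_DRAWN) * comb(blue, BLUE_DRAWN), comb(red + blue, drawn))

joint = {name: prior * likelihood(red, blue) for name, (red, blue, prior) in URNS.items()}
total = sum(joint.values())
posterior = {name: p / total for name, p in joint.items()}

print(likelihood(10, 10) < likelihood(5, 15))  # True: the sample alone fits urn B better
print(posterior["A"])                          # 240/331
```

The sample proportion matches urn B, yet urn A remains the more probable source once the die roll is taken into account.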

Second, you are trying too hard to make everything about the rooms. It's like someone was doing the problem with two urns from the previous paragraph, but tried to mathematically arrive at the answer only as a function of the number of red balls drawn, without making any reference to the process that causes them to draw from urn A vs. urn B. And they come up with several different ideas about what the function could be, and they call those functions "the Two-Thirds-B-er method" and "the Four-Tenths-B-er method." When really, both methods are incomplete because they fail to take into account what we know about how we picked the urn to draw from.

To answer the last part of your statement: if beauty randomly opens 8 doors and finds them all red, then she has a sample of pure red. By simple statistics she should give R=81 as the estimate. Halfers and thirders would both agree on that. If they do a Bayesian analysis, R=81 would also be the case with the highest probability. I'm not sure where 75 comes from; I'm assuming it comes from summing the products of the probabilities and the Rs in the Bayesian analysis? But that value does not correspond to the estimate in statistics. Imagine you randomly draw 20 beans from a bag and they are all red; using statistics, obviously you are not going to estimate that the bag contains 90% red beans.

Think of it like this: if Beauty opens 8 doors and they're all red, and then she goes to open a ninth door, how likely should she think it is to be red? 100%, or something smaller than 100%? For predictions, we use the average of a probability distribution, not just its highest point.
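As a rough illustration of the ninth door question, the predictive probability can be computed by averaging over the posterior for R. The sketch assumes a uniform prior on R from 1 to 81 and shows two ways of treating Beauty's own room (as part of a 9 room random sample, or as guaranteed red with only the 8 opened rooms sampled); both labels and function names are illustrative.

```python
from fractions import Fraction
from math import comb

N = 81
SEEN_RED = 9  # Beauty's own room plus 8 opened rooms, all red

def prob_next_red(likelihood):
    """Average chance that one more unopened room is red, over the posterior for R."""
    weights = {r: likelihood(r) for r in range(1, N + 1)}
    total = sum(weights.values())
    # N - 9 rooms remain unopened, of which r - 9 are red.
    return sum(
        Fraction(w, total) * Fraction(r - SEEN_RED, N - SEEN_RED)
        for r, w in weights.items()
    )

def nine_room_sample(r):
    return comb(r, 9)        # all 9 rooms treated as randomly sampled

def own_room_fixed(r):
    return comb(r - 1, 8)    # own room given as red, 8 sampled from the other 80

print(prob_next_red(nine_room_sample))  # 10/11
print(prob_next_red(own_room_fixed))    # 9/10
```

Either way the answer is below 100%, even though R=81 is the single most likely value in both cases.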

Replies from: Xianda_GAO_duplicate0.5321505782395719
comment by Xianda_GAO_duplicate0.5321505782395719 · 2017-07-29T15:30:04.713Z · LW(p) · GW(p)

No problem, always good to have a discussion with someone serious about the subject matter.

First of all, you are right: statistical estimation and expected value in Bayesian analysis are different. But that is not what I'm saying. What I'm saying is that in a Bayesian analysis with an uninformed (uniform) prior, the case with the highest probability should be the unbiased statistical estimate (it is not always exactly so because of round-off etc.).

In the two urns example, I think what you meant is that using the sample of 4 balls, a fair estimate would be 5 reds and 15 blues, as in urn B, but Bayesian analysis would give A as more likely? However, this disagreement is due to the use of an informed prior: you already know we are more likely to draw from A right from the beginning. Without knowing this, Bayesian analysis would give B as the most likely case, the same as the statistical estimate.

Think of it like this: if Beauty opens 8 doors and they're all red, and then she goes to open a ninth door, how likely should she think it is to be red? 100%, or something smaller than 100%? For predictions, we use the average of a probability distribution, not just its highest point.

Definitely something smaller than 100%. Just because beauty thinks R=81 is the most likely case doesn't mean she thinks it is the only case. But that is not what the estimation is about. Maybe this question would be more relevant: if, after opening 8 doors and finding them all red, beauty has to guess R, what number should she guess (to be most likely correct)?

comment by cousin_it · 2017-07-20T11:54:49.724Z · LW(p) · GW(p)

This supernatural predicting power is a strong evidence against SIA and thirding.

I don't see why.

Let's simplify the problem, remove all the anthropics and amnesia. I paint two balls red or blue randomly. If none are red, the experiment is over. If there's at least one red ball, I choose a red ball randomly and give it to you. Now the other ball is 2x more likely to be blue than red. Is that also supernatural to you?
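The 2x figure can be checked by enumerating the four ways of painting the balls. This sketch assumes each ball is independently red or blue with probability 1/2, which is one natural reading of "randomly"; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def other_ball_distribution():
    """Colour of the ball you were NOT handed, given you were handed a red one."""
    dist = {"red": Fraction(0), "blue": Fraction(0)}
    for balls in product(["red", "blue"], repeat=2):
        p_paint = Fraction(1, 4)  # assumed: each colouring equally likely
        red_positions = [i for i, colour in enumerate(balls) if colour == "red"]
        if not red_positions:
            continue  # no red ball: the experiment is over
        for i in red_positions:
            # A red ball is chosen uniformly at random and handed over.
            dist[balls[1 - i]] += p_paint / len(red_positions)
    total = sum(dist.values())
    return {colour: p / total for colour, p in dist.items()}

dist = other_ball_distribution()
print(dist["blue"], dist["red"])  # 2/3 1/3
```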