The Ultimate Sleeping Beauty Problem

scott-garrabrant

The Ultimate Sleeping Beauty Problem

post by Scott Garrabrant · 2013-09-30T00:48:08.395Z · LW · GW · Legacy · 39 comments

39 comments

I got into a heated debate a couple days ago with some of my (math grad student) colleagues about the Sleeping Beauty Problem. Out of this discussion came the following thought experiment:

Sleeping Beauty volunteers to undergo the following experiment and is told all of the following details: She will be put to sleep. During the experiment, Beauty will be wakened, interviewed, and put back to sleep with an amnesia-inducing anti-aging drug that makes her forget that awakening. A fair coin will be tossed until it comes up heads to determine which experimental procedure to undertake: if the coin takes n flips to come up heads, Beauty will be wakened and interviewed exactly 3^n times. Any time Sleeping Beauty is wakened and interviewed, she is asked, "What is your subjective probability now that the coin was flipped an even number of times?"

I will defer my analysis to the comments.

39 comments

Comments sorted by top scores.

comment by DSherron · 2013-09-30T04:00:34.297Z · LW(p) · GW(p)

She responds "I'm sorry, but while I am a highly skilled mathematician, I'm actually from an alternate universe which is identical to yours except that in mine 'subjective probability' is the name of a particularly delicious ice cream flavor. Please precisely define what you mean by 'subjective probability', preferably by describing in detail a payoff structure such that my winnings will be maximized by selecting the correct answer to your query."

Replies from: Scott Garrabrant

↑ comment by Scott Garrabrant · 2013-09-30T04:05:07.385Z · LW(p) · GW(p)

I agree with that response to the sleeping beauty problem, and the way you set up the payoff structure will probably make this problem equivalent to the St. Petersburg Paradox.

Replies from: Manfred, alexflint

↑ comment by Manfred · 2013-09-30T04:23:14.915Z · LW(p) · GW(p)

It's like the St. Petersburg game except you gain money on even n and lose money on odd n.

Replies from: Scott Garrabrant

↑ comment by Scott Garrabrant · 2013-09-30T04:25:54.668Z · LW(p) · GW(p)

That's right. What I said above was just stupid. Oops. It is not equivalent at all. I think the two problems are two instances of the same "paradox" though.

↑ comment by Alex Flint (alexflint) · 2013-10-05T03:49:50.141Z · LW(p) · GW(p)

This is unlike the St Petersburg paradox because it involves amnesia, so assigning probabilities arguably forces you to decide on some SIA/SSA-like quandary. But I do agree that making this into a decision problem is the key.

comment by Scott Garrabrant · 2013-09-30T00:57:48.249Z · LW(p) · GW(p)

If you answered 1/2 to the original Sleeping Beauty Problem, the answer to this one should be reasonable to calculate. The probability of exactly n flips is (1/2)^n, so the probability of an even number of flips is (1/2)^2+(1/2)^4+(1/2)^6...=1/3.

If you answered 1/3 to the original Sleeping Beauty Problem, I do not think that there is any sensible answer to this one. I do not however consider this strong evidence that the answer of 1/3 is incorrect for the original problem. This could be an example of evidence for infinite set atheism. Analyzing this problem requires taking as given that the experiment can actually be repeated an arbitrarily large number of times, and we have thus far seen mostly evidence that this is not possible in our universe.

Replies from: orthonormal, CoffeeStain

↑ comment by orthonormal · 2013-09-30T02:11:50.631Z · LW(p) · GW(p)

To expand on this: the thought experiment is a problem for SIA precisely in the sense that the St. Petersburg Paradox is a problem for expected utility. I'm not overly freaked out by either thought experiment.

Replies from: Scott Garrabrant

↑ comment by Scott Garrabrant · 2013-09-30T03:58:07.132Z · LW(p) · GW(p)

I agree. I am also not very bothered by this paradox. I just thought it was worth pointing out.

↑ comment by CoffeeStain · 2013-09-30T03:54:33.785Z · LW(p) · GW(p)

If you answered 1/3 to the original Sleeping Beauty Problem, I do not think that there is any sensible answer to this one. I do not however consider this strong evidence that the answer of 1/3 is incorrect for the original problem.

To also expand on this: 1/3 is also the answer to the "which odds should I precommit myself to take" question and uses the same math as SIA to yield that result for the original problem. And so it is also undefined which odds one should take in this problem. Precommitting to odds seems less controversial, so we should transplant our indifference to the apparent paradox there to the problem here.

comment by CoffeeStain · 2013-09-30T03:56:35.927Z · LW(p) · GW(p)

The expected value for "number of days lived by Sleeping Beauty" is an infinite series that diverges to infinity. If you think this is okay, then the Ultimate Sleeping Beauty problem isn't badly formed. Otherwise...

comment by JeffJo · 2013-11-01T21:25:39.672Z · LW(p) · GW(p)

Coscott’s original problem is unsolvable by standard means because the expected number of wakings is infinite, so you can’t determine a frequency. That doesn’t mean it is unanswerable – we just need an isomorphism. After informing SB of the procedure and putting her to sleep the first time:

1) Set M=0. 2) Select a number N (I’ll discuss how later). 3) Flip a coin. a. If this (odd-numbered) flip lands heads, wake SB N times and end the experiment. b. If this flip (odd-numbered) lands tails, continue to step 4. 4) Flip the coin again. a. If this (even-numbered) flip lands heads, wake SB 3*N times and end the experiment; b. If this (even-numbered)flip lands tails, set M=M+1 go to step #2.

In Coscott’s version, we start with N=1 and multiply it by 9 each time we choose a new one; that is, N=9^M. But does the answer depend on N in any way? Halfers don’t think the answer depends on the number of wakings at all, and thirders think it depends only on the ratio of wakings in step 3a to those in step 4a, not the specific values.

So I maintain that my problem is the same as coscott’s, except in scale, no matter how we choose N. We can answer the original question by choosing N=1 every time.

There is a 2/3 chance of ending after an odd number of flips, and a 1/3 chance of ending after an even number. A halfer should claim SB gains no new knowledge by being awake, so P(odd|awake)=2/3 and P(even|awake)=1/3. A thirder should say there are four possible situations that awake SB could be in , and she cannot differentiate between them. Since 3 of them correspond to an even number of flips, P(odd|awake)=1/4 and P(even|awake)=3/4.

But like coscott’s, this variation, by itself, sheds no light on the original problem. We can even change numbers to something else:

1) Flip a coin. a. If this (odd-numbered) flip lands heads, wake SB N times and end the experiment. b. If this flip (odd-numbered) lands tails, continue to step 2. 2) Flip the coin again. a. If this (even-numbered) flip lands heads, wake SB M times and end the experiment; b. If this (even-numbered)flip lands tails, go to step #1.

You can decide for yourself whether you think the answer should depend on M and N, but I suspect most people will decide that based on whether they are halfers (“it can’t depend on M and N!”) or thirders (“it must depend on M and N!”), rather than what makes mathematical sense. (I’m not saying they will ignore mathematical, I’m saying they will define it by getting the answer they prefer.)

But as long as we are accepting the possibility of infinite wakings, what happens if we hold N constant and let M approach infinity? Halfers will still say the answers don’t change, thirders will say P(odd)=N/(M+N) and P(even)=M/(M+N).

But is it, or is it not, the same if we hold M constant, at a very large number, and let N approach 0? Because at N=0, P(odd)=0, which it can’t be if the halfers are right.

comment by Andreas_Giger · 2013-10-01T20:40:52.081Z · LW(p) · GW(p)

(1/2 3^0 + 1/8 3^2 + ...) / (1/2 3^0 + 1/4 3^1 + 1/8 * 3^2 + ...)

... which can be transformed into an infinite series with a Cesàro sum of 0.5, so that's my answer.

Replies from: Scott Garrabrant

↑ comment by Scott Garrabrant · 2013-10-01T20:51:17.365Z · LW(p) · GW(p)

You are taking a Cesàro sum by treating each possible series of coinflips as a summand. This is arbitrary. If you instead took a series over time, it would not have a Cesàro sum. (i.e. for the partial sum at time t, you only count Beauties that might exist in the first t days)

Replies from: Andreas_Giger

↑ comment by Andreas_Giger · 2013-10-01T21:00:46.915Z · LW(p) · GW(p)

What do you mean by "time" in this case? It sounds like you want to interrupt the interviews at an arbitrary point even though Beauty knows that interviews are quantised in a 3^n fashion.

Replies from: Scott Garrabrant

↑ comment by Scott Garrabrant · 2013-10-01T21:28:32.665Z · LW(p) · GW(p)

Yes, but that is a function of of this specific problem, I could have done it in a more continuous fashion if I wanted to. For example, I could have said that the beauty was simulated floor(3^x) times where x is a random real between 0 and n.

I think you are using an arbitrary grouping of the possible outcomes that seems not so arbitrary in this specific problem, but is overall arbitrary.

Replies from: Scott Garrabrant

↑ comment by Scott Garrabrant · 2013-10-01T21:55:32.183Z · LW(p) · GW(p)

That is wrong. Sorry. floor(3^x) doesn't work because sqrt(3)<2. Try floor(5^x).

Replies from: Andreas_Giger

↑ comment by Andreas_Giger · 2013-10-01T22:04:54.262Z · LW(p) · GW(p)

I could have said that the beauty was simulated floor(5^x) times where x is a random real between 0 and n

Ah, I see now what you mean. Disregarding this new problem for the moment, you can still formulate my original expression on a per-interview basis, and it will still have the same Cesàro sum because it still diverges in the same manner; it just does so more continuously. If you envision a graph of an isomorphic series of my original expression, it will have "saw teeth" where it alternates between even and odd coin flips, and if you formulate that series on a per-interview basis, those saw teeth just get increasingly longer, which has no impact on the Cesàro sum (because the series alternates between those saw teeth).

Concerning your new problem, it can still be expressed as a series with a Cesàro sum, it's just a lot more complicated. If I were you, I'd first try to find the simplest variant of that problem with the same properties. Still, the fact that this is solvable in an analogous way should be clear, because you can essentially solve the "floor(5^x) times where x is a random real between 0 and n" part with a series for x (similar to the one for the original problem) and then have a series of those series for n. Basically you're adding another dimension (or recursion level), but not doing anything fundamentally different.

Replies from: Scott Garrabrant

↑ comment by Scott Garrabrant · 2013-10-01T22:27:56.343Z · LW(p) · GW(p)

I haven't actually done the math yet, but I don't believe you. I think that if your terms are "per interview" then the more recent chunk of 100% even will overpower all the older stuff because there are so many of them, and the series of averages will oscillate.

Replies from: Andreas_Giger

↑ comment by Andreas_Giger · 2013-10-02T00:15:10.130Z · LW(p) · GW(p)

I take it that my approach was not discussed in the heated debate you had? Because it seems a good exercise for grad students.

Also, I don't understand why you think a per interview series would net fundamentally different results than a per coin toss series. I'd be interested in your reports after you (or your colleagues) have done the math.

Replies from: Scott Garrabrant

↑ comment by Scott Garrabrant · 2013-10-02T06:01:22.097Z · LW(p) · GW(p)

We did not discuss Cesàro sums.

I have no need for the new continuous question, if you are happy with saying that a per day analysis is no less arbitrary than a coin flip analysis.

The math is proving to be too much work to write up, so ill just tell you why I think there is a difference between per day and per coin flip. In the per coin flip, you take each of the possible coin flip sequences with equal weight when you are taking the averages of the partial sums in the Cesàro sums. In the per day analysis, you are putting much much more weight on the coin flip sequences which have more flips, because there are many more days which include them.

Replies from: Andreas_Giger

↑ comment by Andreas_Giger · 2013-10-02T08:54:59.950Z · LW(p) · GW(p)

The "many more days that include them" is the 3^n part in my expression that is missing from any per day series. This 3^n is the sum of all interviews in that coin flip sequence ("coin flip sequence" = "all the interviews that are done because one coin flip showed up tails", right?) and in the per day (aka per interview) series the exact same sum exists, just as 3^n summands.

In both cases, the weight of the later coin flip sequences increases, because the number of interviews (3^n) increases faster than the probabilistic weight of the coin flip (1/2^n) decreases.

However, this doesn't mean that there exists no Cesàro sum. In fact the existence of such a sum can be proven for my original expression because the quotient of the last two numerators (if we include both odd and even coin flips) of the isomorphic series is always 3:1, regardless of wether the last coin flip was even or odd. (The same thing can be said for the quotient of the last 3^n and 3^(n-1) summands of your series. Basically, the per day series is just a dragged out per coin flip series.)

The reason why my estimation for the Cesàro sum is 0.5 is that if we express that quotient in a way that one coin state is written first, then it alternates between 3:1 and 1:3, which results in 1:1 which is 0.5. Obviously this is not exact maths, but it's a good way for a quick estimation. (Alternatively, you could intuitively infer that if there exists a Cesàro sum it must be 0.5, because whether you look for even or odd coin flips gets increasingly irrelevant as the series approaches infinity.)

Also, since I haven't previously touched upon the subject of the isomorphic series: If we call my original expression f, then we can construct a function g where g(n) = f(n)-f(n-1) with f(-1) = 0, and a series a = g(0) + g(1) + g(2) + ...

Does that all make sense?

comment by drnickbone · 2013-09-30T21:13:56.500Z · LW(p) · GW(p)

You might want to read some papers by Jacob Ross for similar examples, and discussion.

comment by Shmi (shminux) · 2013-09-30T04:44:46.665Z · LW(p) · GW(p)

Note that if she has no way to test whether her calculation is correct, the notion of probability does not make sense in her situation. In other words, what would she do differently if she estimates the probability to be, say, 1/2 instead of, say, 1/3?

Replies from: Luke_A_Somers, mwengler, Scott Garrabrant

↑ comment by Luke_A_Somers · 2013-09-30T13:51:13.927Z · LW(p) · GW(p)

What if after the experiment is over they'll tell her all the coin flip results, and have some bets resolved? Or they could do it each day after she answers. It wouldn't really change the interview, but it would change whether she can test her calculations.

↑ comment by mwengler · 2013-10-02T19:12:33.644Z · LW(p) · GW(p)

If she was told she would be woken up 3^n times if n is even, 0 times if n is odd, then it seems obvious enough that when asked upon being woken up what she thought the probability that n is even, she would rationally and correctly say 100%. And that this would make sense. So Why wouldn't it make sense if the answer is some number other than 100%?

What she would do differently is bet on things she cared about based on the odds. Like "would you rather your relatives are given $5 if the number of coin flips is odd or $3 if the number of coinflips are even?" The answer for a rational beauty would depend on the probability that the number of coin flips is even.

↑ comment by Scott Garrabrant · 2013-09-30T04:56:01.778Z · LW(p) · GW(p)

I do not see how your first and second sentences are related. I might agree with the first sentence, but I do not agree with the second.

For example if I said "After this interview, before I put you to sleep, I will torture you with probability equal to the square of 1 minus the probability you give to the truth." If she estimates 1/2, she will say 1/2. If she estimates 1/3, she will say 1/3.

Edit: I interpreted your question as a rhetorical question implying that she would no do anything differently.

Replies from: shminux

↑ comment by Shmi (shminux) · 2013-09-30T05:56:26.726Z · LW(p) · GW(p)

I should clarify. The universe doesn't ask you for numbers or ideas, it's your behavior that matters. Your example makes sense in a psychotic simulation, of course. Hopefully we are not living in one.

Replies from: Scott Garrabrant

↑ comment by Scott Garrabrant · 2013-09-30T06:24:03.621Z · LW(p) · GW(p)

Ah, I think I understand your position now. You define "probability" as a measure of anticipation of outcomes (or possibly what her actions say about what outcomes she anticipate). If she is not anticipating learning whether or not the statement is true, then "probability" is not well defined.

Is this correct?

Replies from: shminux

↑ comment by Shmi (shminux) · 2013-09-30T06:30:39.059Z · LW(p) · GW(p)

Well, one cannot always collect enough statistics to use a frequentist approach to check a Bayesian estimate. But one at least should be able to imagine possible worlds and, as you say, estimate measures of various outcomes, and make the best possible decision based on that. In the Sleeping Beauty problem there is no outcome and no decision to make.

Replies from: Scott Garrabrant

↑ comment by Scott Garrabrant · 2013-09-30T06:41:49.938Z · LW(p) · GW(p)

So is the problem drastically different if after I ask you the interview question, I tell you how many times the coin was flipped? If so, assume that was the original problem.

Replies from: shminux

↑ comment by Shmi (shminux) · 2013-09-30T06:50:30.306Z · LW(p) · GW(p)

How do we decide if your answer is correct? If you have all the power, you might as well just make up a random number between 0 and 1 and call it the answer. That's why the whole SSA vs SIA argument makes little sense.

Replies from: Scott Garrabrant

↑ comment by Scott Garrabrant · 2013-09-30T07:00:00.296Z · LW(p) · GW(p)

Oops. You are right, what I said didn't make sense. I just edited the above post by changing "I tell you if you are right" to "I tell you how many times the coin was flipped"

Replies from: shminux

↑ comment by Shmi (shminux) · 2013-09-30T07:16:51.621Z · LW(p) · GW(p)

OK, so, we can reformulate the question as "Dear Sleeping Beauty, what odds would you bet on the coin having been flipped an even number of times?", right? At least in this case the correctness of her answer can be explicitly tested by a frequentist simulation.

Replies from: Scott Garrabrant

↑ comment by Scott Garrabrant · 2013-09-30T07:21:53.123Z · LW(p) · GW(p)

Sure, that is fine.

I am still curious though in the case where we do not reformulate the interview to that whether or not you think that the interviewer telling the beauty how many times the coin was flipped afterwords changes the question.

Replies from: shminux

↑ comment by Shmi (shminux) · 2013-09-30T07:37:09.099Z · LW(p) · GW(p)

Well, there are two issues there, one is the divergent weights given to the lower-probability flip sequences (the St. Petersburg paradox), the other is the meaning of the term "subjective probability". Asking for the odds gives a concrete interpretation to the latter. As for the former, you can probably get any answer you want, depending on how you choose to sum the divergent series.

comment by Manfred · 2013-09-30T01:05:07.521Z · LW(p) · GW(p)

50%. Upon finding that the expected physical situation is undefined (a limit that does not converge), sensible agents should default to using a more limited set of information.

EDIT: All right then, if you downvoters are so smart, what would you bet if you were in sleeping beauty's place?

Replies from: DanielLC, CoffeeStain

↑ comment by DanielLC · 2013-09-30T02:10:27.512Z · LW(p) · GW(p)

So if I ask you if you think I can hit immeasurable set A on a dartboard, you'd say 50%. Same with disjoint immeasurable set B. Same with A U B. I now offer to bet with 1:1 odds that you can hit A, can hit B and can't hit A U B. If you hit A, you win the first bet, but lose the second two. If you hit B, you win the second, but lose the other two. If you miss both, you lose all three. No matter what, I get money.

Replies from: Manfred

↑ comment by Manfred · 2013-09-30T04:02:41.613Z · LW(p) · GW(p)

Hm. The simplest way around this is to treat the fact that an immeasurable disjoint set B exists as new information to our agent. E.g. if you just tell me to bet on hitting some immeasurable set A, I'll think the possibilities are just (A) or (not A), and in my state of ignorance will bet at 1:1 odds. But if you then tell me there's some disjoint set B, now the possibilities are (A), (B), (neither). Maximum entropy dictates that I only assign a 1/3 probability to hitting A or B. This handles the dutch book correctly.

If you add knowledge about relationships between a jillion more immeasurable sets, it still produces sensible answers. The biggest trouble I can see is that representing the things we know about relationships between immeasurable sets in this way is tedious.

↑ comment by CoffeeStain · 2013-10-02T21:30:25.409Z · LW(p) · GW(p)

EDIT: All right then, if you downvoters are so smart, what would you bet if you were in sleeping beauty's place?

This is a fair point. Your's is an attempt at a real answer to the problem. Mine and most answers here seem to say something like that the problem is ill-defined, or that the physical situation described by the problem is impossible. But if you were actually Sleeping Beauty waking up with a high prior to trust the information you've been given, what else could you possibly answer?

If you had little reason to trust the information you've been given, the apparent impossibility of your situation would update that belief very strongly.

The Ultimate Sleeping Beauty Problem

Contents

39 comments