The Dice Room, Human Extinction, and Consistency of Bayesian Probability Theory

post by ksvanhorn · 2015-07-28T16:27:03.328Z · LW · GW · Legacy · 16 comments

I'm sure that many of you here have read Quantum Computing Since Democritus. In the chapter on the anthropic principle the author presents the Dice Room scenario as a metaphor for human extinction. The Dice Room scenario is this:

1. You are in a world with a very, very large population (potentially unbounded).

2. There is a madman who kidnaps 10 people and puts them in a room.

3. The madman rolls two dice. If they come up snake eyes (both ones) then he murders everyone.

4. Otherwise he releases everyone, then goes out and kidnaps 10 times as many people as before, and returns to step 3. 

The question is this: if you are one of the people kidnapped at some point, what is your probability of dying? Assume you don't know how many rounds of kidnappings have preceded yours.

As a metaphor for human extinction, think of the population of this world as being all humans who ever have or ever may live, each batch of kidnap victims as a generation of humanity, and rolling snake eyes as an extinction event.

The book gives two arguments, which are both purported to be examples of Bayesian reasoning:

1. The "proximate risk" argument says that your probability of dying is just the prior probability that the madman rolls snake eyes for your batch of kidnap victims -- 1/36.

2. The "proportion murdered" argument says that about 9/10 of all people who ever go into the Dice Room die, so your probability of dying is about 9/10.

Obviously this is a problem. Different decompositions of a problem should give the same answer, as long as they're based on the same information.

I claim that the "proportion murdered" argument is wrong. Here's why. Let pi(t) be the prior probability that you are in batch t of kidnap victims. The proportion murdered argument relies on the property that pi(t) increases exponentially with t: pi(t+1) = 10 * pi(t). If the madman murders at step t, then your probability of being in batch t, given that you were kidnapped at some point, is

  pi(t) / SUM(u: 1 <= u <= t: pi(u))

and, if pi(u+1) = 10 * pi(u) for all u < t, then this does indeed work out to about 9/10. But the values pi(t) must sum to 1; thus they cannot increase indefinitely, and in fact it must be that pi(t) -> 0 as t -> infinity. This is where the "proportion murdered" argument falls apart.
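
To make this concrete, here is a quick sketch of the calculation for one example of a proper prior (the geometric form and the value of q are just illustrative assumptions; any proper prior gives the same qualitative behavior):

    # For a proper (normalized) prior pi(t) over your batch number, compute
    # P(you are in batch t | the madman murders at batch t) = pi(t) / SUM(u: 1 <= u <= t: pi(u)).
    # Illustrative assumption: a geometric prior pi(t) = (1 - q) * q**(t - 1).

    q = 0.9
    pi = lambda t: (1 - q) * q ** (t - 1)

    def conditional_prob(t):
        return pi(t) / sum(pi(u) for u in range(1, t + 1))

    for t in (1, 5, 10, 50, 100):
        print(t, round(conditional_prob(t), 4))
    # The ratio falls toward 0 as t grows; it cannot stay near 9/10, because
    # pi(t) -> 0 while the denominator is at least pi(1) > 0.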

For a more detailed analysis, take a look at

http://bayesium.com/doomsday-and-the-dice-room-murders/

This forum has a lot of very smart people who would be well-qualified to comment on that analysis, and I would appreciate hearing your opinions.

16 comments

comment by Shmi (shminux) · 2015-07-28T21:28:13.330Z · LW(p) · GW(p)

Consider an example of a 10-person world (not counting the madman). What happens there? What happens in a 10^n-person world? What happens in the limit n -> infinity?

Replies from: ksvanhorn
comment by ksvanhorn · 2015-07-29T18:17:15.083Z · LW(p) · GW(p)

An earlier analysis (the previous blog post) looks at the case of a finite world population and shows that, using either the "proximate risk" or the "proportion murdered" approach, you still get P(you die | kidnapped) = 1/36, because the fact that you are kidnapped is strong evidence that the madman never murders at all, having run out of people to kidnap.
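
As a rough sanity check, here is a Monte Carlo sketch of that finite case (the cap of 6 batches and the trial count are arbitrary assumptions for the illustration):

    import random

    def death_rate(max_batches=6, trials=200_000):
        """Finite Dice Room: batch t holds 10**t victims; after max_batches batches the
        madman has no one left to kidnap and stops.  Returns the fraction of kidnapped
        people (pooled over all runs) who are murdered, which by symmetry estimates
        P(you die | you are kidnapped)."""
        kidnapped = died = 0
        for _ in range(trials):
            for t in range(1, max_batches + 1):
                batch = 10 ** t
                kidnapped += batch
                if random.random() < 1 / 36:  # snake eyes
                    died += batch
                    break
        return died / kidnapped

    print(death_rate())  # comes out near 1/36 ~ 0.028, not 9/10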

comment by Jiro · 2015-07-29T14:29:15.677Z · LW(p) · GW(p)

Go to a casino. Bet $1 on something with a 50% chance of winning. If you win, you have won $1; try again. If you lose, double your bet size (which means that winning will leave you having won $1 total over the sequence of doubled bets) and repeat.

One argument says that in the long run, you will come out a winner, because every bet you make is part of a sequence and at the end of that sequence, you are $1 richer. Another argument says that in the long run, you will only break even, because each bet has a 50% chance of winning and a 50% chance of losing the same amount of money.

Of course, the answer is that you can't increase your bet indefinitely, and the expected loss from the times you hit that limit exactly makes up for the expected gain from all the other times, when you finished the sequence and won $1.

Furthermore, if you could increase your bet without limit, this problem wouldn't arise; but then the expectation isn't well defined, because you are trying to compute it from a non-converging infinite series.
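
A quick sketch of the capped strategy makes the accounting concrete (the cap and the number of trials are arbitrary assumptions):

    import random

    def martingale(max_bet=1024, trials=1_000_000):
        """Bet $1, double after each loss, but give up (absorbing the accumulated loss)
        once the bet has reached max_bet.  Returns the average profit per sequence."""
        total = 0
        for _ in range(trials):
            bet = 1
            while True:
                if random.random() < 0.5:   # win: up $1 over the whole sequence
                    total += 1
                    break
                if bet >= max_bet:          # cap reached: eat the loss of 1 + 2 + ... + bet = 2*bet - 1
                    total -= 2 * bet - 1
                    break
                bet *= 2
        return total / trials

    print(martingale())  # hovers around 0: the rare capped losses cancel the frequent $1 wins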

This problem is just the same idea applied to the probability of death instead of the expected winnings. If the madman ever runs out of people, the overall probability depends on exactly what the madman does when he runs out of people (since that isn't specified as precisely as it is for bets). If the madman never runs out of people, the probability involves a non-converging infinite series and so is not well defined.

If this is a metaphor for extinction, then when the madman runs out of people, he keeps rolling the dice on the remaining people until they eventually come up snake eyes, in which case the chance of extinction is 100%. On the other hand, humanity can last arbitrarily long if the per-roll probability of extinction is made arbitrarily small.

Replies from: ksvanhorn
comment by ksvanhorn · 2015-07-29T18:11:04.174Z · LW(p) · GW(p)

An earlier version of my analysis (the previous blog post) looked at the case of finite n and found, as you suggest, that the possibility of running out of people to kidnap is an important consideration. You can choose the number of batches n to be so large that it is virtually certain a priori that the madman will eventually murder:

P(eventually murders) = 1 - epsilon for some small epsilon

However, it turns out that conditioning on the fact that you are kidnapped changes the probability dramatically:

P(eventually murders | you are kidnapped) = about 10/9 * 1/36

The reason for this is that there are about 9 times as many people in the final batch as in all other batches combined, so the fact that you are kidnapped is strong evidence that the madman is on his last batch of potential victims.
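
Concretely, a sketch of that computation (assuming, for the illustration, n = 10 batches and a world population equal to the total of all the batches):

    from fractions import Fraction

    def p_murders_given_kidnapped(n=10):
        """Finite Dice Room with at most n batches, batch t holding 10**t people, and a
        world population equal to the total of all n batches."""
        p = Fraction(1, 36)
        batch_sizes = [10 ** t for t in range(1, n + 1)]
        world = sum(batch_sizes)
        p_reach = lambda t: (1 - p) ** (t - 1)       # prob. the madman gets as far as batch t
        taken_by = lambda t: sum(batch_sizes[:t])    # people kidnapped once batch t is in the room

        # P(kidnapped and he eventually murders) = sum over t of P(murders at batch t) * taken_by(t)/world
        pk_and_murder = sum(p_reach(t) * p * Fraction(taken_by(t), world) for t in range(1, n + 1))
        # P(kidnapped) also includes the case where he never rolls snake eyes and uses everyone
        pk = pk_and_murder + p_reach(n + 1) * Fraction(taken_by(n), world)
        return pk_and_murder / pk

    print(float(p_murders_given_kidnapped()))  # ~0.0309
    print(10 / 9 / 36)                         # ~0.0309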

comment by Slider · 2015-07-28T22:13:14.386Z · LW(p) · GW(p)

The madman only almost surely murders. It is possible, but vanishingly unlikely, that he simply never rolls snake eyes (or that the required growth outruns the total population, so that he can't assemble a full batch). Option 1 doesn't care whether the doom ultimately happens, while option 2 assumes that the doom will happen.

The proper English version of option two would be: "Given that the dice eventually came up snake eyes and that you were kidnapped at some point, what is the probability that they came up while you were kidnapped?" Notice also that this is independent of which dice readings result in doom. That is, if the world were instead saved only on snake eyes, the chance would still be "only" 9/10.

Replies from: ksvanhorn
comment by ksvanhorn · 2015-07-29T18:32:39.605Z · LW(p) · GW(p)

Note that

P(you are in batch t | murders batch t & you are kidnapped)

cannot be 9/10 for all t; in fact, this probability must go to 0 in the limit as t -> infinity, regardless of what prior you use, since pi(t) -> 0 for any proper prior while the denominator sum is at least pi(1) > 0.

comment by CronoDAS · 2015-08-01T07:00:48.345Z · LW(p) · GW(p)

I'm reminded of a Bayesian solution to the "two envelopes problem" - if you conclude that you should always switch, you're implicitly assuming that the expected value of each envelope is infinite.
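
For instance, a minimal sketch with an assumed prior (the smaller amount is 2^k with probability (1-q) q^k, which has finite mean exactly when q < 1/2) shows that, in this family, the uniform advantage to switching only survives when the mean is infinite:

    def expected_other(a, q):
        """E[amount in the other envelope | you opened one and saw a], for a = 2**k, k >= 1."""
        k = a.bit_length() - 1            # a = 2**k
        w_small = (1 - q) * q ** k        # case: a is the smaller amount, the other holds 2a
        w_large = (1 - q) * q ** (k - 1)  # case: a is the larger amount, the other holds a/2
        return (w_small * 2 * a + w_large * a / 2) / (w_small + w_large)

    for q in (0.4, 0.6):
        print(q, [round(expected_other(a, q) / a, 3) for a in (2, 8, 32)])
    # q = 0.4 (finite mean):   every ratio is below 1, so switching looks bad
    # q = 0.6 (infinite mean): every ratio is above 1, and "always switch" reappears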

comment by Irgy · 2015-07-30T05:02:10.880Z · LW(p) · GW(p)

To my view, the 1/36 is "obviously" the right answer; what's interesting is exactly how it all went wrong in the other case. I'm honestly not all that enlightened by the argument given here or in the links. The important question is, how would I recognise this mistake easily in the future? The best I have for the moment is "don't blindly apply a proportion argument" and "be careful when dealing with infinite scenarios even when they're disguised as otherwise". I think the combination of the two was required here: the proportion argument failed because the maths which normally supports it couldn't be used without at some point colliding with the partly-hidden infinity in the problem setup.

I'd be interested in more development of how this relates to anthropic arguments. It does feel like it highlights some of the weaknesses in anthropic arguments. It seems to strongly undermine the doomsday argument in particular. My take on it is that it highlights the folly of the idea that population is endlessly exponentially growing. At some point that has to stop regardless of whether it has yet already, and as soon as you take that into account I suspect the maths behind the argument collapses.

Edit: Just another thought. I tried harder to understand your argument and I'm not convinced it's enough. Have you heard of ignorance priors? They're the prior you use, in fact the prior you need to use, to represent a state of no knowledge about a measurement other than an invariance property which identifies the type of measurement it is. So an ignorance prior for a position is constant, for a scale is 1/x, and for a probability has been at least argued to be 1/(x(1-x)). These all have the property that their integral is infinite, but they work because as soon as you add some knowledge and apply Bayes' rule the result becomes integrable. These are part of the foundations of Bayesian probability theory. So while I agree with the conclusion, I don't think the argument that the prior is unnormalisable is sufficient proof.
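
For example, here is a numerical sketch of that last point, with a single assumed observation y from a Normal(0, s^2) distribution with unknown scale s and the 1/s scale-ignorance prior:

    from math import exp, log, pi, sqrt
    from scipy.integrate import quad

    y = 3.0  # assumed single observation

    # The ignorance prior 1/s alone does not normalize: its integral over [1/L, L]
    # is 2*log(L), which grows without bound as L increases.
    for L in (1e3, 1e6, 1e9):
        print(L, 2 * log(L))

    # But prior * likelihood is integrable once the single observation is included.
    posterior = lambda s: (1 / s) * exp(-y ** 2 / (2 * s ** 2)) / (sqrt(2 * pi) * s)
    print(quad(posterior, 0, float("inf"))[0], 1 / (2 * y))  # finite, and matches the closed form 1/(2y)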

Replies from: ksvanhorn
comment by ksvanhorn · 2015-08-05T16:48:02.881Z · LW(p) · GW(p)

Actually, no, improper priors such as you suggest are not part of the foundations of Bayesian probability theory. It's only legitimate to use an improper prior if the result you get is the limit of the results you get from a sequence of progressively more diffuse priors that tend to the improper prior in the limit. The Marginalization Paradox is an example where just plugging in an improper prior without considering the limiting process leads to an apparent contradiction. My analysis (http://ksvanhorn.com/bayes/Papers/mp.pdf) is that the problem there ultimately stems from non-uniform convergence.

I've had some email discussions with Scott Aaronson, and my conclusion is that the Dice Room scenario really isn't an appropriate metaphor for the question of human extinction. There are no anthropic considerations in the Dice Room, and the existence of a larger population from which the kidnap victims are taken introduces complications that have no counterpart when discussing the human extinction scenario.

You could formalize the human extinction scenario with unrealistic parameters for growth and generational risk as follows:

  • Let n be the number of generations for which humanity survives.

  • The population in each generation is 10 times as large as the previous generation.

  • There is a risk 1/36 of extinction in each generation. Hence, P(n = t | n >= t) = 1/36 for every t.

  • You are a randomly chosen individual from the entirety of all humans who will ever exist. Specifically, conditional on humanity surviving n generations, P(you belong to generation g) = 10^g / N, where N is the sum of 10^t for 1 <= t <= n.

Analyzing this problem, I get

P(extinction occurs in generation t | extinction no earlier than generation t) = 1/36

P(extinction occurs in generation t | you are in generation t) = about 9/10

That's a vast difference depending on whether or not we take into account anthropic considerations.
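
For reference, here is a sketch of the computation behind the second figure under these assumptions (the sum over later generations is truncated, which is harmless since it converges quickly; generation 5 is an arbitrary example):

    from fractions import Fraction

    def p_doom_given_generation(g, horizon=60):
        """P(extinction occurs in generation g | you are in generation g) in the model
        above: P(n = k) = (35/36)**(k-1) * (1/36), and given n you are a uniform draw
        from the N = 10 + 100 + ... + 10**n people who ever live."""
        p = Fraction(1, 36)
        pop = lambda k: sum(10 ** t for t in range(1, k + 1))   # N for n = k
        p_n = lambda k: (1 - p) ** (k - 1) * p                  # P(n = k)
        # P(n = k | you are in generation g) is proportional to P(n = k) / pop(k) for k >= g
        weights = [p_n(k) * Fraction(1, pop(k)) for k in range(g, horizon)]
        return weights[0] / sum(weights)

    print(float(p_doom_given_generation(5)))  # ~0.90, versus the unconditional hazard of 1/36 ~ 0.028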

The Dice Room analogy would be if the madman first rolled the dice until he got snake-eyes, then went out and kidnapped a bunch of people, randomly divided them into n batches, each 10 times larger than the previous, and murdered the last batch. This is a different process than what is described in the book, and results in different answers.
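
A quick simulation of that alternative process (a sketch only; you are taken to be a uniform draw from everyone kidnapped in a given run) gives roughly 9/10:

    import random

    def dice_first(trials=20_000):
        """The madman rolls until snake eyes (n rolls), then kidnaps batches 1..n with
        batch t holding 10**t people and murders only batch n.  Returns the average,
        over runs, of the chance that a uniformly chosen victim is in the murdered batch."""
        total = 0.0
        for _ in range(trials):
            n = 1
            while random.random() >= 1 / 36:
                n += 1
            total += 10 ** n / sum(10 ** t for t in range(1, n + 1))
        return total / trials

    print(dice_first())  # ~0.9, unlike the ~1/36 answer for the process described in the book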

Replies from: Irgy
comment by Irgy · 2015-08-10T07:49:07.323Z · LW(p) · GW(p)

Thanks, interesting reading.

Fundamental or not, I think my point still stands that "the prior is infinite so the whole thing's wrong" isn't quite enough of an argument, since you still seem to conclude that improper priors can be used if used carefully enough. A more satisfying argument would be to demonstrate that the 9/10 case can't be made without incorrect use of an improper prior. Though I guess it still shows where the problem most likely is, which is helpful.

As far as being part of the foundations goes, I was just going by the fact that it's in Jaynes, but you clearly know a lot more about this topic than I do. I would be interested to know your answer to the following questions though: "Can a state of ignorance be described without the use of improper priors (or something mathematically equivalent)?", and "Can Bayesian probability be used as the foundation of rational thought without describing states of ignorance?".

On the Doomsday argument, I would only take the Dice Room as a metaphor not a proof of anything, but it does help me realise a couple of things. One is that the setup you describe of a potentially endlessly exponentially growing population is not a reasonable model of reality (irrespective of the parameters themselves). The growth has to stop, or at least converge, at some point, even without a catastrophe.

It's interesting that the answer changes if he rolls the dice first. I think ultimately the different answers to the Dice Room correspond to different ways of handling the infinite population correctly - i.e. taking limits of finite populations. For any finite population there needs to be an answer to "what does he do if he doesn't roll snake-eyes in time?" and different choices, for all that you might expect them to disappear in the limit, lead to different answers.

If the dice having already been rolled is the best analogy for the Doomsday argument, then it's making quite particular statements about causality and free will.

comment by DanArmak · 2015-07-28T19:46:30.006Z · LW(p) · GW(p)

Oops, didn't read correctly. Retracting.

comment by Lumifer · 2015-07-28T17:42:59.459Z · LW(p) · GW(p)

The "proportion murdered" argument says that about 9/10 of all people who ever go into the Dice Room die, so your probability of dying is about 9/10.

This is the classic confusion between probability and frequency; see e.g. this.

comment by MrMind · 2015-07-29T08:25:51.565Z · LW(p) · GW(p)

Scott has always had a problem grasping Bayesian probability.
When he says that it's a problem to have two different probabilities from the same situation, he doesn't realize that it's a problem for Bayesians only if the two calculations start from the same prior information. Bayesians have no qualms at all about having two different probabilities for the same situation, provided that they start from different information. On the other hand, it's even more of a problem for frequentists, for whom probabilities are objective physical properties of a situation.

Replies from: ike
comment by ike · 2015-07-29T12:11:18.197Z · LW(p) · GW(p)

Exactly one of you (so far as I know) has proven a theorem extending Aumann's agreement theorem. I wouldn't be so hasty to charge that he doesn't understand basic probability.

Besides, your critique is irrelevant to this scenario, unless you have an argument for why each way of calculating is using different implicit priors.

Replies from: MrMind
comment by MrMind · 2015-07-30T07:27:25.296Z · LW(p) · GW(p)

People have come up with theorems about frequentist probability for almost three centuries and still failed to grasp the Bayesian (Laplacian?) framework.
It is also commendable that you equate Bayesian with basic, but that's not the reality in the average mathematical education. Surely Scott understands basic probability well enough, but he is demonstrably not aware of the foundations of probability as extended logic.

My critique surely is irrelevant to the scenario; indeed, it was a commentary on the sentence

If you’re a Bayesian, then this kind of seems like a problem

found in the book, which so totally misses the point as to be almost backwards.

Replies from: ike
comment by ike · 2015-07-30T11:46:54.418Z · LW(p) · GW(p)

When he says that it's a problem to have two different probabilities from the same situation, he doesn't realize that it's a problem for Bayesians only if the two calculations start from the same prior information.

It's kind of impossible to prove theorems about when Bayesians should agree without knowing that.

And I don't see the problem with the sentence you quoted, unless you claim that each way encodes different priors (and even so, that would be an answer to the problem, not a reason to say the problem doesn't deserve a response).