The Doomsday Argument and Self-Sampling Assumption are wrong, but induction is alive and well.
post by RonPisaturo · 2011-08-14T18:15:46.747Z · 8 comments
Since the Doomsday Argument still is discussed often on Less Wrong, I would like to call attention to my new, short, self-published e-book, The Longevity Argument, which is a much-revised and much-expanded work that began with my paper, “Past Longevity as Evidence for the Future,” in the January 2009 issue of Philosophy of Science. In my judgment, my work provides a definitive refutation of the Doomsday Argument, identifying two elementary errors in the argument.
The first elementary error is that the Doomsday Argument conflates total duration and future duration. Although the Doomsday Argument’s Bayesian formalism is stated in terms of total duration, all attempted real-life applications of the argument—with one exception, a derivation by Gott (1994, 108) of his delta t argument introduced in Gott 1993—actually plug in prior probabilities for future duration.
For example, Leslie (1996, 198–200) presents a Bayesian equation stated in terms of prior probabilities of total instances. But then Leslie (1996, 201–203) plugs into this equation prior probabilities for future instances: humans being born for the next 150 years vs. humans being born for the next many thousands of centuries. Bostrom (2002, 94–96) recounts Leslie’s general argument in terms of births instead of durations of time, using 200 billion total births vs. 200 trillion total births. (A closer parallel to Leslie 1996 would be 80 billion total births vs. 80 trillion total births.) But the error persists: the actual prior probabilities that are plugged in to Leslie’s Bayesian equation, based on all of the real-life risks actually considered by Leslie (1996, 1–153) and Bostrom (2002, 95), are of future births, not total births.
In other words, Leslie supposes a prior probability of doom within the next 150 years or roughly 20 billion births. (The prior probabilities supposed in the Doomsday Argument are prior to knowledge of one’s birth rank.) Leslie then assumes that—since there have already been, say, 60 billion births—this prior probability is equal to the prior probability that the total number of births will have been 80 billion births. However, in the absence of knowledge of one’s birth rank, this assumption is absurd.
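For readers who want the update under discussion in concrete form, here is a minimal Python sketch of the standard two-hypothesis Doomsday calculation. It uses the total-birth figures mentioned above (80 billion vs. 80 trillion total births, birth rank about 60 billion) and the SSA likelihood 1/N for any rank up to N. The prior of 0.01 on the smaller total is a hypothetical number chosen for illustration, not a value taken from Leslie or Bostrom. The author's objection is that, in real applications, the number fed in here is actually a prior on future births.

```python
def doomsday_posterior(prior_small, n_small, n_large, rank):
    """Posterior probability of the smaller total N, given one's birth rank.

    Uses the Self-Sampling Assumption: P(rank | N) = 1/N for 1 <= rank <= N.
    """
    like_small = 1.0 / n_small if rank <= n_small else 0.0
    like_large = 1.0 / n_large if rank <= n_large else 0.0
    joint_small = prior_small * like_small
    joint_large = (1.0 - prior_small) * like_large
    return joint_small / (joint_small + joint_large)


# Hypothetical prior of 0.01 on "80 billion total births" (doom soon).
posterior = doomsday_posterior(prior_small=0.01, n_small=80e9, n_large=80e12, rank=60e9)
print(f"P(doom soon | rank) = {posterior:.3f}")
```

This prints `P(doom soon | rank) = 0.910`. The SSA likelihood ratio is 80 trillion / 80 billion = 1000, which is what turns a 1% prior into roughly 91%.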
The second elementary error is the Doomsday Argument’s use of the Self-Sampling Assumption, which is contradicted by the prior information in all attempts at real-life applications in the literature.
For example, many risks to the human race—including most if not all the real-life risks discussed by Leslie and Bostrom—can reasonably be described mathematically as Poisson processes. Then the Self-Sampling Assumption implies that the risk per birth—the ‘lambda’ in the Poisson formula—is constant throughout the duration of the human race. But Leslie (1996, 202) also supposes that if mankind survives past the next century and a half, then the risk per birth will drop dramatically, because mankind will begin spreading throughout the galaxy. (The Doomsday Argument implicitly relies on such a drop in lambda—and the resultant bifurcation of risk into ‘doom soon’ and ‘doom very much later’—for the argument’s significant claims.) In other words, Leslie’s prior probabilities of doom are mathematical contradictions of the Self-Sampling Assumption that Leslie and Bostrom invoke in the Doomsday Argument.
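A minimal Python sketch of the "bifurcation" described above, assuming a Poisson process per birth whose rate drops after the next 20 billion births. Both rates are hypothetical values chosen for illustration, not Leslie's figures. The early rate is tuned so that P(doom within the next 20 billion births) = 0.3. Under a constant rate, almost no probability lands far in the future. After the drop, the mass splits into a lump before the switch and a long tail far beyond it, with very little just after the switch.

```python
import math

SWITCH = 20.0  # billions of future births before the hypothetical drop in risk


def survival(x, rate_soon, rate_later):
    """P(no doom within the next x billion births) for a Poisson process whose
    rate per billion births is rate_soon before SWITCH and rate_later after it."""
    return math.exp(-rate_soon * min(x, SWITCH) - rate_later * max(0.0, x - SWITCH))


def doom_by_bin(edges, rate_soon, rate_later):
    """Probability that doom falls in each interval [edges[i], edges[i + 1])."""
    return [survival(a, rate_soon, rate_later) - survival(b, rate_soon, rate_later)
            for a, b in zip(edges, edges[1:])]


rate_soon = -math.log(0.7) / SWITCH   # hypothetical: P(doom in next 20 billion) = 0.3
rate_later = 1e-5                     # hypothetical rate after galactic colonisation
edges = [0.0, 20.0, 200.0, 2_000.0, 20_000.0, math.inf]

constant = doom_by_bin(edges, rate_soon, rate_soon)
dropped = doom_by_bin(edges, rate_soon, rate_later)
for (a, b), p_const, p_drop in zip(zip(edges, edges[1:]), constant, dropped):
    print(f"[{a:>8g}, {b:>8g}) billion: constant {p_const:.4f}   dropped {p_drop:.4f}")
```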
In my book, I perform Bayesian analyses that correct these errors. These analyses demonstrate that gaining more knowledge of the past can indeed update one’s assessment of the future; but this updating is consistent with common sense instead of with the Doomsday Argument. In short, while refuting the Doomsday Argument, I vindicate induction.
The price of my e-book is $4. However, professional scholars and educators are invited to email me to request a complimentary evaluation copy (not for further distribution, of course). I extend the same offer to the first ten Less Wrong members with a Karma Score of 100 or greater who email me. (I may send to more than ten, or to some with lower Karma Scores, but I don’t want to make an open-ended commitment.)
For an abstract of the e-book, see this entry on PhilPapers. For a non-technical introduction, see here on my blog.
The e-book covers much more than the Doomsday Argument; here is a one-sentence summary: The Doomsday Argument, Self-Sampling Assumption, and Self-Indication Assumption are wrong; Gott’s delta t argument (Gott 1993, 315–316; 1994) underestimates longevity, providing lower bounds on probabilities of longevity, and is equivalent to Laplace’s Rule of Succession (Laplace 1812, xii–xiii; [1825] 1995, 10–11); but Non-Parametric Predictive Inference based on the work of Hill (1968, 1988, 1993) and Coolen (1998, 2006) forms the basis of a calculus of induction.
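The summary above asserts that Gott's delta t argument is equivalent to Laplace's Rule of Succession. The book's argument for that claim is not reproduced here. As a hedged side-by-side only, the Python sketch below evaluates the two standard closed forms and treats each elapsed year as one "success" (an assumption of this illustration). Laplace's rule gives P(next m trials succeed | n successes in n trials, uniform prior) = (n + 1)/(n + m + 1). Gott's delta t gives P(future duration ≥ m | past duration n) = n/(n + m).

```python
def laplace_next_m(n, m):
    """Rule of Succession: P(next m trials all succeed | n of n succeeded), uniform prior."""
    return (n + 1) / (n + m + 1)


def gott_delta_t(past, future):
    """Gott's delta t: P(future duration >= future | elapsed duration past)."""
    return past / (past + future)


for n, m in [(1, 1), (10, 10), (200, 200), (200, 7_800)]:
    print(f"n={n:>4}, m={m:>5}: Laplace {laplace_next_m(n, m):.4f}   Gott {gott_delta_t(n, m):.4f}")
```

The gap between the two is m/((n + m)(n + m + 1)). They therefore agree closely once n is large, and Gott's figure is always the smaller of the two.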
References
Bostrom, Nick (2002), Anthropic Bias: Observation Selection Effects in Science and Philosophy. New York & London: Routledge.
Coolen, Frank P.A. (1998), “Low Structure Imprecise Predictive Inference For Bayes' Problem”, Statistics & Probability Letters 36: 349–357.
——— (2006), “On Probabilistic Safety Assessment in the Case of Zero Failures”, Journal of Risk and Reliability 220 (Proceedings of the Institution of Mechanical Engineers, Part O): 105–114.
Gott, J. Richard III (1993), “Implications of the Copernican Principle for our Future Prospects”, Nature 363: 315–319.
——— (1994), “Future Prospects Discussed”, Nature 368: 108.
Hill, Bruce M. (1968), “Posterior Distribution of Percentiles: Bayes' Theorem for Sampling from a Population”, Journal of the American Statistical Association 63: 677–691.
——— (1988), “De Finetti’s Theorem, Induction, and A(n) or Bayesian Nonparametric Predictive Inference”, in Bayesian Statistics 3, edited by J.M. Bernardo, M.H. DeGroot, D.V. Lindley, and A.F.M. Smith. Oxford: Oxford University Press, 211–241.
——— (1993), “Parametric Models for A(n): Splitting Processes and Mixtures”, Journal of the Royal Statistical Society B 55: 423–433.
Laplace, Pierre-Simon (1812), Théorie analytique des probabilités. Paris: Courcier.
——— ([1825] 1995), Philosophical Essay on Probabilities. Translated by Andrew I. Dale. Originally published as Essai philosophique sur les probabilités (Paris: Bachelier). New York: Springer-Verlag.
Leslie, John (1996), The End of the World: The Science and Ethics of Human Extinction. London: Routledge.
Here is an addendum addressing Manfred's request that I elaborate on my statement, "the Self-Sampling Assumption implies that the risk per birth—the ‘lambda’ in the Poisson formula—is constant throughout the duration of the human race."
To avoid integrals, let me discuss a binomial process, which is a discrete version of a Poisson process.
Suppose you are studying a species from another planet. Suppose the only significant risk to the species is an asteroid hitting the planet. Suppose the risk of an asteroid hit in a year is q. Given that the present moment is within a window (from the past through to the future) of N years without an asteroid hit, what is the probability P(Y) that the present moment is within year Y of that window?
$P(Y) = \frac{q(1-q)^{Y}(1-q)^{N-Y}q}{B}$, where B is the probability that the window is N years.
$P(Y) = \frac{q^{2}(1-q)^{N}}{B}$.
Since Y does not appear in this formula, it is clear that P(Y) is constant for all Y. That is, since q is constant, P(Y) is uniform in [1, N], and P(Y) = 1/N. This result is equivalent to the Self-Sampling Assumption with units of time (years) as the reference class.
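The normalization behind this step can be written out by reading B as the constant that makes P(Y) sum to 1 over Y ∈ [1, N]:

$$
\sum_{Y=1}^{N} P(Y) = 1
\;\Longrightarrow\;
B = \sum_{Y=1}^{N} q^{2}(1-q)^{N} = N\,q^{2}(1-q)^{N}
\;\Longrightarrow\;
P(Y) = \frac{q^{2}(1-q)^{N}}{N\,q^{2}(1-q)^{N}} = \frac{1}{N}.
$$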
But suppose that the risk of an asteroid hit in the past was q, but the species has just built an asteroid destroyer, and the risk in the future is r where r << q. Then
$P(Y) = \frac{q(1-q)^{Y}(1-r)^{N-Y}r}{B}$.
[8/16/2011: Corrected the final 'r' in the above equation from a 'q'.] Y does appear in this formula. Clearly, the greater the value of Y, the smaller the value of P(Y). That is, contrary to the Self-Sampling Assumption, it is very likely that the present moment is in the early part of the window of N years.
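Dividing consecutive terms of the formula above gives P(Y+1)/P(Y) = (1 - q)/(1 - r). This ratio is below 1 whenever r < q, so P(Y) decays geometrically in Y. The Python sketch below checks this numerically. The values of q, r, and N are hypothetical and chosen only for illustration, and normalizing over Y plays the role of B.

```python
import math


def present_year_distribution(N, q, r):
    """P(Y) for Y = 1..N under the binomial model above: per-year strike risk q
    up to and including the present year Y, and r afterwards."""
    # Work in logs: q * (1-q)**Y * (1-r)**(N-Y) * r can underflow for large N.
    logs = [math.log(q) + Y * math.log1p(-q) + (N - Y) * math.log1p(-r) + math.log(r)
            for Y in range(1, N + 1)]
    peak = max(logs)
    weights = [math.exp(x - peak) for x in logs]
    total = sum(weights)              # normalising over Y stands in for B
    return [w / total for w in weights]


N = 5_000
flat = present_year_distribution(N, q=1e-3, r=1e-3)    # risk never changes
skewed = present_year_distribution(N, q=1e-3, r=1e-6)  # destroyer built now
print("P(Y in first 20% of window), constant risk:", round(sum(flat[: N // 5]), 3))
print("P(Y in first 20% of window), risk drops:   ", round(sum(skewed[: N // 5]), 3))
```

With these values, the first fifth of the window carries roughly 0.64 of the probability when the risk drops, against 0.2 when it stays constant.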
The above argument demonstrates why the choice of ‘reference class’ matters. If the risk is constant per unit time, then the correct reference class is units of time. If the risk is constant per birth, then the correct reference class is births. Suppose birth rates increase exponentially. Then constant risk per unit time precludes constant risk per birth, and vice versa. The two reference classes cannot both be right. More generally, if the prior information stipulates that risk per birth is not constant, then the Self-Sampling Assumption using a reference class of births does not apply.
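A minimal Python sketch of why the two reference classes cannot both be right. It assumes a hypothetical birth rate that grows by 1% per year over a 1,000-year window. A uniform distribution over years and a uniform distribution over births then disagree sharply about how likely the present is to lie in the last tenth of the window.

```python
import math


def share_of_final_tail(n_years, growth, tail=0.1):
    """Share of years and share of births that fall in the final `tail` of the window,
    assuming births in year t are proportional to exp(growth * t)."""
    births = [math.exp(growth * t) for t in range(n_years)]
    cutoff = round(n_years * (1 - tail))              # first year of the final tail
    year_share = (n_years - cutoff) / n_years         # uniform over years
    birth_share = sum(births[cutoff:]) / sum(births)  # uniform over births
    return year_share, birth_share


year_share, birth_share = share_of_final_tail(n_years=1_000, growth=0.01)
print(f"P(present in last 10%): years as reference class {year_share:.2f}, "
      f"births as reference class {birth_share:.2f}")
```

Under these assumptions, the two answers are 0.10 and 0.63. Equivalently, a constant risk per year would imply a risk per birth that falls roughly e^10-fold across the window.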
This passage is from my book (p. 59):
Here is a more philosophical and less mathematical perspective on the same point. SSA [the Self-Sampling Assumption] rests on the premise that all indexical information has been removed from the prior information. One's birth rank, which applies only to oneself, is such indexical information that is removed from the prior information before SSA is invoked. But even in the absence of birth rank, the prior information may—and usually does—include information that is indexical. For example, if the prior information states that $\lambda_{\text{past}}$ is large and $\lambda_{\text{future}}$ is small, then the prior information is stating something that is true only of the present—namely, that the present is when λ changes abruptly from a large value to a small value. It turns out that this indexical information contradicts the mathematical conclusion of SSA. Moreover, this indexical information cannot be removed without consequence from the prior information, because the prior probabilities rest on it.
Perhaps the statement that Manfred quotes would have been clearer if I had instead written the following: The Self-Sampling Assumption implies that the risk per birth—the ‘lambda’ in the Poisson formula—is constant throughout the past and present.
8 comments
comment by Manfred · 2011-08-14T22:47:35.007Z
the Self-Sampling Assumption implies that the risk per birth—the ‘lambda’ in the Poisson formula—is constant throughout the duration of the human race
This sentence makes me think you may have flaws in your reasoning. Could you provide a more in-depth outline of your argument?
↑ comment by RonPisaturo · 2011-08-15T04:07:54.701Z
I do not know how to get nice formatting into a comment, so I will try to address your question in an addendum to my original post.
↑ comment by Manfred · 2011-08-15T16:51:04.186Z
Thanks.
It looks like your reasoning is incorrect. What your equations are really saying is "you're more likely to live to year N if you build safety systems earlier." That is, your year Y isn't (just) the "present moment," it's "the year you build an asteroid deflector." However, you do not show that, given that our existential risk can decrease dramatically in the future, we should expect to have a long future ahead of us.
Also:
The above argument demonstrates why the choice of ‘reference class’ matters. If the risk is constant per unit time, then the correct reference class is units of time. If the risk is constant per birth, then the correct reference class is births.
The risk isn't what matters to reference class. The reference class does refer to some class with constant probability. But that probability is not the probability of existential risk. It is the probability that I am in some state, given some information about me. Unless that information is "I am about to die from an existential threat," these probabilities are not the same and so the existential risk will not be constant over the reference class for you.
↑ comment by RonPisaturo · 2011-08-16T20:30:37.629Z
Is your first objection that, in my scenario, the decrease in lambda occurs in the present year, while Leslie assumes that the decrease in lambda will not occur until 150 years from now? That’s a fair issue to raise. In my book, I work through numerical examples in detail (using Poisson processes instead of binomial processes), including an example using plausible numbers based on Leslie’s own scenario, and I also identify more general mathematical formulas. But I will try to defend the basic ideas here.
Suppose I amend my scenario as follows: the asteroid destroyer will not be completed until 150 years from now.
Recall that the Doomsday Argument (DA) invokes the Self-Sampling Assumption (SSA) twice: once for a small N and once for a much larger N. Since the case of the larger N is simpler to deal with, I will consider that case first.
For the larger N, change the exponents in the last equation in my addendum from Y and N–Y to Y+150 and N–Y–150, respectively, when N ≥ Y+150; when N < Y+150, also change each 'r' to a 'q'. (Note also that today I corrected the last equation in my addendum, changing the final ‘q’ to an ‘r’.) The same conclusion holds as before: given this very large N, it is very likely that we are near the beginning of the window of safety, contradicting SSA.
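A Python sketch of the amended scenario. It assumes hypothetical values q = 0.001 and r = 0.000001, and a very large window of N = 20,000 years. The per-year risk is q through 150 years after the present and r afterwards. The exponents and the final factor follow the rule just stated, including the case N < Y+150. Normalizing over Y again plays the role of B.

```python
import math


def delayed_destroyer_distribution(N, q, r, delay=150):
    """P(Y | N), Y = 1..N: per-year strike risk is q through `delay` years after
    the present year Y and r afterwards."""
    logs = []
    for Y in range(1, N + 1):
        n_q = min(N, Y + delay)            # hit-free years at risk q (Y+150 when N >= Y+150)
        n_r = N - n_q                      # hit-free years at risk r (N-Y-150, or 0)
        end = r if N >= Y + delay else q   # strike that closes the window
        logs.append(math.log(q) + n_q * math.log1p(-q)
                    + n_r * math.log1p(-r) + math.log(end))
    peak = max(logs)
    weights = [math.exp(x - peak) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]


N = 20_000
p = delayed_destroyer_distribution(N, q=1e-3, r=1e-6)
print("P(Y in first 20% of window):", round(sum(p[: N // 5]), 3), "vs 0.2 under SSA")
```

With these values, the first fifth of the window carries well over 0.9 of the probability. For a shorter window such as N = 5,000 with the same q and r, roughly half the probability instead sits in the last 150 values of Y, where the window closes before the destroyer is finished. This shows how the outcome depends on the relative values of q, r, N, and 150.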
The case for the smaller N is more complicated, and depends on the relative values of q, r, N, and 150. In general, SSA will be much less wrong (no pun intended) for the small N. But recall that DA takes the ratio of the two applications of SSA. The fact that the application for large N is much more wrong than the application for small N makes the ratio very wrong.
The risk isn't what matters to reference class.
I think I have shown that the risk is what determines the correct choice of reference class. But I do not claim to be the only one to have identified this fact. Gott (1994) makes a similar point. Willard Wells (Apocalypse When?: Calculating How Long the Human Race Will Survive. Berlin: Springer Praxis, 2009, p. 37) makes the same point when he writes, “When bias nullifies the principle of statistical indifference, look for a different variable that spreads the risk evenly and thereby restores the principle.”
The reference class does refer to some class with constant probability.
I take it you mean that if there are N members in the reference class, then the Self-Sampling Assumption asserts that the probability P that the present member has rank Y is 1/N for all Y in [1,N].
But that probability is not the probability of existential risk.
Do you think I'm claiming that the existential risk q per unit in the reference class is equal to 1/N? Of course I am not claiming that; indeed, such a claim would be meaningless, since there is an entire range of possible values for N given any particular q. But I have shown that, for a binomial or Poisson process, constant q implies constant P, and vice versa. Therefore, if you invoke the Self-Sampling Assumption for a reference class for which q is not constant, you are asserting a contradiction. Conversely, if you want to use a uniform-distribution assumption akin to the Self-Sampling Assumption, it is the assessment of the risk that should determine the choice of reference class. (I cover the issue of the reference class in much depth in my book.) The failure to understand this fact is, in my judgment, the reason that Leslie and Bostrom have not been able to solve the problem of the reference class.
↑ comment by Manfred · 2011-08-18T03:58:42.704Z
Is your first objection that, in my scenario, the decrease in lambda occurs in the present year, while Leslie assumes that the decrease in lambda will not occur until 150 years from now?
No. The trouble is that your argument about "greater when Y is closer to the beginning" hinges on imagining varying the value of Y - moving it around. Currently when you move around the year Y, you are moving not just the present year but also the year when asteroid defense is built. What I would like to see (among other things) is you moving around the present year without moving around when asteroid defense is built.
I appear to be using a different meaning of "self-sampling assumption" than you. Rather than worrying about it, do we agree that when used correctly it's just an expression of underlying statistics? Then we could just talk in terms of statistics and not have to worry about the definition.
↑ comment by RonPisaturo · 2011-08-18T20:32:34.937Z
Your comment touches on the crux of the matter.
Of course, what is moving and what is fixed depends on the point of reference. In my analysis, I take the present as the fixed point of reference. When I vary the unknown Y, I am varying the unknown number of years ago when the last asteroid strike occurred. The time when the asteroid destroyer is built remains fixed at 150 years after the present.
Keep in mind the first error I noted in my post. Leslie starts with prior information and prior probabilities about future births, not total births. Leslie assumes that mankind will be able to colonize the galaxy 150 years (equivalently, roughly 20 billion births) from now, regardless of how many unknown births have already occurred.
What I would like to see (among other things) is ...
I am new to Less Wrong, and so I'm not as familiar as you are with how things are done on this site. Manfred, I do appreciate your comments and your interest in my thesis. But I think that, at some point, scholarship demands that you turn to original sources, which in this case include my e-book. If I thought I could give a complete argument in a discussion post, I would not have written a whole book. Rather than my trying to recapitulate the book in a haphazard order and with impromptu formulations based on your particular questions, don't you think it would be more productive for us to refer to the book? Or at the least, you could refer to my published paper (upon which the book is based), which is part of the peer-reviewed literature on the subject. The book is clearer than the paper though, and I have already offered the book to LWers such as you for free. (Only one LWer has taken me up on the offer.) The book is only 22,000 words long, and I think you would have little trouble homing in on the sections and formulations that interest you most.
I appear to be using a different meaning of "self-sampling assumption" than you. Rather than worrying about it, do we agree that when used correctly it's just an expression of underlying statistics?
If it were used ‘correctly’, yes, SSA would just be one perspective on the prior information. But I know of no real-life applications of the Doomsday Argument in which SSA has been used ‘correctly’.
Added 8/19/2011: On second thought, I really do not know what you mean by this statement without your providing context. I think SSA is wrong, at least as it has been used in the Doomsday Argument; I don't know what it would mean to use correctly something that is wrong. To be as charitable as possible, I could say that if the mathematical formulas implied by SSA happened to match up with the prior information in a given case (and I have never seen such a case related to DA), then SSA would just be one perspective on the prior information.
comment by DanielLC · 2011-08-15T02:44:19.305Z
I've made a post about a Bayesian doomsday argument.
The first elementary error is that the Doomsday Argument conflates total duration and future duration.
I don't fully understand the problem. In mine, I did everything with total duration, and used the fact that we know we've been here this long to update on the total not being less than the current.
Then I noticed that the existence of other planets actually makes a difference: I can find the average of the totals for different planets, but that average can fall below our current value. The actual value depends on a probability distribution I'm not sure how to find, but I think the difference amounts to lasting roughly one or two orders of magnitude longer.
The second elementary error is the Doomsday Argument’s use of the Self-Sampling Assumption, which is contradicted by the prior information in all attempts at real-life applications in the literature.
I'm not sure what's going on here. My assumption is that we don't fully understand what the dangers are, and thus have to rely on our priors. To the extent that we haven't processed the evidence, its expected value will match our priors in accordance with conservation of expected evidence.
↑ comment by RonPisaturo · 2011-08-15T16:41:22.222Z
Your use of the Jeffreys prior, $P(T=n) \propto 1/n$, is the exception I mention: Gott (1994, 108) uses the Jeffreys prior.
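A minimal Python sketch of why this prior reproduces Gott's formula. With P(T = n) ∝ 1/n and the SSA likelihood 1/n of being at any particular point of an n-step history, the posterior weight on T = n is ∝ 1/n² for n at least the elapsed duration. As a result, P(T ≥ total | elapsed) comes out close to elapsed/total, the delta t result. The truncation point `cap` is an arbitrary assumption that keeps the sums finite and slightly biases the results downward.

```python
import math


def prob_total_at_least(elapsed, total, cap=1_000_000):
    """P(T >= total | elapsed) with prior P(T = n) proportional to 1/n and SSA
    likelihood 1/n for n >= elapsed, so the posterior weight is proportional to
    1/n**2. Both sums are truncated at `cap`."""
    tail = math.fsum(1.0 / n**2 for n in range(total, cap + 1))
    norm = math.fsum(1.0 / n**2 for n in range(elapsed, cap + 1))
    return tail / norm


for elapsed, total in [(100, 200), (100, 1_000), (1_000, 4_000)]:
    print(f"elapsed={elapsed}, total={total}: Jeffreys {prob_total_at_least(elapsed, total):.4f}"
          f"   Gott {elapsed / total:.4f}")
```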