Anthropic Probabilities: a different approach
post by sil ver (sil-ver) · 2018-07-20T13:02:29.349Z
Contents: Introduction · Diving in: Existential Risk · A complete Model of Randomness · Logical Uncertainty · Re: updating Random Probability · Tying up loose Ends · Reference Classes · Foundations of Randomness · Preservation of Expected Evidence · Problems with existing theories · Summary · An Example
Short Summary: One way to determine the probability for a proposition upon making an observation is to look at other instances of the same observation being made and whether the proposition was true there. Formalizing this leads to a principle that is applicable to some but not all problems, which suggests that there is a fundamental difference between both classes, and that the latter must be treated differently. The result is a theory that has the advantages of both SIA/FNC and SSA: it rejects the doomsday argument, doesn't care about reference classes, doesn't give answers that are exploitable through bets, and also doesn't claim that a trillion times larger universe must be a trillion times more likely.
Neither SSA nor SIA nor FNC gives answers I accept in every problem. Moreover, all of them are indifferent to aspects that seem to me to be crucial. This post outlines my suggestion for how we should think about and do calculations involving anthropic probability. I arrive at a theory which is formalized enough to be applicable to every thought experiment I know, and which outputs answers that agree with my intuition in every case.
Diving in: Existential Risk
Consider the classical anthropic question: humanity created nuclear weapons, which seem pretty dangerous, but so far they haven't killed us. Is that reason to update downward on the probability that they are exceedingly dangerous? To start things off, we'll look at the problem more generally. Let $H$ stand for any hypothesis with prior $P(H)$, and $O$ for any observation. Applying Bayes and the law of total probability, we get the following:
$$P(H \mid O) = \frac{P(O \mid H)\,P(H)}{P(O \mid H)\,P(H) + P(O \mid \neg H)\,P(\neg H)}$$
First, observe that if $P(O \mid \neg H) = 0$, then this simply equals $1$, as you would expect. (We can't also have $P(O \mid H) = 0$, as that would imply $P(O) = 0$, which we can assume to be false since we conditioned on $O$ to begin with.)
Second, if $P(O \mid \neg H) \neq 0$, we can define
$$\lambda := \frac{P(O \mid H)}{P(O \mid \neg H)}$$
Equivalently, $P(O \mid H) = \lambda \cdot P(O \mid \neg H)$. Plugging this into the formula yields
$$P(H \mid O) = \frac{\lambda\,P(H)}{\lambda\,P(H) + P(\neg H)}$$
By assumption, we know the value of $P(H)$. Thus, the only other thing we need in order to compute $P(H \mid O)$ is the value of $\lambda$, that is, the probability of observing $O$ when $H$ is true relative to the probability of observing $O$ when $H$ is false.
How does one obtain that value? The classical way is to consider the case where $H$ is true, take the probability that we make the observation in that case, and assign that probability to $P(O \mid H)$; then consider the case where $H$ is false, take the probability that we make the observation in that case, and assign that probability to $P(O \mid \neg H)$. Then divide the first by the second.
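As a minimal sketch of the update rule derived above, the following Python snippet computes the posterior from the prior and the ratio of the two observation probabilities. The function names `posterior` and `posterior_from_likelihoods` are my own and purely illustrative; exact fractions are used so that no rounding obscures the result.

```python
from fractions import Fraction


def posterior(prior, ratio):
    """Posterior of a hypothesis, given its prior and the ratio
    P(observation | hypothesis) / P(observation | not hypothesis)."""
    prior = Fraction(prior)
    ratio = Fraction(ratio)
    return ratio * prior / (ratio * prior + (1 - prior))


def posterior_from_likelihoods(prior, p_obs_if_true, p_obs_if_false):
    """Same update, starting from the two conditional probabilities."""
    a = Fraction(p_obs_if_true)
    b = Fraction(p_obs_if_false)
    if a == 0 and b == 0:
        # The observation would be impossible, so conditioning on it is undefined.
        raise ValueError("observation has probability zero")
    if b == 0:
        # The observation can only occur if the hypothesis is true.
        return Fraction(1)
    return posterior(prior, a / b)
```

A ratio of 1 leaves the prior untouched, which is the "no update" case; any ratio above 1 moves the posterior upward.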
Problem: if many worlds is true and you share consciousness with all of your copies, and if $O$ is something like "I observe that nukes haven't destroyed humanity in the past 70 years", then $P(O \mid H) = P(O \mid \neg H) = 1$, and so the ratio $\lambda = P(O \mid H) / P(O \mid \neg H)$ equals $1$. No update takes place. But this is wildly inaccurate: if some species are more responsible with their nukes than others, then more of the responsible species will still be around after seventy years, so the fact that you are still alive is most definitely evidence to consider. The classical approach is insufficient.
Another way to proceed is to count the number of instances where $O$ has been made before, and check in how many of them $H$ has been true and false, respectively; then once again divide the first by the second.
Although it is more novel, I consider the second approach to be the more fundamental one. The idea is that, if one knows exactly how often $H$ is true upon observing $O$ (we shall formalize this later), then one must use that proportion as the probability to assign to $H$. This is a strictly weaker claim than the one SIA is making, as it does not make any statements about cases where the assumption is false, which is quite often. In this weaker form, I consider the requirement to be axiomatic: if probability is to mean anything, then a probability of $1/2$ put on a proposition has to mean that this proposition is true half of the time this happens, else the probability isn't correct (again, given that the above assumption holds). In general, however, the classical approach does not meet this criterion.
Before we focus on cases where the assumption is false, let's look at what happens if it holds. Our initial example becomes such a case once we obtain further information. Thus, assume we have somehow been handed all relevant statistics about extinction rates of species with nuclear weapons in the universe, but we haven't been told where humanity falls on that spectrum. Let $O :=$ "I observe that the species I am a part of has survived 70 years of nuclear weapons", and $H :=$ "the species I am a part of is relatively safe". Now, let's examine what we would conclude depending on the data. We look at how often $O$ is made at different "places" in the universe and take that as information about which place we are in. In this case, the places correspond to being a member of a safe or unsafe species, respectively.
It might be that we live in a universe in which half of all species that develop nuclear weapons are relatively safe, and the other half is relatively unsafe (more precisely, the number of observers in them is the same at the point that nukes are first invented). In this case, $O$ warrants a sizable update of $P(H)$, because it is made more frequently by members of species which are relatively safe than by those which are relatively unsafe (since the latter tend to go extinct before making $O$). For example, if the chances of surviving the first seventy years are 0.01 for relatively unsafe species and 0.9 for relatively safe species, then $O$ is made by members of species which are relatively safe 90 times as often as it is made by members of species which are relatively unsafe, so the correct value to assign to the ratio $\lambda = P(O \mid H) / P(O \mid \neg H)$ is $90$. Observe that this does not depend on consciousness or many worlds; it holds equally regardless of whether we live in a classical universe in which personal death implies the permanent end of subjective experience, or in many worlds with consciousness shared across all copies, or neither.
Given this value, we have $P(O \mid H) = 90 \cdot P(O \mid \neg H)$ and so
$$P(H \mid O) = \frac{90 \cdot P(H)}{90 \cdot P(H) + P(\neg H)} = \frac{90 \cdot \frac{1}{2}}{90 \cdot \frac{1}{2} + \frac{1}{2}} = \frac{90}{91}$$
Of course, it might also be that we live in a different universe. For example, it might be that we live in a universe in which all technologically advancing species are relatively safe. If that's the case, the value of $P(H \mid O)$ is $1$; now it is independent of the details of $O$, so it wouldn't matter if we had phrased it in terms of one year rather than seventy. Similarly, it might be that we live in a universe in which all technologically advancing species are unsafe, in which case the answer is always $0$. And it might be that we live in a universe where some species are safe and some are unsafe, but with a different distribution, in which case the specifics of $O$ do matter.
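To make the instance counting concrete, here is a small Python sketch covering the universes just described. The function name and its parameters are hypothetical; the only inputs are the share of species that are safe and the two survival chances from the example (0.9 and 0.01).

```python
from fractions import Fraction


def share_of_safe_instances(frac_safe, survive_safe, survive_unsafe):
    """Proportion of "my species survived" observations that are made
    by members of relatively safe species."""
    safe = Fraction(frac_safe) * Fraction(survive_safe)
    unsafe = (1 - Fraction(frac_safe)) * Fraction(survive_unsafe)
    if safe + unsafe == 0:
        raise ValueError("the observation is never made in this universe")
    return safe / (safe + unsafe)


SURVIVE_SAFE = Fraction(9, 10)     # survival chance of a relatively safe species
SURVIVE_UNSAFE = Fraction(1, 100)  # survival chance of a relatively unsafe species

half_half = share_of_safe_instances(Fraction(1, 2), SURVIVE_SAFE, SURVIVE_UNSAFE)
all_safe = share_of_safe_instances(1, SURVIVE_SAFE, SURVIVE_UNSAFE)
all_unsafe = share_of_safe_instances(0, SURVIVE_SAFE, SURVIVE_UNSAFE)
```

In the all-safe and all-unsafe universes the survival chances drop out of the result entirely, which is why the length of the observed period stops mattering there.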
So far, so good. Now let's formalize this approach.
A complete Model of Randomness
Forget everything you know about how probability is classically computed. Instead, begin by considering the space $\mathcal{P}(\Omega)$, where $\Omega$ is the set of all possible observations. $\mathcal{P}(\Omega)$ is the powerset of the set of all possible observations: it contains every combination of things you could possibly observe. This includes not just things you observe at the moment, but also things you observed in the past. "I hit my head an hour ago," for example, is a different observation from "I hit my head five seconds ago". At the most fundamental level, $\Omega$ would be defined over something like particle structures of the brain, but it is more useful to think of it as observations. Certainly, every observation or set of observations you could ever want to use in the term $P(H \mid O)$ appears in $\mathcal{P}(\Omega)$.
Now, assume you have all information about $\mathcal{P}(\Omega)$ stored inside of a hypercomputer whose interface has a nice and powerful filter. Type anything into the blue text field and the hypercomputer will translate it into a set of observations $O \in \mathcal{P}(\Omega)$, and it will focus on that particular set of observations only.
Moreover, the hypercomputer also knows, for each $O \in \mathcal{P}(\Omega)$, every instance in which $O$ has ever occurred and ever will occur in the history of the universe, and everything about the external circumstances of each such instance.
The red and bright green blobs symbolize the instances in which the set of observations in the blue box have been made and the proposition in the red box is true and false, respectively. Note, however, that this is a toy example: in reality, all observations available to the observer must be written into the blue box, including the size of their room and their opinion on Pink Floyd.
Under this model, the term doesn't make sense, not for any . There is no 'probability' of a set of observations occurring. But the term does make sense. Thus, we can remedy the previous abuse of notation by using a different operator. Rather than using , we define by
Where is the set of all instances in which has been made, and and are the subsets in which is true and false, respectively. Thus, we shall not write and in formulas anymore, but we'll still use and . In summary, this means that
These numbers are usually infinite, but this isn't a serious problem. In case of , we could use density functions; and in case of infinite time, we could use limit points. It won't have to concern us.
Still, this is clearly not the entire story. One, it says nothing about how to proceed if one doesn't have access to the hypercomputer, which is always. Two, there are situations in which it doesn't give any justification to update, even though updating is clearly warranted, as we shall see. And three, it presupposes the ability to perfectly take all available information into account, no matter how hard it is to deduce its relevance. Then why did I call it a complete model in the headline?
Well, because it is a complete model, but not a model of all probability, merely one of randomness as I define it: given $O$ and $H$, the value of $P(H \mid O)$ as computed using the hypercomputer is random. In other words, randomness is all probability that is left even with full knowledge of past and future observations and perfect deductive ability, i.e. with access to the hypercomputer. Under this definition, randomness is fully compatible with determinism.
Let's deal with the third problem. In the model as laid out thus far, the probability may turn on aspects of whose importance we are not smart enough to deduce ourselves, even if they could in theory be explained to us. The solution is to, for each proposition , define an equivalence relation on , where
Then, given a set of observations , shall always condition on , that is, on the set of all possible combinations of observations that are indistinguishable in regard to from the perspective of the observer. In other words, after typing something into the blue box, will show all rather than just , and it will work with the instance set rather than . For example, in our opening question about existential risk, would be invariant under changes of the name of our species or the birthday of our pet, but a change in the number of functional nuclear weapons on the planet might lead to a different equivalence class. This is how the image might now look after the search (elements of that don't match are gray in this picture, rather than being left out entirely).
This leads naturally into the next definition. Relative to an observer, given $O$ and $H$, the value of $P(H \mid O)$ as computed using the hypercomputer conditioned on the equivalence class of $O$ is pseudo-random. What's left is being ignorant about knowledge available to the hypercomputer; any probability originating from that is defined as logical uncertainty (we shall look at this in more depth later). Taken together, (pseudo-)randomness and logical uncertainty are exhaustive: any probability is one or the other.
A note on terminology: I use the term logical uncertainty for every statement that is absolute, i.e. not relative to any places, observer, or world. I make no distinction between statements like "a third of all species inventing the wheel also invent air travel" and statements which are more strictly logical in nature, such as "".
Now, it's time to deal with the big problem: the hypercomputer is an idealized model that requires full knowledge of everything and is impossible to use. To have something more manageable, we introduce the semi-formal notion of an experiment. Relative to an observation $O$ and a hypothesis $H$, an experiment $E$ is defined as a sequence of random events that fully specify how often $O$ is made in each possible branch, and in which cases $H$ is true. Importantly, we do not demand to know everything about $E$; for example, we might not know the probability of internal random events. We do, however, demand that all randomness have static probabilities.
Whereas the hypercomputer runs on the data of the entire history of the universe (past and future), any one $E$ encompasses but a miniature slice of that information, just enough to specify a single situation. Whereas the hypercomputer is unimaginably large, any one $E$ may be specified in just a few sentences, or in a small graph. Given this new definition, the hypercomputer can be viewed as the universal experiment that gives perfect answers in every situation, as opposed to any one $E$, which will give answers for a single problem only. From now on, we shall tackle problems primarily by designing experiments (or, later, sets of possible experiments) that describe them.
Recall the case of a universe where one half of all species that have invented nukes is relatively safe, and the other half is relatively unsafe. Knowing this, we can update on $H$ based on the fact that $O$ is made more often by members of relatively safe species. This is what an experiment describing this problem might look like (the leaves being blue indicates that $O$ is being made from them, in our case just once):
Although this appears to have a very different structure than , we can obtain the same results. Recall that
In an experiment like the above, we obtain the value of by checking where ends up as we run it many times. Formally,
where is the set of instances of made in iterations of . Since all probabilities in are static, this limit always exists by the law of large numbers. One could obtain the same results by defining in terms of products from probabilities on branches. In this case, we have .
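The law-of-large-numbers argument can be illustrated by simulating the experiment for the half-safe, half-unsafe universe. This is only a sketch under the numbers used earlier (equal shares, survival chances 0.9 and 0.01); the function names and the fixed seed are my own choices.

```python
import random


def run_experiment(rng, p_safe=0.5, survive_safe=0.9, survive_unsafe=0.01):
    """One iteration: returns (observation_made, species_is_safe)."""
    safe = rng.random() < p_safe
    survived = rng.random() < (survive_safe if safe else survive_unsafe)
    return survived, safe


def count_instances(iterations, seed=0):
    """Count how often the survival observation is made in safe and unsafe species."""
    rng = random.Random(seed)
    safe_count = 0
    unsafe_count = 0
    for _ in range(iterations):
        observed, safe = run_experiment(rng)
        if observed and safe:
            safe_count += 1
        elif observed:
            unsafe_count += 1
    return safe_count, unsafe_count


safe_count, unsafe_count = count_instances(200_000)
proportion_safe = safe_count / (safe_count + unsafe_count)
```

As the number of iterations grows, `proportion_safe` settles near 90/91, the same value obtained by multiplying the probabilities along the branches.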
Take a moment to consider the way in which we went about reducing complexity here. Clearly, no formal model can create information specific to a problem; we don't know any more about extinction rates in the universe now than we did before. What knowing about the hypercomputer does is tell us which information matters, which instances of ignorance hurt, and why. For example, it is obvious that the universe won't actually look like this. Obviously, there aren't just safe and unsafe species out there and nothing in between. However, the model of the hypercomputer tells us that if the universe did look like this, then the above experiment would indeed output perfect results. Consequently, it also tells us that an experiment which does output perfect results exists. Thus, we can think of our uncertainty about this problem as uncertainty about what exactly this perfect experiment looks like.
Our modelling of imperfect deductive ability as an equivalence relation is implicit in the construction of $E$: by assigning the top branches a probability of $1/2$ each, we assert that there is no information aside from the length of time since the invention of nuclear weapons and the observation that we're still alive that helps us decide whether our species is safe or unsafe. Given that we have no point of comparison, this particular assumption might not even be unreasonable.
One potential problem with the hypercomputer that I left out is the question of whether it can be biased, or in other words, whether there exists a set of observations $O$ such that only a handful of instances of observing $O$ have ever occurred and will ever occur, and because of their low sample size, the proportion of times that a certain proposition was true would strike us as unreasonable even given perfect deductive ability. If that were the case, then we'd have instances of correct probabilities of random events be potentially dubious. But whether or not such instances exist, we can now see that it really doesn't matter. In order to construct an instance of $E$, we already rely merely on our expectations about what randomness looks like, and thus, whether or not the true random numbers would seem reasonable to us is of no concern.
All in all, experiments are far more manageable than the hypercomputer. But they don't answer the question of how to deal with logical uncertainty. This is what we turn to next.
Logical Uncertainty
Let's consider a variant of the most recent problem: suppose we know that we are not in the universe where species are split equally across both sides, rather we know that we are in one of the other two universes – the one where everyone is safe or the one where everyone is unsafe – but we don't know in which; for all we know, both are equally likely.
In both cases, we have a prior of $1/2$ on $H$, and both times we want to update based on an ostensibly identical observation $O$. But, clearly, they are different. In the former case, we can construct $E$ so that we know it outputs correct probabilities. In the latter case, we can't. Certainly, it is wrong to model $E$ as starting off with a coin toss: this would imply that repeated iterations can go down different branches, but that is not so. Rather, $E$ consists of one random event only; this random event has a fixed probability, we just don't know it. Put differently, there are two possible experiments $E_1$ and $E_2$, one of which is running all the time, but we don't know which one it is.
In the former case, we can justify our updates by reasoning that $O$ is made 90 times as often when $H$ is true as when it's false. In the latter case this is not so; $H$ is either always true or always false.
Nonetheless, one might still wish to update in the second case. The mere fact that a particular argument doesn't work hardly proves that it should behave any differently. Indeed, doubting that the nature of probability plays a role could lead one to conclude that the second case must behave just like the first. This brings us to relevant points which have already been made; let's review the 2002 debate between Olum and [Bostrom and Ćirković].
Suppose that God tosses a fair coin. If it comes up heads, he creates ten people, each in their own room. If tails, he creates one thousand people, each in their own room. The rooms are numbered 1-10 or 1-1000. The people cannot see or communicate with the other rooms. Suppose that you know all this, and you discover that you are in one of the first ten rooms. How should you reason that the coin fell?
To illustrate the difference, he then lays out two different experimental protocols that give the SIA answer of $1/2$ and the SSA answer of $100/101$, respectively (for heads).
Imagine that you are one of a very large number of experimental subjects who have been gathered, in case of need, into an experimental pool. Each subject is in a separate waiting room and cannot communicate with the others. First the experiment will be described to you, and then it will be performed. The experiment will have one of the following two designs.
Protocol 1 (random): The experimenter will flip a fair coin. If the coin lands heads, she will get ten subjects, chosen randomly from the pool, and put them in rooms numbered 1-10. If the coin lands tails, she will do the same with one thousand subjects in rooms numbered 1-1000.
Protocol 2 (guaranteed): The experimenter will flip a fair coin. If the coin lands heads, she will get you and nine other subjects, and put you randomly into rooms numbered 1-10. If the coin lands tails, she will get you and 999 other subjects and put you randomly into rooms numbered 1-1000.
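The two protocols can be compared by direct calculation. The sketch below computes the probability of heads given that you find yourself in one of rooms 1-10; the function names are hypothetical, and `pool_size` (the size of the subject pool, which Olum only describes as "very large") is an assumed parameter that turns out to cancel.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def p_heads_random(pool_size, small=10, large=1000):
    """Protocol 1: subjects are drawn at random from the pool."""
    if pool_size < large:
        raise ValueError("pool must hold at least `large` subjects")
    # Heads: you must be drawn (small/pool_size); every room is then among 1..small.
    heads = HALF * Fraction(small, pool_size)
    # Tails: you must be drawn (large/pool_size) and land in rooms 1..small.
    tails = HALF * Fraction(large, pool_size) * Fraction(small, large)
    return heads / (heads + tails)


def p_heads_guaranteed(small=10, large=1000):
    """Protocol 2: you take part no matter how the coin lands."""
    heads = HALF * 1
    tails = HALF * Fraction(small, large)
    return heads / (heads + tails)
```

Under the random protocol, the higher chance of being drawn on tails exactly offsets the lower chance of a low room number, giving 1/2; under the guaranteed protocol only the room number carries information, giving 100/101.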
He argues for protocol 1 on the basis that it's symmetric with respect to participants, and that protocol 2 introduces a sharp discontinuity among the number of subjects, among other arguments. In their reply, Bostrom and Ćirković start off with a thought experiment that has since come to be known as the Presumptuous Philosopher problem:
It is the year 2100 and physicists have narrowed down the search for a theory of everything to only two remaining plausible candidate theories, $T_1$ and $T_2$ (using considerations from super-duper symmetry). According to $T_1$, the world is very, very big but finite and there are a total of a trillion trillion observers in the cosmos. According to $T_2$, the world is very, very, very big but finite and there are a trillion trillion trillion observers. The super-duper symmetry considerations are indifferent as between these two theories. Physicists are preparing a simple experiment that will falsify one of the theories. Enter the presumptuous philosopher: “Hey guys, it is completely unnecessary for you to do the experiment, because I can already show to you that $T_2$ is about a trillion times more likely to be true than $T_1$!” (whereupon the philosopher runs the argument that appeals to SIA).
SIA and FNC have no tools to differentiate between the two problems. They just have to bite the bullet and agree that the philosopher is correct. But for us, the two experiments are wildly different. The probability in God's Coin Toss is random, at least if many worlds is true. We know the exact configuration of the underlying experiment $E$: by the definition of a fair coin toss, it has to come up heads in half of all worlds, so instances of observing "I am in room 1-10" are split evenly across cases where the coin has come up heads and tails.
The probability in the above problem, on the other hand, appears to be logically uncertain; we don't know what $E$ looks like. As it is run many times, the hypothesis "$T_2$ is true" is either always true or never, but which of those two holds depends entirely on a logically uncertain fact, a property of $E$ that we don't know.
How does one deal with a logically uncertain statement? Certainly, this isn't an easy problem. There's nothing analogous to randomness as we defined it because the proposition isn't relative to an observer: it is either always true or always false. However, it is still coherent to talk about probability in the Bayesian sense, as subjective uncertainty.
SIA and FNC don't seem to have the right answer, given that both of them agree with the philosopher. If the presumptuous philosopher problem isn't convincing to you, however, consider the following variant. Let there be two logically uncertain propositions, one of them false and the other true. Both propositions relate to the size of the universe in two independent ways: for each, the expected number of observers in the cosmos would be 1000 times as large if it were true rather than false. This means that, if both were false, there would be $N$ observers; if one were true and the other false, there would be $1000N$; and if both were true, there would be $1000000N$. Given that one proposition is true and the other false, the actual number of observers is $1000N$.
Both propositions are extraordinarily hard to falsify, so for a long time, physicists assign both of them a prior of $1/2$. And, just like in the presumptuous philosopher problem, SIA demands one favor the possibility with the most observers, which in this case means asserting that both propositions are almost certainly true. But, if everyone reasons like this, the result is that all people are horribly calibrated. Alternatively, if many worlds is true, all people are horribly calibrated in every world.
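How badly calibrated? The following sketch weights each of the four truth-value combinations by its prior and by its number of observers, which is my reading of what SIA prescribes here. The function name is hypothetical, and the base number of observers is left out because it cancels in the normalization.

```python
from fractions import Fraction
from itertools import product


def sia_posterior(factor=1000, prior=Fraction(1, 2)):
    """SIA-style posterior over the truth values of two independent
    propositions, each multiplying the observer count by `factor` if true."""
    weights = {}
    for a, b in product([False, True], repeat=2):
        p = (prior if a else 1 - prior) * (prior if b else 1 - prior)
        observers = factor ** (int(a) + int(b))  # relative observer count
        weights[(a, b)] = p * observers
    total = sum(weights.values())
    return {combo: weight / total for combo, weight in weights.items()}


post = sia_posterior()
both_true = post[(True, True)]      # what every SIA reasoner believes
actual_case = post[(True, False)]   # what is actually the case
```

With these numbers, "both true" receives about 99.8% of the probability mass, while the actual state of affairs receives about 0.1%.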
On the other hand, clearly some updates on logically uncertain propositions must be possible, otherwise we couldn't do physics. Yet, if we're not allowed to count instances for logically uncertain propositions, then there is no justification for updating on anything by relying on the hypercomputer. Even if we wanted to test whether coins come up heads half of the time or all of the time, flipped a coin 100 times and got 100 heads, the hypercomputer wouldn't give us any justification to update towards the heads-only universe. Either we live in the heads-only universe or we don't; if we do, all observations of 100 times heads happen in the heads-only universe, and if we don't, all observations of 100 times heads happen in the heads-sometimes universe. If we're not allowed to count instances...
The problem is not that the hypercomputer is flawed. The problem is that the hypercomputer is designed to output answers to questions of randomness. It can tell an observer where she is within the universe, but it is utterly useless for figuring out things about the universe. And this is a fundamental difference. It is why everything we have typed into the hypercomputer, observations and propositions alike, has been a statement relative to an observer. Formally, they have all been formulas that are free in one variable. And this has to be so; once again, the hypercomputer is there to help an observer figure out her place in the universe, so naturally, the observation has to be relative to her. And if the proposition were absolute, well then it would be a trivial example, as its value would either be zero or one, regardless of the observation. Remember that, by assumption, the hypercomputer already knows everything there is to know about the universe.
If we wish to update our probability on a logically uncertain proposition, it is crucial that the opposite be the case: both the proposition and the observation must be absolute so that they describe the universe, not the observer. Both must be complete sentences in which every variable is bound. Given that, updates are possible and may be calculated via Bayes.
To hammer down the difference, when it comes to updating logical uncertainty, we shall no longer refer to the object we update on as an observation but as a fact, and denote it as $F$ rather than $O$.
Let's go back to our initial example. We know we either live in the universe where every species is safe, or in the universe where every species is unsafe, but between the two, it's a toss-up. Can we update our probability based on still being alive?
First, we need to define $H$ and $F$. The former is easy; "every species in the universe is relatively safe" is a fine choice for $H$, as it is a sentence with no free variables that does what we want. $F$ is a bit trickier. It can't be "I am alive" interpreted as an observation like we did before, as that observation is made all the time by everyone. Actually, it cannot be any observation, or at least it's better not to think of it as one, since the term has a subjective connotation, particularly as it relates to the hypercomputer. Instead, what we want $F$ to be is "I have conscious experience right now". This may sound like it's relative to an observer, but it absolutely isn't. It refers to a particular consciousness, namely yours, and its truth value is dependent entirely on the universe we live in. As such, it is no different from the statement "there are more than a trillion hydrogen atoms right now"; both are simply true, and neither you nor anyone else has any say on the matter. Thus,
$$P(H \mid F) = \frac{p_s\,P(H)}{p_s\,P(H) + p_u\,P(\neg H)}$$
where $p_s$ and $p_u$ are the probabilities for you to have conscious experience right now given a safe and unsafe universe, respectively.
It is easy to see that the value of this term depends on quantum physics and on unknown facts about consciousness. If we live in a classical universe in which personal death implies the permanent end of subjective experience, then we presumably have $p_s > p_u$, in which case $P(H \mid F) > P(H)$. If many worlds is true and you share consciousness with all of your copies, one might or might not still argue for $p_s > p_u$ on the basis that one would be more likely to have been born at all. If we do have $p_s = p_u$, then $P(H \mid F) = P(H)$; no update takes place.
This suffices to remedy the scientific method. In the coin flip example, one can set $F :=$ "the coin I flipped has come up heads a hundred times" and update via Bayes, thereby getting the result one would expect. Once again, one mustn't be confused by the appearance of the word "I" in the sentence; it does not mean that it is relative to anything, as we now take "I" to mean one particular observer (which we never did when working with the hypercomputer). The difference may be subtle, but it is real and important.
Care must be taken to do this properly. For example, even if many worlds is true and you share consciousness with all of your copies, one might still be tempted to update on "I am alive", where "I" is meant to refer to the current instance of oneself in particular, rather than being left unbound. Then it's not free in any variable, right? But this is an instance of the survivorship bias. If a million people jump off a cliff and you know at least one survived, and you talk to a random survivor, then the fact that the person you're talking to survived is not a surprise. This is what happens when we condition on "I am alive", interpreted as above. But updating on "I have conscious experience right now" is different, because the "I" in the sentence refers to the entire tree of your copies, not one leaf in particular. Now, it's like noting that at least one of the cliff-jumpers is still alive, which might be a trivial fact, but it is not an instance of the survivorship bias. The survivorship bias applies if the elements we refer to as fulfilling a property have been pre-selected to fulfill that property. This is not the case when noting that there are any survivors, but it is the case when we already know that there are survivors, talk to a random one among them, and then note that that person is a survivor.
The presumptuous philosopher problem also depends on facts about consciousness, which strikes me as exactly the right answer. If it is indeed true that we win a lottery at birth, are tossed into a single body, and cease to exist once it dies, then the odds of winning this lottery must be a trillion times higher in the case of $T_2$, and then the problem must be isomorphic to the variant where both $T_1$ and $T_2$ predict $10^{36}$ people to have existed initially, but in the case of $T_1$, $10^{36} - 10^{24}$ of them have died as the universe has shrunk due to super-duper-evil physical phenomenon $X$. Claiming to live in $T_1$ is thus equivalent to claiming that one has been among the relatively few people who have been close enough to the center of the universe to be spared by $X$. This seems no different from claiming that one has survived 70 highly dangerous years of nuclear weapons. In both cases, this is evidence for preferring the theory with a safe humanity / a universe where $X$ has not occurred. Conversely, if one imagines oneself to exist either way, this is of no concern.
Below is a summary, which also includes what would happen if the choice between $T_1$ and $T_2$ were random (we will see how this could be the case in the next section).
Re: updating Random Probability
Recall that one of the problems with the approach for updating random probability has been that it requires the full knowledge contained in the hypercomputer. We've since covered updating logically uncertain propositions, but this leaves open the question of how to update a proposition in a case where we are logically uncertain about things relevant to the proposition, but where the proposition itself is random.
As an example, consider the version of Sleeping Beauty where she isn't asked about her probability estimate on the coin being heads, but about her probability estimate on it being Monday. According to SSA, the answer is $3/4$. We, on the other hand, would count instances of observations, and conclude that the answer is $2/3$. SIA and FNC, giving $1/3$ in the original problem, would answer likewise.
But now consider a second variant, where the experiment's protocol doesn't rely on a coin flip but on the value of the chromatic number of the plane (which the experimenters have secretly found out but not yet published). Suppose the protocol demands Sleeping Beauty be interviewed once per day, for as many days as the value of the number, that is, either 5, 6, or 7 times. (That means the first interview is on Monday, and the last is either on Friday, Saturday, or Sunday.) Being an amateur mathematician, Sleeping Beauty doesn't know anything about the problem except for the lower and upper bound, so her probability distribution on the value of the chromatic number is simply uniform: $P(5) = P(6) = P(7) = \frac{1}{3}$.
As she is being interviewed, how likely, from her perspective, is it Monday? Going by SSA, all possibilities for the chromatic number are equally likely, and so
$$P(\text{Monday}) = \frac{1}{3} \cdot \frac{1}{5} + \frac{1}{3} \cdot \frac{1}{6} + \frac{1}{3} \cdot \frac{1}{7} = \frac{107}{630} \approx 0.17$$
Going by SIA, the probabilities are proportional to their number of observations, and so
$$P(\text{Monday}) = \frac{5}{18} \cdot \frac{1}{5} + \frac{6}{18} \cdot \frac{1}{6} + \frac{7}{18} \cdot \frac{1}{7} = \frac{3}{18} = \frac{1}{6}$$
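Both calculations follow one pattern, which a short sketch makes explicit. The two function names are my own, and the input is an assumed mapping from the number of interview days to its prior probability; exactly one of those days is a Monday.

```python
from fractions import Fraction


def p_monday_ssa(dist):
    """Average the within-world chance of Monday over the prior."""
    return sum(p * Fraction(1, days) for days, p in dist.items())


def p_monday_sia(dist):
    """Weight each world by its number of interviews, then take the
    share of all interviews that fall on a Monday."""
    expected_interviews = sum(p * days for days, p in dist.items())
    expected_monday_interviews = sum(dist.values())  # one Monday per world
    return Fraction(expected_monday_interviews) / expected_interviews


third = Fraction(1, 3)
chromatic = {5: third, 6: third, 7: third}

half = Fraction(1, 2)
coin = {1: half, 2: half}  # the coin-flip variant: one or two interviews
```

Applying the same two functions to `coin` gives the figures for the coin-flip variant, so the only thing that changes between the two versions is where the distribution over the number of interviews comes from.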
This experiment is meant to illustrate both how my theory deals with this kind of uncertainty in principle, and why the "principled" approach is usually false.
Said approach is straight-forward. Given a random proposition and a logically uncertain proposition with possibility space and prior distribution , we define in the obvious way.