Self-indication assumption is wrong for interesting reasons

post by neq1 · 2010-04-16T04:51:23.166Z · LW · GW · Legacy · 24 comments

Contents

  Argument for SIA posted on Less Wrong
  General argument for SIA and why it's wrong
  Egotism
24 comments

The self-indication assumption (SIA) states that

Given the fact that you exist, you should (other things equal) favor hypotheses according to which many observers exist over hypotheses on which few observers exist.

The reason this is a bad assumption might not be obvious at first.  In fact, I think it's very easy to miss.

Argument for SIA posted on Less Wrong

First, let's take a look at an argument for SIA that appeared at Less Wrong (link).  Two situations are considered.

1.  We imagine that there are 99 people in rooms that have a blue door on the outside (1 person per room).  One person is in a room with a red door on the outside.  It was argued that you are in a blue door room with probability 0.99.

2.  Same situation as above, but first a coin is flipped.  If heads, the red door person is never created.  If tails, the blue door people are never created.  You wake up in a room and know these facts.  It was argued that you are in a blue door room with probability 0.99.

So why is 1 correct and 2 incorrect?  The first thing to be careful about is not treating yourself as special.  The fact that you woke up just tells you that at least one conscious observer exists.

In scenario 1 we basically just need to know what proportion of conscious observers are in a blue door room.  The answer is 0.99.

In scenario 2, you never would have woken up in a room if you hadn't been created, so the fact that you exist is something we have to take into account.  We don't want to estimate P(a randomly selected person, regardless of whether they exist or not, is in a blue door room); that would ignore the fact that you exist.  Instead, the fact that you exist tells us that at least one conscious observer exists.  Again, we want to know what proportion of conscious observers are in blue door rooms.  There is a 50% chance (if the coin landed heads) that all conscious observers are in blue door rooms, and a 50% chance (if it landed tails) that all conscious observers are in red door rooms.  Thus, the marginal probability of a conscious observer being in a blue door room is 0.5.

The flaw in the more detailed Less Wrong proof (see the post) lies in the move from step C to step D.  The *you* being referred to in step A might not exist to be asked the question in step D.  You have to take that into account.
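
To make the two tallies in scenario 2 concrete, here is a small simulation sketch (the code and trial count are mine, added purely for illustration; the claim above is that the second, world-level tally is the relevant one, because the fact that you exist only tells us that some observer exists):

    import random

    TRIALS = 100000
    observers_total = 0
    observers_blue = 0
    worlds_all_blue = 0

    for _ in range(TRIALS):
        if random.random() < 0.5:      # heads: the 99 blue-door people are created
            observers_total += 99
            observers_blue += 99
            worlds_all_blue += 1
        else:                          # tails: only the red-door person is created
            observers_total += 1

    print(observers_blue / observers_total)  # about 0.99: fraction of created observers behind blue doors
    print(worlds_all_blue / TRIALS)          # about 0.5: fraction of worlds in which every observer is behind a blue door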

General argument for SIA and why it's wrong

Let's consider the assumption more formally.

Assume that the number of people to be created, N, is a random draw from a discrete uniform distribution[1] on {1,2,...,Nmax}.  Thus, P(N=k)=1/Nmax, for k=1,...,Nmax.  Assume Nmax is large enough so that we can effectively ignore finite sample issues (this is just for simplicity).

Assume M = Nmax*(Nmax+1)/2 possible people exist, and arbitrarily label them 1,...,M.  After the size of the world, say N=n, is determined, we randomly draw n people from the M possible people.

After the data are collected we find out that person x exists.

We can apply Bayes' theorem to get the posterior probability.  Since person x is equally likely to be any of the M possible people, P(x exists|N=k) = k/M, and so

P(N=k|x exists) = (k/M)(1/Nmax) / [sum over j of (j/M)(1/Nmax)] = k/(1+2+...+Nmax) = k/M, for k=1,...,Nmax,

where the last step uses 1+2+...+Nmax = Nmax*(Nmax+1)/2 = M.

The prior probability was uniform, but the posterior favors larger worlds.  QED.

Well, not really.

The flaw here is that we conditioned on person x existing, but person x only became of interest after we saw that they existed (peeked at the data).

What we really know is that at least one conscious observer exists -- there is nothing special about person x.

So, the correct conditional probability is:

P(N=k|someone exists)=1/Nmax, for k=1,...,Nmax.

Thus, prior=posterior and SIA is wrong.
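
For concreteness, here is a minimal simulation sketch of the two conditionals (the variable names are mine and Nmax is kept small, purely for illustration):

    import random
    from collections import Counter

    N_MAX = 10                        # small world-size cap, just for illustration
    M = N_MAX * (N_MAX + 1) // 2      # number of possible people, as in the setup above
    TRIALS = 200000

    given_x_exists = Counter()        # condition: the pre-labeled person 0 exists
    given_someone_exists = Counter()  # condition: at least one person exists (always true here)

    for _ in range(TRIALS):
        n = random.randint(1, N_MAX)              # world size, uniform prior
        created = random.sample(range(M), n)      # which possible people get created
        if 0 in created:                          # "person x" was labeled before the draw
            given_x_exists[n] += 1
        given_someone_exists[n] += 1              # someone always exists, since n >= 1

    total_x = sum(given_x_exists.values())
    total_any = sum(given_someone_exists.values())
    for k in range(1, N_MAX + 1):
        print(k,
              round(given_x_exists[k] / total_x, 3),         # approaches k/M
              round(given_someone_exists[k] / total_any, 3)) # approaches 1/N_MAX

The first column of estimates grows roughly linearly in k, which is the SIA-style posterior k/M; the second stays flat at 1/Nmax, which is the point being made above.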

Egotism

The flaw with SIA that I highlighted here is that it treats you as special, as if you were labeled ahead of time.  But the reality is, no matter who was selected, they would think they are the special person.  "But I exist, I'm not just some arbitrary person.  That couldn't happen in a small world.  It's too unlikely."  In reality, the fact that I exist just means someone exists.  I only became special after I already existed (peeked at the data and used it to construct the conditional probability).

Here's another way to look at it.  Imagine that a random number between 1 and 1 trillion was drawn.  Suppose 34,441 was selected.  If someone then asked what the probability of selecting that number was, the correct answer would be 1 in 1 trillion.  They could then argue, "that's too unlikely an event.  It couldn't have happened by chance."  However, because they didn't identify the number(s) of interest ahead of time, all we can really conclude is that a number was drawn, and drawing a number was a probability 1 event.

I give more examples of this here.

I think Nick Bostrom is getting at the same thing in his book (page 125):

..your own existence is not in general a ground for thinking that hypotheses are more likely to be true just by virtue of implying that there is a greater total number of observers. The datum of your existence tends to disconfirm hypotheses on which it would be unlikely that any observers (in your reference class) should exist; but that’s as far as it goes. The reason for this is that the sample at hand—you—should not be thought of as randomly selected from the class of all possible observers but only from a class of observers who will actually have existed. It is, so to speak, not a coincidence that the sample you are considering is one that actually exists. Rather, that’s a logical consequence of the fact that only actual observers actually view themselves as samples from anything at all

Related arguments are made in this LessWrong post.  


[1] For simplicity I'm assuming a uniform prior... the prior isn't the issue here

24 comments

Comments sorted by top scores.

comment by byrnema · 2010-04-16T19:09:23.265Z · LW(p) · GW(p)

Many forms of the anthropic argument just don't hold water. You can bend over backwards to find the fault in the logic, and I applaud your effort here to do that.

I think an easier way to dismiss the set of arguments is to think of two different cases, one in which there are few observers and one in which there are many, and then ask how the subjective observer could use the anthropic argument to distinguish these two. She can't.

Then these arguments can be discounted with the line of reasoning that says if a theory can't tell you which world you're in, then it predicts everything, so it tells you nothing. (Evidence for a given theory is the observation of an event that is more likely to occur if the theory is true than if it is false.)

Consider the argument that since we're observers at a relatively early time in human technological development, this means we should update that there is a higher probability that humans don't persist for a hugely long time after. This argument kind of makes sense when worded exactly as "if humans persisted for billions of years, what is the probability I would be a human in the first .005 billion years?".

But the way to test if that line of reasoning works is to ask, suppose you have two realities, one in which humans persisted for .1 billion years and one in which humans persisted for 100 billion years. How could the set of observers at .005 billion years use the anthropic argument to distinguish between the two? They couldn't. The anthropic argument has no power to select among these two realities; the anthropic principle predicts exactly the same set of observations for the set of observers at time point .005 billion years for the two different realities. Likewise, consider that there are 50 red rooms or 5000 red rooms, and one blue door. The person who wakes in the blue room has no evidence about the number of red rooms, because her observations (a blue room) are exactly the same for both cases.

comment by Tehom · 2010-04-16T22:15:38.073Z · LW(p) · GW(p)

One suggestion, instead of putting "proof" in suspicion-quotes, you could say "argument" instead. A proof is just an air-tight argument.

Replies from: magfrump, neq1
comment by magfrump · 2010-04-18T22:56:40.084Z · LW(p) · GW(p)

Voted up for use of the term "suspicion-quotes."

comment by neq1 · 2010-04-17T00:25:10.595Z · LW(p) · GW(p)

Good point. Thanks

comment by Jordan · 2010-04-16T18:29:04.559Z · LW(p) · GW(p)

Interesting. Consider a more realistic case:

There is a large pool of people (say, the entire world), and we randomly select from that pool to fill the rooms (either picking 1 person or 99 people, depending on the toss of the coin). In this case you should conclude that the door is mostly likely blue, because you must condition on the chance of having been selected at all (which is 99 times greater if 99 people are randomly chosen instead of 1).

The question is: does the fact "I was chosen randomly from a pool" in this example affect the probability in the same way that the fact "I exist at all" does in your original example, where the people are created rather than chosen?

If we use frequentist logic and repeat the experiment, then those people that guess 'blue' will be right 99% of the time. But frequentist logic is not to be trusted, I guess?
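
A quick simulation sketch of that frequentist count (the pool size and the assumption that every selected person guesses "blue" are illustrative choices, not part of the scenario as stated):

    import random

    POOL = 10000      # assumed size of the pre-existing pool (anything >= 99 works)
    TRIALS = 20000

    selected = 0
    selected_behind_blue = 0

    for _ in range(TRIALS):
        if random.random() < 0.5:                    # heads: 99 people go into blue-door rooms
            chosen = random.sample(range(POOL), 99)
            selected_behind_blue += len(chosen)
        else:                                        # tails: 1 person goes into the red-door room
            chosen = random.sample(range(POOL), 1)
        selected += len(chosen)

    print(selected_behind_blue / selected)           # about 0.99

Counting per selected person gives about 0.99; counting per run of the experiment gives 0.5, which is essentially the disagreement in the replies below.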

Replies from: neq1
comment by neq1 · 2010-04-16T19:14:11.838Z · LW(p) · GW(p)

If you repeat the experiment, does who you call I stay the same (e.g., they might not get selected at all)? If so, then that person was labeled as special a priori, and if they find themself in a room, then the probability of blue is 0.99.

But.. I'm arguing that anyone who is selected, whether it is one person or 99 people, will all think of themselves as I. When you think of frequentist properties then, you have to think about the label switching each time. That changes everything. The fact that you were selected just means that someone was selected, and that was a probability 1 event. Thus, probability of blue door is .5.

Replies from: Jordan, orthonormal
comment by Jordan · 2010-04-17T00:52:24.184Z · LW(p) · GW(p)

The frequentist perspective requires no special labeling. It's an outside observation that requires no concept of I. Keeping tabs on all the results we would simply see that 99% of people would have been correct to guess they were in a blue room.

Replies from: Tyrrell_McAllister
comment by Tyrrell_McAllister · 2010-04-17T23:01:02.772Z · LW(p) · GW(p)

The frequentist perspective requires no special labeling. It's an outside observation that requires no concept of I. Keeping tabs on all the results we would simply see that 99% of people would have been correct to guess they were in a blue room.

It's hard to keep the indexicals out. You used one yourself when you wrote "99% of people would have been correct to guess they were in a blue room." (Emphasis added.) Granted: this particular use of an indexical can be avoided by writing "99% of people put into rooms would have been correct to guess that everyone who is in a room was in a blue room." Then there are no indexicals in the conclusions that the people reach.

However, there is still an indexical implicit in each person's reasoning procedure. Each person reasons according to the following rule: "If I am put into a room, guess that everyone who is in a room is in a blue room."

That, as I understand it, is neq1's point. If indexicals are ruled out of the reasoning process, then the people in your scenario can get no further than "If someone is put into a room, then guess that everyone who is in a room is in a blue room." With this reasoning procedure, only half the people will guess right.

Replies from: Jordan
comment by Jordan · 2010-04-18T03:20:33.136Z · LW(p) · GW(p)

I did use an indexical, you're right, damn.

If you rule out indexicals completely how can you even begin to reason about the probability of a statement ("I am in a blue room") that uses an indexical?

Replies from: Tyrrell_McAllister
comment by Tyrrell_McAllister · 2010-04-18T23:09:18.722Z · LW(p) · GW(p)

If you rule out indexicals completely how can you even begin to reason about the probability of a statement ("I am in a blue room") that uses an indexical?

We shouldn't rule out indexicals in your scenario, but we should understand their meaning in a non-indexical way.

In your scenario, where everyone in the pool of people exists, we can just suppose that each person has a unique identifier, such as a unique proper name. Then, for each proper name N, the person named "N" can reason according to the rule "Upon learning that N is in a room, guess that N is in a blue room." This allows them to achieve the 0.99 success rate that indexical reasoning allows.

[ETA: Note that this means that each person N is employing a different rule. This is reasonable because N will have learned that information regarding N is especially reliable. We can imagine minds that could go through this reasoning process without ever thinking to themselves "Hey, wait a minute — I myself am N."]

In real life, people share proper names. But we can still suppose that each person can be picked out uniquely with some set of non-indexical properties.

For example, there might be more than one person who is named "Bob". There might be more than one person who is named "Bob" and was born on January 8th, 1982. There might even be more than one person who is named "Bob", was born on January 8th, 1982, and has red hair. But, if we keep adding predicates, we can eventually produce a proper definite description that is satisfied by exactly one person in the pool.

This is what justifies the kind of indexical reasoning that works so well in your scenario.

What makes the scenario in the OP different is this: Some of the possible people in the "pool" are distinguished from the others only by whether they exist. The problem here is that existence is not a predicate (according to most analytic philosophers). Thus, "exists" is not among the properties that we can use to pick out a unique individual with a proper definite description. That's what makes it problematic to carry over indexical reasoning to the scenario in the OP.

Replies from: Jordan
comment by Jordan · 2010-04-21T01:15:19.070Z · LW(p) · GW(p)

Interesting. Thanks for clarifying that.

Regardless of whether "I" is a valid index in this case, though, certainly "person P used the word 'I' and concluded 'I am in a blue room'" is a valid predicate, even if person P's use of "I" was gibberish.

We can then say that 99% of people, if they concluded that gibberish, would have gone on to conclude the gibberish, "I was, in fact, right to conclude that I was in a blue room."

comment by orthonormal · 2010-04-16T21:03:50.960Z · LW(p) · GW(p)

Um, no. It's not even controversial that you're wrong in this case.

(For purposes of intuition, let's say there are just 100 people in the world. Do you really think that finding yourself selected is no evidence of blue?)

Replies from: Tyrrell_McAllister
comment by Tyrrell_McAllister · 2010-04-16T21:12:12.752Z · LW(p) · GW(p)

Um, no. It's not even controversial that you're wrong in this case.

About what, precisely, is neq1 wrong? neq1 agreed with Jordan that the probability of blue in Jordan's scenario was 0.99. However, as neq1 rightly points out, in Jordan's scenario a specific individual is distinguished prior to the experiment. This doesn't happen in neq1's scenario.

Replies from: orthonormal
comment by orthonormal · 2010-04-16T21:52:32.210Z · LW(p) · GW(p)

If neq1 was saying that any person who finds themselves selected in that scenario should conclude "blue" with probability 0.99, then I've misunderstood his/her last sentence.

Replies from: neq1, Tyrrell_McAllister
comment by neq1 · 2010-04-17T11:31:46.573Z · LW(p) · GW(p)

It's a hidden label switching problem.

If Laura exists, she'll ask P(blue door | Laura exists). Laura=I

If Tom exists, he'll ask P(blue door | Tom exists). Tom=I

If orthonormal exists, s/he will ask P(blue door | orthonormal exists). orthonormal=I

and so on. Notice how the question we ask depends on the result of the experiment? See how the label switches?

What do Tom, Laura and orthonormal have in common? They are all conscious observers.

So, if orthonormal wakes up in a room, what orthonormal knows is that at least one conscious observer exists. P(blue room | at least one conscious observer exists)=0.5

comment by Tyrrell_McAllister · 2010-04-16T23:51:09.283Z · LW(p) · GW(p)

neq1's first paragraph refers to Jordan's scenario. neq1's second paragraph alters the scenario to be more like the one in the OP. In the altered version, we view the situation "from the outside". We have no way to specify any particular individual as I before the experiment begins, so our reasoning can only capture the fact that someone ended up in a room. Since we already knew that that would happen, we are still left with the prior probability of .5 that the coin came up heads.

comment by byrnema · 2010-04-16T19:54:14.879Z · LW(p) · GW(p)

You may want to make a link to this post. There were a few different descriptions of why that problem (a very similar one) didn't work, and the one I pinpointed as "the pointer problem" is more or less the same as the one you pinpointed.

The flaw here is that we conditioned on person x existing, but person x only became of interest after we saw that they existed (peeked at the data).

Perhaps we could call the error a "pre-selection bias"?

(In a separate daughter comment, I'll summarize the anthropic problem described in the other post, emphasizing the similarity of the problems and the solutions. )

Replies from: byrnema, neq1
comment by byrnema · 2010-04-16T19:54:57.385Z · LW(p) · GW(p)

In a nutshell, the simplified problem in the post was this: You have a hotel with green and red rooms, 4 of one color and 1 of another. If you ask an observer at random which case they think it is, on average they will be correct 80% of the time. However, if you ask someone in a green room, they will only be correct 50% of the time.

(Here's the detailed explanation. Skip if you prefer.)


Suppose you ask a random person. 10 trials would look like this on average:

  • GGGGR -- green room guy says '4 green' and is correct
  • GGGGR -- green room guy says '4 green' and is correct
  • GGGGR -- green room guy says '4 green' and is correct
  • GGGGR -- green room guy says '4 green' and is correct
  • GGGGR -- red room guy says '4 red' and is incorrect
  • RRRRG -- red room guy says '4 red' and is correct
  • RRRRG -- red room guy says '4 red' and is correct
  • RRRRG -- red room guy says '4 red' and is correct
  • RRRRG -- red room guy says '4 red' and is correct
  • RRRRG -- green room guy says '4 green' and is incorrect

On average, the observers are correct 80% of the time because the frequency of a red versus green observer is information about the true distribution.

Suppose you ask a person in a green room. In this case, 10 trials on average would look like this:

  • GGGGR -- green room guy says '4 green' and is correct
  • GGGGR -- green room guy says '4 green' and is correct
  • GGGGR -- green room guy says '4 green' and is correct
  • GGGGR -- green room guy says '4 green' and is correct
  • GGGGR -- green room guy says '4 green' and is correct
  • RRRRG -- green room guy says '4 green' and is incorrect
  • RRRRG -- green room guy says '4 green' and is incorrect
  • RRRRG -- green room guy says '4 green' and is incorrect
  • RRRRG -- green room guy says '4 green' and is incorrect
  • RRRRG -- green room guy says '4 green' and is incorrect

Now, the observers are only correct 50% of the time because their distribution doesn't reflect the true distribution. You skewed the frequency of green roomers by pre-selecting green.
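
A short simulation sketch of those two asking rules (the 4-vs-1 room counts come from above; the code itself is just an illustration):

    import random

    TRIALS = 100000
    correct_random_occupant = 0
    correct_green_occupant = 0

    for _ in range(TRIALS):
        majority = random.choice(["green", "red"])                       # 4 rooms of this color
        rooms = [majority] * 4 + (["red"] if majority == "green" else ["green"])

        # Rule 1: ask a randomly chosen occupant; they guess their own color is the majority
        if random.choice(rooms) == majority:
            correct_random_occupant += 1

        # Rule 2: always ask someone in a green room; they guess "4 green"
        if majority == "green":
            correct_green_occupant += 1

    print(correct_random_occupant / TRIALS)  # about 0.8
    print(correct_green_occupant / TRIALS)   # about 0.5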


This was my summary solution to the problem:

When a person is asked to make a prediction based on their subjective observation, they should agree only if they are randomly chosen to be asked independently of their room color. If they were chosen after the assignment, dependent upon having a certain outcome, they should recognize this as [what we might call] pre-selection bias. Your prediction is meaningful only if your prediction could have been either way.

comment by neq1 · 2010-04-17T00:27:01.297Z · LW(p) · GW(p)

Thanks for directing me to that post.

I think calling it 'pre-selection bias' makes sense. Would be good to have a name for it, as it's an error that is common and easy to miss.

comment by Unknowns · 2010-04-16T13:20:25.662Z · LW(p) · GW(p)

I upvoted this because it is correct, but unfortunately you won't be able to persuade anyone. We had this whole argument on the discussion you linked to.

comment by PhilGoetz · 2010-05-07T19:44:52.067Z · LW(p) · GW(p)

Sorry, I disagree. If you repeated the coin-flip 200 times, there would be (on average) 9900 people in blue rooms, and 100 people in red rooms. You know only that you are one of these 10000 people. The odds are 99% you are one of those behind a blue door.

comment by CronoDAS · 2010-04-16T18:05:09.365Z · LW(p) · GW(p)

You mean "discrete" not "discreet".

Replies from: neq1
comment by neq1 · 2010-04-16T18:13:09.524Z · LW(p) · GW(p)

fixed. thanks.

comment by PlaidX · 2010-04-17T14:03:43.522Z · LW(p) · GW(p)

I think you have it completely backwards: SIA isn't based on egotism, but precisely the reverse. You're more likely, as a generic observer, to exist in a world with more generic observers, because you AREN'T special, and, in the sense of being just a twinkle of a generic possible person, could be said to be equally all 99 people in a 99-person world.

You are more likely to be in a world with more people because it's a world with more of YOU.

Here's the problem. YOU'RE the egoist, in the sense that you're only tallying the score of one random observer out of 99, as though the other 98 don't matter. We have a possible world where one person is right or wrong, and a possible world where 99 people are right or wrong, but for some reason you only care about 1 of those 99 people.

EDIT: more talking

Under anthropic reasoning, if we flip a coin, and create 5 observers if it's heads, or 95 observers if it's tails, and if all you know is that you are an observer created after the coin flip, the way you guess which of the 100 possible observers you are is to pick randomly among them, giving you a 5% chance of being a heads observer and a 95% chance of being a tails observer.

Under nonanthropic reasoning, it's a little more complicated. We have to stretch the probabilities of being the 5 heads-world observers so that they take up as much probability space as the 95 tails-world observers. Because, so the thinking goes, your likelihood to be in a possible world doesn't depend on the number of observers in that world. Unless the number is zero, then it does. Please note that this special procedure is performed ONLY when dealing with situations involving possible worlds, and not when both worlds (or hotels, or whatever) actually exist. This means that nonanthropic reasoning depends on the many-worlds interpretation of quantum mechanics being false, or at least, if it's true, coin flips go back to being covered by anthropic reasoning and we have to switch to situations that are consequent on some digit of pi or something.
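
For concreteness, a sketch of the arithmetic behind the two weightings just described, using the 5-vs-95 numbers from above (the names are just labels for illustration):

    # Coin flip: heads -> 5 observers created, tails -> 95 observers created.
    heads_n, tails_n = 5, 95

    # Anthropic (SIA-style) weighting: each of the 100 possible observers is equally likely to be you.
    p_tails_anthropic = tails_n / (heads_n + tails_n)        # 0.95

    # Nonanthropic weighting: each world keeps probability 0.5 regardless of its population,
    # so the 5 heads-world observers are stretched to cover as much probability as the 95.
    p_each_heads_observer = 0.5 / heads_n                    # 0.1
    p_each_tails_observer = 0.5 / tails_n                    # about 0.00526
    p_tails_nonanthropic = tails_n * p_each_tails_observer   # 0.5

    print(p_tails_anthropic, p_tails_nonanthropic, p_each_heads_observer, p_each_tails_observer)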

This smells a little fishy to me. It seems like there's a spanner in the works somewhere, ultimately based on a philosophical objection to the idea of a counterfactual observer, which results in a well-hidden but ultimately mistaken kludge in which certain data (the number of observers) is thrown out under special circumstances (the number isn't zero and they only exist contingent on some immutable aspect of the universe which we do not know the nature of, such as a particular digit of pi).