A confused model of the self-indication assumption

post by AdeleneDawner · 2011-04-17T13:40:20.112Z · LW · GW · Legacy · 30 comments


Imagine that I write a computer program that starts by choosing a random integer W between 0 and 2. It then generates 10^(3W) random simple math problems, numbering each one and placing it in list P. It then chooses a random math problem from P and presents it to me, without telling me what the problem number is for that particular math problem.
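In code, the setup looks roughly like this (a sketch; the function name is mine, and the "problems" are just numbered placeholders, since their content doesn't matter to the argument):

```python
import random

def first_model():
    """First model: pick W, build the numbered list P, show one problem without its number."""
    W = random.randint(0, 2)                                              # random integer between 0 and 2
    P = [f"simple math problem #{i}" for i in range(1, 10**(3 * W) + 1)]  # 1, 1,000 or 1,000,000 problems
    shown = random.choice(P)                                              # I see the problem, but not its number
    return W, shown
```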

In this case, being presented with a single math problem tells me nothing about the state of W - I expect to be shown a problem no matter what W is. However, if I subsequently find out that I was shown P(50), that rules out W=0 and makes W=1 1,000 times more likely than W=2.


Given that W represents which world we're in, each math problem in P represents a unique person, and being presented with a math problem represents experiencing being that person (or knowing that that person exists), the self-indication assumption says that my model is flawed.


According to the self-indication assumption, my program needs to do an extra step to be a proper representation. After it generates the list of math problems, it then needs to choose a second random number, X, and present me with a math problem only if there is a math problem numbered X. In this case, whether or not I am presented with a math problem does tell me something about W - I have a much higher chance of getting a math problem if W=2 and a much lower chance if W=0 - and finding out that the one math problem I was presented with was P(50) tells me much more about X than it does about W.
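As a sketch of that extra step (the range X is drawn from isn't pinned down above, so I'll assume it's uniform over 1 to 10^6, the largest possible list size):

```python
import random

MAX_SIZE = 10**6   # assumed range for X: the largest possible list size

def sia_style_model():
    """Proposed extra step: a second draw X decides whether any problem is shown at all."""
    W = random.randint(0, 2)
    P = [f"simple math problem #{i}" for i in range(1, 10**(3 * W) + 1)]
    X = random.randint(1, MAX_SIZE)
    if X <= len(P):                # a problem numbered X exists, so it gets shown
        return W, P[X - 1]
    return W, None                 # otherwise nothing is shown
```

With that assumption, the chance of being shown anything at all is 10^-6 if W=0, 10^-3 if W=1, and 1 if W=2.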

I don't see why this is a proper representation, or why my first model is flawed, though I suspect it relates to thinking about the issue in terms of specific people rather than any person in the relevant set, and I tend to get lost in the math of the usual discussions. Help?

30 comments


comment by Tyrrell_McAllister · 2011-04-17T17:57:55.894Z · LW(p) · GW(p)

According to the self-indication assumption, my program needs to do an extra step to be a proper representation. After it generates the list of math problems, it then needs to choose a second random number, X, and present me with a math problem only if there is a math problem numbered X. In this case, whether or not I am presented with a math problem does tell me something about W - I have a much higher chance of getting a math problem if W=2 and a much lower chance if W=0 - and finding out that the one math problem I was presented with was P(50) tells me much more about X than it does about W.

That is not how I would describe SIA in terms of your setup. I would model SIA in this way:

The program starts by generating three pairwise-disjoint lists of math problems: a short list of length 10^(3*0), a medium list of length 10^(3*1), and a long list of length 10^(3*2). The program then chooses a problem P at random from the union of these three lists, and then presents the problem to you. Let W be such that P was on the list of length 10^(3*W). Now, whatever P you get, P was more likely to appear on a longer list than on a shorter list. Therefore, being presented with P makes it more likely that W was larger than that W was smaller.

In contrast, under SSA, the program, having generated the three pairwise-disjoint lists, chooses a number W at random from {0,1,2}, and then presents you with a random problem P from the list of length 10^(3*W).
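Roughly, in code (a sketch; all random choices are uniform, and problems are identified only by their index within their list):

```python
import random

SIZES = {0: 10**0, 1: 10**3, 2: 10**6}   # list length for each value of W

def sia_selection():
    """SIA as described above: one problem drawn uniformly from the union of the three lists."""
    # Weighting W by list size is equivalent to a uniform draw from the union.
    w = random.choices(list(SIZES), weights=list(SIZES.values()))[0]
    return w, random.randrange(SIZES[w])

def ssa_selection():
    """SSA as described above: W drawn uniformly first, then a problem from that one list."""
    w = random.choice(list(SIZES))
    return w, random.randrange(SIZES[w])
```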

In general, the distinction between SIA and SSA is this:

  • Under SIA, the program first chooses an individual at random from among all possible individuals. (Distinct worlds contain disjoint sets of individuals.) The world containing the chosen individual is "the actual world". Note that this selection process yields a prior probability distribution for the number of individuals in the actual world, and this prior distribution favors larger values. The program then reports the experience of the chosen individual to you. You then use the fact that the program selected an individual with this experience to get a posterior probability distribution for the number of individuals in the actual world. But you are updating a prior distribution that favored larger worlds.

  • Under SSA, the program first chooses a world at random from among all possible worlds. The chosen world is "the actual world". Note that this selection process shows no favor to larger worlds as such. Then the program chooses an individual at random from among the individuals within the actual world. The program then reports the experience of the chosen individual to you. You then use the fact that the program selected an individual with this experience to get a posterior probability distribution for the number of individuals in the actual world. But you are updating a prior distribution that did not favor larger worlds as such.
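With the numbers from the original post, the difference is stark (a quick check in exact fractions, using a flat 1/3 prior over the three worlds for the SSA case):

```python
from fractions import Fraction

sizes = {0: 1, 1: 10**3, 2: 10**6}
total = sum(sizes.values())    # 1,001,001 possible individuals across the three worlds

for w, n in sizes.items():
    sia = Fraction(n, total)   # SIA-style pick: chance the chosen individual lives in world w
    ssa = Fraction(1, 3)       # SSA-style pick: flat over worlds; an unnumbered problem is no evidence
    print(w, sia, ssa)         # under SIA, W=2 gets about 99.9% of the mass; under SSA, each W keeps 1/3
```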

Replies from: AdeleneDawner
comment by AdeleneDawner · 2011-04-17T18:07:44.163Z · LW(p) · GW(p)

Ahh. Okay, that makes sense.

I'm still not clear on why anyone would think that the world works as indicated by SIA, but that seems likely to be a rather less confusing problem.

Replies from: cousin_it, Tyrrell_McAllister
comment by cousin_it · 2011-04-17T18:24:50.638Z · LW(p) · GW(p)

After the discussion in my previous post I became quite certain that the world can't work as indicated by SSA (your model), and SIA is by far more likely. If you're the only person in the world right now, and Omega is about to flip a fair coin and create 100 people in case of heads, then SSA tells you to be 99% sure of tails, while SIA says 50/50. There's just no way SSA is right on this one.
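For reference, the arithmetic behind those two numbers (a sketch, assuming a fair coin and taking "everyone who ever exists in your world" as the SSA reference class):

```python
from fractions import Fraction

# Tails: you stay the only person; heads: 100 more people get created.
# You already know that you are the first (and, so far, only) person.
people = {"tails": 1, "heads": 101}

# SSA: P(I am person #1 | tails) = 1, but P(I am person #1 | heads) = 1/101.
ssa = {c: Fraction(1, 2) * Fraction(1, n) for c, n in people.items()}
# SIA: weight each world by its population before sampling; that weight cancels the 1/n above.
sia = {c: Fraction(1, 2) * n * Fraction(1, n) for c, n in people.items()}

for label, dist in (("SSA", ssa), ("SIA", sia)):
    norm = sum(dist.values())
    for c, p in dist.items():
        print(label, c, p / norm)   # SSA: tails 101/102 (~99%), heads 1/102; SIA: 1/2 each
```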

Bostrom talks about such paradoxes in chapter 9 of his book, then tries really hard to defend SSA, and fails. (You have to read and settle this for yourself. It's hard to believe Bostrom can fail. I was surprised.)

Also maybe it'll help if you translate the problem into UDT-speak, "probability as caring". Believing in SSA means you care about copies of yourself in little worlds much more than about your copies in big worlds. SIA means you care about them equally.

Replies from: AlephNeil, AdeleneDawner, Tyrrell_McAllister
comment by AlephNeil · 2011-04-17T20:03:49.170Z · LW(p) · GW(p)

Now might be a good time to mention "full non-indexical conditioning", which I think is incontestably an advance on SSA and SIA.

To be sure, FNC still faces the severe problem that observer-moments cannot be individuated, leading (for instance) to variations on Sleeping Beauty where tails causes only a 'partial split' (like an Ebborian midway through dividing) and the answer is indeterminate. But this is no less of a problem for SSA and SIA than for FNC. The UDT approach of bypassing the 'Bayesian update' stage and going straight to the question 'what should I do?' is superior.

Replies from: CarlShulman
comment by CarlShulman · 2011-04-19T03:31:33.691Z · LW(p) · GW(p)

Neal's approach (even according to Neal) doesn't work in Big Worlds, because then every observation occurs at least once. But full non-indexical conditioning tells us with near certainty that we are in a Big World. So if you buy the approach, it immediately tells you with near certainty that you're in the conditions under which it doesn't work.

Replies from: AlephNeil
comment by AlephNeil · 2011-04-19T11:20:35.393Z · LW(p) · GW(p)

Sure, that's a fair criticism.

What I especially like about FNC is that it refuses to play the anthropic game at all. That is, it doesn't pretend that you can 'unwind all of a person's observations' while retaining their Mind Essence and thereby return to an anthropic prior under which 'I' had just as much chance of being you as me. (In other words, it doesn't commit you to believing that you are an 'epiphenomenal passenger'.)

FNC is just 'what you get if you try to answer those questions for which anthropic reasoning is typically used, without doing something that doesn't make any sense'. (Or at least it would be if there was a canonical way of individuating states-of-information.)

comment by AdeleneDawner · 2011-04-17T18:55:39.518Z · LW(p) · GW(p)

If you're the only person in the world right now, and Omega is about to flip a fair coin and create 100 people in case of heads, then SSA tells you to be 99% sure of tails, while SIA says 50/50. There's just no way SSA is right on this one.

If the program has already generated one problem and added it to P, and then generates 1 or 0 randomly for W and adds 100W problems to P - which is basically the same as my first model, and should be equivalent to SSA - then I should expect a 50% chance of having 1 problem in P and a 50% chance of having 101 problems in P, and also a 50% chance of W=1.

If it does the above, and then generates a random number X between 1 and 101, and only presents me with a problem if there's a problem numbered X, and I get shown a problem, I should predict a ~99% chance that W=1. I think this is mathematically equivalent to SIA. (It is if my second formulation in the OP is, which I think is true even if it's rather round-about.)
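A quick check of that ~99% figure (exact fractions, under the setup just described):

```python
from fractions import Fraction

# One problem is always generated; W in {0, 1} adds 100*W more; X is uniform on 1..101,
# and a problem is shown only if one numbered X exists.
p_shown = {0: Fraction(1, 101), 1: Fraction(1, 1)}   # P(shown | W)
prior = Fraction(1, 2)

posterior_w1 = prior * p_shown[1] / (prior * p_shown[0] + prior * p_shown[1])
print(posterior_w1)   # 101/102, i.e. roughly a 99% chance that W=1
```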

Replies from: cousin_it
comment by cousin_it · 2011-04-17T19:02:26.995Z · LW(p) · GW(p)

If the program has already generated one problem and added it to P, and then generates 1 or 0 randomly for W and adds 100W problems to P - which is basically the same as my first model, and should be equivalent to SSA - then I should expect a 50% chance of having 1 problem in P and a 50% chance of having 101 problems in P, and also a 50% chance of W=1.

Yeah, that's what SSA says you should expect before updating :-) In my example you already know that you're the first person, but don't know if the other 100 will be created or not. In your terms this is equivalent to updating on the fact that you have received math problem number 1, which gives you high confidence that the fair coinflip in the future will come out a certain way.

Replies from: AdeleneDawner
comment by AdeleneDawner · 2011-04-17T19:11:06.011Z · LW(p) · GW(p)

And after updating, as well. The first math problem tells you basically nothing, since it happens regardless of the result of the coin flip/generated random number.

Ignore the labels for a minute. Say I have a box, and I tell you that I flipped a coin earlier and put one rock in the box if it was heads and two rocks in the box if it was tails. I then take a rock out of the box. What's the chance that the box is now empty? How about if I put three rocks in for tails instead of two?

Replies from: cousin_it
comment by cousin_it · 2011-04-17T19:25:46.422Z · LW(p) · GW(p)

I refuse to ignore the labels! :-) Drawing the first math problem tells me a lot, because it's much more likely in a world with 1 math problem than in a world with 101 math problems. That's the whole point. It's not equivalent to drawing a math problem and refusing to look at the label.

Let's return to the original formulation in your post. I claim that being shown P(1) makes W=0 much more likely than W=1. Do you agree?

Replies from: AdeleneDawner
comment by AdeleneDawner · 2011-04-17T19:31:03.168Z · LW(p) · GW(p)

If I know that it's P(1), and I know that it was randomly selected from all the generated problems (rather than being shown to me because it's the first one), then yes.

If I'm shown a single randomly selected problem from the list of generated problems without being told which problem number it is, it doesn't make W=0 more likely than W=1 or W=2.
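For the record, the arithmetic behind both of those statements (a sketch, with W uniform on {0,1,2} and the shown problem drawn uniformly from P, as in the original model):

```python
from fractions import Fraction

# Problem #1 exists in every world, so P(shown P(1) | W) = 10^(-3W).
likelihood = {w: Fraction(1, 10**(3 * w)) for w in (0, 1, 2)}
total = sum(likelihood.values())
for w, lh in likelihood.items():
    print(w, lh / total)   # W=0 gets ~0.999 of the mass, W=1 ~0.001, W=2 ~0.000001

# If the problem's number is never revealed, the likelihood of "shown some problem" is 1
# in every world, so nothing updates and each W keeps its 1/3 prior.
```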

comment by Tyrrell_McAllister · 2011-04-17T18:38:50.907Z · LW(p) · GW(p)

After the discussion in my previous post I became quite certain that the world can't work as indicated by SSA (your model), and SIA is by far more likely. If you're the only person in the world right now, and Omega is about to flip a fair coin and create 100 people in case of heads, then SSA tells you to be 99% sure of tails, while SIA says 50/50. There's just no way SSA is right on this one.

Bostrom talks about such paradoxes in chapter 9 of his book, then tries really hard to defend SSA, and fails. (You have to read and settle this for yourself. It's hard to believe Bostrom can fail. I was surprised.)

To be fair, Bostrom's version of SSA ("strong" SSA, or SSSA) does not "[tell] you to be 99% sure of tails" when you are still the only person in the world. In whatever sense his defense might fail, it is not because his SSSA leads to the implication that you describe, because it does not.

ETA: Prior to the copying, there is only one individual in your reference class—namely, the one copy of you. That is, the "reference class" contains only a single individual in all cases, so there is no anthropic selection effect. Therefore, SSSA still says 50/50 in this situation.

Replies from: cousin_it
comment by cousin_it · 2011-04-17T18:52:42.412Z · LW(p) · GW(p)

Bostrom's proposal fails even harder than "naive" SSA: it refuses to give a definite answer. He says selecting a reference class may be a "subjective" problem, like selecting a Bayesian prior. Moreover, he says that giving the "intuitively right" answer to problems like mine is one of the desiderata for a good reference class, not a consequence of his approach. See this chapter.

Re your ETA: Bostrom explicitly rejects the idea that you should always use subjectively indistinguishable observer-moments as your reference class.

Replies from: Tyrrell_McAllister, Tyrrell_McAllister
comment by Tyrrell_McAllister · 2011-04-17T19:14:31.184Z · LW(p) · GW(p)

Re your ETA: Bostrom explicitly rejects the idea that you should always use subjectively indistinguishable observer-moments as your reference class.

Right. I don't think that I implied otherwise . . .

comment by Tyrrell_McAllister · 2011-04-17T19:06:57.522Z · LW(p) · GW(p)

Bostrom's proposal fails even harder than "naive" SSA: it refuses to give a definite answer. He says selecting a reference class may be a "subjective" problem, like selecting a Bayesian prior. Moreover, he says that giving the "intuitively right" answer to problems like mine is one of the desiderata for a good reference class, not a consequence of his approach.

He does not solve the problem of defining the reference class. He doesn't refuse to give a definite answer. He just doesn't claim to have given one yet. As you say, he leaves open the possibility that choosing the reference class is like choosing a Bayesian prior, but he only offers this as a possibility. Even while he allows for this possibility, he seems to expect that more can be said "objectively" about what the reference class must be than what he has figured out so far.

Replies from: cousin_it
comment by cousin_it · 2011-04-17T19:18:10.081Z · LW(p) · GW(p)

So, it's a work in progress. If it fails, it certainly isn't because it gives the wrong answer on the coin problem that you posed.

To me it looks abandoned, not in progress. And it doesn't give any definite answer. And it's not clear to me whether it can be patched to give the correct answer and still be called "SSA" (i.e. still support some version of the Doomsday argument). For example, your proposed patch (using indistinguishable observers as the reference class) gives the same results as SIA and doesn't support the DA.

Anyway. We have a better way to think about anthropic problems now: UDT! It gives the right answer in my problem, and makes the DA go away, and solves a whole host of other issues. So I don't understand why anyone should think about SSA or Bostrom's approach anymore. If you think they're still useful, please explain.

Replies from: Tyrrell_McAllister
comment by Tyrrell_McAllister · 2011-04-17T20:34:21.604Z · LW(p) · GW(p)

Anyway. We have a better way to think about anthropic problems now: UDT! It gives the right answer in my problem, and makes the DA go away, and solves a whole host of other issues. So I don't understand why anyone should think about SSA or Bostrom's approach anymore. If you think they're still useful, please explain.

When it comes to deciding how to act, I agree that the UDT approach to anthropic puzzles is the best I know. Thinking about anthropics in the traditional way, whether via SSA, SIA, or any of the other approaches, only makes sense if you want to isolate a canonical epistemic probability factor in the expected-utility calculation.

Replies from: cousin_it
comment by cousin_it · 2011-04-17T21:01:40.824Z · LW(p) · GW(p)

In the context of the Doomsday Argument, or Great Filter arguments, etc., UDT is typically equivalent to SIA.

comment by Tyrrell_McAllister · 2011-04-17T18:26:18.706Z · LW(p) · GW(p)

I'm still not clear on why anyone would think that the world works as indicated by SIA,

I also don't see the appeal of SIA. As far as I know, its only selling point is that it nullifies the Doomsday Argument. But that doesn't seem to me to be the right basis for choosing a method of anthropic reasoning.

Moreover, Katja Grace points out that even SIA implies "Doomsday" in the sense that SIA, with some reasonable assumptions, makes the Great Filter likely to be ahead of us instead of behind us. For it seems plausible that, among the universes with Great Filters, most individuals live prior to their lineage's getting hit with the Great Filter. So, if we update on the fact that we live in a universe with a Great Filter (which follows from the Fermi Paradox), then SIA tells us to expect that our Great Filter is in our future, not in our past (as it would be if the Great Filter were something like the difficulty of evolving intelligence).

Replies from: CarlShulman, cousin_it
comment by CarlShulman · 2011-04-19T03:45:02.925Z · LW(p) · GW(p)

Katja agrees that this only holds if you assume we are not simulations. SIA hugely supports the simulation hypothesis, and then the SIA-Doomsday argument fails.

comment by cousin_it · 2011-04-17T20:02:25.435Z · LW(p) · GW(p)

Hmm. It seems to me that Katja's argument fails if huge interstellar civilizations are likely to stop other civilizations from reaching our current stage (deliberately or unwittingly), which sounds plausible to me.

Replies from: Tyrrell_McAllister
comment by Tyrrell_McAllister · 2011-04-17T20:43:32.171Z · LW(p) · GW(p)

It seems to me that Katja's argument fails if huge interstellar civilizations are likely to stop other civilizations from reaching our current stage (deliberately or unwittingly), which sounds plausible to me.

Could you explain? Wouldn't that just tell you with even greater certainty that there are no huge interstellar civilizations around, which would argue even more strongly that we live in a universe with a Great Filter? And couldn't it still be the case that most individuals would live prior to their lineage's encounter with the Great Filter? So, why wouldn't Katja's argument still go through?

ETA: Okay, I think that I see your point: If, in each universe where life arises, some civilization gets huge and nips all other life in that universe in the bud, and if the civilization gets so huge that it outnumbers the sum of the populations of all the lineages that it squelches, then it would not be the case that "most individuals live prior to their lineage's getting hit with the Great Filter". On the contrary, across all possible worlds, most individuals would live in one of these huge civilizations, which never get hit with a Great Filter. In that case, Katja's argument would not go through.

Replies from: cousin_it
comment by cousin_it · 2011-04-17T21:04:08.751Z · LW(p) · GW(p)

Yep, that's what I meant. I wonder if anyone raised this point before, it sounds kinda obvious.

Replies from: Tyrrell_McAllister
comment by Tyrrell_McAllister · 2011-04-17T22:31:08.461Z · LW(p) · GW(p)

I think that a lot of people don't consider "We just happen to be the first technical civilization" to be a satisfactory solution to the Fermi paradox. It is the fact that this region wasn't already teeming with life that points to the presence of a Great Filter.

Your proposal conjoins this response to the Fermi paradox with the further claim that we will go on to squelch any subsequent technical civilizations. So your proposal can only be less satisfying than the above response to the Fermi paradox. The problem is that, if we are going to be this region's Great Filter, then we have come too late to explain why this region isn't already teeming with life.

comment by AlephNeil · 2011-04-17T16:58:31.150Z · LW(p) · GW(p)

It then chooses a random math problem from P and presents it to me, without telling me what the problem number is for that particular math problem.

In the scenario you describe, you know at the outset that there is only one copy of you. To be able to apply anthropic assumptions like SIA and SSA, you would need to amend the scenario so that there are multiple 'copies' of you.

Rather than generating 10^(3W) random simple math problems and having a random one shown to you, say you arrange for 10^(3W) copies of yourself to be created. And then let each copy be shown a different math problem.

Then SSA says that, upon finding yourself looking at a math problem, you learn nothing at all about W, whereas SIA says you need to multiply your prior odds by 1:1000:1000000.

An interesting variation to consider is where 10^6 copies of you are created, and then 10^(3W) of them are chosen at random to be shown a math problem. Then to be able to apply SSA, you need to decide whether to regard yourself as (i) a random person or (ii) a random person-who-received-a-math-problem. If (i) then SSA and SIA will both recommend updating your odds as above. If (ii) then SSA says you learn nothing whereas SIA recommends that you update your odds. (SSA cares about the reference class whereas SIA doesn't.)
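A sketch of the bookkeeping for both setups, assuming a flat prior over W (the labels (i) and (ii) refer to the reference classes above):

```python
from fractions import Fraction

copies = [10**0, 10**3, 10**6]        # copies created (or shown a problem) for W = 0, 1, 2
prior = [Fraction(1, 3)] * 3

# First setup, SIA: multiply the prior odds by 1 : 1000 : 1000000, then renormalize.
sia = [p * n for p, n in zip(prior, copies)]
sia = [x / sum(sia) for x in sia]

# 10^6-copies variation, SSA with reference class (i) "a random person":
# P(I was shown a problem | W) = 10^(3W) / 10^6, which is the same 1 : 1000 : 1000000 update.
ssa_i = [p * Fraction(n, 10**6) for p, n in zip(prior, copies)]
ssa_i = [x / sum(ssa_i) for x in ssa_i]

print(sia == ssa_i)   # True -- both put almost all the probability on W = 2
# Under reference class (ii), or under SSA in the first setup, the likelihoods are equal
# across W and the posterior stays flat at 1/3 each.
```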

Perhaps the purpose of your 'model' of SIA is precisely to find a way of understanding it without bringing in multiple observers or 'copies'. To be honest, I don't think this makes much sense (like trying to explain relativity without reference to space or time (or spacetime)).

Replies from: AdeleneDawner
comment by AdeleneDawner · 2011-04-17T17:24:54.823Z · LW(p) · GW(p)

In the scenario you describe, you know at the outset that there is only one copy of you.

Sort of yes, sort of no. For my formulation to work, different observer-moments have to be considered as separate; seeing one math problem represents the entire experience of being a particular person or knowing that a particular person exists. If I set the program up to shuffle list P like a deck of cards and let me go through the list one by one, and I look at 10 math problems, that's equivalent to knowing that the world contains at least 10 unique individuals.

In other words, 'I' am not an individual in the world represented by W; the math problems are the individuals, and the possibility of there being many of them is already included.

(Is the fact that randomly generated simple math problems aren't sentient a problem in some way?)

Replies from: AlephNeil
comment by AlephNeil · 2011-04-17T21:07:36.318Z · LW(p) · GW(p)

In other words, 'I' am not an individual in the world represented by W; the math problems are the individuals, and the possibility of there being many of them is already included.

Then 'the observer' in your scenario doesn't correspond to anything that exists in the real world. After all, there is no epiphenomenal 'passenger' who chooses a person at random and watches events play out on the theatre of their mind.

Anthropic probabilities are meaningless without an epiphenomenal passenger. If p is "the probability of being person X" then what does "being person X" mean? Assuming X exists, the probability of X being X is 1. What about the probability of "me" being X? Well who am I? If I am X then the probability of me being X is 1. It's only if I consider myself to be an epiphenomenal passenger who might have ridden along with one of many different people that it makes sense to assign a value other than 0 or 1 to the probability of 'finding myself as X'.

To calculate anthropic probabilities requires some rules about how the passenger chooses who to 'ride on'. Yet it's impossible to state these rules without arbitrariness, in cases where there's no right way to count up observers and draw their boundaries. I think the whole idea of anthropic reasoning is untenable.

Replies from: AdeleneDawner
comment by AdeleneDawner · 2011-04-17T21:33:03.508Z · LW(p) · GW(p)

I basically agree. This particular case (and perhaps others, though I haven't checked) seems like it can be formulated in non-anthropic terms, though. The observer not corresponding to anything in the real world shouldn't be a problem, I expect; a fair 6-sided die should have a 1/6 chance of showing 1 when rolled even if nobody's around to watch that happen.

Replies from: AlephNeil
comment by AlephNeil · 2011-04-17T21:59:49.603Z · LW(p) · GW(p)

What you've done is constructed an analogy that looks like this:

Generation of 10^(3W) math problems <---> Generation of 10^(3W) people

Funny set of rules A whereby an observer is assigned a problem <---> SSA

Funny set of rules B whereby an observer is assigned a problem <---> SIA

Probability that the observer is looking at problem X <---> Anthropic probability of being person X

But whereas "the probability that the observer is looking at problem X" depends on whether we arbitrarily choose rules A or B, the anthropic probability of being person X is supposed (by those who believe anthropic probabilities exist) to be a determinate matter. It's not supposed to be a mere convention that we choose SSA or SIA; it's supposed to be that one is 'correct' and the other 'wrong' (or both are wrong and something else is correct).

If we only consider non-anthropic problems then we can resolve everything satisfactorily by choosing 'rules' like A or B (and note that unless we add an observer and choose rules, there won't be any questions to resolve) but that won't tell us anything about SSA and SIA. (This is a clearer explanation than I gave in my first comment of what I think 'doesn't make sense' about your approach.)

Replies from: AdeleneDawner
comment by AdeleneDawner · 2011-04-17T22:36:31.232Z · LW(p) · GW(p)

It makes sense to look at it that way, yes.

I do think that something like A or B should be able to accurately be said to be true of the world, though.