Reply to Stuart on anthropics

post by cousin_it · 2011-01-31T07:06:39.673Z · LW · GW · Legacy · 31 comments

You wake up in a hospital bed, remembering nothing of your past life. A stranger sits beside the bed, smiling. He says:

"I happen to know an amusing story about you. Many years ago, before you were born, your parents were arguing about how many kids to have. They settled on flipping a coin. If the coin came up heads, they would have one child. If it came up tails, they would have ten."

"I will tell you which way the coin came up in a minute. But first let's play a little game. Would you like a small piece of chocolate, or a big tasty cake? There's a catch though: if you choose the cake, you will only receive it if you're the only child of your parents."

Stuart Armstrong has proposed a solution to this problem (see the fourth model in his post). Namely, you switch to caring about the average utility that the kids in your branch receive. This doesn't change the utility any kid gets in any possible world, but it makes the problem amenable to UDT, which says all agents would have precommitted to choosing cake as long as it's better than two pieces of chocolate (the first model in Stuart's post).
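
A minimal sketch of the arithmetic behind that threshold, under the branch-averaging just described (the utility numbers are arbitrary placeholders, not anything from Stuart's post):

```python
# A sketch of the branch-averaged precommitment comparison (placeholder utilities,
# not anything from Stuart's post).

def branch_averaged_eu(ask_for_cake: bool, u_cake: float, u_choc: float) -> float:
    """Expected value of the per-branch average utility under a fixed precommitment.

    Heads branch (prob 1/2): one child, who receives whatever it asked for.
    Tails branch (prob 1/2): ten children; asking for cake yields nothing,
    since none of them is an only child.
    """
    heads_avg = u_cake if ask_for_cake else u_choc
    tails_avg = 0.0 if ask_for_cake else u_choc
    return 0.5 * heads_avg + 0.5 * tails_avg

u_choc = 1.0
for u_cake in (1.5, 2.0, 2.5, 11.0):
    cake_wins = branch_averaged_eu(True, u_cake, u_choc) > branch_averaged_eu(False, u_cake, u_choc)
    print(f"u_cake = {u_cake}: precommit to cake? {cake_wins}")
# Cake wins exactly when u_cake > 2 * u_choc, i.e. "better than two pieces of chocolate".
```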

But.

Creating each of two physically separate worlds with probability 50% should be decision-theoretically equivalent to creating both of them with certainty. In other words, a correct solution should still work if the coin is quantum. In other words, the problem should be equivalent to creating 11 kids, offering them chocolate or cake, and giving cake only if you're the first kid. But would you really choose cake in this case, knowing that you could get the chocolate for certain? What if there were 1001 kids? This is a hard bullet to bite, and it seems to suggest that Stuart's analysis of his first model may be incorrect.
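
To make the tension concrete, here is the same style of calculation for the deterministic 11-kid version, where only the first kid's cake request is honoured and we average over all 11 kids at once (again a sketch with placeholder utilities):

```python
# Same placeholder utilities; now all 11 kids exist for sure and only the
# first kid's cake request is honoured.

def all_kids_average(ask_for_cake: bool, u_cake: float, u_choc: float, n_kids: int = 11) -> float:
    """Average utility over all n_kids when everyone makes the same choice."""
    if ask_for_cake:
        return u_cake / n_kids  # only the first kid actually receives the cake
    return u_choc               # everyone receives chocolate

u_choc = 1.0
for u_cake in (2.5, 11.0, 12.0):
    cake_wins = all_kids_average(True, u_cake, u_choc) > all_kids_average(False, u_cake, u_choc)
    print(f"u_cake = {u_cake}: choose cake? {cake_wins}")
# Here cake only wins when u_cake > 11 * u_choc (or > 1001 * u_choc with 1001 kids),
# not when u_cake > 2 * u_choc as in the coin version above.
```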

I await comments from Stuart or anyone else who can figure this out.

31 comments

comment by CronoDAS · 2011-01-31T17:33:00.462Z · LW(p) · GW(p)

Random thought:

Even if "I" do have siblings, it would be really weird if they all happened to end up in this odd situation of having amnesia and being offered chocolate or cake. I'll ask for the cake.

Replies from: cousin_it
comment by cousin_it · 2011-01-31T17:50:15.028Z · LW(p) · GW(p)

Hah, thanks, you're right, I didn't think of that. Omega would need to simultaneously induce amnesia in all siblings. There goes my nice setup :-) Maybe we can invent some other plausible-sounding scenario to that effect?

Replies from: rhollerith_dot_com, ciphergoth
comment by RHollerith (rhollerith_dot_com) · 2011-02-01T17:51:56.129Z · LW(p) · GW(p)

Parent (by Cousin It) and grandparent (by Doug) are wrong.

If I have nine siblings about whom I know absolutely nothing except that they are people (intelligent agents or observers in our reference class) and that they exist, it is an error in reasoning for me to assume (as Doug's argument does) that just because something unlikely happened to me, it happened or will happen to them, too.

That said, I do not know the solution to the original problem.

comment by Paul Crowley (ciphergoth) · 2011-02-01T08:47:00.474Z · LW(p) · GW(p)

"Greetings, citizen. I, the King of this land, decided to perform an experiment on anthropics. This morning I flipped a coin, resolving that if it landed tails, then ten of my citizens would be drugged and wake up here..."

Replies from: cousin_it
comment by cousin_it · 2011-02-01T09:58:59.631Z · LW(p) · GW(p)

If you pick observers randomly from some pool of fixed size instead of creating them, the problem becomes non-anthropic. An ordinary citizen before the experiment should precommit to choosing chocolate, because this precommitment gives the average citizen higher expected utility.
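
A back-of-the-envelope check of that claim. Note the assumptions: the heads case (a single citizen drugged, and the cake only handed over when there is a sole subject, as in the original problem) is not spelled out in the thread, and the utilities are placeholders.

```python
# A sketch only: the heads case (a single drugged citizen who can actually
# receive the cake) and the utility numbers are my assumptions.

def average_citizen_eu(ask_for_cake: bool, u_cake: float, u_choc: float, n_citizens: int = 1000) -> float:
    """Pre-experiment expected utility of a random citizen under a fixed precommitment.

    Heads (prob 1/2): 1 of n_citizens is drugged, is the sole subject, and gets
    whatever it asked for. Tails (prob 1/2): 10 citizens are drugged and a cake
    request yields nothing.
    """
    if ask_for_cake:
        return 0.5 * (1 / n_citizens) * u_cake
    return 0.5 * (1 / n_citizens) * u_choc + 0.5 * (10 / n_citizens) * u_choc

u_choc = 1.0
for u_cake in (3.0, 11.0, 20.0):
    choc_wins = average_citizen_eu(False, u_cake, u_choc) > average_citizen_eu(True, u_cake, u_choc)
    print(f"u_cake = {u_cake}: precommit to chocolate? {choc_wins}")
# Chocolate is the better precommitment whenever u_cake < 11 * u_choc, which covers the
# interesting range where cake is worth more than two chocolates but fewer than eleven.
```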

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2011-02-01T12:26:48.569Z · LW(p) · GW(p)

In that case can we just resolve to track down and amnesify all ten siblings in the case where there are ten?

Replies from: cousin_it
comment by cousin_it · 2011-02-01T12:37:56.077Z · LW(p) · GW(p)

I don't understand... Maybe we're talking about different things? Could you explain again?

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2011-02-01T13:23:13.858Z · LW(p) · GW(p)

So the setup is as before: we flip a coin to decide whether to create one or ten children, but now we don't wait for an accident to make the chocolate/cake offer - on their thirtieth birthday, we track them all down and give them an amnesia drug.

Replies from: cousin_it
comment by cousin_it · 2011-02-01T14:36:51.941Z · LW(p) · GW(p)

This seems to give the same answer as the case with accidental amnesia, right?

comment by Stuart_Armstrong · 2011-02-01T10:45:49.151Z · LW(p) · GW(p)

I think Wei's comment has a good take on the nub of the problem.

Let us make everyone altruistic. Instead of "I will give you cake/chocolate", say "I will give your mother cake/chocolate if you all agree". If we stipulate that everyone here cares about their mother exactly as much as about themselves where treats are concerned, this should result in the same utility for everyone in the experiment (this is like the "averaging", but maybe easier to see).

Then here, my model says you should go for cake (as long as it's better than two chocolates). What is the equivalent model for 11 people? Well, here it would be "I will choose one random person among you. If that person chooses chocolate, I will give half a chocolate to your mother. If that person chooses cake, I will give half a cake to your mother. If the remaining 10 people choose chocolate, I will give half a chocolate to your mother".

Then under a sensible division of responsibility or such, you should still choose cake.

However, if I gave you the 11-person situation and then made your indexical preferences altruistic, it would be "if everyone chooses chocolate, your mother gets chocolate, and if everyone chooses cake, I will give 1/11 of a cake to your mother".
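
A quick numeric sketch of the two mother-payoff schemes just described, with placeholder treat values, to make the discrepancy visible:

```python
# Placeholder treat values; this just tabulates the mother's payoff under the two
# altruistic rewrites above, assuming everyone makes the same choice.

def mother_payoff_model_a(ask_for_cake: bool, u_cake: float, u_choc: float) -> float:
    """One random person stands in for the only-child branch (weight 1/2); the
    other ten contribute only if they ask for chocolate (weight 1/2)."""
    singled_out = 0.5 * (u_cake if ask_for_cake else u_choc)
    the_rest = 0.0 if ask_for_cake else 0.5 * u_choc
    return singled_out + the_rest

def mother_payoff_model_b(ask_for_cake: bool, u_cake: float, u_choc: float) -> float:
    """The altruistic rewrite of the indexical 11-kid setup: 1/11 of a cake if
    everyone asks for cake, a whole chocolate if everyone asks for chocolate."""
    return u_cake / 11 if ask_for_cake else u_choc

u_choc, u_cake = 1.0, 3.0
print("Model A prefers cake?", mother_payoff_model_a(True, u_cake, u_choc) > mother_payoff_model_a(False, u_cake, u_choc))
print("Model B prefers cake?", mother_payoff_model_b(True, u_cake, u_choc) > mother_payoff_model_b(False, u_cake, u_choc))
# With a cake worth 3 chocolates, model A says cake (threshold 2) while model B
# says chocolate (threshold 11); that is the discrepancy noted below.
```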

Something has happened here; it seems that the two models have different altruistic/average equivalents, despite feeling very similar. I'll have to think more.

comment by Wei Dai (Wei_Dai) · 2011-01-31T22:16:03.845Z · LW(p) · GW(p)

It seems that our intuitions about whether to add or average utilities between individuals depend on the "distance" between the individuals under consideration. The "farther apart" they are (from spatially nearby in the same quantum branch, to far apart in the same branch, to different branches, to different universes), the more our intuitions tend toward adding instead of averaging.

My post The Moral Status of Independent Identical Copies raised this issue:

Clearly, our intuition of identical copy immortality does not extend fully to quantum branches, and even less to other possible worlds, but we don't seem to have a theory of why that should be the case.

"Identical copy immortality" in retrospect is just a case of the averaging intuition. What to do about this situation? It seems to me there are at least three plausible resolutions we might end up with:

  1. We eventually come up with some sort of satisfactory explanation for when we should average, and when we should aggregate.
  2. Different people find different explanations satisfactory, and we split into various averagist and aggregationist camps.
  3. We accept these varying intuitions as essentially arbitrary, just part of the morality that evolution "gifted" us with.

comment by DanielLC · 2011-02-01T03:53:36.657Z · LW(p) · GW(p)

I'd go for the cake, because I think it's more than ten times better than the chocolate. You should run the problem with one piece of chocolate versus three, or something like that. In that case, I'd go with the one piece.

comment by Stuart_Armstrong · 2011-01-31T15:29:06.241Z · LW(p) · GW(p)

This is getting interesting.

I'm going down some strange avenues here; I'll let you know if I find anything rewarding in them...

comment by Manfred · 2011-01-31T11:33:56.863Z · LW(p) · GW(p)

This is written a bit confusingly. Of course you'd care about the average over all the circumstances you might be in - that's what maximizing expected utility is. You say that Stuart does this in your paragraph 4. But in paragraph 6 you seem to say he doesn't - that if there were 1001 kids you wouldn't just say "is average(cake) better than average(chocolate)?"

I'm not totally sure what Stuart would do with this situation - and I don't know that he is either, but I'm even more confused now.

Replies from: cousin_it
comment by cousin_it · 2011-01-31T12:13:03.413Z · LW(p) · GW(p)

Of course you'd care about the average over all the circumstances you might be in - that's what maximizing expected utility is.

I don't understand what your statement means in the presence of indexical uncertainty, and anyway it doesn't seem to be a correct summary of Stuart's idea. He only takes the average within each branch where all kids get the same outcome anyway, not the average between branches. The averaging here is justified, not by referring to indexical probabilities (which we don't understand), but by noting that it preserves the utility obtained by each agent in each possible world. After that Stuart uses UDT (not another layer of averaging, which would be unjustified) to obtain the "correct" decision in the aggregate for all branches. Reread his solution to model 4 and then model 1 which it refers to - the reasoning is quite tricky.

Replies from: Manfred
comment by Manfred · 2011-02-01T02:54:43.873Z · LW(p) · GW(p)

Ah, sorry, I'm not very familiar with UDT. Avoiding indexical probabilities (they aren't that poorly understood) seems to be at the root of the unintuitive answer you cite, though: would you still assign a probability of 1/2 to your world being world 1 if there were 10000 kids in world 2?

Replies from: cousin_it
comment by cousin_it · 2011-02-01T10:02:38.505Z · LW(p) · GW(p)

What use is my intuitive answer to you? We're trying to figure out what the correct answer should be, not holding an opinion poll.

Replies from: Manfred
comment by Manfred · 2011-02-01T10:19:47.350Z · LW(p) · GW(p)

What use is intuition to anyone? Since you mentioned your intuitive reaction to the UDT answer in your post, your intuition seems to have at least a little credibility.

I could talk about the logical reasons to assign greater chance of being in circumstances with more people, if you'd prefer.

Replies from: cousin_it
comment by cousin_it · 2011-02-01T10:39:42.620Z · LW(p) · GW(p)

I think that would be counterproductive too, unless you have already reviewed the previous discussions of anthropics on LW and are certain that your argument is new.

We don't understand much about indexicals right now, but we do understand that questions about subjective probabilities should be reformulated as decision problems, otherwise you get confused on easy problems like Sleeping Beauty and never get to the challenging stuff. The reason: we all agree now that decision problems ask which precommitment would be viewed by the agent as most beneficial (Eliezer's intuition behind TDT), and precommitments take place before you look at any evidence, indexical or otherwise.

Do you feel that talking about probabilities instead of decisions brings new clarity to our problem? Then explain.

ETA: maybe rwallace's post The I-Less Eye will be helpful to you.

Replies from: Manfred
comment by Manfred · 2011-02-01T11:13:00.670Z · LW(p) · GW(p)

Well, both the numbers and the logical steps required are identical in this case, no matter whether you think about it as decision theory or probability. The reason Sleeping Beauty is tricky is that assigning probabilities the obvious way leads to breaking normalization, but that's not a problem in simple cases like this.

Either way, you should still use the fact that you have an equal chance of being any person whose circumstances fit yours, lacking any other information (this can be shown to be the correct prior, following from the principle of indifference, which says that if the information in a case is identical, the probability should be too).

Replies from: cousin_it, cousin_it
comment by cousin_it · 2011-02-01T12:48:55.948Z · LW(p) · GW(p)

Sorry, what's a "correct prior"?

Also, this post might change your mind about the merits of subjective probabilities vs decision problems, even if rwallace's post failed to.

Replies from: Manfred
comment by Manfred · 2011-02-01T16:07:06.467Z · LW(p) · GW(p)

Correct as in uniquely fulfills the desiderata of probability theory, on which the whole thing can be based. Ooh, found a link, I didn't know that was online. Particularly important for these purposes is the principle that says that states with identical information should be assigned identical probabilities. You just know that you are one of the people in the problem, which breaks the symmetry between the two coin-flip outcomes (since there are different numbers of people depending on the outcome), but it creates a symmetry between all the states specified by "you are one of the people in this problem."

It's not that I'm saying it's wrong to approach this as a decision problem. Just that ordinary probability applies fine in this case. If you get a different result with a decision theory than with bayesian probability, though, that is bad. Bad in the sense of provably worse by the measure of expected utility, unless circumstances are very extreme.

Replies from: cousin_it
comment by cousin_it · 2011-02-01T16:53:57.390Z · LW(p) · GW(p)

We're still talking past each other, I'm afraid.

What's "expected utility" in situations with indexical uncertainty? If you take the "expectation" according to an equal weighting of all indistinguishable observer-moments, isn't your reasoning circular?

Also I'm interested in hearing your response to rwallace's scenario, which seems to show that assigning equal probabilities to indistinguishable observer-moments leads to time-inconsistency.

Replies from: Manfred, Manfred
comment by Manfred · 2011-02-03T14:29:16.056Z · LW(p) · GW(p)

Ah, wait: the probability in rwallace's post was 1/99 all along. This is because each time you do the "clone me" operation, it's not like it automatically halves the probability; it only works that way for a very particular set of evidence. When you are careful and specify which evidence is available, the apparent problem is resolved.

Replies from: cousin_it
comment by cousin_it · 2011-02-03T17:02:42.998Z · LW(p) · GW(p)

I'm afraid I can't take your word for that, please show me the calculations.

Replies from: Manfred
comment by Manfred · 2011-02-03T18:11:38.155Z · LW(p) · GW(p)

Hmm, actually I might be making the wrong correction, since this would contradict the rule that P(AB) = P(A)*P(B|A). But my plan was to specify the "anthropic evidence" (memories, body, etc.) as exactly the stuff that makes you "you" at the start of the process, and then to restate the question as P(original | anthropic evidence).

Upon reflection, this is very shaky, but still possibly correct. I'll try and formalize the change to the product rule and see what it says about Sleeping Beauty, which has a relatively known answer.

Replies from: cousin_it
comment by cousin_it · 2011-02-04T10:53:26.820Z · LW(p) · GW(p)

This still doesn't look like the calculations that I asked for... ?

Replies from: Manfred
comment by Manfred · 2011-02-04T20:58:41.321Z · LW(p) · GW(p)

Oh, sorry; the calculations are trivial. It's the parts that aren't math that are the problem.

Take a person imagining whether to get copied 98 times. He wants to know "after I get out of the machine but before I get told if I'm the original, what is the probability that I am the original?" There are two different "edge cases."

1) If during the copying process you generate no new evidence, i.e. all your copies end up with near-identical memories, then the principle of indifference applies with overwhelming obviousness. You have nothing with which to differentiate yourself, so you must assign equal probabilities, and the probability that you're the original is 1/99.

2) You have to go in for 98 different sessions to get copied, thus generating extra evidence (memories) in between each time. Here, the only place you can apply the principle of indifference is within each session, so you have a probability of 1/2 of being the original after every session. The product rule then says that since P(AB) = P(A)*P(B|A), your final probability of being the original is (1/2)^98. But this feels odd because at no time during the process could this probability be realized - when waking up you always have a probability estimate of 1/2 - rendering this immune from the sort of betting game you might play in e.g. the Sleeping Beauty problem.

With these two cases in mind, we can consider a copying scheme with the causal structure of (2) but the evidence of (1) (copied in series, but no distinguishing memories). Logic would say that the evidence wins, since the evidence usually wins. But either the product rule stops working or the probability P(B|A) changes in an unusual way that must exactly parallel what the evidence requires - this provides a license for just using the evidence and not using the product rule in these sorts of cases, but it would be interesting to see whether it's possible to save the product rule.

Saving the product rule would require changing the P(B|A) in P(AB) = P(A)*P(B|A). If we just look at the case where you're copied twice in a row with identical memories, let A = original the first time and B = original the second time. One would expect that P(A) = 1/2. But P(AB) = 1/3, so P(B|A) must equal 2/3. For 98 copies the last conditional probability would be 98/99! This is really weird. So I'm still thinking about it.
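
A quick check of that chain with exact fractions (a sketch; "session k" here means the k-th serial copying, with memories kept identical throughout):

```python
from fractions import Fraction

# P(still original after session k | original through session k-1), chosen so
# that the running product reproduces the indifference answer 1/(k+1).
def conditional_chain(n_sessions: int) -> list:
    chain = []
    for k in range(1, n_sessions + 1):
        before = Fraction(1, k)       # 1/k: k indistinguishable candidates before session k
        after = Fraction(1, k + 1)    # 1/(k+1): k+1 candidates after it
        chain.append(after / before)  # = k/(k+1)
    return chain

chain = conditional_chain(98)
print(chain[0], chain[1], chain[-1])  # 1/2, 2/3, 98/99

product = Fraction(1)
for c in chain:
    product *= c
print(product)  # 1/99, matching the identical-memories answer above
```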

Replies from: cousin_it
comment by cousin_it · 2011-02-05T18:18:57.119Z · LW(p) · GW(p)

Hey, I just had this idea. To get the required values of P(B|A), you may try to consider the possibility that the "subjective continuation" of a copy (which was made from the original) can jump into another copy (also made from the original, not from the first copy). There seems to be no a priori reason why that shouldn't happen, if the information states of the copies are equivalent. Why focus on the physical continuity within the copying process anyway? Information is all that matters.

This way you get to keep the illusion that you "are" one specific copy at all times, rather than the set of all identical information states at once (my preferred point of view up till now). I wonder if some other thought experiment could break that illusion more conclusively.

comment by Manfred · 2011-02-02T03:33:38.822Z · LW(p) · GW(p)

Hm, I guess it is circular. Dang. The question is really "is bayesian probability correct in general?"

Do you mean rwallace's scenario with the copies? The probabilities seem correct, though since there are multiple copies, normalization might be broken (or artificially enforced) somewhere I didn't notice, like in Sleeping Beauty. I'm a bit unsure. What is clear is that there isn't actually a discontinuity at short times - since the probability comes from the evidence of your memories, not how it "really happened."

EDIT: There does appear to be an inconsistency when making serial copies - computing it different ways gives different answers. Freaking normalization.

comment by cousin_it · 2011-02-01T12:45:32.216Z · LW(p) · GW(p)

if the information in a case is identical, the probability should be too

So if I have a biased coin that comes up heads 90% of the time, and I flip it and hide the result from you, your credence for heads should be 50%? Nah. I'm willing to believe in the principle of indifference only if the cases are completely identical, not just observably identical. In our problem the cases aren't completely identical - you may be the sole child or one of 10 children, and these correspond to very different and non-symmetrical states of the world.

It seems to me that you're confused. Sleeping Beauty isn't tricky at all, but our problem is. If rwallace's post didn't work for you, try this.