# Outlawing Anthropics: An Updateless Dilemma

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-08T18:31:49.270Z · LW · GW · Legacy · 208 comments

Let us start with a (non-quantum) logical coinflip - say, look at the heretofore-unknown-to-us-personally 256th binary digit of pi, where the choice of binary digit is itself intended not to be random.

If the result of this logical coinflip is 1 (aka "heads"), we'll create 18 of you in green rooms and 2 of you in red rooms, and if the result is "tails" (0), we'll create 2 of you in green rooms and 18 of you in red rooms.

After going to sleep at the start of the experiment, you wake up in a green room.

With what degree of credence do you believe - what is your posterior probability - that the logical coin came up "heads"?

There are exactly two tenable answers that I can see, "50%" and "90%".

Suppose you reply 90%.

And suppose you also happen to be "altruistic" enough to care about what happens to all the copies of yourself. (If your current system cares about yourself and your future, but doesn't care about very similar xerox-siblings, then you will tend to self-modify to have *future* copies of yourself care about each other, as this maximizes your *expectation* of pleasant experience over *future* selves.)

Then I attempt to force a reflective inconsistency in your decision system, as follows:

I inform you that, after I look at the unknown binary digit of pi, I will ask all the copies of you in green rooms whether to pay $1 to every version of you in a green room and steal $3 from every version of you in a red room. If they all reply "Yes", I will do so.

(It will be understood, of course, that $1 represents 1 utilon, with actual monetary amounts rescaled as necessary to make this happen. Very little rescaling should be necessary.)

(Timeless decision agents reply as if controlling all similar decision processes, including all copies of themselves. Classical causal decision agents, to reply "Yes" as a group, will need to somehow work out that other copies of themselves reply "Yes", and then reply "Yes" themselves. We can try to help out the causal decision agents on their coordination problem by supplying rules such as "If conflicting answers are delivered, everyone loses $50". If causal decision agents can win on the problem "If everyone says 'Yes' you all get $10, if everyone says 'No' you all lose $5, if there are conflicting answers you all lose $50" then they can presumably handle this. If not, then ultimately, I decline to be responsible for the stupidity of causal decision agents.)

Suppose that you wake up in a green room. You reason, "With 90% probability, there are 18 of me in green rooms and 2 of me in red rooms; with 10% probability, there are 2 of me in green rooms and 18 of me in red rooms. Since I'm altruistic enough to at least care about my xerox-siblings, I calculate the expected utility of replying 'Yes' as (90% * ((18 * +$1) + (2 * -$3))) + (10% * ((18 * -$3) + (2 * +$1))) = +$5.60." You reply yes.

However, before the experiment, you calculate the general utility of the conditional strategy "Reply 'Yes' to the question if you wake up in a green room" as (50% * ((18 * +$1) + (2 * -$3))) + (50% * ((18 * -$3) + (2 * +$1))) = -$20. You want your future selves to reply 'No' under these conditions.
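Both expected-utility calculations can be checked with a few lines of Python (a sketch; `group_payoff` is a name I've introduced for the total payment to all copies, with dollars standing in for utilons):

```python
# Checking both expected-utility calculations from the post.

def group_payoff(n_green, n_red):
    """Total payoff if every green-roomer says 'Yes':
    +$1 per copy in a green room, -$3 per copy in a red room."""
    return n_green * 1 + n_red * (-3)

# After waking in a green room and updating to 90% heads:
eu_after_update = 0.9 * group_payoff(18, 2) + 0.1 * group_payoff(2, 18)

# Before the experiment, when the logical coin is 50/50:
eu_before = 0.5 * group_payoff(18, 2) + 0.5 * group_payoff(2, 18)

print(eu_after_update)  # ≈ +5.6
print(eu_before)        # -20.0
```

The same payoff function fed two different probabilities yields opposite signs, which is the whole inconsistency in miniature.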

This is a dynamic inconsistency - different answers at different times - which argues that decision systems which update on anthropic evidence will self-modify not to update probabilities on anthropic evidence.

I originally thought, on first formulating this problem, that it had to do with *double-counting* the *utilons* gained by your variable numbers of green friends, and the *probability* of being one of your green friends.

However, the problem also works if we care about paperclips. No selfishness, no altruism, just paperclips.

Let the dilemma be, "I will ask all people who wake up in green rooms if they are willing to take the bet 'Create 1 paperclip if the logical coinflip came up heads, destroy 3 paperclips if the logical coinflip came up tails'. (Should they disagree on their answers, I will destroy 5 paperclips.)" Then a paperclip maximizer, before the experiment, wants the paperclip maximizers who wake up in green rooms to refuse the bet. But a conscious paperclip maximizer who updates on anthropic evidence, who wakes up in a green room, will want to take the bet, with expected utility ((90% * +1 paperclip) + (10% * -3 paperclips)) = +0.6 paperclips.
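The paperclip version of the divergence reduces to the same one-line check (`bet` is a hypothetical helper name, not anything from the post):

```python
# Expected paperclips from taking the bet, as a function of P(heads).
def bet(p_heads):
    # +1 paperclip if heads, -3 paperclips if tails
    return p_heads * (+1) + (1 - p_heads) * (-3)

print(bet(0.9))  # ≈ +0.6: the updated green-roomer wants the bet
print(bet(0.5))  # -1.0: the pre-experiment maximizer wants it refused
```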

This argues that, in general, decision systems - whether they start out selfish, or start out caring about paperclips - will not want their future versions to update on anthropic "evidence".

Well, that's not too disturbing, is it? I mean, the whole anthropic thing seemed very confused to begin with - full of notions about "consciousness" and "reality" and "identity" and "reference classes" and other poorly defined terms. Just throw out anthropic reasoning, and you won't have to bother.

When I explained this problem to Marcello, he said, "Well, we don't want to build conscious AIs, so of course we don't want them to use anthropic reasoning", which is a fascinating sort of reply. And I responded, "But when you have a problem this confusing, and you find yourself wanting to build an AI that just doesn't use anthropic reasoning to begin with, maybe that implies that the correct resolution involves *us* not using anthropic reasoning either."

So we can just throw out anthropic reasoning, and relax, and conclude that we are Boltzmann brains. QED.

In general, I find the sort of argument given here - that a certain type of decision system is not reflectively consistent - to be pretty damned compelling. But I also find the Boltzmann conclusion to be, ahem, more than ordinarily unpalatable.

In personal conversation, Nick Bostrom suggested that a division-of-responsibility principle might cancel out the anthropic update - i.e., the paperclip maximizer would have to reason, "If the logical coin came up heads then I am 1/18th responsible for adding +1 paperclip, if the logical coin came up tails then I am 1/2 responsible for destroying 3 paperclips." I confess that my initial reaction to this suggestion was "Ewwww", but I'm not exactly comfortable concluding I'm a Boltzmann brain, either.
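A minimal sketch of how the division-of-responsibility suggestion would cash out numerically, on my reading of the comment above (the 1/18 and 1/2 responsibility weights come from the text; the rest of the arithmetic is my own):

```python
# Each green-roomer weights each outcome by their share of responsibility.
p_heads = 0.9  # the anthropic-updated credence after waking in a green room

eu = (p_heads * (1 / 18) * (+1)          # heads: 18 greens share the +1 paperclip
      + (1 - p_heads) * (1 / 2) * (-3))  # tails: 2 greens share the -3 paperclips

print(eu)  # ≈ -0.1: negative, so even the updated agent refuses the bet
```

On these numbers the responsibility weighting exactly cancels the anthropic update's effect on the decision: the sign comes out negative, matching the pre-experiment preference.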

EDIT: On further reflection, I also wouldn't want to build an AI that concluded it was a Boltzmann brain! Is there a form of inference which rejects this conclusion without relying on any reasoning about subjectivity?

EDIT2: Psy-Kosh has converted this into a non-anthropic problem!

## 208 comments


## comment by Psy-Kosh · 2009-09-09T15:37:59.986Z · LW(p) · GW(p)

Actually... how is this an anthropic situation *AT ALL*?

I mean, wouldn't it be equivalent to, say, gathering 20 rational people (that understand PD, etc., and can certainly manage to agree to coordinate with each other) who are allowed to meet with each other in advance and discuss the situation...

I show up and tell them that I have two buckets of marbles, some of which are green, some of which are red.

One bucket has 18 green and 2 red, and the other bucket has 18 red and 2 green.

I will (already have) flipped a logical coin. Depending on the outcome, I will use either one bucket or the other.

After having an opportunity to discuss strategy, they will be allowed to reach into the bucket without looking, pull out a marble, look at it, and then, if it's green, choose whether to pay and steal, etc. (in case it's not obvious, the payout rules being equivalent to the OP)

As near as I can determine, this situation is entirely equivalent to the OP and is in no way an anthropic one. If the OP actually is an argument against anthropic updates in the presence of logical uncertainty... then it's actually an argument against the general case of Bayesian updating in the presence of logical uncertainty, even when there's no anthropic stuff going on at all!

EDIT: oh, in case it's not obvious, marbles are *not* replaced after being drawn from the bucket.

## ↑ comment by Vladimir_Nesov · 2009-09-14T07:52:40.628Z · LW(p) · GW(p)

Right, and this is a perspective very close to intuition for UDT: you consider different instances of yourself at different times as separate decision-makers that all share the common agenda ("global strategy"), coordinated "off-stage", and implement it without change depending on circumstances they encounter in each particular situation. The "off-stageness" of coordination is more naturally described by TDT, which allows considering *different* agents as UDT-instances of the same strategy, but the precise way in which it happens remains magic.

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-14T18:09:45.043Z · LW(p) · GW(p)

Nesov, the reason why I regard Dai's formulation of UDT as such a significant improvement over your own is that it does not require offstage coordination. Offstage coordination requires a base theory and a privileged vantage point and, as you say, magic.

## ↑ comment by Vladimir_Nesov · 2009-09-20T14:33:19.364Z · LW(p) · GW(p)

> Nesov, the reason why I regard Dai's formulation of UDT as such a significant improvement over your own is that it does not require offstage coordination. Offstage coordination requires a base theory and a privileged vantage point and, as you say, magic.

I still don't understand this emphasis. Here I sketched in what sense I mean the global solution -- it's more about definition of preference than the actual computations and actions that the agents make (locally). There is an abstract concept of global strategy that can be characterized as being "offstage", but there is no offstage computation or offstage coordination, and in general complete computation of global strategy isn't performed even locally -- only approximations, often approximations that make it impossible to implement the globally best solution.

In the above comment, by "magic" I referred to exact mechanism that says in what way and to what extent *different* agents are running the same algorithm, which is more in the domain of TDT, UDT generally not talking about separate agents, only different possible states of the same agent. Which is why neither concept solves the bargaining problem: it's out of UDT's domain, and TDT takes the relevant pieces of the puzzle as given, in its causal graphs.

For further disambiguation, see for example this comment you made:

> We're taking apart your "mathematical intuition" into something that invents a causal graph (this part is still magic) and a part that updates a causal graph "given that your output is Y" (Pearl says how to do this).

## ↑ comment by Vladimir_Nesov · 2009-09-09T17:04:42.673Z · LW(p) · GW(p)

That uncertainty is logical seems to be irrelevant here.

## ↑ comment by Psy-Kosh · 2009-09-09T17:11:13.300Z · LW(p) · GW(p)

Agreed. But I seem to recall seeing some comments about distinguishing between quantum and logical uncertainty, etc., so I figured I may as well note that it's at least equivalent, given that it's the same type of uncertainty as in the original problem.

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-09T18:18:33.248Z · LW(p) · GW(p)

Again, if we *randomly* selected someone to ask, rather than having *specified in advance* that we're going to make the decision depend on the unanimous response of all people in green rooms, then there would be no paradox. What you're talking about here, pulling out a random marble, is the equivalent of asking a random single person from either green or red rooms. But this is not what we're doing!

## ↑ comment by Psy-Kosh · 2009-09-09T18:40:49.769Z · LW(p) · GW(p)

Either I'm misunderstanding something, or I wasn't clear.

To make it explicit: *EVERYONE* who gets a green marble gets asked, and the outcome depends on their consent being unanimous, just like everyone who wakes up in a green room gets asked. ie, all twenty rationalists draw a marble from the bucket, so that by the end, the bucket is empty.

Everyone who got a green marble gets asked for their decision, and the final outcome depends on all the answers. The bit about them drawing marbles individually is just to keep them from seeing what marbles the others got or being able to talk to each other once the marble drawing starts.

Unless I completely failed to comprehend some aspect of what's going on here, this is effectively equivalent to the problem you described.

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-09T19:29:17.747Z · LW(p) · GW(p)

Oh, okay, that wasn't clear actually. (Because I'm used to "they" being a genderless singular pronoun.) In that case these problems do indeed look equivalent.

Hm. Hm hm hm. I shall have to think about this. It is an extremely good point. The more so as anyone who draws a green marble should indeed be assigning a 90% probability to there being a mostly-green bucket.

## ↑ comment by Psy-Kosh · 2009-09-09T19:44:58.321Z · LW(p) · GW(p)

Sorry about the unclarity then. I probably should have explicitly stated a step by step "marble game procedure".

My personal suggestion, if you want an "anthropic reasoning is confooozing" situation, would be the whole anthropic updating vs Aumann agreement thing, since the disagreement would seem to be predictable in advance, and everyone involved could be expected to agree that the disagreement is right and proper. (ie, mad scientist sets up a quantum suicide experiment. Test subject survives. Test subject seems to have Bayesian evidence in favor of MWI vs single world, while the external observer mad scientist who sees the test subject/victim survive would seem to not have any particular new evidence favoring MWI over single world.)

(Yes, I know I've brought up that subject several times, but it does seem, to me, to be a rather more blatant "something funny is going on here")

(EDIT: okay, I guess this would count as quantum murder rather than quantum suicide, but you know what I mean.)

## ↑ comment by byrnema · 2009-09-10T02:48:03.437Z · LW(p) · GW(p)

I don't see how being assigned a green or red *room* is "anthropic" while being assigned a green or red *marble* is not anthropic.

I thought the anthropic part came from updating on your own individual experience in the absence of observing what observations others are making.

## ↑ comment by Psy-Kosh · 2009-09-10T03:16:30.545Z · LW(p) · GW(p)

The difference wasn't marble vs room but "copies of one being, so number of beings changed" vs "just gather 20 rationalists..."

But my whole point was "the original wasn't really an anthropic situation, let me construct this alternate yet equivalent version to make that clear"

## ↑ comment by CarlShulman · 2009-09-10T06:05:59.073Z · LW(p) · GW(p)

Do you think that the Sleeping Beauty problem is an anthropic one?

## ↑ comment by byrnema · 2009-09-10T04:03:25.628Z · LW(p) · GW(p)

I see. I had always thought of the problem as involving 20 (or sometimes 40) different people. The reason for this is that I am an intuitive rather than literal reader, and when Eliezer mentioned stuff about copies of me, I just interpreted this as meaning to emphasize that each person has their own independent 'subjective reality'. Really only meaning that each person doesn't share observations with the others.

So all along, I thought this problem was about challenging the soundness of updating on a single independent observation involving yourself as though you are some kind of special reference frame.

... therefore, I don't think you took *this element* out, but I'm glad you are resolving the meaning of "anthropic" because there are probably quite a few different "subjective realities" circulating about what the essence of this problem is.

## ↑ comment by Psy-Kosh · 2009-09-11T06:05:01.977Z · LW(p) · GW(p)

Sorry for delay.

Copies as in "upload your mind. then run 20 copies of the uploaded mind".

And yes, I know there's still tricky bits left in the problem, I merely established that those tricky bits didn't derive from effects like mind copying or quantum suicide or anything like that and could instead show up in ordinary simple stuff, with no need to appeal to anthropic principles to produce the confusion. (sorry if that came out babbly, am getting tired)

## ↑ comment by **[deleted]** · 2009-09-10T18:58:40.960Z · LW(p) · GW(p)

> anyone who draws a green marble should indeed be assigning a 90% probability to there being a mostly-green bucket.

I don't think so. I think the answer to both these problems is that if you update correctly, you get 0.5.

## ↑ comment by Psy-Kosh · 2009-09-11T05:59:07.796Z · LW(p) · GW(p)

*blinks* mind expanding on that?

P(green|mostly green bucket) = 18/20

P(green|mostly red bucket) = 2/20

likelihood ratio = 9

if one started with no particular expectation of it being one bucket vs the other, ie, assigned 1:1 odds, then after updating upon seeing a green marble, one ought to assign 9:1 odds, ie, probability 9/10, right?
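This update can be written out mechanically (a sketch using integer marble counts so the arithmetic is exact):

```python
# Bayes update on drawing a green marble, from 1:1 prior odds.
green_in_mostly_green = 18  # green marbles in the mostly-green bucket (of 20)
green_in_mostly_red = 2     # green marbles in the mostly-red bucket (of 20)

likelihood_ratio = green_in_mostly_green / green_in_mostly_red  # 9.0
# Prior odds 1:1, so posterior odds are 9:1 for the mostly-green bucket:
posterior = likelihood_ratio / (likelihood_ratio + 1)

print(posterior)  # 0.9
```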

## ↑ comment by **[deleted]** · 2009-09-11T07:59:31.811Z · LW(p) · GW(p)

I guess that does need a lot of explaining.

I would say:

P(green|mostly green bucket) = 1

P(green|mostly red bucket) = 1

P(green) = 1

because P(green) is not the probability that you will get a green marble, it's the probability that someone will get a green marble. From the perspective of the priors, all the marbles are drawn, and no one draw is different from any other. If you don't draw a green marble, you're discarded and the people who did get a green vote. For the purposes of figuring out the priors for a group strategy, your draw being green is not an event.

Of course, you know that you've drawn green. But the only thing you can translate it into that has a prior is "someone got green."

That probably sounds contrived. Maybe it is. But consider a slightly different example:

- Two marbles and two people instead of twenty.
- One marble is green, the other will be red or green based on a coin flip (green on heads, red on tails).

I like this example because it combines the two conflicting intuitions in the same problem. Only a fool would draw a red marble and remain uncertain about the coin flip. But someone who draws a green marble is in a situation similar to the twenty marble scenario.

If you were to plan ahead of time how the greens should vote, you would tell them to assume 50%. But a person holding a green marble might think it's 2/3 in favor of double green.

To avoid embarrassing paradoxes, you can base everything on the four events "heads," "tails," "someone gets green," and "someone gets red." Update as normal.
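The two-marble example can be enumerated directly (a sketch of the individual-perspective calculation; exact arithmetic via `fractions` to avoid rounding):

```python
# Enumerating the two-marble example: heads -> both marbles green,
# tails -> one green, one red; each person draws one of the two at random.
from fractions import Fraction

half = Fraction(1, 2)
worlds = {"heads": ["G", "G"], "tails": ["G", "R"]}

# Joint probability that the coin came up `coin` AND my marble is green:
p_green = {coin: half * Fraction(marbles.count("G"), 2)
           for coin, marbles in worlds.items()}

p_heads_given_green = p_green["heads"] / (p_green["heads"] + p_green["tails"])
print(p_heads_given_green)  # 2/3
```

This is the "person holding a green marble" answer of 2/3; the group-strategy framing described above instead conditions only on "someone gets green," which is probability one either way and so leaves the 50% prior untouched.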

## ↑ comment by Psy-Kosh · 2009-09-11T08:10:31.213Z · LW(p) · GW(p)

Yes, the probability that *someone* will get a green marble is rather different from the probability that I, personally, will get a green marble. But if I do personally get a green marble, that's evidence in favor of the green bucket.

The decision algorithm for how to respond to that though in this case is skewed due to the rules for the payout.

And in your example, if I drew green, I'd consider the 2/3 probability the correct one for whoever drew green.

Now, if there's a payout scheme involved with funny business, that may alter some decisions, but not magically change my epistemology.

## ↑ comment by **[deleted]** · 2009-09-11T08:30:55.910Z · LW(p) · GW(p)

What kind of funny business?

## ↑ comment by wedrifid · 2009-09-11T09:26:52.110Z · LW(p) · GW(p)

Let's just say that you don't draw blue.

## ↑ comment by **[deleted]** · 2009-09-11T16:05:09.375Z · LW(p) · GW(p)

OK, but I think Psy-Kosh was talking about something to do with the payoffs. I'm just not sure if he means the voting or the dollar amounts or what.

## ↑ comment by Psy-Kosh · 2009-09-13T18:34:14.483Z · LW(p) · GW(p)

Sorry for delay. And yeah, I meant stuff like "only greens get to decide, and the decision needs to be unanimous" and so on

## ↑ comment by **[deleted]** · 2009-09-13T19:23:38.552Z · LW(p) · GW(p)

I agree that changes the answer. I was assuming a scheme like that in my two marble example. In a more typical situation, I would also say 2/3.

To me, it's not a drastic (or magical) change, just getting a different answer to a different question.

## ↑ comment by Psy-Kosh · 2009-09-13T19:28:50.582Z · LW(p) · GW(p)

Um... okay... I'm not sure what we're disagreeing about here, if anything:

my position is "given that I found myself with a green marble, it is right and proper for me to assign a 2/3 probability to both being green. However, the correct choice to make, given the peculiarities of this specific problem, may require one to make a decision that seems, on the surface, as if one didn't update like that at all."

## ↑ comment by **[deleted]** · 2009-09-13T19:38:57.700Z · LW(p) · GW(p)

Well, we might be saying the same thing but coming from different points of view about what it means. I'm not actually a bayesian, so when I talk about assigning probabilities and updating them, I just mean doing equations.

What I'm saying here is that you should set up the equations in a way that reflects the group's point of view because you're telling the group what to do. That involves plugging some probabilities of one into Bayes' Law and getting a final answer equal to one of the starting numbers.

## ↑ comment by Christian_Szegedy · 2009-09-09T18:13:02.269Z · LW(p) · GW(p)

Very enlightening!

It just shows that the OP was an overcomplicated example generating confusion about the update.

[EDIT] Deleted rest of the comment due to revised opinion here: http://lesswrong.com/lw/17c/outlawing_anthropics_an_updateless_dilemma/13hk

## ↑ comment by SilasBarta · 2009-09-09T15:56:04.344Z · LW(p) · GW(p)

Good point. After thinking about this for a while, I feel comfortable simultaneously holding these views:

1) You shouldn't do anthropic updates. (i.e. update on the fact that you exist)

2) The example posed in the top-level post is not an example of anthropic reasoning, but reasoning on specific givens and observations, as are most supposed examples of anthropic reasoning.

3) Any evidence arising from the fact that you exist is implicitly contained by your observations by virtue of *their* existence.

Wikipedia gives one example of a productive use of the anthropic principle, but it appears to be reasoning based on observations of the *type* of life-form we are, as well as other hard-won biochemical knowledge, well above and beyond the observation that we exist.

## ↑ comment by Psy-Kosh · 2009-09-09T16:18:41.877Z · LW(p) · GW(p)

Thanks.

I don't *THINK* I agree with your point 1. ie, I favor saying yes to anthropic updates, but I admit that there's definitely confusing issues here.

Mind expanding on point 3? I think I get what you're saying, but in general we filter out that part of our observations - that is, the fact that observations are occurring at all. Getting that back is the point of anthropic updating. Actually... IIRC, Nick Bostrom's way of talking about anthropic updates more or less is exactly your point 3 in reverse... ie, near as I can determine and recall, his position explicitly advocates talking about the significance that observations are occurring at all as part of the usual update based on observation. Maybe I'm misremembering though.

Also, separating it out into a single anthropic update and then treating all observations as conditional on your existence or such helps avoid double counting that aspect, right?

Also, here's another physics example, a bit more recent that was discussed on OB a while back.

## ↑ comment by SilasBarta · 2009-09-09T17:29:02.401Z · LW(p) · GW(p)

Reading the link, the second paper's abstract, and most of Scott Aaronson's post, it looks to me like they're not using anthropic reasoning at all. Robin Hanson summarizes their "entropic principle" (and the abstract and all discussion agree with his summary) as

> since observers need entropy gains to function physically, we can estimate the probability that any small spacetime volume contains an observer to be proportional to the entropy gain in that volume.

The problem is that "observer" is not the same as "anthrop-" (human). This principle is just a subtle restatement of either a tautology or known physical law. Because it's not that "observers need entropy gains". Rather, observation *is* entropy gain. To observe something is to increase one's mutual information with it. But since phase space is conserved, all gains in mutual information must be offset by an increase in entropy.

But since "observers" are simply anything that forms mutual information with something else, it doesn't mean a conscious observer, let alone a human one. For that, you'd need to go beyond P(entropy gain|observer) to P(consciousness|entropy gain).

(I'm a bit distressed no one else made this point.)

Now, this idea could lead to an insight if you endorsed some neo-animistic view that consciousness is proportional to normalized rate of mutual information increase, and so humans are (as) conscious (as we are) because we're above some threshold ... but again, you'd be using nothing from your existence as such.

## ↑ comment by Psy-Kosh · 2009-09-09T18:53:57.293Z · LW(p) · GW(p)

The argument was "higher rate of entropy production is correlated with more observers, probably. So we should expect to find ourselves in chunks of reality that have high rates of entropy production"

I guess it wasn't just observers, but (non reversible) computations

ie, anthropic reasoning was the justification for using the entropy production criteria in the first place. Yes, there is a question of fractions of observers that are conscious, etc... but a universe that can't support much in the way of observers at all probably can't support much in the way of conscious observers, while a universe that can support lots of observers can probably support more conscious observers than the other, right?

Or did I misunderstand your point?

## ↑ comment by SilasBarta · 2009-09-09T19:13:10.460Z · LW(p) · GW(p)

Now I'm not understanding how your response applies.

My point was: the entropic principle estimates the probability of observers per unit volume by using the entropy per unit volume. But this follows immediately from the second law and conservation of phase space; it's necessarily true.

To the extent that it assigns a probability to a class that includes us, it does a poor job, because we make up a tiny fraction of the "observers" (*appropriately* defined) in the universe.

## ↑ comment by Nubulous · 2009-09-09T22:10:29.462Z · LW(p) · GW(p)

The situation is not identical in the non-anthropic case in that there are equal numbers of rooms but differing numbers of marbles.

There's only one green room (so observing it is evidence for heads-green with p=0.5) whereas there are 18 green marbles, so p(heads|green)= ((18/20)/0.5)*0.5 = 0.9.

## ↑ comment by Psy-Kosh · 2009-09-11T05:52:53.627Z · LW(p) · GW(p)

Sorry for delayed response.

Anyways, how so? 20 rooms in the original problem, 20 marbles in mine.

What fraction are green vs red derives from examining a logical coin, etc. I'm not sure where you're getting the 'only one green room' thing.

## comment by Wei_Dai · 2009-09-09T09:39:56.678Z · LW(p) · GW(p)

An AI that runs UDT wouldn't conclude that it was a Boltzmann or non-Boltzmann brain. For such an AI, the statement has no meaning, since it's always *both*. The closest equivalent would be "Most of the value I can create by making the right decision is concentrated in the vicinity of non-Boltzmann brains."

BTW, does my indexical uncertainty and the Axiom of Independence post make any more sense now?

## ↑ comment by CarlShulman · 2009-09-09T18:26:37.420Z · LW(p) · GW(p)

This was my take after going through a similar analysis (with apples, not paperclips) at the SIAI summer intern program.

## ↑ comment by Wei_Dai · 2009-09-09T19:39:02.965Z · LW(p) · GW(p)

It seems promising that several people are converging on the same "updateless" idea. But sometimes I wonder why it took so long, if it's really the right idea, given the amount of brainpower spent on this issue. (Take a look at http://www.anthropic-principle.com/profiles.html and consider that Nick Bostrom wrote "Investigations into the Doomsday Argument" in 1996 and then did his whole Ph.D. on anthropic reasoning, culminating in a book published in 2002.)

BTW, weren't the SIAI summer interns supposed to try to write one LessWrong post a week (or was it a month)? What happened to that plan?

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-09T20:46:37.159Z · LW(p) · GW(p)

> But sometimes I wonder why it took so long, if it's really the right idea, given the amount of brainpower spent on this issue.

People are crazy, the world is mad. Also inventing basic math is a hell of a lot harder than reading it in a textbook afterward.

## ↑ comment by Wei_Dai · 2009-09-09T21:59:29.169Z · LW(p) · GW(p)

> People are crazy, the world is mad.

I suppose you're referring to the fact that we are "designed" by evolution. But why did evolution create a species that invented the number field sieve (to give a random piece of *non-basic* math) before UDT? It doesn't make any sense.

> Also inventing basic math is a hell of a lot harder than reading it in a textbook afterward.

In what sense is it "hard"? I don't think it's hard in a computational sense, like NP-hard. Or is it? I guess it goes back to the question of "what algorithm are we using to solve these types of problems?"

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-10T08:14:48.231Z · LW(p) · GW(p)

No, I'm referring to the fact that people are crazy and the world is mad. You don't need to reach so hard for an explanation of why no one's invented UDT yet when many-worlds wasn't invented for thirty years.

## ↑ comment by CarlShulman · 2009-09-16T02:57:56.809Z · LW(p) · GW(p)

I also don't think general madness is enough of an explanation. Both are counterintuitive ideas in areas without well-established methods to verify progress, e.g. building a working machine or standard mathematical proof techniques.

## ↑ comment by CarlShulman · 2009-09-09T20:20:24.478Z · LW(p) · GW(p)

The OB/LW/SL4/TOElist/polymathlist group is one intellectual community drawing on similar prior work that hasn't been broadly disseminated.

The same arguments apply with much greater force to the causal decision theory vs evidential decision theory debate.

The interns wound up more focused on their group projects. As it happens, I had told Katja Grace that I was going to write up a post showing the difference between UDT and SIA (using my apples example which is isomorphic with the example above), but in light of this post it seems needless.

## ↑ comment by Vladimir_Nesov · 2009-09-10T12:35:28.859Z · LW(p) · GW(p)

UDT is basically the bare definition of reflective consistency: it is a non-solution, just statement of the problem in constructive form. UDT says that you should think exactly the same way as the "original" you thinks, which guarantees that the original you won't be disappointed in your decisions (reflective consistency). It only looks good in comparison to other theories that fail *this particular requirement*, but otherwise are much more meaningful in their domains of application.

TDT fails reflective consistency in general, but offers a correct solution in a domain that is larger than those of other practically useful decision theories, while retaining their expressivity/efficiency (i.e. updating on graphical models).

## ↑ comment by Wei_Dai · 2009-09-09T21:27:46.672Z · LW(p) · GW(p)

> The OB/LW/SL4/TOElist/polymathlist group is one intellectual community drawing on similar prior work that hasn't been broadly disseminated.

What prior work are you referring to, that hasn't been broadly disseminated?

> The same arguments apply with much greater force to the causal decision theory vs evidential decision theory debate.

I think much less brainpower has been spent on CDT vs EDT, since that's thought of as more of a technical issue that only professional decision theorists are interested in. Likewise, Newcomb's problem is usually seen as an intellectual curiosity of little practical use. (At least that's what I thought until I saw Eliezer's posts about the potential link between it and AI cooperation.)

Anthropic reasoning, on the other hand, is widely known and discussed (I remember the Doomsday Argument brought up during a casual lunch-time conversation at Microsoft), and thought to be both interesting in itself and having important applications in physics.

> The interns wound up more focused on their group projects.

I miss the articles they would have written. :) Maybe post the topic ideas here and let others have a shot at them?

Replies from: CarlShulman, CarlShulman

## ↑ comment by CarlShulman · 2009-09-13T04:28:08.769Z · LW(p) · GW(p)

"What prior work are you referring to, that hasn't been broadly disseminated?"

I'm thinking of the corpus of past posts on those lists, which bring certain tools and concepts (Solomonoff Induction, anthropic reasoning, Pearl, etc) jointly to readers' attention. When those tools are combined and focused on the same problem, different forum participants will tend to use them in similar ways.

## ↑ comment by CarlShulman · 2009-09-09T23:51:33.527Z · LW(p) · GW(p)

You might think that more top-notch economists and game theorists would have addressed Newcomb/TDT/Hofstadter superrationality given their interest in the Prisoner's Dilemma.

Looking at the actual literature on the Doomsday argument, there are some physicists involved (just as some economists and others have tried their hands at Newcomb), but it seems like more philosophers. And anthropics doesn't seem core to professional success, e.g. Tegmark can indulge in it a bit thanks to showing his stuff in 'hard' areas of cosmology.

Replies from: Wei_Dai

## ↑ comment by Wei_Dai · 2009-09-10T07:15:07.472Z · LW(p) · GW(p)

I just realized/remembered that one reason that others haven't found the TDT/UDT solutions to Newcomb/anthropic reasoning may be that they were assuming a fixed human nature, whereas we're assuming an AI capable of self-modification. For example, economists are certainly more interested in answering "What would human beings do in PD?" than "What should AIs do in PD assuming they know each others' source code?" And perhaps some of the anthropic thinkers (in the list I linked to earlier) did invent something like UDT, but then thought "Human beings can never practice this, I need to keep looking."

## ↑ comment by KatjaGrace · 2009-09-14T04:18:42.265Z · LW(p) · GW(p)

This post is an argument against voting on your updated probability when there is a selection effect such as this. It applies to any evidence (marbles, existence etc), but only in a specific situation, so has little to do with SIA, which is about whether you update on your own existence to begin with in any situation. Do you have arguments against that?

Replies from: CarlShulman

## ↑ comment by CarlShulman · 2009-09-14T13:12:27.709Z · LW(p) · GW(p)

It's for situations in which different hypotheses all predict that there will be beings subjectively indistinguishable from you, which covers the most interesting anthropic problems in my view. I'll make some posts distinguishing SIA, SSA, UDT, and exploring their relationships when I'm a bit less busy.

Replies from: KatjaGrace

## ↑ comment by KatjaGrace · 2009-09-15T05:04:26.642Z · LW(p) · GW(p)

Are you saying *this problem* arises in all situations where multiple beings in multiple hypotheses make the same observations? That would suggest we can't update on evidence most of the time. I think I must be misunderstanding you. Subjectively indistinguishable beings arise in virtually all probabilistic reasoning. If there were only one hypothesis with one creature like you, then all would be certain.

The only interesting problem in anthropics I know of is whether to update on your own existence or not. I haven't heard a good argument for not (though I still have a few promising papers to read), so I am very interested if you have one. Will 'exploring their relationships' include this?

Replies from: CarlShulman

## ↑ comment by CarlShulman · 2009-09-15T13:32:21.935Z · LW(p) · GW(p)

You can judge for yourself at the time.

## comment by pengvado · 2009-09-08T21:09:06.918Z · LW(p) · GW(p)

> Well, we don't want to build conscious AIs, so of course we don't want them to use anthropic reasoning.

Why is anthropic reasoning related to consciousness at all? Couldn't any kind of Bayesian reasoning system update on the observation of its own existence (assuming such updates are a good idea in the first place)?

Replies from: Marcello, timtyler

## ↑ comment by Marcello · 2009-09-09T13:32:42.355Z · LW(p) · GW(p)

Why do I think anthropic reasoning and consciousness are related?

In a nutshell, I think subjective anticipation requires subjectivity. We humans feel dissatisfied with a description like "well, one system running a continuation of the computation in your brain ends up in a red room and two such systems end up in green rooms" because we feel that there's this extra "me" thing, whose future we need to account for. We bother to ask how the "me" gets split up, what "I" should anticipate, because we feel that there's "something it's like to be me", and that (unless we die) there will be in future "something it will be like to be me". I suspect that the things I said in the previous sentence are at best confused and at worst nonsense. But the question of why people intuit crazy things like that is the philosophical question we label "consciousness".

However, the feeling that there will be in future "something it will be like to be me", and in particular that there will be *one* "something it will be like to be me", if taken seriously, forces us to have subjective anticipation, that is, to write a probability distribution summing to *one* over which copy we end up as. Once you do that, if you wake up in a green room in Eliezer's example, you are forced to update to 90% probability that the coin came up heads (provided you distributed your subjective anticipation evenly between all twenty copies in both the head and tail scenarios, which really seems like the only sane thing to do).

Or, at least, the same amount of "something it is like to be me"-ness as we started with, in some ill-defined sense.
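Marcello's 90% figure follows from a straightforward Bayesian update; here is a quick numeric check (a sketch in Python; the function name and defaults are mine, with the 18/2 room counts from Eliezer's setup):

```python
# Posterior that the logical coin came up "heads" given that you woke in a
# green room, treating each of the 20 copies as equally likely to be "you".
def posterior_heads(greens_if_heads=18, greens_if_tails=2, total=20):
    p_green_given_heads = greens_if_heads / total   # 18/20
    p_green_given_tails = greens_if_tails / total   # 2/20
    prior = 0.5                                     # logical coinflip
    joint_heads = prior * p_green_given_heads
    joint_tails = prior * p_green_given_tails
    return joint_heads / (joint_heads + joint_tails)

print(posterior_heads())  # 0.9
```

Waking in a *red* room gives the mirror-image update, `posterior_heads(2, 18)` = 0.1.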

On the other hand, if you do not feel that there is any fact of the matter as to which copy you become, then you just want all your copies to execute whatever strategy is most likely to get all of them the most money from your initial perspective of ignorance of the coinflip.

Incidentally, the optimal strategy looks like a policy selected by updateless decision theory and not like any probability of the coin having been heads or tails. PlaidX beat me to the counter-example for p=50%. Counter-examples like PlaidX's will work for any p<90%, and counter-examples like Eliezer's will work for any p>50%, so that pretty much covers it. So, unless we want to include ugly hacks like responsibility, or unless we let the copies reason Goldenly (using Eliezer's original TDT) about each other's actions as transposed versions of their own actions (which does correctly handle PlaidX's counter-example, but might break in more complicated cases where no isomorphism is apparent), there simply *isn't* a probability-of-heads that represents the right thing for the copies to do no matter the deal offered to them.

## ↑ comment by timtyler · 2009-09-09T09:14:42.324Z · LW(p) · GW(p)

Consciousness is really just a name for having a model of yourself which you can reflect on and act on - plus a whole bunch of other confused interpretations which don't really add much.

To do anthropic reasoning you have to have a simple model of yourself which you can reason about.

Machines can do this too, of course, without too much difficulty. That typically makes them conscious, though. Perhaps we can imagine a machine performing anthropic reasoning while dreaming - i.e. when most of its actuators are disabled, and it would not normally be regarded as being conscious. However, then, how would we know about its conclusions?

## comment by Scott Alexander (Yvain) · 2009-09-09T20:07:55.935Z · LW(p) · GW(p)

Curses on this problem; I spent the whole day worrying about it, and am now so much of a wreck that the following may or may not make sense. For better or worse, I came to a similar conclusion to Psy-Kosh's: that this could work in less anthropic problems. Here's the equivalent I was using:

Imagine Omega has a coin biased so that it comes up the same way nine out of ten times. You know this, but you don't know which way it's biased. Omega allows you to flip the coin once, and asks for your probability that it's biased in favor of heads. The coin comes up heads. You give your probability as 9/10.

Now Omega takes 20 people and puts them in the same situation as in the original problem. It lets each of them flip their coins. Then it goes to each of the people who got tails, and offers $1 to charity for each coin that came up tails, but threatens to steal $3 from charity for each coin that came up heads.

This nonanthropic problem works the same way as the original anthropic problem. If the coin is really biased heads, 18 people will get heads and 2 people will get tails. In this case, the correct subjective probability to assign is definitely 9/10 in favor of whatever result you got; after all, this is the correct probability when you're the only person in the experiment, and just knowing that 19 other people are also participating in the experiment shouldn't change matters.
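The analogy can be checked numerically. Below is a small Monte Carlo sketch of Yvain's setup (variable names are mine): each flipper who trusts their own flip assigns 9/10 to the bias matching the side they saw, and simulation confirms that frequency, yet the group charity deal still has negative expected value.

```python
import random

random.seed(0)  # reproducible

# Yvain's non-anthropic analogue: a coin biased to land the same way
# 9 times out of 10, direction unknown (50/50 prior).  Check how often
# a single observed flip matches the actual bias direction.
TRIALS = 100_000
matches = 0
for _ in range(TRIALS):
    bias_heads = random.random() < 0.5        # which way this coin is biased
    p_heads = 0.9 if bias_heads else 0.1
    flip_heads = random.random() < p_heads    # the flipper's single flip
    if flip_heads == bias_heads:              # observed side matches bias?
        matches += 1

print(matches / TRIALS)   # ≈ 0.9: the individual posterior is correct

# Yet the group deal (+$1 per tails coin, -$3 per heads coin) has
# negative expected value over the two equally likely biases:
ev_deal = 0.5 * (2 * 1 - 18 * 3) + 0.5 * (18 * 1 - 2 * 3)
print(ev_deal)            # -20.0
```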

I don't have a formal answer for why this happens, but I can think of one more example that might throw a little light on it. In another thread, someone mentioned that lottery winners have excellent evidence that they are brains-in-a-vat and that the rest of the world is an illusion being put on by the Dark Lord of the Matrix for their entertainment. After all, if this was true, it wouldn't be too unlikely for them to win the lottery, so for a sufficiently large lottery, the chance of winning it this way exceeds the chance of winning it through luck.

Suppose Bob has won the lottery and so believes himself to be a brain in a vat. And suppose that the evidence for the simulation argument is poor enough that there is no other good reason to believe yourself to be a brain in a vat. Omega goes up to Bob and asks him to take a bet on whether he is a brain in a vat. Bob says he is, he loses, and Omega laughs at him. What did he do wrong? Nothing. Omega was just being mean by specifically asking the one person whom ve knew would get the answer wrong.

Omega's little prank would still work if ve announced ver intention to perform it beforehand. Ve would say "When one of you wins the lottery, I will be asking this person to take a bet whether they are a brain in a vat or not!" Everyone would say "That lottery winner shouldn't accept Omega's bet. We know we're not brains in vats." Then someone wins the lottery, Omega asks if they're a brain in a vat, and they say yes, and Omega laughs at them. (Note that this also works if we consider a coin so biased that it lands the same way 999999 times out of a million, let a million people flip it once, and then ask people what they think the coin's bias is - sampling the people who got the counter-to-expectation result far more often than chance would.)

Omega's being equally mean in the original problem. There's a 50% chance ve will go and ask the two out of twenty people who are specifically most likely to be wrong and can't do anything about it. The best course I can think of would be for everyone to swear an oath not to take the offer before they got assigned into rooms.

Replies from: Eliezer_Yudkowsky, Eliezer_Yudkowsky

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-09T20:53:54.992Z · LW(p) · GW(p)

> Then someone wins the lottery, Omega asks if they're a brain in a vat, and they say yes, and Omega laughs at them

By assumption, if the person is right to believe they're in a sim, then most of the lottery winners are in sims, so while Omega laughs at them in *our* world, they win the bet with Omega in most of *their* worlds.

> wrong and can't do anything about it

should have been your clue to check further.

Replies from: Yvain

## ↑ comment by Scott Alexander (Yvain) · 2009-09-10T14:30:44.635Z · LW(p) · GW(p)

This is a feature of the original problem, isn't it?

Let's say there are 1000 brains in vats, each in their own little world, and a "real" world of a billion people. The chance of a vat-brain winning the lottery is 1, and the chance of a real person winning the lottery is 1 in a million. There are 1000 real lottery winners and 1000 vat lottery winners, so if you win the lottery your chance of being in a vat is 50-50. However, if you look at any particular world, the chances of this week's single lottery winner being a brain in a vat is 1000/1001.
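The counting in this paragraph can be made explicit (a sketch; the numbers are the ones given above, and the variable names are mine):

```python
# Yvain's lottery example, counted directly.
vat_brains = 1000                           # each wins in their private world
real_people = 1_000_000_000

# Per-individual: total winners of each kind.
vat_winners = vat_brains                    # chance of winning is 1
real_winners = real_people // 1_000_000     # 1-in-a-million chance: 1000 winners
p_vat_given_win = vat_winners / (vat_winners + real_winners)
print(p_vat_given_win)                      # 0.5

# Per-world: this week's single real-world winner vs the 1000 vat "winners",
# so a particular week's winner, drawn across worlds, is a vat-brain
# 1000 times out of 1001.
print(1000 / 1001)
```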

Assume the original problem is run multiple times in multiple worlds, and that the value of pi somehow differs in those worlds (probably you used pi precisely so people couldn't do this, but bear with me). Of all the people who wake up in green rooms, 18/20 of them will be right to take your bet. However, in each particular world, the chances of the green room people being right to take the bet is 1/2.

In this situation there is no paradox. Most of the people in the green rooms come out happy that they took the bet. It's only when you limit it to one universe that it becomes a problem. The same is true of the lottery example. When restricted to a single (real, non-vat) universe, it becomes more troublesome.

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-09T20:52:26.723Z · LW(p) · GW(p)

> Now Omega takes 20 people and puts them in the same situation as in the original problem. It lets each of them flip their coins. Then it goes to each of the people who got tails, and offers $1 to charity for each coin that came up tails, but threatens to steal $3 from charity for each coin that came up heads.

It's worth noting that if everyone got to make this choice *separately* - Omega doing it once for each person who responds - then it would indeed be wise for everyone to take the bet! This is evidence in favor of either Bostrom's division-of-responsibility principle, or byrnema's pointer-based viewpoint, if indeed those two views are nonequivalent.

## ↑ comment by Scott Alexander (Yvain) · 2009-09-10T15:44:54.416Z · LW(p) · GW(p)

EDIT: Never mind

## ↑ comment by byrnema · 2009-09-09T23:59:02.853Z · LW(p) · GW(p)

Bostrom's calculation is correct, but I believe it is an example of multiplying by the right coefficients for the wrong reasons.

I did exactly the same thing -- multiplied by the right coefficients for the wrong reasons -- in my deleted comment. I realized that the justification of these coefficients required a quite different problem (in my case, I modeled that all the green roomers decided to evenly divide the spoils of the whole group), and the only reason it worked was because multiplying the first term by 1/18 and the next term by 1/2 effectively canceled away the factors that represented your initial 90% posterior, thus ultimately just applying the 50/50 probability of the non-anthropic solution.

Anthropic calculation:

(18/20)(12) + (2/20)(-52) = 5.6

Bostrom-modified calculation for responsibility per person:

[(18/20)(12)/18 + (2/20)(-52)/2] / 2 = -1

Non-anthropic calculation for EV per person:

[(1/2)(12) + (1/2)(-52)] / 20 = -1
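For readers who want to check the arithmetic, the three calculations above can be reproduced directly (a sketch; variable names are mine, and the payoffs are the OP's):

```python
# Payoffs from the OP: the heads-world deal is worth 18*1 - 2*3 = +12,
# the tails-world deal is worth 2*1 - 18*3 = -52.
win, loss = 12, -52

anthropic = (18/20) * win + (2/20) * loss                 # naive 90/10 update
bostrom   = ((18/20) * win / 18 + (2/20) * loss / 2) / 2  # 1/n responsibility
global_ev = ((1/2) * win + (1/2) * loss) / 20             # 50/50, per person

print(round(anthropic, 6))  # 5.6
print(round(bostrom, 6))    # -1.0
print(round(global_ev, 6))  # -1.0
```

As byrnema notes, the second and third agree: the 1/n responsibility coefficients exactly undo the 90/10 weighting.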

My pointer-based viewpoint, in contrast, is not a calculation but a rationale for why you must use the 50/50 probability rather than the 90/10 one. The argument is that each green roomer cannot use the information that they were in a green room because this information was preselected (a biased sample). With effectively no information about what color room they're in, each green roomer must resort to the non-anthropic calculation that the probability of flipping heads is 50%.

Replies from: Christian_Szegedy

## ↑ comment by Christian_Szegedy · 2009-09-10T02:52:39.416Z · LW(p) · GW(p)

I can very much relate to Eliezer's original gut reaction: I agree that Nick's calculation is very ad hoc and hardly justifiable.

However, I also think that, although you are right about the pointer bias, your explanation is still incomplete.

I think Psy-Kosh made an important step with his reformulation. Especially eliminating the copy procedure for the agents was essential. If you follow through the math from the point of view of one of the agents, the nature of the problem becomes clear:

Trying to write down the payoff matrix from the viewpoint of one of the agents, it becomes clear that you can't fill out any of the reward entries, since the **outcome never depends on that agent's decision alone.** If he got a green marble, it still depends on other agents' decisions, and if he drew a red one, it will depend only on other agents' decisions.

This makes it completely clear that the only solution is for the agents to agree on a predetermined protocol, and therefore the second calculation of the OP is the only correct one so far.

However, this protocol **does not imply anything** about P(head|being in green room). It is simply **irrelevant** to the expected value of any agreed-upon protocol. One could create a protocol that depends on P(head|being in a green room) for some of the agents, *but you would have to analyze the expected value of the protocol from a global point of view*, not just from the point of view of the agent, for you can't complete the decision matrix if the outcome depends on other agents' decisions as well.

Of course a predetermined protocol does not mean that the agents must explicitly agree on a narrow protocol before the action. If we assume that the agents get all the information **once they find themselves in the room**, they could still create a mental model of the **whole global situation** and base their decision on the second calculation of the OP.

## ↑ comment by byrnema · 2009-09-10T04:39:19.731Z · LW(p) · GW(p)

I agree with you that the reason why you can't use the 90/10 prior is because the decision never depends on a person in a red room.

In Eliezer's description of the problem above, he tells each green roomer that he asks all the green roomers if they want him to go ahead with a money distribution scheme, and they must be unanimous or there is a penalty.

I think this is a nice pedagogical component that helps a person understand the dilemma, but I would like to emphasize here (even if you're aware of it) that it is completely superfluous to the mechanics of the problem. It doesn't make any difference if Eliezer bases his action on the answer of one green roomer or all of them.

For one thing, all green roomer answers will be unanimous because they all have the same information and are asked the same complicated question.

And, more to the point, even if just one green roomer is asked, the dilemma still exists that he can't use his prior that heads was probably flipped.

Replies from: Christian_Szegedy, DPiepgrass

## ↑ comment by Christian_Szegedy · 2009-09-10T04:42:23.757Z · LW(p) · GW(p)

Agreed 100%.

[EDIT:] Although I would be a bit more general: regardless of red rooms, if you have several actors, **even if they necessarily** make the same decision, they have to analyze the global picture. The only situation in which the agent may fall back on the simplified subjective Bayesian decision-table analysis is when he is the **only actor** (no copies, etc.). It is easy to construct simple decision problems without "red rooms", where each of the actors has some control over the outcome and none of them can make the analysis for itself alone but has to build a model of the whole situation to make the globally optimal decision.

However, I did not imply in any way that the penalty matters. (At least, as long as the agents are sane and don't start to flip non-logical coins.) The global analysis of the payoff may clearly disregard the penalty case if it's impossible for that specific protocol. The only requirement is that the expected value calculation must be made on a protocol-by-protocol basis.

## ↑ comment by DPiepgrass · 2021-02-27T13:31:20.776Z · LW(p) · GW(p)

My intuition says that this is qualitatively different. If the agent *knows* that *only one* green roomer will be asked the question, then upon waking up in a green room the agent thinks "with 90% probability, there are 18 of me in green rooms and 2 of me in red rooms." But then, if the agent is asked whether to take the bet, this new information ("I am the unique one being asked") changes the probability back to 50-50.

## comment by LauraABJ · 2009-10-12T00:17:46.386Z · LW(p) · GW(p)

"I've made sacrifices! You don't know what it cost me to climb into that machine every night, not knowing if I'd be the man in the box or in the prestige!"

sorry- couldn't help myself.

Replies from: pjeby

## ↑ comment by pjeby · 2009-10-12T02:13:30.682Z · LW(p) · GW(p)

> "I've made sacrifices! You don't know what it cost me to climb into that machine every night, not knowing if I'd be the man in the box or in the prestige!"

You know, I never could make sense out of that line. If you assume the machine creates "copies" (and that's strongly implied by the story up to that point), then that means *every* time he gets on stage, he's going to wind up in the box. (And even if the copies are error-free and absolutely interchangeable, one copy will still end up in the box.)

(Edit to add: of course, if you view it from the quantum suicide POV, "he" *never* ends up in the box, since otherwise "he" would not be there to try again the next night.)

## comment by Vladimir_Nesov · 2009-09-08T20:57:15.143Z · LW(p) · GW(p)

Again: how can you talk about *concluding* that you are a Boltzmann brain? To conclude means to update, and here you refuse updating.

## comment by dfranke · 2009-09-08T19:19:15.509Z · LW(p) · GW(p)

I read this and told myself that it only takes five minutes to have an insight. Five minutes later, here's what I'm thinking:

Anthropic reasoning is confusing because it treats consciousness as a primitive. By doing so, we're committing LW's ultimate no-no: assuming an ontologically fundamental mental state. We need to find a way to reformulate anthropic reasoning in terms of Solomonoff induction. If we can successfully do so, the paradox will dissolve.

Replies from: timtyler, SforSingularity

## ↑ comment by timtyler · 2009-09-09T09:16:50.659Z · LW(p) · GW(p)

Anthropic reasoning is confusing - probably because we are not used to doing it much in our ancestral environment.

I don't think you can argue it treats consciousness as a primitive, though. Anthropic reasoning is challenging - but not so tricky that machines can't do it.

Replies from: CarlShulman

## ↑ comment by CarlShulman · 2009-09-09T18:32:54.269Z · LW(p) · GW(p)

It involves calculating a 'correct measure' of how many partial duplicates of a computation exist:

www.nickbostrom.com/papers/experience.pdf

Anthropics does involve magical categories.

Replies from: timtyler

## ↑ comment by timtyler · 2009-09-09T18:43:23.137Z · LW(p) · GW(p)

Right - but that's "Arthur C. Clarke-style magic" - stuff that is complicated and difficult - not the type of magic associated with mystical mumbo-jumbo.

We can live with some of the former type of magic - and it might even spice things up a bit.

## ↑ comment by SforSingularity · 2009-09-08T20:46:44.371Z · LW(p) · GW(p)

> need to find a way to reformulate anthropic reasoning in terms of Solomonoff induction

I fail to see how Solomonoff induction can reduce ontologically basic mental states.

## comment by Scott Alexander (Yvain) · 2009-09-10T17:02:50.997Z · LW(p) · GW(p)

More thinking out loud:

It really is in your best interest to accept the offer after you're in a green room. It really is in your best interest to accept the offer conditional on being in a green room before you're assigned. Maybe part of the problem arises because you think your decision will influence the decision of others, i.e. because you're acting like a timeless decision agent.

Replace "me" with "anyone with my platonic computation", and "I should accept the offer conditional on being in a green room" with "anyone with my platonic computation should accept the offer, conditional on anyone with my platonic computation being in a green room." But the chance of someone with my platonic computation being in a green room is 100%. Or, to put it another way, the Platonic Computation is wondering "Should I accept the offer conditional on any one of my instantiations being in a green room?". But the Platonic Computation knows that at least one of its instantiations will be in a green room, so it declines the offer.

If the Platonic Computation were really a single organism, its best option would be to single out one of its instantiations beforehand and decide "I will accept the offer, given that Instantiation 6 is in a green room" - but since most instantiations of the computation can't know the status of Instantiation 6 when they decide, it doesn't have this option.

Replies from: byrnema

## ↑ comment by byrnema · 2009-09-10T17:44:35.340Z · LW(p) · GW(p)

Yes, exactly.

If you are in a green room and someone asks you if you will bet that a head was flipped, you should say "yes".

However, if that same person asks you if *they* should bet that heads was flipped, you should answer no if you ascertain that they asked you on the precondition that you were in a green room.

P(heads | you are in a green room) = 90%

P(you bet on heads | you are in a green room) = 100%, so your bet carries no information about the coin flip

## ↑ comment by Jonathan_Lee · 2009-09-10T22:14:58.340Z · LW(p) · GW(p)

Your first claim needs qualifications: You should only bet if you're being drawn randomly from everyone. If it is known that one random person in a green room will be asked to bet, then if you wake up in a green room and are asked to bet you should refuse.

P(Heads | you are in a green room) = 0.9

P(Being asked | Heads and Green) = 1/18, P(Being asked | Tails and Green) = 1/2

Hence P(Heads | you are asked in a green room) = 0.5
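These numbers check out; here is the same Bayes computation spelled out (a sketch; variable names are mine):

```python
# One random person in a green room is asked to bet.  Being asked is
# itself evidence, and it exactly cancels the anthropic update.
prior = 0.5
p_green_given_heads, p_green_given_tails = 18/20, 2/20
p_asked_given_heads_green = 1/18    # one of 18 green roomers is chosen
p_asked_given_tails_green = 1/2     # one of 2 green roomers is chosen

joint_heads = prior * p_green_given_heads * p_asked_given_heads_green  # 1/40
joint_tails = prior * p_green_given_tails * p_asked_given_tails_green  # 1/40
print(joint_heads / (joint_heads + joint_tails))  # ≈ 0.5
```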

Of course the OP doesn't choose a random individual to ask, or even a random individual in a green room. The OP asks all people in green rooms in this world.

If there is confusion about when your decision algorithm "chooses", then TDT/UDT can try to make the latter two cases equivalent, by thinking about the "other choices I force". Of course the fact that this asserts some variety of choice for a special individual and not for others, when the situation is symmetric, suggests something is being missed.

What is being missed, to my mind, is a distinction between the distribution of (random individuals | data is observed), and the distribution of (random worlds | data is observed).

In the OP, the latter distribution isn't altered by the update as the observed data occurs somewhere with probability 1 in both cases. The former is because it cares about the number of copies in the two cases.

## comment by Jonathan_Lee · 2009-09-10T03:34:02.009Z · LW(p) · GW(p)

I've been watching for a while, but have never commented, so this may be horribly flawed, opaque or otherwise unhelpful.

I think the problem is entirely caused by the use of the wrong sets of belief, and that anything holding to Eliezer's 1-line summary of TDT or alternatively UDT should get this right.

Suppose that you're a rational agent. Since you are instantiated in multiple identical circumstances (green rooms) and asked identical questions, your answers should be identical. Hence if you wake up in a green room and you're asked to steal from the red rooms and give to the green rooms, you either commit a group of 2 of you to a loss of 52 or commit a group of 18 of you to a gain of 12.

This committal is what you wish to optimise over from TDT/UDT, and clearly this requires knowledge about the likelihood of different decision making groups. The distribution of sizes of random groups is *not* the same as the distribution of sizes of groups that a random individual is in. The probabilities of being in a group are upweighted by the size of the group and normalised. This is why Bostrom's suggested 1/n split of responsibility works; it reverses the belief about where a random individual is in a set of decision making groups to a belief about the size of a random decision making group.

By the construction of the problem the probability that a random (group of all the people in green rooms) has size 18 is 0.5, and similarly for 2 the probability is 0.5. Hence the expected utility is (0.5*12)+(0.5*-52)=-20.
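The distinction between a random group and the group a random individual is in can be computed exactly (a sketch using exact rationals; names are mine):

```python
from fractions import Fraction

# Two classes of world: A_1 has 18 green roomers, A_2 has 2, each with
# prior 1/2.  "Which group is a random green roomer in?" upweights by
# group size; "What size is a random green-room group?" does not.
sizes = {18: Fraction(1, 2), 2: Fraction(1, 2)}   # P(world class)

# Random green-room *group*: just the world prior.
p_group_18 = sizes[18]                            # 1/2

# Random green-room *individual*: weight each class by its green count.
weights = {n: p * n for n, p in sizes.items()}
p_individual_18 = weights[18] / sum(weights.values())   # 9/10

print(p_group_18, p_individual_18)   # 1/2 9/10

# Expected utility of the deal over random worlds, as above:
print(sizes[18] * 12 + sizes[2] * (-52))   # -20
```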

If you're asked to accept a bet on there being 18 people in green rooms, and you're told that only you're being offered it, then the decision commits exactly one instance of you to a specific loss or gain, regardless of the group you're in. Hence you can't do better than the 0.9 and 0.1 beliefs.

If you're told that the bet is being offered to everyone in a green room, then you are committing to n times the outcome in any group of n people. In this case gains are conditional on group size, and so you have to use the 0.5-0.5 belief about the distribution of groups. It doesn't matter because the larger groups have the larger multiplier and thus shutting up and multiplying yields the same answers as a single-shot bet.

ETA: At some level this is just choosing an optimal output for your calculation of what to do, given that the result is used *variably* widely.

## ↑ comment by Christian_Szegedy · 2009-09-10T03:56:24.485Z · LW(p) · GW(p)

> This committal is what you wish to optimise over from TDT/UDT, and clearly this requires knowledge about the likelihood of different decision making groups.

I was influenced by the OP and used to think that way. However I think now, that this is not the root problem.

What if the agents get more complicated decision problems: for example, rewards depending on the parity of the agents voting a certain way, etc.?

I think what is essential is that the agents have to think globally (categorical imperative, hmmm?).

Practically: if the agent recognizes that there is a collective decision, then it should model all available conceivable protocols (while making sure a priori that all cooperating agents perform the same or compatible analysis, if they can't communicate) and then they should choose the protocol with the **best overall total gain**. In the case of the OP: the second calculation in the OP. (Not messing around with correction factors based on responsibilities, etc.)

Special considerations based on group sizes etc. may be incidentally correct in certain situations, but this is just not general enough. The crux is that the ultimate test is simply the expected value computation for the protocol of the whole group.

Replies from: Jonathan_Lee, None

## ↑ comment by Jonathan_Lee · 2009-09-10T11:52:00.224Z · LW(p) · GW(p)

Between non communicating copies of your decision algorithm, it's forced that every instance comes to the same answers/distributions to all questions, as otherwise Eliezer can make money betting between different instances of the algorithm. It's not really a categorical imperative, beyond demanding consistency.

The crux of the OP is asking for a probability assessment of the world, not whether the DT functions.

I'm not postulating a 1/n allocation of responsibility; I'm stating that the source of the confusion is conflating P(a random individual is in a world of class A_i | Data) with P(a random world is of class A_i | Data). These are not equal if the number of individuals with access to Data differs between the classes of world.

Hence in this case, there are 2 classes of world, A_1 with 18 Green rooms and 2 Reds, and A_2 with 2 Green rooms and 18 Reds.

P(Random individual is in the A_1 class | Woke up in a green room) = 0.9, by anthropic update.

P(Random world is in the A_1 class | Some individual woke up in a green room) = 0.5.

Why? Because in A_1, 18/20 individuals fit the description "Woke up in a green room", but in A_2 only 2/20 do.
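The two conditional probabilities can be checked with a short sketch (a hypothetical illustration, not anyone's actual code):

```python
# World class A_1: 18 green rooms and 2 red; A_2: 2 green and 18 red.
# Each class has prior probability 0.5 (the logical coinflip).
prior = {"A_1": 0.5, "A_2": 0.5}
greens = {"A_1": 18, "A_2": 2}
rooms = 20

# P(a random individual is in class A_1 | that individual woke up in a green
# room): weight each world class by the fraction of individuals matching the data.
num = prior["A_1"] * greens["A_1"] / rooms
den = sum(prior[w] * greens[w] / rooms for w in prior)
p_individual = num / den

# P(a random world is of class A_1 | some individual woke up in a green room):
# both classes contain at least one green room, so the data gives no update.
p_world = prior["A_1"] * 1.0 / sum(prior[w] * 1.0 for w in prior)

print(p_individual, p_world)  # -> 0.9 0.5
```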

The crux of the OP is that neither a 90/10 nor 50/50 split seem acceptable, if betting on "Which world-class an individual in a Green room is in" and "Which world-class the (set of all individuals in Green rooms which contains this individual) is in" are identical. I assert that they are not. The first case is 0.9/0.1 A_1/A_2, the second is 0.5/0.5 A_1/A_2.

Consider a similar question where a random Green room will be asked. If you're in that room, you update both on (Green walls) and (I'm being asked) and recover the 0.5/0.5, correctly. This is close to the OP: if we wildly assert that you and only you have free will and control the others, then you are special. Equally, in cases where everyone is asked and plays separately, you gain 18 or 2 times the benefits depending on whether you're in A_1 or A_2.

If each individual Green room played separately, then you update on (Green walls), but P(I'm being asked|Green) = 1 in either case. This is betting on whether there are 18 people in green rooms or 2, and you get the correct 0.9/0.1 split. To reproduce the OP the offers would need to be +1/18 to Greens and -3/18 from Reds in A_1, and +1/2 to Greens and -3/2 from Reds in A_2, and then you'd refuse to play, correctly.

## ↑ comment by **[deleted]** · 2009-09-10T09:11:55.869Z · LW(p) · GW(p)

And how would they decide which protocol had the best overall total gain? For instance, could you define a protocol complexity measure, and then use this complexity measure to decide? And are you even dealing with ordinary Bayesian reasoning any more, or is this the first hint of some new more general type of rationality?

MJG - The Black Swan is Near!

Replies from: Christian_Szegedy

## ↑ comment by Christian_Szegedy · 2009-09-10T18:00:51.167Z · LW(p) · GW(p)

It's not about complexity, it is just expected total gain. Simply the second calculation of the OP.

I just argued that the second calculation is right and that it is what the agents should do in general (unless they care only about their own particular copies).

Replies from: None

## ↑ comment by **[deleted]** · 2009-09-11T05:02:45.204Z · LW(p) · GW(p)

This was a simple situation. I'm suggesting a 'big picture' idea for the general case.

According to Wei Dai and Nesov above, the anthropic-like puzzles can be re-interpreted as 'agent co-ordination' problems (multiple agents trying to coordinate their decision making). And you seem to have a similar interpretation. Am I right?

If Dai and Nesov's interpretation is right, it seems the puzzles could be reinterpreted as being about groups of agents trying to agree in advance about a 'decision making protocol'.

But now I ask: is this not equivalent to trying to find a 'communication protocol' which enables them to best coordinate their decision making? And rather than trying to directly calculate the results of every possible protocol (which would be impractical for all but simple problems), I was suggesting trying to use information theory to apply a complexity measure to protocols, in order to rank them.

Indeed, I ask whether this is actually the correct way to interpret Occam's Razor/complexity priors. I.e., my suggestion is to re-interpret Occam/priors as referring to copies of agents trying to co-ordinate their decision making using some communication protocol, such that they seek to minimize the complexity of this protocol.

## ↑ comment by CarlShulman · 2009-09-10T06:16:19.404Z · LW(p) · GW(p)

"Hence if you wake up in a green room and you're asked to steal from the red rooms and give to the green rooms, you either commit a group of 2 of you to a loss of 52 or commit a group of 18 of you to a gain of 12."

In the example you care equally about the red room and green room dwellers.

Replies from: Jonathan_Lee

## ↑ comment by Jonathan_Lee · 2009-09-10T10:38:17.905Z · LW(p) · GW(p)

Hence if there are 2 instances of your decision algorithm in Green rooms, there are 2 runs of your decision algorithm, and if they vote to steal there is a loss of 3 from each red and a gain of 1 for each green, for a total gain of 1*2 - 3*18 = -52.

If there are 18 instances in Green rooms, there are 18 runs of your decision algorithm, and if they vote to steal there is a loss of 3 from each red and a gain of 1 for each green, for a total gain of 1*18 - 3*2 = 12.

The "committal of a group" is noting that there are 2 or 18 runs of your decision algorithm that are logically forced by the decision made by this specific instance of the decision algorithm in a green room.
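These totals, and the two expected-value calculations from the OP that they feed into, can be sketched as follows (a hypothetical illustration; the payoff rule is the +$1/−$3 bet from the post):

```python
# +$1 per green-room copy, -$3 per red-room copy, if the bet is taken.
def total_gain(greens, reds):
    return 1 * greens - 3 * reds

heads_gain = total_gain(18, 2)   # 18 greens, 2 reds -> 12
tails_gain = total_gain(2, 18)   # 2 greens, 18 reds -> -52

# Unupdated 50/50 world probabilities: refuse the bet.
ev_unupdated = 0.5 * heads_gain + 0.5 * tails_gain    # -20.0

# Naive 90/10 anthropic posterior: mistakenly accept it.
ev_anthropic = 0.9 * heads_gain + 0.1 * tails_gain    # ~5.6

print(heads_gain, tails_gain, ev_unupdated, ev_anthropic)
```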

## comment by AllanCrossman · 2009-09-08T20:11:21.274Z · LW(p) · GW(p)

I think I'm with Bostrom.

The problem seems to come about because the good effects of 18 people being correct are more than wiped out by the bad effects of 2 people being wrong.

I'm sure this imbalance in the power of the agents has something to do with it.

Replies from: JGWeissman

## ↑ comment by JGWeissman · 2009-09-09T04:13:44.542Z · LW(p) · GW(p)

What if, instead of requiring agreement of all copies in a green room, one copy in a green room was chosen at random to make the choice?

Replies from: JGWeissman, Christian_Szegedy

## ↑ comment by JGWeissman · 2009-09-09T05:00:09.940Z · LW(p) · GW(p)

In this case the chosen copy in the green room should update on the anthropic evidence of being chosen to make the choice. That copy had a 1/18 probability of being chosen if the coin flip came up heads, and a 1/2 probability of being chosen if the coin flip came up tails, so the odds of heads:tails should be updated from 9:1 to 1:1. This exactly cancels the anthropic evidence of being in a green room.
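The cancellation can be verified with exact arithmetic (a hypothetical sketch):

```python
from fractions import Fraction

# Odds of heads:tails after waking in a green room.
odds_after_green = Fraction(9, 1)

# Probability of being the randomly chosen green-room copy:
# 1/18 if heads (18 green copies), 1/2 if tails (2 green copies).
likelihood_ratio = Fraction(1, 18) / Fraction(1, 2)   # = 1/9

posterior_odds = odds_after_green * likelihood_ratio
print(posterior_odds)  # 1 -- heads and tails are equally likely again
```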

## ↑ comment by Christian_Szegedy · 2009-09-09T04:29:25.148Z · LW(p) · GW(p)

... or equivalently: you play a separate game with every single copy in each green room...

In both cases, the anthropic update gives the right solution, as I mentioned in an earlier post. (And consequently, this demonstrates that the crux of the problem was in fact the collective nature of the decision.)

Replies from: JGWeissman

## ↑ comment by JGWeissman · 2009-09-09T04:52:39.683Z · LW(p) · GW(p)

They are not equivalent. If one green room copy is chosen at random, then the game will be played exactly once whether the coin flip resulted in heads or tails. But if every green room copy plays, then the game will be played 18 times if the coin came up heads and 2 times if it came up tails.

Replies from: Christian_Szegedy

## ↑ comment by Christian_Szegedy · 2009-09-09T05:53:52.680Z · LW(p) · GW(p)

Good point.

However, being chosen for the game (since the agent knows that in both cases exactly one copy will be chosen) also carries information, in the same way that being in the green room does. Therefore, by the same logic, it implies an additional anthropic update: "Although I am in a green room, the fact that I was chosen to play the game makes it much less probable that the coin came up heads." So, by calculating the correct chances, he can deduce:

I am in a green room + I am chosen => P(heads) = 0.5

OTOH:

I am in a green room (not knowing whether chosen) => P(heads) = 0.9

[EDIT]: I just noted that you already argued the same way, I have plainly overlooked it.

## comment by cousin_it · 2009-09-10T07:02:41.238Z · LW(p) · GW(p)

I waited to comment on this, to see what others would say. Right now Psy-Kosh seems to be right about anthropics; Wei Dai seems to be right about UDT; timtyler seems to be right about Boltzmann brains; byrnema seems to be mostly right about pointers; but I don't understand why nobody latched on to the "reflective consistency" part. Surely the kind of consistency under observer-splitting that you describe is too strong a requirement in general: if two copies of you play a game, the correct behavior for both of them would be to try to win, regardless of what overall outcome you'd prefer before the copying. The paperclip formulation works around this problem, so the correct way to analyze this would be in terms of multiplayer game theory with chance moves, as Psy-Kosh outlined.

Replies from: Wei_Dai

## ↑ comment by Wei_Dai · 2009-09-10T07:28:25.926Z · LW(p) · GW(p)

if two copies of you play a game, the correct behavior for both of them would be to try to win, regardless of what overall outcome you'd prefer before the copying

That doesn't make sense to me, unless you're assuming that the player isn't capable of self-modification. If it was, wouldn't it modify itself so that its copies won't try to win individually, but cooperate to obtain the outcome that it prefers before the copying?

Replies from: cousin_it, tut

## ↑ comment by cousin_it · 2009-09-10T08:24:15.856Z · LW(p) · GW(p)

Yes, that's right. I've shifted focus from correct program behavior to correct human behavior, because that's what everyone else here seems to be talking about. If the problem is about programs, *there's no room for all this confusion in the first place*. Just specify the inputs, outputs and goal function, then work out the optimal algorithm.

## comment by RobinHanson · 2009-09-10T15:09:11.678Z · LW(p) · GW(p)

There are lots of ordinary examples in game theory of time-inconsistent choices. Once you know how to resolve them, then if you can't use those approaches to resolve this, I might be convinced that anthropic updating is at fault. But until then I think you are making a huge leap to blame anthropic updating for the time-inconsistent choices.

Replies from: Wei_Dai

## ↑ comment by Wei_Dai · 2009-09-10T16:24:59.152Z · LW(p) · GW(p)

Robin, you're jumping into the middle of a big extended discussion. We're not only blaming anthropic updating, we're blaming Bayesian updating in general, and proposing a decision theory without it (Updateless Decision Theory, or UDT). The application to anthropic reasoning is just that, an application.

UDT seems to solve all cases of time inconsistency in decision problems with one agent. What UDT agents do in multi-player games is still an open problem that we're working on. There was an extensive discussion about it in the previous threads if you want to see some of the issues involved. But the key ingredient that is missing is a theory of logical uncertainty, that tells us how different agents (or more generally, computational processes) are logically correlated to each other.

Replies from: RobinHanson, Eliezer_Yudkowsky, Vladimir_Nesov

## ↑ comment by RobinHanson · 2009-09-14T00:00:36.605Z · LW(p) · GW(p)

The ordinary time inconsistencies in game theory all involve multiple agents. It seems odd to suggest you've solved the problem except for those cases.

Replies from: Wei_Dai

## ↑ comment by Wei_Dai · 2009-09-14T01:44:18.589Z · LW(p) · GW(p)

I was referring to problems like Newcomb's Problem, Counterfactual Mugging, Sleeping Beauty, and Absentminded Driver.

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-10T18:33:46.999Z · LW(p) · GW(p)

Not exactly the way I would phrase it, but Timeless Decision Theory and Updateless Decision Theory between them have already killed off a sufficiently large number of time inconsistencies that treating any remaining ones as a Problem seems well justified. Yes, we *have* solved all ordinary dynamic inconsistencies of conventional game theory already!

## ↑ comment by RobinHanson · 2009-09-14T00:04:08.773Z · LW(p) · GW(p)

Let's take the simple case of time inconsistency regarding punishment. There is a two-stage game with two players. First, A decides whether to cheat B for some gain. Then B decides whether to punish A at some cost. Before the game, B would like to commit to punishing A if A cheats, but once A has already cheated, B would rather not punish.

Replies from: Wei_Dai, Eliezer_Yudkowsky, CarlShulman, Alicorn

## ↑ comment by Wei_Dai · 2009-09-14T01:44:07.996Z · LW(p) · GW(p)

In UDT, we blame this time inconsistency on B's updating on A having cheated (i.e. treating it as a fact that can no longer be altered). Suppose it's common knowledge that A can simulate or accurately predict B, then B should reason that by deciding to punish, it increases the probability that A would have predicted that B would punish and thus decreases the probability that A would have cheated.

But the problem is not fully solved, because A could reason the same way, and decide to cheat no matter what it predicts that B does, in the expectation that B would predict this and see that it's pointless to punish.

So UDT seems to eliminate time-inconsistency, but at the cost of increasing the number of possible outcomes, essentially turning games with sequential moves into games with simultaneous moves, with the attendant increase in the number of Nash equilibria. We're trying to work out what to do about this.

Replies from: Benja, Eliezer_Yudkowsky

## ↑ comment by Benya (Benja) · 2012-11-17T23:43:41.288Z · LW(p) · GW(p)

So UDT seems to eliminate time-inconsistency, but at the cost of increasing the number of possible outcomes, essentially turning games with sequential moves into games with simultaneous moves, with the attendant increase in the number of Nash equilibria. We're trying to work out what to do about this.

Er, turning games with sequential moves into games with simultaneous moves is standard in game theory, and "never cheat, always punish cheating" and "always cheat, never punish" *are* what are considered the Nash equilibria of that game in standard parlance. [**ETA**: Well, "never cheat, punish x% of the time" will also be a NE for large enough x.] It is subgame perfect equilibrium that rules out "never cheat, always punish cheating" (the set of all SPE of a sequential game is a subset of the set of all NE of that game).
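The equilibrium structure of Robin's cheat-then-punish game can be checked by brute force with some illustrative payoffs (numbers are assumptions, not from the thread): cheating gains A 1 and costs B 2; punishing costs B a further 1 and costs A 3.

```python
from itertools import product

def payoffs(a_cheats, b_punishes):
    """Return (A's payoff, B's payoff); B's move is contingent on a cheat."""
    a, b = 0, 0
    if a_cheats:
        a, b = 1, -2
        if b_punishes:
            a -= 3   # punishment hurts A more than cheating gained
            b -= 1   # but carrying it out also costs B
    return a, b

def is_nash(a_cheats, b_punishes):
    a, b = payoffs(a_cheats, b_punishes)
    a_dev, _ = payoffs(not a_cheats, b_punishes)
    _, b_dev = payoffs(a_cheats, not b_punishes)
    return a >= a_dev and b >= b_dev

equilibria = [s for s in product([False, True], repeat=2) if is_nash(*s)]
print(equilibria)
# [(False, True), (True, False)]: "never cheat, always punish" and
# "always cheat, never punish" are both Nash equilibria, but only the
# latter survives subgame perfection, since punishing is dominated
# once the cheating has actually happened.
```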

## ↑ comment by Wei_Dai · 2012-11-18T01:09:32.232Z · LW(p) · GW(p)

Yeah, I used the wrong terminology in the grandparent comment. I guess the right way to put it is that SPE/backwards induction no longer seems reasonable under UDT and it's unclear what can take its place, as far as reducing the number of possible solutions to a given game.

## ↑ comment by Manfred · 2012-11-18T00:17:59.690Z · LW(p) · GW(p)

It is subgame perfect equilibrium that rules out "never cheat, always punish cheating" (the set of all SPE of a sequential game is a subset of the set of all NE of that game).

How strictly do you (or the standard approach) mean to rule out options that aren't good on all parts of the game? It seems like sometimes you do want to do things that are subgame suboptimal.

Edit: or at least be known to do things, which unfortunately can require actually being prepared to do the things.

Replies from: Benja

## ↑ comment by Benya (Benja) · 2012-11-18T11:32:28.715Z · LW(p) · GW(p)

Well, the classical game theorist would reply that they're studying one-off games, in which the game you're currently playing doesn't affect any payoff you get outside that game (otherwise that should be made part of the game), so you can't be doing the punishment because you want to be known to be a punisher, or the game that Robin specified doesn't model the situation you're in. The classical game theorist assumes you can't look into people's heads, so whatever you say or do before the cheating, you're always free to not punish during the punishment round (as you're undoubtedly aware, mutual checking of source code is prohibited by antitrust laws in over 185 countries).

The classical game theorist would further point out that if you *do* want to model that punishment helps you be known as a punisher, then you should use their theory of repeated games, where they have some folk theorems for you saying that lots and lots of things can be Nash equilibria, e.g. in a game where after each round there is a fixed probability of another round; for example, cooperation in the prisoner's dilemma, but also all sorts of suboptimal outcomes (which become Nash equilibria because any deviator gets punished as badly as the other players can punish them).

I should point out that not all classical game theorists think that SPE makes particularly good predictions, though; I've read someone say, I think Binmore, that you expect to virtually always see a NE in the laboratory after a learning period, but not an SPE, and that the original inventor of SPE actually came up with it as an example of what you would *not* expect to see in the lab, or something to that tune. (Sorry, I should really chase down that reference, but I don't have time right now. I'll try to remember to do that later. **ETA**: Ok, Binmore and Shaked, 2010: Experimental Economics: Where Next? *Journal of Economic Behavior & Organization*, 73: 87-100. See the stuff about backward induction, starting at the bottom on p.88. The inventor of SPE is Reinhard Selten, and the claim is that he didn't believe it would predict what you see it in the lab and "[i]t was to demonstrate this fact that he encouraged Werner Güth (...) to carry out the very first experiment on the Ultimatum game", not that he invented SPE for this purpose.)

## ↑ comment by Manfred · 2012-11-18T23:51:29.525Z · LW(p) · GW(p)

so whatever you say or do before the cheating, you're always free to not punish during the punishment round

Interesting. This idea, used as an argument for SPE, seems to be the free will debate intruding into decision theory. "Only some of these algorithms have freedom, and others don't, and humans are free, so they should behave like the free algorithms." This either ignores, or accepts, the fact that the "free" algorithms are just as deterministic as the "unfree" algorithms. (And it depends on other stuff, but that's not the fun bit)

(as you're undoubtedly aware, mutual checking of source code is prohibited by antitrust laws in over 185 countries).

:D

Replies from: Benja

## ↑ comment by Benya (Benja) · 2012-11-25T21:03:16.827Z · LW(p) · GW(p)

Hm, I may not quite have gotten the point across: I think you may be thinking of the argument that humans have free will, so they can't force future versions of themselves to do something that would be against that future version's interests given its information, but that isn't the argument I was trying to explain. The idea I was referring to works precisely the same way with deterministic algorithms, as long as the players only get to observe each others' actions, not each others' source (though of course its proponents don't think in those terms). The point is that if the other player looks at you severely and suggestively taps their baseball bat and tells you about how they've beaten up people who have defected in the past, that still doesn't mean that they're actually going to beat you up -- since if such threats were effective on you, then making them would be the *smart* thing to do even if the other player has no intention of *actually* beating you up (and risking going to jail) if for some reason you end up defecting. (Compare AI-in-the-box...) (Of course, this argument only works if you're reasonably sure that the other player is a classical game theorist; if you think you might be playing against someone who will, "irrationally", actually punish you, like a timeless decision theorist, then you should not defect, and they won't have to punish you...)

Now, if you had actual information about what this player had done in similar situations in the past, like police reports of beaten-up defectors, this argument wouldn't work, but *then* (the standard argument continues) you have the wrong game-theoretical model; the correct model includes all of the punisher's previous interactions, and in *that* game, it might well be a SPE to punish. (Though only if the exact number of "rounds" is not certain, for the same reason as in the finitely iterated Prisoner's Dilemma: in the last round the punisher has no more reason to punish because there are no future targets to impress, so you defect no matter what they did in previous rounds, so they have no reason to punish in the second-to-last round, etc.)

(BTW: reference added to grandparent.)

Replies from: Manfred

## ↑ comment by Manfred · 2012-11-25T22:15:51.771Z · LW(p) · GW(p)

I think you may be thinking of the argument that humans have free will, so they can't force future versions of themselves to do something that would be against that future version's interests given its information

That is not what I was thinking of. Here, let me re-quote the whole sentence:

The classical game theorist assumes you can't look into people's heads, so whatever you say or do before the cheating, you're always free to not punish during the punishment round

The funny implication here is that if someone *did* look into your head, you would no longer be "free." Like a lightswitch :P And then if they erased their memory of what they saw, you're free again. Freedom on, freedom off.

And though that is a fine idea to define, to mix it up with an *algorithmic* use of "freedom" seems to just be used to argue "by definition."

## ↑ comment by Benya (Benja) · 2012-11-25T22:47:46.851Z · LW(p) · GW(p)

Ok, sorry I misread you. "Free" was just my word rather than part of the standard explanation, so alas we don't have anybody we can attribute that belief to :-)

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-14T18:15:24.222Z · LW(p) · GW(p)

(The difficulty arises if UDT B reasons logically that there should *not logically exist* any copies of its current decision process finding themselves in worlds where A is dependent on its own decision process, and yet A defects. I'm starting to think that this resembles the problem I talked about earlier, where you have to use Omega's probability distribution in order to agree to be Counterfactually Mugged on problems that Omega expects to have a high payoff. Namely, you may have to use A's logical uncertainty, rather than your own logical uncertainty, in order to perceive a copy of yourself inside A's counterfactual. This is a complicated issue and I may have to post about it in order to explain it properly.)

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-14T01:45:03.179Z · LW(p) · GW(p)

Drescher-Nesov-Dai UDT solves this (that is, goes ahead and punishes the cheater, making the same decision at both times).

TDT can handle Parfit's Hitchhiker - pay for the ride, make the same decision at both times, because it forms the counterfactual "If I did not pay, I would not have gotten the ride". But TDT has difficulty with this particular case, since it implies that B's original belief that A would *not* cheat if punished, was wrong; and after updating on this new information, B may no longer have a motive to punish. (UDT of course does not update.) Since B's payoff can depend on B's complete strategy tree including decisions that would be made under other conditions, instead of just depending on the actual decision made under real conditions, this scenario is outside the realm where TDT is guaranteed to maximize.

## ↑ comment by CarlShulman · 2009-09-14T01:08:36.807Z · LW(p) · GW(p)

The case is underspecified:

- How transparent/translucent are the agents? I.e. can A examine B's sourcecode, or use observational and other data to assess B's decision procedure? If not, what is A's prior probability distribution for decision procedures B might be using?
- Are both A and B using the same decision theory, TDT/UDT? Or is A using CDT and B using TDT/UDT or vice versa?

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-14T01:50:15.925Z · LW(p) · GW(p)

Clearly B has mistaken beliefs about either A or its own dispositions; otherwise B would not have dealt with A in the interaction where A ended up cheating. If B uses UDT (and hence will carry through punishments), and A uses any DT that correctly forecasts B's response to cheating, then A should not in fact cheat. If A cheats anyway, though, B still punishes.

Actually, on further reflection, it's possible that B would reason that it is logically impossible for A to have the specified dependency on B's decision, and yet for A to still end up defecting, in which case even UDT might end up in trouble - it would be a *transparent* logical impossibility for A to defect if B's beliefs about A are true, so it's not clear that B would handle the event correctly. I'll have to think about this.

## ↑ comment by Vladimir_Nesov · 2009-09-14T07:01:19.224Z · LW(p) · GW(p)

If there is some probability of A cheating even if B precommits to punishment, but with odds in B's favor, the situation where B needs to implement punishment is quite possible (expected). Likewise, if B precommitting to punish A is predicted to lead to an even worse outcome than not punishing (because of punishment expenses), UDT B won't punish A. Furthermore, a probability of cheating and non-punishment of cheating (mixed strategies, possibly on logical uncertainty to defy the laws of the game if pure strategies are required) is a mechanism through which the players can (consensually) bargain with each other in the resulting parallel game, an issue Wei Dai mentioned in the other reply. B doesn't need absolute certainty at any stage, in both cases.

Also, in UDT there are no logical certainties, as it doesn't update on logical conclusions either.

Replies from: Eliezer_Yudkowsky

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-14T18:12:07.339Z · LW(p) · GW(p)

If there is some probability of A cheating even if B precommits to punishment

Sure, but that's the *convenient* setup. What if for A to cheat means that you were necessarily just mistaken about which algorithm A runs?

Also, in UDT there are no logical certainties, as it doesn't update on logical conclusions either.

UDT will be logically certain about some things but not others. If UDT B "doesn't update" on its computation about what A will do in response to B, it's going to be in trouble.

Replies from: Vladimir_Nesov

## ↑ comment by Vladimir_Nesov · 2009-09-20T14:09:50.924Z · LW(p) · GW(p)

What if for A to cheat means that you were necessarily just mistaken about which algorithm A runs?

A decision algorithm should never be mistaken, only uncertain.

UDT will be logically certain about some things but not others. If UDT B "doesn't update" on its computation about what A will do in response to B, it's going to be in trouble.

"Doesn't update" doesn't mean that it doesn't use the info (but you know that, so what do you mean?). A logical conclusion can be a parameter in a strategy, without making the algorithm unable to reason about what it would be like if the conclusion was different, that is basically about uncertainty of same algorithm in other states of knowledge.

## ↑ comment by Alicorn · 2009-09-14T00:12:54.323Z · LW(p) · GW(p)

Am I correct in assuming that if A cheats and is punished, A suffers a net loss?

Replies from: Johnicholas

## ↑ comment by Johnicholas · 2009-09-14T00:43:24.120Z · LW(p) · GW(p)

Yes.

## ↑ comment by Wei_Dai · 2009-09-10T19:06:30.515Z · LW(p) · GW(p)

What is the remaining Problem that you're referring to? Why can't we apply the formalism of UDT1 to the various examples people seem to be puzzled about and just get the answers out? Or is cousin_it right about the focus having shifted to how human beings ought to reason about these problems?

Replies from: Eliezer_Yudkowsky

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-12T20:03:45.930Z · LW(p) · GW(p)

The anthropic problem was a remaining problem for TDT, although not UDT.

UDT has its own problems, possibly. For example, in the Counterfactual Mugging, it seems that you want to be counterfactually mugged whenever Omega has a well-calibrated distribution and has a systematic policy of offering high-payoff CMs according to that distribution, even if your own prior has a different distribution. In other words, the key to the CM isn't your own distribution, it's Omega's. And it's not possible to interpret UDT as epistemic advice, which leaves anthropic questions open. So I haven't yet shifted to UDT outright.

(The reason I did not answer your question earlier was that it seemed to require a response at greater length than the above.)

Replies from: Wei_Dai, Wei_Dai

## ↑ comment by Wei_Dai · 2009-09-14T02:05:56.594Z · LW(p) · GW(p)

Well, you're right in the sense that I can't understand the example you gave. (I waited a couple of days to see if it would become clear, but it didn't.) But the rest of the response is helpful.

Replies from: Benja

## ↑ comment by Benya (Benja) · 2012-12-03T20:00:07.381Z · LW(p) · GW(p)

Did he ever get around to explaining this in more detail? I don't remember reading a reply to this, but I think I've just figured out the idea: Suppose you get word that Omega is coming to the neighbourhood and going to offer counterfactual muggings. What sort of algorithm do you want to self-modify into? You don't know *what* CMs Omega is going to offer; all you know is that it will offer odds according to its well-calibrated prior. Thus, it has higher expected utility to be a CM-accepter than a CM-rejecter, and even a CDT agent would want to self-modify.

I don't think that's a problem for UDT, though. What UDT will compute when asked to pay is the expected utility under its prior of paying up *when Omega asks it to*; thus, the condition for UDT to pay up is **NOT**

```
prior probability of heads * Omega's offered payoff > prior of tails * Omega's price
```

but

```
prior of (heads and Omega offers a CM for this coin) * payoff > prior of (tails and CM) * price.
```

In other words, UDT takes the quality of Omega's predictions into account and acts as if updating on them (the same way you would update if Omega told you who it expects to win the next election, at 98% probability).
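As a toy illustration of the difference between the two conditions (all numbers hypothetical):

```python
# Agent's own prior says tails is overwhelmingly likely...
p_heads = 0.001
payoff, price = 10000, 100

# ...but Omega is well calibrated, and mostly offers this CM when heads is true.
p_offer_given_heads = 0.9
p_offer_given_tails = 0.001

# First (naive) condition: bare prior over the coin -> reject.
naive_accept = p_heads * payoff > (1 - p_heads) * price

# Second (UDT) condition: joint probability of coin outcome AND Omega's offer
# -> accept, because the offer itself is strong evidence of heads.
udt_accept = (p_heads * p_offer_given_heads * payoff
              > (1 - p_heads) * p_offer_given_tails * price)

print(naive_accept, udt_accept)  # False True
```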

CDT agents, as usual, will actually want to self-modify into a UDT agent whose prior equals the CDT agent's posterior [**ETA:** wait, sorry, no, they won't act as if they can acausally control other instances of the same program, but they *will* self-modify so as to make future instances of themselves (which obviously they control causally) act in a way that maximizes EU according to the agent's *present* posterior, and that's what we need here], and will use the second formula above accordingly -- they don't want to be a general CM-rejecter, but they think that they can do even better than being a general CM-accepter if they refuse to pay up if *at the time of self-modification* they assigned low probability to tails, even conditional on Omega offering them a CM.

## ↑ comment by Wei_Dai · 2012-12-04T13:52:16.893Z · LW(p) · GW(p)

He never explained further, and actually I still don't quite understand the example even given your explanation. Maybe you can reply directly to Eliezer's comment so he can see it in his inbox, and let us know if he still thinks it's a problem for UDT?

## ↑ comment by Vladimir_Nesov · 2009-09-10T16:52:03.873Z · LW(p) · GW(p)

But the key ingredient that is missing is a theory of logical uncertainty, that tells us how different agents (or more generally, computational processes) are logically correlated to each other.

I'd look for it as logical theory of concurrency and interaction: "uncertainty" fuzzifies the question.

Replies from: Wei_Dai

## ↑ comment by Wei_Dai · 2009-09-11T19:10:30.135Z · LW(p) · GW(p)

I'd look for it as a logical theory of concurrency and interaction: "uncertainty" fuzzifies the question.

Why? For me, how different agents are logically correlated to each other seems to be the same type of question as "what probability (if any) should I assign to P!=NP?" Wouldn't the answer fall out of a general theory of logical uncertainty? (ETA: Or at least be illuminated by such a theory?)

Replies from: Vladimir_Nesov

## ↑ comment by Vladimir_Nesov · 2009-09-11T20:43:48.290Z · LW(p) · GW(p)

Logic is already in some sense about uncertainty (e.g. you could interpret predicates as states of knowledge). When you add one more "uncertainty" of some breed, it leads to perversion of logic, usually of applied character and barren meaning.

The concept of "probability" is suspect, I don't expect it to have foundational significance.

Replies from: Wei_Dai

## ↑ comment by Wei_Dai · 2009-09-11T21:12:25.989Z · LW(p) · GW(p)

So what would you call a field that deals with how one ought to make bets involving P!=NP (i.e., mathematical statements that we can't prove to be true or false), if not "logical uncertainty"? Just "logic"? Wouldn't that cause confusion in others, since today it's usually understood that such questions are outside the realm of logic?

Replies from: Vladimir_Nesov

## ↑ comment by Vladimir_Nesov · 2009-09-11T21:19:34.198Z · LW(p) · GW(p)

I don't understand how to make such bets, except in a way it's one of the kinds of human decision-making that can be explicated in terms of priors and utilities. The logic of this problem is in the process that works with the statement, which is in the domain of proof theory.

## comment by Psy-Kosh · 2009-09-08T19:58:33.839Z · LW(p) · GW(p)

I think I'll have to sit and reread this a couple times, but my *INITIAL* thought is "Isn't the apparent inconsistency here qualitatively similar to the situation with a counterfactual mugging?"

## ↑ comment by Nisan · 2009-09-08T21:25:36.950Z · LW(p) · GW(p)

This is my reaction too. This is a decision involving Omega in which the right thing to do is not update based on new information. In decisions not involving Omega, you do want to update. It doesn't matter whether the new information is of an anthropic nature or not.

Replies from: Psy-Kosh

## ↑ comment by Psy-Kosh · 2009-09-08T21:33:03.004Z · LW(p) · GW(p)

Yeah, thought about it a bit more, and it still seems to be more akin to the "paradox of counterfactual mugging" than the "paradox of anthropic reasoning".

To me, confusing bits of anthropic reasoning would more come into play via stuff like "Aumann agreement theorem vs anthropic reasoning".

## comment by nshepperd · 2012-11-18T04:02:19.835Z · LW(p) · GW(p)

Huh. Reading this again, together with byrnema's pointer discussion and Psy-Kosh's non-anthropic reformulation...

It seems like the problem is that whether each person gets to make a decision depends on the evidence they think they have, in such a way to make that evidence meaningless. To construct an extreme example: The Antecedent Mugger gathers a billion people in a room together, and says:

"I challenge you to a game of wits! In this jar is a variable amount of coins, between $0 and $10,000. I will allow each of you to weigh the jar using this set of extremely imprecise scales. Then I will ask each of you whether to accept my offer: to buy the jar off me, as a group, for $5000, the money to be distributed equally among you. Note: although I will ask all of you, the only response I will consider is the one given by the person with the greatest subjective expected utility from saying 'yes'."

In this case, even if the jar always contains $0, there will always be someone who receives enough information from the scales to think the jar contains >$5000 with high probability, and therefore to say yes. Since that person's response is the one that is taken for the whole group, the group always pays out $5000, resulting in a money pump in favour of the Mugger.

The problem is that, from an outside perspective, the observations of the one who gets to make the choice are almost completely uncorrelated with the actual contents of the jar, due to the Mugger's selection process. For any general strategy `Observations → Response`, the Mugger can always summon enough people to find *someone* who has seen the observations that will produce the response he wants, unless the strategy is a constant function.
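The selection effect can be checked by simulation. A minimal sketch, assuming i.i.d. Gaussian scale noise (all numbers illustrative):

```python
# Monte Carlo sketch of the "Antecedent Mugger": the jar always contains $0,
# yet with enough people, the one with the largest weight estimate is
# nearly always confident the jar holds more than $5000.

import random

random.seed(0)
true_value = 0.0    # the jar is always empty
noise_sd = 2000.0   # the scales are extremely imprecise
n_people = 100_000  # stand-in for the billion people

estimates = [random.gauss(true_value, noise_sd) for _ in range(n_people)]
decider_estimate = max(estimates)  # the Mugger heeds only the most optimistic

print(decider_estimate > 5000)  # True: the group pays out
```

However the noise is distributed, a large enough crowd guarantees some outlier whose evidence says "buy".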

Similarly, in the problem with the marbles, only the people with the observation `Green` get any influence, so the observations of "people who get to make a decision" are uncorrelated with the actual contents of the buckets (even though observations of the participants in general *are* correlated with the buckets).

## ↑ comment by Kindly · 2012-11-18T04:28:37.245Z · LW(p) · GW(p)

The problem here is that your billion people are for some reason giving the answer most likely to be correct rather than the answer most likely to actually be *profitable*. If they were a little more savvy, they could reason as follows:

"The scales tell me that there's $6000 worth of coins in the jar, so it seems like a good idea to buy the jar. However, if I did not receive the largest weight estimate from the scales, my decision is irrelevant; and if I *did* receive the largest weight estimate, then conditioned on that it seems overwhelmingly likely that there are many fewer coins in the jar than I'd think based on that estimate -- and in that case, I ought to say no."

## ↑ comment by nshepperd · 2012-11-18T05:34:55.918Z · LW(p) · GW(p)

Ooh, and we can apply similar reasoning to the marble problem if we change it, in a seemingly isomorphic way, so that instead of making the trade based on all the responses of the people who saw a green marble, Psy-Kosh selects one of the green-marble-observers at random and considers that person's response (this should make no difference to the outcomes, assuming that the green-marblers can't give different responses due to no-spontaneous-symmetry-breaking and all that).

Then, conditioning on drawing a green marble, person A infers a 9/10 probability that the bucket contained 18 green and 2 red marbles. However, if the bucket contains 18 green marbles, person A has a 1/18 chance of being randomly selected given that she drew a green marble, whereas if the bucket contains 2 green marbles, she has a 1/2 chance of being selected. So, conditioning on her response being the one that matters *as well as the green marble itself*, she infers a (9:1) * (1/18)/(1/2) = (9:9) odds ratio, that is probability 1/2 the bucket contains 18 green marbles.

Which leaves us back at a kind of anthropic updating, except that this time it resolves the problem instead of introducing it!
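The odds arithmetic above can be verified with exact rationals (a sketch):

```python
# Sketch of the odds calculation for the modified marble problem, where one
# random green-marble-observer is selected as the decider.

from fractions import Fraction

odds_18green = Fraction(9, 1)  # posterior odds of "18 green" after drawing green

# Probability of being the selected decider, given that you drew green:
p_selected_if_18green = Fraction(1, 18)  # one decider among 18 green-drawers
p_selected_if_2green = Fraction(1, 2)    # one decider among 2 green-drawers

# Condition on being selected as well as on the green marble:
odds_after = odds_18green * p_selected_if_18green / p_selected_if_2green

print(odds_after)  # 1, i.e. 1:1 odds -- probability 1/2 of "18 green"
```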

## comment by tim · 2009-09-10T00:06:15.833Z · LW(p) · GW(p)

Isn't this a problem with the frequency at which you are presented with the opportunity to take the wager? [no, see edit]

The equation (50% * ((18 * +$1) + (2 * -$3))) + (50% * ((18 * -$3) + (2 * +$1))) = -$20 neglects to take into account that you will be offered this wager nine times more often in conditions where you win than where you lose.

For example, the wager "I will flip a fair coin and pay you $1 when it is heads and -$2 when it is tails" is -EV in nature. However, if a conditional is added where you will be asked if you want to take the bet 90% of the time given the coin is heads (10% of the time you are 'in a red room') and 10% of the time given the coin is tails (90% of the time you are 'in a red room'), your EV changes from (.5)(1) + (.5)(-2) = -.5 to (.5)(.9)($1) + (.5)(.1)(-$2) = $.35, representing the shift from "odds the coin comes up heads" to "odds the coin comes up heads and I am asked if I want to take the bet".

It seems like the same principle would apply to the green room scenario, and your pre-copied self would have to conclude that though the two outcomes are +$12 or -$52, they do not occur with 50-50 frequency: given that you are offered the bet, you have a 90% chance of winning. (.9)($12) + (.1)(-$52) = $5.6

EDIT: Okay, after thinking about it, I am wrong. The reason I was having trouble with this is that when the coin comes up tails, and 90% of the time I am in a red room, even though "I" am not being specifically asked to wager, my two copies in the green rooms are - and they are making the wrong choice because of my precommitment to taking the wager given I am in a green room. This makes my final EV calculation wrong, as it ignores trials where "I" appear in a red room even though the wager still takes place.

It's interesting that this paradox exists because of entities other than yourself (copies of you, paperclip maximizers, etc.) making the "incorrect" choice the 90% of the time you are stuck in a red room with no say.
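The two calculations being contrasted here can be laid side by side (a sketch; payoffs from the post):

```python
# Global EV of the strategy "green-roomers say yes", computed before the flip:
heads_payoff = 18 * (+1) + 2 * (-3)  # 18 green winners, 2 red losers: +12
tails_payoff = 18 * (-3) + 2 * (+1)  # 18 red losers, 2 green winners: -52
global_ev = 0.5 * heads_payoff + 0.5 * tails_payoff
print(global_ev)  # -20.0

# Naive EV as computed by a green-room awakener who updates to 90% heads:
anthropic_ev = 0.9 * heads_payoff + 0.1 * tails_payoff
print(anthropic_ev)  # approximately 5.6
```

The gap between -$20 and +$5.6 is exactly the paradox: the same strategy looks bad before the flip and good from inside a green room.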

Replies from: tim

## ↑ comment by tim · 2009-09-11T20:28:48.455Z · LW(p) · GW(p)

Some other thoughts: the paradox exists because you cannot precommit yourself to taking the wager given you are in a green room, as this commits you to taking the wager on 100% of coinflips, which is terrible for you.

When you find yourself in a green room, the right play IS to take the wager. However, you can't make the right play without committing yourself to making the wrong play in every universe where the coin comes up tails. You are basically screwing your parallel selves over, because half of them exist in a 'tails' reality. It seems like factoring in your parallel expectation cancels out the EV shift of adjusting your prior (50%) probability to 90%.

And if you don't care about your parallel selves, you can just think of them as the components that average to your true expectation in any given situation. If the overall effect across all possible universes was negative, it was a bad play even if it helped you in this universe. Metaphysical hindsight.

## comment by Jack · 2009-09-08T19:56:00.403Z · LW(p) · GW(p)

If the many worlds interpretation of quantum mechanics is true, isn't anthropic reasoning involved in making predictions about the future of quantum systems? There exists some world in which, from the moment this comment is posted onward, all attempts to detect quantum indeterminacy fail, all two-slit experiments yield two distinct lines instead of a wave pattern, etc. Without anthropic reasoning we have no reason to find this result at all surprising. So either we need to reject anthropic reasoning or we need to reject the predictive value of quantum mechanics under the many worlds interpretation. Right?

(Apologies if this has been covered, I'm playing catch-up and just trying to hash things out for myself. Also should I expect to be declared a prophet in the world in which quantum indeterminacy disappears from here on out?)

Replies from: Douglas_Knight, Johnicholas

## ↑ comment by Douglas_Knight · 2009-09-09T01:59:54.067Z · LW(p) · GW(p)

If the many worlds interpretation of quantum mechanics is true, isn't anthropic reasoning involved in making predictions about the future of quantum systems?

Basic QM seems to say that probability is ontologically basic. In a collapse point of view, it's what we usually think of as probability that shows up in decision theory. In MWI, both events happen. But you could talk about usual probability either way. ("classical probability is a degenerate form of quantum probability" with or without collapse)

Anthropics is about the interaction of probability with the number of observers.

Replacing usual probability with QM doesn't seem to me to make a difference. Quantum suicide is a kind of anthropics, but it's not clear to me in what sense it's really quantum. It's mainly about rejecting the claim that the Born probabilities are ontologically basic, that they measure how real an outcome is.

Replies from: Jack

## ↑ comment by Jack · 2009-09-09T03:23:29.711Z · LW(p) · GW(p)

But in MWI isn't the observed probability of some quantum state just the fraction of worlds in which an observer would detect that quantum state? As such, doesn't recovering the probabilities of quantum events that QM predicts require that "one should reason as if one were a random sample from the set of all observers in one's reference class" (from a Nick Bostrom piece)? The reason we think our theory of QM is right is that we think our branch in the multi-verse didn't get cursed with an unrepresentative set of observed phenomena.

Wouldn't a branch in the multi-verse that observed quantum events in which values were systematically distorted (by random chance) come up with slightly different equations to describe quantum mechanics? If so, what reason do we have to think that our equations are correct if we don't consider our observations to be similar to the observations made in other possible worlds?

Replies from: Psy-Kosh

## ↑ comment by Psy-Kosh · 2009-09-09T03:34:05.647Z · LW(p) · GW(p)

It's not just world counting... (although Robin Hanson's Mangled Worlds idea does suggest a way that it *may* turn out to amount to world counting after all)

Essentially, one has to integrate the squared modulus of the quantum amplitude over a world. This is proportional to the subjective probability of experiencing that world.

Yes... that it isn't simple world counting does seem to be a problem. This is something that we, or at least I, am confused about.

Replies from: Jack

## ↑ comment by Jack · 2009-09-09T04:13:00.763Z · LW(p) · GW(p)

Thanks. Good to know. I don't suppose you can explain why it works that way?

Replies from: Psy-Kosh

## ↑ comment by Psy-Kosh · 2009-09-09T04:48:24.812Z · LW(p) · GW(p)

As I said, that's something I'm confused about, and apparently others are as well.

We've got the linear rules for how quantum amplitude flows over configuration space, then we've got this "oh, by the way, the subjective probability of experiencing any chunk of reality is proportional to the square of the absolute value" rule.

There're a few ideas out there, but...

## ↑ comment by Johnicholas · 2009-09-08T22:30:37.449Z · LW(p) · GW(p)

Would you expand and sharpen your point? Woit comes to mind.

At one point you claim, possibly based on MWI, that "there is some world in which ...". As far as I can tell, the specifics of the scenario shouldn't have anything to do with the correctness of your argument.

This is how I would paraphrase your comment:

1. According to MWI, there exists some world in which unlikely things happen.
2. We find this surprising.
3. Anthropic reasoning is necessary to conclude 2.
4. Anthropic reasoning is involved in making predictions about quantum systems.

In step 2: Who is the "we"? What is the "this"? Why do we find it surprising? In step 3: What do you mean by "anthropic reasoning"? In general, it is pretty hard metareasoning to conclude that a reasoning step or maneuver is necessary for a conclusion.

Replies from: Jack

## ↑ comment by Jack · 2009-09-09T04:07:28.360Z · LW(p) · GW(p)

We don't need anthropic reasoning under MWI in order to be surprised when finding ourselves in worlds in which unlikely things happen so much as we need anthropic reasoning to conclude that an unlikely thing has happened. And our ability to conclude that an unlikely thing has happened is needed to accept quantum mechanics as a successful scientific theory.

"We" is the set of observers in the worlds where events declared to be unlikely by quantum mechanics actually happen. An observer is any physical system with a particular kind of causal relation to quantum states, such that the physical system can record information about quantum states and use the information to come up with methods of predicting the probability of previously unobserved quantum processes (or something, but if we can't come up with a definition of observer then we shouldn't be talking about anthropic reasoning anyway).

- According to MWI, the (quantum) probability of a quantum state is defined as the fraction of worlds in which that state occurs.
- The only way an observer somewhere in the multi-verse can trust the observations that confirm quantum mechanics' probabilistic predictions is if they reason as if they were a random sample from the set of all observers in the multi-verse (one articulation of anthropic reasoning), because if they can't do that then they have no reason to think their observations aren't wrong in a systematic way.
- An observer's reason for believing the standard model of QM to be true in the first place is that they can predict atomic and subatomic particles behaving according to a probabilistic wave-function.
- Observers lose their reason for trusting QM in the first place if they accept the MWI AND are prohibited from reasoning anthropically.

In other words: if MWI is likely, then QM is likely iff AR is acceptable.

I think one could write a different version of this argument by referencing expected surprise at discovering sudden changes in quantum probabilities (which I was conflating with the first argument in my first comment) but the above version is probably easier to follow.

Replies from: Johnicholas

## ↑ comment by Johnicholas · 2009-09-12T01:23:53.081Z · LW(p) · GW(p)

Can I paraphrase what you just said as:

"If many-worlds is true, then all evidence is anthropic evidence"

Replies from: Jack

## ↑ comment by Jack · 2009-09-12T18:34:00.240Z · LW(p) · GW(p)

I hadn't come to that conclusion until you said it... but yes, that is about right. I'm not sure I would say all evidence is anthropic- I would prefer saying that all updating involves a step of anthropic reasoning. I make that hedge just because I don't know that direct sensory information is anthropic evidence, just that making good updates with that sensory information is going to involve (implicit) anthropic reasoning.

## comment by wedrifid · 2009-09-09T16:59:00.631Z · LW(p) · GW(p)

Timeless decision agents reply as if controlling all similar decision processes, including all copies of themselves. Classical causal decision agents, to reply "Yes" as a group, will need to somehow work out that other copies of themselves reply "Yes", and then reply "Yes" themselves. We can try to help out the causal decision agents on their coordination problem by supplying rules such as "If conflicting answers are delivered, everyone loses $50". If causal decision agents can win on the problem "If everyone says 'Yes' you all get $10, if everyone says 'No' you all lose $5, if there are conflicting answers you all lose $50" then they can presumably handle this. If not, then ultimately, I decline to be responsible for the stupidity of causal decision agents.

The coordination hack to work around some of the stupidity of causal decision agents doesn't appear to be necessary here.

"Somehow working out that the other copies of themselves reply 'yes'" should be trivial for an agent focussed on causality when the copies are identical, have no incentive to randomise and have identical inputs. If the payoff for others disagreeing is identical to the payoff for 'no' they can be ignored. The conflict penalty makes the coordination problem more difficult for the causal agent in this context, not less.

## comment by gelisam · 2009-09-09T04:01:12.087Z · LW(p) · GW(p)

The reason we shouldn't update on the "room color" evidence has nothing to do with the fact that it constitutes anthropic evidence. The reason we shouldn't update is that we're *told*, albeit indirectly, that we shouldn't update (because if we do then some of our copies will update differently and we will be penalized for our disagreement).

In the real world, there is no incentive for all the copies of ourselves in all universes to agree, so it's all right to update on anthropic evidence.

## comment by byrnema · 2009-09-09T01:36:00.373Z · LW(p) · GW(p)

[comment deleted]

Oops... my usual mistake of equivocating different things and evolving the problem until it barely resembles the original. I will update my "solution" later if it still works for the original.

... Sigh. Won't work. My previous "solution" recovered the correct answer of -20 because I bent the rules enough to have each of my green-room-deciders make a global rather than anthropic calculation.

Replies from: byrnema

## ↑ comment by byrnema · 2009-09-09T16:56:26.514Z · LW(p) · GW(p)

Thinking about how all the green-room people come to the wrong conclusion makes my brain hurt. But I suppose, finally, it is true. They cannot base their decision on their subjective experience, and here I'll outline some thoughts I've had as to under what conditions they should know they cannot do so.

Suppose there are 20 people (Amy, Benny, Carrie, Donny, ...) and this experiment is done as described. If we always ask Tony (the 20th person) whether or not to say "yes", and he bases his decision on whether or not he is in a green room, then the expected value of his decision *really is* $5.6. Tony here is a special, singled out "decider". One way of looking at this situation is that the 'yes' depends on some information in the system (that is, whether or not Tony was in a green room.)

If instead we say that the decider can be anyone, and in fact we choose the decider *after* the assortment into rooms as someone in a green room, then we are not really given any information about the system.

It is the difference between (a) picking a person, and seeing if they wake up in a green room, and (b) picking a person that is in a green room. (I know you are well aware of this difference, but it helps to spell it out.)

You can't pick the deciders from a set with a prespecified outcome. It's a pointer problem: You can learn about the system from the change of state from Tony to Tony* (Tony: no room -->Tony: green room), but you can't *assign* the star after the assignment (pick someone in a green room and ask them).

When a person wakes in a green room and is asked, they should say 'yes' if they are randomly chosen to be asked independently of their room color. If they were chosen after the assignment, *because* they awoke in a green room, they should recognize this as the “unfixed pointer problem” (a special kind of selection bias).

Avoiding the pointer problem is straight-forward. The people who wake in red rooms have a posterior probability of heads as 10%. The people who wake in green rooms have a posterior probability of heads as 90%. Your posterior probability is meaningful only if your posterior probability *could have been* either way. Since Eliezer only asks people who woke in green rooms, and never asks people who woke in red rooms, the posterior probabilities are not meaningful.
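A simulation sketch of this point (payoffs from the post): the green-roomers' 90% is perfectly well calibrated, and yet the conditional strategy "say yes in a green room" still loses on average.

```python
# Monte Carlo sketch: "90% heads" is calibrated among green-room awakeners,
# but the strategy "green-roomers say yes" still has negative expected value.

import random

random.seed(1)
trials = 100_000
green_and_heads = 0
green_total = 0
total_payoff = 0

for _ in range(trials):
    heads = random.random() < 0.5
    n_green = 18 if heads else 2
    green_total += n_green
    if heads:
        green_and_heads += n_green
    # Green-roomers always say yes, so the bet is taken on every trial:
    total_payoff += (18 * 1 + 2 * -3) if heads else (18 * -3 + 2 * 1)

print(green_and_heads / green_total)  # approximately 0.9: calibrated
print(total_payoff / trials)          # approximately -20: still a losing bet
```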

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-09T18:26:54.761Z · LW(p) · GW(p)

The people who wake in red rooms have a posterior probability of heads as 10%. The people who wake in green rooms have a posterior probability of heads as 90%. Your posterior probability is meaningful only if your posterior probability could have been either way. Since Eliezer only asks people who woke in green rooms, and never asks people who woke in red rooms, the posterior probabilities are not meaningful.

The rest of your reply makes sense to me, but can I ask you to amplify on this? Maybe I'm being naive, but to me, a 90% probability is a 90% probability and I use it in all my strategic choices. At least that's what I started out thinking.

Now you've just shown that a decision process won't want to strategically condition on this "90% probability", because it always ends up as "90% probability" regardless of the true state of affairs, and so is not strategically informative to green agents - even if the probability seems well-calibrated in the sense that, looking over impossible possible worlds, green agents who say "90%" are correct 9 times out of 10. This seems like a conflict between an anthropic sense of probability (relative frequency in a population of observers) and a strategic sense of probability (summarizing information that is to be used to make decisions), or something along those lines. Is this where you're pointing toward by saying that a posterior probability is meaningful at some times but not others?

Replies from: byrnema

## ↑ comment by byrnema · 2009-09-09T20:28:43.262Z · LW(p) · GW(p)

a decision process won't want to strategically condition on this "90% probability", because it always ends up as "90% probability" regardless of the true state of affairs, and so is not strategically informative to green agents

The 90% probability is *generally* strategically informative to green agents. They may legitimately point to themselves for information about the world, but in this specific case, there is confusion about who is doing the pointing.

When you think about a problem anthropically, *you* yourself are the pointer (the thing you are observing before and after to make an observation) and you assign yourself as the pointer. This is going to be strategically sound in all cases in which you don't change as the pointer before and after an observation. (A pretty normal condition. Exceptions would be experiments in which you try to determine the probability that a certain activity is fatal to yourself -- you will never be able to figure out the probability that you will die of your shrimp allergy by repeated trials of consuming shrimp, as it will become increasingly skewed towards lower and lower values.)

Likewise, if I am in the experiment described in the post and I awaken in a green room I should answer "yes" to your question if I determine that you asked me randomly. That is, that you would have asked me even if I woke in a red room. In which case my anthropic observation that there is a 90% probability that heads was flipped is quite sound, as usual.

On the other hand, if you ask me only if I wake in a green room, then you wouldn’t have asked “me” if I awoke in a red room. (So I must realize this isn’t really about *me* assigning myself as a pointer, because “me” doesn’t change depending on what room I wake up in.) It's strange and requires some mental gymnastics for me to understand that *you* Eliezer are picking the pointer in this case, even though you are asking me about my anthropic observation, for which I would usually expect to assign myself as the pointer.

So for me this is a pointer/biased-observation problem. But the anthropic problem is related, because we as humans cannot ask about the probability of currently observed events based on the frequency of observations which, had they been otherwise, would not have permitted ourselves to ask the question.

Replies from: Eliezer_Yudkowsky

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-09T20:49:33.021Z · LW(p) · GW(p)

On the other hand, if you ask me only if I wake in a green room, then you wouldn’t have asked “me” if I awoke in a red room. (So I must realize this isn’t really about me assigning myself as a pointer, because “me” doesn’t change depending on what room I wake up in.)

Huh. Very interesting again. So in other words, the probability that I would use for myself, is not the probability that I should be using to answer questions from this decision process, because the decision process is using a different kind of pointer than my me-ness?

How would one formalize this? Bostrom's division-of-responsibility principle?

Replies from: byrnema

## ↑ comment by byrnema · 2009-09-09T21:15:21.470Z · LW(p) · GW(p)

I haven't had time to read this, but it looks possibly relevant (it talks about the importance of whether an observation point is fixed in advance or not) and also possibly interesting, as it compares Bayesian and frequentist views.

I will read it when I have time later... or anyone else is welcome to if they have time/interest.

Replies from: byrnema

## ↑ comment by byrnema · 2009-09-11T09:58:31.791Z · LW(p) · GW(p)

What I got out of the article above, since I skipped all the technical math, was that frequentists consider "the pointer problem" (i.e., just your usual selection bias) as something that needs correction while Bayesians don't correct in these cases. The author concludes (I trust, via some kind of argument) that Bayesians don't need to correct if they choose the posteriors carefully enough.

I now see that I was being entirely consistent with my role as the resident frequentist when I identified this as a "pointer problem" problem (which it is) but that doesn't mean the problem can't be pushed through without correction* -- the Bayesian way -- by carefully considering the priors.

*"Requiring correction" then might be a euphemism for time-dependent, while a preference for an updateless decision theory is a good Bayesian quality. A quality, by the way, a frequentist can appreciate as well, so this might be a point of contact on which to win frequentists over.

## comment by twanvl · 2009-09-08T22:28:24.481Z · LW(p) · GW(p)

Before the experiment, you calculate the general utility of the conditional strategy "Reply 'Yes' to the question if you wake up in a green room" as (50% * ((18 * +$1) + (2 * -$3))) + (50% * ((18 * -$3) + (2 * +$1))) = -$20

This assumes that the question is asked only once, but then, to which of the 20 copies will it be asked?

If all 20 copies get asked the same question (or equivalently if a single copy chosen at random is) then the utility is (50% * 18/20 * ((18 * +$1) + (2 * -$3))) + (50% * 2/20 * ((18 * -$3) + (2 * +$1))) = 2.8$ = 50% * 5.6$.

Consider the following similar thought experiment:

- I flip a fair coin to determine whether to switch to my headdy coin or my tailly coin, which have a 90% and 10% probability of heads respectively.
- Now I flip this biased coin. If it comes up heads then I paint the room green, if it comes up tails I paint it red.
- You then find yourself in a green room.
- Then I flip the biased coin again, and repaint the room.
- Before this second flip, I offer you the bet of +1$ if the room stays green and -3$ if it becomes red.

The prior expected utility before the experiment is:

```
E(util|headdy) = 90% * 1$ + 10% * -3$ = 0.6$
E(util|tailly) = 10% * 1$ + 90% * -3$ = -2.6$
E(util) = 50% * E(util|headdy) + 50% * E(util|tailly) = -1$
```

Given that you find yourself in a green room after the first flip, you can determine the probability that the headdy coin is used:

```
P(green) = 0.5
P(green|headdy) = 0.9
P(headdy|green) = 0.9
```

Which gives a posterior utility:

```
E(util|green) = 0.9 * E(util|headdy) + 0.1 * E(util|tailly) = 0.28$
```
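A quick mechanical check of the numbers above (a sketch):

```python
# Re-derives twanvl's expected utilities for the headdy/tailly variant.

e_headdy = 0.9 * 1 + 0.1 * (-3)             # approximately 0.6
e_tailly = 0.1 * 1 + 0.9 * (-3)             # approximately -2.6
e_prior = 0.5 * e_headdy + 0.5 * e_tailly   # approximately -1.0

# After waking in a green room, P(headdy | green) = 0.9:
e_green = 0.9 * e_headdy + 0.1 * e_tailly   # approximately 0.28

print(round(e_prior, 2), round(e_green, 2))  # -1.0 0.28
```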

Replies from: DanArmak

## ↑ comment by DanArmak · 2009-09-08T22:41:50.523Z · LW(p) · GW(p)

This assumes that the question is asked only once, but then, to which of the 20 copies will it be asked?

Every copy that is in a green room is asked the question (so either 2 or 18 copies total are asked). If all answer Play, we play. If all answer Don't Play, we don't. In any other case we fine all 20 copies some huge amount; this is intended to make them agree beforehand on what answer to give. (This is reworded from the OP.)

For your other thought experiment - if there aren't actual N copies being asked the question, then there's no dilemma; you (the only copy) simply update on the evidence available (that the room is green). So yes, the original problem requires copies being asked in parallel to introduce the possibility that you're hurting *other* copies of yourself by giving a self-serving answer. Whereas if you're the only copy, you always give a self-serving answer, i.e. play only if the room is green.

## comment by mamert · 2016-05-16T10:20:27.702Z · LW(p) · GW(p)

I keep having trouble thinking of probabilities when I'm to be copied and >=1 of "me" will see red and >=1 of "me" will see green. My thought is that it is 100% likely that "I" will see red and know there are others, once-mes, who see green, and 100% likely vice-versa. Waking up to see red (green) is exactly the expected result.

I do not know what to make of this opinion of mine. It's as if my definition of self - or choice of body - is in superposition. Am I committing an error here? Suggestions for further reading would be appreciated.

## comment by Angela · 2014-08-06T22:22:24.702Z · LW(p) · GW(p)

I remain convinced that the probability is 90%.

The confusion is over whether you want to maximize the expectation of the number of utilons there will be if you wake up in a green room or the expectation of the number of utilons you will observe if you wake up in a green room.

## comment by **[deleted]** · 2014-03-07T13:24:19.442Z · LW(p) · GW(p)

The notion of "I am a Boltzmann brain" goes away when you conclude that conscious experience is a Tegmark Level 4 thing, and that equivalent conscious experiences are mathematically equal. There is then no difference: you are at the same time a human being and a Boltzmann brain, at least until the two diverge.

Thus, anthropic reasoning is right out.

Replies from: Kawoomba

## ↑ comment by Kawoomba · 2014-03-07T18:40:17.435Z · LW(p) · GW(p)

Well, by the same token "What I experience represents what I think it does / I am not a Boltzmann brain which may dwindle out of existence in an instant" would go right out, just the same. This kind of reasoning reduces to something similar to quantum suicide. The point at which your conscious experience is expected to diverge, even if you take that perspective, does kind of matter. The different paths and their probabilistic weights which govern the divergence alter your expected experience, after all. Or am I misunderstanding?

Replies from: None

## ↑ comment by **[deleted]** · 2014-03-10T21:09:07.467Z · LW(p) · GW(p)

I am not sure.

Let me try to clarify.

By virtue of existential quantification in a ZF-equivalent set theory, we can have anything.

In an arbitrary encoding format, I now by existential quantification select a set which is the momentary subjective experience of being me as I write this post, e.g. memory sensations, existential sensations, sensory input, etc.

It is a mathematical object. I can choose its representation format independent of any computational medium I might use to implement it.

It just so happens that there is a brain in the universe we are in, which is implementing this mathematical object.

Brains are computers that compute conscious experiences.

They have no more bearing on the mathematical objects they implement than a modern computer has on the definition of Conway's Game of Life.

Does that clarify it?

Replies from: Kawoomba

## ↑ comment by Kawoomba · 2014-03-10T21:30:27.652Z · LW(p) · GW(p)

> It just so happens that there is a brain in the universe we are in, which is implementing this mathematical object.

Which is why we're still highly invested in the question whether (whatever it is that generates our conscious experience) will "stay around" and continue with our pattern in an expected manner.

Let's say we identify with only the mathematical object, not the representation format at all. That doesn't excuse us from anthropic reasoning, or from a personal investment in reasoning about the implementing "hardware". We'd still be highly invested in the question, even as 'mathematical objects'. We probably still care about being continually instantiated.

The shift in perspective you suggest doesn't take away from that (and adds what could be construed as a flavor of dualism).

Replies from: None

## ↑ comment by **[deleted]** · 2014-03-11T13:58:46.220Z · LW(p) · GW(p)

Hmmm.

I will have to mull on that, but let me leave with a mote of explanation:

The reasoning strategy I used to arrive at this conclusion was similar to the one used in concluding that "every possible human exists in parallel universes, so we need not make more humans, but more humans feeling good."

Replies from: Jiro

## ↑ comment by Jiro · 2014-03-11T14:53:03.520Z · LW(p) · GW(p)

Doesn't every possible human-feeling-good also exist in parallel universes?

(And if you argue that although they exist you can increase their measure, that applies to the every-possible-human version as well.)

Replies from: None

## ↑ comment by **[deleted]** · 2014-03-11T20:36:07.257Z · LW(p) · GW(p)

Sure, but I will quote Karkat Vantas on time-travel shenanigans, from Andrew Hussie's *Homestuck*:

CCG: EVERYBODY, DID YOU HEAR THAT?? SUPERFUTURE VRISKA HAS AN IMPORTANT LIFE LESSON FOR US ALL.

CCG: WE DON'T HAVE TO WORRY ABOUT OUR PRESENT RESPONSIBILITIES AND OBLIGATIONS!

CCG: BECAUSE AS IT TURNS OUT, IN THE FUTURE ALL THAT STUFF ALREADY HAPPENED. WE'RE OFF THE FUCKING HOOK!

## comment by byrnema · 2009-09-11T08:57:59.699Z · LW(p) · GW(p)

Whoohoo! I just figured out the correct way to handle this problem, that renders the global and egocentric/internal reflections consistent.

We will see if my solution makes sense in the morning, but the upshot is that there was/is nothing wrong with the green roomer's *posterior*, as many people have been correctly defending. The green roomer who computed an EV of $5.60 modeled the money pay-off scheme wrong.

In the incorrect calculation that yields the $5.60 EV, the green roomer models himself as winning (getting the favorable +$12) when he is right and losing (paying the $52) when he is wrong. But no, not exactly. The green roomer doesn't win every time he's right -- even though certainly he's right every time he's right.

The green roomer wins 1 out of every 18 times that he's right, because 17 copies of himself that were also right do not get their own independent winnings, and he loses 1 out of every 2 times he's wrong, because there are 2 of him that are wrong in the room that pays $52.

So it is Bostrom's division-of-responsibility principle, with this justification. It is probably more apt to name it division-of-*reward*.

Here's is the correct green roomer calculation:

EV = P(heads) × (payoff given heads) × (rate of payoff given heads) + P(tails) × (payoff given tails) × (rate of payoff given tails)

= 0.9 × ($12) × (1/18) + 0.1 × (−$52) × (1/2) = −$2
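This per-copy reward division can be checked numerically; a minimal sketch, assuming the numbers from this comment (18/2 split, +$12/−$52 group payoffs, 90% posterior):

```python
# Green roomer's EV with the group reward divided among the copies sharing it:
# right 90% of the time, but the +$12 is split among 18 correct copies;
# wrong 10% of the time, with the -$52 split between 2 incorrect copies.
p_heads = 0.9
ev = p_heads * 12 * (1 / 18) + (1 - p_heads) * -52 * (1 / 2)
print(round(ev, 10))  # -2.0
```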

(By the way, this doesn't modify what I said about pointers, but I must admit I don't understand at the moment how the two perspectives are related. Yet; some thoughts.)

Replies from: byrnema, byrnema

## ↑ comment by byrnema · 2009-09-11T17:43:26.324Z · LW(p) · GW(p)

This is my attempt at a pedagogical exposition of “the solution”. It’s overly long, and I've lost perspective completely about what is understood by the group here and what isn't. But since I've written up this solution for myself, I'll go ahead and share it.

The cases I'm describing below are altered from the OP so that they are completely non-metaphysical, in the sense that you could implement them in real life with real people. Thus there is an objective reality regarding whether money is collectively lost or won, so there is finally no ambiguity about what the correct calculation actually is.

Suppose that there are twenty different graduate students {Amy, Betty, Cindy, ..., Tony} and two hotels connected by a breezeway. Hotel Green has 18 green rooms and 2 red rooms. Hotel Red has 18 red rooms and 2 green rooms. Every night for many years, students will be assigned a room in either Hotel Green or Hotel Red depending on a coin flip (heads --> Hotel Green for the night, tails --> Hotel Red for the night). Students won’t know what hotel they are in but can see their own room color only. If a student sees a green room, that student correctly deduces they are in Hotel Green with 90% probability.

**Case 1**: Suppose that every morning, Tony is allowed to bet that he is in a green room. If he bets ‘yes’ and is correct, he pockets $12. If he bets ‘yes’ and is wrong, he has to pay $52. (In other words, his payoff for a correct vote is $12, the payoff for a wrong vote is -$52.) What is the expected value of his betting if he always says ‘yes’ if he is in a green room?

For every 20 times that Tony says 'yes', he wins 18 times (wins $12×18) and he loses twice (loses $52×2), consistent with his posterior. On average he wins $5.60 per bet, or $2.80 per night. (He says "yes" to the bet 1 out of every 2 nights, because that is the frequency with which he finds himself in a green room.) This is a steady money pump in the student's favor.

The correct calculation for Case 1 is:

average payoff per bet = (probability of being right) × (payoff if right) + (probability of being wrong) × (payoff if wrong) = 0.9 × $12 + 0.1 × (−$52) = $5.60.

**Case 2**: Suppose that Tony doesn’t pocket the money, but instead the money is placed in a tip jar in the breezeway. Tony’s betting contributes $2.80 per night on average to the tip jar.

**Case 3**: Suppose there is nothing special about Tony, and all the students get to make bets. They will all make bets when they wake in green rooms, and add $2.80 per night to the tip jar on average. Collectively, the students add $56 per night to the tip jar on average. (If you think about it a minute, you will see that they add $216 to the tip jar on nights that they are assigned to Hotel Green and lose $104 on nights that they are assigned to Hotel Red.) If the money is distributed back to the students, they each are making $2.80 per night, the same steady money pump in their favor that Tony took advantage of in Case 1.

**Case 4**: Now consider the case described in the OP. We already understand that the students will vote "yes" if they wake in a green room and that they expect to make money doing so. Now the rules are going to change, however, so that when all the green roomers unanimously vote "yes", $12 is added to the tip jar if they are correct and $52 is subtracted if they are wrong. Since the students are assigned to Hotel Green half the time and to Hotel Red half the time, on average the tip jar loses $20 every night. Suddenly, the students are each losing $1 a night!

Each time a student votes correctly, it is because they are *all* in Hotel Green, as per the initial set up of the problem in the OP. So all 18 green roomer votes are correct and collectively earn $12 for that night. The payoff is $12/18 *per correct vote*. Likewise, the payoff per wrong vote is -$52/2.

So the correct calculation for case 4 is as follows:

average payoff per bet = (probability of being right) × (payoff if right) + (probability of being wrong) × (payoff if wrong) = 0.9 × ($12/18) + 0.1 × (−$52/2) = −$2.

So in conclusion, in the OP problem, the green roomer must recognize that he is dealing with case #4 and not Case #1, in which the payoff is different (but not the posterior).
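The contrast between Case 1/3 and Case 4 can be written out exactly; a sketch under the assumptions above (fair coin, 18/2 room split, +$12 per correct bet, −$52 per wrong one):

```python
# Case 1/3: every green-room student is paid individually each night.
green_night = 18 * 12     # Hotel Green night: 18 correct bets -> +$216 to the jar
red_night = 2 * -52       # Hotel Red night: 2 wrong bets -> -$104 from the jar
case13_per_student = (green_night + red_night) / 2 / 20  # fair coin, 20 students

# Case 4 (the OP): the unanimous green vote pays the jar once per night.
case4_per_student = (12 - 52) / 2 / 20

print(case13_per_student)  # 2.8
print(case4_per_student)   # -1.0
```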

Replies from: mendel

## ↑ comment by mendel · 2011-05-22T09:15:09.442Z · LW(p) · GW(p)

I believe both of your computations are correct, and the fallacy lies in mixing up the payoff for the group with the payoff for the individual - which the frame of the problem as posed does suggest, with multiple identities that are actually the same person. More precisely, the probabilities for the individual are 90/10 , but the probabilities for the groups are 50/50, and if you compute payoffs for the group (+$12/-$52), you need to use the group probabilities. (It would be different if the narrator ("I") offered the guinea pig ("you") the $12/$52 odds individually.)

byrnema looked at the result from the group viewpoint; you get the same result when you approach it from the individual viewpoint, if done correctly, as follows:

For a single person, the correct payoff is not $12 vs. −$52, but rather ($1 minus $6/18 to reimburse the reds, making $0.67) × 90% and ($1 minus $54/2 = −$26) × 10%, so each of the copies of the guinea pig is going to be out of pocket by (2/3) × 0.9 + (−26) × 0.1 = 0.6 − 2.6 = −2, on average.
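This individual-viewpoint accounting can be verified directly (a quick check, using the reimbursement shares assumed in this comment):

```python
# Per-copy accounting: a green winner's $1 is reduced by his share of the
# red rooms' losses in that world.
p_right = 0.9
win_net = 1 - 6 / 18    # $1 won, minus a 1/18 share of the reds' $6 loss (heads)
lose_net = 1 - 54 / 2   # $1 won, minus a 1/2 share of the reds' $54 loss (tails)
ev = p_right * win_net + (1 - p_right) * lose_net
print(round(ev, 10))  # -2.0
```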

The fallacy of Eliezer's guinea pigs is that each of them thinks they get the $18 each time, which means that the 18 goes into his computation twice (squared) for their winnings (18 × 18/20). This is not a problem with anthropic reasoning, but with statistics.

A distrustful individual would ask themselves, "what is the narrator getting out of it", and realize that the **narrator** will see the -$12 / + $52 outcome, not the guinea pig - and that to the narrator, the 50/50 probability applies. Don't mix them up!

## ↑ comment by byrnema · 2009-09-11T09:13:39.345Z · LW(p) · GW(p)

It was 3:30 in the morning just a short while ago, and I woke up with a bunch of non-sensical ideas about the properties of this problem, and then while I was trying to get back to sleep I realized that one of the ideas made sense. Evidence that understanding this problem for myself required a right-brain reboot.

I'm not surprised about the reboot: I've been thinking about this problem a lot, which signals to my brain that it's *important*, and it literally hurt my brain to think about why the green roomers were losing for the group when they thought they were winning, strongly suggesting I was hitting my apologist limit.

## comment by wedrifid · 2009-09-09T17:31:35.221Z · LW(p) · GW(p)

> In personal conversation, Nick Bostrom suggested that a division-of-responsibility principle might cancel out the anthropic update - i.e., the paperclip maximizer would have to reason, "If the logical coin came up heads then I am 1/18th responsible for adding +1 paperclip, if the logical coin came up tails then I am 1/2 responsible for destroying 3 paperclips." I confess that my initial reaction to this suggestion was "Ewwww", but I'm not exactly comfortable concluding I'm a Boltzmann brain, either.

I would perhaps prefer to use different language in the description, but this seems to be roughly the answer to the apparent inconsistency. When reasoning anthropically you must decide anthropically. Unfortunately it is hard to describe such decision making without sounding either unscientific or outright incomprehensible.

I'm rather looking forward to another Eliezer post on this topic once he has finished dissolving his confusion. I've gained plenty from absorbing the posts and discussions, and more from mentally reducing the concepts myself. But this stuff is rather complicated and, to be perfectly honest, I don't trust myself not to have missed something.

## comment by Emile · 2009-09-09T08:23:15.943Z · LW(p) · GW(p)

> Let the dilemma be, "I will ask all people who wake up in green rooms if they are willing to take the bet 'Create 1 paperclip if the logical coinflip came up heads, destroy 3 paperclips if the logical coinflip came up tails'. (Should they disagree on their answers, I will destroy 5 paperclips.)" Then a paperclip maximizer, before the experiment, wants the paperclip maximizers who wake up in green rooms to refuse the bet. But a conscious paperclip maximizer who updates on anthropic evidence, who wakes up in a green room, will want to take the bet, with expected utility ((90% × +1 paperclip) + (10% × −3 paperclips)) = +0.6 paperclips.

That last calculation doesn't look right to me: the paperclip maximizer in the green room still knows that there are other paperclip maximizers in red rooms who will refuse the bet whether or not they rely on anthropic evidence. So the expected utility of taking the bet would be 100% × −5 paperclips.

Or did I misunderstand something?

Replies from: wedrifid

## comment by PlaidX · 2009-09-09T06:59:51.560Z · LW(p) · GW(p)

Can someone come up with a situation of the same general form as this one where anthropic reasoning results in optimal actions and nonanthropic reasoning results in suboptimal actions?

Replies from: PlaidX## ↑ comment by PlaidX · 2009-09-09T07:13:25.122Z · LW(p) · GW(p)

How about if the wager is that anybody in any room can guess the outcome of the coinflip, and if they get it right they win $1 and if they get it wrong they lose $2?

If you still think it's 50% after waking up in a green room, you won't take the bet, and you'll win $0; if you think it's 90% you'll take the bet and come out $14 ahead on balance, with two of you losing $2 each and 18 of you getting $1.

Doesn't this show anthropic reasoning is right as much as the OP shows it's wrong?
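Tallying this variant per coinflip supports the claim; a sketch, assuming (as the comment implies) that green rooms bet heads and red rooms bet tails, each per their 90% posterior:

```python
# Net group winnings in each world when every copy bets its room's colour:
# a correct guess wins $1, a wrong guess loses $2.
nets = {}
for coin, (greens, reds) in [("heads", (18, 2)), ("tails", (2, 18))]:
    right = greens if coin == "heads" else reds   # the 18-copy majority is right
    wrong = 20 - right
    nets[coin] = right * 1 - wrong * 2
print(nets)  # {'heads': 14, 'tails': 14}
```

Either way the coin lands, the anthropic bettors come out $14 ahead as a group, while the 50% bettors abstain and win nothing.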

## comment by Dagon · 2009-09-08T23:22:30.942Z · LW(p) · GW(p)

I think you're missing a term in your second calculation. And why are anthropism and copies of you necessary for this puzzle? I suspect the answer will indicate something I'm completely missing about this series.

Take this for straight-up probability:

I have two jars of marbles, one with 18 green and 2 red, the other with 18 red and 2 green. Pick one jar at random, then look at one marble from that jar at random.

If you pick green, what's the chance that your jar is mostly green? I say 90%, by fairly straightforward application of bayes' rule.

I offer a wager: you get $1 per green and lose $3 per red marble in the jar you chose.

After seeing a green marble, I think your EV is $5.60. After seeing a red marble, I think your EV is $0 (you decline the bet). If you are forced to make the wager before seeing anything, conditional on drawing green, I think your EV is $2.80. I calculate it thus: 50% to get the mostly-green jar, and 90% of that will you see green and take the bet, which is worth +$1×18 − $3×2 in this case. 50% to get mostly-red, 10% of which will you draw green, worth +$1×2 − $3×18.

0.5 × 0.9 × (1×18 − 3×2) + 0.5 × 0.1 × (1×2 − 3×18) = 2.80, which is consistent: half the time you pick green, with EV of 5.60.
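These jar numbers enumerate exactly; a sketch with the stakes assumed in this comment ($1 per green and −$3 per red marble in the chosen jar):

```python
# Enumerate both jars to get the EV of "bet iff you draw green".
jars = [(18, 2), (2, 18)]         # (green, red) counts; each jar picked with p = 0.5
ev_bet = 0.0                       # unconditional EV, fixed before drawing
p_green = 0.0                      # overall probability of drawing green
for greens, reds in jars:
    p_draw_green = greens / 20
    payoff = greens * 1 - reds * 3
    ev_bet += 0.5 * p_draw_green * payoff
    p_green += 0.5 * p_draw_green
print(round(ev_bet, 2))            # 2.8  (per game, unconditional)
print(round(ev_bet / p_green, 2))  # 5.6  (conditional on seeing green)
```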

I think you left out the probability that you'll get green and take the bet in each of your 0.5 probabilities for the conditional strategy. Multiply a 0.9 to the first term and 0.1 into the second, and everything gets consistent.

Replies from: Eliezer_Yudkowsky

## ↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-09-09T00:28:30.723Z · LW(p) · GW(p)

The problem is that we aren't asking one randomly selected person, we're asking *all* of the green ones (they have to agree unanimously for the Yes vote to go through).

## ↑ comment by Dagon · 2009-09-09T15:16:34.644Z · LW(p) · GW(p)

Ah, I see. You're asking all the green ones, but only paying each pod once. This feels like reverse-weighting the payout, so it should still be -EV even after waking up, but I haven't quite worked out a way to include that in the numbers...

## ↑ comment by timtyler · 2009-09-09T09:45:25.441Z · LW(p) · GW(p)

The second sum still seems wrong. Here it is:

"However, before the experiment, you calculate the general utility of the conditional strategy "Reply 'Yes' to the question if you wake up in a green room" as (50% * ((18 * +$1) + (2 * -$3))) + (50% * ((18 * -$3) + (2 * +$1))) = -$20. You want your future selves to reply 'No' under these conditions."

The sum given is the one you *would* perform *if* you did not know which room you woke up in. Surely a different sum is appropriate with the *additional* evidence that you awoke in a green room.

Incidentally, this problem seems far too complicated! I feel like a programmer faced with a bug report that fails to provide some simple code that nonetheless reproduces the problem. Simplify, simplify, simplify!

## comment by rwallace · 2009-09-08T23:20:45.384Z · LW(p) · GW(p)

In this comment:

http://lesswrong.com/lw/17d/forcing_anthropics_boltzmann_brains/138u

I put forward my view that the best solution is to just maximize total utility, which correctly handles the forcing anthropics case, and expressed curiosity as to whether it would handle the outlawing anthropics case.

It now seems my solution does correctly handle the outlawing anthropics case, which would seem to be a data point in its favor.

Replies from: CarlShulman

## ↑ comment by CarlShulman · 2009-09-09T18:37:42.366Z · LW(p) · GW(p)

Maximizing total hedonic utility fails the outlawing anthropics case: substitute hedons for paperclips.

Replies from: rwallace

## comment by Christian_Szegedy · 2009-09-08T22:51:05.603Z · LW(p) · GW(p)

Assume that each agent **has his own game** (that is, one game per agent). Then there are 18 (or 2) games overall, depending on the result of the coin flip.

Then the first calculation would be correct in every respect, and it makes sense to say yes from a global point of view. (And with any other reward matrix as well, the dynamic update would be consistent with the a priori decision all the time.)

This shows that the error made by the agent was to implicitly assume that he **has his own game**.

## comment by lavalamp · 2009-09-08T21:45:38.115Z · LW(p) · GW(p)

How about give all of your potential clones a vote, even though you can't communicate?

So, in one case, 18 of you would say "Yes, take the bet!" and 2 would say "No, let me keep my money." In the other case, 18 would say no and two would say yes. In either case, of course, you're one of the ones who would vote yes. OK, that leaves us tied. So why not let everyone's vote be proportional to what they stand to gain/lose? That leaves us with 20 * -3 vs. 20 * 1. Don't take the bet.
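This stake-weighted vote can be tallied explicitly (a sketch; the weights are the dollar stakes from the comment, summed over both equally likely worlds):

```python
# Every potential copy across both coin outcomes gets a vote weighted by
# what it stands to gain or lose from the bet.
worlds = {"heads": {"green": 18, "red": 2}, "tails": {"green": 2, "red": 18}}
# Greens (who would gain $1) vote yes; reds (who would lose $3) vote no.
yes_weight = sum(rooms["green"] * 1 for rooms in worlds.values())
no_weight = sum(rooms["red"] * 3 for rooms in worlds.values())
print(yes_weight, no_weight)  # 20 60 -> don't take the bet
```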

(Yes, I realize half the people that just voted above don't exist. We just don't know which half...)

## comment by ArthurB · 2009-09-10T18:25:43.264Z · LW(p) · GW(p)

As has been pointed out, this is not an anthropic problem; however, there still is a paradox. I may be stating the obvious, but the root of the problem is that you're doing something fishy when you say that the other people will think the same way and that your decision will determine theirs.

The proper way to make a decision is to have a probability distribution on the code of the other agents (which will include their prior on your code). From this I believe (but can't prove) that you will take the correct course of action.

Newcomb-like problems fall in the same category; the trick is that there is always a belief about someone's decision making hidden in the problem.

## comment by Christian_Szegedy · 2009-09-08T20:21:13.260Z · LW(p) · GW(p)

[EDIT:] Warning: This post was based on a misunderstanding of the OP. Thanks orthonormal for pointing out the mistake! I leave this post here so that the replies stay in context.

I think that decision matrix of the agent waking up in green room is not complete: it should contain the outcome of losing $50 if the answers are not consistent.

Therefore, it would compute that even if the probability that the coin was flipped to 1 is 90%, it still does not make sense to answer "yes", since two other copies would answer "no" and therefore the penalty for not giving a uniform answer would outweigh the potential win of $5.60. (Even without the penalty, the agent could infer that there were two dissenting copies of itself in that case, and he has no way to generate all the necessary votes to get the money.)

The error of the agent is not the P=90% estimate, but the implicit assumption that he is the only one to influence the outcome.

Replies from: orthonormal

## ↑ comment by orthonormal · 2009-09-08T20:49:23.508Z · LW(p) · GW(p)

The copies in red rooms don't get to vote in this setup.

Replies from: Christian_Szegedy

## ↑ comment by Christian_Szegedy · 2009-09-08T22:38:33.828Z · LW(p) · GW(p)

Thanks for pointing that out. Now I understand the problem.

However, I still think that the mistake made by the agent is the **implicit assumption the he is the only one influencing the outcome**.

Since all of the copies assume that they solely decide the outcome, they overestimate the reward after the anthropic update (each of the copies claims the whole reward for his decision, although the decision is collective and each vote is necessary).

Replies from: orthonormal

## ↑ comment by orthonormal · 2009-09-15T06:35:00.001Z · LW(p) · GW(p)

By the way, please don't delete a comment if you change your mind or realize an error; it makes the conversation difficult for others to read. You can always put in an edit (and mark it as such) if you want.

I'd only delete one of my comments if I felt that its presence actually harmed readers, and that there was no disclaimer I could add that would prevent that harm.

Replies from: Christian_Szegedy

## ↑ comment by Christian_Szegedy · 2009-09-15T06:41:47.868Z · LW(p) · GW(p)

OK, sorry. (In this special case, I remember thinking that your remark was perfectly understandable even without the context.)

## comment by **[deleted]** · 2009-09-09T07:44:56.086Z · LW(p) · GW(p)

EDIT: at first I thought this was equivalent, but then I tried the numbers and realized it's not.

- I'll flip a coin to choose which roulette wheel to spin. If it comes up heads, I'll spin a wheel that's 90% green and 10% red. If it comes up tails, a wheel that's 10% green and 90% red.
- I won't show you the wheel or the coin (at this point) but I'll tell you which color came up.
- If it's green, you can bet on the coinflip: win $3 for heads and lose $13 for tails.

If the color is green, do you take the bet?

EDIT: After playing with the numbers, I think the reason it's not equivalent is that in the rooms, there are always some of you who see green. I still think it's possible to create an equivalent situation in real life, without copying people. Maybe if you had a group of people draw lots, and all the people who got green vote on whether to bet on which lot they were drawing from.

## comment by SforSingularity · 2009-09-08T20:47:39.378Z · LW(p) · GW(p)

Perhaps we should look at Drescher's Cartesian Camcorder as a way of reducing consciousness, and thereby eliminate this paradox.

Or, to turn it around, this paradox is a litmus test for theories of consciousness.

## comment by Sideways · 2009-09-08T21:36:24.815Z · LW(p) · GW(p)

The more I think about this, the more I suspect that the problem lies in the distinction between quantum and logical coin-flips.

Suppose this experiment is carried out with a quantum coin-flip. Then, under many-worlds, both outcomes are realized in different branches. There are 40 future selves--2 red and 18 green in one world, 18 red and 2 green in the other world--and your duty is clear:

(50% × ((18 × +$1) + (2 × −$3))) + (50% × ((18 × −$3) + (2 × +$1))) = −$20.

Don't take the bet.

So why Eliezer's insistence on using a logical coin-flip? Because, I suspect, it prevents many-worlds from being relevant. Logical coin-flips don't create possible worlds the way quantum coin-flips do.

But what is a logical coin-flip, anyway?

Using the example given at the top of this post, an agent that was not only rational but *clever* would sit down and calculate the 256th binary digit of pi before answering. Picking a more difficult logical coin-flip just makes the calculation more difficult; a more intelligent agent could solve it, even if you can't.

So there are two different kinds of logical coin-flips: the sort that are indistinguishable from quantum coin-flips even in principle, in which case they ought to cause the same sort of branching events under many-worlds--and the sort that are solvable, but only by someone smarter than you.

If you're not smart enough to solve the logical coin-flip, you may as well treat it as a quantum coin-flip, because it's already been established that you can't possibly do *better*. That doesn't mean your decision algorithm is flawed; just that if you were more powerful, it would be more powerful too.

## comment by James_Miller · 2009-09-08T21:30:29.543Z · LW(p) · GW(p)

Is there any version of this post that doesn't involve technologies we don't have? If not, then might the resolution to this paradox be that the copying technology assumed to exist can't exist, because if it did it would give rise to a logical inconsistency?

Replies from: Johnicholas

## ↑ comment by Johnicholas · 2009-09-08T23:38:28.684Z · LW(p) · GW(p)

Cute.

You may be able to translate into the language of "wake, query, induce amnesia" - many copies would correspond to many wakings.

Replies from: DanArmak, DanArmak, James_Miller

## ↑ comment by DanArmak · 2009-09-08T23:51:38.737Z · LW(p) · GW(p)

No, the dilemma depends on having many copies. You're trying to optimize the outcome averaged over all copies (before the copies are made), because you don't know which copy "you" will "be".

In the no-copies / amnesia version, the updateless approach is clearly correct. You have no data to update on - awakening in a green room tells you nothing about the coin tosses because either way you'd wake up in a green room at least once (and you forget about it, so you don't know how many times it happened). Therefore you will always refuse to play.

## ↑ comment by James_Miller · 2009-09-09T00:06:22.715Z · LW(p) · GW(p)

But we don't have the type of amnesia drugs required to manifest the Sleeping Beauty problem, and perhaps there is something about consciousness that would prevent them from ever being created. (Isn't there some law of physics that precludes the total destruction of information?)

Replies from: Johnicholas, timtyler

## ↑ comment by Johnicholas · 2009-09-09T02:25:16.180Z · LW(p) · GW(p)

I don't understand - what type of amnesia drug is required? For example, this lab apparently routinely does experiments that induce temporary amnesia using a drug called midazolam. In general, I was under the impression that a wide variety of drugs have side effects of various degrees and kinds of amnesia, including both anterograde and retrograde.

Your proposal that consciousness might be conserved, and moreover that this might be proved by armchair reasoning, seems a bit far-fetched. Are you:

- just speculating idly?
- seriously pursuing this hypothesis as the best avenue towards resolving EY's puzzle?
- pursuing some crypto-religious (i.e. "consciousness conserved"=>"eternal life") agenda?

## ↑ comment by James_Miller · 2009-09-09T04:37:43.468Z · LW(p) · GW(p)

My first comment was (2); the second, (1).

If DanArmak's comment is correct, then it isn't important for my original comment whether amnesia drugs exist.

If your post is correct then my second comment is incorrect.

## comment by DanArmak · 2009-09-08T22:33:24.890Z · LW(p) · GW(p)

Edit: presumably there's an answer already discussed that I'm not aware of, probably common to all games where Omega creates N copies of you. (Since so many of them have been discussed here.) Can someone please point me to it?

I'm having difficulties ignoring the inherent value of having N copies of you created. The scenario assumes that the copies go on existing after the game, and that they each have the same amount of utilons as the original (instead of a division of some kind).

For suppose the copies are short lived: Omega destroys them after the game. (Human-like agents will be deterred by the negative utility of having N-1 copies created just to experience death.) Then every copy effectively decides for itself, because its siblings won't get to keep their utilons for long, and the strategy "play game iff room is green" is valid.

Now suppose the copies are long lived. It's very likely that creating them has significant utility value.

For goals on which the N copies can cooperate (e.g. building paperclips or acquiring knowledge), the total resources available (and so utility) will have increased, often linearly (N times), sometimes a lot more. An AI might decide to pool all resources / computing power and destroy N-1 copies immediately after the game is played.

For goals on which the copies compete (e.g. property and identity), utility will be much reduced by increased competition.

In the absence of any common or contested goals, all copies will probably profit from trade and specialization.

The utility outcome of having N copies created probably far outweighs the game stakes, and certainly can't be ignored.

Replies from: orthonormal

## ↑ comment by orthonormal · 2009-09-15T06:46:51.982Z · LW(p) · GW(p)

Um, you get copied N times *regardless of your choice*, so the utility of being copied shouldn't factor into your choice. I'm afraid I don't understand your objection.