# Normative uncertainty in Newcomb's problem

post by CarlShulman · 2013-06-16T02:16:44.853Z · score: 6 (8 votes) · LW · GW · Legacy · 32 comments

Here is Wikipedia's description of Newcomb's problem:

> The player of the game is presented with two boxes, one transparent (labeled A) and the other opaque (labeled B). The player is permitted to take the contents of both boxes, or just the opaque box B. Box A contains a visible $1,000. The contents of box B, however, are determined as follows: At some point before the start of the game, the Predictor makes a prediction as to whether the player of the game will take just box B, or both boxes. If the Predictor predicts that both boxes will be taken, then box B will contain nothing. If the Predictor predicts that only box B will be taken, then box B will contain $1,000,000.
>
> Nozick also stipulates that if the Predictor predicts that the player will choose randomly, then box B will contain nothing.
>
> By the time the game begins, and the player is called upon to choose which boxes to take, the prediction has already been made, and the contents of box B have already been determined. That is, box B contains either $0 or $1,000,000 before the game begins, and once the game begins even the Predictor is powerless to change the contents of the boxes. Before the game begins, the player is aware of all the rules of the game, including the two possible contents of box B, the fact that its contents are based on the Predictor's prediction, and knowledge of the Predictor's infallibility. The only information withheld from the player is what prediction the Predictor made, and thus what the contents of box B are.

Most of this is a fairly general thought experiment for thinking about different decision theories, but one element stands out as particularly arbitrary: the ratio between the amount the Predictor may place in box B and the amount in box A. In the Newcomb formulation conveyed by Nozick, this ratio is 1000:1, but this is not necessary. Most decision theories that recommend one-boxing do so as long as the ratio is greater than 1.

The 1000:1 ratio strengthens the intuition for one-boxing, which is helpful for illustrating why one might find one-boxing plausible. However, given uncertainty about normative decision theory, the decision to one-box can diverge from one's best guess at the best decision theory, e.g. if I think there is a 1 in 10 chance that one-boxing decision theories are correct, I may one-box on Newcomb's problem with a potential payoff ratio of 1000:1 but not if the ratio is only 2:1.
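One way to make the 1-in-10 example concrete is a simple expected-value comparison across theories. The sketch below is only an illustration under assumptions that are not in the post: payoffs are measured in units of box A, a one-boxing theory is taken to score one-boxing as a gain of `ratio - 1` units, and a two-boxing theory is taken to score one-boxing as a loss of exactly 1 unit. The function names are hypothetical.

```python
def one_box_threshold(credence_one_boxing: float) -> float:
    """Payoff ratio B:A above which one-boxing has the higher expected value.

    Assumption (not from the post): with credence c in one-boxing theories,
    one-boxing gains c * (ratio - 1) and risks (1 - c) * 1, both in units of
    box A. Setting the two equal gives ratio = 1 / c.
    """
    if not 0 < credence_one_boxing <= 1:
        raise ValueError("credence must be in (0, 1]")
    return 1 / credence_one_boxing


def should_one_box(credence_one_boxing: float, ratio: float) -> bool:
    """Compare the expected gain and expected loss from one-boxing."""
    expected_gain = credence_one_boxing * (ratio - 1)
    expected_loss = (1 - credence_one_boxing) * 1
    return expected_gain > expected_loss


if __name__ == "__main__":
    credence = 0.1  # the "1 in 10 chance" from the example
    print(one_box_threshold(credence))
    print(should_one_box(credence, 1000))
    print(should_one_box(credence, 2))
```

Under these assumptions a 10% credence puts the threshold at a ratio of about 10:1, so 1000:1 clears it and 2:1 does not. Other ways of aggregating across decision theories would give different thresholds.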

So the question, "would you one-box on Newcomb's problem, given your current state of uncertainty?" is not quite the same as "would the best decision theory recommend one-boxing?" This occurred to me in the context of this distribution of answers among target philosophy faculty from the PhilPapers Survey:

### Newcomb's problem: one box or two boxes?

| Answer | Responses |
| --- | --- |
| Accept: two boxes | 13 / 31 (41.9%) |
| Accept: one box | 7 / 31 (22.6%) |
| Lean toward: two boxes | 6 / 31 (19.4%) |
| Agnostic/undecided | 2 / 31 (6.5%) |
| Other | 2 / 31 (6.5%) |
| Lean toward: one box | 1 / 31 (3.2%) |

If all of these answers are about the correct decision theory (rather than what to do in the actual scenario), then two-boxing is the clear leader, with a roughly 2.4:1 ratio of support (19 to 8, accept or lean) in its favor, but this skew would seem far short of that needed to justify 1000:1 confidence in two-boxing on Newcomb's Problem.
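For reference, the accept-or-lean totals can be tallied directly from the table. This is just a small sketch; the counts are copied from the table above, and the variable names are illustrative.

```python
# Counts copied from the PhilPapers table above (31 respondents).
responses = {
    "Accept: two boxes": 13,
    "Accept: one box": 7,
    "Lean toward: two boxes": 6,
    "Agnostic/undecided": 2,
    "Other": 2,
    "Lean toward: one box": 1,
}

# Pool "accept" and "lean toward" answers for each side.
two_box = sum(n for answer, n in responses.items() if answer.endswith("two boxes"))
one_box = sum(n for answer, n in responses.items() if answer.endswith("one box"))
support_ratio = two_box / one_box

if __name__ == "__main__":
    print(f"two-box: {two_box}, one-box: {one_box}, ratio: {support_ratio:.3f}:1")
```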

Here are Less Wrong survey answers for 2012:

**NEWCOMB'S PROBLEM**

One-box: 726, 61.4%

Two-box: 78, 6.6%

Not sure: 53, 4.5%

Don't understand: 86, 7.3%

No answer: 240, 20.3%

Here one-boxing is overwhelmingly dominant. I'd like to sort out how much of this is disagreement about theory, and how much reflects the extreme payoffs in the standard Newcomb formulation. So, I'll be putting a poll in the comments below.

## 32 comments

Comments sorted by top scores.

Carl,

You are completely right that there is a somewhat illicit factor-of-1000 intuition pump in a certain direction in the normal problem specification, which makes it a bit one-sided. Will MacAskill and I had half-written a paper on this and related points regarding decision-theoretic uncertainty and Newcomb's problem before discovering that Nozick had already considered it (even if very few people have read or remembered his commentary on this).

We did still work out though that you can use this idea to create compound problems where for any reasonable distribution of credences in the types of decision theory, you should one-box on one of them and two-box on the other: something that all the (first order) decision theories agree is wrong. So much the worse for them, we think. I've stopped looking into this, but I think Will has a draft paper where he talks about this alongside some other issues.

Thanks, I'll ask him for a copy.

What is the lowest payoff ratio below at which you would one-box on Newcomb's problem, given your current subjective beliefs? [Or answer "none" if you would never one-box.]


Do these options keep any of the absolute payoffs constant, like box A always containing $1,000 and the contents of B varying according to the selected ratio? If not, the varying marginal utility of money makes this difficult to answer - I'm much more likely to risk a sure $1,000 for $1,000,000 than I am to risk a sure $1,000,000 for $1,000,000,000.

Assume all payoffs are in utilons, not dollars.

Keep box A constant at $1,000.

Curious. A majority is more confident in their one-boxing than I am.

Even more curious are the 8% who one box at 1:1. Why? (Oh, '8%' means 'one person'. That is somewhat less curious.)

There are now 5 people one boxing at 1:1. We rationalists may not believe in god but apparently we believe in Omega, may prosperity be upon his name.

Perhaps the reasoning is that it is good to be the type of agent that one-boxes, as that will lead to good results on most variations of the problem. So having an absolute rule to always one-box can be an advantage, as it is easier to predict that you will one-box than someone who has a complicated calculation to figure out whether it's worthwhile.

Of course, that only makes a difference if Omega is not perfectly omniscient, but only extremely smart and ultimately fallible. Still, because "in the real world" you are not going to ever meet a perfectly omniscient being, only (perhaps) an extremely smart one, I think one could make a reasonable argument for the position that you should try to be a type of agent that is very easy to predict will one-box.

You might as well precommit to one-box at 1:1 odds anyway. If Omega has ever been observed to make an error, it's to your advantage to be extremely easy to model in case the problem ever comes up again. On the other hand, if Omega is truly omniscient... well, you aren't getting more than $1,000 anyway, and Omega knows where to put it.

If there is visibly $1,000 in box A and there's a probability 0 < p < 1 that Omega has predicted correctly, then EU(two-boxing) > EU(one-boxing), unless one is particularly incompetent at opening boxes labelled "A". Even if Omega is omniscient, I'm not, so I can never have p=1.

If anyone would one-box at 1:1 odds, would they also one-box at 1:1.01 odds (taking $990 over $1000 by two-boxing) in the hope that Omega would offer better odds in the future and predict them better?

I wouldn't one-box at 1:1.01 odds; the rule I was working off was: "Precommit to one-boxing when box B is stated to contain at least as much money as box A," and I was about to launch into this big justification on how even if Omega was observed to have 99+% accuracy, rather than being a perfect predictor, it'll fail at predicting a complicated theory before it fails at predicting a simple one...

...and that's when I realized that "Precommit to one-boxing when box B is stated to contain **more** money than box A," is just as simple a rule that lets me two-box at 1:1 and one-box when it will earn me more.

TL;DR - your point is well taken.

I took that option to mean "one-box all the way down to 1:1, even if it's 1:1.00001." If it were actually exactly 1:1, I would be indifferent between one- and two-boxing.

If you're completely confident in one-boxing, then a 1:1 ratio implies that you should be indifferent between one- and two-boxing. If you interpret the original wording as "at what ratio would you be willing to one-box" (instead of "at what ratio would you *always* insist on one-boxing"), then it makes sense to pick 1:1, since there'd be no reason not to one-box, though also no reason not to two-box.

> If you're completely confident in one-boxing, then a 1:1 ratio implies that you should be indifferent between one- and two-boxing. If you interpret the original wording as "at what ratio would you be willing to one-box" (instead of "at what ratio would you always insist on one-boxing"), then it makes sense to pick 1:1, since there'd be no reason not to one-box, though also no reason not to two-box.

I had expected the group of people who are confident in one-boxing to also be likely to not be perfectly confident. All correct answers will be some form of ">1". "=1" is an error (assuming they are actually answering the Normative Uncertainty Newcomb's Problem as asked).

I didn't intend "perfectly confident" to imply people literally assigning a probability of 1. It is enough for them to assign a high enough probability that it rounds closer to 1:1 than 1.01:1.

> I didn't intend "perfectly confident" to imply people literally assigning a probability of 1. It is enough for them to assign a high enough probability that it rounds closer to 1:1 than 1.01:1.

That isn't enough. Neither the actual behaviour of rational agents nor those following the instructions Carl gave for the survey (quoted below) would ever choose the bad deal due to rounding error. If people went about one boxing at 0.999:1 I hope you would agree that there is a problem.

> What is the lowest payoff ratio below at which you would one-box on Newcomb's problem, given your current subjective beliefs? [Or answer "none" if you would never one-box.]

The payoffs listed are monetary, and box A only has $1000. Non-monetary consequences can be highly significant in comparison. There is value in sticking one's neck out to prove a point.

> The payoffs listed are monetary, and box A only has $1000.

This isn't even specified. Carl mentioned that both boxes were to be altered but didn't bother specifying the specifics since it is the ratio that is important for the purpose of the problem.

> Non-monetary consequences can be highly significant in comparison.

They also fall under fighting the hypothetical.

> There is value in sticking one's neck out to prove a point.

It is troubling if "One box! Cooperate!" is such an applause light that people choose it to 'prove a point' even when the reason for it to be a good idea is removed. "One Box!" is the right answer in Newcomb's Problem and the wrong answer in Normative Uncertainty Newcomb's Problem (1:1). If there is still value to 'proving that point' then something is broken.

Applause lights are one thing, fame (paradoxically, I guess) is another. If one were to imagine the scenario in an otherwise-realistic world, such a rash decision would gain a lot of news coverage. Which can be turned to useful ends, by most people's lights.

As for fighting the hypothetical, yeah guilty. But it's useful to remind ourselves that (A) money isn't utility and, more importantly, (B) while money clearly is ratio scalable, it's not uncontroversial that utility even fits an interval scale. I'm doubtful about (B), so sticking with money allows me to play along with the ratio assumption - but invites other complications.

Edited to add: in the comments Carl specified to keep box A constant at $1000.

> Applause lights are one thing, fame (paradoxically, I guess) is another. If one were to imagine the scenario in an otherwise-realistic world, such a rash decision would gain a lot of news coverage.

Your model of how to gain fame does not seem to be similar to mine.

I'm looking for the "I don't understand the question" choice. (Maybe I'm being the Village Idiot today, rather than this *actually* needing clarification... but I'd bet I'm not alone.)

Actually, the ratio alone is not sufficient, because there is a reward for two-boxing related to "verifying if Omega was right" -- if Omega is right a priori then I see no point in two-boxing above 1:1. I think the poll would be more meaningful if 1 stood for $1. ETA: actually, "verifying" or "being playful" might mean for example tossing a coin to decide.

Omega has been observed to have a less than 1% error rate, I assume.

I've been curious why all the formulations of Newcomb's I've read give Omega/Predictor an error rate at all. Is it just to preempt reasoning along the lines of "well he never makes an error that means he is a god so I one-box" or is there a more subtle, problem-relevant reason that I'm missing?

It's to forestall arguments about "impossible epistemic states". The difference between a 1% error rate and a 0% error rate is 1%, so your answer shouldn't change (regarding certainty as valuable results in getting Dutch Booked). If you don't permit an error rate then many people will refuse to answer based solely on certainty in infallibility being impossible.

Well, the quoted version being used here posits that I have "knowledge of the Predictor's infallibility" and doesn't give an error rate. So there's one counterexample, at least.

Of course, "knowledge" doesn't mean I have a confidence of exactly 1 -- Predictor may be infallible, but I'm not. If Predictor is significantly more baseline-accurate than I am, then for EV calculations the primary factor to consider is my level of confidence in the things I "know," and Predictor's exact error rate is noise by comparison.

In practice I would say that if I somehow found myself in the state where I knew Predictor was infallible the *first* thing I should do is ask myself how I came to know that, and whether I endorse my current confidence in that conclusion on reflection based on those conditions.

But I don't think any of that is terribly relevant. I mean, OK, I find myself instead in the state where I know Predictor is infallible and I remember concluding a moment earlier that I reflectively endorse my current confidence in that conclusion. To re-evaluate *again* seems insane. What do I do next?

Yes, more at Wikipedia.

I don't think you need to resolve your uncertainty with regard to decision theories to figure out the correct thing to do if you anticipate being subjected to Newcomb's problems in the future: just precommit to one-boxing on such problems, and that precommitment will (hopefully) be honored by any simulations of you that are run in the future.

Yes, if you can commit yourself, that is. Generally we are limited in our ability to do that. Of course one can modify the problem so that all predictions are done using data collected before you thought about that, and any models of you don't have your full conscious experience.

I remark that those who two-box may already be taking their uncertainty about the correct decision system into account, implying that their native certainty in two-boxing is already very great.

I get the feeling that this is only controversial at all because people hear "Omega has never made an incorrect prediction" and internalize it as "Omega has just been lucky, and this is unlikely to continue", rather than "Omega is a superintelligence that might as well be infallible, so there's no point in trying to beat it". I can see no reason to two box if Omega's predictive power has been demonstrated and unbeaten a statistically significant number of times. I could try to prove I can outsmart a superintelligence with a high probability of failure, or I could take the one million dollars and pay off my student loans and put more effort into investing the rest to make up the 1000 I lose from box A.

But to the question at hand, if we assume Omega has a success rate of 100/100 so far, and decide to give ourselves a slight advantage and say that Cthulhu can cause Omega to fail an as of yet unobserved 1% of the time, we can calculate the value of each decision, as has been shown numerous times. The value of 1boxing would be 0.99x, where x is the maximum value in box B, and 2boxing would be y+0.01x, where y is the value in box A. At x:y = 1:1, 2boxing has the advantage. The two are equal when 0.98x = y, i.e. at about 1.02:1. At anything higher than that, 1boxing wins. There's probably an elegant formulation, somewhere.
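The break-even point in this model can be checked directly. The sketch below takes the comment's own setup (one-boxing worth `accuracy * x`, two-boxing worth `y + (1 - accuracy) * x`) and solves for the ratio x:y where the two are equal; the function names are illustrative, and the general formula is a straightforward rearrangement rather than anything stated in the thread.

```python
def expected_values(x: float, y: float, accuracy: float) -> tuple[float, float]:
    """Return (EV of one-boxing, EV of two-boxing) under the comment's model.

    x is the maximum value in box B, y is the value in box A, and accuracy is
    the probability that the predictor is right.
    """
    one_box = accuracy * x
    two_box = y + (1 - accuracy) * x
    return one_box, two_box


def break_even_ratio(accuracy: float) -> float:
    """Ratio x:y at which one-boxing and two-boxing have equal expected value.

    accuracy * x = y + (1 - accuracy) * x  =>  x / y = 1 / (2 * accuracy - 1)
    """
    if not 0.5 < accuracy <= 1:
        raise ValueError("accuracy must be in (0.5, 1] for a finite break-even")
    return 1 / (2 * accuracy - 1)


if __name__ == "__main__":
    print(break_even_ratio(0.99))           # roughly 1.02
    print(expected_values(1.0, 1.0, 0.99))  # at 1:1, two-boxing is ahead
    print(expected_values(1.03, 1.0, 0.99)) # just past break-even, one-boxing is ahead
```

With a perfectly accurate predictor the formula gives a break-even of exactly 1:1, which matches the indifference at 1:1 discussed elsewhere in the thread.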

I suspect that there are people who would value beating Omega enough to screw with these numbers, though. If that is enough for the 1000:1 case to swing in favor of 2boxing, though, I'd expect that person to lose a great deal of money on various nigh-impossible challenges. It's hard to say how this would map into the real world, given the lack of observed nigh-perfect predictors like Omega.