# Counterfactual Mugging v. Subjective Probability

post by MBlume · 2009-07-20T16:31:55.512Z · score: 1 (8 votes) · LW · GW · Legacy · 32 comments

This has been in my drafts folder for ages, but in light of Eliezer's post yesterday, I thought I'd see if I could get some comment on it:

A couple weeks ago, Vladimir Nesov stirred up the biggest hornet's nest I've ever seen on LW by introducing us to the Counterfactual Mugging scenario.

If you didn't read it the first time, please do -- I don't plan to attempt to summarize. Further, if you don't think you would give Omega the $100 in that situation, I'm afraid this article will mean next to nothing to you.

So, those still reading, you would give Omega the $100. You would do so because if someone told you about the problem now, you could do the expected utility calculation 0.5*U(-$100)+0.5*U(+$10000)>0. Ah, but where did the 0.5s come from in your calculation? Well, Omega told you he flipped a fair coin. Until he did, there existed a 0.5 probability of either outcome. Thus, for you, hearing about the problem, there is a 0.5 probability of your encountering the problem as stated, and a 0.5 probability of your encountering the corresponding situation, in which Omega either hands you $10000 or doesn't, based on his prediction. This is all very fine and rational.
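As a rough illustration, the calculation above can be written out in a few lines. This is only a sketch under stated assumptions: utility is taken to be linear in dollars (the post writes only U and does not say this), Omega's prediction is taken to be perfectly accurate, and the name `policy_expected_utility` is hypothetical.

```python
def policy_expected_utility(pays_when_asked, p_reward_branch=0.5,
                            reward=10_000, cost=100,
                            utility=lambda dollars: dollars):
    """Expected utility of a fixed policy, evaluated before the coin flip.

    Assumption: utility defaults to being linear in dollars.
    """
    # Branch where Omega asks you for `cost` (the problem as stated).
    asked_branch = utility(-cost) if pays_when_asked else utility(0)
    # Branch where Omega pays `reward` only if it predicts you would have paid.
    reward_branch = utility(reward) if pays_when_asked else utility(0)
    return ((1 - p_reward_branch) * asked_branch
            + p_reward_branch * reward_branch)


print(policy_expected_utility(True))   # 4950.0
print(policy_expected_utility(False))  # 0.0
```

Under these assumptions the policy of paying comes out ahead, which is the sense in which the inequality in the text holds.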

So, new problem. Let's leave money out of it, and assume Omega hands you 1000 utilons in one case, and asks for them in the other -- exactly equal utility. What if there is an urn, and it contains either a red or a blue marble, and Omega looks, maybe gives you the utility if the marble is red, and asks for it if the marble is blue? What if you have devoted considerable time to determining whether the marble is red or blue, and your subjective probability has fluctuated over the course of your life? What if, unbeknownst to you, a rationalist community has been tracking evidence of the marble's color (including your own probability estimates), and running a prediction market, and Omega now shows you a plot of the prices over the past few years?

In short, what information do you use to calculate the probability you plug into the EU calculation?
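To see why the choice matters, here is a minimal sketch of how the verdict flips with the probability plugged in. With symmetric stakes of 1000 utilons, the policy of paying when asked is worth P(red)*1000 - P(blue)*1000, so it is favorable only when P(red) exceeds 0.5. The candidate probabilities below are hypothetical placeholders, not numbers from the scenario, and `pay_policy_value` is an illustrative name.

```python
def pay_policy_value(p_red, stake=1000):
    """Expected utilons of the 'pay when asked' policy, given P(red)."""
    return p_red * stake - (1 - p_red) * stake


# Hypothetical candidate probabilities; none of these come from the post.
candidates = {
    "your estimate from years ago": 0.70,
    "your estimate just before Omega spoke": 0.40,
    "latest prediction-market price": 0.55,
    "after Omega says the marble is blue": 0.0,
}

for label, p_red in candidates.items():
    value = pay_policy_value(p_red)
    verdict = "pay" if value > 0 else "refuse"
    print(f"{label}: P(red)={p_red:.2f}, value={value:+.0f}, {verdict}")
```

The arithmetic is trivial; the open question is which row, if any, is the right input.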

## 32 comments

Comments sorted by top scores.

Further, if you don't think you would give Omega the $100 in that situation, I'm afraid this article will mean next to nothing to you.

Surely you mean something more like, "...if you don't understand the reasoning by which one would be inclined to give Omega the $100..."

I've seen this sort of thoroughly unnecessary divisiveness in several LW posts, and it really puzzles me. Have we really fallen so far into confirmation bias and ingroup/outgrouping that we're actually telling those who disagree to stay away?

No offence, but I'm getting worried about how you and a few other people keep trying to force ingroup/outgroup concerns on the rest of us. It's unnecessary and it sows dissension; you really ought not to be doing this.

None taken, and the ingroup thing was mostly a joke. I'm just genuinely puzzled as to why people write things like "The rest of this article is for Newcomb one-boxers only" when something like "The rest of this article concerns the subtleties of one-boxing, so if you don't care about that feel free to move on" would seem to be more accurate (and incidentally less inflammatory).

I'm just genuinely puzzled as to why people write things like "The rest of this article is for Newcomb one-boxers only" when something like "The rest of this article concerns the subtleties of one-boxing, so if you don't care about that feel free to move on" would seem to be more accurate (and incidentally less inflammatory).

It's simple practicality. We don't want boring rehashes of old arguments that already have their own place. Even if I have a general disagreement with some premises, I take such a disclaimer and simply accept certain assumptions for the sake of the conversation.

Something along the lines of your suggested alternative may make this intent (at least, what I choose to act as if is the intent) more explicit.

Thus, for you, hearing about the problem, there is a 0.5 probability of your encountering the problem as stated, and a 0.5 probability of your encountering the corresponding situation, in which Omega either hands you $10000 or doesn't, based on his prediction. This is all very fine and rational.

It seems like I want to decide "as if" I don't know whether the coin came up heads or tails, and then implement that decision even if I know the coin came up heads. But I don't have a good formal way of talking about how my decision in one state of knowledge has to be determined by the decision I would make if I occupied a different epistemic state, conditioning using the probability previously possessed by events I have since learned the outcome of... Again, it's easy to talk informally about why you have to reply "Yes" in this case, but that's not the same as being able to exhibit a general algorithm.

Your post seems more appropriate as a comment to Eliezer's post. Your example with the fluctuating probabilities just shows that you didn't arrive at your "fine and rational" solution by computing with a generalized decision theory. You just guess-and-checked the two possible decisions to find the reflectively consistent one.
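The "guess-and-check" procedure described here can be sketched as follows: enumerate the two policies, score each from the epistemic state before the coin is known, and compare with scoring from the state after learning the coin went against you. This is an illustration only, assuming utility linear in dollars and a perfect predictor; the function names are hypothetical, and it is not a general decision theory.

```python
P_ASK = 0.5                 # chance of the branch where Omega asks for money
COST, REWARD = 100, 10_000  # dollar amounts from the original scenario


def ex_ante_value(pays):
    """Value of a policy judged before the coin's outcome is known."""
    ask_branch = -COST if pays else 0
    reward_branch = REWARD if pays else 0  # Omega rewards only predicted payers
    return P_ASK * ask_branch + (1 - P_ASK) * reward_branch


def ex_post_value(pays):
    """Value judged after learning that you are in the asking branch."""
    return -COST if pays else 0


policies = {"pay": True, "refuse": False}
best_ex_ante = max(policies, key=lambda name: ex_ante_value(policies[name]))
best_ex_post = max(policies, key=lambda name: ex_post_value(policies[name]))
print(best_ex_ante, best_ex_post)  # pay refuse
```

The two evaluations disagree, and nothing in the sketch says which prior to use once the probability is no longer a clean 0.5.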

So Eliezer has asked: What mathematical formalism should a rational agent use to represent decision problems that crop up in its environment?

A causal decision theorist would tell you that the agent can use a Markov decision process. But in counterfactual-mugging-like situations, an MDP doesn't define a quantity that a reflectively self-consistent agent would maximize.

The challenge is to present a formalism in which to represent decision problems that might include some level of "decision-dependent counterfactual outcomes", and define what quantity is to be maximized for each formalized problem-instance.

Given the lack of archive navigation on LW (or even with it), could you provide a hyperlink to "Eliezer's post yesterday" for the convenience of future browsers?

I don't see where Omega the mugger plays a central role in this question. Aren't you just asking how one would guess whether a marble in an urn is red or blue, given the sources of information you describe in the last paragraph? (Your own long-term study, a suddenly-discovered predictions market.)

Isn't the answer the usual: do the best you can with all the information you have available?

No. Usually, by probability of event X you mean "probability of X given the facts that create this situation where I'm estimating the probability". In this case, you are asking about the probability of the coin landing on one of its sides given that it was thrown at all, not given that you are seeking the answer. This is an utterly alien question about probability estimation, because the point of view imposed on you doesn't correspond to where you actually are, as it always has before.

Thanks for the answer, but I am afraid I am more confused than before. In the part of the post which begins, "So, new problem...", the coin is gone, and instead Omega will decide what to do based on whether an urn contains a red or blue marble, about which you have certain information. There is no coin. Can you restate your explanation in terms of the urn and marble?

I'm not sure what exactly confuses you. Coin, urn, what does it matter? See the original post for the context in which the coin is used. Consider it a rigged coin, whose probability of landing on each side was the topic of the debate MBlume talks about.

Let me try restating the scenario more explicitly, see if I understand that part.

Omega comes to you and says, "There is an urn with a red or blue ball in it. I decided that if the ball were blue, I would come to you and ask you to give me 1000 utilons. Of course, you don't have to agree. I also decided that if the ball were red, I would come to you and give you 1000 utilons - but only if I predicted that if I asked you to give me the utilons in the blue-ball case, you would have agreed. If I predicted that you would not have agreed to pay in the blue-ball case, then I would not pay you in the red-ball case. Now, as it happens, I looked at the ball and found it blue. Will you pay me 1000 utilons?"

The difference from the usual case is that instead of a coin flip determining which question Omega asks, we have the ball in the urn. I am still confused about the significance of this change.

Is it that the coin flip is a random process, but that the ball may have gotten into the urn by some deterministic method?

Is it that the coin flip is done just before Omega asks the question, while the ball has been sitting in the urn, unchanged, for a long time?

Is it that we have partial information about the urn state, therefore the odds will not be 50-50, but potentially something else?

Is it the presence of a prediction market that gives us more information about what the state of the urn is?

Is it that our previous estimates, and those of the prediction market, have varied over time, rather than being relatively constant? (Are we supposed to give some credence to old views which have been superseded by newer information?)

Another difference is that in the original problem, the positive payoff was much larger than the negative one, while in this case, they are equal. Is that significant?

And once again, if this were not an Omega question, but just some random person offering a deal whose outcome depended on a coin flip vs a coin in an urn, why don't the same considerations arise?

Is it that the coin flip is a random process, but that the ball may have gotten into the urn by some deterministic method?

Randomness is uncertainty, and determinism doesn't absolve you of uncertainty. If you find yourself wondering what exactly was that deterministic process that fits your incomplete knowledge, it is a thought about randomness. A coin flip is as random as a pre-placed ball in an urn, both in deterministic and stochastic worlds, so long as you don't know what the outcome is, based on the given state of knowledge.

Is it that we have partial information about the urn state, therefore the odds will not be 50-50, but potentially something else?

The tricky part is what this "partial information" is, as, for example, looking at the urn after Omega reveals the actual color of the ball doesn't count.

Another difference is that in the original problem, the positive payoff was much larger than the negative one, while in this case, they are equal. Is that significant?

In the original problem, the payoffs differ so much in order to counteract the lack of identity between amount of money and utility, so that the bet does look better than nothing. For example, even though $100*0.5-$100*0.5=0, that doesn't guarantee that U($100)*0.5+U(-$100)*0.5≥0. In this post, the values in utilons are substituted directly to place the 50/50 bet exactly at neutral.
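A small numerical sketch of that point, assuming a logarithmic utility of wealth and a starting wealth of $1000 (both are illustrative assumptions, not part of the original problem):

```python
import math


def log_utility_change(delta, wealth=1_000):
    """Change in log-utility from gaining `delta` dollars at a given wealth."""
    return math.log(wealth + delta) - math.log(wealth)


# An even-money 50/50 bet: zero expected dollars, negative expected utility.
even_dollars = 0.5 * 100 + 0.5 * (-100)
even_utility = 0.5 * log_utility_change(100) + 0.5 * log_utility_change(-100)

# The original lopsided bet stays favorable under the same concave utility.
original_utility = (0.5 * log_utility_change(10_000)
                    + 0.5 * log_utility_change(-100))

print(even_dollars)          # 0.0
print(even_utility < 0)      # True
print(original_utility > 0)  # True
```

With a concave utility the even-money bet is a loss in expectation, which is why the original scenario uses a large payoff and this post switches to utilons.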

And once again, if this were not an Omega question, but just some random person offering a deal whose outcome depended on a coin flip vs a coin in an urn, why don't the same considerations arise?

They could, you'd just need to compute that tricky answer that is the topic of this post, to close the deal. This question actually appears in legal practice, see hindsight bias.

Groan! Of all the Omega crap, this is the craziest. Can anyone explain to me, why should anyone ever contemplate this impossible scenario? Don't *just* vote down.

If you do not test a principle in wacky hypothetical situations that will never happen, then you run the risk of going by pure intuition by another name. Many people are not comfortable with that.

But they will never happen! That's like... like

void f(unsigned int i) { if ( i < 0) throw "Invalid argument."; }

!

What principles are being tested here?

Well, that can test whether your compiler / language actually does anything when you declare i an unsigned int. Yes, there are some that will happily accept 'unsigned' and throw it away.

Perhaps I could explain in a more helpful manner if I could understand your oddly-punctuated remark there.

An unsigned integer can't have a minus sign, so it can't be less than 0. Programmer talk.

I'm comparing contemplating impossible scenarios to computer code that will never be executed because its execution depends on a condition that will never be true. Such code does nothing but take up time to write and storage space.

Okay...

Say you want to test principle X (a principle of ethics or rationality or whatever you like) and see if it gets a good answer in *every* case. You have some choices: you can try to test *every case*; you can use the principle for a couple of weeks and see if it encourages you to leap off of anything tall or open fire on a daycare; you can come up with a couple dozen likely situations that might call for a principle like the one you have in mind and see if it does all right; or you can do your absolute best to *destroy* that principle and find a situation somewhere, somehow, where it flounders and dies and tells you that what you really should do is wear a colander on your head and twirl down the street singing Gilbert and Sullivan transposed into the Mixolydian.

Weird situations like those in which Omega is invoked are attempts at the last, which is usually the strategy quickest to turn up a problem with a given principle (even if the counterexample is actually trivial). The "attempt to destroy" method is effective because it causes you to concentrate on the weak points of the principle itself, instead of being distracted by other confounding factors and conveniences.

I get what you're saying.

What principle is being tested here right now?

The various Newcomb situations have fairly direct analogues in everyday things like ultimatum situations, or promise keeping. They alter it to reduce the number of variables, so the "certainty of trusting other party" dial gets turned up to 100% for Omega, "expectation of repeat" to 0 etc, in order to evaluate how to think of such problems when we cut out certain factors.

That said, I'm not actually sure what this question has to do with Newcomb's paradox / counterfactual mugging, or what exactly is interesting about it. If it's just asking "what information do you use to calculate the probability you plug into the EU calculation?" and Newcomb's paradox is just being used as one particular example of it, I'd say that the obvious answer is "the probability you believe it is now." After all, that's going to already be informed by your past estimates, and any information you have available (such as that community of rationalists and their estimates). If the question is something specific to Newcomb's paradox, I'm not getting it.

I think this one is actually doing it backwards - "here are some wacky situations, someone come up with a principle that works here".

"This is the wacky situation that breaks the best current candidate method. Can you fix it?"

True, but note as a caveat the problems many ethicists have in recent years brought up involving thought experiments.

For example, if our concepts are fuzzy, we should expect our rules about the concepts to output fuzzy answers. Testing boundary cases might in that case not be helpful, as the distinctions between concepts might fall apart.

A subproblem of Friendly AI, or at least a similar problem, is the challenge of proving that properties of an algorithm are stable under self-modification. If we don't identify a provably optimal algorithm for maximizing expected utility in decision-dependent counterfactuals, it's hard to predict how the AI will decide to modify its decision procedure, and it's harder to prove invariants about it.

Also, if someone else builds a rival AI, you don't want it to be able to trick your AI into deciding to self-destruct by setting up a clever Omega-like situation.

If we can predict how an AI would modify itself, why don't we just write an already modified AI?

Because the point of a self-modifying AI is that it will be able to self-modify in situations we don't anticipate. Being able to predict its self-modification in principle is useful precisely because we can't hard-code every special case.

You forgot to link to the Counterfactual Mugging post.