# Strongmanning Pascal's Mugging

post by Pentashagon · 2013-02-20T12:36:38.554Z · score: 3 (7 votes) · LW · GW · Legacy · 21 commentsA mugger appears and says "For $5 I'll offer you a set of deals from which you can pick any one. Each deal, d(N), will be N bits in length and I guarantee that if you accept d(N) I will run UTM(d(N)) on my hypercomputer, where UTM() is a function implementing a Universal Turing Machine. If UTM(d(N)) halts you will increase your utility by the number of bits written to the tape by UTM(d(N)). If UTM(d(N)) does not halt, I'll just keep your $5. Which deal would you like to accept?"

The expected increase in utility of any deal is p(d(N)) * U(UTM(d(N)), where p(d(N)) is the probability of accepting d(N) and actually receiving as many utilons as the number of bits a halting UTM(d(N)) writes to its tape. A non-empty subset of UTM programs of length N will write BB(N) bits to the tape where BB(X) is the busy-beaver function for programs of bit length X. Since BB(X) >= UTM(F) for any function F of bit length X, for every finite agent there is some N for which p(UTM(d(N)) = BB(N)) * BB(N) > 0. To paraphrase: Even though the likelihood of being offered a deal that actually yields BB(N) utilons is incredibly small, the fact that BB(X) grows at least as fast as any function of length X means that, at minimum, an agent that can be emulated on a UTM by a program of M bits cannot provide a non-zero probability of d(M) such that the expected utility of accepting d(M) is negative. In practice N can probably be much less than M.

Since p("UTM(d(X)) = BB(X)") >= 2^-X for d(X) with bits selected at random it doesn't make sense for the agent to assign p(d(X))=0 unless the agent has other reasons to absolutely distrust the mugger. For instance, discounting the probability of a deal based on a function of the promised number of utilons won't work; no discounting function grows as fast as BB(X) and an agent can't compute an arbitrary UTM(d(X)) to get a probability estimate without hypercomputational abilities. Any marginal-utility calculation fails in a similar manner.

I'm not sure where to go from here. I don't think it's rational to spend the rest of my life trying to find the largest integer I can think of to acausally accept d(biggest-integer) from some Omega. So far the strongest counterargument I've been able to think of is attempting to manage the risk of accepting the mugging by attempting to buy insurance of some sort. For example, a mugger offering intractably large amounts of utility for $5 shouldn't mind offering the agent a loan for $5 (or even $10,000) if the agent can immediately pay it back with astronomical amounts of interest out of the wealth that would almost certainly become available if the mugger fulfilled the deal. In short, it doesn't make sense to exchange utility now for utility in the future *unless* the mugger will accept what is essentially a counter-mugging that yields more long term utility for the mugger at the cost of some short term disutility. The mugger should have some non-zero probability, p, for which zhe is indifferent between p*"have $10 after fulfilling the deal" and (1-p)*"have $5 now". If the mugger acts like p=0 for this lottery, why can't the agent?

## 21 comments

Comments sorted by top scores.

The scariest version of Pascal's mugging is the mugger-less one.

Very many hypotheses -- arguably infinitely many -- can be formed about how the world works. In particular, some of these hypotheses imply that by doing something counter-intuitive in following those hypothesis, you get ridiculously awesome outcomes. For example, even in advance of me posting this comment, you could form the hypothesis "if I send Kindly $5 by Paypal, he or she will refrain from torturing 3^^^3 people in the matrix and instead give them candy."

Now, usually all such hypotheses are low-probability and that decreases the expected benefit from performing these counter-intuitive actions. But how can you show that in all cases this expected benefit is sufficiently low to justify ignoring it?

Right, this is the real core of Pascal's Mugging (I was somewhat surprised that Bostrom didn't put it into his mainstream writeup). For aggregative utility functions over a model of the environment which e.g. treat all sentient beings (or all paperclips) as having equal value without diminishing marginal returns, and all epistemic models which induce simplicity-weighted explanations of sensory experience, all decisions will be dominated by tiny variances in the probability of extremely unlikely hypotheses because the "model size" of a hypothesis can grow Busy-Beaver faster than its Kolmogorov complexity. (I've deleted a nearby comment from a known troll about how opposite hypotheses ought to cancel out. Unless this is a literal added axiom - and a *false* one at that which will produce poorly calibrated probabilities - there is no reason for all the Busy-Beaver sized hypotheses to have consequences which cancel each other out *exactly* down to the Busy-Beaverth decimal place, and continue to cancel regardless of what random bits of evidence and associated updates we run into throughout life. This presumes properties of both reality and the utility function which are very unlikely.)

Note that this actually happens without allowing that the mugger has a finite probability of possessing a hypercomputer - it follows just from trying to assign *non-zero* probability to the mugger possessing any Turing machine. It should also be noted that assigning probability literally zero means you will never believe the mugger can do something *regardless of what evidence* they show you that they are Matrix Lords. Similarly if we use Hanson's anthropic solution, we will never believe the mugger can put us into a vastly special position without log(vast) amounts of evidence.

Ouch, this rules out game-theoretic solutions.

p(d(N)) * U(UTM(d(N))

It is considered a good form to summarize your main point in a reasonably plain English, even for a complicated research paper, let alone a forum post. Give it a try, you might gain some broader audience.

Moreover, have you tried to simplify your problem, distill it to its essence? Find the extra piece that makes it non-trivial?

Another standard step is looking up the existing literature and referencing it. The issues of Pascal Mugging, unbounded utility functions and largest computable integers have been discussed here recently, are none of them relevant to your problem?

It is considered a good form to summarize your main point in a reasonably plain English, even for a complicated research paper, let alone a forum post. Give it a try, you might gain some broader audience.

Is it generally better to edit an existing post or just retract it? At this point after reading the replies it's obvious that I missed some important facts and there might not be much of interest left in the post.

Another standard step is looking up the existing literature and referencing it. The issues of Pascal Mugging, unbounded utility functions and largest computable integers have been discussed here recently, are none of them relevant to your problem?

Is there a better way to find the topical posts on LW than the google site search? I tried to find all the posts about pascal's mugging but apparently missed some of the more recent ones about bounded utility. That may have just been an oversight on my part, or using the wrong search terms. I also try to follow the list of recent posts and thought I had read all the posts about it since sometime last summer.

You can certainly delete a post if you think it does not have the value you expected. Any replies will also be deleted. As for the literature, if you don't include links, how do we know what you agree or disagree with, and what you are building on?

Please, if you want to remove your post, just edit it or retract it, don't delete it. While it may or may not be up to your standards, it does hold a few pieces of useful insight that it would be a shame to just delete. And with the replies being deleted as well, you remove others' posts, something that they may feel are valuable enough that they should not be removed. I feel that a simple edit should most certainly be enough.

Isn't that steel-man, rather than strong-man?

The historical Steelman was also a strongman, at least according to Wikipedia.

The mugger should have some non-zero probability, p, for which zhe is indifferent between p

"have $10 after fulfilling the deal" and (1-p)"have $5 now".

There is no reason to suppose this, and in fact it's unlikely, since the mugger probably isn't motivated by money (since he surely has better ways of obtaining that). In the least convenient world, he's probably just curious to know how you'll answer.

The standard solution is to have a bounded utility function, and it seems like it fully solves this reformulation as well. There may also be some other solutions that work for this, but I'm sufficiently unsure about all the notation that I'm not very confident in them.

Also, just to check, have you seen the Lifespan Dilemma?

The standard solution is to have a bounded utility function, and it seems like it fully solves this reformulation as well. There may also be some other solutions that work for this, but I'm sufficiently unsure about all the notation that I'm not very confident in them.

You're right. Somehow I completely missed Pascal's Mugging for bounded utility functions

EDIT: I obviously didn't miss it, because I commented there. What I did do was *not understand it*, which is quite a bit worse.

A mugger appears and says "For $5 I'll offer you a set of deals from which you can pick any one. Each deal, d(N), will be N bits in length and I guarantee that if you accept d(N) I will run UTM(d(N)) on my hypercomputer, where UTM() is a function implementing a Universal Turing Machine. If UTM(d(N)) halts you will increase your utility by the number of bits written to the tape by UTM(d(N)). If UTM(d(N)) does not halt, I'll just keep your $5. Which deal would you like to accept?"

That's unclear. Does the mugger show you the set of deals before you accept the offer or later? If he shows you later, he can trivially win by presenting you the set { 'while (true) {printf("I Go Chop Your Dollar\n");}' }

where p(d(N)) is the probability of accepting d(N)

d(N) either halts or it doesn't. There is nothing probabilistic about it. Are you talking about the prior probability about its termination before you see d(N), or after you see it (since you can't compute its termination in the general case)?

p(UTM(d(N)) = BB(N)) * BB(N) > 0. To paraphrase: Even though the likelihood of being offered a deal that actually yields BB(N) utilons is incredibly small, the fact that BB(X) grows at least as fast as any function of length X means that, at minimum, an agent that can be emulated on a UTM by a program of M bits cannot provide a non-zero probability of d(M) such that the expected utility of accepting d(M) is negative.

To paraphrase nothing.

p(UTM(d(N)) = BB(N)) * BB(N) > 0 is trivially true because the product of two positive numbers is always positive. But this tells you nothing about the magnitude of the value. In particular, it doesn't follow thatp(UTM(d(N)) = BB(N)) * BB(N) >= U($5)

Since p("UTM(d(X)) = BB(X)") >= 2^-X for d(X) with bits selected at random it doesn't make sense for the agent to assign p(d(X))=0 unless the agent has other reasons to absolutely distrust the mugger.

I wonder what that reasons to distrust the mugger may be... Perhaps that he wants to win your $5 rather than giving away utility?

That's unclear. Does the mugger show you the set of deals before you accept the offer or later? If he shows you later, he can trivially win by presenting you the set { 'while (true) {printf("I Go Chop Your Dollar\n");}' }

I think that even if the mugger doesn't show you the set of deals before you accept there is still a non-zero probability that some of the deals have BB(N) utility unless the agent has p(mugger_is_perfect) = 1. I agree that I should have specified that in the description.

d(N) either halts or it doesn't. There is nothing probabilistic about it. Are you talking about the prior probability about its termination before you see d(N), or after you see it (since you can't compute its termination in the general case)?

The idea was to talk about the agent's total estimation of the probability that a particular deal yields BB(N) utils. Before seeing it the agent would only have a prior probability based perhaps on Solomonoff Induction, and after seeing it the agent could just run it for a while to make sure it didn't halt quickly. Either way, the idea was that any probability estimate would be dwarfed by BB(N).

p(UTM(d(N)) = BB(N))

BB(N) > 0 is trivially true because the product of two positive numbers is always positive. But this tells you nothing about the magnitude of the value. In particular, it doesn't follow that p(UTM(d(N)) = BB(N))BB(N) >= U($5)

You are correct.

This steelman argument doesn't address the main issue with Pascal's mugging - anyone in real life who makes you such an offer is virtually certain to be lying or deluded.

This was my argument when I first encountered the problem in the Sequences. I didn't post it here because I haven't yet figured out what this post is about (gotta sit down and concentrate on the notation and the message of the author and I haven't done that yet), but my first thoughts when I read Eliezer claiming that it's a hard problem were that as the number of potential victims increases, the chance of the claim being actually true decreases (until it reaches a hard limit which equals the chance of the claimant having a machine that can produce infinite victims without consuming any resources). And the decrease in chance isn't just due to the improbability of a random person having a million torture victims - it also comes from the condition that a random person with a million torture victims also for some reason wants $5 from you.

Where is the flaw here? What makes the mugging important, despite how correct my gut reaction appears to me?

**[deleted]**· 2013-02-20T21:23:54.600Z · score: 0 (0 votes) · LW · GW

Question: Is Pascal's Mugging Isomorphic to the Two General's Problem, or am I confused?

I tried to start making a comparison between them, since each additional Messenger should grant you Utility, with Busy Beaver levels of Utility being Busy Beaver levels of Messengers, but the conclusion I came to is even if you trust the other person 100%, can never actually be safe on the on the attack/mugging unless the other person says "I will attack/pay with Certainty, regardless of any replies you send." and then does.

At that point, the worst thing that can happen to you is that the other general gets slaughtered because you didn't hear or chose not to trust the other general. The best possible result for you is still for them to be the one that takes the risk of choosing to move first with certainty so either way you get as good a result as you can.

The Pascal's Mugging equivalent of this would seem to be for the mugger to appear and say "I am going to take a chance such that the first thing I do is that I give you a small, small chance of Fabulously large utility, and I'm going to do that regardless of whether you pay me or not. But after I DO that... I really need you to send me 5 dollars."

But that doesn't seem like a mugging anymore!

I guess that means a possible reply is essentially "If this offer is so good, then pay me first, and then we'll talk."

If they resist or say no, then yes, you can just reply "But there's got to be SOME payment I can offer to you so that you move first, right?" But that's basically the offer they made to you initially!

If they are isomorphic, it makes sense that we would have trouble solving it, since there IS no solution to the two generals problem, according to the wiki:

http://en.wikipedia.org/wiki/Two_Generals%27_Problem

If they aren't isomorphic, that's weird and I am confused, because they have a lot of similarities when I look at them.

In one, utility approaches an upper bound, in the other, it grows without bound.

**[deleted]**· 2013-02-21T14:03:47.180Z · score: 0 (0 votes) · LW · GW

Thank you for thinking of that! While they do have similarities, as you point out, they clearly do also have at least one significant difference.

Facts not yet presented: How many utilions are lost by giving away $5? How quickly does the UTM operate (if it operates at ~10^10 operations per second, but the normal interest on $5 provides 10^100 utilions per second, you lose even if you get your money back at the end; if it operates at 1 operation per year, you presumably lose if you die before it halts.)