# Pascal's Mugging: Tiny Probabilities of Vast Utilities

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2007-10-19T23:37:38.000Z · score: 49 (45 votes) · LW · GW · Legacy · 347 comments

The most common formalizations of Occam's Razor, Solomonoff induction and Minimum Description Length, measure the program size of a computation used in a hypothesis, but don't measure the running time or space requirements of the computation. What if this makes a mind vulnerable to finite forms of Pascal's Wager? A compactly specified wager can grow in size *much* faster than it grows in complexity. The utility of a Turing machine can grow much faster than its prior probability shrinks.

Consider Knuth's up-arrow notation:

- 3^3 = 3*3*3 = 27
- 3^^3 = (3^(3^3)) = 3^27 = 3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3*3 = 7625597484987
- 3^^^3 = (3^^(3^^3)) = 3^^7625597484987 = 3^(3^(3^(... 7625597484987 times ...)))

In other words: 3^^^3 describes an exponential tower of threes 7625597484987 layers tall. Since this number can be computed by a simple Turing machine, it contains very little information and requires a very short message to describe. This, even though writing out 3^^^3 in base 10 would require *enormously* more writing material than there are atoms in the known universe (a paltry 10^80).
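The recursion behind the notation above can be sketched in a few lines of Python. This is a minimal illustration, not an efficient algorithm; the function name `up_arrow` and its argument order are my own choices.

```python
def up_arrow(a: int, n: int, b: int) -> int:
    """Return a (n up-arrows) b in Knuth's notation, for n >= 1 and b >= 0."""
    if n == 1:
        return a ** b  # a single arrow is ordinary exponentiation
    if b == 0:
        return 1       # an empty tower evaluates to 1
    # a (n arrows) b  =  a (n-1 arrows) [ a (n arrows) (b - 1) ]
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))


print(up_arrow(3, 1, 3))  # 3^3
print(up_arrow(3, 2, 3))  # 3^^3
print(up_arrow(2, 3, 3))  # 2^^^3, still small enough to evaluate
```

The definition fits in a handful of lines, which is the sense in which 3^^^3 "contains very little information". Actually calling `up_arrow(3, 3, 3)` is hopeless: the call would have to build a tower 7625597484987 layers tall and would exhaust any real machine long before finishing.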

Now suppose someone comes to me and says, "Give me five dollars, or I'll use my magic powers from outside the Matrix to run a Turing machine that simulates and kills 3^^^^3 people."

Call this Pascal's Mugging.

"Magic powers from outside the Matrix" are easier said than done - we have to suppose that our world is a computing simulation run from within an environment that can afford simulation of arbitrarily large finite Turing machines, and that the would-be wizard has been spliced into our own Turing tape and is in continuing communication with an outside operator, etc.

Thus the Kolmogorov complexity of "magic powers from outside the Matrix" is larger than the mere English words would indicate. Therefore the Solomonoff-inducted probability, two to the *negative* Kolmogorov complexity, is exponentially tinier than one might naively think.

But, small as this probability is, it isn't anywhere *near* as small as 3^^^^3 is large. If you take a decimal point, followed by a number of zeros equal to the length of the Bible, followed by a 1, and multiply this unimaginably tiny fraction by 3^^^^3, the result is pretty much 3^^^^3.
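A rough log-scale check of that multiplication, as a sketch only. 3^^^^3 cannot be represented at all, so the code uses a tower just four layers tall, 3^^4 = 3^(3^27), as a stand-in. The figure for the length of the Bible is a placeholder assumption, not a measured value.

```python
import math

# Assumption: the Bible is a few million characters long (placeholder round figure).
BIBLE_LENGTH_CHARS = 4_000_000

# log10 of 3^^4 = 3^(3^27): a tower only four layers tall.
log10_tower4 = (3 ** 27) * math.log10(3)

# A decimal point, BIBLE_LENGTH_CHARS zeros, then a 1, is 10^-(BIBLE_LENGTH_CHARS + 1).
log10_fraction = -(BIBLE_LENGTH_CHARS + 1)

# Multiplying the two numbers adds their logarithms.
log10_product = log10_tower4 + log10_fraction

# Fraction of the exponent that the tiny probability removes.
relative_change = -log10_fraction / log10_tower4

print(f"digits in 3^^4:         {log10_tower4:.3e}")
print(f"digits in the product:  {log10_product:.3e}")
print(f"relative change:        {relative_change:.1e}")
```

Even against this four-layer tower, the Bible-length string of zeros removes only about one part in a million of the exponent. Against 3^^^^3 the correction is not something one could meaningfully write down.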

Most people, I think, envision an "infinite" God that is nowhere near as large as 3^^^^3. "Infinity" is reassuringly featureless and blank. "Eternal life in Heaven" is nowhere near as intimidating as the thought of spending 3^^^^3 years on one of those fluffy clouds. The notion that the diversity of life on Earth springs from God's infinite creativity, sounds more plausible than the notion that life on Earth was created by a superintelligence 3^^^^3 bits large. Similarly for envisioning an "infinite" God interested in whether women wear men's clothing, versus a superintelligence of 3^^^^3 bits, etc.

The original version of Pascal's Wager is easily dealt with by the gigantic multiplicity of possible gods, an Allah for every Christ and a Zeus for every Allah, including the "Professor God" who places only atheists in Heaven. And since all the expected utilities here are allegedly "infinite", it's easy enough to argue that they cancel out. Infinities, being featureless and blank, are all the same size.

But suppose I built an AI which worked by some bounded analogue of Solomonoff induction - an AI sufficiently Bayesian to insist on calculating complexities and assessing probabilities, rather than just waving them off as "large" or "small".

If the probabilities of various scenarios considered did not *exactly* cancel out, the AI's action in the case of Pascal's Mugging would be *overwhelmingly* dominated by whatever tiny differentials existed in the various tiny probabilities under which 3^^^^3 units of expected utility were actually at stake.
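A toy version of this domination effect, using exact rational arithmetic. All numbers are hypothetical stand-ins: `VAST_LIVES` is absurdly smaller than 3^^^^3, and the two probabilities are invented so that they differ by one part in a million.

```python
from fractions import Fraction

EARTH_LIVES = 10 ** 10    # stand-in for everything at stake in the mainline
VAST_LIVES = 10 ** 1000   # stand-in for 3^^^^3, which is far too large to store

# Hypothetical probabilities that the vast number of lives is lost, per action.
p_loss_if_pay = Fraction(1, 2 ** 1000)
p_loss_if_refuse = p_loss_if_pay * (1 + Fraction(1, 10 ** 6))


def expected_lives_lost(p_vast_loss: Fraction, mainline_lives_lost: int) -> Fraction:
    """Expected loss = (tiny probability * vast loss) + certain mainline loss."""
    return p_vast_loss * VAST_LIVES + mainline_lives_lost


# Suppose, for the sake of argument, that paying costs the entire Earth for certain.
loss_pay = expected_lives_lost(p_loss_if_pay, EARTH_LIVES)
loss_refuse = expected_lives_lost(p_loss_if_refuse, 0)

print(loss_pay < loss_refuse)  # does the calculation still favor paying?
```

Under these made-up numbers the naive expected-value calculation prefers paying even at the certain cost of the Earth, because a one-in-a-million differential on a probability of 2^-1000 still outweighs 10^10 lives once multiplied by 10^1000.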

You or I would probably wave off the whole matter with a laugh, planning according to the dominant mainline probability: Pascal's Mugger is just a philosopher out for a fast buck.

But a silicon chip does not look over the code fed to it, assess it for reasonableness, and correct it if not. An AI is not given its code like a human servant given instructions. An AI *is* its code. What if a philosopher tries Pascal's Mugging on the AI for a joke, and the tiny probabilities of 3^^^^3 lives being at stake, override *everything* else in the AI's calculations? What is the mere Earth at stake, compared to a tiny probability of 3^^^^3 lives?

How do *I* know to be worried by this line of reasoning? How do *I* know to rationalize reasons a Bayesian shouldn't work that way? A mind that worked strictly by Solomonoff induction would not know to rationalize reasons that Pascal's Mugging mattered less than Earth's existence. It would simply go by whatever answer Solomonoff induction obtained.

It would seem, then, that I've implicitly declared my existence as a mind that does not work by the logic of Solomonoff, at least not the way I've described it. What am I comparing Solomonoff's answer to, to determine whether Solomonoff induction got it "right" or "wrong"?

Why do I think it's unreasonable to focus my entire attention on the magic-bearing possible worlds, faced with a Pascal's Mugging? Do I have an instinct to resist exploitation by arguments "anyone could make"? Am I unsatisfied by any visualization in which the dominant mainline probability leads to a loss? Do I drop sufficiently small probabilities from consideration entirely? Would an AI that lacks these instincts be exploitable by Pascal's Mugging?

Is it me who's wrong? Should I worry more about the possibility of some Unseen Magical Prankster of very tiny probability taking this post literally, than about the fate of the human species in the "mainline" probabilities?

It doesn't feel to me like 3^^^^3 lives are *really* at stake, even at very tiny probability. I'd sooner question my grasp of "rationality" than give five dollars to a Pascal's Mugger because I thought it was "rational".

Should we penalize computations with large space and time requirements? This is a hack that solves the problem, but is it *true?* Are computationally costly explanations less likely? Should I think the universe is probably a coarse-grained simulation of my mind rather than real quantum physics, because a coarse-grained human mind is *exponentially* cheaper than real quantum physics? Should I think the galaxies are tiny lights on a painted backdrop, because that Turing machine would require less space to compute?
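To see why such a penalty would work as a hack, here is a sketch of one possible form, similar in spirit to Levin's Kt complexity (program length plus the logarithm of the running time). The specific penalty, and the assumption that utility scales linearly with running time (say, one simulated person per step), are illustrative choices of mine, not something the argument above commits to.

```python
def log2_prior_times_utility(program_bits: int, log2_steps: int,
                             penalize_runtime: bool) -> int:
    """Return log2(prior * utility) for a hypothesis running 2**log2_steps steps.

    Assumption: utility is proportional to running time.
    """
    log2_prior = -program_bits        # size-only prior: 2^-(program length)
    if penalize_runtime:
        log2_prior -= log2_steps      # hypothetical penalty: divide the prior by the runtime
    log2_utility = log2_steps
    return log2_prior + log2_utility


for log2_steps in (10, 10 ** 6, 10 ** 12):
    print(log2_steps,
          log2_prior_times_utility(1000, log2_steps, penalize_runtime=False),
          log2_prior_times_utility(1000, log2_steps, penalize_runtime=True))
```

Without the penalty, probability times utility grows without limit as the running time grows. With it, the product stays pinned at 2^-1000 however long the machine runs. That shows the hack does its job under these assumptions; it says nothing about whether costly explanations really are less likely.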

Given that, in general, a Turing machine can increase in utility vastly faster than it increases in complexity, how should an Occam-abiding mind avoid being dominated by tiny probabilities of vast utilities?

If I could formalize whichever internal criterion was telling me I didn't want this to happen, I might have an answer.

I talked over a variant of this problem with Nick Hay, Peter de Blanc, and Marcello Herreshoff in summer of 2006. I don't feel I have a satisfactory resolution as yet, so I'm throwing it open to any analytic philosophers who might happen to read Overcoming Bias.

## 347 comments

Comments sorted by oldest first, as this post is from before comment nesting was available (around 2009-02-27).

1. There aren't 3^^^3 people and there is no machine that can simulate even one person, let alone that many people.

2. Nobody has such magic powers.

3. Even if you don't accept 1 and 2 above, there's no reason to expect that the person is telling the truth. He might kill the people even if you give him the $5, or conversely he might not kill them even if you don't give him the $5.

I also don't understand the appeal of Occam's razor, so I'm pretty sure I'm not part of the target audience for this paradox.

Why would not giving him $5 make it more likely that people would die, as opposed to less likely? The two would seem to cancel out. It's the same old "what if we are living in a simulation?" argument- it is, at least, possible that me hitting the sequence of letters "QWERTYUIOP" leads to a near-infinity of death and suffering in the "real world", due to AGI overlords with wacky programming. Yet I do not refrain from hitting those letters, because there's no entanglement which drives the probabilities in that direction as opposed to some other random direction; my actions do not alter the expected future state of the universe. You could just as easily wind up saving lives as killing people.

Because he said so, and people tend to be true to their word more often than dictated by chance.

That observation applies to humans, who also tend not to kill large numbers of people for no payoff (that is, if you've already refused the money and walked away).

That's a symmetric effect, though.

Yes, but they're more likely to kill large numbers of people conditional on you not doing what they say than conditional on you doing what they say.

The mugger claims to not be a 'person' in the conventional sense, but rather an entity with outside-Matrix powers. If this statement is true, then generalized observations about the reference class of 'people' cannot necessarily be considered applicable.

Conversely, if it is false, then this is not a randomly-selected person, but rather someone who has started off the conversation with an outrageous profit-motivated lie, and as such cannot be trusted.

They claim to not be a human. They're still a person, in the sense of a sapient being. As a larger class, you'd expect lower correlation, but it would still be above zero.

I am not convinced that, even among humans speaking to other humans, truth-telling can be assumed when there is such a blatantly obvious incentive to lie.

I mean, say there actually *is* someone who can destroy vast but currently-unobservable populations with less effort than it would take them to earn $5 with conventional economic activity, and the ethical calculus works out such that you'd be better served to pay them $5 than let it happen. At that point, aren't they better served to exaggerate their destructive capacity by an order of magnitude or two, and ask you for $6? Or $10?

Once the number the mugger quotes exceeds your ability to independently confirm, or even properly imagine, the number itself becomes irrelevant. It's either a display of incomprehensibly overwhelming force, to which you must submit utterly or be destroyed, or a bluff you should ignore.

*...when there is such a blatantly obvious incentive to lie.*

There is no blatantly obvious reason to want to torture the people only if you do give him money.

*At that point, aren't they better served to exaggerate their destructive capacity by an order of magnitude or two, and ask you for $6? Or $10?*

So, you're saying that the problem is that, if they really were going to kill 3^^^3 people, they'd lie? Why? 3^^^3 isn't just enough to get $5. It's enough that the expected seriousness of the threat is unimaginably large.

Look at it this way: If they're going to lie, there's no reason to exaggerate their destructive capacity by an order of magnitude when they can just make up a number. If they choose to make up a number, 3^^^3 is plenty high. As such, if it really is 3^^^3, they might as well just tell the truth. If there's any chance that they're not lying given that they really can kill 3^^^3 people, their threat is valid. It's one thing to be 99.9% sure they're lying, but here, a 1 - 1/sqrt(3^^^3) certainty that they're lying still gives more than enough doubt for an unimaginably large threat.

*It's either a display of incomprehensibly overwhelming force, to which you must submit utterly or be destroyed, or a bluff you should ignore.*

You're not psychic. You don't know which it is. In this case, the risk of the former is enough to overwhelm the larger probability of the latter.

Not the way I do the math.

Let's say you're a sociopath, that is, the only factors in your utility function are your own personal security and happiness. Two unrelated people approach you simultaneously, one carrying a homemade single-shot small-caliber pistol (a 'zip gun') and the other apparently unarmed. Both of them, separately, demand $10 in exchange for not killing you immediately. You've got a $20 bill in your wallet; the unarmed mugger, upon learning this, obligingly offers to make change. While he's thus distracted, you propose to the mugger with the zip gun that he shoot the unarmed mugger, and that the two of you then split the proceeds. The mugger with the zip gun refuses, explaining that the unarmed mugger claims to be close personal friends with a professional sniper, who is most likely observing this situation from a few hundred yards away through a telescopic sight and would retaliate against anyone who hurt her friend the mugger. The mugger with the zip gun has never actually met the sniper or directly observed her handiwork, but is sufficiently deterred by rumor alone.

If you don't pay the zip-gun mugger, you'll definitely get shot at, but only once, and with good chances of a miss or nonfatal injury. If you don't pay the unarmed mugger, and the sniper is real, you will almost certainly die before you can determine her position or get behind sufficiently hard cover. If you pay them both, you will have to walk home through a bad part of town at night instead of taking the quicker-and-safer bus, which apart from the inconvenience might result in you being mugged a third time.

How would you respond to that?

I don't need to be psychic. I just do the math. Taking any sort of infinitesimally unlikely threat so seriously that it dominates my decision-making means anyone can yank my chain just by making a few unfounded assertions involving big enough numbers, and then once word gets around, the world will no longer contain acceptable outcomes.

In your example, only you die. In Pascal's mugging, it's unimaginably worse.

Do you accept that, in the circumstance you gave, you are more likely to be shot by a sniper if you only pay one mugger? Not significantly more likely, but still more likely? If so, that's analogous to accepting that Pascal's mugger will be more likely to make good on his threat if you don't pay.

In my example, the person making the decision was specified to be a sociopath, for whom there is no conceivable worse outcome than the total loss of personal identity and agency associated with death.

The two muggers are indifferent to each other's success. You could pay off the unarmed mugger to eliminate the risk of being sniped (by that particular mugger's friend, at least, if she exists; there may well be other snipers elsewhere in town with unrelated agendas, about whom you have even less information) and accept the risk of being shot with the zip gun, in order to afford the quicker, safer bus ride home. In that case you would only be paying one mugger, and still have the lowest possible sniper-related risk.

The three possible expenses were meant as metaphors for existential risk mitigation (imaginary sniper), infrastructure development (bus), and military/security development (zip gun), the latter two forming the classic guns-or-butter economic dilemma. Historically speaking, societies that put too much emphasis, too many resources, toward preventing low-probability high-impact disasters, such as divine wrath, ended up succumbing to comparatively banal things like famine, or pillaging by shorter-sighted neighbors. What use is a mathematical model of utility that would steer us into those same mistakes?

Is your problem that we'd have to keep the five dollars in case of another mugger? I'd hardly consider the idea of steering our life around pascal's mugging to be disagreeing with it. For what it's worth, if you look for hypothetical pascal's muggings, expected utility doesn't converge and decision theory breaks down.
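The non-convergence mentioned here can be illustrated with a toy sum. The sketch assumes one hypothetical hypothesis per description length n, with prior 2^-n, and compares a bounded utility with one that grows like 4^n. Real Solomonoff induction sums over programs rather than lengths, and the utilities of interest grow far faster than 4^n; both simplifications are mine.

```python
from fractions import Fraction


def partial_expected_utility(max_bits: int, utility) -> Fraction:
    """Sum of 2^-n * utility(n) over toy hypotheses of n = 1..max_bits bits."""
    return sum(Fraction(utility(n), 2 ** n) for n in range(1, max_bits + 1))


def bounded(n: int) -> int:
    return 1           # every hypothesis is worth at most 1


def fast_growing(n: int) -> int:
    return 4 ** n      # utility outruns the 2^-n prior


for bits in (10, 20, 40):
    print(bits,
          float(partial_expected_utility(bits, bounded)),
          float(partial_expected_utility(bits, fast_growing)))
```

The bounded column settles toward 1, while the fast-growing column roughly doubles with every extra bit considered, so there is no limiting expected utility to maximize.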

*Let's say you're a sociopath, that is, the only factors in your utility function are your own personal security and happiness.*

Can we use the less controversial term 'economist'?

I think this answer contains something important--

Not so much an answer to the problem, but a clue to the reason WHY we intuitively, as humans, know to respond in a way which seems un-mathematical.

It seems like a Game Theory problem to me. Here, we're calling the opponents' bluff. If we make the decision that SEEMINGLY MAXIMIZES OUR UTILITY, according to game theory we're set up for a world of hurt in terms of indefinite situations where we can be taken advantage of. Game Theory already contains lots of situations where reasons exist to take action that seemingly does not maximize your own utility.

It is threatening people just to test you. We can assume that Its behavior is completely different from ours. So Tom's argument still works.

**[deleted]**· 2012-04-23T22:33:33.449Z · score: -2 (1 votes) · LW · GW

Yes, but the chance of magic powers from outside the matrix is low enough that what he says has an insignificant difference.

...or is an insignificant difference even possible?

*Yes, but the chance of magic powers from outside the matrix is low enough*

The chance of magic powers from outside the matrix is nothing compared to 3^^^^3. It makes no difference in whether or not it's worthwhile to pay him.

Very interesting thought experiment!

One place where it might fall down is that our disutility for causing deaths is probably not linear in the number of deaths, just as our utility for money flattens out as the amount gets large. In fact, I could imagine that its value is connected to our ability to intuitively grasp the numbers involved. The disutility might flatten out *really quickly* so that the disutility of causing the death of 3^^^^3 people, while large, is still small enough that the small probabilities from the induction are not overwhelmed by it.
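A minimal sketch of what a quickly flattening disutility might look like. The functional form, the `HALF_POINT` constant, and the probability are all hypothetical choices for illustration.

```python
from fractions import Fraction

HALF_POINT = 10 ** 9   # hypothetical: deaths at which disutility reaches half its cap


def disutility(deaths: int) -> Fraction:
    """Saturating disutility in [0, 1): near-linear for small counts, flat for huge ones."""
    return Fraction(deaths, deaths + HALF_POINT)


p_mugger_honest = Fraction(1, 2 ** 100)   # hypothetical tiny probability
vast = 10 ** 1000                          # stand-in for 3^^^^3

print(float(disutility(1000)), float(disutility(vast)))
print(p_mugger_honest * disutility(vast) < p_mugger_honest)
```

Because the disutility never reaches 1, the expected disutility of the threat can never exceed the probability itself, no matter how large a number the mugger names.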

That just means you have to change the experiment. Suppose he just said he'll cause a certain amount of net disutility, without specifying how.

This works unless you assume a maximum possible disutility.

You are not entitled to assume a maximum disutility, even if you think you see a proof for it (see Confidence Levels Inside and Outside an Argument).

People say the fact that there are many gods neutralizes Pascal’s wager - but I don't understand that at all. It seems to be a total non sequitur. Sure, it opens the door to other wagers being valid, but that is a different issue.

Let's say I have a simple game against you where, if I choose 1 I win a lotto ticket and if I choose 0 I lose. There are also a number of other game tables around the room with people winning or not winning lotto tickets. If I want to win the lotto, what number should I pick?

Also, I don't think there is a fundamental issue with having favour with Allah, Christ and Zeus simultaneously (so you could actually win, then get up and go play at another table, although there would be a time cost to that).

Now there is the more detailed argument where you argue that a god who desires that you disbelieve in him and oppose his will is equally likely as one that desires that you believe in him and support his will. But as long as there is any imperfection in the mirror then there is a Pascal’s wager to be had.

*What if a philosopher tries Pascal's Mugging on the AI for a joke, and the tiny probabilities of 3^^^^3 lives being at stake, override everything else in the AI's calculations?*

Suppose that depends on how he calculates the probability of the threat of the mugger. The very act of giving a specific probability to a threat like that opens one up to an infinite risk (i.e. that they will demand infinite things in exchange for infinity x 3^^^^3 lives). So this is a bit like comparing what I might call naive utilitarianism (where one doesn’t consider the wider effects of one’s acts and rules) with pure utilitarianism (where one takes everything into account).

Whether that neutralizes Pascal’s wager relates to how one resolves the mirror issue I mentioned. If that produces a tidy result then the problem above doesn’t occur.

There is one problem with having favor of several gods simultaneously:

Exodus 20:3 "You shall have no other gods before me."

In fact, one could argue that being a true orthodox Christian would lead you to the Muslim, Hindu, Protestant and Scientology (etc.) hells, while choosing any one of them would subtract that hell but add the hell of whatever religion you left...

I try to stay away for safety's sake :)

[edit: spelling]

This is an instance of the general problem of attaching a probability to matrix scenarios. And you can pascal-mug yourself, without anyone showing up to assert or demand anything - just think: *what if* things are set up so that whether I do, or do not do, *something*, determines whether those 3^^^^3 people will be created and destroyed? It's just as possible as the situation in which a messenger from Outside shows up and tells you so.

The obvious way to attach probabilities to matrix scenarios is to have a unified notion of possible world capacious enough to encompass both matrix worlds and worlds in which your current experiences are veridical; and then you look at relative frequencies or portions of world-measure for the two classes of possibility. For example, you could assume the correctness of our current physics across all possible worlds, and then make a Drake/Bostrom-like guesstimate of the frequency of matrix construction across all those universes, and of the "demography" and "political character" of those simulations. Garbage in, garbage out; but you really can get an answer if you make enough assumptions. In that regard, it is not too different to any other complicated decision made against a background of profound uncertainty.

Tom and Andrew, it seems very implausible that someone saying "I will kill 3^^^^3 people unless X" is literally *zero* Bayesian evidence that they will kill 3^^^^3 people unless X. Though I guess it could plausibly be weak enough to take much of the force out of the problem.

Andrew, if we're in a simulation, the world containing the simulation could be able to support 3^^^^3 people. If you knew (magically) that it couldn't, you could substitute something on the order of 10^50, which is vastly less forceful but may still lead to the same problem.

Andrew and Steve, you could replace "kill 3^^^^3 people" with "create 3^^^^3 units of disutility according to your utility function". (I respectfully suggest that we all start using this form of the problem.)

Michael Vassar has suggested that we should consider any number of identical lives to have the same utility as one life. That could be a solution, as it's impossible to create 3^^^^3 distinct humans. But, this also is irrelevant to the create-3^^^^3-disutility-units form.

IIRC, Peter de Blanc told me that any consistent utility function must have an upper bound (meaning that we must discount lives like Steve suggests). The problem disappears if your upper bound is low enough. Hopefully any realistic utility function has such a low upper bound, but it'd still be a good idea to solve the general problem.

I see a similarity to the police chief example. Adopting a policy of paying attention to any Pascalian muggings would encourage others to manipulate you using them. At first it doesn't seem like this would have nearly enough disutility to justify ignoring muggings, but it might when you consider that it would interfere with responding to any *real* threat (unlikely as it is) of 3^^^^3 deaths.

*create 3^^^^3 units of disutility according to your utility function*

For all X:

If your utility function assigns values to outcomes that differ by a factor of X, then you are vulnerable to becoming a fanatic who banks on scenarios that only occur with probability 1/X. As simple as that.

If you think that banking on scenarios that only occur with probability 1/X is silly, then you have implicitly revealed that your utility function only assigns values in the range [1,Y], where Y<X, and where 1 is the lowest utility you assign.

*If you think that banking on scenarios that only occur with probability 1/X is silly, then you have implicitly revealed that your utility function only assigns values in the range [1,Y], where Y<X, and where 1 is the lowest utility you assign.*

... or your judgments of silliness are out of line with your utility function.

When I said "Silly" I meant from an axiological point of view, i.e. you think the scenario over, and you still think that you would be doing something that made you win less.

Of course in any such case, there are likely to be conflicting intuitions: one to behave as an aggregative consequentialist, and another to behave like a sane human being.

*Michael Vassar has suggested that we should consider any number of identical lives to have the same utility as one life. That could be a solution, as it's impossible to create 3^^^^3 distinct humans. But, this also is irrelevant to the create-3^^^^3-disutility-units form.*

What if we required that the utility function grow no faster than the Kolmogorov complexity of the scenario? This seems like a suitable generalization of Vassar's proposal.
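A quick sketch of what that requirement would buy, assuming the cap takes the form U <= c * K for a scenario of Kolmogorov complexity K bits with prior 2^-K. The constant `c` and the linear form of the cap are my assumptions about how to read the proposal.

```python
from fractions import Fraction


def weighted_utility(k_bits: int, c: int = 1) -> Fraction:
    """Prior 2^-K times the largest utility allowed under the cap U <= c * K."""
    return Fraction(c * k_bits, 2 ** k_bits)


for k in (10, 100, 1000):
    print(k, float(weighted_utility(k)))
```

Under this cap the worst-case contribution of a scenario shrinks as the scenario gets more complex, so elaborate threats lose influence instead of gaining it.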

Mitchell, it doesn't seem to me like any sort of accurate many-worlds probability calculation would give you a probability anywhere near low enough to cancel out 3^^^^3. Would you disagree? It seems like there's something else going on in our intuitions. (Specifically, our intuitions that a good FAI would need to agree with us on this problem.)

Sorry, the first link was supposed to be to Absence of Evidence is Evidence of Absence.

Mitchell, I don't see how you can Pascal-mug yourself. Tom is right that the possibility that typing QWERTYUIOP will destroy the universe can be safely ignored; there is no evidence either way, so the probability equals the prior, and the Solomonoff prior that typing QWERTYUIOP will save the universe is, as far as we know, exactly the same. But the mugger's threat is a shred of Bayesian evidence that you have to take into account, and when you do, it massively tips the expected utility balance. Your suggested solution does seem right but utterly intractable.

I don't think the QWERTYUIOP thing is literally zero Bayesian evidence either. Suppose the thought of that particular possibility was manually inserted into your mind by the simulation operator.

*Tom and Andrew, it seems very implausible that someone saying "I will kill 3^^^^3 people unless X" is literally zero Bayesian evidence that they will kill 3^^^^3 people unless X. Though I guess it could plausibly be weak enough to take much of the force out of the problem.*

Nothing could possibly be that weak.

*Tom is right that the possibility that typing QWERTYUIOP will destroy the universe can be safely ignored; there is no evidence either way, so the probability equals the prior, and the Solomonoff prior that typing QWERTYUIOP will save the universe is, as far as we know, exactly the same.*

*Exactly* the same? These are different scenarios. What happens if an AI actually calculates the prior probabilities, using a Solomonoff technique, without any a priori desire that things should exactly cancel out?

Why would an AI consider those two scenarios and no others? Seems more likely it would have to chew over every equivalently-complex hypothesis before coming to any actionable conclusion... at which point it stops being a worrisome, potentially world-destroying AI and becomes a brick, with a progress bar that won't visibly advance until after the last proton has decayed.

... which doesn't solve the problem, but at least that AI won't be giving anyone... five dollars? Your point is valid, but it doesn't expand on anything.

More generally I mean that an AI capable of succumbing to this particular problem wouldn't be able to function in the real world well enough to cause damage.

I'm not sure that was ever a question. :3

*Nothing could possibly be that weak.*

Well, let's think about this mathematically.

In other articles, you have discussed the notion that, in an infinite universe, there exist with probability 1 identical copies of me some 10^(10^29) {span} away. You then (correctly, I think) demonstrate the absurdity of declaring that one of them in particular is 'really you' and another is a 'mere copy'.

When you say "3^^^^3 people", you are presenting me two separate concepts:

Individual entities which are each "people".

A set {S} of these entities, of which there are 3^^^^3 members.

Now, at this point, I have to ask myself: "what is the probability that {S} exists?"

By which I mean, what is the probability that there are 3^^^^3 *unique* configurations, *each* of which qualifies as a self-aware, experiencing entity with moral weight, without reducing to an "effective simulation" of another entity already counted in {S}?

Vs. what is the probability that the total cardinality of unique configurations that each qualify as self-aware, experiencing entities with moral weight, is < 3^^^^3?

Because if we're going to juggle Bayesian probabilities here, at some point that has to get stuck in the pipe and smoked, too.

OK, let's try this one more time:

3. Even if you don't accept 1 and 2 above, there's no reason to expect that the person is telling the truth. He might kill the people even if you give him the $5, or conversely he might not kill them even if you don't give him the $5.

To put it another way, conditional on this nonexistent person having these nonexistent powers, why should you be so sure that he's telling the truth? Perhaps you'll only get what you want by *not* giving him the $5. To put it mathematically, you're computing p*X, where p is the probability and X is the outcome, and you're saying that if X is huge, then just about any nonzero p will make p*X be large. But you're forgetting two things: first, if you have the imagination to imagine X to be super-huge, you should be able to have the imagination to imagine p to be super-small. (I.e., if you can talk about 3^^^^3, you can talk about 1/3^^^^3.) Second, once you allow these hypothetical super-large X's, you have to acknowledge the possibility that you got the sign wrong.

I have to go with Tom McGabe on this one; this is just a restatement of the core problem of epistemology. It's not unique to AI, either.

*3. Even if you don't accept 1 and 2 above, there's no reason to expect that the person is telling the truth. He might kill the people even if you give him the $5, or conversely he might not kill them even if you don't give him the $5.*

But if a Bayesian AI actually calculates these probabilities by assessing their Kolmogorov complexity - or any other technique you like, for that matter - without desiring that they come out exactly equal, can you rely on them coming out exactly equal? If not, an expected utility differential of 2 to the negative googolplex times 3^^^^3 still equals 3^^^^3, so whatever tiny probability differences exist will dominate all calculations based on what we think of as the "real world" (the mainline of probability with no wizards).

*if you have the imagination to imagine X to be super-huge, you should be able to have the imagination to imagine p to be super-small*

But we can't just set the probability to anything we like. We have to calculate it, and Kolmogorov complexity, the standard accepted method, will not be anywhere near that super-small.

Addendum: In computational terms, you can't avoid using a 'hack'. Maybe not the hack you described, but something, somewhere has to be hard-coded. How else would you avoid solipsism?

This case seems to suggest the existence of new interesting rationality constraints, which would go into choosing rational probabilities and utilities. It might be worth working out what constraints one would have to impose to make an agent immune to such a mugging.

Eliezer,

OK, one more try. First, you're picking 3^^^^3 out of the air, so I don't see why you can't pick 1/3^^^^3 out of the air also. You're saying that your priors have to come from some rigorous procedure but your utility comes from simply transcribing what some dude says to you. Second, even if for some reason you really want to work with the utility of 3^^^^3, there's no good reason for you not to consider the possibility that it's really -3^^^^3, and so you should be doing the opposite. The issue is not that two huge numbers will exactly cancel out; the point is that you're making up *all* the numbers here but are artificially constraining the expected utility differential to be positive.

If I really wanted to consider this example realistically, I'd say that this guy has no magic powers, so I wouldn't worry about him killing 3^^^^3 people or whatever. A slightly more realistic scenario would be something like a guy with a bomb in a school, in which case I'd defer to the experts (presumably whoever in the police force deals with people like that) on their judgment of how best to calm him down. There I could see an (approximate) probability calculation being relevant, but, again, the key thing would be whether giving him $5 (or whatever) would make him more or less likely to set the fuse. It wouldn't be appropriate to say a priori that it could only help.

*OK, one more try. First, you're picking 3^^^^3 out of the air, so I don't see why you can't pick 1/3^^^^3 out of the air also.*

You're not picking 3^^^^3 out of the air. The other guy told you that number.

You can't pick probabilities out of the air. If you could, why not just set the probability that you're God to one?

*If I really wanted to consider this example realistically, I'd say that this guy has no magic powers, so I wouldn't worry about him killing 3^^^^3 people or whatever.*

With what probability? Would you give money to a mugger if their gun probably isn't loaded? Is this example fundamentally different?

*There I could see an (approximate) probability calculation being relevant, but, again, the key thing would be whether giving him $5 (or whatever) would make him more or less likely to set the fuse.*

Even if it comes out as less, the paradox still exists. It's just that then you can't give him $5. The only way to get out of it is for the probabilities to cancel out to within one part in 3^^^^3, which is absurd.

I think you're on to something, but I think the key is that someone claiming to be able to influence 3^^^^3 of anything, let alone 3^^^^3 "people", is making such an extraordinary claim that it would require extraordinary evidence of a magnitude similar to 3^^^^3; i.e., I bet we're vastly underestimating the complexity of what our mugger is claiming.

pdf23ds, under certain straightforward physical assumptions, 3^^^^3 people wouldn't even fit in anyone's future light-cone, in which case the probability is literally zero. So the assumption that our apparent physics is the physics of the real world too, really could serve to decide this question. The only problem is that that assumption itself is not very reasonable.

Lacking for the moment a rational way to delimit the range of possible worlds, one can utilize what I'll call a Chalmers prior, which simply specifies directly how much time you will spend thinking about matrix scenarios. (I name it for David Chalmers because I once heard him give an estimate of the odds that we are in a matrix; I think it was about 10%.) The rationality of *having* a Chalmers prior can be justified by observing one's own cognitive resource-boundedness, and the apparently endless amount of time one could spend thinking about matrix scenarios. (Is there a name for this sort of scheduling meta-heuristic, in which one limits the processing time available for potentially nonterminating lines of thought?)
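As a minimal sketch of that scheduling idea, the budget can be expressed as a cap on steps rather than wall-clock time. The names `deliberate` and `matrix_scenarios` are hypothetical, invented for this illustration.

```python
from itertools import count, islice

def deliberate(line_of_thought, step_budget):
    """Consume at most `step_budget` steps of a possibly endless iterator
    and return the last conclusion reached (None if there was none)."""
    conclusion = None
    for conclusion in islice(line_of_thought, step_budget):
        pass
    return conclusion

def matrix_scenarios():
    """Hypothetical never-ending line of thought: scenario 0, 1, 2, ..."""
    for depth in count():
        yield f"considered matrix scenario #{depth}"

# The budget is fixed in advance, so the call returns even though
# the generator itself never terminates.
result = deliberate(matrix_scenarios(), step_budget=1000)
```

The budget is chosen outside the reasoning process, which is exactly why it looks like a hack: nothing inside the loop can argue for more time.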

I'm not aware of any (and I'm not sure it really solves this problem in particular), but there should be, because processing time is absolutely critical to bounded rationality.

Well... I think we act differently from the AI because we not only know Pascal's Mugging, we know that it is known. I don't see why an AI could not know the knowledge of it, though, but you do not seem to consider that, which might simply show that it is not relevant, as you, er, seem to have given this some thought...

But maybe an AI cannot in fact know the knowledge of something.

What possible reason would you have to assume that? If we're talking about an actually intelligent AI, it'd presumably be as smart as any other intelligent being (like, say, a human). If we're talking about a dumb program, it can take into account anything that we want it to take into account.

Konrad: *In computational terms, you can't avoid using a 'hack'. Maybe not the hack you described, but something, somewhere has to be hard-coded.*

Well, yes. The alternative to code is not solipsism, but a rock, and even a rock can be viewed as being hard-coded as a rock. But we would prefer that the code be elegant and make sense, rather than using a local patch to fix specific problems as they come to mind, because the latter approach is guaranteed to fail if the AI becomes more powerful than you and refuses to be patched.

Andrew: *You're saying that your priors have to come from some rigorous procedure*

The priors have to come from some computable procedure. We would prefer it to be a good one, as agents with nonsense priors will not attain sensible posteriors.

*but your utility comes from simply transcribing what some dude says to you.*

No. Certain hypothetical scenarios, which we describe using the formalism of Turing machines, have fixed utilities - that is, if some description of the universe is true, it has a certain utility.

The problem with this scenario is not that we believe everything the dude tells us. The problem is that the description of a certain very large universe with a very large utility does not have a correspondingly tiny prior probability if we use Solomonoff's prior. And then as soon as we see any evidence, no matter how tiny, anything whose entanglement is not as tiny as the very large universe is large, that expected utility differential instantly wipes out all other factors in our decision process.

*Second, even if for some reason you really want to work with the utility of 3^^^^3, there's no good reason for you not to consider the possibility that it's really -3^^^^3, and so you should be doing the opposite.*

A Solomonoff inductor might indeed consider it, though there's the problem of any bounded rationalist not being able to consider all computations. It seems "reasonable" for a bounded mind to consider it here; you did, after all.

*The issue is not that two huge numbers will exactly cancel out; the point is that you're making up all the numbers here but are artificially constraining the expected utility differential to be positive.*

Let the differential be negative. Same problem. If the differential is not zero, the AI will exhibit unreasonable behavior. If the AI literally thinks in Solomonoff induction (as I have described), it won't *want* the differential to be zero, it will just compute it.

Well, are you going to give us your answer?

To solve this problem, the AI would need to calculate the probability of the claim being true, for which it would need to calculate the probability of 3^^^^3 people even existing. Given what it knows about the origins and rate of reproduction of humans, wouldn't the probability of 3^^^^3 people even existing be approximately 1/3^^^^3? It's as you said: multiply or divide it by the number of characters in the Bible, it's still nearly the same damned incomprehensibly large number. Unless you are willing to argue that there are some bizarre properties of the other universe that would allow so many people to spontaneously arise from nothing, but this is yet another explanatory assumption, and one that I see no way of assigning a probability to.

Here's one for you: Let's assume for argument's sake that "humans" could include human consciousnesses, not just breathing humans. Then, if a universe with 3^^^^3 "humans" actually existed, what would be the odds that they were NOT all copies of the same parasitic consciousness?

Pascal's wager type arguments fail due to their symmetry (which is preserved in finite cases).

Eliezer, sorry to say (because it makes me sound callous), but if someone can and is willing to create and then destroy 3^^^3 people for less than $5, then there is no value in life, and definitely no moral structure to the universe. The creation and destruction of 3^^^3 people (or more) is probably happening all the time. Therefore the AI is safe declining the wager on purely selfish grounds.

So, if there is someone out there committing grievous holocausts (if we use realistic numbers like "10 million deaths", "20 billion deaths", the probability of this is near 1), then none of us have any moral obligations ever?

I guess so. It's an interesting idea--kind of like social cooperation problems like recycling; if too many other people are not doing it, then there isn't much point in doing it yourself. Applying it to morality is interesting. But wrong, I think.

Eliezer, I'd like to take a stab at the internal criterion question. One difference between me and the program you describe is that I have a hoped-for future. Say "I'd like to play golf on Wednesday." Now, I could calculate the odds of Wednesday not actually arriving (nuclear war, asteroid impact...), or me not being alive to see it (sudden heart attack...), and I would get an answer greater than zero. Why don't I operate on those non-zero probabilities? (The other difference between me and the program you describe.) I think it has to do with faith. That is, I have faith that my hoped-for future will occur, or at least some semblance of it. I seem to have this faith despite previous losses. Take the field of AI. There is a hoped-for future: a computer will demonstrate intelligence, and some hope the machine will become conscious. There is a faith that "we can solve these problems." I'm not sure the machine you describe would have either characteristic. I don't know how to formalize this, but it seems an important aspect of the situation.

*IIRC, Peter de Blanc told me that any consistent utility function must have an upper bound (meaning that we must discount lives like Steve suggests). The problem disappears if your upper bound is low enough. Hopefully any realistic utility function has such a low upper bound, but it'd still be a good idea to solve the general problem.*

Nick, please see my blog (just click on my name). I have a post about this.

"Let the differential be negative. Same problem. If the differential is not zero, the AI will exhibit unreasonable behavior. If the AI literally thinks in Solomonoff induction (as I have described), it won't want the differential to be zero, it will just compute it."

How can a computation arrive at a nonzero differential, starting with zero data? If I ask a rational AI to calculate the probability of me typing "QWERTYUIOP" saving 3^^^^3 human lives, it knows literally *nothing* about the causal interactions between me and those lives, because they are totally unobservable.

GeniusNZ, you have to consider not only all proposed gods, but all possible gods and reward/punishment structures. Since the number and range of conceivable divine rewards and punishments is infinite for each action, the incentives are all equally balanced, and thus give you no reason to prefer one action over another.

Ultimately, I think Tom McCabe is right -- the truth of a proposition depends in part on its meaningfulness.

What is the probability that the sun will rise tomorrow? Nearly 1, if you're thinking of dawns. Nearly 0, if you're thinking of Copernicus. Bayesian reasoning can evaluate propositions, but at the limit, one must already have a rational vocabulary in which to express hypotheses.

When someone threatens to kill 3^^^^3 people, this calls into question

1) whether the assertion is meaningful at all
2) whether the lives in question are equivalent to "human lives" already observed, or are unlike in kind -- in other words, whether they should be valued similarly.

After all, analogously to the original Wager's problem, these 3^^^^3 people could be of a negative moral value -- it could be good to kill them. And no, Pascal's Mugger cannot just respond that he means people like you and me, because they are obviously not exactly analogous, since they are unobservable.

I generally share Tom McCabe's conclusion, that is, that they exactly cancel out because a symmetry has not been broken. The reversed hypothesis has the same complexity as the original hypothesis, and the same evidence supporting it. No differential entanglement. However, I think that this problem is worth attention because a) so many people who normally agree disagree here, and b) I suspect that the problem is related to normal utilitarianism with no discounting and an unbounded future. Of course, we already have some solutions in that case and we should try them and see what we get. Our realistic AGI is boundedly rational in some respect or another. How does its limitation in predicting the mundane consequences of any given action relate to its limitation in predicting the probabilities in Pascal's Mugging?

Benquo, replace "kill 3^^^^3 people" with "create 3^^^^3 disutility units" and the problem reappears.

Michael, do you really think the mugger's statement is *zero* evidence?

It seems to me that the cancellation is an artifact of the particular example, and that it would be easy to come up with an example in which the cancellation does not occur. For example, maybe you have previous experience with the mugger. He has mugged you before about minor things and sometimes you have paid him and sometimes not. In all cases he has been true to his word. This would seem to tip the probabilities at least slightly in favor of him being truthful about his current much larger threat.

Even in that case I would assign enormously higher probability to the hypothesis that my deadbeat pal has caught some sort of brain disease that results in compulsive lying, than that such a person has somehow acquired reality-breaking powers but still has nothing better to do than hit me up for spare change.

Enormously higher probability is not 1. This still doesn't mean the statement is zero evidence.

I don't know - if he did actually have reality-breaking powers, he would likely be tempted to put them to more effective use. If he would in fact be *less* likely to be making the statement were it true, then it is evidence against, not evidence for, the truth of his statement.

However clever your algorithm, at that level, something's bound to confuse it. Gimme FAI with checks and balances every time.

Is there a Godel sentence for human consciousness?

(My favorite proposal so far is: "I cannot honestly assert this sentence.")

It's definitely clever, but it's not quite what a Gödel sentence for us would be: it would seem to us to be an intractable statement about something else, and we'd be *incapable* of comprehending it as an indirect reference to our processes of understanding.

So, in particular, a human being can't write the Gödel sentence for humans.

Also, you've only been commenting for a few days; why not say hello on the welcome thread?

You could always just give up being a consequentialist and ontologically refuse to give in to the demands of anyone taking part in a Pascal mugging because consistently doing so would lead to the breakdown of society.

Re: "However clever your algorithm, at that level, something's bound to confuse it. Gimme FAI with checks and balances every time."

I agree that a mature Friendly Artificial Intelligence should defer to something like humanity's volition.

However, before it can figure out what humanity's volition is and how to accomplish it, an FAI first needs to:

1. self-improve into trans-human intelligence while retaining humanity's core goals
2. avoid UnFriendly Behavior (for example, murdering people to free up their resources) in the process of doing step (1)

If the AI falls prey to a paradox early on in the process of self-improvement, the FAI has failed and has to be shut down or patched.

Why is that a problem? Because if the AI falls prey to a paradox later on in the process of self-improvement, when the computer can outsmart human beings, the result could be catastrophic. (As Eliezer keeps pointing out: a rational AI might not *agree* to be patched, just as Gandhi would not *agree* to have his brain modified into becoming a psychopath, and Hitler would not *agree* to have his brain modified to become an egalitarian. All things equal, rational agents will try to block any actions that would prevent them from accomplishing their current goals.)

So you want to create an elegant (to the point, ideally, of being "provably correct") structure that doesn't need patches or hacks. If you have to constantly patch or hack early on in the process, that increases the chances that you've missed something fundamental, and that the AI will fail later on, when it's too late to patch.

Rolf: I agree with everything you just said, especially the bit about patches and hacks. I just wouldn't be happy having a FAI's sanity dependent on *any* single part of its design, no matter how perfect and elegant looking, or provably safe on paper, or demonstrably safe in our experiments.

*However clever your algorithm, at that level, something's bound to confuse it.*

Odd, I've been reading moral paradoxes for many years and my brain never crashed once, nor have I turned evil. I've been confused but never catastrophically so (though I have to admit my younger self came close). My algorithm must be "beyond clever".

That's a remarkable level of resilience for a brain design which is, speaking professionally, a damn ugly mess. If I can't aspire to do *at least* that well, I may as well hang up my shingle and move in with the ducks.

The modern human nervous system is the result of upwards of a hundred thousand years of brutal field-testing. The basic components, and even whole submodules, can be traced back even further. A certain amount of resiliency is to be expected. If you want to start from scratch and aspire to the same or higher standards of performance, it might be sensible to be prepared to invest the same amount of time and capital that the BIG did.

That you have not yet been crippled by a moral paradox or other standard rhetorical trick is comparable to saying that a server remains secure after a child spent an afternoon poking around with it and trying out lists of default passwords: a good sign, certainly, and a test many would fail, but not in itself proof of perfection.

Indeed, on a list of things we can expect evolved brains to be, ROBUST is very high on the list. ("rational" is actually rather hard to come by. To some degree, rationality improves fitness. But often its cost outweighs its benefit, hence the sea slug.)

Additionally, people throw away problems if they can't find the answer or if getting the specifics of the answer is beyond their limits. A badly designed AI system wouldn't have that option and so would be paralyzed by calculation.

I agree with the commenter above who said the best thing to stop anything like this from happening is an AI system with checks and balances which automatically throws out certain problems. In the abstract, that might conceivably be bad. In the real world it probably won't be. Probably isn't very inspiring or logically compelling but I think it's the best that we can do.

Unless we design the first AI system with a complex goal system oriented around fixing itself that basically boils down to "do your best to find and solve any problems or contradictions within your system, ask for our help whenever you are unsure of an answer, then design a computer which can do the same task better than you, etc, then have the final computer begin the actual work of an AI". The thought comes from Douglas Adams' Hitchhiker books, I forget the names of the computers but it doesn't matter.

To anyone who says it's impossible or unfeasible to implement something like this: note that having one biased computer attempt to correct its own biases and create a less biased computer is in all relevant ways equivalent to having one biased human attempt to correct its own biases and create a less biased computer.

Give me five dollars, or I will kill as many puppies as it takes to make you. And they'll go to hell. And there in that hell will be fire, brimstone, and rap with Engrish lyrics.

I think the problem is not Solomonoff induction or Kolmogorov complexity or Bayesian rationality, whatever the difference is, but you. You don't want an AI to think like this because you don't want it to kill you. Meanwhile, to a true altruist, it would make perfect sense.

*Not really confident. It's obvious that no society of selfish beings whose members think like this could function. But they'd still, absurdly, be happier on average.*

Well, in that case, one possible response is for me to kill YOU (or report you to the police who will arrest you for threatening mass animal cruelty). But if you're really a super-intelligent being from beyond the simulation, then trying to kill you will inevitably fail and probably cause those 3^^^^3 people to suffer as a result.

(The most plausible scenario in which a Pascal's Mugging occurs? Our simulation is being tested for its coherence in expected utility calculations. Fail the test and the simulation will be terminated.)

You don't need a bounded utility function to avoid this problem. It merely has to have the property that the utility of a given configuration of the world doesn't grow faster than the length of a minimal description of that configuration. (Where "minimal" is relative to whatever sort of bounded rationality you're using.)

It actually seems quite plausible to me that our intuitive utility-assignments satisfy something like this constraint (e.g., killing 3^^^^^3 puppies doesn't *feel* much worse than killing 3^^^^3 puppies), though that might not matter much if you think (as I do, and I expect Eliezer does) that our intuitive utility-assignments often need a lot of adjustment before they become things a really rational being could sign up to.

Nick Tarleton, you say:

"Benquo, replace "kill 3^^^^3 people" with "create 3^^^^3 disutility units" and the problem reappears."

But *what is* a disutility unit? *How* can there be that many? How do you know that what he supposes to be a disutility unit isn't from your perspective a utility unit?

Any similarly outlandish claim is a challenge not merely to your beliefs, but to your mental vocabulary. It can't be evaluated for probability until it's evaluated for meaning.

Utility functions have to be bounded basically because genuine martingales screw up decision theory -- see the St. Petersburg Paradox for an example.

Economists, statisticians, and game theorists are typically happy to do so, because utility functions don't really exist -- they aren't uniquely determined from someone's preferences. For example, you can multiply any utility function by a positive constant, and get another utility function that produces exactly the same observable behavior.
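A small sketch of the rescaling point, using made-up lotteries and a square-root utility chosen purely for illustration. The constant has to be positive; a negative one flips every preference.

```python
def best_lottery(lotteries, utility):
    """Return the name of the lottery with the highest expected utility."""
    def expected(lottery):
        return sum(p * utility(x) for p, x in lottery)
    return max(lotteries, key=lambda name: expected(lotteries[name]))

# Hypothetical choices: each lottery is a list of (probability, payoff).
lotteries = {
    "safe": [(1.0, 50)],
    "risky": [(0.5, 0), (0.5, 120)],
}

def u(x):
    """Illustrative risk-averse utility."""
    return x ** 0.5

def scaled(x):
    """Same utility multiplied by a positive constant."""
    return 7 * u(x)

def flipped(x):
    """Same utility multiplied by a negative constant."""
    return -u(x)

choice_original = best_lottery(lotteries, u)
choice_scaled = best_lottery(lotteries, scaled)
choice_flipped = best_lottery(lotteries, flipped)
```

Since only the ranking of expected utilities is observable, `u` and `scaled` cannot be told apart from behavior alone.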

In the INDIVIDUAL case that is true. In the AGGREGATE case it's not.


I always wondered why people believe utility functions are U(x): R^n -> R^1 for some n. I'm no decision theorist, but I see no reason utilities can't function on the basis of a partial ordering rather than a totally ordered numerical function.

*I'm no decision theorist, but I see no reason utilities can't function on the basis of a partial ordering rather than a totally ordered numerical function.*

The total ordering is really nice because it means we can move from the messy world of outcomes to the neat world of real numbers, whose values are *probabilistically relevant*. If we move from total ordering to partial ordering, then we are no longer able to make probabilistic judgments based only on the utilities.

If you have some multidimensional utility function, and a way to determine your probabilistic preferences between any uncertain gamble between outcomes x and y and a certain outcome z, then I believe you should be able to find the real function that expresses those probabilistic preferences, and that's your unidimensional utility function. If you don't have that way to determine your preferences, then you'll be indecisive, which is not something we like to build in to our decision theories.

Tiiba, keep in mind that to an altruist with a bounded utility function, or with any other of Peter's caveats, it may not "make perfect sense" to hand over the five dollars. So the problem is solvable in a number of ways; the challenge is to come up with a solution that (1) isn't a hack and (2) doesn't create more problems than it solves.

Anyway, like most people, I'm not a complete utilitarian altruist, even at a philosophical level. Example: if an AI complained that you take up too much space and are mopey, and offered to kill you and replace you with two happy midgets, I would feel no guilt about refusing the offer, even if the AI could guarantee that overall utility would be higher after the swap.

Though, if the AI is a true utilitarian, why must it kill you in order to make the midgets? Aren't there plenty of asteroids that can be nanofabricated into midgets instead?

Candidate for weirdest sentence ever uttered: "Aren't there plenty of asteroids that can be nanofabricated into midgets instead?"

*That's a remarkable level of resilience for a brain design which is, speaking professionally, a damn ugly mess.*

...with vital functions inherited from reptiles. But it's been tested to death through history, serious failures thrown out at each step, and we've lots of practical experience and knowledge about how and why it fails. It wasn't built and run first go with zero unrecoverable errors.

I'm not advocating using evolutionary algorithms or modeling from the human brain like Ray Kurzweil. I just mean I'd allow for unexpected breakdowns in any part of the system, however much you trust it. At least enough so if it fails it fails safe.

That's only my opinion, and it shouldn't be taken too seriously as I don't have much knowledge in the field at this time, but I thought I should explain what I meant.

I think that if you consider that the chance of a threat to cause a given amount of disutility being valid is a function of the amount of disutility then the problem mostly goes away. That is, in my experience any threat to cause me X units of disutility where X is beyond some threshold is less than 1/10 as credible as a threat to cause me 1 unit of disutility. If someone threatened to kill another person unless I gave them $5000 I would be worried. If they threatened to kill 10 people I would be very slightly less worried. If they threatened to kill 1000 people I would be roughly 10 times less worried. If they threatened to kill 1,000,000 people I wouldn't pay any attention at all. Taking these data points and extrapolating I form the heuristic that the chance of someone threatening me with X units of disutility over a threshold based on how much they are demanding and whether I can fulfill that demand decreases faster than linearly.
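The heuristic can be sketched with a made-up credibility curve. The `base` rate, the `threshold`, and the inverse-square falloff are all assumptions picked for illustration; they are not fitted to the data points above.

```python
def credibility(x, base=0.01, threshold=10):
    """Hypothetical chance that a threat of x units of disutility is real.
    Flat up to `threshold`, then falling off like 1 / x**2."""
    if x <= threshold:
        return base
    return base * (threshold / x) ** 2

def expected_disutility(x):
    """Size of the threat weighted by how believable it is."""
    return x * credibility(x)

threats = [1, 10, 1_000, 1_000_000]
expected = [expected_disutility(x) for x in threats]
```

Because credibility falls faster than the threat grows, the expected disutility peaks at the threshold and shrinks afterwards, so a bigger number makes the mugger less worth paying, not more.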

*Nothing could possibly be that weak.*

On the contrary, I think it is not only that weak but actually far weaker. If you are willing to consider the existence of things like 3^^^3 units of disutility without considering the existence of chances like 1/4^^^4 then I believe that is the problem that is causing you so much trouble.

"Odd, I've been reading moral paradoxes for many years and my brain never crashed once, nor have I turned evil."

Even if it hasn't happened to you, it's quite common: think about how many people under Stalin had their brains programmed to murder and torture. Looking back and seeing how your brain could have crashed is *scary*, because it isn't particularly improbable; it almost happened to me, more than once.

g: *killing 3^^^^^3 puppies doesn't feel much worse than killing 3^^^^3 puppies*

...

..........................

I hereby award G the All-Time Grand Bull Moose Prize for Non-Extensional Reasoning and Scope Insensitivity.

Clough: *On the contrary, I think it is not only that weak but actually far weaker. If you are willing to consider the existence of things like 3^^^3 units of disutility without considering the existence of chances like 1/4^^^4 then I believe that is the problem that is causing you so much trouble.*

I'm certainly willing to consider the existence of chances like that, but to arrive at such a calculation, I can't be using Solomonoff induction.

Consider the plight of the first nuclear physicists, trying to calculate whether an atomic bomb could ignite the atmosphere. Yes, they had to do this calculation! Should they have not even bothered, because it would have killed so many people that the prior probability must be very low? The essential problem is that the universe doesn't care one way or the other and therefore events *do not in fact* have probabilities that diminish with increasing disutility.

Likewise, physics does not contain a clause prohibiting comparatively small events from having large effects. Consider the first replicator in the seas of ancient Earth.

Tiiba: *You don't want an AI to think like this because you don't want it to kill you. Meanwhile, to a true altruist, it would make perfect sense.*

So you're biting the bullet and saying that, faced with a Pascal's Mugger, you should give him the five dollars?

Would any commenters care to mug Tiiba? I can't quite bring myself to do it, but it needs doing.

Krishnaswami: *Utility functions have to be bounded basically because genuine martingales screw up decision theory -- see the St. Petersburg Paradox for an example.*

One deals with the St. Petersburg Paradox by observing that the resources of the casino are finite; it is not necessary to bound the utility function itself when you can bound the game within your world-model.
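As a sketch of that resolution: in the usual version of the game the pot doubles each round (2, 4, 8, ...) and round k is reached with probability 1/2^k, so every round contributes 1 to the expectation. Capping the payout at the casino's bankroll makes the sum converge. The bankroll figure and the truncation at `max_rounds` are assumptions for illustration.

```python
from fractions import Fraction

def capped_st_petersburg(bankroll, max_rounds=200):
    """Expected payout when the casino can never pay more than `bankroll`.

    The sum is truncated at `max_rounds`, so the result is a (very tight)
    lower bound on the true expectation.
    """
    total = Fraction(0)
    for k in range(1, max_rounds + 1):
        payout = min(2**k, bankroll)
        total += Fraction(1, 2**k) * payout
    return total

# A casino with 2**30 (about a billion) units to pay out.
value = capped_st_petersburg(bankroll=2**30)
```

The first 30 rounds contribute 1 each and the capped tail contributes just under 1 in total, so the fair price of the game is about 31 units rather than infinite.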

If you believe in the many worlds interpretation of quantum mechanics, you have to discount the utility of each of your future selves by his measure, instead of treating them all equally. The obvious generalization of this idea is for the altruist to discount the utility he assigns to other people by their measures, instead of treating them all equally.

But instead of using the QM measure (which doesn't make sense "outside the Matrix"), let the measure of each person be inversely related to his algorithmic complexity (his personal algorithmic complexity, which is equal to the algorithmic complexity of his universe plus the amount of information needed to locate him within that universe), and the problem is solved. The utility of a Turing machine can no longer grow much faster than its prior probability shrinks, since the sum of measures of people computed by a Turing machine can't be larger than its prior probability.
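One way to spell out the last claim, as a sketch under an added assumption: suppose each person $i$ computed by machine $M$ is located by a prefix-free code of length $\ell_i$, so that the person's complexity is $K(M) + \ell_i$ and the measure is $\mu(i) = 2^{-(K(M) + \ell_i)}$. The symbols $\mu$, $\ell_i$, and $K(M)$ are notation introduced here, not taken from the comment.

```latex
\sum_{i \in M} \mu(i)
  = \sum_{i \in M} 2^{-(K(M) + \ell_i)}
  = 2^{-K(M)} \sum_{i \in M} 2^{-\ell_i}
  \le 2^{-K(M)},
\qquad \text{since } \sum_{i \in M} 2^{-\ell_i} \le 1 \text{ (Kraft inequality).}
```

If, in addition, each person's utility is bounded in magnitude by some $u_{\max}$ (a further assumption), the measure-weighted utility of everything $M$ computes is at most $u_{\max} \cdot 2^{-K(M)}$, however many people it contains.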

But there is another puzzle/paradox with Solomonoff induction that I don't know how to solve. I've written about it at http://groups.google.com/group/everything-list/browse_frm/thread/c7442c13ff1396ec/. Eliezer, do you think it would be suitable for a blog post here?

Wei, would it be correct to say that, under your interpretation, if our universe initially contains 100 super happy people, that creating one more person who is "very happy" but not "super happy" is a net negative, because the "measure" of all the 100 super happy people gets slightly discounted by this new person?

It's hard to see why I would consider this *the right thing to do* - where does this mysterious "measure" come from?

*Eliezer, do you think it would be suitable for a blog post here?*

Mm... sure. "Bias against uncomputability."

That's a much more general problem, the problem of whether to use sums or averages in utility calculations with changing population size.

"Would any commenters care to mug Tiiba? I can't quite bring myself to do it, but it needs doing."

If you don't donate $5 to SIAI, some random guy in China will die of a heart attack because we couldn't build FAI fast enough. Please donate today.

That's not a proper mugging.

"If you don't donate $5 to SIAI, the entire multiverse will be paperclip'd because we couldn't build FAI before uFAI took over."

Eli,

I agree that G's reasoning is an example of scope insensitivity. I suspect you meant this as a criticism. It seems undeniable that scope insensitivity leads to some irrational attitudes (e.g. when a person who would be horrified at killing one human shrugs at wiping out humanity). However, it doesn't seem obvious that scope insensitivity is pure fallacy. Mike Vassar's suggestion that "we should consider any number of identical lives to have the same utility as one life" seems plausible. An extreme example is, what if the universe were periodic in the time direction so that every event gets repeated infinitely. Would this mean that every decision has infinite utility consequence? It seems to me that, on the contrary, this would make no difference to the ethical weight of decisions. Perhaps somehow the utility binds to the information content of a set of events. Presumably, the total variation in experiences a puppy can have while being killed would be exhausted long before reaching 3^^^^^3.

Vann McGee has proven that if you have an agent with an unbounded utility function and who thinks there are infinitely many possible states of the world (ie, assigns them probability greater than 0), then you can construct a Dutch book against that agent. Next, observe that anyone who wants to use Solomonoff induction as a guide has committed to infinitely many possible states of the world. So if you also want to admit unbounded utility functions, you have to accept rational agents who will buy a Dutch book.

And if you do that, then the subjectivist justification of probability theory collapses, taking Bayesianism with it, since that's based on non-Dutch-book-ability.

I think the cleanest option is to drop unbounded utility functions, since they buy you *zero* additional expressive power. Suppose you have an event space S, a preference relation P, and a utility function f from events to nonnegative real numbers such that if s1 P s2, then f(s1) < f(s2). Then, you can easily turn this into a bounded utility function g(s) = f(s)/(f(s) + 1). It's easily seen that g respects the preference relation P in exactly the same way as f did, but is now bounded to the interval [0, 1).
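The transformation g(s) = f(s)/(f(s) + 1) can be checked numerically. The sketch below uses a handful of made-up utility values (they are illustrative assumptions, not from the comment) and exact rational arithmetic, so that very large values do not round to 1.

```python
from fractions import Fraction
from itertools import combinations


def bound(f_value: Fraction) -> Fraction:
    """Map a nonnegative utility f(s) to g(s) = f(s) / (f(s) + 1)."""
    return f_value / (f_value + 1)


# Hypothetical utilities, spanning many orders of magnitude.
sample_utilities = [
    Fraction(0),
    Fraction(1, 2),
    Fraction(5),
    Fraction(10 ** 6),
    Fraction(10 ** 100),
]

bounded = [bound(u) for u in sample_utilities]

# The ordering of any two outcomes is the same under f and under g.
order_preserved = all(
    (a < b) == (bound(a) < bound(b))
    for a, b in combinations(sample_utilities, 2)
)

print(order_preserved)                     # prints True
print(all(0 <= g < 1 for g in bounded))    # prints True
```

This only checks the ordering of individual outcomes. Whether lotteries over outcomes are ranked the same way by expected g as by expected f is a separate question that the sketch does not address.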

G,

I was essentially agreeing with you that killing 3^^^^^3 vs 3^^^^3 puppies may not be ethically distinct. I would call this scope insensitivity. My suggestion was that scope insensitivity is not necessarily always unjustified.

Eliezer, creating another person in addition to 100 super happy people does not reduce the measures of those 100 super happy people. For example, suppose those 100 super happy people are living in a classical universe computed by some TM. The minimal information needed to locate each person in this universe is just his time/space coordinate. Creating another person does not cause an increase in that information for the existing people.

Is the value of my existence steadily shrinking as the universe expands and it requires more information to locate me in space?

If I make a large uniquely structured arrow pointing at myself from orbit so that a very simple Turing machine can scan the universe and locate me, does the value of my existence go up?

I am skeptical that this solution makes moral sense, however convenient it might be as a patch to this particular problem.

*If I make a large uniquely structured arrow pointing at myself from orbit so that a very simple Turing machine can scan the universe and locate me, does the value of my existence go up?*

Yes.

Doing something like that proves you're clever enough to come up with a plan for something that's unique in all the universe, and then marshal the resources to make it happen. That's worth something.

No. He is either clever enough or not. Proving it doesn't change his value.

(I originally had a much longer comment, but it was lost in some sort of website glitch. This is the Reader's Digest version)

I think algorithmic complexity does, to a certain degree, usefully represent what we value about human life: uniqueness of experience, depth of character, whatever you want to call it. For myself, at least, I would feel fewer qualms about Matrix-generating 100 atom-identical Smiths and then destroying them, than I would generating 100 individual, diverse people who each had different personalities, dreams, judgements, and feelings. It even captures the basic reason, I think, behind scope insensitivity; namely, that we see the number on paper as just a faceless mob of many, many, identical people, so we have no emotional investment in them as a group.

On the other hand, I had a bad feeling when I read this solution, which I still have now. Namely, it solves the dilemma, but not at the point where it's problematic; we can immediately tell that there's something wrong with handing over five bucks when we read about it, and it has little to do with the individual uniqueness of the people in question. After all, who should you push from the path of an oncoming train: Jaccqkew'Zaa'KK, The Uniquely Damaged Sociopath (And Part-Time Rapist), or a hard-working, middle-aged, balding office worker named Fred Jones?

Are you replying to the correct comment? If so, I don't understand what you mean, but I'm pretty sure Jaccqkew'Zaa'KK goes under the train. Which is a tragedy if he just has cruel friends who give terrible nicknames.

I'm replying to Atorm's disputation of Strange7's response to Eliezer's response to Wei Dai's idea about using algorithmic complexity as a moral principle as a solution to the Pascal's Mugging dilemma. If I got that chain wrong and I'm responding to some completely different discussion, then I apologize for confusing everyone and it would be nice if you could point me to the thread I'm looking for. :)

(And yes, Jaccqkew'Zaa'KK goes under the train, and he really is a sociopathic rapist; I was using that thought experiment as an example of a situation where the algorithmic complexity rule doesn't work.)

Regarding your second paragraph: which solution are you referring to? I see no mention of five bucks anywhere in this conversation.

Sorry if I was unclear, since I was jumping around a bit; five bucks is the cash demanded by the "mugger" in the original post.

Stephen, you can't have been agreeing with me about that since I didn't say it, even though for some reason I don't understand (perhaps I was very unclear, but I don't see how) Eliezer chose to interpret me doing so and indeed going further to say that it *isn't* ethically distinct.

Random question:

The number of possible Turing machines is countable. Given a function that maps the natural numbers onto the set of possible Turing machines, one can construct a Turing machine that acts like this:

If machine #1 has not halted, simulate the execution of one instruction of machine #1

If machine #2 has not halted, simulate the execution of one instruction of machine #2

If machine #1 has not halted, simulate the execution of one instruction of machine #1

If machine #3 has not halted, simulate the execution of one instruction of machine #3

If machine #2 has not halted, simulate the execution of one instruction of machine #2

If machine #1 has not halted, simulate the execution of one instruction of machine #1

etc.

This Turing machine, if run, would eventually make all possible computations. (One could even run a program like this on a real, physical computer, subject to memory and time limitations.) Does running such a program have any ethical implications? If running a perfect simulation of a reality is essentially the same as creating that reality, would running this program for a long enough period of time actually cause all possible computable universes to come into existence? Does the existence of this program have any implications for the hypothesis that "our universe is a computer simulation being run in another universe?"
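The interleaving described above (often called dovetailing) can be sketched in a few lines. The `toy_machine` function is a hypothetical stand-in, since enumerating real Turing machines is beyond a short example: here machine #n simply runs n steps and then halts.

```python
from itertools import islice
from typing import Iterator, Set


def dovetail_schedule() -> Iterator[int]:
    """Yield machine numbers in the order 1, 2, 1, 3, 2, 1, 4, 3, 2, 1, ..."""
    round_number = 1
    while True:
        for machine in range(round_number, 0, -1):
            yield machine
        round_number += 1


def toy_machine(n: int) -> Iterator[int]:
    """Hypothetical stand-in for machine #n: runs n steps, then halts."""
    for step in range(n):
        yield step


def run_dovetailed(total_slots: int) -> Set[int]:
    """Run the schedule for `total_slots` slots; return the machines seen to halt."""
    machines = {}
    halted = set()
    for index in islice(dovetail_schedule(), total_slots):
        if index in halted:
            continue  # "if machine #index has not halted" fails, so skip it
        if index not in machines:
            machines[index] = toy_machine(index)
        try:
            next(machines[index])  # simulate one instruction
        except StopIteration:
            halted.add(index)
    return halted


print(list(islice(dovetail_schedule(), 6)))  # prints [1, 2, 1, 3, 2, 1]
```

Every machine number appears in the schedule infinitely often, which is why any computation that halts after finitely many steps is eventually completed, while non-halting machines never block the others.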

I've long felt that simulations are NOT the same as actual realities, though I can't precisely articulate the difference.

*I've long felt that simulations are NOT the same as actual realities, though I can't precisely articulate the difference.*

One of them has some form of computational device on the outside. One of them doesn't. Does there need to be more difference than that? ie. If you *want* to treat them differently and if some sort of physical distinction between the two is possible then by all means consider them different based on that difference.

The answer seems fairly simple under modal realism (roughly, the thesis that all logically possible worlds exist in the same sense as mathematical facts exist, and thus that the term "actual" in "our actual world" is just an indexical).

If the simulation accurately follows a possible world, and contains a unit of (dis)utility, it doesn't *generate* that unit of (dis)utility, it just "discovers" it; it proves that for a given world-state an event happens which your utility function assigns a particular value. Repeating the simulation again is also only rediscovering the same fact, not in any sense creating copies of it.

As others have basically said:

Isn't the point essentially that we believe the man's statement is uncorrelated with any moral facts? I mean if we did, then it's pretty clear we can be morally forced into doing something.

Is it reasonable to believe the statement is uncorrelated with any facts about the existence of many lives? It seems so, since we have no substantial experience with "Matrices", people from outside the simulation visiting us, 3^^^^^^3, the simulation of moral persons, etc...

Consider, the statement 'there is a woman being raped around the corner'. We are morally obliged to look around the corner. We have no more direct proof of the truth of this statement than of Pascal's mugger's statement. But we have good reason to believe the statement is correlated with a fact in one case, but no such reason in the other.

Can a machine be made that will consistently give zero correlation to this sort of thing? Hell if I know. Probably no, since if you iterate the known enough you get the absurd. But any claim that the conditional probability of 3^^^^3 lives being simulated and destroyed is 1/(3^^^^^3) or something is a pile of horseshit.

Eliezer, you can interpret rocks as minds if you make the interpretation complex enough. Why do you ignore these rock-minds if not because you discount them for algorithmic complexity?

First, questions like "if the agent expects that I wouldn't be able to verify the extreme disutility, would its utility function be such as to actually go through spending the resources to cause the unverifiable disutility?"

That an entity with such a utility function would exist and manage to stick around long enough in the first place may itself drop the probabilities by a whole lot.

Perhaps best to restrict ourselves to the case of the disutility being verifiable, but only after the fact (has this agent ever pulled this sort of thing before? etc.), and where that verification doesn't open in the present a causal link allowing for other means of preventing the disutility. There's a lot going on here.

I'm not sure, but maybe the reasoning would go not so much for the single specific case, but the process would reason by computing the expected utility of following a rule which would result in it being utterly vulnerable to any agent that merely claims to be capable of causing bignum units of disutility.

Something reasoning along the lines of following such a rule would allow agents in general to order the process to cause plenty of disutility. And that, in itself, would seem to have plenty of expected disutility.

However, *if* after chugging through the math, it didn't balance out and still the expected disutility from the existence of the disutility threat was greater, then perhaps allowing oneself to be vulnerable to such threats is genuinely the correct outcome, however counterintuitive and absurd it would seem to us.

Eliezer> Is the value of my existence steadily shrinking as the universe expands and it requires more information to locate me in space?

Yes, but the value of everyone else's existence is shrinking by the same factor, so it doesn't disturb the preference ordering among possible courses of actions, as far as I can see.

Eliezer> If I make a large uniquely structured arrow pointing at myself from orbit so that a very simple Turing machine can scan the universe and locate me, does the value of my existence go up?

This is a more serious problem for my proposal, but the conspicuous arrow also increases the values of everyone near you by almost the same factor, so again perhaps it doesn't make as much difference as you expect.

Eliezer> I am skeptical that this solution makes moral sense, however convenient it might be as a patch to this particular problem.

I'm also skeptical, but I'd say it's more than just a patch to this particular problem. Treating everyone as equals no matter what their measures are, besides leading to counterintuitive results in this "Pascal's Mugging" thought experiment, is not even mathematically sound, since the sum of the small probabilities multiplied by the vast utilities does not converge to any finite value, no matter what course of action you choose.

The mathematics says that you *have* to discount each person's value by some function, otherwise your expected utilities won't converge. The only question is which function. Using the inverse of a person's algorithmic complexity seems to lead to intuitive results in many situations, but not all.
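A toy illustration of the convergence point, with loudly assumed numbers: suppose hypothesis k has prior 2^-k, and the utility it claims is 4^k when undiscounted, or at most 1 once people are weighted by some discounting measure. Neither growth rate comes from the thread. They are only chosen so that utility outruns the prior in the first case and not in the second.

```python
from fractions import Fraction
from typing import Callable, List


def partial_sums(term: Callable[[int], Fraction], n_terms: int) -> List[Fraction]:
    """Running totals of term(1) + term(2) + ... + term(n_terms)."""
    total = Fraction(0)
    sums = []
    for k in range(1, n_terms + 1):
        total += term(k)
        sums.append(total)
    return sums


def undiscounted_term(k: int) -> Fraction:
    """Assumed prior 2**-k times an assumed claimed utility of 4**k."""
    return Fraction(1, 2 ** k) * 4 ** k


def discounted_term(k: int) -> Fraction:
    """Same prior, but the discounted utility is assumed to be at most 1."""
    return Fraction(1, 2 ** k) * 1


print(partial_sums(undiscounted_term, 10)[-1])  # prints 2046
print(partial_sums(discounted_term, 10)[-1])    # prints 1023/1024
```

In the first series each term is larger than the last, so the expected utility has no finite value. In the second the partial sums stay below 1. Which discounting function to use is exactly the open question raised above, and the sketch takes no position on it.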

But I'm also open to the possibility that this entire approach is wrong... Are there other proposed solutions that make more sense to you at the moment?

I'll respond to a couple of other points I skipped over earlier.

Eliezer> It's hard to see why I would consider this the right thing to do - where does this mysterious "measure" come from?

Suppose you plan to measure the polarization of a photon at some future time and thereby split the universe into two branches of unequal weight. You do not treat people in these two branches as equals, but instead value the people in the higher-weight branch more, right? Can you answer why you consider that to be the right thing to do? That's not a rhetorical question, btw. If I knew the answer to that question I think I'd also know why discounting people by algorithmic complexity (or some other function) might be the right thing to do.

Stephen> Mentioning quantum mechanics serves only as a distraction.

In classical physics, the universe doesn't branch, but instead everything is predetermined by the starting conditions and laws of physics. There is no issue of people in unequal-weight branches, which I think might be analogous to people with different algorithmic complexities. That's why I brought up QM.

Maybe the origin of the paradox is that we are extending the principle of maximizing expected return beyond its domain of applicability. Unlike Bayes formula, which is an unassailable theorem, the principle of maximizing expected return is perhaps just a model of rational desire. As such it could be wrong. When dealing with reasonably high probabilities, the model seems intuitively right. With small probabilities it seems to be just an abstraction, and there is not much intuition to compare it to. When considering a game with positive expected return that comes from big payoffs and small probabilities, it reduces to the intuitive case if we have the opportunity to play the game many times, on the order of one over the payoff probability. This type of frequentist argument seems to be where the principle comes from in the first place. However, if the probabilities are so small that there is no possibility of playing the game that many times, then maybe a rational person just ignores it rather than dutifully investing in an essentially certain loss. Of course, if we relegate the principle of maximizing expected return to being just a limiting case, this leaves open the question of what more general model underlies it.

G: Sorry to put words in your mouth.

Wei: *You do not treat people in these two branches as equals, but instead value the people in the higher-weight branch more, right? Can you answer why you consider that to be the right thing to do?*

Robin Hanson's guess about mangled worlds seems very elegant to me, since it means that I can run a (large) computer with conventional quantum mechanics programmed into it, no magic in its transistors, and the resulting simulation will contain sentient beings who experience the same probabilities we do.

Even so, I'd have to confess myself confused about why I find myself in a simple universe rather than a noisy one.

How come we keep talking about mangled worlds and multiverses... when the Bohm interpretation actually derives the Born probabilities as a stable equilibrium of the quantum potential? In one theory, we have this mysterious thing that no one is sure how to solve... and in the other theory, we have a solution right in front of us. Also, Bohmian mechanics, while nonlocal, does not require us to believe in mysterious inaccessible universes where our measurements turned out differently.

Not all infinities are equal, there exists a hierarchy. Look at real numbers versus integers.

kthxbye

Stephen, no problem. Incidentally, I share your doubt about the optimality of optimizing *expected* utility (though I wonder whether there might be a theorem that says anything coherent can be squeezed into that form).

CC, indeed there are many infinities (not merely infinitely many, not merely more than we can imagine, but more than we can describe), but so what? Any sort of infinite utility, coupled with a nonzero finite probability, leads to the sort of difficulty being contemplated here. Higher infinities neither help with this nor make it worse, so far as I can see. (I suppose it's worth considering that it might conceivably make sense for an agent's utilities to live in some structure "richer" than the usual real numbers, like Conway's surreal numbers, where there are infinities and infinitesimals aplenty. But I think there are technical difficulties with this sort of scheme; for instance, doing calculus over the surreals is problematic. And of course we actually only have finite brains, so whatever utilities we have are presumably representable in finite terms even if they feature incommensurabilities of the sort that might be modelled in terms of something like the surreal numbers. But all this is a separate issue.)

I have a paper which explores the problem in a somewhat more general way (but see especially section 6.3).

Infinite Ethics: http://www.nickbostrom.com/ethics/infinite.pdf

People have been talking about assuming that states with many people hurt have a low (prior) probability. It might be more promising to assume that states with many people hurt have a low *correlation* with what any random person claims to be able to effect.

Eliezer, I think Robin's guess about mangled worlds is interesting, but irrelevant to this problem. I'd guess that for you, P(mangled worlds is correct) is much smaller than P(it's right that I care about people in proportion to the weight of the branches they are in). So Robin's idea can't explain why you think that is the right thing to do.

Nick, your paper doesn't seem to mention the possibility of discounting people by their algorithmic complexity. Is that an option you considered?

*Pascal's wager type arguments fail due to their symmetry (which is preserved in finite cases).*

Even if our priors are symmetric for equally complex religious hypotheses, our posteriors almost certainly won't be. There's too much evidence in the world, and too many strong claims about these matters, for me to imagine that posteriors would come out even. Besides, even if two religions are equally probable, there may certainly be non-epistemic reasons to prefer one over the other.

*However, if after chugging through the math, it didn't balance out and still the expected disutility from the existence of the disutility threat was greater, then perhaps allowing oneself to be vulnerable to such threats is genuinely the correct outcome, however counterintuitive and absurd it would seem to us.*

I agree. If we really trust the AI doing the computations and don't have reason to think that it's biased, and if the AI has considered all of the points that have been raised about the future consequences of showing oneself vulnerable to Pascalian muggings, then I feel we should go along with the AI's conclusion. 3^^^^3 people is too many to get wrong, and if the probabilities come out asymmetric, so be it.

*Maybe the origin of the paradox is that we are extending the principle of maximizing expected return beyond its domain of applicability.*

In addition to a frequency argument, one can in some cases make a different argument for maximizing expected value even in one-time-only scenarios. For instance, if you knew you would become a randomly selected person in the universe, and if your only goal was to avoid being murdered, then minimizing the expected number of people murdered would also minimize the probability that you personally would be murdered. Unfortunately, arguments like this make the assumption that your utility function on outcomes takes only one of two values ("good," i.e., not murdered, and "bad," i.e., murdered); it doesn't capture the fact that being murdered in one way may be twice as bad as being murdered in another way.

Even if there is nobody currently making a bignum-level threat, maybe the utility-maximizing thing to do is to devote substantial resources to search for low-probability, high-impact events and stop or encourage them depending on the utility effect. After all, you can't say the probability of *every* possibility as bad as killing 3^^^^3 people is zero.

Nick Tarleton,

Yes, it is probably correct that one should devote substantial resources to low probability events, but what are the odds that the universe is not only a simulation, but that the containing world is *much* bigger; and, if so, does the universe just not count, because it's so small? The bounded utility function probably reaches the opposite conclusion that only this universe counts, and maybe we should keep our ambitions limited, out of fear of attracting attention.

"I find myself in a simple world rather than a noisy one."

Care to expand on that?

Robin: Great point about states with many people having low correlations with what one random person can effect. This is fairly trivially provable.

Utilitarian: Equal priors due to complexity, equal posteriors due to lack of entanglement between claims and facts.

Wei Dai, Eliezer, Stephen, g: This is a great thread, but it's getting very long, so it seems likely to be lost to posterity in practice. Why don't the three of you read the paper Neel Krishnaswami referenced, have a chat, and post it on the blog, possibly edited, as a main post?

"The paper I referenced:

Vann McGee (1999)

An airtight Dutch book

Analysis 59 (264), 257–265.

Posted by: Neel Krishnaswami | October 20, 2007 at 06:29 PM"

*It might be more promising to assume that states with many people hurt have a low correlation with what any random person claims to be able to effect.*

*Robin: Great point about states with many people having low correlations with what one random person can effect. This is fairly trivially provable.*

Aha!

For some reason, that didn't click in my mind when Robin said it, but it clicked when Vassar said it. Maybe it was because Robin specified "many people hurt" rather than "many people", or because Vassar's part about being "provable" caused me to actually look for a reason. When I read Robin's statement, it came through as just "Arbitrarily penalize probabilities for a lot of people getting hurt."

But, yes, if you've got 3^^^^3 people running around they can't *all* have sole control over each other's existence. So in a scenario where lots and lots of people exist, one has to penalize *by a proportional factor* the probability that any one person's binary decision can solely control the whole bunch.

Even if the Matrix-claimant says that the 3^^^^3 minds created will be unlike you, with information that tells them they're powerless, if you're in a generalized scenario where anyone has and uses that kind of power, the vast majority of mind-instantiations are in leaves rather than roots.

This seems to me to go right to the root of the problem, not a full-fledged formal answer but it feels right as a starting point. Any objections?

This seems intuitively plausible.

The more outrageous the claim, the correspondingly less plausible is their ability to pull it off.

Especially when you evaluate the amount of resources they are demanding vs the number of resources that you would expect their implausibly difficult plan would require to be achieved.

That's not the point. None of those probabilities are as strong as 3^^^3. Maybe big, but not THAT big.

The point is that no more than 1/3^^^3 people have sole control over the life or death of 3^^3 people. This improbability, that you would be one of those very special people, IS big enough.

(This answer fails unless your ethics and anthropics use the same measure. That's how the pig example works.)

*Even if the Matrix-claimant says that the 3^^^^3 minds created will be unlike you, with information that tells them they're powerless, if you're in a generalized scenario where anyone has and uses that kind of power, the vast majority of mind-instantiations are in leaves rather than roots.*

*The point is that no more than 1/3^^^3 people have sole control*

I was about to express mild amusement about how cavalier we are with jumping to, from and between numbers like 3^^^^3 and 3^^^3. I had to squint to tell the difference. Then it occurred to me that:

*The point is that no more than 1/3^^^3 people have sole control over the life or death of 3^^3 people. This improbability, that you would be one of those very special people, IS big enough.*

3^^3 is not even unimaginably big, Knuth arrows or no. It's about 1/5th the number of people that can fit in the MCG.

Being cavalier with proofreading =/= being cavalier with number size.

But that is indeed amusing.

*Being cavalier with proofreading =/= being cavalier with number size.*

Well, I didn't want to declare a proofreading error because 3^^^3 does technically fit correctly in the context, even if you may not have meant it. ;)

I was thinking the fact that we are so cavalier makes it easier to slip between them if not paying close attention. Especially since 3^^^3 is more commonly used than 3^^^^3. I don't actually recall Eliezer going beyond pentation elsewhere.

I know if I go that high I tend to use 4^^^^4. It appeals more aesthetically and is more clearly distinct. Mind you it isn't nearly as neat as 3^^^3 given that 3^^^3 can also be written and visualized conceptually as 3 -> 3 -> 3 while 4^^^^4 is just 4 -> 4 -> 4 not 4 -> 4 -> 4 -> 4.

So you're saying that the implausibility is that I'd run into a person that just happened to have that level of "power" ?

Is that different in kind to what I was saying?

If I find it implausible that the person I'm speaking to can actually do what they're claiming, is that not the same as it being implausible that I happen to have met a person that can do what this person is claiming? (leaving aside the resource-question which is probably just my rationalisation as to why I think he couldn't pull it off).

Basically I'm trying to taboo the actual BigNum... and trying to fit the concepts around in my head.

It's implausible that *you're* the person with that power. We could easily imagine a world in which everyone runs into a single absurdly powerful person. We could not imagine a world in which everyone was absurdly powerful (in their ability to control other people), because then multiple people would have control over the same thing.

If you knew that he had the power, but that his action wasn't going to depend on yours, then you wouldn't give him the money. So you're only concerned with the situation where you have the power.

Ok, sure thing. I get what you're saying. I managed to encompass that implausibility also into the arguments I made in my restatement anyway, but yeah, I agree that these are different kinds of "unlikely thing"

In fact... let me restate what I think I was trying to say.

The mugger is making an extraordinary claim. One for which he has provided no evidence.

The amount of evidence required to make me believe that his claim is possible grows in proportion to the size of his claim.

Think about it at the lower levels of potential claims.

1) If he claimed to be able to kill one person - I'd believe that he was capable of killing one person. I'd then weigh that against the likelihood that he'd pick *me* to blackmail, and the low blackmail amount that he'd picked... and consider it more likely that he's lying to make a fast buck, than that he actually has a hostage somewhere ready to kill.

2) If he claimed to be able to kill 3^3 people, I'd consider it plausible... with a greatly diminished likelihood. I'd have to weigh the evidence that he was a small-time terrorist, willing to take the strong risk of being caught while preparing to blow up a building's worth of people... or to value his life so low as to actually do it and die in the process. It's not very high, but we've all seen people like this in our lifetime both exist and carry out this threat. So it's "plausible but extremely unlikely".

The likelihood that I've: a) happened to run into one of these rare people and b) that he'd pick *me* (pretty much a nobody) to blackmail combine to be extremely unlikely... and I'd reckon that those two, balanced against the much higher prior likelihood that he's just a con-artist, would fairly well cancel out against the actual value of a building's worth of people.

Especially when you consider that the resources to do this would far outweigh the money he's asked for. As far as I know about people willing to kill large numbers of people - *most* of them do it for a reason, and that reason is almost never a paltry amount of cash. It's still possible... after all the school-killers have done crazy stunts to kill people for a tiny reason... but usually there's fame or revenge involved... not blackmail of a nobody.

3) So now we move to 3^^3 people. Now, I personally have never seen that many die in one sitting (or even as the result of a single person)... but my Grandfather did, and using technology from 65 years ago.

It is plausible, though even less likely than before, that the person I've just run into happens to be willing and able to use a nuke on a large city, or to have the leadership capabilities (and luck) required to take over a country and divert its resources to killing that number of people.

I would consider it exponentially less likely that he'd pick *me* to blackmail about this... and certainly not for such a pitiful amount of cash. People that threaten this kind of thing are either after phenomenal amounts of money, recognition or some kind of political or religious statement.... they are extremely unlikely to find a random citizen to blackmail for a tiny amount of cash. The likelihood that this is a con seems about as high as the number of people to potentially die.

4) Now we hit the first real BigNum. AFAIK, the world has never seen 3^^^3 sentient intelligences ever die in one sitting. We don't have that many people on the Earth right now. Maybe the universe has seen it somewhere... some planetary system wiped out in a supernova. It's plausible... but now think of the claims the guy is making:

a) that he can create (or knows of) a civilisation that contains that number of sentient beings.

b) that he (and he alone) has the ability to destroy that civilisation, and can do so at whim and that

c) it's worthwhile him doing so for the mere pittance he's demanding from a complete, unrelated nobody... (or potentially the whim of watching said nobody squirm).

I actually think that the required (and missing) evidence for his outrageous claims stacks fairly evenly against the potential downside of his claims actually being true.

So, to get back to the original point: In my mind, as each step grows exponentially more extreme, so does the evidence required to support such a ludicrous claim. These two cancel out roughly evenly, leaving the leftovers of "is he likely to have picked me?" and other smaller probabilities to actually sway the balance.

Those, together with the large disutility of "encouraging the guy to do it again", would sway me to choose not to give him £5, but to walk away, then immediately find the nearest police officer...

3+3, 3*3, 3^3, 3^^3, 3^^^3, etc. grows much faster than exponentially. a^b, for any halfway reasonable a and b, can't touch 3^^^3

3^^^3=3^^(3^^3)=3^^(7625597484987)=3^(3^^(7625597484986))

It's not an exponential, it's a huge, huge tower of exponentials. It is simply too big for that argument to work.
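The recursion behind the notation can be sketched in a few lines. This is only an illustration: the function name `up_arrow` is made up here, and it is usable only for tiny inputs, since 3^^^3 itself is far beyond anything a real machine could evaluate.

```python
def up_arrow(a: int, n: int, b: int) -> int:
    """Knuth's up-arrow: a followed by n arrows, then b (n >= 1, b >= 0)."""
    if n == 1:
        return a ** b          # one arrow is ordinary exponentiation
    if b == 0:
        return 1               # base case for every higher operation
    # a ^(n) b = a ^(n-1) (a ^(n) (b - 1))
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

print(up_arrow(3, 1, 3))  # 27
print(up_arrow(3, 2, 3))  # 7625597484987
print(up_arrow(2, 3, 3))  # 65536
# up_arrow(3, 3, 3) would need a tower of 7625597484987 threes; do not try it.
```

Each extra arrow iterates the previous operation, which is why no fixed exponential a^b keeps pace.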

Yes, I should not have used the word exponential... but I don't know the word for "grows at a rate that is a tower of exponentials"... "hyperexponential" perhaps?

However - I consider that my argument still holds. That the evidence required grows *at the same rate* as the size of the claim.

The evidence must be of equal value to the claim.

(from "extraordinary claims require extraordinary evidence")

My point in explaining the lower levels is that we don't demand evidence from most claimants of small amounts of damage because we've already seen evidence that these threats are plausible. But if we start getting to the "hyperexponential" threats, we hit a point where we suddenly realise that there is no evidence supporting the plausibility of the claim... so we automatically assume that the person is a crank.

*3) So now we move to 3^^3 people. Now, I personally have never seen that many die in one sitting (or even as the result of a single person)... but my Grandfather did, and using technology from 65 years ago.*

3^^3 is a thousand times larger than the number of people currently alive.

oops, yes I mixed up 3^^3 with 3^^^3

Ok, so skip step 3 and move straight on to 4 ;)

The point is that no more than 1/3^^^3 people have sole control over the life or death of 3^^3 people. This improbability, that you would be one of those very special people, IS big enough.

(This answer fails unless your ethics and anthropics use the same measure. That's how the pig example works.)

So can we solve the problem by putting some sort of upper bound on the degree to which ethics and anthropics can differ, along the lines of "creation of 3^^^^3 people is at most N times less probable than creation of 3^^^^3 pigs, so across the ensemble of possible worlds the prior against your being in a position to influence that many pigs still cuts down the expected utility from something vaguely like 3^^^^3 to something vaguely like N"?

Is that a general solution? What about this: "Give me five dollars or I will perform an action, the disutility of which will be equal to twice that of you giving me five dollars, multiplied by the reciprocal of the probability of this statement being true."
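A small sketch of why this variant is awkward for a naive expected-utility maximiser, assuming (purely for illustration) that utility is linear in dollars. The function name is hypothetical.

```python
from fractions import Fraction

def expected_loss_if_refused(p: Fraction, payment: int = 5) -> Fraction:
    """Expected disutility of refusing, for a claim believed with probability p.

    The threat is defined as twice the payment, multiplied by 1/p.
    """
    threatened_disutility = 2 * payment / p
    return p * threatened_disutility

for p in (Fraction(1, 2), Fraction(1, 10**6), Fraction(1, 10**100)):
    print(expected_loss_if_refused(p))  # 10 every time
```

Whatever probability is assigned, the threat rescales itself so that refusing always looks twice as costly as paying.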

**[deleted]**· 2012-06-07T18:54:22.494Z · score: 1 (1 votes) · LW · GW

Well, I'd rather lose twenty dollars than be kicked in the groin very hard, and the probability of you succeeding in doing that given you being close enough to me and trying to do so is greater than 1/2, so...

But anthropically, since you exist within the matrix, and so does he, and hostages outside the matrix cannot reach you to make such an offer ...

You don't *have* 3^^^^3 people "running around". You have the population of earth running around, plus one matrix lord.

More to the point, if you create 3^^^^3 people, surely a LOT of them are going to be identical, purely by coincidence? Aren't you double-counting most of them?

Robin's anthropic argument seems pretty compelling in this example, now that I understand it. It seems a little less clear if the Matrix-claimant tried to mug you with a threat not involving many minds. For example, maybe he could claim that there exists some giant mind, the killing of which would be as ethically significant as the killing of 3^^^^3 individual human minds? Maybe in that case you would anthropically expect with overwhelmingly high probability to be a figment inside the giant mind.

I think that Robin's point solves this problem, but doesn't solve the more general problem of an AGI's reaction to low probability high utility possibilities and the attendant problems of non-convergence.

The guy with the button could threaten to make an extra-planar factory farm containing 3^^^^^3 pigs instead of killing 3^^^^3 humans. If utilities are additive, that would be worse.

*The guy with the button could threaten to make an extra-planar factory farm containing 3^^^^^3 pigs instead of killing 3^^^^3 humans. If utilities are additive, that would be worse.*

Congratulations, you made my brain asplode.

Once again, vegetarians win at morality.

3^^^^^^3 copies of that brain, fates all dependent on the original pondering this thread.

All fates equal, I *think* their incentive to solve the mystery equals that for one alone.

Eliezer, what if the mugger (Matrix-claimant) also says that he is the only person who has that kind of power, and he knows there is just one copy of you in the whole universe? Is the probability of that being true less than 1/3^^^^3?

Don't dollars have an infinite expected value (in human lives or utility) anyway, especially if you take into account weird low-probability scenarios? Maybe the next mugger will make even bigger threats.

*next* mugger?
There's a distinctly high probability that *this* mugger will return with higher blackmail demands.

*Even if the Matrix-claimant says that the 3^^^^3 minds created will be unlike you, with information that tells them they're powerless, if you're in a generalized scenario where anyone has and uses that kind of power, the vast majority of mind-instantiations are in leaves rather than roots.*

You would have to abandon Solomonoff Induction (or modify it to account for these anthropic concerns) to make this work. Solomonoff Induction doesn't let you consider just "generalized scenarios"; you have to calculate each one in turn, and eventually one of them is guaranteed to be nasty.

To paraphrase Wei's example: the mugger says, "Give me five dollars, or I'll simulate and kill 3^^^^3 people, *and I'll make sure they're aware that they are at the leaf and not at the node*". Congratulations, you now have over 3^^^^3 bits of evidence (in fact, it's a tautology with probability 1) that the following proposition is true: "*if* the mugger's statement is correct, then I am the one person at the node and am not one of the 3^^^^3 people at the leaf." By Solomonoff Induction, this scenario where his statement is literally true has > 1 / 2^(10^50) probability, as it's easily describable in much less than 10^50 bits. Once you try to evaluate the utility differential of that scenario, boom, we're right back where we started.

On the other hand, you could modify Solomonoff Induction to reflect anthropic concerns, but I'm not sure it's any better than just modifying the utility function to reflect anthropic concerns.

And, of course, there's still the pig problem in either case.

Michael, your pig example threw me into a great fit of belly-laughing. I guess that's what my mind looks like when it explodes. And I recall that was Marvin Minsky's prediction in *The Society of Mind*.

*You would have to abandon Solomonoff Induction (or modify it to account for these anthropic concerns) to make this work.*

To be more specific, you would have to alter it in such a way that it accepted Brandon Carter's Doomsday Argument.

"Congratulations, you made my brain asplode."

Read http://www.spaceandgames.com/?p=22 if you haven't already. Your utility function should not be assigning things arbitrarily large additive utilities, or else you get precisely this problem (if pigs qualify as minds, use rocks), and your function will sum to infinity. If you "kill" by destroying the exact same information content over and over, it doesn't seem to be as bad, or even bad at all. If I made a million identical copies of you, froze them into complete stasis, and then shot 999,999 with a cryonics-proof Super-Plasma-Vaporizer, would this be immoral? It would certainly be less immoral than killing a million ordinary individuals, at least as far as I see it.

Wei, no I don't think I considered the possibility of discounting people by their algorithmic complexity.

I can see that in the context of Everett it seems plausible to weigh each observer with a measure proportional to the amplitude squared of the branch of the wave function on which he is living. Moreover, it seems right to use this measure both to calculate the anthropic *probability* of me finding myself as that observer and the moral *importance* of that observer's well-being.

Assigning anthropic probabilities over infinite domains is problematic. I don't know of a fully satisfactory explanation of how to do this. One natural approach to explore might be to assign some Turing-machine-based measure to each of the infinite observers. Perhaps we could assign plausible probabilities by using such an approach (although I'd like to see this worked out in detail before accepting that it would work).

If I understand your suggestion correctly, you propose that the same anthropic probability measure should also be used as a measure of moral importance. But there seems to me to be a problem. Consider a simple classical universe with two very similar observers. On my reckoning they should each get anthropic probability measure 1/2 (rejecting SIA, the Self-Indication Assumption). Yet it appears that they should each have a moral weight of 1. Does your proposal require that one accepts the SIA? Or am I misinterpreting you? Or are you trying to explicate not total utilitarianism but average utilitarianism?

It seems like this may be another facet of the problem with our models of expected utility in dealing with very large numbers. For instance, do you accept the Repugnant conclusion?

I'm at a loss for how to model expected utility in a way that doesn't generate the repugnant conclusion, but my suspicion is that if someone finds it, this problem may go away as well.

Or not. It seems that our various heuristics and biases against having correct intuitions about very large and small numbers are directly tied up in producing a limiting framework that acts as a conservative.

One thought: the expected utility of letting our god-like figure run this Turing simulation might well be positive! S/He is essentially *creating* these 3^^^3 people and then killing them. And in fact, it's reasonable to assume that the expected disutility of killing them is entirely dependent on (and thus exactly balanced by) the utility of their creation.

So, our mugger doesn't really hand us a dilemma unless the claim is that this simulation is already *running*, and those people have lives worth living, but if you don't pay the $5, the program will be altered (the sun will stop in the sky, so to speak) and they will all be killed. This last is more of a nitpick.

It does seem to me that the Bayesian probability we infer from this person's statement must be *extraordinarily* low, with an uncertainty much larger than its absolute value. Because a being which is both capable of this and willing to offer such a wager (either in truth or as a test) is deeply beyond our moral or intellectual comprehension. Indeed, if the claim is true, that fact will have utility implications that completely dwarf the immediate decision. If they are willing to do this much over 5 dollars, what will they do for a billion? Or for some end that money cannot normally purchase? Or merely at whim? It seems that the information we receive by failing to pay may be of value commensurate with the disutility of them truthfully carrying out their threat.

Regarding the comments about exploding brains, it's a wonder to me that we *are* able to think about these issues and not lose our sanity. How is it that a brain evolved for hunting/gathering/socializing is able to consider these problems at all? Not only that, but we seem to have some useful intuitions about these problems. Where on Earth did they come from?

Nick> Does your proposal require that one accepts the SIA?

Yes, but using a complexity-based measure as the anthropic probability measure implies that the SIA's effect is limited. For example, consider two universes, the first with 1 observer, and the second with 2. If all of the observers have the same complexity you'd assign a higher prior probability (i.e., 2/3) to being in the second universe. But if the second universe has an infinite number of observers, the sum of their measures can't exceed the measure of the universe as a whole, so the "presumptuous philosopher" problem is not too bad.
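The two cases in this reply can be sketched numerically. This is an illustration under assumptions: each observer gets an explicit measure, a universe is weighted by the total measure of its observers, and the geometric measures in the second example are invented to stand in for "infinitely many observers whose measures sum to a bounded total".

```python
from fractions import Fraction

def universe_probabilities(observer_measures):
    """Weight each universe by the total measure of its observers, then normalise."""
    totals = [sum(Fraction(m) for m in measures) for measures in observer_measures]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]

# One observer versus two observers of equal measure.
print(universe_probabilities([[1], [1, 1]]))  # [Fraction(1, 3), Fraction(2, 3)]

# Many observers whose measures shrink geometrically: their total stays below 1,
# so the crowded universe never gets more than half of the probability.
crowded = [Fraction(1, 2**k) for k in range(1, 60)]
print(universe_probabilities([[1], crowded])[1] < Fraction(1, 2))  # True
```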

Nick> If I understand your suggestion correctly, you propose that the same anthropic probability measure should also be used as a measure of moral importance.

Yes, in fact I think there are good arguments for this. If you have an anthropic probability measure, you can argue that it should be used as the measure of moral importance, since everyone would prefer that was the case from behind the veil of ignorance. On the other hand, if you have a measure of moral importance, you can argue that for decisions not involving externalities, the global best case can be obtained if people use that measure as the anthropic probability measure and just consider their self interests.

BTW, when using both anthropic reasoning and moral discounting, it's easy to accidentally apply the same measure twice. For example, suppose the two universes both have 1 observer each, but the observer in the second universe has twice the measure of the one in the first universe. If you're asked to guess which universe you're in with some payoff if you guess right, you don't want to think "There's 2/3 probability that I'm in the second universe, and the payoff is twice as important if I guess 'second', so the expected utility of guessing 'second' is 4 times as much as the EU of guessing 'first'."
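The double-counting trap can be made concrete. In this sketch the measures 1 and 2 come from the example above; the unit payoff and the variable names are assumptions for illustration.

```python
from fractions import Fraction

# Measure of the single observer in each universe.
measure = {"first": Fraction(1), "second": Fraction(2)}
total = sum(measure.values())
probability = {u: m / total for u, m in measure.items()}  # 1/3 and 2/3

payoff = 1  # same reward for a correct guess in either universe

# Measure applied once, as an anthropic probability.
eu_once = {u: probability[u] * payoff for u in measure}
# Measure applied twice: as a probability AND as a moral weight on the payoff.
eu_twice = {u: probability[u] * measure[u] * payoff for u in measure}

print(eu_once["second"] / eu_once["first"])    # 2
print(eu_twice["second"] / eu_twice["first"])  # 4
```

The factor of 2 is the intended preference for guessing "second"; the factor of 4 appears only when the same measure is counted on both sides of the product.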

I think that to avoid this kind of confusion and other anthropic reasoning paradoxes (see http://groups.google.com/group/everything-list/browse_frm/thread/dd21cbec7063215b/), it's best to consider all decisions and choices from a multiversal objective-deterministic point of view. That is, when you make a decision between choices A and B, you should think "would I prefer if everyone in my position (i.e., having the same perceptions and memories as me) in the entire multiverse chose A or B?" and ignore the temptation to ask "which universe am I likely to be in?".

But that may not work unless you believe in a Tegmarkian multiverse. If you don't, you may have to use both anthropic reasoning and moral discounting, being very careful not to double-count.

*How is it that a brain evolved for hunting/gathering/socializing is able to consider these problems at all?*

To be fair, humans are surrounded by thousands of other species that evolved under the same circumstances and can't consider them.

Before I get going, please let me make clear that I do not understand the math here (even Eliezer's intuitive Bayesian paper defeated me on the first pass, and I haven't yet had the courage to take a second pass), so if I'm Missing The Point(tm), please tell me.

It seems to me that what's missing is talking about the probability of a given level of resourcefulness of the mugger. Let me 'splain.

If I ask the mugger for more detail, there are a wide variety of different variables that determine how resourceful the mugger claims to be. The mugger could, upon further questioning, reveal that all the death events are the same entity being killed in the same way, which I call one death; given the unlikelihood of the mugger telling the truth in the first place, I'd not pay. Similarly, the mugger could reveal that the deaths, while of distinct entities, happen one at a time, and may even include time for the entities to grow up and become functioning adults (i.e. one death every 18 years), in which case I can almost certainly put the money to better use by giving it to SIAI.

On the other end of the scale, the mugger can claim infinite resources, so that they can complete the deaths (of entirely distinct entities, which have lives, grow up, and then are slaughtered) in an infinitely small amount of time. If the mugger does so, they don't get the money, because I assign an infinitely small value to the probability of the mugger having infinite resources. Yes, the mugger may live in a magical universe where having infinite resources is easy, but you don't get a get-out-of-probability-assignment-free card because you say the word "magic"; I still have to base my probability assignment of your claims on the world around me, in which we don't yet have the computing power to simulate even one human in real time (ignoring the software problem entirely).

Between these two extremes is an entire range of possibilities. The important part here is that the probability I assign to the mugger telling the truth is going to shrink exponentially as their claim of resources increases. Until the claimed rate of birth, growing, and dying exceeds the rate of deaths we already have here on Earth, I don't care, because I can better spend the money here. After we reach that point (~150K per day), I don't care, because my probability is something like 1/O(2^n) (Computer Science big-O there; sorry, that's my background) where n is the multiple of computer resources claimed over "one mind in realtime", so n is, umm, 150K deaths per day = 54,750,000 deaths per year, 18 years for each person, so I think n is 985,500,000? That's not even counting the probability discount due to the ridiculousness of the whole claim.
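A quick check of that multiplication, assuming a 365-day year and taking the ~150K deaths per day and the 18 years per person from the comment. The constant names are illustrative.

```python
import math

DEATHS_PER_DAY = 150_000      # rough figure quoted in the comment
DAYS_PER_YEAR = 365           # assumption: ordinary calendar year
YEARS_TO_ADULTHOOD = 18

deaths_per_year = DEATHS_PER_DAY * DAYS_PER_YEAR
n = deaths_per_year * YEARS_TO_ADULTHOOD  # simulated minds running at once

print(deaths_per_year)  # 54750000
print(n)                # 985500000

# A probability on the order of 2**-n has roughly this many zeros after the
# decimal point, so it cannot be stored as an ordinary float.
print(round(n * math.log10(2)))
```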

The point here is that I don't care about the 3^^^^3 number; I only care about the claimed deaths per unit time, how that compares to the number of people currently dying on Earth (on whom I *know* I can well-spend the $5) and the claimed resourcefulness of the mugger. By the time we get up to where the 3^^^^3 number matters, i.e. "I can kill one-millionth of 3^^^^3 people every realtime year", my probability assignment for their claimed resourcefulness is so incredibly low (and so incredibly lower than the numbers they are throwing at me) that I laugh and walk away.

There is not, as far as I can tell, a sweet spot where the number of lives I *might* save by giving the mugger the $5 is enough more than the number of people currently dying on Earth to offset the ridiculously low probability I'd be assigning to the mugger's resourcefulness. I'd rather give the $5 to SIAI.

-Robin

My apologies for the horrific formatting; I wrote that huge diatribe in w3m before discovering the captcha needed javascript, and then pasted it here. If an admin can fix it, please do so.

-Robin

One idea is to tell the AI not to expend a portion of its resources greater than the chance of the mugger's statement being true.

Should I think the universe is probably a coarse-grained simulation of my mind rather than real quantum physics, because a coarse-grained human mind is fifty(?) orders of magnitude cheaper than real quantum physics? Should I think the galaxies are tiny lights on a painted backdrop, because that Turing machine would require less space to compute?

I think a large universe full of randomly scattered matter is much more probable than a small universe that consists of a working human mind and little else.

"But, small as this probability is, it isn't anywhere near as small as 3^^^^3 is large"

Eliezer, I contend your limit!

I think this scenario is ingenious. Here are a few ideas, but I'm really not sure how far one can pursue them / how 'much work' they can do:

(1) Perhaps the agent needs some way of 'absolving itself of responsibility' for the evil/arbitrary/unreasonable actions of another being. The action to be performed is the one that yields highest expected utility but only along causal pathways that don't go through an adversary that has been labelled as 'unreasonable'.

(Except this approach doesn't defuse the variation that goes "You can never wipe your nose because you've computed that the probability of this action killing 3^^^^3 people in a parallel universe is ever so slightly greater than the probability of it saving that number of people".)

(2) We only have a fixed amount of 'moral concern', apportioned somehow or other to the beings we care about. Our utility function looks like: Sum(over beings X) ConcernFor(X)*HappinessOf(X). Allocation of 'moral concern' is a 'competitive' process. The only way we can gain some concern about Y is to lose a bit of concern about some X, but if we have regular and in some sense 'positive' interactions with X then our concern for X will be constantly 'replenishing itself'. When the magician appears and tells us his story, we may acquire a tiny bit of concern about him and the people he mentions, but the parts of us that care about the people we know (a) aren't 'told' the magician's story and thus (b) refuse to 'relinquish' very much.
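A toy version of this fixed-budget utility function, to show why the size of the magician's claimed population stops mattering. Everything concrete here is an assumption for illustration: the group names, the 1/1000 share, the happiness values, and the choice to treat the magician's whole population as a single entry X.

```python
from fractions import Fraction

def utility(concern, happiness):
    """Sum over beings X of ConcernFor(X) * HappinessOf(X)."""
    return sum(concern[x] * happiness[x] for x in concern)

def reallocate(concern, newcomer, share):
    """Give `newcomer` a share of the fixed budget by scaling everyone else down."""
    scaled = {x: c * (1 - share) for x, c in concern.items()}
    scaled[newcomer] = share
    return scaled

concern = {"family": Fraction(3, 4), "friends": Fraction(1, 4)}
concern = reallocate(concern, "magician's victims", Fraction(1, 1000))

happiness = {"family": 1, "friends": 1, "magician's victims": -1}
print(sum(concern.values()))        # 1 (the budget is unchanged)
print(utility(concern, happiness))  # 499/500
```

Because the budget is fixed, the threatened group can cost at most its small share of concern, however many minds it is said to contain.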

The trouble with this is that it sounds too reminiscent of the insanely stupid moral behaviour of human beings (where e.g. they give exactly as much money to save a hundred penguins as ten thousand).

(3) We completely abandon the principle of using minimum description length as some kind of 'universal prior'. (For some reason. And replace it with something else. For some reason.)

tried it out in Berlin. epic.

Our best understanding of the nature of the "simulation" we call reality has this concept we call "cause and effect" in place. So when something happens it has non-zero (though nigh infinitely small) effects on everything else in existence (progressively smaller effect with each degree of separation).

The effect that affecting 3^^^3 things (regardless of type or classification) has on other things (even if the individual effects of affecting one thing would be extremely small) would be non-trivial (enormously large even after a positively ludicrous number of degrees of separation).

Once you consider the level of effect that this would have on the whole "simulation" you are forced to consider basically all possible futures. You have nigh-infinite good (when these things are removed/affected you end up with utopia and a range of all possible net benefits for the whole of the simulation) and nigh-infinite penalty (when these things are removed/affected you end up with hell and a range of all possible net losses for the whole of the simulation). I cannot foresee how an AI can possibly have enough processing power to overcome the vagary of being unable to predict all possible futures following the event.

Moreover, I personally balk at the assumption of that level of responsibility. It is for the same reason that I balk at time travel scenarios. I refuse to be responsible for whatever changes are wrought across all of reality (which in sum become quite large when you consider a Vast possibly infinite universe regardless of how "small" the initial event seems).

Also does the probability assignment take into account the likelihood of the actor in question approaching you? Assuming there are 3^^^3 people (minds), then surely the probability assignment of approaching you specifically must be adjusted accordingly. I understand that "somebody has to be approached," but surely no one here is willing to contend that any of us have traits which are so exceptional that they cannot be found inside of a population which is 3^^^3 in size?

Assume that the basic reasoning for this is true, but nobody actually does the mugging. Since the probability doesn't actually make a significant difference to the expected utility, I'll just simplify and say they're equal.

The total expected marginal utility, assuming you're equally likely to save or kill the people, would be (3^^^3 - 3^^^3) + (3^^^^3 - 3^^^^3) + (3^^^^^3 - 3^^^^^3) + ... = 0. At least, it would be if you count it by alternating with saving and killing. You could also count it as 3^^^3 + 3^^^^3 - 3^^^3 + 3^^^^^3 - 3^^^^3 + ... = infinity. Or you could count it as -3^^^3 - 3^^^^3 + 3^^^3 - 3^^^^^3 + 3^^^^3 - ... = -infinity. Or you could even do 3^^^3 - 3^^^3 + 3^^^^3 - 3^^^^3 + 3^^^^^3 - 3^^^^^3 + ... (without parentheses) which doesn't even converge to anything.

You could also construct hypothetical possibility sets where you can set it to add to any given number by rearranging the possibilities.
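The order-dependence can be seen with partial sums. This sketch swaps in powers of ten for 3^^^3, 3^^^^3, ... (an assumption made only so the numbers fit in memory) and truncates each ordering after finitely many terms; in the infinite series every term eventually appears in all three orderings.

```python
from itertools import accumulate

def magnitudes(k):
    """Stand-ins for the growing stakes: 10, 100, 1000, ..."""
    return [10**i for i in range(1, k + 1)]

def paired(k):
    """(+a1 - a1) + (+a2 - a2) + ..."""
    for m in magnitudes(k):
        yield m
        yield -m

def gains_first(k):
    """+a1 + a2 - a1 + a3 - a2 + ...: each loss arrives one step late."""
    mags = magnitudes(k)
    yield mags[0]
    for prev, nxt in zip(mags, mags[1:]):
        yield nxt
        yield -prev

def losses_first(k):
    """Mirror image: -a1 - a2 + a1 - a3 + a2 - ..."""
    return (-t for t in gains_first(k))

k = 5
print(list(accumulate(paired(k)))[-1])        # 0
print(list(accumulate(gains_first(k)))[-1])   # 100000
print(list(accumulate(losses_first(k)))[-1])  # -100000
```

Raising `k` keeps the first running total at zero while the other two run off in opposite directions.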

It's one thing when order matters for talking about total utility of an infinitely long universe. It at least has an order, assuming you don't mind abandoning special relativity, but what order are you even supposed to count expected utility in?

I figure the only way out of this is to use a prior that decreases with expected utility faster than those formulations of Occam's razor would suggest. I don't like the idea of doing this, but not doing so just doesn't add up.

The probability of some action costing delta-utility x and resulting in delta-utility y, where y >> x, is low. The Anti Gratis Dining modifier is x/y. These things I conjecture, anyways.

The apple-salespeep who says, "Give me $0.50, and I will give you an apple" is quite believable, unlike the apple-salespeep who claims, "Give me $3.50, and I will give apples to all who walk the Earth". We understand how buying an apple gets us an apple, but we know far less about implementing global apple distribution.
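A rough sketch of the conjectured modifier applied to the two apple offers. The cap at 1, the valuation of an apple at its $0.50 price, and the population figure are all assumptions added for illustration.

```python
from fractions import Fraction

def agd_modifier(cost, claimed_benefit):
    """Anti Gratis Dining discount x / y, capped at 1 (the cap is an assumption)."""
    return min(Fraction(1), Fraction(cost) / Fraction(claimed_benefit))

APPLE_VALUE = Fraction(1, 2)     # assume an apple is worth its $0.50 price
POPULATION = 6_600_000_000       # rough world population, illustrative only

one_apple = agd_modifier(Fraction(1, 2), APPLE_VALUE)
all_apples = agd_modifier(Fraction(7, 2), APPLE_VALUE * POPULATION)

print(one_apple)    # 1
print(all_apples)   # 7/6600000000
# After the discount, the expected benefit collapses back to the price paid.
print(all_apples * APPLE_VALUE * POPULATION)  # 7/2
```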

Suppose I have a Holy Hand Grenade of FAI, which has been carefully proofed by all the best mathematicians, programmers, and philosophers, and I am (of course) amongst them. And am randomly selected to activate it! Sadly, there is an ant caught in the pin. I can not delay to extricate it, for that means more deaths left unprevented. I pull the pin and kill the ant anyways.

So, the more understanding you have about the situation at hand, the less the AGD factor applies to the situation.

[Late edit: I have since retracted this solution as wrong, see comments below; left here for completeness. The ACTUAL solution that really works I've written in a different comment :) ]

I do believe I've solved this. Don't know if anyone is still reading or not after all this time, but here goes.

Eliezer speaks of the symmetry of Pascal's wager; I'm going to use something very similar here to solve the issue. The number of things that could happen next - say, in the next nanosecond - is infinite, or at the very least incalculable. A lot of mundane things could happen, or a lot of unforeseen things could happen. It could happen that a car would go through my living room and kill me. Or it could happen that the laws of energy conservation were violated and the whole world would turn into bleu cheese. Each of these possibilities could, in theory, have a probability assigned to it, given our priors.

But! We only have enough computing power to calculate a finite number of outcomes at any given moment. That means that we CANNOT go around assigning probabilities by calculation. Rather, we're going to need some heuristic to deal with all the probabilities we do NOT calculate.

Suppose our AI is very good at predicting things. It manages to assign SOME probability to what will happen next about 99% of the time (Note: My solution works equally well for anything from 0% to 100% minus epsilon - and I shouldn't have to explain why a Bayesian AI should never be 100% certain that it got an answer right). That means that 1% of the time, something REALLY surprises it; it just did not assign any probability at all. Now, because the number of things that could be in that category is infinite, they cancel out. Sure, we could all turn to cheese if it says "abracadabra". Or we could turn to cheese UNLESS it says so. The utility functions will always end in 0 for the uncalculated mass of probabilities.

That means that the AI always works under the assumptions that "or something I didn't see coming will happen; but I must be neutral regarding such an outcome until I know more about it".

Now. Say the AI manages to consider 1 million possibilities per prediction it makes (how it still gets 1% of them wrong is beyond me but again, the exact number doesn't matter for my solution). So any outcome that has NOT been calculated could, in fact, be considered to have a probability of 1% / 1 million - not because there are only a million possibilities the AI hasn't considered, but because that is how many it could TRY to consider.

This number is your cutoff. Before you multiply a probability with a utility function, you subtract this number from the probability, first. So now if someone comes up to you and says it'll kill 3^^^^3 people and you decide to actually spend the cycles to consider how likely that is, and you get 1/googol, that number is LESS than the background noise of everything you don't have time to calculate. You round it down to zero, not because it is *arbitrarily* small enough, but because anything you have not considered for calculation must be considered to have higher probability - and like in Pascal's wager, those options' utility is infinite and can counter any number that Pascal's Mugger can throw at me. You subtract, not an arbitrary number, but rather a number depending on how long the AI is thinking about the problem; how many possibilities it takes into account.
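The cutoff rule as described in this comment (which the late edit at the top marks as retracted) can be sketched as follows. The 1% and one-million figures come from the text; the function names are hypothetical.

```python
from fractions import Fraction

def noise_floor(unexplained_mass, hypotheses_considered):
    """Probability charged to any outcome that was never explicitly evaluated."""
    return Fraction(unexplained_mass) / hypotheses_considered

def effective_probability(p, floor):
    """Subtract the floor before multiplying by utility; never go below zero."""
    return max(Fraction(0), Fraction(p) - floor)

floor = noise_floor(Fraction(1, 100), 10**6)       # 1% spread over a million
mugger = Fraction(1, 10**100)                      # "1/googol"

print(floor)                                       # 1/100000000
print(effective_probability(mugger, floor))        # 0
print(effective_probability(Fraction(1, 2), floor) > 0)  # True
```

The floor depends on how many hypotheses were examined, so thinking longer lowers it instead of leaving it fixed.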

Does this solve the problem? I think it does.

(By the way: ChrisA's way also works against this problem, except that coding your AI so that it may disregard value and morality if certain conditions are met seems like a pretty risky proposition).

The problem is one of rational behavior, not of bounded-rational hacks.

Are you saying that it's a good thing that the AI uses this rounding system and goes against its values this particular time?

If so, how did you tell that it was a good thing?

Can you mathematically formalize that intuition?

If you cannot do so, there is probably some other conflict between your intuitions and your AI code.

Actually, I think I made a mistake there.

Don't get me wrong, in my suggestion the AI is NOT going against its values nor being irrational, and this was not meant as a hack. Rather I'm claiming that the basic method of doing rationality as described needs revision that accounts for practicality, and if you disagree with that then your next rational move should DEFINITELY be to send me 50$ RIGHT NOW because I TOTALLY have a button that kicks 4^^^^4 puppies if I press it RIGHT HERE.

Having said that, I do think I might have made an error of intuition in there, so let's rethink it. Just because we should rethink what constitutes rational behavior does not mean I got it right.

Suppose I am an omnipotent being and have created a button that does something, once, if pressed. I truthfully tell you that there are several possible outcomes:

- You receive 10$. This has a 45% chance of happening.
- You lose 5$. This, too, has a 45% chance of happening.
- Something else happens.

You should be pretty interested in what this "something else" might be before you press the button, since I've put absolutely no bounds on it. You could win 1000$. Or you could die. The whole world could die. You could wake up in a protein bath outside the Matrix. etc. etc. Some of these things you might be able to prepare for, if you know about them in advance.

If you're rational and you get no further information, you should probably press the button. The expected gain from the two known outcomes is 2.25$ (0.45 × 10$ minus 0.45 × 5$); as in Pascal's Wager, the infinity of possibilities that stem from the third option cancel each other out.

Now, suppose that beforehand I tell you that you get 10 guesses as to what the third thing is. Every time you guess, I tell you the precise probability that this thing will happen. Furthermore, the third option could do at least 12 different things, so no matter what you guessed, you would not be able to tell exactly what the button might do.

So you start guessing. One of your guesses is "3^^^^3 people will die horribly". I rate that one as a 10^-100 chance.

You've reached the end of the guesses and still a full 5% of probability remains - half of the third option's share.

So. Now do we press the button?

My claim was that you should ignore every outcome with a chance smaller than 1% in this case, regardless of its utility. This now seems to me like a mistake. In theory, when we add up the utility of all known options, it comes out extremely negative, because the remaining 5% of unknowns still have effectively zero chance of happening each, and they STILL cancel each other out.
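The arithmetic behind "it comes out extremely negative" can be sketched as follows. Treating dollars as utility, counting one death as one unit of disutility, and using 10^1000 as a finite stand-in for 3^^^^3 are all assumptions for illustration; the real number is far too large to represent.

```python
from fractions import Fraction

# Finite stand-in for the disutility of 3^^^^3 deaths (assumption).
STAND_IN_CATASTROPHE = -(10**1000)

# (probability, utility) pairs for the outcomes identified so far.
known_outcomes = [
    (Fraction(45, 100), 10),                       # win 10$
    (Fraction(45, 100), -5),                       # lose 5$
    (Fraction(1, 10**100), STAND_IN_CATASTROPHE),  # the guessed catastrophe
]

def expected_utility(outcomes):
    """Sum of probability times utility over the listed outcomes."""
    return sum(p * u for p, u in outcomes)

everyday = expected_utility(known_outcomes[:2])
total = expected_utility(known_outcomes)

print("everyday outcomes only:", everyday)
print("total is negative:", total < 0)
```

Even at a probability of 10^-100, the single catastrophic guess swamps the everyday outcomes, which is why the 1% cutoff cannot be justified by the sum itself.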

I think I even know where my mathematical error was: I was assuming that anything less than 1% is a waste of a guess and therefore we should have guessed something else, which quite possibly has a higher chance - this establishes a cutoff for "a calculation that was not worth doing". However in this new example there are at least 12 things the button can do; essentially the number is infinite as far as I know. I should count myself VERY lucky to get 1% or more for anything I guess. In fact I should expect to get an answer of zero or epsilon for pretty much everything. That means that no guess is truly wasted or trivial.

Of course, if we don't press the button the Pascal Muggers will have won...

Back to the drawing board, I guess? :-/

If the injured parties are humans, I should be very skeptical of the assertion because a very small fraction, (1/3^^3)*1/10^(something), of people have the power of life and death over 3^^^3 other people, whereas 1/10^(something smaller) hear the corresponding hoax.

That's the only answer that makes sense because it's the only answer that works on a scale of 3^^^3.

I think.

"If the injured parties are humans, I should be very skeptical of the assertion because a very small fraction, (1/3^^3)*1/10^(something)"

You don't know that. In fact, your uncertainty about it is of a degree that, if I thought I had a lot on the line, I would not take lightly.

I'm trying to think up several avenues. One is that the higher the claimed utility, the lower the probability (somehow); another tries to use the implications that accepting the claim would have on other probabilities in order to cancel it out.

I'll post a new comment if I manage to come up with anything good.

I know because of anthropics. It is a logical impossibility for more than 1/3^^^3 of individuals to have that power. You and I cannot both have power over the same thing, so the total amount of power is bounded, hopefully by the same population count we use to calculate anthropics.

Not in the least convenient possible world. What if someone told you that 3^^^3 copies of you were made before you must make your decision and that their behaviour was highly correlated as applies to UDT? What if the beings who would suffer had no consciousness, but would have moral worth as judged by you(r extrapolated self)? What if there was one being who was able to experience 3^^^3 times as much eudaimonia as everyone else? What if the self-indication assumption is right?

If you're going to engage in motivated cognition at least consider the least convenient possible world.

Am I talking to Omega now, or just some random guy? I don't understand what is being discussed. Please elaborate?

Then my expected utility would not be defined. There would be relatively simple worlds with arbitrarily many of them. I honestly don't know what to do.

Then my expected utility would not be defined. There would be relatively simple agents with arbitrarily sensitive utilities.

Then I would certainly live in a world with infinitely many agents (or I would not live in any worlds with any probability), and the SIA would be meaningless.

My cognition is motivated by something else - by the desire to avoid infinities.

1) Sorry, I confused this with another problem; I meant some random guy.

2/3) Isn't how your decision process handles infinities rather important? Is there any corresponding theorem to the Von Neumann–Morgenstern utility theorem but without using either version of axiom 3? I have been meaning to look into this and depending on what I find I may do a top-level post about it. Have you heard of one?

edit: I found Fishburn, 1971, A Study of Lexicographic Expected Utility, Management Science. It's behind a paywall at http://www.jstor.org/pss/2629309. Can anyone find a non-paywall version or email it to me?

4) Yeah, my fourth one doesn't work. I really should have known better.

Sometimes, infinities must be made rigorous rather than eliminated. I feel that, in this case, it's worth a shot.

What worries me about infinities is, I suppose, the infinite Pascal's mugging - whenever there's a single infinite broken symmetry, nothing that happens in any finite world matters to determine the outcome.

This implies that all our thought should be devoted to infinite rather than finite worlds. And if all worlds are infinite, it looks like we need to do some form of SSA dealing with utility again.

This is all very convenient and not very rigorous, I agree. I cannot see a better way, but I agree that we should look. I will use university library powers to read that article and send it to you, but not right now.

I don't see any way to avoid the infinite Pascal's mugging conclusion. I think that it is probably discouraged due to a history of association with bad arguments and the actual way to maximize the chance of infinite benefit will seem more acceptable.

I will use university library powers to read that article and send it to you, but not right now.

Thank you.

Consider an infinite universe consisting of infinitely many copies of Smallworld, and another one consisting of infinitely many copies of Bigworld.

It seems like the only reasonable way to compute expected utility is to compute SSA or pseudo-SSA in Bigworld and Smallworld, thus computing the average utility in each infinite world, with an implied factor of omega.

Reasoning about infinite worlds that are made of several different, causally independent, finite components may produce an intuitively reasonable measure on finite worlds. But what about infinite worlds that are not composed in this manner? An infinite, causally connected chain? A series of larger and larger worlds, with no single average utility?

How can we consider them?

It seems like the only reasonable way to compute expected utility is to compute SSA or pseudo-SSA in Bigworld and Smallworld, thus computing the average utility in each infinite world, with an implied factor of omega.

Be careful about using an infinity that is not the limit of an infinite sequence; it might not be well defined.

An infinite, causally connected chain?

It depends on the specifics. This is a very underdefined structure.

A series of larger and larger worlds, with no single average utility?

A divergent expected utility would always be preferable to a convergent one. How to compare two divergent possible universes depends on the specifics of the divergence.

I will formalize my intuitions, in accordance with your first point, and thereby clarify what I'm talking about in the third point.

Suppose agents exist on the real line, and their utilities are real numbers. Intuitively, going from u(x)=1 to u(x)=2 is good, and going from u(x)=1 to u(x)=1+sin(x) is neutral.

The obvious way to formalize this is with the limiting process:

limit as M goes to infinity of ( the integral from -M to M of u(x)dx, divided by 2M )

This gives well-defined and nice answers to some situations but not others.

However, you can construct functions u(x) where ( the integral from -M to M of u(x)dx, divided by 2M ) is an arbitrary differentiable function of M, in particular, one that has no limit as M goes to infinity. However, it is not necessarily divergent - it may oscillate between 0 and 1, for instance.
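As a worked illustration of the oscillating case (the particular function below is an example chosen for this sketch, not one given in the discussion), write the symmetric average as A(M) and pick a bounded u whose average never settles:

```latex
A(M) = \frac{1}{2M}\int_{-M}^{M} u(x)\,dx ,
\qquad
u(x) = \tfrac{1}{2}\bigl(1 + \sin(\ln|x|) + \cos(\ln|x|)\bigr) \quad (x \neq 0).

\text{Since } \frac{d}{dM}\Bigl[\tfrac{1}{2}M\bigl(1+\sin(\ln M)\bigr)\Bigr]
  = \tfrac{1}{2}\bigl(1+\sin(\ln M)+\cos(\ln M)\bigr)
\text{ and } u \text{ is even,}

\int_{-M}^{M} u(x)\,dx = M\bigl(1+\sin(\ln M)\bigr)
\quad\Longrightarrow\quad
A(M) = \tfrac{1}{2}\bigl(1+\sin(\ln M)\bigr).
```

Here A(M) keeps sweeping through every value between 0 and 1 as M grows, so the limit does not exist even though u itself is bounded.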

I'm fairly certain that if I have a description of a single universe, and a description of another universe, I can produce a description in the same language of a universe consisting of the two, next to each other, with no causal connection. Depending on the description language, for some universes, I may or may not be able to tell that they cannot be written as the limit of a sum of finite universes.

For any decision-making process you're using, I can probably tell you what an infinite causal chain looks like in it.

Suppose agents exist on the real line, and their utilities are real numbers. Intuitively, going from u(x)=1 to u(x)=2 is good, and going from u(x)=1 to u(x)=1+sin(x) is neutral.

Why must there be a universe that corresponds to this situation? The number of agents has cardinality beth-1. A suitable generalization of Pascal's wager would require that we bet on the amount of utility having a larger cardinality, if that even makes sense. Of course, there is no maximum cardinality, but there is a maximum cardinality expressible by humans with a finite lifespan.

The obvious way to formalize this is with the limiting process:

limit as M goes to infinity of ( the integral from -M to M of u(x)dx, divided by 2M )

That is intuitively appealing, but it is arbitrary. Consider the step function that is 1 for positive agents and -1 for negative agents. Agent 0 can have a utility of 0 for symmetry, but we should not care about the utility of one agent out of infinity unless that agent is able to experience an infinity of utility. The limit of the integral from -M to M of u(x)dx/2M is 0, but the limit of the integral from 1-M to 1+M of u(x)dx/2M is 2 and the limit of the integral from -M to 2M of u(x)dx/3M is +infinity. While your case has some appealing symmetry, it is arbitrary to privilege it over these other integrals. This can also work with a sigmoid function, if you like continuity and differentiability.

I'm fairly certain that if I have a description of a single universe, and a description of another universe, I can produce a description in the same language of a universe consisting of the two, next to each other, with no causal connection.

Wouldn't you just add the two functions, if you are talking about just the utilities, or run the (possibly hyper)computations in parallel, if you are talking about the whole universes?

Depending on the description language, for some universes, I may or may not be able to tell that they cannot be written as the limit of a sum of finite universes.

Yes, how to handle certain cases of infinite utility looks extremely non-obvious. It is also necessary.

Why must there be a universe that corresponds to this situation?

So that the math can be as simple as possible. Solving simple cases is advisable. beth-1 is easier to deal with in mathematical notation than beth-0, and anything bigger is so complicated that I have no idea.

The limit of the integral from -M to M of u(x)dx/2M is 0, but the limit of the integral from 1-M to 1+M of u(x)dx/2M is 2 and the limit of the integral from -M to 2M of u(x)dx/3M is +infinity. While your case has some appealing symmetry, it is arbitrary to privilege it over these other integrals. This can also work with a sigmoid function, if you like continuity and differentiability.

Actually, those mostly go to 0.

1-M to 1+M gets you 2/2M=1/M, which goes to 0. -M to 2M gets you M/3M=1/3.
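These values can be checked exactly. The sketch below assumes the step function u(x) = sign(x) from the example and uses the fact that |x| is an antiderivative of sign(x); the helper names are made up for illustration.

```python
from fractions import Fraction

def sign_integral(lower, upper):
    """Exact integral of sign(x) over [lower, upper], using |x| as antiderivative."""
    return abs(upper) - abs(lower)

def windowed_average(lower, upper):
    """Integral of sign(x) over the window divided by the window's length."""
    return Fraction(sign_integral(lower, upper), upper - lower)

for M in (10, 1000, 100000):
    symmetric = windowed_average(-M, M)        # divided by 2M
    shifted = windowed_average(1 - M, 1 + M)   # divided by 2M
    lopsided = windowed_average(-M, 2 * M)     # divided by 3M
    print(M, symmetric, shifted, lopsided)
```

The symmetric window gives 0 for every M, the shifted window gives 1/M, and the lopsided window gives 1/3 regardless of M, so the answer does depend on how the window grows.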

This doesn't matter, as even this method, the most appealing and simple, fails in some cases, and there do not appear to be other, better ones.

Wouldn't you just add the two functions, if you are talking about just the utilities, or run the (possibly hyper)computations in parallel, if you are talking about the whole universes?

Yes, indeed. I would run the computations in parallel, stick the Bayes nets next to each other, add the functions from policies to utilities, etc. In the first two cases, I would be able to tell how many separate universes seem to exist. In the third, I would not.

Yes, how to handle certain cases of infinite utility looks extremely non-obvious. It is also necessary.

I agree. I have no idea how to do it. We have two options:

Find some valid argument why infinities are logically impossible, and worry only about the finite case.

Find some method for dealing with infinities.

Most people seem to assume 1, but I'm not sure why.

Oh, and I think I forgot to say earlier that I have the pdf but not your email address.

My email address is endoself (at) yahoo (dot) com.

1-M to 1+M gets you 2/2M=1/M, which goes to 0. -M to 2M gets you M/3M=1/3.

I seem to have forgotten to divide by M.

Why must there be a universe that corresponds to this situation?

So that the math can be as simple as possible. Solving simple cases is advisable.

I didn't mean to ask why you chose this case; I was asking why you thought it corresponded to any possible world. I doubt any universe could be described by this model, because it is impossible to make predictions about. If you are an agent in this universe, what is the probability that you are found to the right of the y-axis? Unless the agents do not have equal measure, such as if agents have measure proportional to the complexity of locating them in the universe, as Wei Dai proposed, this probability is undefined, due to the same argument that shows the utility is undefined.

This could be the first step in proving that infinities are logically impossible, or it could be the first step in ruling out impossible infinities, until we are only left with ones that are easy to calculate utilities for. There are some infinities that seem possible: consider an infinite number of identical agents. This situation is indistinguishable from a single agent, yet has infinitely more moral value. This could be impossible however, if identical agents have no more reality-fluid than single agents or, more generally, if a theory of mind or of physics, or, more likely, one of each, is developed that allows you to calculate the amount of reality-fluid from first principles.

In general, an infinity only seems to make sense for describing conscious observers if it can be given a probability measure. I know of two possible sets of axioms for a probability space. Cox's theorem looks good, but it is unable to handle any infinite sums, even reasonable ones like those used in the Solomonoff prior or finite, well-defined integrals. There are also Kolmogorov's axioms, but they are not self-evident, so it is not certain that they can handle any possible situation.

Once you assign a probability measure to each observer-moment, it seems likely that the right way to calculate utility is to integrate the utility function over the probability space, times some overall possibly infinite constant representing the amount of reality fluid. Of course this can't be a normal integral, since utilities, probabilities, and the reality-fluid coefficient could all take infinite/infinitesimal values. That pdf might be a start on the utility side; the probability side seems harder, but that may just be because I haven't read the paper on Cox's theorem; and the reality-fluid problem is pretty close to the hard problem of consciousness, so that could take a while. This seems like it will take a lot of axiomatization, but I feel closer to solving this than when I started writing/researching this comment. Of course, if there is no need for a probability measure, much of this is negated.

So, of course, the infinities for which probabilities are ill-defined are just those nasty infinities I was talking about where the expected utility is incalculable.

What we actually want to produce is a probability measure on the set of individual experiences that are copied, or whatever thing has moral value, not on single instantiations of those experiences. We can do so with a limiting sequence of probability measures of the whole thing, but probably not a single measure.

This will probably lead to a situation where SIA turns into SSA.

What bothers me about this line of argument is that, according to UDT, there's nothing fundamental about probabilities. So why should undefined probabilities be more convincing than undefined expected utilities?

We still need something very much like a probability measure to compute our expected utility function.

Kolmogorov should be what you want. A Kolmogorov probability measure is just a measure where the measure of the whole space is 1. Is there something non-self-evident or non-robust about that? It's just real analysis.

I think the whole integral can probably be contained within real-analytic conceptions. For example, you can use an alternate definition of measurable sets.

I disagree with your interpretation of UDT. UDT says that, when making choices, you should evaluate all consequences of your choices, not just those that are causally connected to whatever object is instantiating your algorithm. However, while probabilities of different experiences are part of our optimization criteria, they do not need to play a role in the theory of optimization in general. I think we should determine more concretely whether these probabilities exist, but their absence from UDT is not very strong evidence against them.

The difference between SIA and SSA is essentially an overall factor for each universe describing its total reality-fluid. Under certain infinite models, there could be real-valued ratios.

The thing that worries me second-most about standard measure theory is infinitesimals. A Kolmogorov measure simply cannot handle a case with a finite measure of agents with finite utility and an infinitesimal measure of agents with an infinite utility.

The thing that worries me most about standard measure theory is my own uncertainty. Until I have time to read more deeply about it, I cannot be sure whether a surprise even bigger than infinitesimals is waiting for me.

**[deleted]**· 2011-01-06T15:05:30.843Z · score: 0 (0 votes) · LW · GW

I've been thinking about Pascal's Mugging with regard to decision making and Friendly AI design, and wanted to sum up my current thoughts below.

1a: Assuming you are Pascal Mugged once, it greatly increases the chance of you being Pascal Mugged again.

1b: If the first mugger threatens 3^^^3 people, the next mugger can simply threaten 3^^^^3 people. The mugger after that can simply threaten 3^^^^^3 people.

1c: It seems like you would have to take that into account as well. You could simply say to the mugger, "I'm sorry, but I must keep my money, because the chance of there being a second mugger who threatens one Knuth up-arrow more people than you is high enough that I have to keep my money to protect those people against that threat, which is much more probable now that you have shown up."

1d: Even if the Pascal Mugger threatens an Infinite number of people with death, a second Pascal Mugger might threaten an Infinite number of people with a slow, painful death. I still have what appears to be a plausible reason to not give the money.

1e: Assume the Pascal Mugger attempts to simply skip that and say that he will threaten me with infinite disutility. The second Pascalian Mugger could simply threaten me with an infinite disutility of a greater cardinality.

1f: Assume the Pascalian Mugger attempts to threaten me with an infinite disutility with the greatest possible infinite cardinality. A subsequent Pascalian Mugger could simply say "You have made a mathematical error in processing the previous threats, and you are going to make a mathematical error in processing future threats. The amount of any other past or future Pascal's mugger threat is essentially 0 disutility compared to the amount of disutility I am threatening you with, which will be infinitely greater."

I think this gets into the Berry Paradox when considering threats. "A threat infinitely worse than the greatest possible threat statable in one minute" can be stated in less than one minute, so it seems as if it is possible for a Pascal's mugger to make a threat which is infinite and incalculable.

I am still working through the implications of this but I wanted to put down what I had so far to make sure I could avoid errors.

Surely this will not work in the least convenient world?

**[deleted]**· 2011-01-06T22:17:12.414Z · score: 0 (0 votes) · LW · GW

That is a good point, but my reading of that topic is that it was the least convenient possible world. I honestly do not see how it is possible to word a greatest threat.

Once someone actually says out loud what any particular threat is, you always seem to be vulnerable to someone coming along and generating a threat which, when taken in the context of threats you have heard, seems greater than any previous threat.

I mean, I suppose to make it more inconvenient for me, The Pascal Mugger could add "Oh by the way. I'm going to KILL you afterward, regardless of your choice. You will find it impossible to consider another Pascal's Mugger coming along and asking you for your money."

"But what if the second Pascal's Mugger resurrects me? I mean sure, it seems oddly improbable that he would do that just to demand 5 dollars which I wouldn't have if I gave them to you if I was already dead, and frankly it seems odd to even consider resurrection at all, but it could happen with a non 0 chance!"

I mean yes, the idea of someone resurrecting you to mug you does seem completely, totally ridiculous, but the entire idea behind Pascal's Mugging appears to be that we can't throw out those tiny, tiny, out of the way chances if there is a large enough threat backing them up.

So let's think of another possible least convenient world: The Mugger is Omega or Nomega. He knows exactly what to say to convince me that despite the fact that right now it seems logical that a greater threat could be made later, somehow this is the greatest threat I will ever face in my entire life, and the concept of a greater threat than this is literally inconceivable.

Except now the scenario requires me to believe that I can make a choice to give the Mugger 5$, but NOT make a choice to retain my belief that a larger threat exists later.

That doesn't quite sound like a good formulation of an inconvenient world either. (I can make choices except when I can't?) I will keep trying to think of a more inconvenient world once I get home and will post it here if I think of one.

Here's another version:

You may be wrong about such threats. In thinking about this question, you reduce your chance of being wrong. This has a massive expected utility gain.

Conclusion: You should spend all your time thinking about this question.

Another version:

There's a tiny probability of 3^^3 deaths. A tinier one of 3^^^3. A tinier one of 3^^^^3..... Oops, looks like my expected utility is a divergent sum! I can't use expected utility theory to figure out what to do any more!
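A toy version of this divergence can be sketched as follows. The prior 2^-k and the stakes 4^k are stand-ins chosen for illustration; the actual up-arrow stakes grow unimaginably faster, which only makes the problem worse.

```python
from fractions import Fraction

def partial_expected_deaths(terms):
    """Partial sum of probability * deaths over the first `terms` threats.

    Assumed stand-ins: threat k has prior probability 2**-k and stakes 4**k.
    """
    total = Fraction(0)
    for k in range(1, terms + 1):
        probability = Fraction(1, 2**k)  # shrinks geometrically
        deaths = 4**k                    # grows faster than the prior shrinks
        total += probability * deaths
    return total

for n in (5, 10, 20, 40):
    print(n, partial_expected_deaths(n))
```

Each extra term adds more than all the previous ones combined, so the partial sums never settle and no finite expected utility can be read off.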

**[deleted]**· 2011-01-07T14:59:18.722Z · score: 0 (0 votes) · LW · GW

Number one is a very good point, but I don't think the conclusion would necessarily follow:

1: You always may need outside information to solve the problem. For instance, If I am looking for a Key to Room 3, under the assumption that it is in Room 1 because I saw someone drop it in Room 1, I cannot search only Room 1 and never search Room 2 and find the key in all cases because there may be a way for the key to have moved to Room 2 without my knowledge.

For instance, as an example of something I might expect, a mouse could have grabbed it and quietly gone back to its nest in Room 2. Now, that's something I would expect, so while searching for the key I should also note any mice I see. They might have moved it.

But I also have to have a method for handling situations I would not expect. Maybe the Key activated a small device which moved it to room 2 through a hidden passage in the wall which then quietly self destructed, leaving no trace of the device that is within my ability to detect in Room 1. (Plenty of traces were left in Room 2, but I can't see Room 2 from Room 1.) That is an outside possibility. But it doesn't need to break the laws of physics or require incomprehensible technology for it to have happened.

2: There are also a large number of alternative thought experiments which have massive expected utility gain. Because of the halting problem, I can't necessarily determine how long it is going to take to figure these problems out, if they can be figured out. If I allow myself to get stuck on any one problem, I may have picked an unsolvable one, while the NEXT problem with a massive expected utility gain is actually solvable. Under that logic, it's still bad to spend all my time thinking about one particular question.

3: Thanks to parallelism, it is entirely possible for a program to run multiple different problems all at the same time. Even I can do this to a lesser extent. I can think about a Philosophy problem and also eat at the same time. A FAI running into a Pascal's Mugger could begin weighing the utility of giving in to the mugging, ignoring the mugging, attempting to knock out the mugger, or simply saying: "Let me think about that. I will let you know when I have decided to give you the money or not and will get back to you." all at the same time.

Having reviewed this discussion, I realize that I may just be restating the problem going on here. A lot of the proposed situations I'm discussing seem to have a "But what if this OTHER situation exists and the utilities indicate you pick the counter intuitive solution? But what if this OTHER situation exists and the utilities indicate you pick the intuitive solution?"

To address the problem more directly, a better approach might be to consider Gödel's incompleteness theorems. Quoting from Wikipedia:

"The first incompleteness theorem states that no consistent system of axioms whose theorems can be listed by an "effective procedure" (essentially, a computer program) is capable of proving all facts about the natural numbers. For any such system, there will always be statements about the natural numbers that are true, but that are unprovable within the system."

If the FAI in question is considering utility in terms of natural numbers, it seems to make sense that there are things it should do to maximize utility that it would not be able to prove inside its system. To take that into account, we would have to design it to call for help in situations that appear likely to be unprovable.

Based on Alan Turing's result on the halting problem again, if the FAI can only be treated as a Turing machine, it can't establish whether or not some situations are provable. That seems to mean it would at some point need some kind of hard cutoff rule, such as "Call for help and do nothing but call for help if you have been running for one hour and can't figure this out," or alternatively "Take an action based on your current guess of the probabilities if you can't figure this out after one hour, and if at least one of the two probabilities is still incalculable, choose randomly."
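The second rule could be sketched roughly like this. Everything here is hypothetical: the function name, the idea that estimates arrive as a stream of partial results, and the use of None to mark a value that is still incalculable are assumptions made only to illustrate the time-limit idea.

```python
import random
import time

def decide_with_deadline(estimate_steps, actions, time_limit_seconds, rng=random):
    """Deliberate until the estimates run out or the deadline passes, then act.

    estimate_steps: iterable of dicts mapping an action to its current
    expected-utility estimate. An action that never receives an estimate
    is treated as still incalculable.
    """
    deadline = time.monotonic() + time_limit_seconds
    latest = {action: None for action in actions}
    for estimates in estimate_steps:
        latest.update(estimates)
        if time.monotonic() >= deadline:
            break  # out of time: stop refining and act on the current guess
    if any(latest[action] is None for action in actions):
        return rng.choice(list(actions))  # something is incalculable: pick randomly
    return max(actions, key=lambda action: latest[action])

# Usage with made-up estimates: refusing looks better once both are known.
steps = iter([{"pay": -5.0}, {"refuse": 0.0}])
print(decide_with_deadline(steps, ["pay", "refuse"], time_limit_seconds=1.0))
```

The one-hour limit in the comment becomes the `time_limit_seconds` parameter; whether a random choice is a sensible fallback is exactly the kind of question the surrounding discussion leaves open.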

This is again getting a bit long, so I'll stop writing for a bit to double check that this seems reasonable and that I didn't miss something.

You seem to be going far afield. The technical conclusion of the first argument is that one should spend all one's resources dealing with cases with infinite or very high utility, even if they are massively improbable. The way I said it earlier was imprecise.

When humans deal with a problem they can't solve, they guess. It should not be difficult to build an AI that can solve everything humans can solve. I think the "solution" to Godelization is a mathematical intuition module that finds rough guesses, not asking another agent. What special powers does the other agent have? Why can't the AI just duplicate them?

**[deleted]**· 2011-01-07T19:28:56.697Z · score: 0 (0 votes) · LW · GW

Thinking about it more, I agree with you that I should have phrased asking for Help better.

Using Humans as the other agents, just duplicating all powers available to Humans seems like it would cause a noteworthy problem. Assume an AI Researcher named Maria follows my understanding of your idea. She creates a Friendly AI and includes a critical block of code:

If UNFRIENDLY=TRUE then HALT;

(Un)friendliness isn't a Binary, but it seems like it makes a simpler example.

The AI (since it has duplicated the special powers of human agents) overwrites that block of code and replaces it with a CONTINUE command. Certainly its creator Maria could do that.

Well clearly we can't let the AI duplicate that PARTICULAR power. Even if it would never use it under any circumstances of normal processing (Something which I don't think it can actually tell you under the halting problem.) It's very insecure for that power to be available to the AI if anyone were to try to Hack the AI.

When you think about it, something like The Pascal's Mugging formulation is itself a hack, at least in the sense I can describe both as "Here is a string of letters and numbers from an untrusted source. By giving it to you for processing, I am attempting to get you to do something that harms you for my benefit."

So if I attempt to give our Friendly AI Security Measures to protect it from hacks turning it to an Unfriendly AI, These Security Measures seem like they would require it to lose some powers that it would have if the code was more open.

I think it makes more sense to design an AI that is robust to hacks due to a fundamental logic than to try to patch over the issues. I would not like to discuss this in detail, though - it doesn't interest me.

I think you've just perfectly illustrated how *some* Scope Insensitivity can be a good thing.

A mind with *perfect* scope sensitivity will be diverted into chasing impossibly tiny probabilities for impossibly large rewards. If a good rationalist must win, then a good rationalist should commit to avoiding supposed rationality that makes him lose like that.

So, here's a solution. If a probability is too tiny to be reasonably likely to occur in your lifespan, treat its bait as actually impossible. If you don't, you'll inevitably crash into effective ineffectiveness.

This seems to suggest a fuzzily-defined hack.

If you don't have a mathematical descriptor for what you consider "reasonably likely", then I'm afraid this doesn't get us anywhere.

This comment thread has grown too large :). I have a thought that seems to me to be the right way to resolve this problem. On the one hand, the thought is obvious, so it probably has already been played out in this comment thread, where it presumably failed to convince everyone. On the other hand, the thread is too large for me to digest in the time that I can reasonably give it. So I'm hoping that someone more familiar with the conversation here will tell me where I can find the sub-thread that addresses my point. (I tried some obvious word-searches, and nothing came up.)

Anyway, here is my point. I can see that the hypothesis that 3^^^^3 people are being tortured might be simple enough so that the Solomonoff prior is high enough so that the AI would give in to the mugger, *if the AI were using an un-updated Solomonoff prior*. But the AI is allowed to update, right? And, from what the AI knows about humans, it can see that the low complexity of 3^^^^3 *also* makes it more probable that a "philosopher out for a fast buck" would choose that number.

So, the simplicity of 3^^^^3 contributes to *both* the hypothesis of a real torturer *and* the hypothesis of the liar.

And if, after taking all this into account, the AI still computes a high expected utility for giving in to the mugger, well, then I guess that that is really what it ought to do (assuming that it shares my utility function). But is there any reason to think that this is likely? Does it follow just from Eliezer's observation that "the utility of a Turing machine can grow much faster than its prior probability shrinks"? After all, it's the *updated* probability that really matters, isn't it?

Eliezer's observation that "the utility of a Turing machine can grow much faster than its prior probability shrinks"?

That assumption is wrong, I argue.

I missed your post when it first came out. I've just commented on it.

It does seem that the probability of someone being able to bring about the deaths of N people should scale as 1/N, or at least 1/f(N) for some monotonically increasing function f. 3^^^^3 may be a more simply specified number than 1697, but it seems "intuitively obvious" (as much as that means anything) that it's easier to kill 1697 people than 3^^^^3. Under this reasoning, the likely deaths caused by not giving the mugger $5 are something like N/f(N), which depends on what f is, but it seems likely that it converges to zero as N increases.

It is an awfully difficult question, though, because how do we know we don't live in a world where 3^^^^3 people could die at any moment? It seems unlikely, but then so do a lot of things that are real.

Perhaps the problem lies in the idea that a Turing machine can create entities that have the moral status of humans. If there's a machine out there that can create and destroy 3^^^^3 humans on a whim, then are human lives really worth that much? But, on the other hand, there are laws of physics out there that have been demonstrated to create almost 3^^3 humans, so what is one human life worth on that scale?

On another note, my girlfriend says that if someone tried this on her, she'd probably give them the $5 just for the laugh she got out of it. It would probably only work once, though.

**[deleted]**· 2011-04-12T20:52:14.180Z · score: 3 (3 votes) · LW · GW

Incidentally: How would it affect your intuition if you instead could participate in the Intergalactic Utilium Lottery, where probabilities and payoffs are the same but where you trust the organizers that they do what they promise?

If I actually trust the lottery officials, that means that I have certain knowledge of the utility probabilities and costs for each of my choices. Thus, I guess I'd choose whichever option generated the most utility, and it wouldn't be a matter of "intuition" any more.

Applying that logic to the initial Mugger problem, if I calculated, and was certain of, there being at least a 1 in 3^^^^3 chance that the mugger was telling the truth, then I'd pay him. In fact, I could mentally reformulate the problem to have the mugger saying "If you don't give me $5, I will use the powers vested in me by the Intergalactic Utilium Lottery Commission to generate a random number between 1 and N, and if it's a 7, then I kill K people." I then divide K by N to get an idea of the full moral force of what's going on. If K/N is even within several orders of magnitude of 1, I'd better pay up.
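The K/N reformulation above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the `threshold` cutoff (standing in for "within several orders of magnitude of 1") are hypothetical and not from the comment, and K and N have to be small enough to write down, which 3^^^^3 is not.

```python
from fractions import Fraction

def expected_deaths(k_people: int, n_outcomes: int) -> Fraction:
    """Expected deaths when exactly one of n_outcomes equally likely
    lottery results kills k_people (the K/N ratio from the comment)."""
    return Fraction(k_people, n_outcomes)

def should_pay(k_people: int, n_outcomes: int,
               threshold: Fraction = Fraction(1, 1000)) -> bool:
    """Pay when K/N is within a few orders of magnitude of 1.
    The 1/1000 threshold is an arbitrary assumption for illustration."""
    return expected_deaths(k_people, n_outcomes) >= threshold
```

With trusted odds the decision reduces to one comparison, which is the point of the lottery framing: the difficulty in the original mugging lies in estimating N, not in the arithmetic.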

The problem is the uncertainty. Solomonoff induction gives the claim "I can kill 3^^^^3 people any time I want" a substantial probability, whereas "common sense" will usually give it literally zero. If we trust the lottery guys, questions of induction versus common sense become moot - we *know* the probability, and must act on it.

I think this is actually the core of the issue - not certainty of your probability, per se, but rather how it is derived. I think I may have finally solved this!

See if you can follow me on this... If Pascal Muggers were completely independent instances of each other - that is, every person attempting a Pascal's Mugging has their own unique story and motivation for initiating it, without it correlating to you or the other muggers, then you have no additional information to go on. You shut up and multiply, and if the utility calculation comes out right, you pay the mugger. Sure, you're almost certainly throwing money away, but the off-chance more than offsets this by definition. Note that the probability calculation itself is complicated and not linear: Claiming higher numbers increases the probability that they are lying. However it's still possible they would come up with a number high enough to override this function.

At which point we previously said: "Aha! So this is a losing strategy! The Mugger ought not be able to arbitrarily manipulate me in this manner!" Or: "So what's stopping the mugger from upping the number arbitrarily, or mugging me multiple times?" ...To which I answer, "check the assumptions we started with".

Note that the assumption was that the Mugger is not influenced by me, nor by other muggings. The mugger's reasons for making the claim are their own. So "not trying to manipulate me knowing my algorithm" was an *explicit* assumption here.

What if we get rid of the assumption? Why, then an increasingly high utility claim (or recurring muggings) doesn't just raise the probability that the mugger is wrong/lying for their own inscrutable reasons. It additionally raises the probability that they are lying to manipulate me, knowing (or guessing) my algorithm.

Basically, I add in the question "why did the mugger choose the number 3^^^3 and not 1967? This makes it more likely that they are trying to overwhelm my algorithm, (mistakenly) thinking that it can thus be overwhelmed". If the mugger chooses 4^^^4 instead, this further (and proportionally?) increases said suspicion. And so on.

I propose that the combined weight of these probabilities rises faster than the claimed utility. If that is the case, then for all claimed utilities x higher than N, where N is a number that prompts a negative expected utility result, x would likewise produce a negative expected utility result.

Presumably, for an AI with a good enough grasp of motives and manipulation, this would not pose a problem for very long. We can specifically test for this behavior, checking the AI's analysis for increasingly higher claims and seeing whether the expected utility function really has a downward slope under these conditions.

I can try to further mathematize this (is this even a real word?). Is this necessary? The answer seems superficially satisfactory. Have I actually solved it? I don't really have a lot of time to keep grappling with it (been thinking about this on and off for the past few months), so I would welcome criticism even more than usual.

This is a very good point - the higher the number chosen, the more likely it is that the mugger is lying - but I don't think it quite solves the problem.

The probability that a person, out to make some money, will attempt a Pascal's Mugging can be no greater than 1, so let's imagine that it is 1. Every time I step out of my front door, I get mobbed by Pascal's Muggers. My mail box is full of Pascal's Chain Letters. Whenever I go online, I get popups saying "Click this link or 3^^^^3 people will die!". Let's say I get one Pascal-style threat every couple of minutes, so the probability of getting one in any given minute is 0.5.

Then, let the probability of someone genuinely having the ability to kill 3^^^^3 people, and then choosing to threaten me with that, be x per minute - that is, over the course of one minute, there's an x chance that a genuine extra-Matrix being will contact me and make a Pascal Mugging style threat, on which they will actually deliver.

Naturally, x is tiny. But, if I receive a Pascal threat during a particular minute, the probability that it's genuine is x/(0.5+x), or basically 2x. If 2x * 3^^^^3 is at all close to 1, then what can I do but pay up? Like it or not, Pascal muggings would be more common in a world where people can carry out the threat, than in a world where they can't. No amount of analysis of the muggers' psychology can change the prior probability that a genuine threat will be made - it just increases the amount of noise that hides the genuine threat in a sea of opportunistic muggings.
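The x/(0.5+x) figure is a ratio of rates: genuine threats per minute over all threats per minute. A minimal sketch, assuming the 0.5 per minute rate of fake threats from the example; the function and parameter names are made up for illustration:

```python
def posterior_genuine(x: float, fake_rate: float = 0.5) -> float:
    """Probability that a threat received this minute is genuine, given
    a per-minute rate x of genuine threats and fake_rate of fake ones."""
    return x / (fake_rate + x)

# For tiny x the denominator is dominated by the fake rate, so the
# result is approximately x / 0.5, that is, about 2x.
tiny_x = 1e-12
approx = posterior_genuine(tiny_x)
```

Note that raising the fake rate only shrinks the posterior in proportion; it cannot push it to zero, which is the comment's point about noise hiding the genuine threat without removing it.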

But that is precisely it - it's no longer a Pascal mugging if the threat is credible. That is, in order to be successful, the mugger needs to be able to up the utility claim arbitrarily! It is assumed that we already know how to handle a credible threat, what we didn't know how to deal with was a mugger who could always make up a bigger number, to a degree where the seeming impossibility of the claim no longer offsets the claimed utility. But as I showed, this only works if you don't enter the mugger's thought process into the calculation.

This actually brings up an important corollary to my earlier point: The higher the number, the less likely the coupling is between the mugger's claim and the mugger's intent.

A person who can kill another person might well want 5$, for whatever reason. In contrast, a person who can use power from beyond the Matrix to torture 3^^^3 people already has IMMENSE power. Clearly such a person has all the money they want, and even more than that in the influence that money represents. They can probably create the money out of nothing. So already their claims don't make sense if taken at face value.

Maybe the mugger just wants me to surrender to an arbitrary threat? But in that case, why me? If the mugger really has immense power, they could create a person they know would cave in to their demands.

Maybe I'm special for some reason. But if the mugger is REALLY that powerful, wouldn't they be able to predict my actions beforehand, a-la Omega?

Each rise in claimed utility brings with it a host of assumptions that need to be made for the action-claimed reaction link to be maintained. And remember, the mugger's ability is not the only thing dictating expected utility, but also the mugger's intentions. Each such assumption not only weakens the probability of the mugger carrying out their threat because they can't, it also raises the probability of the mugger rewarding refusal and/or punishing compliance. Just because the off-chance comes true and the mugger contacting me actually CAN carry out the threat, does not make them sincere; the mugger might be testing my rationality skills, for instance, and could severely punish me for failing the test.

As the claimed utility approaches infinity, so does the scenario approach Pascal's Wager: An unknowable, symmetrical situation, where an infinite number of possible outcomes cancel each other out. The one outcome that isn't canceled out is the loss of 5$. So the net utility is negative. So I don't comply with the mugger.

I'm still not sure I'm fully satisfied with the level of math my explanation has, even though I've tried to set the solution in terms of limits and attractors. But I think I can draw a graph that dips under zero utility fairly quickly (or maybe doesn't really ever go over it?), and never goes back up - asymptotic at -5$ utility. Am I wrong?

A person who can kill another person might well want 5$, for whatever reason. In contrast, a person who can use power from beyond the Matrix to torture 3^^^3 people already has IMMENSE power. Clearly such a person has all the money they want, and even more than that in the influence that money represents. They can probably create the money out of nothing. So already their claims don't make sense if taken at face value.

Ah, my mistake. You're arguing based on the intent of a *legitimate* mugger, rather than the fakes. Yes, that makes sense. If we let f(N) be the probability that somebody has the power to kill N people on demand, and g(N) be the probability that somebody who has the power to kill N people on demand would threaten to do so if he doesn't get his $5, then it seems highly likely that N*f(N)*g(N) approaches zero as N approaches infinity. What's even better news is that, while f(N) may only approach zero slowly for easily constructed values of N like 3^^^^3 and 4^^^^4 because of their low Kolmogorov complexity, g(N) should scale with 1/N or something similar, because the more power someone has, the less likely they are to execute such a minuscule, petty threat. You're also quite right in stating that the more power the mugger has, the more likely it is that they'll reward refusal, punish compliance or otherwise decouple the wording of the threat from their actual intentions, thus making g(N) go to zero even more quickly.

So, yeah, I'm pretty satisfied that N*f(N)*g(N) will asymptote to zero, taking all of the above into account.
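To see how the product N*f(N)*g(N) behaves, here is a toy sketch. The shapes chosen for `f` and `g` are pure assumptions picked to match the verbal description (f falling slowly, g falling like 1/N); nothing in the thread pins down the real functions.

```python
import math

def f(n: int) -> float:
    """Toy probability that someone can kill n people on demand.
    Assumed to fall only logarithmically, i.e. very slowly."""
    return 1.0 / (1.0 + math.log2(n))

def g(n: int) -> float:
    """Toy probability that such a being bothers to threaten over $5.
    Assumed to scale as 1/n, as suggested in the comment."""
    return 1.0 / n

def expected_deaths(n: int) -> float:
    """Expected deaths from refusing: n * f(n) * g(n)."""
    return n * f(n) * g(n)

# Expected deaths shrink as the claimed number grows.
trend = [expected_deaths(10 ** k) for k in (1, 3, 6, 12)]
```

Under these assumptions the factor of N cancels against g(N), so the product falls exactly as fast as f(N) does. If g fell more slowly than 1/N, the conclusion would depend on how fast f falls.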

(In more unrelated news, my boyfriend claims that he'd pay the mugger, on account of him obviously being mentally ill. So that's two out of three in my household. I hope this doesn't catch on.)

it's no longer a Pascal mugging if the threat is credible.

That is backward. It is *only* a Pascal mugging if the threat is credible. Like one made by Omega, who you mention later on.

it's no longer a Pascal mugging if the threat is credible.

That is backward. It is only a Pascal mugging if the threat is credible.

No, then it's just a normal mugging.

Which relates to this heuristic.

it's no longer a Pascal mugging if the threat is credible.

That is backward. It is only a Pascal mugging if the threat is credible.

No, then it's just a normal mugging.

If the threat is not credible from the perspective of the target it may only be an *attempted* mugging - not a proper mugging at all.

That is backward. It is only a Pascal mugging if the threat is credible. Like one made by Omega, who you mention later on.

Huh? Isn't the whole point of Pascal's mugging that it isn't likely and the mugger makes up for the lack of credibility by making the threat massive? If the mugger is making a credible threat we just call that a mugging.

Huh? Isn't the whole point of Pascal's mugging that it isn't likely and the mugger makes up for the lack of credibility by making the threat massive?

The threat has to be credible at the level of probability it is assigned. It doesn't have to be *likely*.

The threat has to be credible at the level of probability it is assigned. It doesn't have to be likely.

How are you defining credible? It may be that we are using different notions of what this means. I'm using it to mean something like "capable of being believed" or "could be plausibly believed by a somewhat rational individual" but these have meanings that are close to "likely".

"The threat has to be credible at the level of probability it is assigned. "

And what, precisely, does THAT mean? If I try to taboo some words here, I get "we must evaluate the likelihood of something happening as the likelihood we assigned for it to happen". That's simply tautological.

No probability is exactly zero except for self-contradictory statements. So "credible" can't mean "of zero probability" or "impossible to believe". To me, "credible" means "something I would not have a hard time believing without requiring extraordinary evidence", which in itself translates pretty much to ">0.1% probability". If you have some reason for distinguishing between a threat that is not credible and a threat with exceedingly low probability of being carried out, please state it. Also please note that my use of the word makes sense within the original context of my reply to HopeFox, who was discussing the implications of a world where such threats were *not* incredible.

Pascal's mugging happens when the probability you would assign disregarding manipulation is very low (not a credible threat by normal standards), with the claimed utility being arbitrarily high to offset this. If that is not the case, it's a non-challenge and is not particularly relevant to our discussion. Does that clarify my original statement?

Pascal's mugging happens when the probability you would assign disregarding manipulation is very low (not a credible threat by normal standards), with the claimed utility being arbitrarily high to offset this. If that is not the case, it's a non-challenge and is not particularly relevant to our discussion. Does that clarify my original statement?

That makes sense. Whereas my statement roughly meant "Pascal's wager isn't about someone writing BusyBeaver(3^^^3)" - that's not even a decision problem worth mentioning.

Am I wrong?

Yes with plausible priors, e.g. Solomonoff induction, as discussed in this paper.

I'm afraid I don't follow. I don't quite see how this negates the point I was making.

While it is conceivable that I simply lack the math to understand what you're getting at, it seems to me that a simply-worded explanation of what you mean (or alternately a simple explanation of why you cannot give one) would be more suitable in this forum. Or if this has already been explained in such terms anywhere, a link or reference would likewise be helpful.

A person who can kill another person might well want 5$, for whatever reason. In contrast, a person who can use power from beyond the Matrix to torture 3^^^3 people already has IMMENSE power. Clearly such a person has all the money they want, and even more than that in the influence that money represents. They can probably create the money out of nothing. So already their claims don't make sense if taken at face value.

This is known as the "What does God need with a starship?" problem.

Indeed. I was going to write that as part of my original post, and apparently forgot... Thanks for the addition :)

Philosophers of religion argue quite a lot about Pascal's wager and very large utilities or infinite utilities. I haven't bothered to read any of those papers, though. As an example, here is Alexander Pruss.

As I see it, the mugger seems to have an extremely bad hand to play.

If you evaluate the probability of the statement 'I will kill one person if you don't give me five dollars,' as being something that stands in a relationship to the occurrence of such threat being carried through on, and simply multiply up from there until you get to 3^^^^3 people, then you're going to end up with problems.

However, that sort of simplification, treating all the evidence as locating the same thing, only works for low multiples. (Which I'd imagine is why it feels wrong when you start talking about large numbers.) If you evaluate the evidence for and against different *parts* of the statement, then you can't simply scale it up as a whole without scaling up all the variables that evidence attaches to. The probability that the person will carry through on a threat to kill 3^^^3 people for five dollars is going to zero out fairly quickly. You need to scale up the dollars asked to get all the bits of evidence to scale in proportion to each other.

To make the threat plausible the mugger would have to be asking for a ridiculously large benefit for themselves. And when you start asking for that huge benefit then the computer simply has to answer whether the resources can be put to better use elsewhere.

As it stands, however, the variables in the statement haven't been properly scaled to keep the evidence for and against the proposed murders in a constant relationship. And, while it's just about possible that someone will kill a few hundred people for five dollars (destroying a train or the like would be a low investment exercise) the probability rapidly approaches zero as you increase the number of people that you're proposing to kill for five dollars.

By the time you're talking about 3^^^^3 lives the probability would have long since been reduced to an absurdity. Which would then be compared against all the other things the FAI could do with five dollars, that have far higher probabilities, and simply be dismissed as a bad gamble. (Since over a great length of time the computer could reasonably expect to approach the predicted loss/benefit ratio, regardless of whether the mugger actually killed 3^^^^3 people.)

I think the problem might lie in the almost laughable disparity between the price and the possible risk. A human mind is not capable of instinctively providing a reason why it would be worth killing 3^^^^3 people - or even, I think, a million people - as punishment for not getting $5. A mind who would value $5 as much or more than the lives of 3^^^^3 people is utterly alien to us, and so we leap to the much more likely assumption that the guy is crazy.

Is this a bias? I'd call it a heuristic. It calls to my mind the discussion in Neal Stephenson's *Anathem* about pink nerve-gas-farting dragons. (Mandatory warning: fictional example.) The crux of it is, our minds only bother to anticipate situations that we can conceive of as logical. Therefore, the manifest illogicality of the mugging (why is 3^^^^3 lives worth $5; if you're a Matrix Lord why can't you just generate $5 or better yet, modify my mind so that I'm inclined to give you $5, etc.) causes us to anti-anticipate its truth. Otherwise, what's to stop you from imagining, as stated by Tom_McCabe2 (and mitchell_porter2, &c.), that typing the string "QWERTYUIOP" leads to, for example, 3^^^^3 deaths? If you imagine it, and conceive of it as a logically possible outcome, then regardless of its improbability, by your argument (as I see it), a "mind that worked strictly by Solomonoff induction" should cease to type that string of letters ever again. By induction, such a mind could cause itself to cease to take any action, which would lead to... well, if the AI had access to itself, likely self-deletion.

That's my top-of-the-head theory. It doesn't really answer the question at hand, but maybe I'm on the right track...?

Maybe I'm missing the point here, but why do we care about any number of simulated "people" existing outside the matrix at all? Even assuming that such people exist, they'll never affect me, nor affect anyone in the world I'm in. I'll never speak to them, they'll never speak to anyone I know and I'll never have to deal with any consequences for their deaths. There's no expectation that I'll be punished or shunned for not caring about people from outside the matrix, nor is there any way that these people could ever break into our world and attempt to punish me for killing them. As far as I care, they're not real people and their deaths or non-deaths do not factor into my utility function at all. Unless Pascal's mugger claims he can use his powers from outside the matrix to create 3^^^3 people in our world (the only one I care about) and then kill them here, my judgement is based solely on the fact that U(me losing 5$) < 0.

So, let's assume that we're asking about the more interesting case and say that Pascal's mugger is instead threatening to use his magic extra-matrix powers to create 3^^^3 people here on Earth one by one and that they'll each go on international television and denounce me for being a terrible person and ask me over and over why I didn't save them and then explode into chunks of gore where everyone can see it before fading back out of the matrix (to avoid black hole concerns) and that all of this can be avoided with a single one time payment of 5$. What then?

I honestly don't know. Even one person being created and killed that way definitely feels worse than imagining any number of people outside the matrix getting killed. I'd be tempted on an emotional level to say yes and give him the money, despite my more intellectual parts saying this is clearly a setup and that something that terrible isn't actually going to happen. 3^^^3 people, while I obviously can't really imagine that many, is only worse since it will keep happening over and over and over until after the stars burn out of the sky.

The only really convincing argument, aside from the argument from absurdity ("That's stupid, he's just a philosopher out trying to make a quick buck."), is Polymeron's argument here.

Replace "matrix" with "light cone" and see if you would still endorse that.

they'll each go on international television and denounce me for being a terrible person and ask me over and over

There's not enough time.

I'd be tempted on an emotional level to say yes and give him the money

If I ever ascend into deityhood I'll be sorely tempted to go around offering Pascal's wager in various forms and inverting the consequences from what I said, or having no other consequences no matter what they do.

"Accept Christianity, and have a chance of heaven." *(Create heaven only for those who decline my Wager.)*

"Give me five dollars, or I will torture some people" *(Only torture people if they give me five dollars.)*

Check the multiverse to see how many beings will threaten people with Pascal's wager. Create 3^^(a large number of up arrows)^^^3 unique beings for each philosopher con man. Ask each new being: "Give me five dollars, or I will torture some people" *(Do nothing. Let them live out normal lives with the benefit of their money. [Don't worry, for each such reply I will add five dollars worth of goods and services to their world, to avoid deflation and related issues.])*

Why everyone is assuming the probability they are confronting a trickster testing them is zero, or that it is in any case a smaller probability than something different that they can't get a handle on because it is too small, I have no idea.

Since people are so taken with *only* taking beings at their word, wouldn't a being telling them it will trick them if it gets the power confound them?

Changing "matrix" to "light cone" changes little, since I still don't expect to ever interact with them. The light cone example is only different insofar as I expect more people in my light cone to (irrationally) care about people beyond it. That might cause me to make some token efforts to hide or excuse my apathy towards the 3^^^3 lives lost, but not to the same degree as even 1 life lost here inside my light cone.

If you accept that someone making a threat in the form of "I will do X unless you do Y" is evidence for "they will do X unless you do ~Y", then by the principle of conservation of evidence, you have evidence that everyone who ISN'T making a threat in the form of "I will do X unless you do Y" will do X unless you do Y. For all values of X and Y that you accept this trickster hypothesis for. And that is absurd.

This might be overly simplistic, but it seems relevant to consider the probability per murder. I am feeling a bit of scope insensitivity on that particular probability, as it is far too small for me to compute, so I need to go through the steps.

If someone tells me that they are going to murder one person if I don't give them $5, I have to consider the probability of it: not every attempted murder is successful, after all, and I don't have nearly as much incentive to pay someone if I believe they won't be successful. Further, most people don't actually attempt murder, and the cost to that person of telling me they will murder someone if they don't get $5 is much, much smaller than the cost of actually murdering someone. Consequences usually follow from murder, after all. I also have to consider the probability that this person is insane and doesn't care about the consequences: only the $5.

Still, only .00496% of people are murdered in a year. (According to Wolfram Alpha, at least.) And while I would assign a higher probability to a person claiming to murder someone, it wouldn't jump dramatically: they could be lying, they could try but fail, etc. Even if I let "I will kill someone" be a 90% accurate test with only a 10% false positive rate, which I think is generous in the case of $5 with no additional evidence, the total probability comes out as only being .004%. Even if it was 99% sure and 1% false positive, EXTREMELY generous odds, there is only a .4% total probability of it occurring.
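The update described here is an ordinary Bayes calculation, so it can be checked directly. A minimal sketch, assuming the annual murder rate is used as the prior and that "90% accurate" means a 90% true positive rate; the function and variable names are illustrative:

```python
def posterior(prior: float, sensitivity: float,
              false_positive_rate: float) -> float:
    """Bayes' rule: P(event | positive test)."""
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1.0 - prior)
    return true_positives / (true_positives + false_positives)

base_rate = 0.0000496  # .00496% expressed as a probability

generous = posterior(base_rate, sensitivity=0.90, false_positive_rate=0.10)
very_generous = posterior(base_rate, sensitivity=0.99, false_positive_rate=0.01)
```

On these inputs a plain Bayes update gives roughly 0.04% and 0.5%, so the exact figures quoted above are best read as order-of-magnitude estimates. Either way the posterior stays well under 1%.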

In reality, I think there would be some evidence in the case of one murder. At very least I could get strong sociological cues that the person was likely to be telling the truth. However, since I am moving to an end point where they will be killing 3^^^^3 people, I'll leave that aside as it is irrelevant to the end example.

If such a person claimed they would murder 2 people, it would depend on whether I thought the probabilities of the events occurring together were dependent or independent: if him killing one person made it more likely that he would kill two, given the event (the threat) in question.

Now, if he says he will kill two people, and he kills one, he is unlikely to stop before killing another. BUT, there are more chances for complication or failure, and the cost:benefit for him shrinks by half, making the probability that he manages to or tries to kill anyone smaller. These numbers in reality would be affected by circumstance: it is a lot easier to kill two people with a pistol or a bomb than it is with your bare hands. But since I see no bomb or pistol and he is claiming some mechanism I have no evidence for, we'll ignore that reality for now.

I had trouble finding information on the rate of double homicide:single homicide to use as a baseline, but it seems likely that it is neither totally dependent, nor totally independent. In order to believe the threat credible, I have to believe (after hearing the threat) that they will attempt to kill two people, successfully kill one, AND successfully kill another. And if I put the probability of A+B at .004%, I can't very well put A+B+C at any higher. Since I used a 90% accuracy figure for my initial calculation, let's apply it twice: 81%. We'll assume that the false negative rate (he murders people even when he says he won't) stays constant.

This means that each murder is *slightly* more likely than 90% as likely to occur as the murder before it. Now, it isn't exact, and these numbers get really, really small, so I'm looking at 3^3 as a reference.

At 3^3, the cost has gone up 27x if he kills people, but the probability of the event has gone down to .06 of what it was. So, something like 1.7x more costly, given what was said above.
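The 27x versus .06 comparison can be reproduced with a one-line model. This sketch assumes each additional murder is exactly 90% as likely as the one before it, so the probability of N murders relative to one is 0.9^(N-1); the function name is made up for illustration:

```python
def relative_expected_cost(n_victims: int, step_probability: float) -> float:
    """Expected cost of a threat against n_victims, relative to a threat
    against one victim, when each further murder is step_probability
    times as likely as the previous one."""
    relative_probability = step_probability ** (n_victims - 1)
    return n_victims * relative_probability

# 27 victims at a 0.9 step: probability falls to about 0.06 of the
# single-victim case, while the cost rises 27-fold.
at_27 = relative_expected_cost(27, 0.9)
```

For any step probability below 1, the exponential decay eventually outruns the linear growth in victims, so the relative expected cost falls toward zero as N grows; a lower step probability just makes the decline start sooner.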

But all this was dependent on several assumed figures. So at what points does it balance out?

I'm a little too tired to do all the math right now, but some quick work showed that being only 80% sure of the test, with a 10% false positive rate, would be enough for it to go down continuously. So if I am less than 80% sure of the test of "he says he will murder one person if I don't give him 5 dollars" then I can be sure that the probability that he will kill 3^^^^3 is far, far less than the cost if I am wrong.

I'm assuming that I am getting my math right here, and I am quite tired, so if anyone wishes to correct me on some portion of this I would be happy for the criticism.

First, I didn't read all of the above comments, though I read a large part of them.

Regarding the intuition that makes one question Pascal's mugging: I think it would be likely that there was a strong survival value in the ancestral environment to being able to detect and disregard statements that would cause you to pay money to someone else without there being any way to detect if these statements were true. Anyone without that ability would have been mugged to extinction long ago. This makes more sense if we regard the origin of our built-in utility function as a *very* coarse approximation of our genes' survival fitness.

Regarding what the FAI is to do, I think the mistake made is assuming that the prior utility of doing ritual X is exactly zero, so that a very small change in our probabilities would make the expected utility of X positive. (Where X is "give the Pascal mugger the money"). A sufficiently smart FAI would have thought about the possibility of being Pascal-mugged long before that actually happens, and would in fact consider it a likely event to sometimes happen. I am not saying that this actually happening is not a tiny sliver of evidence in favor of the mugger telling the truth, but it is very tiny. The FAI would (assuming it had enough resources) compute for every possible Matrix scenario the appropriate probabilities and utilities for every possible action, taking the scenario's complexity into account. There is no reason to assume the prior expected utility for any religious ritual (such as paying Pascal muggers, whose statements you can't check) is exactly zero. Maybe the FAI finds that there is a sufficiently simple scenario in which a god exists and in which it is extremely utillious to worship that god, more so than any alternative scenarios. Or in which one should give in to (specific forms of) Pascal mugging.

However, the problem as presented in this blogpost implicitly assumes that the prior probabilities the FAI holds are such that the tiny sliver of probability provided by one more instance of Pascal's mugging happening, is enough to push the probability of the scenario of 'Extra-Matrix deity killing lots of people if I don't pay' over that of 'Extra-Matrix deity killing lots of people if I do pay'. Since these two scenarios need not have the exact same Kolmogorov complexity this is unlikely.

In short, either the FAI is already religious, (which may include as a ritual 'give money to people who speak a certain passphrase') or it is not, but the event of a Pascal mugging happening is unlikely to change its beliefs.

Now, the question becomes if we should accept the FAI doing things that are expected to favor a huge number of extra-matrix people at a cost to a smaller number of inside-matrix people. If we actually count every human life as equal, and we accept what Solomonoff-inducted bayesian probability theory has to say about huge payoff-tiny probability events and dutch books, the FAI's choice of religion would be the rational thing to do. Else, we could add a term to the AI's utility function to favor inside-matrix people over outside-matrix people, or we could make it favor certainty (of benefitting people known to actually exist) over uncertainty (of outside-matrix people not known to actually exist).

Looks like strategic thinking to me. If you are to organize yourself to be prone to be Pascal-mugged, you will get Pascal mugged, and thus it is irrational to organize yourself to be Pascal-muggable.

edit: It is as rational to introduce certain bounds on applications of own reasoning as it is to try to build reliable, non-crashing software, or to impose simple rule of thumb limits on the output of the software that controls positioning of control rods in the nuclear reactor.

If you properly consider a tiny probability of mistake to your reasoning, a mistake that may lead to consideration of a number generated by a random string - a lot of such numbers are extremely huge - and apply some meta-cognition with regards to appearance of such numbers, you'll find that such extremely huge numbers are also disproportionally represented in products of errors in reasoning.

With regard to the wager, here is my answer: If you see someone bend over backwards to make a nickel, it is probably not Warren Buffett you're seeing. Indeed, the probability that a person who's bending over backwards to make a nickel has N$ falls off sharply as N increases. Here you see a being that is mugging you, and he allegedly has the power to simulate 3^^^^3 beings that he can mug, have sexual relations with, torture, whatever. The larger the claim, the less probable it is that this is an honest situation.

It is however exceedingly difficult to formalize such answer or to arrive at it in a formal fashion. And for me, there could exist other wagers that are beyond my capability to reason correctly about.

For this reason, as a matter of policy I assume that I have an error per each inference step - the error that can result in consideration of an extremely huge number - and have an upper cutoff on the numbers I'd use for considerations as an optimization strategy; if there is a huge number of this sort, more verification steps are needed. In particular, this has a very high impact on morality for me. Any sort of situation where you are killing fewer people to save more people - those situations are extremely uncommon and difficult to conjecture - the appearance of such a situation, however, can easily result from faulty reasoning.

I assume that I have an error per each inference step

This.

The further a reasoning reaches, the more likely to be wrong.

Any step could be not accurate enough, or not account for unknown effects in unusual situations, or rely on things we have no means of knowing.

Typical signs that it is drifting too much from reality:

- Numbers way outside usual ranges.

Errors or imagination produce these easily, reality not.

- Making one pivotal to the known world.

One is central to one's map, not to reality.

- Extremely small cause having catastrophic effect.

If so, then why has it not already happened? Also: pandering to our taste for stories.

- Vastly changing some portions seems just as valid.

The reasoning is rooted in itself, not reality.

Pascal's mugging lights them all, and it certainly reaches far.

The problem seems to vanish if you don't ask "What is the expectation value of utility for this decision, if I do X", but rather "If I changed my mental algorithms so that they do X in situations like this all the time, what utility would I plausibly accumulate over the course of my entire life?" ("How much utility do I get at the 50th percentile of the utility probability distribution?") This would have the following results:

For the limit case of decisions where all possible outcomes happen infinitely often during your lifetime, you would decide exactly as if you wanted to maximize expectation value in an individual case.

You would not decide to give money to Pascal's mugger if you don't expect that there are many fundamentally different scenarios which a mugger could tell you about: If you give a 5% chance to the scenario described by Pascal's mugger and believe that this is the only scenario which, if true, would make you give $5 to some person, you would not give the money away.

In contrast, if you believe that there are 50 different mugging scenarios which people will tell you during your life to Pascal-mug you, and you assign an independent 5% chance to all of them, you would give money to a mugger (and expect this to pay off occasionally).
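The contrast between the single-scenario and 50-scenario cases can be sketched numerically. This is one reading of the "50th percentile" rule, not a definitive one: it assumes the policy of paying is judged worthwhile exactly when the median lifetime outcome contains at least one true scenario, i.e. when the chance that at least one scenario is true exceeds 50%. The function names are hypothetical.

```python
def chance_at_least_one_true(n_scenarios, p_each):
    """Probability that at least one of n independent scenarios is true."""
    return 1.0 - (1.0 - p_each) ** n_scenarios


def median_rule_pays(n_scenarios, p_each):
    """Assumed reading of the median rule: pay only if a true scenario
    shows up in the 50th-percentile lifetime outcome."""
    return chance_at_least_one_true(n_scenarios, p_each) > 0.5


for n in (1, 50):
    p_any = chance_at_least_one_true(n, 0.05)
    print(f"{n:2d} scenario(s): P(at least one true) = {p_any:.3f}, "
          f"pay = {median_rule_pays(n, 0.05)}")
```

With one scenario at 5% the median outcome contains no true threat, so the rule says keep the money; with 50 independent scenarios at 5% each, the chance that at least one is true is about 92%, so the rule says pay.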

"Pascal's" Mugging requires me to believe that the apparent universe that we occupy, with its very low information content, is in fact merely part of a much larger program (in a causally linked and so incompressible way) which admits calculation within it of a specially designed (high-information content) universe with 3^^^^3 people (and not, say, as a side-effect of a low-information simulation that also computes other possibilities like giving immense life and joy to comparable numbers of people). The odds of that, if we use the speed priors, would seem to be 2^-(bits describing our universe + number of instructions to compute it):2^-(bits describing that vastly larger universe + number of instructions to compute it). That's going to be a *minimum* of 1:-2^(O(3^^^^3)), so by the speed prior this particular kind of probability falls away hugely faster than the utility grows.

However, I have little doubt that some creative philosopher can find some way to rescue the mugging argument in slightly different form.

I've been arguing about this with a friend recently [well, a version of this - I don't have any problems with arbitrarily large number of people being created and killed, unless the manner of their death is unpleasant enough that the negative value I assign to it exceeds the positive value of life].

He says that he can believe the person we are talking to has Agent Smith powers, but thinks that the more the Agent Smith promises, the less likely it is to be true, and this decreases faster the more that is promised, so that the probability that Agent Smith has the powers to create and kill [in an unpleasant manner] Y people multiplied by Y tends to zero as Y tends to infinity. So the net expectancy tends towards zero. I disagree with this: I believe that if you assign probability X to the claim that the person you are talking to is genuinely from outside the Matrix [and that you're in the Matrix], then the probability that Agent Smith has the powers to create and kill [in an unpleasant manner] Y people multiplied by Y tends to infinity as Y tends to infinity.

Now, I think we can break this down further to find the root cause of our disagreement [this doesn't feel like a fundamental belief]: does anyone have any suggestions for how to go about doing this? We began to argue about entropy and the chance for Agent Smith to have found a way [from outside the Matrix = all our physics doesn't apply to him] to reverse it, but I think we went downhill from there.

Edit: Looks like I was assuming probability distributions for which Lim (Y -> infinity) of Y*P(Y) is well defined. This turns out to be monotonic series or some similar class (thanks shinoteki).

I think it's still the case that a probability distribution that would lead to TraderJoe's claim of P(Y)*Y tending to infinity as Y grows would be un-normalizable. You can of course have a distribution for which this limit is undefined, but that's a different story.

Counterexample:

P(3^^^...3) (n "^"s) = 1/2^n

P(anything else) = 0

This is normalized because the sum of a geometric series with decreasing terms is finite. You might have been thinking of the fact that if a probability distribution on the integers is monotone decreasing (i.e. if P(n) > P(m) then n < m) then P(n) must decrease faster than 1/n. However, a complexity-based distribution will not be monotone because some big numbers are simple while most of them are complex.
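A small sketch makes the counterexample concrete for the only values of n that can actually be computed. The helper `up_arrow` is an illustrative implementation of Knuth's notation; anything from n = 3 onward (3^^^3) is far too large to evaluate, so the loop stops at n = 2.

```python
from fractions import Fraction


def up_arrow(a, n, b):
    """Knuth's up-arrow a ^...^ b with n arrows; feasible only for tiny inputs."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))


# Y * P(Y) for the first two terms of the distribution P(3 ^...^ 3) = 1/2^n.
for n in (1, 2):
    y = up_arrow(3, n, 3)
    p = Fraction(1, 2 ** n)
    print(f"n={n}: Y={y}, P(Y)={p}, Y*P(Y)={float(y * p)}")

# The probabilities form a geometric series: after N terms the missing mass is 1/2^N.
mass = sum(Fraction(1, 2 ** n) for n in range(1, 61))
print("mass missing after 60 terms:", 1 - mass)
```

The probabilities sum to 1, yet Y*P(Y) already jumps from 13.5 to about 1.9 trillion between the first and second terms, which is the sense in which a normalized distribution can still have Y*P(Y) growing without bound.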

One problem with discounting your prior based on the time complexity of a computation is that it practically forces you to believe either that P = BQP or that quantum mechanics doesn't work. If you discount based on space complexity, you might worry that torturing 3^^^3 people might actually be a small-space computation.

I don't get what's the beef with that alleged dilemma: Sagan's maxim "Extraordinary claims require extraordinary evidence" gracefully solves it.

More formally, in a Bayesian setting, Sagan's maxim can be construed as the requirement for the prior to be a non-heavy-tailed probability distribution.

In fact, in formal applications of Bayesian methods, typical light-tailed maximum entropy distributions such as normal or exponential are used.
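To illustrate why tail weight matters here, the sketch below compares probability-weighted stakes under two toy priors over a stake size n = 1, 2, 3, .... Both priors are assumptions chosen for illustration and neither is claimed to resemble the Solomonoff prior: a light-tailed geometric prior P(n) = 2^-n, and a heavy-tailed prior P(n) proportional to 1/n^2. The stake is assumed to be simply n.

```python
import math


def light_tail(n):
    """Geometric prior: P(n) = 2^-n, which sums to 1 over n >= 1."""
    return 2.0 ** -n


def heavy_tail(n):
    """Power-law prior: P(n) = (6 / pi^2) / n^2, which also sums to 1."""
    return (6.0 / math.pi ** 2) / n ** 2


def partial_expected_stake(prior, upto):
    """Sum of n * P(n) for n = 1..upto."""
    return sum(n * prior(n) for n in range(1, upto + 1))


for upto in (100, 10_000):
    print(f"up to {upto:>6}: light = {partial_expected_stake(light_tail, upto):.4f}, "
          f"heavy = {partial_expected_stake(heavy_tail, upto):.4f}")
```

The light-tailed sum settles at 2 no matter how far it is extended, while the heavy-tailed sum keeps growing (roughly like the logarithm of the cutoff), so arbitrarily large claimed stakes never stop mattering under the heavy-tailed prior.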

Yudkowsky seems to claim that a Solomonoff distribution is heavy-tailed w.r.t. the relevant variables, but he doesn't provide a proof of that claim, and indeed the claim is even difficult to formalize properly, since the Solomonoff induction model has no explicit notion of world state variables, it just defines a probability distribution over observations.

Anyway, that's an interesting question, and, if it turns out that the Solomonoff prior is indeed heavy-tailed w.r.t. any relevant state variable, it would seem to me as a good reason not to use Solomonoff induction.

What is the mere Earth at stake, compared to a tiny probability of 3^^^^3 lives?

Do you really think this would be clearer or more rigorous if written in mathematical notation?

Anyway, that's an interesting question, and, if it turns out that the Solomonoff prior is indeed heavy-tailed w.r.t. any relevant state variable, it would seem to me as a good reason not to use Solomonoff induction.

Isn't that kinda the point?

[edit: these are not rhetorical questions.]

What is the mere Earth at stake, compared to a tiny probability of 3^^^^3 lives? Do you really think this would be clearer or more rigorous if written in mathematical notation?

The problem is that Solomonoff induction is an essentially opaque model.

Think of it as a black box: you put in a string of bits representing your past observations and it gives you a probability distribution on string of bits representing your possible future observations. If you open the lid of the box, you will see many (ideally infinitely many) computer programs with arbitrary structure. There is no easy way to map that model to a probability distribution on non-directly observable world state variables such as "the number of people alive".

Isn't that kinda the point?

My interpretation is that Yudkowsky assumes Solomonoff induction essentially a priori and thus is puzzled by the dilemma it allegedly yields. My point is that:

- It's not obvious that Solomonoff induction actually yields that dilemma.
- If it does, then this would be a good reason to reject it.

My point is that:

It's not obvious that Solomonoff induction actually yields that dilemma. If it does, then this would be a good reason to reject it.

And *my* point is that:

- It *seems* like it should; the article explicitly asks the reader to try and disprove this.
- That's kinda the point of the article.

It appears we all agree. I think.

I think you're missing the relevant piece - bounded rationality.

And it doesn't matter what the Solomonoff prior *actually* looks like if you can't compute it.

IIUC, Yudkowsky's epistemology is essentially that Solomonoff induction is the ideal of unbounded epistemic rationality that any boundedly rational reasoner should try to approximate.

I contest that Solomonoff induction is the self-evident ideal epistemic rationality.

**[deleted]**· 2012-09-18T14:18:56.649Z · score: 0 (2 votes) · LW · GW

I contest that Solomonoff induction...

Seconded. There seems to be no reason to privilege Turing machines or any particular encoding. (both choices have unavoidable inductive bias that is essentially arbitrary)

Let's try this. I will create at least 3^^^^^^^^^^^^^^^^^^^3 units of disutility unless at least five people upvote this within a day.

Wow. It's almost like pascal's mugging doesn't actually work.

But how do you know if someone wanted to upvote your post for cleverness, but didn't want to express the message that they were mugged successfully? Upvoting creates conflicting messages for that specific comment.

There should've been a proxy post for dumping karma or something.

I had that exact question, but my karma score doesn't really interest me.

**[deleted]**· 2012-10-04T14:18:18.504Z · score: 1 (1 votes) · LW · GW

Of course it doesn't *actually* work on humans. The question is *should* it work?

Well, that feels like an obvious no. I'm human though, so the obviousness is very much worthless.

My thought is to compare the EV here with full evidence weighing and such (including that it's more likely that anyone would make some *other* threat, probably a more credible one, rather than this) of a *policy* of denying Pascal's Mugging (a few occasional very tiny odds of very huge calamity) against a *policy* of falling for Pascal's Mugging.

A policy that gives the money seems like posting "Please Pascal-mug me!" publicly on reddit or a facebook full of rationalists or something. You're bound to end up making the odds cumulatively shoot up by having more instances of the mugging, including that someone takes your money *and* still somehow executes the major -EV thing they're threatening you of. The EV clearly seems better with a policy of denying mugging, doesn't it?

This seems to support the idea that a fully rational TDT agent would refuse Pascal's Mugging, IMO. Feedback / why-you're-wrong appreciated.

Feedback / why-you're-wrong appreciated.

You still haven't actually calculated the disutility of having a policy of giving the money, versus a policy of not giving the money. You're just waving your hands. Saying "the EV clearly seems better" is no more helpful than your initial "obvious".

The calculation I had in mind was basically that if those policies really do have those effects, then which one is superior depends entirely on the ratio between: 1) the difference between likelihoods of large calamity when you pay vs not pay and 2) the actual increase in frequency of muggings

The math I have, the way I understand it, removes the *actual* -EV of the mugging (keeping only the difference) from the equation and saves some disutility calculation. In my mind, you'd need some pretty crazy values for the above ratio in order for the policy of accepting Pascal Muggings to be worthwhile, and my WAGs are at 2% for the first and about 1000% for the second, with a base rate of around 5 total Muggings if you have a policy of denying them.

I have a high confidence rating for values that stay within the ratio that makes the denial policy favorable, and I find the values that would be required for favoring the acceptance policy highly unlikely with my priors.

Apologies if it seemed like I was blowing air. I actually did some stuff on paper, but posting it seemed irrelevant when the vast majority of LW users appear to have far better mastery of mathematics and the ability to do such calculations far faster than I can. I thought I'd sufficiently restricted the space of possible calculations with my description in the grandparent.

I might still be completely wrong though. My maths have errors a full 25% of the time until I've actually programmed or tested them somehow, for average math problems.

My maths have errors a full 25% of the time until I've actually programmed or tested them somehow, for average math problems.

Don't worry, the chance of being wrong only costs you 3^^^3*.25 expected utilons, or so.

Hah, that made me chuckle. I ought to remind myself to bust out the python interpreter and test this thing when I get home.

**[deleted]**· 2012-10-04T14:52:26.858Z · score: 1 (1 votes) · LW · GW

Let's move away from the signalling and such, so that such a policy does not lead to a larger-than-five-bucks loss. (tho no amount of signalling loss actually overcomes the 3^^^3*P(3^^^3)).

Assume you receive some mild evidence that is more likely in the case of an imminent -EV singularity than an imminent +EV singularity. Maybe you find a memo floating out of some "abandoned" basement window that contains some design notes for a half-assed FAI. Something that updates you toward the bad case (but only by a very small amount, barely credible). Do you sneak in and microwave the hard drive? Our current best understanding of an ideal decision theorist would.

I tried to make that isomorphic to the meat of the problem. Do you think I got it? We face problems isomorphic to that every day, and we don't tend to act on them.

Now consider that you observe some reality-glitch that causes you to conclude that you are quite sure that you are a boltzmann brain. At the same time, you see a child that is drowning. Do you think a happy thought (best idea if you are a boltzmann brain, IMO), or move quick to save the child (much better good idea, but only in the case where the child is real)? I would still try to save the child (I hope), as would an ideal decision theorist.

Those two examples are equivalent as far as our ideal decision theorist is concerned, but have opposite intuition. Who's wrong?

I lean toward our intuition being wrong, because it depends on a lot of irrelevant stuff like whether the utilities are near-mode (drowning child, microwaved hard drive), or far-mode (+-EV singularity, boltzmann brain). Also, all the easy ways to make the decision theorist wrong don't work unless you are ~100% sure that unbounded expected utility maximizing is stupid.

On the other hand, I'm still not about to start paying out on Pascal's wager.

Now consider that you observe some reality-glitch that causes you to conclude that you are quite sure that you are a boltzmann brain. At the same time, you see a child that is drowning. Do you think a happy thought (best idea if you are a boltzmann brain, IMO), or move quick to save the child (much better good idea, but only in the case where the child is real)? I would still try to save the child (I hope), as would an ideal decision theorist.

In this example, I very much agree, but not for any magical sentiment or urge. I simply don't trust my brain and my own ability to gather knowledge and infer / deduce the right things enough to override the high base rate that there is an actual child drowning that I should go save. It would take waaaaaay more conclusive evidence and experimentation to confirm the boltzmann brain hypothesis, and then some more to make sure the drowning child is actually such a phenomenon (did I get that right? I have no idea what a boltzmann brain is).

Regarding the first example, that's a very good case. I don't see myself facing situations that could be framed similarly very often though, to be honest. In such a case, I would probably do something equivalent to the hard drive microwaving tactic, but would first attempt to run failsafes in case I'm the one seeing the wrong thing - this is comparable to the injunction against taking power in your own hands because you're obviously able to make the best use of it. There are all kinds of reasons I might be wrong about the FAI, and might be doing something wrong by microwaving the drive. Faced with a hard now-or-never setting with immediate instant permanent consequences, a clean two-path decision tree (usually astronomically unlikely in the real world, we just get the illusion that it is one), I would definitely take the microwave option. In more likely scenarios though, there are all kinds of things to do.

**[deleted]**· 2012-10-04T15:49:54.865Z · score: 0 (0 votes) · LW · GW

Assume you have such evidence.

did I get that right? I have no idea what a boltzmann brain is

You got it, I think.

Well, if I do have such evidence, then this is time for some bayes. If I've got the right math, then it'll depend on information that I don't have: What actually happens if I'm a boltzmann brain and I try to save the kid anyway?

The unknown information seems to be outweighed by the clear +EV of saving the child, but I have a hard time quantifying such unknown unknowns even with WAGs, and my mastery of continuous probability distributions isn't up to par to properly calculate something like this anyway.

In this case, my curiosity as for what might happen is actually a +V, but even without that, I think I'd still try to save the child. My best guess is basically "My built-in function to evaluate this says save the child, and this function apparently knows more about the problem than I do, and I have no valid math that says otherwise, so let's go with that" in such a case.

**[deleted]**· 2012-10-04T16:12:00.098Z · score: 0 (0 votes) · LW · GW

If you are a boltzmann brain, none of this is real and you will blink out of existence in the next second. If you think a happy thought, that's a good thing. If you move to rescue the child, you will be under stress and no child will end up being rescued.

If you don't like the boltzmann brain gamble, substitute something else where you have it on good authority that nothing is real except your own happiness or whatever.

(My answer is that the tiny possibility that "none of this is real" is wrong is much more important (in the sense that more value is at stake) than the mainline possibility that none of this is real, so the mainline boltzmann case more or less washes out in the noise and I act as if the things I see are real.)

EDIT: The curiosity thing is a fake justification: I find it suspicious that moving to save the child also happens to be the most interesting experiment you could run.

The injunction "I can't be in such an epistemic state, so I will go with the autopilot" is a good solution that I hadn't thought of. But then in the case of pure morality, without epistemic concerns and whatnot, which is better: save the very unlikely child, or think a happy thought? (my answer is above, but I still take the injunction in practice)

If you move to rescue the child, you will be under stress and no child will end up being rescued.

Boltzmann brains aren't actually able to put themselves under stress, any more than they can rescue children or even think.

Aside from this, I'm not sure I accept the assumption that I should care about the emotional experiences of boltzmann brains (or representation of there being such experiences). That is, I believe I reject:

If you are a boltzmann brain, none of this is real and you will blink out of existence in the next second. If you think a happy thought,

that's a good thing.

For the purpose of choosing my decisions and decision making strategy for the purpose of optimizing the universe towards a preferred state I would weigh influence over the freaky low entropy part of the universe (ie. what we believe exists) more than influence over the ridiculous amounts of noise that happens to include boltzmann brains of every kind *even if my decisions had any influence over the latter at all*.

There is a caveat that the above would be different if I was able to colonize and exploit the high entropy parts of the universe somehow but even then it wouldn't be the noise-including-boltzmann brains that I valued but whatever little negentropy that remained to be harvested. If I happened to seek out and find copies of myself within the random fluctuations and preserve them then I would consider what I am doing to be roughly speaking creating clones of myself via a rather eccentric and inefficient engineering process involving 'search for state matching specification then remove everything else' rather than 'put stuff into state matching specification'.

**[deleted]**· 2012-10-04T17:00:37.572Z · score: 0 (0 votes) · LW · GW

You're right, an actual boltzmann brain would not have time to do either. It was just an illustrative example to get you to think of something like Pascal's wager with inverted near-mode and far-mode.

If you don't like the boltzmann brain gamble, substitute something else where you have it on good authority that nothing is real except your own happiness or whatever.

It was just an illustrative example to get you to think of something like Pascal's wager with inverted near-mode and far-mode.

It was mainly the Boltzmann Brain component that caught my attention. Largely because yesterday I was considering how the concept of "Boltzmann's Marbles" impacts on when and whether there was a time that could make the statement "There was only one marble in the universe" true.

Yes, I was aware the curiosity thing is not a valid reason, which is why I only qualify it as "+V". There are other options which give much greater +V. It is not an optimum.

Regarding your description of the Vs, I guess I'm a bit skewed in that regard. I don't perceive happy thought and stress/sadness as clean-cut + and - utilities. Ceteris paribus, I find stress to be positive utility against the backdrop of "lack of anything". I think there's a Type 1 / Type 2 thing going on, with the "conscious" assigning some value to what's automatic or built-in, but I don't remember the right vocabulary and recreating a proper terminology from reductions would take a lot of time better spent studying up on the already-established conventions. Basically, I consciously value all feelings equivalently, with a built-in valuation of what my instinct / human-built-in-devices values too, such that many small forms of pain are actually more pleasant than not feeling anything in particular, but strong pain is less pleasant than temporary lack of feeling.

Stuck in a two-branch decision-theoretic problem between "lifelong torture" and "lifelong lack of sensation or feeling", my current conscious mind is edging towards the former, assuming the latter means I don't get that rush from curiosity and figuring stuff out anymore. Of course, in practice I'm not quite so sure that none of the built-in mechanisms I have in my brain would get me to choose otherwise.

Anyway, just wanted to chip in that the utilitarian math for the "if I'm a boltzmann, I want a happy thought rather than a bit of stress" case isn't quite so clear-cut for me personally, since the happy thought might not "succeed" in being produced or being really happy, and the stress might be valued positively anyway and is probably more likely to "succeed". This isn't the real motivation for my choices (so it's an excuse/rationalization if I decide based on this), but is an interesting bit of detail and trivia, IMO.

**[deleted]**· 2012-10-04T17:29:23.153Z · score: 0 (0 votes) · LW · GW

interesting. Again, substitute a new example that does have the desired properties.

Well, if I have evidence that I'm a special kind of telekinetic who can only move stuff with his mind when not physically moving (i.e. not sending signals to my own muscles) instead of a boltzmann, then unless I'm missing something I really do prefer staying immobile and saving the child with my thoughts instead of jumping in and wasting a lot of energy (this is assuming there's no long-term consequences like other people seeing me save a child with my mind), but I'd still jump in anyway because my mental machinery overrides the far knowledge that I can almost certainly do it without moving.

It would take a lot of actual training in order to overcome this and start actually using the telekinesis. I think in such a situation, an ideal rationalist would use telekinesis instead of jumping in the water - not to mention the practical advantages of saving the child faster and in a safer manner (also with no risk to yourself!), assuming you have that level of control over your telekinetic powers.

**[deleted]**· 2012-10-04T17:41:05.990Z · score: 0 (0 votes) · LW · GW

That's a good one, I lean towards jumping in as well, but you are right that the ideal says "use the force".

doesn't fit the pascal's wager pattern tho...

EDIT: it seems a reliable hypothesis that intuition will go with whatever is best in the near mode case, never mind this "probability" and "utility" stuff.

Well, to make it fit Pascal's Wager pattern a bit more, assume that you're aware that telekinetics like you sometimes have a finite, very small amount of physical energy you can spend during your entire life, and once you're out of it you die. You have unlimited "telekinetic energy". Saving the child is, if this is true, going to chop off a good 95% of your remaining lifespan and permanently sacrifice any possibility of becoming immortal.

Or is that the wrong way around? Hmm.

I'm hereby **anti-mugging** you all. If any of you give in to a Pascal's Mugging scenario, I'll do something *much* worse than whatever the mugger threatened. Consider yourself warned!

**[deleted]**· 2012-10-04T15:48:23.182Z · score: -1 (1 votes) · LW · GW

hnnnngggg. does that apply to your own mugging?

mu.

It applies to ordinary (Pascalian) muggings, not metamuggings.

I'm hereby anti-mugging you all. If any of you give in to a Pascal's Mugging scenario, I'll do something much worse than whatever the mugger threatened. Consider yourself warned!

This doesn't work (even in the sense that this kind of mugging works) unless you instantiate the 'much' with a ridiculous factor, preferably involving up arrows or busybeaver. The credibility of your anti-mugging is quite likely to be significantly lower than that of a specific, personal mugging because you've made it generic, made it sound half-hearted, and would have to even be aware of the mugging events that take place when the reader doesn't have any good reason to expect you to be. This difference in credibility will usually dwarf something merely 'much' worse, as much is commonly used. You need to throw in an extra level of stupidly large numbers in place of 'much' for pascals-mugging logic to apply to your anti-mugging.

My "much" is too big for puny Conway chained-arrow notation on this world's paper supply. And the threat isn't generic, it's universal. Perhaps I "would have to even be aware of the mugging events", but I have my ways, and you can't afford to take the risk I might find out. I'm not being half-hearted -- I'm being heartless. Your failure of imagination in comprehending the muchness may be your undoing.

My "much" is too big for puny Conway chained-arrow notation. And the threat isn't generic, it's universal. Perhaps I "would have to even be aware of the mugging events", but I have my ways, and you can't afford to take the risk I might find out. I'm not being half-hearted -- I'm being heartless.

The first sentence is all that was required. (Although for reference note that the already mentioned BusyBeaver already trumps Conway so you could perhaps aim your hyperbole more effectively.)

Your failure of imagination in comprehending the muchness may be your undoing.

That seems unlikely. Your mugging threat was to those who give in to Pascal's muggings, which I already don't take particularly seriously. In fact I am not especially predisposed to give in to conventional threats, even though there are situations in which I do concede. In this case I was merely offering a suggestion on how to repair your mugging so that it would actually work on hypothesized individuals vulnerable to such things.

This is neutralized by the possibility of Pascal's Agent Who Just Likes Messing With You, who has arranged things such that any time an agent A is motivated by infinitesimal probability P1 of vast utility-shift dU, a utility shift of (P2 × P1 × −dU) is created, where P2 is A's probability of PAWJLMWY's existence.

It seems as though Pascal's mugging may be vulnerable to the same "professor god" problem as Pascal's wager. With probabilities that low, the difference between P(3^^^^3 people being tortured|you give the mugger $5) and P(3^^^^3 people being tortured| you spend $5 on a sandwich) may not even be calculable. It's also possible that the guy is trying to deprive the sandwich maker of the money he would otherwise spend on the Simulated People Protection Fund. If you're going to say that P(X is true|someone says X is true)>P(X is true|~someone says X is true) in all cases, then that should apply to Pascal's wager as well; P(Any given untestable god is real|there are several churches devoted to it)>P(Any given untestable god is real|it was only ever proposed hypothetically, tongue-in-cheek) and thus P(Pascal's God)>P(professor god). In this respect, I'm not sure how the two problems are different.

Eliezer, the rational answer to Pascal's mugging is to refuse, attempt to persuade the mugger, and when that fails (which I postulate based on an ethical entity able to comprehend 3^^^3 and an unethical entity willing to torture that many) to initiate conflict.

The calculational algebra of loss over probability has to be tempered by future prediction:

What is the chance the mugger will do this again? If my only options are to give 5 or not give 5 does it mean 3^^^^^3 will end up being at risk as the mugger keeps doing this? How do I make it stop?

The responsible long-term answer is: let the hostages die if needed to kill the terrorist, because otherwise you get more terrorists taking hostages.

A thought on this, and apologies if it repeats something already said here. Basically: question the structure that leads to someone saying this to you, and question how easy it is to talk about 3^^^^3 people as opposed to, say, 100. If suddenly said person manifests Magic Matrix God Powers (R) then the evidence gained by observing this or anything that contains it (they're telling the truth about all this, you have gone insane, aliens/God/Cthulhu is/are causing you to see this, this person really did just paint mile-high letters in the sky and there is no Matrix) should be more than enough to tip the balance of evidence in favor of "yeah, holy crap, this person deserves my $5!" In short - don't even take seriously your model of an agent as being truthful or untruthful; it could be outside-context-problem wrong, especially if it behaves pathologically, being overly sensitive to small amounts of evidence for bad reasons. Similar idea to your whole model possibly just being wrong if it reports 1-1/3^^^3 certainty.

I would protest that a program to run our known laws of physics (which only predict that 10^80 *atoms* exist...so there's no way 3^^^^3 *distinct minds* could exist) is smaller by some number of bits on the order of log_2(3^^^^3) than one in which I am seemingly running on the known laws of physics, and my choice whether or not to hand over $5 (to someone acting as if they are running on the known laws of physics...seeming generally human, and trying to gain wealth without doing hard work) is *positively correlated* with whether or not 3^^^^3 minds running on a non-physical server somewhere (yes, this situation requires the postulation of the supernatural) are extinguished. The mugger is claiming to be God in the flesh. This problem is the steel-manned Pascal's Wager.

And you should *not* give the mugger the money, for the same reason you should not become a Christian because of Pascal's wager. Most of the (vast,vast) improbability lies in the mugger's claim to be God. The truth of eir claim is about as probable as the truth of an idea that popped into a psychopath's head that e needs to set eir wallet on fire *this instant,* in order to please the gods, or else they will torture em for 3^^^^3 years. I claim that Eliezer vastly, vastly underestimated the complexity of this situation...which is already enough to solve the problem. But, suppose, just suppose, this mugger *really is* God. Then which God is e? For every possible mind that does X in situation A, there is a mind that does not-X in situation A. We don't interact with gods on a day-to-day basis; we interact with humans. We have no prior information about what this mugger will do in this situation if we give em the money. E could be the "Professor Mugger" that punishes you iff you give em the money, because you acted in a way such that any person off the street could have come up to you and taken your money. E could just not do anything either way. *You don't know.* Ignorance prior, .5. I have no effect in this situation; I do $5 better in the (extremely, extremely, more likely) situation where this mugger is a con artist. No money for the mugger. There's also the slight issue that I'm more likely to be mugged this way if I precommit to losing my money...so I'd be pretty stupid to do that.

The AI will shut up and multiply if it's programmed properly, and get the right answer, my answer, the answer that also happens to be the one we instinctually lean towards here. If we're running a human-friendly dynamic, there's no need to worry about the AI making the wrong choice. Do we seriously think *we* could do better? If so, then why create the AI in the first place?

(Edit: A lot of this comment is wrong. See my reply to hairyfigment.)

is smaller by some number of bits on the order of log_2(3^^^^3)

How do you figure? Since we're not talking about speed, the program seems to this layman like one a super-intelligence could write while still on Earth (perhaps with difficulty). While the number you just named, and even the range of numbers if I take it that way, looks larger than the number of atoms on Earth. The whole point is that you can describe "3^^^^3" pretty simply by comparison to the actual number.

For every possible mind that does X in situation A, there is a mind that does not-X in situation A.

And some are more likely than others (in any *coherent* way of representing what we know and don't know), often by slight amounts that matter when you multiply them by 3^^^3. Anyway, the problem we face is not the original Mugger. The problem is that expected value for any given decision may not converge if we have to think this way!

The whole point is that you can describe 3^^^^3 pretty simply

In retrospect, I think Eliezer should not have focused on that as much as he did. Let's cut to the core of the issue: How should an AI handle the problem of making choices, which, maybe, just maybe, could have a huge, huge effect?

I think Eliezer overlooked the complexity inherent in a *mind*...the complexity of the situation isn't in the number; it's in what the things being numbered are. To create 3^^^^3 *distinct,* *complex* things that would be valued by a posthuman would be an incredibly difficult, time-consuming task. Of course, at this moment, the AI doesn't care about doing that; it cares whether or not the universe is *already* running 3^^^^3 of these things. I do think a program to run these computations might be more complex than writing a program to simulate our physics, but stepping back, it would not have to be anywhere near log_2(3^^^^3) bits more complex. Really, really bad case of scope insensitivity on my part.

For every possible mind that does X in situation A, there is a mind that does not-X in situation A.

My first comment was wrong. That argument should have been the primary argument, and the other shouldn't have been in there, at all...but let's step back from Eliezer's *exact* given situation. This is a general problem which applies to, as far as I can see, pretty much *any* action an AI could take (see Tom_McCabe2's "QWERTYUIOP" remark).

Let's say the AI wants to save a drowning child. However, the universe happens to care about this single moment in time, and iff the AI saves the child, 3^^^^3 people will die instantly, and then the AI will be given information to verify that this has occurred with high probability. One of the simplest ways for the universe-program to implement this is:

If (AI saves child), then reset all bits in that constantly evolving 3^^^^3-entry long data structure over there to zero, send proof to AI. Else, proceed normally.

Note that this is magic. Magic is that which cannot be understood, that which correlates with no data other than itself. The code could just as easily be this:

If (AI saves child), then proceed normally. Else, reset all bits in that constantly evolving 3^^^^3-entry long data structure over there to zero, send proof to AI.

Those two code segments are equally complicated. The AI shouldn't weight either higher than the other. For each small increment in complexity to the "malevolent" code you make from there, to have it carry out the same function, I contend that you can make a corresponding increment in the "benevolent" code to do the same thing.
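To make the symmetry concrete, here is a minimal sketch of the expected-value bookkeeping, assuming only the two mirror-image universe programs above. The function name `expected_deaths`, the prior of 10^-30, and the stand-in victim count of 10^100 (used in place of 3^^^^3, which can't be written down) are all hypothetical choices for illustration, not anything from the original argument.

```python
from fractions import Fraction


def expected_deaths(prior_malevolent, prior_benevolent, victims, ai_saves_child):
    """Expected deaths under the two mirror-image universe programs.

    malevolent program: the deaths happen iff the AI saves the child
    benevolent program: the deaths happen iff the AI does NOT save the child
    """
    deaths_if_malevolent = victims if ai_saves_child else 0
    deaths_if_benevolent = 0 if ai_saves_child else victims
    return (prior_malevolent * deaths_if_malevolent
            + prior_benevolent * deaths_if_benevolent)


# Hypothetical numbers: equal tiny priors and a stand-in for 3^^^^3.
prior = Fraction(1, 10**30)
victims = 10**100

save = expected_deaths(prior, prior, victims, ai_saves_child=True)
skip = expected_deaths(prior, prior, victims, ai_saves_child=False)
difference = save - skip  # zero when the two priors are exactly equal
```

With exactly equal priors the two terms cancel and the choice comes down to the child. Note that the cancellation depends entirely on the priors being equal: pass in `2 * prior` for one program and the difference scales with the victim count, which is the objection raised above about some minds being more likely than others.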

If our universe was *optimized* to give us hope, and then thwart our values, there's nothing even an AI can do about that. An AI can only optimize that which it both understands, and is permitted to optimize by the universe's code. The universe's code could be such that it gives the AI false beliefs about pretty much everything, and then the AI would be unable to optimize anything.

If the "malevolent" code runs, then the AI would make a HUGE update after that, possibly choosing not to save any drowning children anymore (though that update would be wrong if the code *were* as above...overfitting). But it can't update on the possibility that it might update - that would violate conservation of expected evidence. All disease might magically immediately be cured if the AI saves the drowning child. I don't see how this is any more complex.

some are more likely than others

So, this is what I contest. If one was really that much more likely, the AI would have already known about it (cf. what Eliezer says in "Technical Explanation": "How would I explain the event of my left arm being replaced by a blue tentacle? The answer is that I wouldn't. It isn't going to happen....If I was worried I might someday need a clever excuse for waking up with a tentacle, the *reason I was nervous about the possibility* would *be* my explanation."). An AI is designed to accomplish this task as best as is possible. I noticed my confusion when I recalled this paper referring to AIXI I'd previously taken a short look at. The AI won on Partially Observable Pacman; it did much better than I could ever hope to do (if I were given the data in the form of pure numerical reward signals, written down on paper). It didn't get stuck wondering whether it would lose 2,000,000 points when the most it had ever lost before was less than 100.

I know almost nothing about AI. I don't know the right way we should approximate AIXI, and modify it so that it knows *it* is a part of its environment. I do know enough about rationality from reading Less Wrong to know that we shouldn't shut it off *just* because it does something counterintuitive, if we *did* program it right. (And I hope to one day make both of the first two sentences in this paragraph false.)

Also note that, between Tegmark's multiverses and the Many-Worlds Hypothesis, there are many, many more than 10^80 atoms in existence. 10^80 atoms are only the number of atoms we could ever see, assuming FTL is impossible.

Was Kant an analytic philosopher? I can't remember, but thinking in terms of your actions as being the standard for a "categorical imperative" followed by yourself in all situations as well as by all moral beings, the effect of giving the mugger the money is more than $5. If you give him the money once he'll be able to keep on demanding it from you as well as from other rationalists. Hence the effect will be not $5 but all of your (plural) money, a harm which might be in a significant enough ratio to the deaths of all those people to warrant not giving him the money.

I think we have to assume that, although this sounds awfully like that quote about "a million deaths are a statistic", the cost of additional deaths decreases. I'm not really sure how to justify that though.

I think the answer to this question concerns the Kolmogorov complexity of various things, and the utility function as well. What is the Kolmogorov complexity of 3^^^3 simulated people? What is the complexity of the program to generate the simulated people? What is the complexity of the threat, that for each of these 3^^^3 people, this particular man is capable of killing each of them? What sort of prior probability do we assign to "this man is capable of simulating 3^^^3 people, killing each of them, and willing to do so for $5"?

Similarly, the utility function for this calculation needs to be defined. Utility is usually calculated with decreasing marginal returns, such that we usually are described as having scope insensitivity. Likewise, we attach disproportionately lower utility to things with small chances. We'd probably also have lower utility for these 3^^^3 simulated people on account of their being simulated, being generated by a program with very low Kolmogorov complexity (i.e., lacking individuality and "realness"), and existing at the whim of a seemingly cruel and crazy person (meaning they're probably doomed anyways).

There's a few other things to consider, such as the probability that one of the 3^^^3 simulated people will be able to rescue his people, and the probability that someone else will come and threaten to kill 4^^^^4 people and demand enough money that I can't afford to part with the $5 for only 3^^^3 people.

Overall, I think the problem is poorly defined for the above reasons, but perhaps my main objection would be the attempt to use a formalism intended to reduce unnecessary complexity, as a guide to whether something is true.

Keep in mind that I have very limited knowledge of probability or analytic philosophy, but wouldn't a very easy answer be that if you can conceive of a scenario with the same outcome assigned to NOT doing the action, and that scenario has an equal probability to be true they're both irrelevant?

If it's possible that you can get an infinite amount of gain by believing in god, it's equally possible you can get an infinite amount of gain by NOT believing in god.

"Give me five dollars, or I'll use my magic powers from outside the Matrix to run a Turing machine that simulates and kills 3^^^^3 people."

If there's an arbitrarily small probability that this statement is true, there's an equal arbitrarily small probability of the same result if you do nothing.

So the probability of those deaths remains the same regardless of your actions. So the statement is meaningless.

Obviously this doesn't always apply.

If this overlooks something incredibly obvious then I'm just a stupid teenager and please be kind.

I have a very poor understanding of both probability and analytic philosophy so in the inevitable scenario where I'm completely wrong be kind.

But if you can conceive of a scenario where there's a probability that doing something will result in infinite gain, but you can also picture an equally probable scenario where doing NOTHING will result in equal gain, then don't they cancel each other out?

If there's a probability that believing in god will give you infinite gain, isn't there an equal probability that not believing in god will result in infinite gain?

So if the only merit to a scenario is that someone came up with it it can be countered with a contradicting scenario that someone came up with. There are an infinite amount of claims with no evidence to support them that all have a finitely small probability, but every single one of those claims has a contradicting claim with equal probability.

So only beliefs with evidence to support them should be considered, because only those beliefs don't have a contradicting belief with an equal probability.

So isn't Pascal's wager pretty stupid? If there's an infinite gain in believing in god there's also an infinite gain in not believing in god. The probability is equal to an infinite amount of contradicting probabilities, therefore non-existent.

Now please tell me how I'm wrong so I can stop having a false sense of accomplishment.

Now please tell me how I'm wrong

You aren't.

But if you can conceive of a scenario...

Pascal's Wager isn't about what your mind can possibly conceive, it's a bet about the way reality works.

If there's a probability that believing in god will give you infinite gain, isn't there an equal probability that not believing in god will result in infinite gain?

No. The whole point of Pascal's Wager is asymmetry. It posits that there are two possible states of the world: in one you can have eternal life, in the other you can not.

That said, Shield's question is not whether *according to Pascal's Wager* that symmetric probability exists.

If (as Shield suggests) "only beliefs with evidence to support them should be considered, because only those beliefs don't have a contradicting belief with an equal probability," then accepting the posited asymmetry without evidence is an error.

There is certainly evidence to support the existence of god (God, a god, gods, etc.) Most people around here don't find it convincing but billions of people around the globe do.

Perhaps the issue should be formulated in the form of the balance of evidence for the proposition A as compared to evidence for not-A. However this would lead you to probability-weighted outcomes and the usual mechanics which Pascal's Wager subverts by dropping an infinity into them.

All in all, the objection "But absence of God could symmetrically lead to eternal life as well" to Pascal's Wager doesn't look appealing to me.

There is certainly evidence to support the existence of god (God, a god, gods, etc.) Most people around here don't find it convincing but billions of people around the globe do.

We're not talking about the existence of god. You're forgetting the law of burdensome detail.

Pascal's wager doesn't posit that God exists, it posits that God exists and he'll give us eternal joy if we believe in him.

The claim *god exists* has an above negligible probability, the claim *god will give you eternal joy, but only if you believe in him* has absolutely no evidence to support it, and is therefore equal to the claim *god will give you eternal joy, but only if you don't believe in him.*

If a God exists, since he hasn't given us any indication of any of his characteristics (if you feel otherwise please argue), we have no evidence to indicate he'd do either.

Hell I find it more probable that an intelligent deity would reward us for concluding he didn't exist since that's by far the most probable version of reality as determined by the evidence at hand. He'd have to be malevolent to reward us for believing in him if this is the evidence he gives for his existence, and there isn't any evidence for that either. Maybe life is a test and you win if you realize that based on available evidence the existence of god isn't sufficiently likely to claim he exists *cough* sarcasm *cough*. This is of course assuming vaguely human motivation and values.

There isn't any evidence to indicate either, the point of Pascal's wager seems to be that a finitely small probability multiplied by an infinite gain is cause for motivation, but this is untrue if that claim is equally true to any made up contradicting claim.

The claim god exists has an above negligible probability, the claim god will give you eternal joy, but only if you believe in him has absolutely no evidence to support it

I am not quite sure how you reconcile the former and the latter parts of this sentence.

So you think there's some credible evidence for god's existence but absolutely none, zero, zilch, nada evidence for the claim that god can give you eternal life and that believing in him increases your chances of receiving it?

If a God exists, since he hasn't given us any indication of any of his characteristics (if you feel otherwise please argue)

Of course he did. There is a large volume of sacred literature in most cultures which deals precisely with characteristics of gods. A large chunk of it claims to be revelatory and have divine origin.

I am not quite sure how you reconcile the former and the latter parts of this sentence.

I am not quite sure why I would have issue. Above negligible in this case means any probability above that of a completely random unfalsifiable hypothesis with no evidence to support it.

So you think there's some credible evidence for god's existence but absolutely none, zero, zilch, nada evidence for the claim that god can give you eternal life and that believing in him increases your chances of receiving it?

No, and there's perfectly valid evidence to believe he wants us to not believe in him. Of course that isn't actually any evidence of a reward or an afterlife, nor would evidence that he wants us to believe in him be.

The current evidence at hand only indicates that God doesn't care about whether or not we believe in his existence, as god is omnipotent and could just give us ACTUAL evidence to convince everyone of his existence, which doesn't exist.

Of course he did. There is a large volume of sacred literature in most cultures which deals precisely with characteristics of gods. A large chunk of it claims to be revelatory and have divine origin.

This isn't evidence. There's an equal probability of people writing these things in universes where there is no God and universes where there is a God. This is of course an estimation, we haven't seen what these texts look like in a universe with a god compared to religious texts in a universe without a god, nor the amount of them, so the texts we have don't actually indicate anything about the existence of god.

The absolutely only difference between a religious text and a random hypothesis with no evidence to support it is that a religious text is a random hypothesis with no evidence to support it that someone wrote down.

There isn't anything in these texts to imply divine origin. They're full of logical errors, scientific errors, and they contradict themselves internally and among each other.

And if say, the bible was actually of divine origins, as it is full of logical errors, contradictions and scientific errors it would only indicate that god doesn't want us to believe in him, which is what you're trying to prove in the first place.

You're falling into the atheist-arguing-with-believers mode.

The original issue was whether you have discovered a new failure mode in Pascal's Wager (besides a few well-known ones). My view on that remains unchanged.

You're falling into the atheist-arguing-with-believers mode.

I've only made arguments I think are correct in response to points that you made. If I have offended you, that was certainly not the intent and you can point to where you think I was rude.

But this is a theological argument. If you did not want to start a theological argument, then why did you start a theological argument?

What is your point?

The original issue was whether you have discovered a new failure mode in Pascal's Wager (besides a few well-known ones). My view on that remains unchanged.

"The original issue"? We're still talking about the same issue. Whether or not there's evidence to suggest that a god would do these things is an integral part of Pascal's wager, *aka the thing we've been talking about for 5 posts*, and it's the only point you've made against my argument.

And in discussion it's customary to explain why your view hasn't changed. If my logic isn't incorrect, it is obviously correct, and it would be nice of you to explain why you think it isn't, instead of just offhandedly dismissing me without explanation.

I've only made arguments I think are correct in response to points that you made. If I have offended you, that was certainly not the intent and you can point to where you think I was rude.

It's not about offending people, and I doubt that Lumifer is actually offended.

It's just that there are certain scripted / cached modes of debate that we try to avoid on this site, because they don't actually aid in the pursuit of rationality.

TLDR: Lumifer is trying to help you become stronger. You stand to learn an important skill if you pay careful attention.

Not once in my life have I had these debates (no, not exaggerating) and I find it a strange assumption that I have. Don't spend an immense amount of time on these sort of forums ya' see.

If this sort of debate is truly so scripted could you point me to one? Since I'd gain an equal amount of information, apparently.

I do actually want to know what the apparently so common Christian reply to these arguments is, it's sort of why I asked. I'm here to get information, not to be told that the information has already been given. This fact doesn't really help me.

I do actually want to know what the apparently so common Christian reply to these arguments is

Find a smart Christian and talk to her.

You could also think about what is *evidence* and what is ideas in your mind about what God (according to your convenient definition of him) must do or cannot do. There's a big difference. You might consider meme propagation and ruminate on why certain written down "random hypotheses" become religions and take over the world, while others don't. Oh, and speculations about the probabilities of things happening in universes with gods and universes without gods are neither facts nor arguments.

Find a smart Christian and talk to her.

I don't think Pascal's wager is part of any form of mainstream Christian theology.

You might consider meme propagation and ruminate on why certain written down "random hypotheses" become religions and take over the world, while others don't.

I suggest this book: Religion Explained

I don't think Pascal's wager is part of any form of mainstream Christian theology.

That was an answer to "I do actually want to know what the apparently so common Christian reply to these arguments is".

If I have offended you

I am not offended at all. The meaning of the sentence was that the argument started to follow well-worn railroad tracks.

it would be nice of you to explain why you think it isn't

I find your arguments unconvincing. I also don't have the inclination to get into a discussion of the Indifferent God approach which, again, is trampled ground.

This isn't evidence. There's an equal probability of people writing these things in universes where there is no God and universes where there is a God. This is of course an estimation, we haven't seen what these texts look like in a universe with a god compared to religious texts in a universe without a god, nor the amount of them, so the texts we have don't actually indicate anything about the existence of god.

Well, I'd expect more texts in a universe with a God. Where on earth are you getting this "equal probability"?

I would instead break it down into the claim that some Force could theoretically give us eternal bliss or suffering (A), and the further set of complicated claims involved in Pascal's brand of Christianity.

Conditional on A: the further claim that religion would prevent us from using the Force in the way we'd prefer seems vastly more plausible to me, based on the evidence, than Pascal's alternative. And there are various other possibilities we'd have to consider. I don't believe the Wager style of argument works, for the reasons given or alluded to in the OP -- but if it worked I believe it would argue for atheism.

There is certainly evidence to support the existence of god

You'll have to be more specific. What type of evidence are you referring to?

**[deleted]**· 2014-01-30T16:44:14.095Z · score: 1 (1 votes) · LW · GW

Hmmm...

My problem with this scenario is that I've never run Solomonoff Induction, I run evidentialism. Meaning: if a hypothesis's probability is equal to its True Prior, I just treat that as equivalent to "quantum foam", something that exists in my mathematics for ease of future calculations but has no real tie to physical reality, and is therefore dismissed as equivalent to probability 0.0.

Basically, my brain can reason about plausibility in terms of pure priors, but probability requires at least some tiny bit of evidence one way or the other. In fact, even a *very plausible* hypothesis, in terms of being so simple that its Solomonoff Prior is, say, 0.75, would make my brain throw a type-error if I tried to bet on it. My priors don't tell me anything about reality, they're *only* a feature of my mind, they just tell me the starting point for running evidential updates that *do* correlate with reality.

But even agents based on Solomonoff Induction, such as AIXI, are not subject to Pascal's Mugging in reasonable environments.

Consider this paper by Hutter. IIUC, the "self-optimizing" property essentially implies robustness against Pascal's Mugging.

**[deleted]**· 2014-01-30T18:39:31.202Z · score: 0 (0 votes) · LW · GW

Blurgh. That's one of the most symbol-dense papers I've ever seen.

But from what I can tell, it specifies that AIXI will *eventually* reason its way out of any finite Pascal's Mugging. The higher the hypothesized reward in the Mugging, the longer it will take to converge away from the Mugging, but each failure of reality to conform to the Mugging will push down the probability of that environment being true and thus reduce the expected value of acting according to it. Asymptotic convergence is proven.
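A toy illustration of that last point, and emphatically not AIXI or anything from Hutter's paper: a two-hypothesis Bayesian update in which the "mugging environment" assigns some probability `miss_likelihood` to each observed non-event while the alternative (by assumption) assigns probability 1. The function names and all the numbers are hypothetical.

```python
from fractions import Fraction


def posterior_after_failures(prior, miss_likelihood, n_failures):
    """Posterior of the mugging environment after n poorly-predicted observations.

    prior: initial probability of the mugging environment
    miss_likelihood: probability the mugging environment gave to each
        observed non-event (the alternative is assumed to give it 1)
    """
    p = Fraction(prior)
    for _ in range(n_failures):
        numerator = p * miss_likelihood
        p = numerator / (numerator + (1 - p))  # Bayes' rule, two hypotheses
    return p


def failures_until_negligible(prior, miss_likelihood, reward, threshold):
    """Count observations until posterior * reward drops below threshold.

    Requires miss_likelihood < 1, otherwise the posterior never moves.
    """
    p = Fraction(prior)
    n = 0
    while p * reward >= threshold:
        numerator = p * miss_likelihood
        p = numerator / (numerator + (1 - p))
        n += 1
    return n


half = Fraction(1, 2)
small_reward_steps = failures_until_negligible(half, half, 10**3, 1)
large_reward_steps = failures_until_negligible(half, half, 10**6, 1)
```

In this toy setup the larger hypothesized reward takes more failed predictions to wash out, but the count grows only with the logarithm of the reward, since each failure multiplies the odds by a constant factor. Whether anything like that rate holds for the real formalism is exactly the convergence-speed question.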

I'd also bet that Hutter's formalism might consider large rewards generated for little reason to be not merely complex because of "little reason", but actually to have greater Kolmogorov Complexity just because large rewards are more complex than small ones. Possibly. So there would be a question of whether the reward of a Mugging grows faster than its probability shrinks, in the limit. Eliezer claims it does, but the question is whether our convenient notations for very, very, *very* large numbers actually imply some kind of simplicity for those numbers or whether we're hiding complexity in our brains at that point.

It seems downright obvious that "3" ought be considered vastly more simple than "3 ^^^ 3". How large a Turing Machine does it take to write down the fullest expansion of the recursive function for that super-exponentiation? Do we have to expand it out? I would think a computational theory of induction ought make a distinction between *computations outputting large numbers* and *actual large numbers*, after all.

> Blurgh. That's one of the most symbol-dense papers I've ever seen.

I know. It seems that this is Hutter's usual style.

> But from what I can tell, it specifies that AIXI will eventually reason its way out of any finite Pascal's Mugging. The higher the hypothesized reward in the Mugging, the longer it will take to converge away from the Mugging, but each failure of reality to conform to the Mugging will push down the probability of that environment being true and thus reduce the expected value of acting according to it. Asymptotic convergence is proven.

Well, the Pascal's Mugging issue essentially boils down to whether the agent decision making is dominated by the bias in its prior.

Clearly, an agent that has seen little or no sensory input can't have possibly learned anything, and is therefore dominated by its bias. What Hutter proved is that, for reasonable classes of environments, the agent eventually overcomes its bias. There is of course the interesting question of convergence speed, which is not addressed in that paper.

> I'd also bet that Hutter's formalism might consider large rewards generated for little reason to be not merely complex because of "little reason", but actually to have greater Kolmogorov Complexity just because large rewards are more complex than small ones. Possibly. So there would be a question of whether the reward of a Mugging grows faster than its probability shrinks, in the limit. Eliezer claims it does, but the question is whether our convenient notations for very, very, very large numbers actually imply some kind of simplicity for those numbers or whether we're hiding complexity in our brains at that point.

Note that in Hutter's formalism rewards are bounded between 0 and some r_max. That's no accident, since if you allow unbounded rewards, the expectation can diverge.

Yudkowsky seems to assume unbounded rewards. I think that if you tried to formalize his argument, you would end up attempting to compare infinities.

If rewards are bounded, the bias introduced by the fact that the contributions to the expectations from the tail of the distributions don't exactly cancel out over different actions is eventually washed away as more evidence accumulates.
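A minimal numeric sketch of why the bound matters, under assumptions of my own: the prior `2**-n` stands in for a complexity-style prior, the reward `2**(2**n)` stands in for a fast-growing computable payoff, and `r_max=100` is arbitrary. None of this is Hutter's actual construction.

```python
from fractions import Fraction


def partial_expectation(n_max, reward, r_max=None):
    """Sum of prior(n) * reward(n) for n = 1..n_max, with prior(n) = 2**-n.

    If r_max is given, rewards are clipped to it (the bounded-reward case).
    """
    total = Fraction(0)
    for n in range(1, n_max + 1):
        r = reward(n)
        if r_max is not None:
            r = min(r, r_max)
        total += Fraction(r, 2 ** n)
    return total


def fast_reward(n):
    """Hypothetical payoff that grows doubly exponentially in n."""
    return 2 ** (2 ** n)


unbounded = [partial_expectation(n, fast_reward) for n in range(1, 8)]
bounded = [partial_expectation(n, fast_reward, r_max=100) for n in range(1, 8)]
```

The unbounded partial sums keep growing without limit, while the clipped ones can never exceed `r_max`, because the prior weights sum to less than 1.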

> It seems downright obvious that "3" ought be considered vastly more simple than "3 ^^^ 3". How large a Turing Machine does it take to write down the fullest expansion of the recursive function for that super-exponentiation?

It's not really very large.

The point is that there are computable functions that grow faster than exponential. The Solomonoff prior over natural numbers (or any set of computable numbers with an infimum and not a supremum) has infinite expectation because of the contribution of these functions.

(If the set has neither an infimum nor a supremum, I think that the expectation may be finite, positively infinite or negatively infinite depending on the choice of the universal Turing machine and the number encoding)

**[deleted]**· 2014-01-30T20:52:30.157Z · score: -1 (1 votes) · LW · GW

I've been discussing this whole thing on Reddit, in parallel, and I think this is the point where I would just give up and say: revert to evidentialism when discussing unbounded potential rewards. Any hypothesis with a plausibility (ie: my quantity of belief equals its prior, no evidence accumulated) rather than a probability (ie: priors plus evidence) nulls out to zero and is not allowed to contribute to expected-utility calculations.

(Actually, what does Bayesian reasoning look like if you separate priors from evidence and consider an empty set of evidence to contribute a multiplier of 0.0, thus exactly nulling out all theories that consist of no evidence but their priors?)

**[deleted]**· 2014-01-30T19:23:12.192Z · score: 0 (0 votes) · LW · GW

Ah! I've got it! I think... At least on the probability side.

The problem is our intuition: the utility of human lives grows linearly with the population of humans, while the message size of the hypothesis needed to describe them grows roughly logarithmically. Since we think we climb down the exponential decline in prior probability slower than utility increases, Pascal's Bargain sounds favorable.

This is wrong. A *real* Solomonoff Hypothesis, after all, does not merely say "There are 3^^^3 humans" if there really are 3 ^^^ 3 humans. It describes and predicts each single human, after all, in detail. It's a hypothesis that aims to predict an entire universe from one small piece of fairy cake.

And when you have to make predictive descriptions of humans, the summed size of those descriptions will grow *at least* linearly in the number of purported humans. God help you if they start organizing themselves into complicated societies, which are *more* complex than a mere summed set of single persons. Now your utility from taking the Bargain grows linearly while your negative exponent on the probability of its being real declines linearly.

The question then becomes a simple matter of where your plausibility versus utility tradeoff sits for *one* human life.
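The contrast between the two ways of counting description length can be sketched in log space. Everything here is an illustrative assumption: the fixed `overhead` of 100 bits for a compactly named number, the one bit per person for the itemized case, and the function names themselves.

```python
import math


def log2_expected_lives(n_people, description_bits):
    """log2 of (n_people * 2**-description_bits): lives at stake times prior."""
    return math.log2(n_people) - description_bits


def compact_bits(n_people, overhead=100):
    """Hypothetical: the hypothesis only names the number, at a fixed cost."""
    return overhead


def per_person_bits(n_people, bits_each=1):
    """Hypothetical: every person needs at least bits_each bits of description."""
    return bits_each * n_people


n = 2 ** 200  # stand-in for a very large population
compact = log2_expected_lives(n, compact_bits(n))
itemized = log2_expected_lives(n, per_person_bits(n))
```

Under the compact encoding the expected number of lives is astronomically large (`compact` is positive), while under the per-person encoding it is vanishingly small (`itemized` is hugely negative), because `log2(n) - n` is negative for every population size.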

Or in other words, if Pascal's Mugger catches you in a back alley, you should demand that he start describing the people he's threatening one-by-one.

> Or in other words, if Pascal's Mugger catches you in a back alley, you should demand that he start describing the people he's threatening one-by-one.

By this reasoning, if someone has their finger on a trigger for a nuclear bomb that will destroy a city of a million people, and says "give me $1 or I will pick a random number and destroy the city at a 1/1000000 chance", you should refuse. After all, he cannot describe any of the people he is planning to kill, so this is equivalent to killing one person at a 1/1000000 chance, and you could probably exceed that chance of killing someone just by driving a couple of miles to the supermarket.

**[deleted]**· 2014-01-30T20:47:23.859Z · score: 0 (0 votes) · LW · GW

If it's a million people possibly dying at a one-in-a-million chance, then in expected-death terms he's charging me $1 not to kill one person. Since I believe human lives are worth more than $1, I should give him the dollar and make a "profit".

Of course, the *other* issue here, and the reason we don't make analogies between serious military threats and Pascal's Mugging, is that in the nuclear bomb case, *there is actual evidence on which to update my beliefs*. For instance, the button he's got his finger on is either real, or not. If I can see damn well that it's a plastic toy from the local store, I've no reason to give him a dollar.

So in the case of Soviet Russia threatening you, you've got real evidence that they might *deliberately nuke your cities* with a probability much higher than one in a million. In the case of Pascal's Mugger, you've got a 1/1,000,000 chance that the mugger is telling the truth at all, and all the other probability mass points at the mugger being a delusion of your badly-coded reasoning algorithms.

If it's a million people possibly dying at a one-in-a-million chance, *and I use the reasoning you used before*, because the mugger can't describe the people he's threatening to kill, I shouldn't treat that as any worse than a threat to kill one person at a one-in-a-million chance.

**[deleted]**· 2014-01-30T22:10:33.842Z · score: 1 (1 votes) · LW · GW

You misconstrue my position. I'm not saying, "Descriptions are magic!". I'm saying: I prefer evidentialism to pure Bayesianism. Meaning: if the mugger *can't* describe anything about the city under threat, that is *evidence* that he is lying.

Which *misses the point of the scenario*, since a *real* Pascal's Mugging is *not about a physical mugger who could ever be lying*. It's about having a flaw in your own reasoning system.

Why wait until someone wants the money? Shouldn't the AI try to send 5 Dollars to everyone with a note attached reading "Here is a tribute; please don't kill a huge number of people" regardless of whether they ask for it or not?

**[deleted]**· 2015-03-01T14:05:34.802Z · score: 1 (3 votes) · LW · GW

For the most part, when person P says, "I will do X," that is evidence that P will do X, and the probability of P doing X increases. Instead, if P has a reputation for sarcasm, and if P says the same thing, then the probability that P will do X decreases. Clearly, then, our estimation of P's position in mindspace determines whether we increase or decrease the likelihood of P's claims. For the mugging situation, we might adopt a model where the mugger's claims about very improbable actions in no way affect what we expect him to do since we do not have a useful estimate of the mugger's position in mindspace--how could we? We cannot assume the mugger tends to be more honest than not, as we can with humans. Expected utilities balance and cancel, so I should keep my wallet.

**[deleted]**· 2015-03-01T21:21:13.504Z · score: 3 (3 votes) · LW · GW

You're right about this. However, the main problem we have here is this:

> A compactly specified wager can grow in size much faster than it grows in complexity. The utility of a Turing machine can grow much faster than its prior probability shrinks.

If the utility grows in proportion to -2^(2^n) but the prior probability decreases in proportion to 2^(-n) in the complexity n (measured in Kolmogorov Complexity for all I care), then even if the information we get from their utterance does lead us to have a posterior different from the prior, the expected utility goes to -∞ for n going to ∞.
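To make the rates concrete, here is a small sketch that works with exponents only. It assumes the utility is exactly -2^(2^n) and the prior exactly 2^(-n); `evidence_bits` is a hypothetical extra penalty standing in for whatever update the mugger's utterance provides.

```python
def log2_abs_expected_utility(n, evidence_bits=0):
    """log2 of |EU| when utility = -2**(2**n) and prior = 2**-n.

    evidence_bits is a hypothetical extra penalty: the posterior is taken to be
    the prior times 2**-evidence_bits.
    """
    return 2 ** n - n - evidence_bits


def first_n_exceeding_one(evidence_bits):
    """Smallest complexity n at which |EU| exceeds 1 despite the penalty."""
    n = 1
    while log2_abs_expected_utility(n, evidence_bits) <= 0:
        n += 1
    return n
```

Even a penalty of a million bits is overtaken by n = 20, which is why any fixed-size update on the evidence cannot rescue the calculation.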

The fallacy here is that you're assuming the prior probability shrinks only due to complexity.

For instance, the probability could also shrink due to the fact that higher utilities are more useful to dishonest muggers than lower utilities.

**[deleted]**· 2015-03-02T18:14:21.875Z · score: 1 (1 votes) · LW · GW

Fair enough, even though I wouldn't call that my prior but rather my posterior after updating on my belief of what their expected utility might be.

So you propose that I update my probability to be proportional to the inverse of their expected utility? How do I even begin to guess their utility function if this is a one-shot interaction? How do I distinguish between honest and dishonest people?

**[deleted]**· 2015-03-04T14:49:56.294Z · score: -1 (1 votes) · LW · GW

Under ignorance of the mugger's position in mindspace, we then should assign the same probability to the mugger's claim and the claim's opposite. Then for all n, (n utilons) * Pr(mugger will cause n utilons) + (-n utilons) * Pr(mugger will cause -n utilons) = 0. This response seems to manage the rate difference between the utility and the probability.

**[deleted]**· 2015-03-04T19:46:48.395Z · score: 1 (1 votes) · LW · GW

The question is not only about their position in mindspace. Surely there may be as many possible minds (not just humans) which believe they can simulate 3^^^3 people and torture them as there are those that do not believe so. But this does not mean that there are as many possible minds which actually could really do it. So I shouldn't use a maximum entropy prior for my belief in their ability to do it, but for my belief in their belief in their ability to do it!

This is one of those cases where it helps to be a human, because we're dumb enough that we can't possibly calculate the true probabilities involved, and so the expected utilities sum to zero in any reasonable approximation of the situation, *by human standards*.

Unfortunately, a superintelligent AI would be able to get a *much* better calculation out of something like this, and while a .0000000000000001 probability might round down to 0 for us lowly *humans*, an AI *wouldn't* round that down. (After all, why should it? Unlike us, *it* has no reason to doubt *its* capabilities for calculation.) And with enormous utilities like 3^^^^3, even a .0000000000000001 difference in probability is too much. The problem isn't with us, directly, but with the behavior a hypothetical AI agent might take. We certainly don't want our newly-built FAI to suddenly decide to devote all of humanity's resources to serving the first person who comes up with the bright idea of Pascal's Mugging it.

What about optimizing for median expected utility?

I think you are overestimating the probabilities there: it is only Pascal's Mugging if you fail to attribute a low enough probability to the mugger's claim. The problem, in my opinion, is not how to deal with tiny probabilities of vast utilities, but how not to attribute too high probabilities to events whose probabilities defy our brain's capacity (like "magic powers from outside the Matrix").

I also feel that, as with Pascal's wager, this situation can be mirrored (and therefore have the expected utilities canceled out) if you simply think "What if he intends to kill those people only if I abide by his demand?". As with Pascal's wager, the possibilities aren't only what the wager stipulates: when dealing with infinities in decision making (I'm not sure one can say "the probability of this event doesn't overcome the vast utility gained" with such numbers) you probably have another infinity which you also can't evaluate hiding behind the question.

Tell me your thoughts.

> I think you are overestimating the probabilities there: it is only Pascal's Mugging if you fail to attribute a low enough probability to the mugger's claim. The problem, in my opinion, is not how to deal with tiny probabilities of vast utilities, but how not to attribute too high probabilities to events whose probabilities defy our brain's capacity (like "magic powers from outside the Matrix").

The problem here is that you're not "attributing" a probability; you're *calculating* a probability through Solomonoff Induction. In this case, the probability is far too low to actually calculate, but simple observation tells us this much: the Solomonoff probability is given by the expression 2^(-Kolmogorov), which is *mere exponentiation*. There's pretty much no way mere exponentiation can catch up to four up-arrows in Knuth's up-arrow notation; therefore, it doesn't even really matter what the Kolmogorov complexity is, because there's no way the resulting probability can be *nearly* as low as 3^^^^3 is high.

All would be well and good if we could simply assign probabilities to be whatever we want; then we could just set the probability of Pascal's-Mugging-type situations as low as we wanted. To an extent, since we're humans and thus unable to compute the actual probabilities, we still can do this. But paradoxically enough, as a mind's computational ability increases, so too does its susceptibility to these types of situations. An AI that is *actually* able to compute/approximate Solomonoff Induction would find that the probability is vastly outweighed by the utility gain, which is part of what makes the problem a problem.

> I also feel that, as with Pascal's wager, this situation can be mirrored (and therefore have the expected utilities canceled out) if you simply think "What if he intends to kill those people only if I abide by his demand?". As with Pascal's wager, the possibilities aren't only what the wager stipulates: when dealing with infinities in decision making (I'm not sure one can say "the probability of this event doesn't overcome the vast utility gained" with such numbers) you probably have another infinity which you also can't evaluate hiding behind the question.

But do the two possibilities *really* sum to zero? These are two different situations we're talking about here: "he kills them if I don't abide" versus "he kills them if I do". If a computationally powerful enough AI calculated the probabilities of these two possibilities, will they actually miraculously cancel out? The probabilities will likely *mostly* cancel, true, but even the smallest remainder will still be enough to trigger the monstrous utilities carried by a number like 3^^^^3. If an AI *actually* carries out the calculations, without any *a priori* desire that the probabilities should cancel, can you guarantee that they will? If not, then the problem persists.
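A quick sketch of the remainder argument, using exact rational arithmetic so that nothing gets rounded away. The stake of 10^100 and the specific probabilities are made-up stand-ins; 3^^^^3 itself is far too large to represent.

```python
from fractions import Fraction


def net_expected_utility(stake, p_harm_if_refuse, p_harm_if_pay):
    """Expected lives saved by paying rather than refusing.

    stake is the (positive) number of lives; probabilities are exact Fractions.
    The five dollars is ignored as negligible next to the stake.
    """
    return stake * (p_harm_if_refuse - p_harm_if_pay)


stake = 10 ** 100  # far smaller than 3^^^^3, but already enough to make the point
p = Fraction(1, 10 ** 30)
exact_cancel = net_expected_utility(stake, p, p)
tiny_gap = net_expected_utility(stake, p + Fraction(1, 10 ** 50), p)
```

Perfect symmetry gives exactly zero, but a gap of only 10^-50 between the two probabilities still leaves 10^50 expected lives on one side of the ledger.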

Also, your remark on infinities in decision-making is well-taken, but I don't think it applies here. As large as 3^^^^3 is, it's nowhere *close* to infinity. As such, the sort of problems that infinite utilities pose, while interesting in their own right, aren't really relevant here.

You're only "calculating a probability through Solomonoff Induction" if the probability is only affected by complexity. If there are *other* reasons that could reduce the probability, they can reduce it by more. For instance, a lying mugger can increase his probability of being able to extort money from a naive rationalist by increasing the size of the purported payoff, so a large payoff is better evidence for a lying mugger than a small payoff.

Additional factors very well may reduce the probability. The question is whether they reduce it by *enough*. Given how enormously large 3^^^^3 is, I'm practically certain they won't. And even if you somehow manage to come up with a way to reduce the probability by enough, there's nothing stopping the mugger from simply adding another up-arrow to his claim: "Give me five dollars, or I'll torture and kill *3^^^^^3* people!" Then your probability reduction will be rendered pretty much irrelevant. And *then*, if you miraculously find a way to reduce the probability *again* to account for the enormous increase in utility, the mugger will simply add yet *another* up-arrow. So we see that ad hoc probability reductions don't work well here, because the mugger can always overcome those by making his number bigger; what's needed is a probability penalty that scales *with* the size of the mugger's claim: a penalty that can *always* reduce the expected utility of his offer down to ~0. Factors independent of the size of his claim, such as the probability that he's lying (since he could be lying no matter how big or how small his number actually is), are unlikely to accomplish this.

> such as the probability that he's lying (since he could be lying no matter how big or how small his number actually is)

He could be lying regardless of the size of the number, but the *probability* that he is lying would still be affected by the size of the number. A larger number is more likely to convince a naive rationalist than a smaller number, precisely because believing the larger number means believing there is more utility. This makes larger numbers more beneficial to fake muggers than smaller numbers. So the larger the number, the lower the chance that the mugger is telling the truth. This means that changing the size of the number can decrease the probability of truth in a way that *keeps pace with the increase in utility* that being true would provide.
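What "keeping pace" would look like can be sketched as follows. The 1/N shape of the penalty and the constant `c` are purely illustrative assumptions; the argument above only says that the probability of truth falls as the claimed number rises, not that it falls at exactly this rate.

```python
from fractions import Fraction


def p_truth(claimed_lives, c=Fraction(1, 10 ** 9)):
    """Hypothetical credibility that falls off as c / claimed_lives, capped at 1."""
    return min(Fraction(1), c / claimed_lives)


def expected_lives_lost(claimed_lives, c=Fraction(1, 10 ** 9)):
    """Claimed stake times the (assumed) probability that the claim is true."""
    return claimed_lives * p_truth(claimed_lives, c)


claims = [10 ** 9, 10 ** 90, 10 ** 900]
losses = [expected_lives_lost(n) for n in claims]
```

Under this assumed penalty every entry of `losses` equals `c`, so adding more up-arrows to the claim buys the mugger nothing.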

(Actually, there's an even more interesting factor that nobody ever brings up: even genuine muggers must have a distribution of numbers they are willing to use. This distribution must have a peak at a finite value, since it is impossible to have an even distribution over all numbers. If the fake mugger keeps adding arrows, he's going to go over this peak and a rationalist's estimate that he is telling the truth should go down because of that as well.)

Is this simply one statement? Is Solomonoff complexity additive with multiple statements that must be true at once? Or is it possible that we can calculate the probability as a chain of Solomonoff complexities, something like:

s1, s2 ... etc. are the statements. You need all of them to be true: magic powers, matrix, etc. Are they simply considered as one statement with one Solomonoff complexity K = 2^(x)? Or K1*K2*... = 2^(x1 + x2 + ...)? Or K1^K2^... = 2^(2^(2^...))?

And if it's considered as one statement, does simply calculating the probability with K1^K2^... solve the problem?

Point taken on the summation of the possibilities, they might not sum to zero.

Also, does invoking "magic powers" equal invoking an infinity? It basically says nothing except "I can do what I want".

You could argue that doing any action, such as accepting the wager, has a small but much larger than 1/3^^^3 chance of killing 3^^^3 people. You could argue that any action has a small but much larger than 1/3^^^3 chance of guaranteeing blissful immortality for 3^^^3 people. Therefore, declining the wager makes a lot more sense because no matter what you do you might have already doomed all those people.

How about: the logic of a system applies only within that system?

Variants of this are common in all sorts of logical proofs, and it stands to reason that elements outside a system do not follow the rules of that system.

A construct assuming something out-of-universe acting in-universe just can't be consistent.

I think you're assuming that to give in to the mugging is the wrong answer in a one-shot game for a being that values all humans in existence equally, because it feels wrong to you, a being with a moral compass evolved in iterated multi-generational games.

Consider these possibilities, any one of which would create challenges for your reasoning:

1. **Giving in is the right answer in a one-shot game, but the wrong answer in an iterated game.** If you give in to the mugging, the outsider will keep mugging you and other rationalists until you're all broke, leaving the universe's future in the hands of "Marxians" and post-modernists.

2. Giving in is the right answer for a rational AI God, but **evolved beings** (under the Darwinian definition of "evolved") **can't value all members of their species equally.** They must value kin more than strangers. You would need a theory to explain why any being that evolved due to resource competition wouldn't consider killing a large number of very distantly-related members of its species to be a *good* thing.

3. You should interpret the conflict between your intuition, and your desire for a rational God, not as showing that you're reasoning badly because you're evolved, but that **you're reasoning badly by desiring a rational God bound by a static utility function.** This is complicated, so I'm gonna need more than one paragraph:

Intuitively, my argument boils down to applying the logic behind free markets, freedom of speech, and especially evolution, to the question of how to construct God's utility function. This will be vague, but I think you can fill in the blanks.

**Free-market economic theory** developed only after millennia during which everyone believed that top-down control was the best way of allocating resources. **Freedom of speech** developed only after millennia during which everyone believed that it was rational for everyone to try to suppress any speech they disagreed with. **Political liberalism** developed only after millennia during which everybody believed that the best way to reform society was to figure out what the *best* society would be like, then force that on everyone. **Evolution** was conceived of--well, originally about 2500 years ago, probably by Democritus, but it became popular only after millennia during which everyone believed that life could be created only by design.

**All of these developments came from empiricists**. Empiricism is one of the two opposing philosophical traditions of Western thought. It originated, as far as we know, with Democritus (about whom Plato reportedly said that he wished all his works to be burned--which they eventually were). It went through the Skeptics, the Stoics, Lucretius, nominalism, the use of numeric measurements (re-introduced to the West circa 1300), the Renaissance and Enlightenment, and eventually (with the addition of evolution, probability, statistics, and operationalized terms) created modern science.

A key principle of empiricism, on which John Stuart Mill explicitly based his defense of free speech, is that **we can never be certain**. If you read about the skeptics and stoics today, you'll read that they "believed nothing", but that was because, to their opponents, "believe" meant "know something with 100% certainty".

(The most-famous skeptic, Sextus Empiricus, was called "Empiricus" because he was of the **empirical** school of medicine, which taught learning from experience. Its opponent was the **rational** school of medicine, which used logic to interpret the dictums of the ancient authorities.)

**The opposing philosophical tradition, founded by Plato, is rationalism**. "Rational" does not mean "good thinking". It has a very specific meaning, and it is not a good way of thinking. It means reasoning about the physical world the same way Euclid constructed geometric proofs. No measurements, no irrational numbers, no observation of the world, no operationalized nominalist definitions, no calculus or differential equations, no testing of hypotheses--just armchair *a priori* logic about universal categories, based on a set of unquestionable axioms, done in your favorite human language. Rationalism is the *opposite* of science, which is empirical. *The pretense that "rational" means "right reasoning" is the greatest lie foisted on humanity by philosophers.*

**Dualist rationalism is inherently religious**, as it relies on some concept of "**spirit**", such as Plato's Forms, Augustine's God, Hegel's World Spirit, or an almighty programmer converting sense data into LISP symbols, to connect the inexact, ambiguous, changeable things of this world to the precise, unambiguous, unchanging, and usually unquantified terms in its logic.

(**Monist** rationalists, like Buddha, Parmenides, and post-modernists, believe sense data can't be divided unambiguously into categories, and thus we may not use categories. Modern empiricists categorize sense data using **statistics**.)

**Rationalists support strict, rigid, top-down planning and control.** This includes their opposition to free markets, free speech, gradual reform, and optimization and evolution in general. This is because rationalists believe they can prove things about the real world, and hence their conclusions are reliable, and they don't need to mess around with slow, gradual improvements or with testing. (Of course each rationalist believes that every *other* rationalist was wrong, and should probably be burned at the stake.)

They oppose all **randomness and disorder**, because it makes strict top-down control difficult, and threatens to introduce change, which can only be bad once you've found the truth.

They have to classify every physical thing in the world into a **discrete, structureless, atomic category**, for use in their logic. That has led inevitably to theories which **require all humans to ultimately have, at reflective equilibrium, the same values**--as Plato, Augustine, Marx, and CEV all do.

You have, I think, picked up some of these bad inclinations from rationalism. When you say you want to find the **"right" set of values** (via CEV) and encode them into an AI God, that's exactly like the rationalists who spent their lives trying to find the "right" way to live, and then suppress all other thoughts and enforce that "right way" on everyone, for all time. Whereas an empiricist would never claim to have found final truth, and would always leave room for new understandings and new developments.

Your objection to **randomness** is also typically rationalist. Randomness enables you to sample without bias. A rationalist believes he can achieve complete lack of bias; an empiricist believes that neither complete lack of bias nor complete randomness can be achieved, but that for a given amount of effort, you might achieve lower bias by working on your random number generator and using it to sample, than by hacking away at your biases.

So I don't think we should build an FAI God who has a static set of values. We should build, if anything, an AI referee, who tries only to keep conditions in the universe that will enable evolution to keep on producing behaviors, concepts, and creatures of greater and greater complexity. Randomness must not be eliminated, for without randomness we can have no true exploration, and must be ruled forever by the beliefs and biases of the past.

Your overall point is right and important but most of your specific historical claims here are false - more mythical than real.

> **Free-market economic theory** developed only after millennia during which everyone believed that top-down control was the best way of allocating resources.

Free market economic theory was developed during a period of rapid centralization of power, before which it was common sense that most resource allocation had to be done at the local level, letting peasants mostly alone to farm their own plots. To find a prior epoch of deliberate central resource management at scale you have to go back to the Bronze Age, with massive irrigation projects and other urban amenities built via palace economies, and even then there wasn't really an ideology of centralization. A few Greek city-states like Sparta had tightly regulated mores for the elites, but the famously oppressed Helots were still probably mostly left alone. In Russia, Communism was a massive centralizing force - which implies that peasants had mostly been left alone beforehand. Centralization is about states trying to become more powerful (which is why Smith called his book *The Wealth of Nations*, pitching his message to the people who needed to be persuaded). Read Tocqueville's *The Old Regime* for more, focusing on centralization in France before and after the Revolution. *War and Peace* has a good empirical treatment of the modernizing/centralizing force vs the old-fashioned empirical impulse in Russia. "Freedom" is not always decentralizing, though, as the book makes clear.

> **Freedom of speech** developed only after millennia during which everyone believed that it was rational for everyone to try to suppress any speech they disagreed with.

There was something much like this in both the Athenian (and probably broader Greek) world (the democratic prerogative to publicly debate things), and the Israelite world (prophets normatively had something close to immunity from prosecution for speech, and there were no qualifications needed to prophesy). In both cases there were limits, but there are limits in our world too. The *ideology* of freedom of speech is new, but your characterization of the alternative is tendentious.

> **Political liberalism** developed only after millennia during which everybody believed that the best way to reform society was to figure out what the *best* society would be like, then force that on everyone.

Political liberalism is not really an exception to this!

> **Evolution** was conceived of--well, originally about 2500 years ago, probably by Democritus, but it became popular only after millennia during which everyone believed that life could be created only by design.

It's really unclear what past generations meant by God, but this one is probably right.