
Comments sorted by top scores.

comment by DanielLC · 2012-03-30T16:32:38.256Z · LW(p) · GW(p)

Crucially this can’t be compressed due to the number 3^^^^3 being easily computable because we’re taking the fraction of possible current states that gets the desired output from M, not the smallest number of bits each state can be expressed in.

The fraction of states will diverge. The closest you're likely to get is to weight the programs by length. If you do so, the weighted portion will be slightly more than if you just weight the shortest ways of stating it.

Also, consider that you could take any possible program that works, and add "Also, 3^^^^3 people are killed" on top of it.

Replies from: Arran_Stirton
comment by Arran_Stirton · 2012-03-30T17:32:59.548Z · LW(p) · GW(p)

Thank you for the feedback!

But would the number of states diverge?

Say we're talking about a target and we're predicting that at least one arrow will strike it in the next five minutes; in that case the space for five light-minutes around that target would have to be in one of a set of initial states that would result in there being at least one arrow in the target after five minutes.

Then if we predict that at least two arrows will strike in the next five minutes, we're narrowing our set of states by only considering the ones where two or more arrows strike. This set is necessarily smaller than the set for one arrow, as every state in the two-arrow set is also in the one-arrow set, while the states that result in just one arrow striking are excluded.

Then again for three arrows and so on. Wouldn't this lead to a converging fraction of states?

I'm not sure how taking any possible program and adding "Also, 3^^^^3 people are killed" would affect this, could you elaborate at all?

Thanks again.

Replies from: DanielLC
comment by DanielLC · 2012-03-30T18:10:55.794Z · LW(p) · GW(p)

Wouldn't this lead to a converging fraction of states?

It wouldn't converge for each arrow. If you did the limit by program length, it might converge, but there's no obvious reason that there must be a certain order to count them in. If you can count them in any order, you can alternate between states where the arrow hits and states where it does not, and make it look like it has a 50% chance of hitting.
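
(To make the ordering point concrete, here is a toy illustration: integers stand in for states, and "divisible by 1000" stands in for "the arrow hits". The same infinite collection of states gives a different limiting fraction depending purely on the order they're counted in.)

```python
# Toy illustration (not part of the original argument): "states" are the
# positive integers, and a state is a "hit" if it is divisible by 1000.
# Counting in the natural order, the running fraction of hits tends to 1/1000;
# interleaving hits and misses makes the very same set of states look like it
# has a 50% hit rate.
from itertools import count, islice

def natural_order():
    return count(1)                               # 1, 2, 3, ...

def alternating_order():
    hits = (1000 * k for k in count(1))           # multiples of 1000
    misses = (n for n in count(1) if n % 1000)    # everything else
    while True:
        yield next(hits)
        yield next(misses)

def running_fraction(states, n):
    sample = islice(states, n)
    return sum(1 for s in sample if s % 1000 == 0) / n

print(running_fraction(natural_order(), 100_000))      # -> 0.001
print(running_fraction(alternating_order(), 100_000))  # -> 0.5
```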

I'm not sure how taking any possible program and adding "Also, 3^^^^3 people are killed" would affect this, could you elaborate at all?

Given a state, you could add another non-interacting (or just interacting based on one guy's decision) universe where 3^^^^3 people die. This universe has constant complexity, which is small compared to 3^^^^3. Let's call the complexity of that part k.

Given a possible universe with complexity n, there is a possible universe where Pascal's mugger is telling the truth with complexity n+k. When you get to n+k, there are 2^k times as many possibilities to choose from, so it's 2^k times less likely. That is still far more likely than the factor of 1/3^^^^3 it would have to be for Pascal's mugger not to be credible enough.
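
(Writing the comparison out as a rough inequality — the notation here is only an illustrative sketch of the argument above, not anything precise from the thread:)

```latex
% Sketch: a base universe of complexity n gets prior weight ~2^{-n}; the
% variant in which the mugger is truthful has complexity n + k, hence weight
% ~2^{-(n+k)} = 2^{-k} * 2^{-n}.  The penalty is a fixed factor 2^{-k}, and
% since 2^{-k} >> 1/3^^^^3, the expected harm still explodes:
P(\text{mugger truthful}) \approx 2^{-k}\, P(\text{base universe}),
\qquad
2^{-k} \gg \frac{1}{3\uparrow\uparrow\uparrow\uparrow 3}
\;\Longrightarrow\;
3\uparrow\uparrow\uparrow\uparrow 3 \cdot 2^{-k} \text{ is still enormous.}
```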

Replies from: Arran_Stirton
comment by Arran_Stirton · 2012-03-30T19:13:38.568Z · LW(p) · GW(p)

If you did the limit by program length, it might converge, but there's no obvious reason that there must be a certain order to count them in. If you can count them in any order, you can alternate between states where the arrow hits and states where it does not, and make it look like it has a 50% chance of hitting.

Well for any finite amount of time that you predict into the future you've also got a finite amount of space to consider, as anything too far away wouldn't be able to travel fast enough to affect the outcome of the thing being predicted about. Each state of the universe would really be the state of this finite area of space which would be expressed in binary. One way to order the states would be in terms of how large a binary number they form, from smallest to largest.

I'm not sure how making it look like the arrow had a 50% chance of hitting would make any difference to anything though?

Given a state, you could add another non-interacting (or just interacting based on one guy's decision) universe where 3^^^^3 people die. This universe has constant complexity, which is small compared to 3^^^^3. Let's call the complexity of that part k.

Given a state, you could also add another non-interacting (or just interacting based on one guy's decision) universe where the lives of 3^^^^3 people are saved. I don't know if this is the right terminology, but it seems to me that when you start adding extra possible universes on, their outcomes become causally decoupled from the original decision to give/not-give the mugger $5.

Replies from: DanielLC
comment by DanielLC · 2012-03-31T01:21:26.535Z · LW(p) · GW(p)

Well for any finite amount of time that you predict into the future you've also got a finite amount of space to consider

You have an entire universe to consider. You don't deal only with possible universes that have just begun; you deal with all possible universes. There are simple universes that eventually come out to this, but no simple ones that start this way. Also, a limited speed of light is not guaranteed. As far as we can tell, it's limited, but it might not be.

I'm not sure how making it look like the arrow had a 50% chance of hitting would make any difference to anything though?

If you can calculate it so that it comes out to 50%, or to any other probability you like, you're clearly doing something wrong. It should only come out to one value.

I don't know if this is the right terminology, but it seems to me that when you start adding extra possible universes on, their outcomes become causally decoupled from the original decision to give/not-give the mugger $5.

You can also add on modified versions that are coupled, or things like that. It's a bit more complicated than I said, but there's still a good chance (as in more than 1/3^^^^3) that the mugger isn't bluffing.

comment by MileyCyrus · 2012-03-30T07:19:17.595Z · LW(p) · GW(p)

"So for every extra unit of disutility predicted the probability penalty due to not knowing enough about the current state of the universe becomes greater."

Sure, but the probability shrinks slower than the disutility rises. A scenario in which 1000 times 3^^3 people are tortured has a higher probability than the probability that 3^^3 people are tortured, divided by 1000. Or more formally:

[P(Mugger tortures 1000*3^^3 people)] > [P(Mugger tortures 3^^3 people)]/1000

Read about Solomonoff Induction to find out why this is true.
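
(A rough sketch of the intuition, using the length of a short Python expression as a crude stand-in for description length — the stand-in is an illustration, not Solomonoff induction proper. The description grows by a few characters per extra exponent, while the number itself, and hence the 1/N penalty that would be needed to block the mugging, explodes.)

```python
from math import floor, log10

def digits_of_power_of_3(exponent: int) -> int:
    """Decimal digits of 3**exponent, without materialising the number."""
    return floor(exponent * log10(3)) + 1

# Length of the description (characters) vs size of the number it denotes:
print(len("3**3"),       digits_of_power_of_3(3))        # 4 chars,  2 digits
print(len("3**3**3"),    digits_of_power_of_3(3**3))     # 7 chars,  13 digits
print(len("3**3**3**3"), digits_of_power_of_3(3**3**3))  # 10 chars, ~3.6e12 digits
```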

Replies from: Dmytry, Manfred, Arran_Stirton
comment by Dmytry · 2012-03-31T06:57:47.054Z · LW(p) · GW(p)

How's about this: the probabilities of torture of each exact number of beings have got to sum to 1 or less?

comment by Manfred · 2012-03-30T12:32:50.021Z · LW(p) · GW(p)

A word of caution - Solomonoff induction applies to things like the laws of physics, not to all hypotheses. Otherwise, if you flipped a coin 100 times, you would expect to see 100 heads much more often than average, and we don't.

Replies from: MileyCyrus
comment by MileyCyrus · 2012-03-30T15:40:11.305Z · LW(p) · GW(p)

Otherwise, if you flipped a coin 100 times, you would expect to see 100 heads much more often than average, and we don't.

If you flip a coin 15 times, this result:

HHHHHHHHHHHHHHH

is far more probable than this:

HTHTTHTHTTTHHTH

That's because some coins are rigged, and it's much easier to rig a coin to conform to the first pattern than the second.
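
(A minimal numerical sketch of that point, with made-up prior numbers: suppose there's a 0.1% chance the coin is rigged to come up heads 99% of the time.)

```python
# Mixture model (numbers invented for illustration): with probability 0.999 the
# coin is fair, with probability 0.001 it is rigged to land heads 99% of the
# time.  Fifteen heads in a row then comes out far more probable than any
# particular mixed-looking sequence of the same length.
P_RIGGED, P_HEADS_IF_RIGGED = 0.001, 0.99

def sequence_probability(seq: str) -> float:
    fair = 0.5 ** len(seq)
    rigged = 1.0
    for flip in seq:
        rigged *= P_HEADS_IF_RIGGED if flip == "H" else 1 - P_HEADS_IF_RIGGED
    return (1 - P_RIGGED) * fair + P_RIGGED * rigged

print(sequence_probability("HHHHHHHHHHHHHHH"))   # ~8.9e-04
print(sequence_probability("HTHTTHTHTTTHHTH"))   # ~3.0e-05
```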

Replies from: Antisuji, philh, Manfred
comment by Antisuji · 2012-03-30T16:09:19.531Z · LW(p) · GW(p)

This is true, but doesn't explain why we're more surprised when we see the former than the latter.

Replies from: None
comment by [deleted] · 2012-03-30T16:21:07.749Z · LW(p) · GW(p)

we're more surprised when we see the former than the latter

I don't think this is actually true. If MileyCyrus successfully predicted the exact sequence of coinflips HTHTTHTHTTTHHTH, wouldn't you be more surprised than if it were HHHHHHHHHHHHHHH?

Replies from: Antisuji
comment by Antisuji · 2012-03-30T18:44:55.946Z · LW(p) · GW(p)

Of course. When I said "we're more surprised" I was referring to the typical person who hasn't read this discussion thread. In the absence of the above prediction, I would be far more surprised to see HHHHHHHHHHHHHHH than HTHTTHTHTTTHHTH. Once the prediction is made, I become extremely surprised if either sequence appears, but somewhat more surprised by HTHTTHTHTTTHHTH.

Replies from: None
comment by [deleted] · 2012-03-30T18:58:47.715Z · LW(p) · GW(p)

Oh, I see. In the case of the typical person, the answer is even easier: Lack of understanding of the conjunction rule of probability. HTHTTHTHTTTHHTH feels more representative of a random series of coin flips, so it is intuitively judged as more probable than HHHHHHHHHHHHHHH.

comment by philh · 2012-03-30T18:56:36.013Z · LW(p) · GW(p)

First reaction: I don't know about "far" more probable. What's the prior that a coin is rigged? I would have said less than 1/32768, but low confidence on that.

According to this, you can't rig a coin to do that, which increases my confidence.

But you can rig your tossing, even by mistake; if it lands heads, and you balance it to flip with heads up again, then it's slightly more likely to land heads. I remember hearing a figure of 51% for that; in which case H*15 has probability 1/24331 instead of 1/32768; about a third more probable. But that scenario (fifteen times) is itself unlikely... if we estimate P(next is heads | last was heads) = 0.505 (corresponding to keeping the same side up 3/4 of the time, I still feel that's an overestimate), we get 1/28204, 16% more likely.
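
(A quick check of those figures, assuming the run's probability is simply p^15; the results land in the same ballpark as the numbers above.)

```python
for p in (0.5, 0.505, 0.51):
    print(p, round(1 / p ** 15))
# 0.5   -> 32768
# 0.505 -> 28225  (close to the 1/28204 cited above)
# 0.51  -> 24347  (close to the 1/24331 cited above)
```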

If we switched to dice, I would agree that 666666666666666 is far more probable than 136112642345553.

comment by Manfred · 2012-03-30T16:24:39.278Z · LW(p) · GW(p)

I suppose that isn't all that unintuitive (though does this actually work if you start with a uniform prior over weights and do the math?). But does your intuitive model also predict the fact that HTHTHTHTHT is more probable than HTHHTHTHTT? :D

Replies from: Dmytry
comment by Dmytry · 2012-03-31T07:01:21.268Z · LW(p) · GW(p)

Well, it is the case that all the random sequences together have much larger probability than HHHHHHHHHHHH , and so we should expect the sequence to be one among the random sequences.

edit: interesting issue: suppose you assign some prior probability to each possible sequence. Upon seeing the actual sequence, with a probability of 0.0001 that your eyes deceived you, how are you to update the probability of this particular sequence? Why would we assume sensory failure (or a biased coin) when we observe a hundred heads, but not something random-looking? It should have to do with the sensory failure being much less likely for something random-looking.

comment by Arran_Stirton · 2012-03-31T14:16:12.496Z · LW(p) · GW(p)

I'm treating the current state of the universe as a different thing entirely to the mugger's implied hypothesis about how the universe works. Both a program simulating Maxwell’s equations would obviously win out over a program simulating Thor, but in terms of predicting the shape of a magnetic field in a certain spot, that depends on the current state of the universe (at least the parts of the universe relevant to the equation).

Though if this is an invalid line of reasoning for some reason, please let me know, thanks.

Replies from: MileyCyrus
comment by MileyCyrus · 2012-03-31T17:53:58.098Z · LW(p) · GW(p)

I have no idea where you're going with this.

Both a program simulating Maxwell’s equations would obviously win out over a program simulating Thor,

You use the word "both" but then refer to only one object. Did you forget to include something?

Replies from: Arran_Stirton
comment by Arran_Stirton · 2012-04-04T00:35:47.711Z · LW(p) · GW(p)

Sorry I'll try to clarify:

If you want to predict the exact state of a system five minutes into the future you need to know the current state of the system and the laws of that system. Call the current state s and the future state s'; the laws of the system are simulated by the Turing machine L. Instead of knowing the state of the system, we only know its laws (or rather we take them as a given).

Then any prediction we make about the future state of the system will restrict the range of values for s' that will validate our prediction. The more specific we are about s', the smaller the range of values it can take. In turn this restricts the range of possible values for s (since L(s) = s') that will give such an s'.

Because we have no information about the current state of the system all possible states are equally likely, and as such the probability that the system will end up in a particular range of s' is the same as the fraction of s (out of all possible s) that will map there.

This is not in relation to any hypothesis about the laws of the system, but instead the current state of the system. I hope this makes my original argument make more sense. If not I'm sorry; please highlight to me where my explanation is going wrong.
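
(A toy version of this setup, with an 8-bit state space and an arbitrary deterministic rule standing in for L — the particular rule and prediction are illustrative only.)

```python
# With a uniform prior over current states s, the probability of a prediction
# about s' = L(s) is just the fraction of states s that L maps into the
# predicted set.
N_BITS = 8
STATES = range(2 ** N_BITS)

def L(s: int) -> int:
    """Stand-in 'law of the system': a fixed deterministic update rule."""
    return (s * 5 + 3) % (2 ** N_BITS)

def prediction_holds(s_prime: int) -> bool:
    """An example prediction about the future state: its top bit is set."""
    return s_prime >= 2 ** (N_BITS - 1)

probability = sum(prediction_holds(L(s)) for s in STATES) / len(STATES)
print(probability)   # fraction of current states whose successor satisfies it
```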

comment by [deleted] · 2012-04-04T03:29:55.033Z · LW(p) · GW(p)

Are Gödel's Incompleteness Theorems applicable to Decision Theory? If so then maybe you just can't get the correct answer to Pascal's Mugging out of the system formally.

Replies from: Arran_Stirton
comment by Arran_Stirton · 2012-04-05T08:08:28.206Z · LW(p) · GW(p)

How would you justify that?

comment by Dmytry · 2012-03-30T14:55:27.739Z · LW(p) · GW(p)

Well, a few things to note:

  • Making up reasons against Pascal's mugging based on an 'it must be wrong' feeling sounds an awful lot like rationalization. One has got to stick to really solid logic; only in mathematics can you believe rather strongly in a conjecture, be motivated to prove it, and then make a valid proof.

  • One man's decision affecting 3^^^^3 people has got to be a very rare situation to find yourself in; you're much more likely to be among those 3^^^^3. You have got to adjust your prior for that. This should be enough for not giving the $5.

  • The other issue is that it is a hostage situation, and even in normal human hostage situations, whether you should or should not give the money to the hostage holder depends solely on whether there is a higher probability that the hostages will be killed (or tortured) if the money is given than if it is not given. Without further information about people who hold 3^^^^3 beings hostage for $5, you cannot make any prediction - the expected effect of giving $5 on the suffering of 3^^^^3 beings is 3^^^^3 * 0 = 0, and thus the expected utility of giving $5 is equal to the expected utility of not giving $5, minus the utility of having $5 in your hands rather than in the mugger's hands. It does not matter how many up arrows the mugger stacks; it may well be that, on average, giving money gets hostages killed when the kidnapper is this psychopathic. Then one may estimate that giving the money has immense disutility. Caveat: one can imagine the inconvenient world where psychopaths keep their word and release hostages when demands are met.

Replies from: jhuffman, Arran_Stirton
comment by jhuffman · 2012-03-30T20:53:55.503Z · LW(p) · GW(p)

The other issue is that it is a hostage situation, and even in normal human hostage situations, whether you should or should not give the money to the hostage holder depends solely on whether there is a higher probability that the hostages will be killed (or tortured) if the money is given than if it is not given.

Actually in real life, we also have to consider that unspecified potential future hostage takers may be motivated to take hostages if they see a hostage taker paid off. This is ostensibly why the USG will not (directly) pay off a hostage taker.

Also, we have to consider the value of the money, and our next best alternative to saving a hostages life. For example, if Dr. Evil is holding a hostage (doesn't matter who) for $1B, and you know we will not catch him if you pay him off, then you should probably just let him execute the hostage and use the money to buy food for a few thousand starving people somewhere who are just as desperate.

Replies from: Dmytry
comment by Dmytry · 2012-03-30T21:07:07.315Z · LW(p) · GW(p)

Yep. Well, those aspects of it are not so relevant to the 3^^^3 case, as they don't scale with N.

comment by Arran_Stirton · 2012-03-30T18:33:49.416Z · LW(p) · GW(p)

Thanks for the feedback!

  • I sincerely hope that I'm not making up reasons against Pascal's mugging based on a feeling that "it must be wrong". Can't help but agree though on the requirement for mathematics. I've done my best here to keep things as clean and logical as possible, though if I've lapsed somewhere I can't tell where; would you mind pointing it out?

  • Yes, I believe that was the solution put forth by Robin Hanson. It seems to be overly specific; the same calculations should apply if it were a coin flip, rather than a person, affecting the lives of 3^^^^3 people. That's part of what motivated me to come up with this; I just wasn't satisfied with the current answer.

Replies from: Dmytry
comment by Dmytry · 2012-03-30T18:37:54.442Z · LW(p) · GW(p)

1: Well, part of the issue is that feelings can very well be right. The feeling is that the claim is too outrageous; that's a genuine thing, but it is too hard to pin any probabilities onto.

One would think that the probability should fall off with the outrageousness of the claim, super-linearly. I.e. suppose that no claim is made; you are to give, or not to give, $5 to a random person who has not claimed that $5 will save 3^^^3 people. It is clear enough that the probability of this $5 saving 3^^^3 people has got to be very small then, and would fall off with the claimed number; it's reasonable that it would fall off super-linearly. Then, the person making that claim is just a piece of evidence that can't boost the prior probability by a whole lot; see the posts here on Bayesian statistics. Indeed, if one is to give the mugger $5, one should also give $5 to people who didn't even ask for money.

Actually, come to think of it, I might've just nailed it and also nailed the problem with using probabilistic reasoning in practice. You can easily pick some random hypothesis out of an enormously huge space, which gives it a very small prior, but then you forget about this enormous space.

2: I don't see how it's overly specific. If we consider (coin or a person), one randomly chosen (coin or a person) affecting 3^^^^3 (coin or a person) is unlikely. Still, the explanation is indeed somewhat problematic.

Replies from: CarlShulman, Arran_Stirton
comment by CarlShulman · 2012-03-30T19:31:48.541Z · LW(p) · GW(p)

Actually, come to think of it, I might've just nailed it and also nailed the problem with using probabilistic reasoning in practice. You can easily pick some random hypothesis out of an enormously huge space, which gives it a very small prior, but then you forget about this enormous space.

You might like to read this post, "Privileging the hypothesis."

2: I don't see how it's overly specific. If we consider (coin or a person), one randomly chosen (coin or a person) affecting 3^^^^3 (coin or a person) is unlikely. Still, the explanation is indeed somewhat problematic.

It assumes a particular account of anthropic reasoning with infinite certainty. If you get your anthropic hypotheses out of something like Solomonoff induction (the programs best approximating our sense inputs can be thought of as a combination of a simulation of our world plus a bit of code that acts as an "anthropic theory" and reads out part of the simulation as our sense inputs), then things like SIA, SSA, and "you're more likely to be a given person if they have more causal influence" are not radically different in complexity. So you get Pascal's mugging from the combination of 1) laws of physics allowing vast quantities of computation and 2) some kind of anthropic theory that makes it unlikely you are one of the mass of simulations.

Replies from: Dmytry
comment by Dmytry · 2012-03-30T20:32:40.448Z · LW(p) · GW(p)

Hypothesis: Yea. The problem is that, apart from the trivial cases having to do with clearly made-up nonsense, it is very difficult to track how much the hypothesis got 'cherrypicked', as the process of choosing a hypothesis, when not entirely insane, should increase the probability of it being true over the hypotheses that this process did not pick.

Anthropic reasoning: I agree it's kind of flimsy. On second thought I don't like this argument too much.

comment by Arran_Stirton · 2012-03-30T19:08:59.652Z · LW(p) · GW(p)

1: Well my reasoning is that the more people the mugger threatens to kill, the less likely his claims are to be true. In the same way, if I were to claim that a row of 3^^^^3 coins would all turn up heads, it would be far less likely to come true than if I predicted two coins would come up heads. At least that's what I'm trying to get across in this post.

2: It seems overly specific to me because it seems like a bit too much of a hack, if you get my meaning?

Replies from: CarlShulman
comment by CarlShulman · 2012-03-30T19:45:51.906Z · LW(p) · GW(p)

As you see more and more heads, you become increasingly convinced the coins are biased. What's the bias? With what probability p will a given flip come up heads? At the start you assign some mass to p=1, and some to lesser biases. After 10^1000 heads you can basically ignore the possibility that the coins are fair, and most of the weight you might have initially placed on a minor bias. Going from 10^1000 to 3^^^^3 coins, you will get to clobber hypotheses like "p=1-10^-2000", but you will get no evidence whatsoever against "p=1". So as long as you assigned any non-infinitesimal, non-gerrymandered credence to p=1 at the start, longer sequences can't get probabilities approaching zero.
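
(A minimal numerical sketch of this; the grid of biases and the prior weights are arbitrary choices for illustration.)

```python
# Give p = 1 some non-infinitesimal prior mass alongside lesser biases.  After
# a long run of heads every hypothesis with p < 1 is crushed relative to p = 1,
# so the probability that *all* future flips come up heads stays bounded away
# from zero no matter how long the promised sequence is.
biases = [0.5, 0.9, 0.99, 0.999, 1.0]
prior  = [0.96, 0.01, 0.01, 0.01, 0.01]

def posterior_after_heads(n_heads: int):
    weights = [pr * p ** n_heads for pr, p in zip(prior, biases)]
    total = sum(weights)
    return [w / total for w in weights]

for n in (0, 10, 100, 1000):
    print(n, round(posterior_after_heads(n)[-1], 3))  # posterior mass on p = 1
# 0 -> 0.01,  10 -> ~0.3,  100 -> ~0.44,  1000 -> ~0.73  (keeps climbing)
```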

Replies from: Arran_Stirton
comment by Arran_Stirton · 2012-03-30T20:09:33.813Z · LW(p) · GW(p)

True, but that is as you see more heads. You can't actually update your value for p based on evidence you haven't seen yet; longer sequences would still have probabilities approaching zero.

Can someone let me know why this has negative votes please? Thanks.

Replies from: Dmytry
comment by Dmytry · 2012-03-31T06:52:34.258Z · LW(p) · GW(p)

Can someone let me know why this has negative votes please? Thanks.

Because it's likes/dislikes, not votes. The number of dislikes is greater than the number of likes by 1. That being said, as the estimate of the bias in the coin increases, so does your likelihood of future throws being HHHHH. Not sure I understand what your point is.

Replies from: Arran_Stirton
comment by Arran_Stirton · 2012-03-31T13:11:11.813Z · LW(p) · GW(p)

Hover over the thumbs-up / thumbs-down icons; they say "Vote up" and "Vote down". Anyway, I was wondering what it was that I'd said that was wrong and thus deserved to be voted down.

Yes, I agree. However, what I was trying to point out is that if you start off with no evidence of the coin being biased then your estimate of the bias won't increase before you start flipping coins.

By the same merits, your estimate of how likely it is that the mugger will kill x people won't change on the grounds that every person he kills is evidence toward him killing them all successfully, as you're making the prediction before he does anything. If you read the above comments I believe it makes sense in context.

Replies from: Dmytry
comment by Dmytry · 2012-03-31T13:22:08.989Z · LW(p) · GW(p)

Hover over the thumbs-up / thumbs-down icons; they say "Vote up" and "Vote down".

We have a saying in Russian, along the lines of 'the wall of a shed says [certain swearword common in graffiti, refers to a reproductive organ] but this body part is not present inside the shed'. edit: anyhow, I kind of don't see anything wrong with what you said.