Median utility rather than mean?

stuart_armstrong

post by Stuart_Armstrong · 2015-09-08T16:35:59.533Z · LW · GW · Legacy · 86 comments

  Why the median is like the mean
  Why the median is not like the mean
  For lack of a Cardinal...

tl;dr A median maximiser will expect to win. A mean maximiser will win in expectation. As we face repeated problems of similar magnitude, both types take on the advantage of the other. However, the median maximiser will turn down Pascal's muggings, and can say sensible things about distributions without means.

Prompted by some questions from Kaj Sotala, I've been thinking about whether we should use the median rather than the mean when comparing the utility of actions and policies. To justify this, see the next two sections: why the median is like the mean, and why the median is not like the mean.

Why the median is like the mean

The main theoretical justifications for the use of expected utility - hence of means - are the von Neumann-Morgenstern axioms. Using the median obeys the completeness and transitivity axioms, but not the continuity and independence ones.

It does obey weaker forms of continuity; but in a sense, this doesn't matter. You can avoid all these issues by making a single 'ultra-choice'. Simply list all the possible policies you could follow, compute their median return, and choose the one with the best median return. Since you're making a single choice, independence doesn't apply.

So you've picked the policy π_m with the highest median value - note that to do this, you need only know an ordinal ranking of worlds, not their cardinal values. In what way is this like maximising expected utility? Essentially, the more options and choices you have - or could hypothetically have - the closer this policy must be to expected utility maximisation.

Assume u is a utility function compatible with your ordinal ranking of the worlds. Then π_u = 'maximise the expectation of u' is also a policy choice. If we choose π_m, we get a distribution d_mu of possible values of u. Then E(u|π_m) is within the absolute deviation (using d_mu) of the median value of d_mu. This absolute deviation always exists for any distribution with an expectation, and is itself bounded by the standard deviation, if it exists.
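As a quick sanity check of that bound, here is a minimal Python sketch on a made-up three-outcome distribution (the outcomes and probabilities are illustrative assumptions, not taken from any real decision problem). It compares the gap between mean and median with the mean absolute deviation around the mean and with the standard deviation.

```python
import math

# Hypothetical distribution of u under some policy: (value, probability) pairs.
dist = [(0.0, 0.6), (1.0, 0.3), (10.0, 0.1)]

def mean(d):
    return sum(p * x for x, p in d)

def lower_median(d):
    # Smallest value whose cumulative probability reaches 1/2.
    total = 0.0
    for x, p in sorted(d):
        total += p
        if total >= 0.5:
            return x
    raise ValueError("probabilities must sum to at least 0.5")

def mean_abs_deviation(d):
    m = mean(d)
    return sum(p * abs(x - m) for x, p in d)

def std_deviation(d):
    m = mean(d)
    return math.sqrt(sum(p * (x - m) ** 2 for x, p in d))

gap = abs(mean(dist) - lower_median(dist))
print(gap, mean_abs_deviation(dist), std_deviation(dist))
```

The chain being checked is |mean - median| <= E|X - mean| <= standard deviation: the first step holds because the median minimises the expected absolute deviation, the second by Jensen's inequality.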

Thus maximising the median is like maximising the mean, with an error depending on the standard deviation. You can see it as a risk averse utility maximising policy (I know, I know - risk aversion is supposed to go in defining the utility, not in maximising it. Read on!). And as we face more and more choices, the standard deviation will tend to fall relative to the mean, and the median will cluster closer and closer to the mean.

For instance, suppose we consider the choice of whether to buckle our seatbelt or not. Assume we don't want to die in a car accident that a seatbelt could prevent; assume further that the cost of buckling a seatbelt is trivial but real. To simplify, suppose we have an independent 1/Ω chance of death every time we're in a car, and that a seatbelt could prevent this, for some large Ω. Furthermore, we will be in a car a total of ρΩ times, for ρ < 0.5. Now, it seems, the median recommends a ridiculous policy: never wear seatbelts. Then you pay no cost ever, and your chance of dying is less than 50%, so this has the top median.
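To make the arithmetic concrete, here is a small Python sketch of the seatbelt example in isolation. The specific values of Ω and ρ are assumptions chosen for illustration; the function name is hypothetical.

```python
def death_probability(omega: float, rho: float) -> float:
    """Chance of at least one fatal accident over rho*omega unbelted trips,
    each carrying an independent 1/omega risk."""
    trips = round(rho * omega)
    return 1.0 - (1.0 - 1.0 / omega) ** trips

# Assumed numbers: a one-in-a-million risk per trip, 400,000 trips.
p = death_probability(omega=1_000_000, rho=0.4)
median_outcome_is_survival = p < 0.5
print(f"death probability: {p:.4f}, median outcome is survival: {median_outcome_is_survival}")
```

With these numbers the death probability is about a third, so the median world is one where you never buckled up and survived anyway.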

And that is indeed a ridiculous result. But it's only possible because we look at seatbelts in isolation. Every day, we face choices that have small chances of killing us. We could look when crossing the street; smoke or not smoke cigarettes; choose not to walk close to the edge of tall buildings; choose not to provoke co-workers to fights; not run around blindfolded. I'm deliberately including 'stupid things no-one sensible would ever do', because they are choices, even if they are obvious ones. Let's gratuitously assume that all these choices also have a 1/Ω chance of killing you. When you collect together all the possible choices (obvious or not) that you make in your life, this will be ρ'Ω choices, for ρ' likely quite a lot bigger than 1.

Assume that avoiding these choices has a trivial cost, incommensurable with dying (ie no matter how many times you have to buckle your seatbelt, it is still better than a fatal accident). Now median-maximisation will recommend taking safety precautions for roughly (ρ'-0.5)Ω of these choices. This means that the decisions of a median maximiser will be close to those of a utility maximiser - they take almost the same precautions - though the outcomes are still pretty far apart: the median maximiser accepts a 49.99999...% chance of death.
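A rough Python sketch of the pooled version, under the same independence assumption: it counts how many 1/Ω risks can be left unprotected before the chance of death reaches 50%. The values Ω = 10^6 and ρ' = 3 are assumptions for illustration, and the function name is hypothetical.

```python
import math

def max_unprotected_risks(omega: float) -> int:
    """Largest n such that n independent 1/omega risks keep P(death) below 1/2."""
    # (1 - 1/omega)**n > 1/2  <=>  n < ln(1/2) / ln(1 - 1/omega)
    return math.floor(math.log(0.5) / math.log1p(-1.0 / omega))

omega = 1_000_000        # assumed risk scale
rho_prime = 3.0          # assumed: 3 * omega risky choices over a lifetime
n_free = max_unprotected_risks(omega)
precautions = round(rho_prime * omega) - n_free
print(n_free, precautions)
```

Under exactly independent risks the cutoff comes out nearer 0.69Ω (that is, Ω·ln 2) than 0.5Ω, which is still consistent with "roughly": either way the median maximiser takes precautions on the large majority of its ρ'Ω choices.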

But now add serious injury to the mix (still assume the costs are incommensurable). This has a rather larger probability, and the median maximiser will now only accept a 49.99999...% chance of serious injury. Or add light injury - now they only accept a 49.99999...% chance of light injury. If light injuries are additive - two injuries are worse than one - then the median maximiser becomes even more reluctant to take risks. We can now relax the assumption of incommensurability as well; the set of policies and assessments becomes even more complicated, and the median maximiser moves closer to the mean maximiser.

The same phenomenon tends to happen when we add lotteries of decisions, chained decisions (decisions that depend on other decisions), and so on. Existential risks are interesting examples: from the selfish point of view, existential risks are just other things that can kill us - and not the most unlikely ones, either. So the median maximiser will be willing to pay a trivial cost to avoid an xrisk. Will a large group of median maximisers be willing to collectively pay a large cost to avoid an xrisk? That gets into superrationality, which I haven't considered yet in this context.

But let's turn back to the mystical utility function that we are trying to maximise. It's obvious that humans don't actually maximise a utility function; but according to the axioms, we should do so. Since we should, people on this list often tend to assume that we actually have one, skipping over the process of constructing it. But how would that process go? Let's assume we've managed to make our preferences transitive, already a major achievement. How should we go about making them independent as well? We can do so as we go along. But if we do it ahead of time, chances are that we will be comparing hypothetical situations ("Do I like chocolate twice as much as sex? What would I think of a 50% chance of chocolate vs guaranteed sex? Well, it depends on the situation...") and thus construct a utility function. This is where we have to make decisions about very obscure and unintuitive hypothetical tradeoffs, and find a way to fold all our risk aversion/risk love into the utility.

When median maximising, we do exactly the same thing, except we constrain ourselves to choices that are actually likely to happen to us. We don't need a full ranking of all possible lotteries and choices; we just need enough to decide in the situations we are likely to face. You could consider this a form of moral learning (or preference learning). From our choices in different situations (real or possible), we decide what our preferences are in these situations, and this determines our preferences overall.

Why the median is not like the mean

Ok, so the previous section argues that median maximising, if you have enough choices, functions like a clunky version of expected utility maximising. So what's the point?

The point is those situations that are not faced sufficiently often, or that have extreme characteristics. A median maximiser will reject Pascal's mugging, for instance, without any need for extra machinery (though they will accept Pascal's muggings if they face enough independent muggings, which is what we want - for stupidly large values of "enough"). They cope fine with distributions that have no means - such as the Cauchy distribution or a utility version of the St Petersburg paradox. They don't fall into paradox when facing choices with infinite (but ordered) rewards.
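Here is a hedged Python sketch of the mugging claim. It assumes a toy model in which each mugger independently delivers a fixed reward with tiny probability p in exchange for a fixed cost; the numbers and function names are hypothetical, and the returned value in the winning case is only a lower bound on the median.

```python
import math

def prob_no_win(n: int, p: float) -> float:
    """Probability that none of n independent muggers pays out."""
    return math.exp(n * math.log1p(-p))

def median_payoff_if_paying(n: int, p: float, cost: float, reward: float) -> float:
    """Median total payoff from paying all n muggers (lower bound once a win is likely)."""
    if prob_no_win(n, p) >= 0.5:
        return -n * cost          # the median world contains no payout at all
    return reward - n * cost      # the median world contains at least one payout

def muggings_needed(p: float) -> int:
    """Smallest n for which at least one payout is more likely than not."""
    return math.floor(math.log(0.5) / math.log1p(-p)) + 1

p, cost, reward = 1e-9, 5.0, 1e30   # assumed toy values
print(median_payoff_if_paying(1, p, cost, reward))   # refusing (payoff 0) beats this
print(muggings_needed(p))
```

Against a single mugger the median payoff of paying is just the lost cost, so refusing wins; paying only becomes the median-best policy after roughly ln(2)/p independent muggings, which is the "stupidly large" threshold.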

In a sense, median maximisation is like expected utility maximisation for common choices, but is different for exceptionally unlikely or high impact choices. Or, from the opposite perspective, expected utility maximising gives high probability of good outcomes for common choices, but not for exceptionally unlikely or high impact choices.

Another feature of the general idea (which might be seen as either a plus or a minus) is that it can get around some issues with total utilitarianism and similar ethical systems (such as the repugnant conclusion). What do I mean by this? Well, because the idea is that only choices that we actually expect to make matter, we can say, for instance, that we'd prefer a small ultra happy population to a huge barely-happy one. And if this is the only choice we make, we need not fear any paradoxes: we might get hypothetical paradoxes, just not actual ones. I won't put too much insistence on this point, I just thought it was an interesting observation.

For lack of a Cardinal...

Now, the main issue is that we might feel that there are certain rare choices that are just really bad or really good. And we might come to this conclusion by rational reasoning, rather than by experience, so this will not show up in the median. In these cases, it feels like we might want to force some kind of artificial cardinal order on the worlds, to make the median maximiser realise that certain rare events must be considered beyond their simple ordinal ranking.

In this case, maybe we could artificially add some hypothetical choices to our system, making us address these questions more than we actually would, and thus drawing them closer to the mean maximising situation. But there may be other, better ways of doing this.

Anyway, that's my first pass at constructing a median maximising system. Comments and criticism welcome!

EDIT: We can use the absolute deviation (technically, the mean absolute deviation around the mean) to bound the distance between median and mean. This itself is bounded by the standard deviation, if it exists.

86 comments

Comments sorted by top scores.

comment by Lumifer · 2015-09-08T17:17:23.316Z · LW(p) · GW(p)

Then E(u|π_m) is within one standard deviation (using d_mu) of the median value of d_mu.

As the Wikipedia says, "If the distribution has finite variance". That's not necessarily a good assumption.

Consider a policy with three possible outcomes: one pony; two ponies; the universe is converted to paperclips. What's the median outcome? One pony. Don't you want a pony?

The median is a robust estimator, meaning that it's harder for outliers to screw you up. The price for that, though, is indifference to the outliers, which I am not sure is advisable in the utility context.

Replies from: Stuart_Armstrong, V_V

↑ comment by Stuart_Armstrong · 2015-09-08T17:21:42.542Z · LW(p) · GW(p)

Indeed. But the argument about convergence when you get more and more options still applies.

Replies from: Lumifer

↑ comment by Lumifer · 2015-09-08T17:36:10.972Z · LW(p) · GW(p)

Still -- only if your true underlying distribution has finite variance. Check some plots of, say, a Cauchy distribution -- it doesn't take much of heavy tails to have no defined variance (or mean, for that matter).

Not everything converges to a Gaussian.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-09T09:19:41.297Z · LW(p) · GW(p)

You did notice that I mentioned the Cauchy distribution by name and link in the text, right?

And the Cauchy distribution is the worst possible example for defending the use of the mean - because it doesn't have one. Not even, a la St Petersburg paradox, an infinite mean, just no mean at all. But it does have a median, exactly placed in the natural middle.
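One way to see "no mean at all" is the following Python sketch for the standard Cauchy distribution. It uses the closed-form CDF and the closed form of the truncated mean integral; the function names are hypothetical, and the cutoffs are arbitrary illustrative choices.

```python
import math

def cauchy_cdf(x: float) -> float:
    """CDF of the standard Cauchy distribution."""
    return 0.5 + math.atan(x) / math.pi

def truncated_mean(a: float, b: float) -> float:
    """Integral of x * pdf(x) over [-a, b], i.e. ln((1 + b^2) / (1 + a^2)) / (2*pi)."""
    return math.log((1.0 + b * b) / (1.0 + a * a)) / (2.0 * math.pi)

print(cauchy_cdf(0.0))                 # the median sits at 0
print(truncated_mean(1e6, 1e6))        # symmetric cutoffs
print(truncated_mean(1e6, 2e6))        # lopsided cutoffs
```

The "mean" you get depends on how the cutoffs are sent to infinity: symmetric cutoffs give 0, while letting the upper cutoff grow twice as fast gives a limit of ln(2)/π. The median is 0 regardless.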

Your argument works somewhat better with one of the stable distributions with an alpha between 1 and 2. But even there, you need a non-zero beta or else median=mean! The standard deviation is an upper bound on the difference, not necessarily a sharp one.

It would be interesting to analyse the difference between mean and median for stable distributions with non-zero beta; I'll get round to that some day. My best guess is that you could use some fractional moment to bound the difference, instead of (the square root of) the variance.

EDIT: this is indeed the case, you can use Jensen's inequality to show that the q-th root of the q-th absolute central moment, for 1<q<2, can be substituted as a bound between mean and median. For q<alpha, this should be finite.

Replies from: Lumifer

↑ comment by Lumifer · 2015-09-09T16:35:29.778Z · LW(p) · GW(p)

I only brought up Cauchy to show that infinite-variance distributions don't have to be weird and funky. Show a plot of a Cauchy pdf to someone who had, like, one undergrad stats course and she'll say something like "Yes, that's a bell curve" X-/

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-09T18:46:22.690Z · LW(p) · GW(p)

Actually, there's no need for higher central moments. The mean absolute deviation around the mean (which I would have called the first absolute central moment) bounds the difference between mean and median, and is sharper than the standard deviation.

↑ comment by V_V · 2015-09-09T09:08:14.697Z · LW(p) · GW(p)

As the Wikipedia says, "If the distribution has finite variance". That's not necessarily a good assumption.

In fact, "Pascal's mugging" scenarios tend to pop up when you allow for utility distributions with infinite variance.

Replies from: Lumifer

↑ comment by Lumifer · 2015-09-09T16:30:09.761Z · LW(p) · GW(p)

For Pascal's Muggings I don't think you care that much about variance -- what you want is a gargantuan skew.

comment by OrphanWilde · 2015-09-08T17:05:44.847Z · LW(p) · GW(p)

It's obvious that humans don't actually maximise a utility function; but according to the axioms, we should do so.

Given a choice between "change people" and "change axioms", I'd be inclined to change axioms.

Replies from: DanielLC

↑ comment by DanielLC · 2015-09-18T20:16:34.660Z · LW(p) · GW(p)

If you're a psychologist and you care about describing people, change the axioms. If you're a rationalist and you care about getting things done, change yourself.

comment by Irgy · 2015-09-10T09:06:06.953Z · LW(p) · GW(p)

This seems to be a case of trying to find easy solutions to hard abstract problems at the cost of failing to be correct on easy and ordinary ones. It's also fairly trivial to come up with abstract scenarios where this fails catastrophically, so it's not like this wins on the abstract scenarios front either. It just fails on a new and different set of problems - ones that aren't talked about because no-one's ever found a way to fail on them before.

Also, all of the problems you list it solving are problems which I would consider to be satisfactorily solved already. Pascal's mugging fails if the believability of the claim is impacted by the magnitude of the numbers in it, since the mugger can keep naming bigger numbers and simply suffer lower credibility as a result. The St Petersburg paradox is intellectually interesting but impossible to actually construct in practice given a finite universe (versions using infinite time are defeated by bounded utility within a time period and geometric future discounting). The Cauchy distribution is just one of many functions with no mean; all that tells me is that it's the wrong function to model the world with if you know the world should have a mean. And the repugnant conclusion, well I can't comment usefully about this because "repugnant" or not I've never viewed it to be incorrect in the first place - so to me this potentially justifying smaller but happier populations is an error if anything.

I just think it's worth making the point that the existing, complex solutions to these problems are a good thing. Complexity-influenced priors, careful handling of infinite numbers, bounded utility within a time period, geometric future discounting, integrable functions and correct utility summation and zero-points are all things we want to be doing anyway. Even when they're not resolving a paradox! The paradoxes are good, they teach us things which circumventing the paradoxes in this way would not.

PS People feel free to correct my incomplete resolutions of those paradoxes, but be mindful of whether any errors or differences of opinion I might have actually undermine my point here or not.

Replies from: Houshalter

↑ comment by Houshalter · 2015-09-10T23:54:11.201Z · LW(p) · GW(p)

Median utility does fail trivially. But it opens the door to other systems which might not. He just posted a refinement on this idea, Mean of Quantiles.

IMO this system is much more robust than expected utility. EU is required to trade away utility from the majority of possible outcomes to really rare outliers, like the mugger. Median utility will get you better outcomes at least 50% of the time. And tradeoffs like the one above, will get you outcomes that are good in the majority of possible outcomes, ignoring rare outliers. I'm not satisfied it's the best possible system, so the subject is still worth thinking about and debating.

I don't think any of your paradoxes are solved. You can't get around Pascal's mugging by modifying your probability distribution. The probability distribution has nothing to do with your utility function or decision theory. Besides being totally inelegant and hacky, there might be practical consequences. Like you can't believe in the singularity now. The singularity could lead to vastly high utility futures, or really negative ones. Therefore its probability must be extremely small.

The St Petersburg casino is silly of course, but there's no reason a real thing couldn't produce a similar distribution. Suppose you have some sequence of events, each dependent on the previous one, each with probability 1/2, and each giving increasing utility.

Replies from: Irgy

↑ comment by Irgy · 2015-09-11T04:12:22.266Z · LW(p) · GW(p)

I do acknowledge that my comment was overly negative, certainly the ideas behind it might lead to something useful.

I think you misunderstand my resolution of the mugging (which is fair enough since it wasn't spelled out). I'm not modifying a probability, I'm assigning different probabilities to different statements. If the mugger says he'll generate 3 units of utility difference that's a more plausible statement than if the mugger says he'll generate 3^^^3, etc. In fact, why would you not assign a different probability to those statements? So long as the implausibility grows at least as fast as the value (and why wouldn't it?) there's no paradox.

Re St Petersburg, sure you can have real scenarios that are "similar", it's just that they're finite in practice. That's a fairly important difference. If they're finite then the game has a finite value, you can calculate it, and there's no paradox. In which case median utility can only give the same answer or an exploitably wrong answer.

Replies from: Houshalter

↑ comment by Houshalter · 2015-09-19T12:29:39.441Z · LW(p) · GW(p)

The whole point of the Pascal's Mugging scenario is that the probability doesn't decrease faster than the reward. If for example, you decrease the probability by half for each additional bit it takes to describe, 3^^^3 still only takes a few bits to write down.

Do you believe it's literally impossible that there is a matrix? Or that it can't be 3^^^3 large? Because when you assign these things such low probability, you are basically saying they are impossible. No amount of evidence could convince you otherwise.

I think EY had the best counter argument. He had a fictional scenario where a physicist proposed a new theory that was simple and fit the data perfectly. But the theory also implies a new law of physics that could be exploited for computing power, and would allow unfathomably large amounts of computing power. And that computing power could be used to create simulated humans.

Therefore anyone alive today has a small probability of affecting large amounts of simulated people. Since that is impossible, the theory must be wrong. It doesn't matter if it's simple or if it fits the data perfectly.

If they're finite then the game has a finite value, you can calculate it, and there's no paradox. In which case median utility can only give the same answer or an exploitably wrong answer.

Even in the finite case, I believe it can grow quite large as the number of iterations increases. It's one expected dollar each step, with each step having half the probability of the previous step and twice the reward.

Imagine the game goes for n finite steps. An expected utility maximizer would still spend $n to play the game. A median maximizer would say "You are never going to win in the lifetime of the universe and then some, so no thanks." The median maximizer seems correct to me.
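A small Python sketch of that finite game, under one assumed set of rules: a fair coin is flipped at most n times, the first heads on flip k pays $2^k, and n tails in a row pays nothing. The function names and n = 40 are illustrative assumptions.

```python
from fractions import Fraction

def finite_st_petersburg(n: int):
    """(payout, probability) pairs for the assumed n-flip game."""
    outcomes = [(0, Fraction(1, 2 ** n))]                          # n tails in a row
    outcomes += [(2 ** k, Fraction(1, 2 ** k)) for k in range(1, n + 1)]
    return outcomes

def expected_value(outcomes):
    return sum(x * p for x, p in outcomes)

def lower_median(outcomes):
    # Smallest payout whose cumulative probability reaches 1/2.
    total = Fraction(0)
    for x, p in sorted(outcomes):
        total += p
        if total >= Fraction(1, 2):
            return x

game = finite_st_petersburg(40)
print(expected_value(game), lower_median(game))
```

Each of the n steps contributes exactly one expected dollar, so the mean grows linearly with n, while the median payout stays at $2 however long the game is allowed to run.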

Replies from: Irgy

↑ comment by Irgy · 2015-09-21T00:19:44.144Z · LW(p) · GW(p)

Re St Petersburg, I will reiterate that there is no paradox in any finite setting. The game has a value. Whether you'd want to take a bet at close to the value of the game in a large but finite setting is a different question entirely.

And one that's also been solved, certainly to my satisfaction. Logarithmic utility and/or the Kelly Criterion will both tell you not to bet if the payout is in money, and for the right reasons rather than arbitrary, value-ignoring reasons (in that they'll tell you exactly what you should pay for the bet). If the payout is directly in utility, well I think you'd want to see what mindbogglingly large utility looked like before you dismiss it. It's pretty hard if not impossible to generate that much utility with logarithmic utility of wealth and geometric discounting. But even given that, a one in a trillion chance at a trillion worthwhile extra days of life may well be worth a dollar (assuming I believed it of course). I'd probably just lose the dollar, but I wouldn't want to completely dismiss it without even looking at the numbers.
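For the money version, a simplified Python sketch of the logarithmic-utility answer. It assumes log utility applied to the payout alone (ignoring the player's existing wealth and the stake, which a proper Kelly analysis would include) and the standard game where the first heads on flip k pays $2^k. The function name and truncation point are hypothetical.

```python
import math

def expected_log_payout(max_flips: int = 200) -> float:
    """E[ln(payout)] for the St Petersburg game, truncated after max_flips terms."""
    return sum(math.log(2.0 ** k) / 2.0 ** k for k in range(1, max_flips + 1))

# The sure amount whose log utility equals the game's expected log utility.
certainty_equivalent = math.exp(expected_log_payout())
print(certainty_equivalent)
```

Under these assumptions the series converges to 2·ln 2, so the game is worth about $4 to a log-utility player, even though its expected dollar value is unbounded.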

Re the mugging, well I can at least accept that there are people who might find this convincing. But it's funny that people can be willing to accept that they should pay but still don't want to, and then come up with a rationalisation like median maximising, which might not even pay a dollar for the mugger not to shoot their mother if they couldn't see the gun. If you really do think it's sufficiently plausible, you should actually pay the guy. If you don't want to pay I'd suggest it's because you know intuitively that there's something wrong with the rationale and refuse to pay a tax on your inability to sort it out. Which is the role the median utility is trying to play here, but to me it's a case of trying to let two wrongs make a right.

Personally though I don't have this problem. If you want to define "impossible" as "so unlikely that I will correctly never account for it in any decision I ever make" then yes, I do believe it's impossible and so should anyone. Certainly there's evidence that could convince me, even rather quickly, it's just that I don't expect to ever see such evidence. I certainly think there might be new laws of physics, but new laws of physics that lead to that much computing power that quickly is something else entirely. But that's just what I think, and what you want to call impossible is entirely a non-argument, irrelevant issue anyway.

The trap I think is that when one imagines something like the matrix, one has no basis on which to put an upper bound on the scale of it, so any size seems plausible. But there is actually a tool for that exact situation: the ignorance prior of a scale value, 1/n. Which happens to decay at exactly the same rate as the number grows. Not everyone is on board with ignorance priors but I will mention that the biggest problem with the 1/n ignorance prior is actually that it doesn't decay fast enough! Which serves to highlight the fact that if you're willing to have the plausibility decay even slower than 1/n, your probability distribution is ill-formed, since it can't integrate to 1.

Now to steel-man your argument, I'm aware of the way to cheat that. It's by redistributing the values by, for instance, complexity, such that a family of arbitrarily large numbers can have sufficiently high probability assigned while the overall integral remains unity. What I think though - and this is the part I can accept people might disagree with - is that it's a categorical error to use this distribution for the plausibility of a particular matrix-like unknown meta-universe. Complexity based probability distributions are a very good tool to describe, for instance, the plausibility of somebody making up such a story, since they have limited time to tell it and are more likely to pick a number they can describe easily. But being able to write a computer program to generate a number and having the actual physical resources to simulate that number of people are two entirely different sorts of things. I see no reason to believe that a meta-universe with 3^^^3 resources is any more likely than a meta-universe with similarly large but impossible to describe resources.

So I'll stick with my proportional to 1/n likelihood of meta-universe scales, and continue to get the answer to the mugging that everyone else seems to think is right anyway. I certainly like it a lot better than median utility. But I concede that I shouldn't have been quite so discouraging of someone trying to come up with an alternative, since not everyone might be convinced.

Replies from: Houshalter

↑ comment by Houshalter · 2015-09-22T22:03:13.885Z · LW(p) · GW(p)

Re St Petersburg, I will reiterate that there is no paradox in any finite setting. The game has a value. Whether you'd want to take a bet at close to the value of the game in a large but finite setting is a different question entirely.

Well there are two separate points of the St Petersburg paradox. One is the existence of relatively simple distributions that have no mean. It doesn't converge on any finite value. Another example of such a distribution, which actually occurs in physics, is the Cauchy distribution.

Another, which the original Pascal's Mugger post was intended to address, was Solomonoff induction. The idealized prediction algorithm used in AIXI. EY demonstrated that if you use it to predict an unbounded value like utility, it doesn't converge or have a mean.

The second point is just that paying more than a few bucks to play the game is silly. Even in a relatively small finite version of it. The probability of losing is very high. Even though it has a positive expected utility. And this holds even if you adjust the payout tables to account for utility != dollars.

You can bite the bullet and say that if the utility is really so high, you really should take that bet. And that's fine. But I'm not really comfortable betting away everything on such tiny probabilities. You are basically guaranteed to lose and end up worse than not betting.

not even pay a dollar for the mugger not to shoot their mother if they couldn't see the gun.

You can do a tradeoff between median maximizing and expected utility with mean of quantiles. This basically gives you the best average outcome ignoring incredibly unlikely outcomes. Even median maximizing by itself, which seems terrible, will give you the best possible outcome >50% of the time. The median is fairly robust.

Whereas expected utility could give you a shitty outcome 99% of the time or 99.999% of the time, etc. As long as the outliers are large enough.

Certainly there's evidence that could convince me, even rather quickly, it's just that I don't expect to ever see such evidence.

If you are assigning 1/3^^^3 probability to something, then no amount of evidence will ever convince you.

I'm not saying that unbounded computing power is likely. I'm saying you shouldn't assign infinitely small probability to it. The universe we live in runs on seemingly infinite computing power. We can't even simulate the very smallest particles because of how large the number of computations grows.

Maybe someday someone will figure out how to use that computing power. Or even figure out that we could interact with the parent universe that runs us, etc. You shouldn't use a model that assigns these things 0 probability.

comment by V_V · 2015-09-09T08:59:25.061Z · LW(p) · GW(p)

The main theoretic justifications for the use of expected utility - hence of means - are the von Neumann Morgenstern axioms. Using the median obeys the completeness and transitivity axioms, but not the continuity and independence ones. It does obey weaker forms of continuity; but in a sense, this doesn't matter. You can avoid all these issues by making a single 'ultra-choice'. Simply list all the possible policies you could follow, compute their median return, and choose the one with the best median return. Since you're making a single choice, independence doesn't apply.

I think you misunderstand the von Neumann-Morgenstern axioms. Von Neumann-Morgenstern theory refers to one-shot decision making, not iterated decision making, hence there is nothing you can fix by taking decisions over policies.

Median utility maximization satisfies the axioms of completeness and transitivity. It does not satisfy continuity and independence of irrelevant alternatives.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-09T10:03:27.266Z · LW(p) · GW(p)

The independence axiom derives most of its intuitive strength from the fact that if you violate it, you can be money pumped when presented with a sequence of decisions. When making a single decision over policy, independence has far less intuitive strength, as violating it has no actual cost.

Replies from: V_V

↑ comment by V_V · 2015-09-09T12:47:23.084Z · LW(p) · GW(p)

The independence axiom derives most of its intuitive strength from the fact that if you violate it, you can be money pumped when presented with a sequence of decisions.

If your preferences aren't transitive, then even your one-shot decision making system is completely broken, since it can't even yield an action that is "preferred" in a meaningful sense. Vulnerability to money pumping would be the last of your concerns in this case.

Money pumping is an issue in sequential decision making with time-discounting and/or time horizons: any method to aggregate future utilities other than exponential discounting ( * ) over an infinite time horizon yields dynamic inconsistency which could, in principle, be exploited for money pumping.

The intuitive justification for the independence axiom is the following:

What would you like for dessert, sir? Ice cream or cake?
Ice cream.
Oh sorry, I forgot! We also have fruit.
Then cake.

This decision making example looks intuitively irrational. If you prefer ice cream to cake when they are the only two alternatives, then why would you prefer cake to ice cream when a third, inferior, alternative is included? The independence axiom formalizes this intuition about rational behavior.

( * with no discounting being a special case of exponential discounting)

Replies from: AlexMennen, Stuart_Armstrong, Jiro

↑ comment by AlexMennen · 2015-09-11T01:16:00.254Z · LW(p) · GW(p)

If you prefer ice cream to cake when they are the only two alternatives, then why would you prefer cake to ice cream when a third, inferior, alternative is included?

You're thinking of a different meaning of "independence". A violation of the independence axiom of VNM would look more like this:

What would you like for dessert, sir? Ice cream or cake?
Ice cream.
Oh sorry, I forgot! There is a 50% chance that we are out of both ice cream and cake (I know we have either both or neither). But I'll go check, and if we're not out of dessert, I'll get you your ice cream.
Oh, in that case I'll have cake instead.

Replies from: V_V

↑ comment by V_V · 2015-09-11T10:39:05.186Z · LW(p) · GW(p)

Yes, I believe that this is a stronger version. Median utility satisfies the weaker version of the axiom but not the stronger one.
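A small sketch of a median ranking violating the stronger (VNM) form of independence, added for illustration. The lotteries `A`, `B`, `C` and their utilities are hypothetical, and "median" here means the lower median of a discrete lottery.

```python
def median(lottery):
    """Lower median of a discrete lottery given as {utility: probability}."""
    total = 0.0
    for u in sorted(lottery):
        total += lottery[u]
        if total >= 0.5:
            return u
    raise ValueError("probabilities must sum to at least 0.5")

def mix(lottery, other, weight):
    """Lottery that plays `lottery` with probability `weight`, else `other`."""
    out = {}
    for u, p in lottery.items():
        out[u] = out.get(u, 0.0) + weight * p
    for u, p in other.items():
        out[u] = out.get(u, 0.0) + (1 - weight) * p
    return out

A = {0: 0.4, 10: 0.6}   # risky option
B = {5: 1.0}            # safe option
C = {0: 1.0}            # "we are out of dessert"

# On their own, A has the higher median...
print(median(A), median(B))
# ...but mixing both with the same C (at the same weight) reverses the ranking.
print(median(mix(A, C, 0.6)), median(mix(B, C, 0.6)))
```

VNM independence requires that mixing both options with the same third lottery leaves their ranking unchanged, so this pair is a counterexample for the median ranking.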

↑ comment by Stuart_Armstrong · 2015-09-09T12:49:30.062Z · LW(p) · GW(p)

But notice you had two decision points there.

Intransitivity breaks your decision system with a single decision point; dependence does not. Hence a single policy decision has to be transitive, but need not be independent.

Replies from: V_V

↑ comment by V_V · 2015-09-09T12:54:29.377Z · LW(p) · GW(p)

The first decision is immediately canceled and has no effect on your utility, hence it isn't really a relevant decision point.

More generally, the independence axiom makes sure that the outcome of your decision process is not affected by bad options that are available to you.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-09T13:03:31.195Z · LW(p) · GW(p)

Except that median-maximising respects independence for options that are available to you (or can be trivially tweaked to do so). It only violates independence for hypothetical bad options that will never be available to you.

↑ comment by Jiro · 2015-09-09T15:04:54.950Z · LW(p) · GW(p)

If you prefer ice cream to cake when they are the only two alternatives, then why would you prefer cake to ice cream when a third, inferior, alternative is included?

It can be rational to do this. There's a paradox publicized by Martin Gardner demonstrating how. Unfortunately the best link I could easily find was a Reddit comment, but try https://www.reddit.com/r/fffffffuuuuuuuuuuuu/comments/gxwqe/why_i_hate_people/c1r5203 .

comment by Houshalter · 2015-09-09T02:56:02.414Z · LW(p) · GW(p)

I posted this exact idea a few months ago. There was a lot of discussion about it which you might find interesting. We also discussed it recently on the IRC channel.

Median utility by itself doesn't work. I came up with an algorithm that compromises between them. In everyday circumstances it behaves like expected utility. In extreme cases, it behaves like median utility. And it has tunable parameters:

Sample n counterfactuals from your probability distribution. Then take the average of these n outcomes, [EDIT: and do this an infinite amount of times, and take the median of all these means]. That is, 50% of the time the average of the n outcomes is higher, and 50% of the time it's lower.

As n approaches infinity it becomes equivalent to expected utility, and as it approaches 1 it becomes median utility. A reasonable value is probably a few hundred, so that you select outcomes where you come out ahead the vast majority of the time, but still take low probability risks or ignore low probability rewards.

EDIT: Stuart Armstrong's idea is much better than this and gets about the same results: http://lesswrong.com/r/discussion/lw/mqk/mean_of_quantiles/

I believe this more closely matches how humans actually make decisions, and what we actually want, than expected utility. But I am no longer certain of this. Someone suggested that you can deal with most of the expected utility issues by modifying the utility function. And that is somewhat more elegant than this.

As for inconsistency, I proposed a way of dealing with that too. EU is consistent at every single point in time. It's memoryless. If you can precommit yourself to doing certain things in the future, you don't need this property. You can maintain consistency by committing yourself to only take actions that are consistent with your current decision theory.

This is basically the same thing as your policy selection idea.

Replies from: Lumifer

↑ comment by Lumifer · 2015-09-09T03:44:38.317Z · LW(p) · GW(p)

I came up with an algorithm that compromises between them.

I am not sure of the point. If you can "sample ... from your probability distribution" then you fully know your probability distribution including all of its statistics -- mean, median, etc. And then you proceed to generate some sample estimates which just add noise but, as far as I can see, do nothing else useful.

If you want something more robust than the plain old mean, check out M-estimators which are quite flexible.

Replies from: evand, Houshalter

↑ comment by evand · 2015-09-09T14:37:58.960Z · LW(p) · GW(p)

If you can "sample ... from your probability distribution" then you fully know your probability distribution

That's not true. (Though it might well be in all practical cases.) In particular, there are good algorithms for sampling from unknown or uncomputable probability distributions. Of course, any method that lets you sample from it lets you sample the parameters as well, but that's exactly the process the parent comment is suggesting.

Replies from: Lumifer

↑ comment by Lumifer · 2015-09-09T17:02:09.226Z · LW(p) · GW(p)

A fair point, though I don't think it makes any difference in the context. And I'm not sure the utility function is amenable to MCMC sampling...

Replies from: evand

↑ comment by evand · 2015-09-10T03:30:35.629Z · LW(p) · GW(p)

I basically agree. However...

It might be more amenable to MCMC sampling than you think. MCMC basically is a series of operations of the form "make a small change and compare the result to the status quo", which now that I phrase it that way sounds a lot like human ethical reasoning. (Maybe the real problem with philosophy is that we don't consider enough hypothetical cases? I kid... mostly...)

In practice, the symmetry constraint isn't as nasty as it looks. For example, you can do MH to sample a random node from a graph, knowing only local topology (you need some connectivity constraints to get a good walk length to get good diffusion properties). Basically, I posit that the hard part is coming up with a sane definition for "nearby possible world" (and that the symmetry constraint and other parts are pretty easy after that).
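To make the graph example concrete, here is a minimal Metropolis-Hastings sketch that targets the uniform distribution over nodes using only local information. The graph, the function name and the step count are assumptions made for illustration.

```python
import random
from collections import Counter

def mh_uniform_node_walk(neighbors, start, steps, rng):
    """Metropolis-Hastings walk whose stationary distribution is uniform over nodes.

    Only local information is used: the neighbor list of the current node and
    the degree of the proposed node.
    """
    current = start
    visits = Counter()
    for _ in range(steps):
        proposal = rng.choice(neighbors[current])
        # Proposal probabilities are 1/deg(current) and 1/deg(proposal), so the
        # acceptance ratio for a uniform target is deg(current) / deg(proposal).
        accept = min(1.0, len(neighbors[current]) / len(neighbors[proposal]))
        if rng.random() < accept:
            current = proposal
        visits[current] += 1
    return visits

# Hypothetical small connected graph: a triangle (0, 1, 2) with a short tail (3, 4).
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
STEPS = 200_000
visits = mh_uniform_node_walk(graph, start=0, steps=STEPS, rng=random.Random(0))
frequencies = {node: visits[node] / STEPS for node in graph}
print(frequencies)
```

A plain random walk would visit the high-degree node most often; the degree-ratio acceptance step is what flattens the visit frequencies towards uniform.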

Replies from: Lumifer

↑ comment by Lumifer · 2015-09-10T14:41:12.454Z · LW(p) · GW(p)

Maybe the real problem with philosophy is that we don't consider enough hypothetical cases? I kid... mostly...

In that case we can have wonderful debates about which sub-space to sample our hypotheticals from, and once a bright-eyed and bushy-tailed acolyte bleats out "ALL of it!" we can pontificate about the boundaries of all :-)

P.S. In about a century philosophy will discover the curse of dimensionality and there will be much rending of clothes and gnashing of teeth...

↑ comment by Houshalter · 2015-09-09T04:21:39.387Z · LW(p) · GW(p)

I should have explained it better. You take n samples, and calculate the mean of those samples. You do that a bunch of times, and create a new distribution of those means of samples. Then you take the median of that.

This gives a tradeoff between mean and median. As n goes to infinity, you just get the mean. As n goes to 1, you just get the median. Values in between are a compromise. n = 100 will roughly ignore things that have less than 1% chance of happening (as opposed to less than 50% chance of happening, like the standard median.)
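A rough sketch of the procedure as described, using a finite number of repeats instead of infinitely many. The `long_shot` gamble, the seed, and the values of `n` and `repeats` are illustrative assumptions.

```python
import random
import statistics

def median_of_means(sample_utility, n, repeats, rng):
    """Median, over `repeats` batches, of the mean of `n` sampled utilities."""
    means = [
        statistics.fmean(sample_utility(rng) for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.median(means)

def long_shot(rng):
    """Hypothetical gamble: utility 1000 with probability 0.001, else 0 (mean 1.0)."""
    return 1000.0 if rng.random() < 0.001 else 0.0

rng = random.Random(0)
# n = 1 is the plain median, which ignores the rare payoff entirely.
small_n = median_of_means(long_shot, n=1, repeats=1001, rng=rng)
# Large n moves the score towards the mean of the gamble.
large_n = median_of_means(long_shot, n=10_000, repeats=101, rng=rng)
print(small_n, large_n)
```

The parameter n sets how rare an outcome has to be before the score stops noticing it, which is the tunable tradeoff described above.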

Replies from: Lumifer

↑ comment by Lumifer · 2015-09-09T04:53:32.709Z · LW(p) · GW(p)

This gives a tradeoff between mean and median.

There is a variety of ways to get a tradeoff between the mean and the median (or, more generally, between an efficient but not robust estimator and a robust but not efficient estimator). The real question is how do you decide what a good tradeoff is.

Basically if your mean and your median are different, your distribution is asymmetric. If you want a single-point summary of the entire distribution, you need to decide how to deal with that asymmetry. Until you specify some criteria under which you'll be optimizing your single-point summary you can't really talk about what's better and what's worse.

Replies from: Houshalter

↑ comment by Houshalter · 2015-09-09T21:02:59.821Z · LW(p) · GW(p)

This is just one of many possible algorithms which trade off between median and mean. Unfortunately there is no objective way to determine which one is best (or the setting of the hyperparameter.)

The criterion we are optimizing is just "how closely does it match the behavior we actually want."

EDIT: Stuart Armstrong's idea is much better: http://lesswrong.com/r/discussion/lw/mqk/mean_of_quantiles/

Replies from: Lumifer

↑ comment by Lumifer · 2015-09-09T21:07:45.122Z · LW(p) · GW(p)

And what is "the behavior we actually want"?

comment by AlexMennen · 2015-09-09T01:59:28.114Z · LW(p) · GW(p)

I don't understand your argument that the median utility maximizer would buckle its seat belt in the real world. It seemed kind of like you might be trying to argue that median utility maximizers and expected utility maximizers would always approximate each other under realistic conditions, but you then argue that the alleged difference in their behavior on the Pascal's mugging problem is a reason to prefer median utility maximizers (implying that Pascal's mugging-type problems should be accepted as realistic, or at least that getting them correct is important in a way that getting "buckle my seatbelt, given that this is the only decision I will ever make" right isn't), so I guess that's not it.

But anyway, even if you are right that median utility maximizers buckle their seatbelts in the context of a realistic collection of choices, you concede that they do not buckle their seatbelts when the decision is isolated, and that this is the incorrect decision. I think you should take the fact that your proposal gets a really easy problem wrong much more seriously. If it can't get the seatbelt problem right, it is a bad algorithm, and bad algorithms should not be expected to perform well in real-world problems. I would give an example of a real-world problem that it performs poorly on, but I would have said something like the seatbelt problem, and since I don't understand your argument that it gets that right in the real world, I don't know what must be done in order to construct an example to which your argument does not apply.

Furthermore, I am unimpressed that median utility maximizers reject Pascal's mugging. If you take a random function from decision problems to decisions, there is about a 50% chance it will reject Pascal's mugging, but that doesn't make it a good decision theory. And median utility maximizers do not reject Pascal's mugging for correct reasons. To see this, note that if the seatbelt problem is considered in isolation, it looks exactly like the Pascal's mugging problem, in terms of all the information that median utility maximizers pay attention to, so median utility maximizers take analogous actions in each problem (don't bother putting your seatbelt on, and don't pay the mugger, respectively). However, there are important differences between the problems that make it correct to put your seatbelt on but not pay the mugger. Since a median utility maximizer does not consider these differences, its decision not to pay the mugger does not take into account the reasons that it is a good idea not to pay the mugger. It appears to me that you are not even really trying to come up with a way to make the right decisions for the right reasons, and instead you are merely trying to find a way to make the right decisions. I think that this approach is misguided, because the space of possible failure modes for a decision theory is vast, so if you successfully kludge together a decision procedure into performing well on a certain reasonably finite collection of decision problems, without ensuring that it arrives at its decisions in ways that make sense, the chance that it performs well on all decision problems, or even most of them, is vanishingly small.

Since you brought up the iterated Pascal's mugging, perhaps part of your motivation for this was to find something that would not pay in the isolated Pascal's mugging, but pay each time in the iterated Pascal's mugging? First of all, as literally stated, paying each time in the iterated Pascal's mugging isn't even an available option (I don't have $5 billion, so I can't pay off 1 billion muggers), so it is trivially false that the correct action is to pay every time. However, it is true that there are interpretations of what you could mean under which I would agree that paying is the correct action. But in those cases, an expected utility maximizer with a reasonable bounded utility function will pay, even while not paying in the standard Pascal's mugging problem. (The naive model of the situation in which iterating the problem does not change how an expected utility maximizer handles it does not correctly model the interpretation of "iterated Pascal's mugging" in which it makes sense to pay. I'd say what I mean, but actually keeping track of everything relevant to the problem makes it somewhat tedious to explain.)

Replies from: Stuart_Armstrong, Houshalter

↑ comment by Stuart_Armstrong · 2015-09-09T10:40:13.705Z · LW(p) · GW(p)

I don't understand your argument that the median utility maximizer would buckle its seat belt in the real world.

It derives from the fact that median maximalisation doesn't consider decisions independently, even if their gains and losses are independent.

For illustration, compare the following deal: you pay £q, and get £1 with probability p. There are n independent deals (assume your utility is linear in £).

If n=1, the median maximiser accepts the deal iff q<1 and p>0.5. Not a very good performance! Now let's look at larger n. For m < n, accepting m deals gets you an expected reward of m(p-q). The median is a bit more complicated (see https://en.wikipedia.org/wiki/Binomial_distribution#Mode_and_median ), but it's within £1 of the mean reward.

So if p […] q, it will accept all n deals.

For p>q, it will accept at least n - 1/(p-q) deals. In all cases, its expected loss, compared with the mean maximiser, is less than £1.

There's a similar effect going on when considering the seat-belt situation. Aggregation concentrates the distribution in a way that moves the median and mean towards each other.
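The repeated-deal illustration can be checked numerically. The sketch below assumes linear utility in £ as stated, computes the exact binomial median, and finds how many of the n deals a median maximiser would accept. The function names and the example values of p and q are assumptions for illustration.

```python
from math import comb

def binomial_median(m, p):
    """Smallest k with P(Binomial(m, p) <= k) >= 0.5."""
    cdf = 0.0
    for k in range(m + 1):
        cdf += comb(m, k) * p**k * (1 - p) ** (m - k)
        if cdf >= 0.5:
            return k
    return m

def median_payoff(m, p, q):
    """Median net payoff (in £) from accepting m deals at price q each."""
    return binomial_median(m, p) - m * q

def best_m_for_median(n, p, q):
    """Number of deals (0..n) with the highest median payoff; ties go to fewer deals."""
    return max(range(n + 1), key=lambda m: (median_payoff(m, p, q), -m))

# One-shot: a bad deal (p < q) is accepted and a good deal (p > q) is refused.
print(best_m_for_median(1, 0.6, 0.9), best_m_for_median(1, 0.4, 0.1))

# Many deals with p > q: the median maximiser accepts nearly all of them.
n, p, q = 100, 0.4, 0.1
m = best_m_for_median(n, p, q)
expected_loss = (n - m) * (p - q)  # shortfall versus accepting all n
print(m, expected_loss)
```

Because the binomial median lies within 1 of the mean, the median payoff of m deals is within £1 of m(p-q), which is what keeps the expected shortfall small as n grows.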

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-09T16:24:43.753Z · LW(p) · GW(p)

You appear to now be making an argument that you already conceded was incorrect in OP:

This means that the decision of a median maximiser will be close to those of a utility maximiser - they take almost the same precautions - though the outcomes are still pretty far apart: the median maximiser accepts a 49.99999...% chance of death.

You then go on to say that if the agent also faces many decisions of a different nature, it won't do that. That's where I get lost.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-09T17:00:47.234Z · LW(p) · GW(p)

The median maximiser accepts a 49.99999...% chance of death, only because "death", "trivial cost" and "no cost" are the only options here. If I add "severe injury" and "light injury" to the outcomes, the maximiser will now accept less than a 49.9999...% chance of light injury. If we make light injury additive, and make the trivial cost also additive and not incomparable to light injuries, we get something closer to my illustrative example above.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-09T20:34:32.829Z · LW(p) · GW(p)

Suppose it comes up with 2 possible policies, one of which involves a 49% chance of death and no chance of injury, and another which involves a 49% chance of light injury, and no chance of heavy injury or death. The median maximizer sees no reason to prefer the second policy if they have the same effects the other 51% of the time.
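A quick numerical version of this point. The utility values assigned to death, light injury and "fine" are hypothetical; only their ordering matters for the median.

```python
def median(lottery):
    """Lower median of a discrete lottery given as {utility: probability}."""
    total = 0.0
    for u in sorted(lottery):
        total += lottery[u]
        if total >= 0.5:
            return u
    raise ValueError("probabilities must sum to at least 0.5")

def mean(lottery):
    """Expected utility of a discrete lottery."""
    return sum(u * p for u, p in lottery.items())

DEATH, LIGHT_INJURY, FINE = -1000, -10, 0   # assumed utilities

policy_death = {DEATH: 0.49, FINE: 0.51}
policy_injury = {LIGHT_INJURY: 0.49, FINE: 0.51}

print(median(policy_death), median(policy_injury))   # same median
print(mean(policy_death), mean(policy_injury))       # very different means
```

Both policies share the same median outcome, so a pure median ranking cannot separate them, while the expected utilities differ by a wide margin.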

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-10T08:48:26.076Z · LW(p) · GW(p)

Er, yes, constructing single choice examples where the median behaves oddly/wrongly is trivial. My whole point is about what happens to the median when you aggregate decisions.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-10T16:15:00.818Z · LW(p) · GW(p)

You were claiming that in a situation where a median-maximizing agent has a large number of trivially inconvenient actions that prevent small risks of death, heavy injury, or light injury, then it would accept a 49% chance of light injury, but you seemed to imply that it would not accept a 49% chance of death. I was trying to point out that this appears to be incorrect.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-11T08:30:29.306Z · LW(p) · GW(p)

I'm not entirely sure what your objection is; we seem to be talking at cross purposes.

Let's try it simpler. If we assume that the cost of buckling seat belts is incommensurable (in practice) with light injury (and heavy injury, and death), then the median maximising agent will accept a 49.99..% chance of (light injury or heavy injury or death), over their lifetime. Since light injury is much more likely than death, this in effect forces the probability of death down to a very low amount.

It's just an illustration of the general point that median maximising seems to perform much better in real-world problems than its failure in simple theoretical ones would suggest.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-11T16:27:46.180Z · LW(p) · GW(p)

Since light injury is much more likely than death, this in effect forces the probability of death down to a very low amount.

No, it doesn't. That does not address the fact that the agent will not preferentially accept light injury over death. Adopting a policy of immediately committing suicide once you've been injured enough to force you into the bottom half of outcomes does not decrease median utility. The agent has no incentive to prevent further damage once it is in the bottom half of outcomes. As a less extreme example, the value of house insurance to a median maximizer is 0, just because losing your house is a bad outcome even if you get insurance money for it. This isn't a weird hypothetical that relies on it being an isolated decision; it's a real-life decision that a median maximizer would get wrong.

Replies from: Stuart_Armstrong, Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-14T11:39:59.340Z · LW(p) · GW(p)

A more general way of stating how multiple decisions improve median maximalisation: the median maximaliser is indifferent to outcomes not at the median (eg suicide vs light injury). But as the decision tree grows and the number of possible situations does as well, the probability increases that outcomes not at the median in a one-shot will affect the median in the more complex situation.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-14T17:27:14.996Z · LW(p) · GW(p)

This argument relies on your utility being a sum of effects from each of the decisions you made, but in reality, your decisions interact in much more complicated ways, so that isn't a realistic model.

Also, if your defense of median maximization consists entirely of an argument that it approximates mean maximization, then what's the point of all this? Why not just use expected utility maximization? I'm expecting you to bring up Pascal's mugging here, but since VNM-rationality does not force you to pay the mugger, you'll have to do better than that.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-15T10:56:27.211Z · LW(p) · GW(p)

This argument relies on your utility being a sum of effects from each of the decisions you made

It doesn't require that in the least. I don't know if, eg, quadratic or higher order effects would improve or worsen the situation.

but since VNM-rationality does not force you to pay the mugger

The consensus at the moment seems to be that if you have unbounded utility, it does force you to pay some muggers. Now, I'm perfectly fine with bounding your utility to avoid muggers, but that's the kind of non-independent decision some people don't like ;-)

The real problem is things like the Cauchy distribution, or any function without an expectation value at all. Saying "VNM works fine as long as we don't face these difficult choices, then it breaks down" is very unsatisfactory. I'm also interested in seeing what happens when "expect to win" and "win in expectation" become quite distinct - a rare event, in practice.
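A small sketch of the Cauchy point: the distribution has no expectation, yet its median is well defined (0 for the standard Cauchy). The sample sizes and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

def cauchy_sample(rng):
    """Standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(0)
draws = [cauchy_sample(rng) for _ in range(100_000)]

# The sample median settles near the true median (0)...
sample_median = statistics.median(draws)
# ...while sample means over growing prefixes need not settle at all,
# since single extreme draws can dominate the sum.
running_means = [statistics.fmean(draws[:k]) for k in (1_000, 10_000, 100_000)]
print(sample_median, running_means)
```

A mean maximiser has nothing to compare when options are distributed like this, whereas a median maximiser can still rank them.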

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-15T19:14:00.020Z · LW(p) · GW(p)

It doesn't require that in the least. I don't know if, eg, quadratic or higher order effects would improve or worsen the situation.

The more concrete argument you made previously does rely on it. If what you're saying now doesn't, then I guess I don't understand it.

Now, I'm perfectly fine with bounding your utility to avoid muggers, but that's the kind of non-independent decision some people don't like ;-)

I don't follow. Maximizing the expected value of a bounded utility function does respect independence.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-16T11:24:27.836Z · LW(p) · GW(p)

The more concrete argument you made previously does rely on it. If what you're saying now doesn't, then I guess I don't understand it.

That was an example. There's another one in http://lesswrong.com/lw/1d5/expected_utility_without_the_independence_axiom/ which relies on "not risk loving". That post doesn't mention the median, but it does mention the standard deviation, and we know the median must be within one SD of the mean (and often much closer).

I don't follow. Maximizing the expected value of a bounded utility function does respect independence.

Choosing to bound an unbounded utility function to avoid muggers does not.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-16T20:51:58.629Z · LW(p) · GW(p)

That was an example. There's another one in http://lesswrong.com/lw/1d5/expected_utility_without_the_independence_axiom/

That example also relies on your utility being the sum of components that are determined from your various actions.

Choosing to bound an unbounded utility function to avoid muggers does not.

To be clear, I was not suggesting that you have an unbounded utility function that it would make sense for you to maximize if it weren't for Pascal's mugger, so you should bound it when there might be a Pascal's mugger around. I was suggesting that the utility function it makes sense for you to maximize is bounded. Unbounded utility functions are so loony they never should have been seriously considered in the first place; Pascal's mugger is merely a dramatic illustration of that fact.

Edit: I probably shouldn't rely on the theoretical reasons to prefer bounded utility functions, since they are not completely airtight and actual human preferences are more important anyway. So let's look at actual human preferences. Suppose you've got a rational agent with preference relation "<", and you want to test whether its utility function is bounded or unbounded. Here's a simple test: First find outcomes A and B such that A<B (if you can't even do that, its utility function is constant, hence bounded). Then pick an absurdly tiny probability p>0. Now see if you can find such a terrible C and such a wonderful D that pC+(1-p)B < pD + (1-p)A. If, for every p>0 you can find such C and D, then its utility function is unbounded. But if for some p>0, you cannot find any C and D that will suffice, even when you probe the extremes of goodness and badness, then its utility function is bounded. This test should sound familiar. What I'm getting at here is that one does not bound their unbounded utility function so that they don't have to pay Pascal's mugger; your preferences were simply bounded all along, and your response to Pascal's mugger is proof.
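To spell out why this test separates the two cases, here is a short derivation. It assumes the agent maximises the expectation of some utility function u; the symbols Δ and R are introduced here for illustration and are not in the original comment.

```latex
\begin{align*}
pC + (1-p)B \prec pD + (1-p)A
  &\iff p\,u(C) + (1-p)\,u(B) < p\,u(D) + (1-p)\,u(A) \\
  &\iff p\,\bigl(u(D) - u(C)\bigr) > (1-p)\,\bigl(u(B) - u(A)\bigr).
\end{align*}
% Write \Delta = u(B) - u(A) > 0 and R = \sup u - \inf u.
% Bounded case (R < \infty): for every C and D,
%   p\,(u(D) - u(C)) \le pR \le (1-p)\,\Delta  whenever  p \le \Delta / (R + \Delta),
% so for such p no C and D can satisfy the inequality.
% Unbounded case (R = \infty): for any p > 0 there exist C and D with
%   u(D) - u(C) > (1-p)\,\Delta / p,
% so the inequality can always be satisfied.
```

So under these assumptions the test fails for small enough p exactly when u is bounded.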

↑ comment by Stuart_Armstrong · 2015-09-14T11:37:58.924Z · LW(p) · GW(p)

Look, we're arguing past each other here. My logical response here would be to add more options to the system, which would remove the problem you identified (and I don't understand your house insurance example - this is just the seat-belt decision again as a one-shot, and I would address it by looking at all the financial decisions you make in your life - and if that's not enough, all the decisions, including all the "don't do something clearly stupid and pointless" ones).

What I think is clear is:

a) Median maximalisation makes bad decisions in isolated problems.

b) If we combine all the likely decisions that a median maximiser will have to make, the quality of the decisions increases.

If you want to argue against it, either say that a) is bad enough we should reject the approach anyway, even if it decides well in practice, or find examples where a real world median maximaliser will make bad decisions even in the real world (if you would pay Pascal's mugger, then you could use that as an example).

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-14T17:11:35.991Z · LW(p) · GW(p)

I don't understand your house insurance example - this is just the seat-belt decision again as a one-shot

We were modeling the seat-belt decision as something that makes the difference between being dead and being completely fine in the event of an accident (which I suppose is not very realistic, but whatever). I was trying to point to a situation where an event can happen which is bad enough to put in the bottom half of outcomes either way, so that nothing that happens conditional on the event can affect the median outcome, but a decision you can make ahead of time would make the difference between bad and worse.

I do think that a) is bad enough, because a decision procedure that does poorly in isolated problems is wrong, and thus cannot be expected to do well in realistic situations, as I mentioned previously. I guess b) is probably technically true, but it is not enough for the quality of the decisions to increase when the number increases; it should actually increase towards a limit that isn't still awful, and come close to achieving that limit (I'm pretty sure it fails on at least one of those, though which step it fails on might depend on how you make things precise). I've given examples where median maximizers make bad decisions in the real world, but you've dismissed them with vague appeals to "everything will be fine when you consider it in the context of all the other decisions it has to make".

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-15T11:23:13.932Z · LW(p) · GW(p)

I've given examples where median maximizers make bad decisions in the real world, but you've dismissed them with vague appeals to "everything will be fine when you consider it in the context of all the other decisions it has to make".

And I've added in the specific other decisions needed to achieve this effect. I agree it's not clear what exactly the median maximalisation converges on in the real world, but the examples you've produced are not sufficient to show it's bad.

I do think that a) is bad enough, because a decision procedure that does poorly in isolated problems is wrong

My take on this is that counterfactual decisions count as well, i.e. if humans look not only at the decisions they face, but the ones they can imagine facing, then median maximalisation is improved. My justification for this line of thought is - how do you know that one chocolate cake is +10 utility while one coffee is +2 (and two coffees is +3, three is +2, and four is -1)? Not just the ordinal ranking, but the cardinality. I'd argue that you get this by either experiencing circumstances where you choose a 20% chance of a cake over coffee, or imagining yourself in that circumstance. And if imagination and past experiences are valid for the purpose of constructing your utility function, they should be valid for the purpose of median-maximalisation.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-15T20:23:35.669Z · LW(p) · GW(p)

And I've added in the specific other decisions needed to achieve this effect.

That you claim achieve that effect. But as I said, unless the choices you can make that would protect you from light injury involve less inconvenience per % reduction in risk than the choices you can make that would protect you from death, it doesn't work.

However, I did think of something which seems to sort of achieve what you want: if you have high uncertainty about what the value of your utility function will be, then adding something to it with some probability will have a significant effect on the median value, even if the probability is significantly less than 50%. For instance, a 49% chance of death is very bad because if there's a 49% chance you die, then the median outcome is one in which you're alive but in a worse situation than all but 1/51 of the scenarios in which you live. It may be that this is what you had in mind, and adding future decisions that involve uncertainty was merely a mechanism by which large uncertainty about the outcome was introduced, in which case future-you actually getting to make any choices about them was a red herring. I still don't find this argument convincing either, though, both because it still undervalues protection from risks of losses that are large relative to the rest of your uncertainty about the value of the outcome (for instance, note that when valuing reductions in risk of death, there is still a weird discontinuity around 50%), and because it assumes that you can't make decisions that selectively have significant consequences only in very good or very bad outcomes (this is what I was getting at with the house insurance example).
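The 1/51 figure can be checked directly. The sketch below assumes death is worse than every outcome in which you survive, and that the chance of death is below 1/2; the function name is made up for illustration. Exact fractions are used to avoid rounding at the 50% boundary.

```python
from fractions import Fraction

def alive_quantile_at_median(death_probability):
    """Quantile of the 'alive' utility distribution that sits at the overall median.

    Assumes death is worse than every alive outcome and death_probability < 1/2.
    Pass the probability as a string or Fraction so the arithmetic stays exact.
    """
    d = Fraction(death_probability)
    return (Fraction(1, 2) - d) / (1 - d)

print(alive_quantile_at_median("0.49"))  # median sits at the bottom 1/51 of alive scenarios
print(alive_quantile_at_median("0.2"))   # a 20% risk already drags the median down
print(alive_quantile_at_median("0"))     # no risk: the ordinary median of alive scenarios
```

This is why, with wide uncertainty over the surviving outcomes, a risk well below 50% still lowers the median noticeably.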

My take on this is that counterfactual decisions count as well. ... And if imagination and past experiences are valid for the purpose of constructing your utility function, they should be valid for the purpose of median-maximalisation.

I don't understand what you're saying here. Is it that you can maximize the median value of the mean of the values of your utility function in a bunch of hypothetical scenarios? If so, that sounds kind of like Houshalter's median of means proposal, which approaches mean maximization as the number of samples considered approaches infinity.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-16T11:29:56.020Z · LW(p) · GW(p)

I don't understand what you're saying here.

The observation I have is that when facing many decisions, median maximalisation tends to move close to mean maximalisation (since the central limit theorem has "convergence in the distribution", the median will converge to the mean in the case of averaging repeated independent processes; but there are many other examples of this). Therefore I'm considering what happens if you add "all the decisions you can imagine making" to the set of actual decisions you expect to make. This feels like it should move the two even closer together.
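A simulation sketch of the averaging case. It uses Exp(1) payoffs (mean 1.0) as an arbitrary example of a skewed distribution; the trial count and seed are illustrative assumptions.

```python
import random
import statistics

def median_of_average(n, trials, rng):
    """Median, over `trials` runs, of the average of n independent Exp(1) payoffs."""
    averages = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.median(averages)

rng = random.Random(0)
TRUE_MEAN = 1.0  # mean of Exp(1)
# Gap between the mean and the (estimated) median of the average of n payoffs.
gaps = {n: TRUE_MEAN - median_of_average(n, trials=10_000, rng=rng)
        for n in (1, 10, 100)}
print(gaps)
```

For a single draw the median sits well below the mean, and the gap shrinks as more independent payoffs are averaged together.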

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-16T22:41:54.735Z · LW(p) · GW(p)

Ah, are you saying you should use your prior to choose a policy that maximizes your median utility, and then implement that policy, rather than updating your prior with your observations and then choosing a policy that maximizes the median? So like UDT but with medians?

It seems difficult to analyze how it would actually behave, but it seems likely to be true that it acts much more similarly to mean utility maximization than it would if you updated before choosing the policy. Both of these properties (difficulty to analyze, and similarity to mean maximization) make it difficult to identify problems that it would perform poorly on. But this also makes it difficult to defend its alleged advantages (for instance, if it ends up being too similar to mean maximization, and if you use an unbounded utility function as you seem to insist, perhaps it pays Pascal's mugger).

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-17T10:03:48.436Z · LW(p) · GW(p)

Ah, are you saying you should use your prior to choose a policy that maximizes your median utility, and then implementing that policy, rather than updating your prior with your observations and then choosing a policy that maximizes the median? So like UDT but with medians?

Ouch! Sorry for not being clear. If you missed that, then you can't have understood much of what I was saying!

↑ comment by Houshalter · 2015-09-09T04:09:57.005Z · LW(p) · GW(p)

How do you know that it's right to buckle your seatbelt? If you are only going to ride in a car once, never again. And there are no other risks to your life, and so no need to make a general policy against taking small risks?

I'm not confident that it's actually the wrong choice. And if it is, it shouldn't matter much. 99.99% of the time, the median maximizer will come out with higher utility than the EU maximizer.

This is generalizable. If there was a "utility competition" between different decision policies in the same situations, the median utility would usually come out on top. As the possible outcomes become more extreme and unlikely, expected utility will do worse and worse, with Pascal's mugging at the extreme.

That's because EU trades away utility from the majority of possible outcomes, to really really unlikely outcomes. Outliers can really skew the mean of a distribution, and EU is just the mean.

Of course median can be exploited too. Perhaps there is some compromise between them that gets the behavior we want. There are an infinite number of possible policies for deciding which distribution of utilities to prefer.

EU was chosen because it is the only one that meets a certain set of conditions and is perfectly consistent. But if you allow for algorithms that select overall policies instead of decisions, like OP does, then you can make many different algorithms consistent.

So there is no inherent reason to prefer mean over median. It just comes down to personal preference, and subjective values. What probability distribution of utilities do you prefer?

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-09T04:59:45.040Z · LW(p) · GW(p)

How do you know that it's right to buckle your seatbelt? If you are only going to ride in a car once, never again.

I do think that the isolation of the decision is a red herring, but for the sake of the point I was trying to make, it is probably easier to replace the example with a structurally similar one in which the right answer is obvious: suppose you have the opportunity to press a button that will kill you with 49% probability, and give you $5 otherwise. This is the only decision you will ever make. Should you press the button?
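To make the button example concrete, here is a minimal sketch comparing what a one-shot median maximiser and a mean maximiser would each do. The utility numbers (`U_DEATH`, `U_FIVE`) are assumptions chosen only for illustration; the thread gives the probabilities but no cardinal utilities.

```python
def median_utility(lottery):
    """Lower median of a discrete lottery given as (probability, utility) pairs.

    Returns the smallest utility whose cumulative probability reaches 1/2.
    """
    cumulative = 0.0
    for prob, utility in sorted(lottery, key=lambda pair: pair[1]):
        cumulative += prob
        if cumulative >= 0.5:
            return utility
    raise ValueError("probabilities must sum to at least 0.5")

def mean_utility(lottery):
    """Expected utility of the same kind of lottery."""
    return sum(prob * utility for prob, utility in lottery)

U_DEATH = -1_000_000.0  # assumed utility of dying (any large negative value works)
U_FIVE = 5.0            # assumed utility of gaining $5

press = [(0.49, U_DEATH), (0.51, U_FIVE)]
decline = [(1.0, 0.0)]

if __name__ == "__main__":
    print("median:", median_utility(press), "vs", median_utility(decline))
    print("mean:  ", mean_utility(press), "vs", mean_utility(decline))
```

Under these assumed numbers the median of pressing is the $5 outcome, so an isolated median maximiser presses, while the mean of pressing is strongly negative, so a mean maximiser declines.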

Perhaps there is some compromise between them that gets the behavior we want.

As I was saying in my previous comment, I think that's the wrong approach. It isn't enough to kludge together a decision procedure that does what you want on the problems you thought of, because then it will do something you don't want on something you haven't thought of. You need a decision procedure that will reliably do the right thing, and in order to get that, you need it to do the right thing for the right reasons. EU maximization, applied properly, will tell you to do the correct things, and will do so for the correct reasons.

So there is no inherent reason to prefer mean over median.

Actually, there is: https://en.wikipedia.org/wiki/Von_Neumann%E2%80%93Morgenstern_utility_theorem

Replies from: Houshalter

↑ comment by Houshalter · 2015-09-09T05:55:02.397Z · LW(p) · GW(p)

suppose you have the opportunity to press a button that will kill you with 49% probability, and give you $5 otherwise.

Yes I said that median utility is not optimal. I'm proposing that there might be policies better than both EU or median.

Actually, there is: https://en.wikipedia.org/wiki/Von_Neumann%E2%80%93Morgenstern_utility_theorem

Please reread the OP and my comment. If you allow selection over policies instead of individual decisions, you can be perfectly consistent. EU and median are both special cases of ways to pick policies, based on the probability distribution of utility they produce.

You need a decision procedure that will reliably do the right thing, and in order to get that, you need it to do the right thing for the right reasons. EU maximization, applied properly, will tell you to do the correct things, and will do so for the correct reasons.

There is no law of the universe that some procedures are correct and others aren't. You just have to pick one that you like, and your choice is going to be arbitrary.

If you go with EU you are Pascal muggable. If you go with median you are muggable in certain cases as well, though you should usually (with >50% probability) end up with better outcomes in the long run, whereas EU could possibly fail 100% of the time. So it's exploitable, but at least less exploitable.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-09T07:46:52.246Z · LW(p) · GW(p)

If you allow selection over policies instead of individual decisions, you can be perfectly consistent.

I don't see how selecting policies instead of actions removes the motivation for independence.

You just have to pick one that you like, and your choice is going to be arbitrary.

Ultimately, it isn't the policy that you care about; it's the outcome. So you should pick a policy because you like the probability distributions over outcomes that you get from implementing it more than you like the probability distributions over outcomes that you would get from implementing other policies. Since there are many decision problems to use your policy on, this quite heavily constrains what policy you choose. In order to get a policy that reliably picks the actions that you decide are correct in the situations where you can tell what the correct action is, it will have to make those decisions for the same reason you decided that it was the best action (or at least something equivalent to or approximating the same reason). So no, the choice of policy is not at all arbitrary.

If you go with EU you are Pascal muggable.

That is not true. EU maximizers with bounded utility functions reject Pascal's wager.
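A hedged sketch of the bounded-utility point: all numbers below (`P_HONEST`, `PROMISED_UTILITY`, `COST_OF_PAYING`, and the cap of one million) are hypothetical. The arithmetic only shows that capping utility can flip the sign of the expected gain from paying; whether it does depends on the cap being small relative to the cost divided by the probability.

```python
P_HONEST = 1e-20          # assumed probability that the mugger delivers
PROMISED_UTILITY = 1e30   # assumed utility the mugger promises
COST_OF_PAYING = 1.0      # assumed utility lost by handing over the money

def expected_gain_from_paying(utility_cap=None):
    """Expected utility of paying, relative to refusing (which is worth 0).

    If utility_cap is given, the promised reward is clipped to that bound,
    modelling an agent whose utility function is bounded above.
    """
    reward = PROMISED_UTILITY
    if utility_cap is not None:
        reward = min(reward, utility_cap)
    return P_HONEST * reward - COST_OF_PAYING

if __name__ == "__main__":
    print("unbounded:", expected_gain_from_paying())
    print("bounded:  ", expected_gain_from_paying(utility_cap=1e6))
```

With the unbounded utility the promised reward swamps its improbability and paying looks good in expectation; with the assumed cap the expected gain is negative and the agent refuses.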

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-09T10:52:25.854Z · LW(p) · GW(p)

I don't see how selecting policies instead of actions removes the motivation for independence.

There are two reasons to like independence. First of all, you might like it for philosophical/aesthetic reasons: "these things really should be independent, these really should be irrelevant". Or you could like it because it prevents you from being money pumped.

When considering policies, money pumping is (almost) no longer an issue, because a policy that allows itself to be money-pumped is (almost) certainly inferior to one that doesn't. So choosing policies removes one of the motivations for independence, to my mind the important one.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-09T20:29:59.550Z · LW(p) · GW(p)

While it's true that this does not tell you to pay each time to switch the outcomes around in a circle over and over again, it still falls prey to one step of a similar problem. Suppose there are 3 possible outcomes: A, B, and C, and there are 2 possible scenarios: X and Y. In scenario X, you get to choose between A and B. In scenario Y, you can attempt to choose between A and B, and you get what you picked with 50% probability, and you get outcome C otherwise. In each scenario, this is the only decision you will ever make. Suppose in scenario X, you prefer A over B, but in scenario Y, you prefer (B+C)/2 over (A+C)/2. But suppose you had to pay to pick A in scenario X, and you had to pay to pick (B+C)/2 in scenario Y, and you still make those choices. If Y is twice as likely as X a priori, then you are paying to get a probability distribution over outcomes that you could have gotten for free by picking B given X, and (A+C)/2 given Y. Since each scenario only involves you ever getting to make one decision, picking a policy is equivalent to picking a decision.
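The arithmetic behind this example can be checked directly. The sketch below assumes the prior stated in the comment (Y twice as likely as X, so 1/3 and 2/3) and computes the prior distribution over outcomes for both policies; the function name `outcome_distribution` is made up for this illustration.

```python
from fractions import Fraction

# Prior over scenarios: Y is twice as likely as X.
PRIOR = {"X": Fraction(1, 3), "Y": Fraction(2, 3)}

def outcome_distribution(pick_in_x, pick_in_y):
    """Prior probability of each outcome under a policy.

    In X the pick is obtained for certain; in Y the pick is obtained
    with probability 1/2 and outcome C otherwise.
    """
    dist = {"A": Fraction(0), "B": Fraction(0), "C": Fraction(0)}
    dist[pick_in_x] += PRIOR["X"]
    dist[pick_in_y] += PRIOR["Y"] * Fraction(1, 2)
    dist["C"] += PRIOR["Y"] * Fraction(1, 2)
    return dist

paid_policy = outcome_distribution("A", "B")  # the choices that cost money
free_policy = outcome_distribution("B", "A")  # the choices that are free

if __name__ == "__main__":
    print(paid_policy)
    print(free_policy)
```

Both policies give A, B and C probability 1/3 each, so the agent that pays in both scenarios buys a distribution it could have had for nothing.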

Replies from: Houshalter

↑ comment by Houshalter · 2015-09-09T21:22:01.066Z · LW(p) · GW(p)

Your example is difficult to follow, but I think you are missing the point. If there is only one decision, then its actions can't be inconsistent. By choosing a policy only once - one that maximizes its desired probability distribution of utility outcomes - it's not money pumpable, and it's not inconsistent.

Now by itself it still sucks because we probably don't want to maximize for the best median future. But it opens up the door to more general policies for making decisions. You no longer have to use expected utility if you want to be consistent. You can choose a tradeoff between expected utility and median utility (see my top level comment), or a different algorithm entirely.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-09T23:52:42.490Z · LW(p) · GW(p)

If there is only one decision point in each possible world, then it is impossible to demonstrate inconsistency within a world, but you can still be inconsistent between different possible worlds.

Edit: as V_V pointed out, the VNM framework was designed to handle isolated decisions. So if you think that considering an isolated decision rather than multiple decisions removes the motivation for the independence axiom, then you have misunderstood the motivation for the independence axiom.

Replies from: Stuart_Armstrong, Houshalter

↑ comment by Stuart_Armstrong · 2015-09-10T08:46:45.271Z · LW(p) · GW(p)

So if you think that considering an isolated decision rather than multiple decisions removes the motivation for the independence axiom, then you have misunderstood the motivation for the independence axiom.

I understand the two motivations for the independence axiom, and the practical one ("you can't be money pumped") is much more important than the theoretical one ("your system obeys this here philosophically neat understanding of irrelevant information").

But this is kind of a moot point, because humans don't have utility functions. And therefore we will have to construct them. And the process of constructing them is almost certainly going to depend on facts about the world, making the construction process almost certainly inconsistent between different possible worlds.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-10T23:00:40.945Z · LW(p) · GW(p)

And the process of constructing them is almost certainly going to depend on facts about the world

It shouldn't. If your preferences among outcomes depend on what options are actually available to you, then I don't see how you can justify claiming to have preferences among outcomes, as opposed to tendencies to make certain choices.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-11T08:37:05.587Z · LW(p) · GW(p)

It shouldn't.

Then define me a process that takes people's current mess of preferences, makes these into utility functions, and, respecting bounded rationality, is independent of options available in the real world. Even then, we have the problem that this mess of preferences is highly dependent on real world experiences in the first place.

I don't see how you can justify claiming to have preferences among outcomes, as opposed to tendencies to make certain choices.

If I always go left at a road, I have a tendency to make certain choices. If I have a full model of the entire universe with labelled outcomes ranked on a utility function, and use it with unbounded rationality to make decisions, I have preferences among outcomes. The extremes are clear.

I feel that a bounded human being with a crude mental model that is trying to achieve some goal, imperfectly (because of ingrained bad habits, for instance) is better described as having preferences among outcomes. You could argue that they have mere tendencies, but this seems to stretch the term. But in any case, this is a simple linguistic dispute. Real human beings cannot achieve independence.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-11T17:02:01.760Z · LW(p) · GW(p)

Then define me a process that takes people's current mess of preferences, makes these into utility functions, and, respecting bounded rationality, is independent of options available in the real world.

Define me a process with all those properties except the last one. If you can't do that either, it's not the last constraint that is to blame for the difficulty.

Even then, we have the problem that this mess of preferences is highly dependent on real world experiences in the first place.

Yes, different agents have different preferences. The same agent shouldn't have its preferences change when the available outcomes do.

If I have a full model of the entire universe with labelled outcomes ranked on a utility function, and use it with unbounded rationality to make decisions, I have preferences among outcomes.

If you are neutral between .4A+.6C and .4B+.6C, then you don't have a very good claim to preferring A over B.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-14T11:28:31.955Z · LW(p) · GW(p)

Define me a process with all those properties except the last one.

Well, there's my old idea here: http://lesswrong.com/lw/8qb/cevinspired_models/ . I don't think it's particularly good, but it does construct a utility function, and might be doable with good enough models or a WBE. More broadly, there's the general "figure out human preferences from their decisions and from hypothetical questions and fit a utility function to it", which we can already do today (see "inverse reinforcement learning"); we just can't do it well enough, yet, to get something generally safe at the other end.

None of these ideas have independent variants (not technically true; I can think of some independent versions of them, but they're so ludicrously unsafe in our world that we'd rule them out immediately; thus, this would be a non-independent process).

If you are neutral between .4A+.6C and .4B+.6C, then you don't have a very good claim to preferring A over B.

If I actually do prefer A over B (and my behaviour reflects that in (1- ɛ)A+ ɛC versus (1-ɛ)B+ ɛC cases), then I have an extremely good claim to preferring A over B, and an extremely poor claim to independence.
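A small sketch of how a one-shot median maximiser produces exactly this pattern. It assumes ordinal utilities A > B > C (the numbers 2, 1, 0 are arbitrary placeholders) and uses the lower median; none of this is specified in the thread.

```python
def median_utility(lottery):
    """Lower median of a discrete lottery of (probability, utility) pairs."""
    cumulative = 0.0
    for prob, utility in sorted(lottery, key=lambda pair: pair[1]):
        cumulative += prob
        if cumulative >= 0.5:
            return utility
    raise ValueError("probabilities must sum to at least 0.5")

# Assumed ordinal ranking: A is better than B, which is better than C.
U_A, U_B, U_C = 2.0, 1.0, 0.0

def mix(utility, weight_on_c):
    """Lottery giving the named outcome with prob 1 - weight_on_c, else C."""
    return [(1.0 - weight_on_c, utility), (weight_on_c, U_C)]

if __name__ == "__main__":
    # Heavy mixing with C: the median lands on C either way.
    print(median_utility(mix(U_A, 0.6)), median_utility(mix(U_B, 0.6)))
    # Light mixing with C: the median tracks A versus B.
    print(median_utility(mix(U_A, 0.01)), median_utility(mix(U_B, 0.01)))
```

With 60% weight on C the two gambles have the same median, so the agent is neutral between them; with 1% weight on C it strictly prefers the gamble containing A. That is a violation of independence alongside a clear preference for A over B.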

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-14T18:07:16.700Z · LW(p) · GW(p)

Well, there's my old idea here: http://lesswrong.com/lw/8qb/cevinspired_models/ . I don't think it's particularly good

I assumed accuracy was implied by "making a mess of preferences into a utility function".

More broadly, there's the general "figure out human preferences from their decisions and from hypothetical questions and fit a utility function to it", which we can already do today (see "inverse reinforcement learning"); we just can't do it well enough, yet, to get something generally safe at the other end.

I'm somewhat skeptical of that strategy for learning utility functions, because the space of possible outcomes is extremely high-dimensional, and it may be difficult to test extreme outcomes because the humans you're trying to construct a utility function for might not be able to understand them. But perhaps this objection doesn't get to the heart of the matter, and I should put it aside for now.

None of these ideas have independent variants

I am admittedly not well-versed in inverse reinforcement learning, but this is a perplexing claim. Except for a few people like you suggesting alternatives, I've only ever heard "utility function" used to refer to a function you maximize the expected value of (if you're trying to handle uncertainty), or a function you just maximize the value of (if you're not trying to handle uncertainty). In the first case, we have independence. In the second case, the question of whether or not we obey independence doesn't really make sense. So if inverse reinforcement learning violates independence, then what exactly does it try to fit to human preferences?

If I actually do prefer A over B

Then if the only difference between two gambles is that one might give you A when the other might give you B, you'll take the one that might give you something you like instead of something you don't like.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-15T11:01:48.156Z · LW(p) · GW(p)

I've only ever heard "utility function" used to refer to

To be clear, I am saying the process of constructing the utility function violates independence, not that subsequently maximising it does. Similarly, choosing a median-maximising policy P violates independence, but there is (almost certainly) a utility u such that maximising u is the same as following P.

Once the first choice is made, we have independence in both cases; before it is made, we have it in neither. The philosophical underpinning of independence in single decisions therefore seems very weak.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-15T17:08:30.113Z · LW(p) · GW(p)

To be clear, I am saying the process of constructing the utility function violates independence

Feel free to tell me to shut up and learn how inverse reinforcement learning works before bothering you with such questions, if that is appropriate, but I'm not sure what you mean. Can you be more precise about what property you're saying inverse reinforcement learning doesn't have?

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-16T11:10:56.750Z · LW(p) · GW(p)

Inverse reinforcement learning relies on observation of humans performing specific actions, and drawing the "right" conclusion as to what their preferences are. Indirectly, it relies on humans tinkering with its code to remove "errors", i.e. things that don't fit with the mental image that human programmers have of what preferences should be.

Given that human desires are not independent (citation not needed), this process, if it produces a utility function, involves constructing something independent from non-independent input. However, to establish this utility function, the algorithm has access only to the particular problems given to it, and the particular mental images of its programmers. It is almost certain that the end result would be somewhat different if it was trained on different problems, or if its programmers had different intuitions. Therefore the process itself cannot be independent.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-16T22:08:40.273Z · LW(p) · GW(p)

Ah, I see what you mean, and you're right; the utility function constructed will depend on how the data points are sampled. This isn't quite the same as saying that the result will depend on what results are actually available, though, unless knowledge about what results will be available is used to determine how to sample the data. This still seems like somewhat of a defect of inverse reinforcement learning, unless there ends up being a good case that some particular way of sampling the data is optimal for revealing underlying preferences and ignoring biases, or something like that.

Given that human desires are not independent (citation not needed)

That's probably true, but on the other hand, you seem to want to pin the deviations of human behavior from VNM rationality on violations of the independence axiom, and it isn't clear to me that this is the case (I don't think the point you were making relies on this, so if you weren't trying to make that claim then you can ignore this; it just seemed like you might be). There are situations where there are large framing effects (that is, whether A or B is preferred depends on how the options are presented, even if no other outcome C is being mixed in with them), and likely also violations of transitivity (where someone would say A>B, B>C, and C>A whenever you ask them about 2 of them without bringing up the third). It seems likely to me that most paradoxes of human decision-making have more to do with these than they do with violations of independence.

↑ comment by Houshalter · 2015-09-10T00:08:00.506Z · LW(p) · GW(p)

It can't be inconsistent within a world no matter how many decision points there are. If we agree it's not inconsistent, then what are you arguing against?

I don't care about the VNM framework. As you said, it is designed to be optimal for decisions made in isolation. Because we don't need to make decisions in isolation, we don't need to be constrained by it.

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-10T00:29:28.753Z · LW(p) · GW(p)

If we agree it's not inconsistent...

No. Inconsistency between different possible worlds is still inconsistency.

Because we don't need to make decisions in isolation, we don't need to be constrained by it.

The difference doesn't matter that much in practice. If there are multiple decision points, you can combine them into one by selecting a policy, or by considering them sequentially and using your beliefs about what your choices will be in the future to compute the expected utilities of the possible decisions available to you now. The reason that the VNM framework was designed for one-shot decisions is that it makes things simpler without actually constraining what it can be applied to.

Replies from: Houshalter

↑ comment by Houshalter · 2015-09-11T00:01:04.557Z · LW(p) · GW(p)

No. Inconsistency between different possible worlds is still inconsistency.

It's perfectly consistent in the sense that it's not money pumpable, and always makes the same decisions given the same information. It will make different decisions in different situations, given different information. But that is not inconsistent by any reasonable definition of "inconsistent".

The difference doesn't matter that much in practice.

It makes a huge difference. If you want to get the best median future, then you can't make decisions in isolation. You need to consider every possible decision you will have to make, and their probability. And choose a decision policy that selects the best median outcome.
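One way to see the difference between deciding in isolation and choosing a policy, under assumptions that are not in the thread: suppose the agent will face 50 identical, independent bets, each costing 1 unit of utility and paying 100 with probability 0.1. The constants and function names below are hypothetical.

```python
from math import comb

P_WIN = 0.1     # hypothetical chance that one bet pays off
PAYOFF = 100.0  # hypothetical utility gained when a bet pays off
COST = 1.0      # hypothetical utility paid to take a bet

def median_wins(n_bets, p=P_WIN):
    """Exact lower median of Binomial(n_bets, p)."""
    cumulative = 0.0
    for k in range(n_bets + 1):
        cumulative += comb(n_bets, k) * p**k * (1 - p) ** (n_bets - k)
        if cumulative >= 0.5:
            return k
    return n_bets

def median_total(n_accepted):
    """Median total utility of a policy that accepts n_accepted bets.

    The total is increasing in the number of wins, so its median is the
    payoff at the median number of wins, minus the costs paid.
    """
    return PAYOFF * median_wins(n_accepted) - COST * n_accepted

if __name__ == "__main__":
    print("one bet in isolation:", median_total(1))
    print("policy of taking all 50:", median_total(50))
    print("policy of taking none:", median_total(0))
```

Judged one at a time, each bet has a negative median and is refused; judged as a policy over all 50, accepting has a much better median than refusing. This is only the simplest case of identical bets, but it shows why the set of decisions considered changes what a median maximiser does.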

Replies from: AlexMennen

↑ comment by AlexMennen · 2015-09-11T01:05:02.222Z · LW(p) · GW(p)

It's perfectly consistent in the sense that it's not money pumpable, and always makes the same decisions given the same information.

As in my previous example (sorry about it being difficult to follow, though I'm not sure yet what I could say to clarify things), it is inconsistent in the sense that it can lead you to pay for probability distributions over outcomes that you could have achieved for free.

You need to consider every possible decision you will have to make, and their probability.

Right. As I just said, "you can... consider them sequentially and use your beliefs about what your choices will be in the future to compute the expected utilities of the possible decisions available to you now." (edited to fix grammar). This reduces iterated decisions to isolated decisions: you have certain beliefs about what you'll do in the future, and now you just have to make a decision on the issue facing you now.

comment by PeterCoin · 2015-09-09T01:35:44.856Z · LW(p) · GW(p)

Median expected behavior is simple which makes it easy to calculate.

As an electrical engineer when I design circuits I start off by assuming that all my parts behave exactly as rated. If a resistor says it's 220±10% ohms then I use 220 for my initial calculations. Assuming median behavior works wonderfully in telling me what my circuit probably will do.

In fact that's good enough info for me to base my design decision on for a lot of purposes (given a quick verification of functionality, of course).

But what about that 10%? What if it might matter? One thing I do is called worst case analysis https://en.wikipedia.org/wiki/Tolerance_analysis#Worst-case

This is the exact opposite of what you're proposing! I look for the cases where everything is off by the greatest amount possible and in the way that combines to form the worst possible outcome. If my circuit has two 220±10% ohm resistors, I'll consider the cases where both are 242 ohms, both are 198 ohms, and even the bizarre cases where one is 198 ohms and the other 242 ohms. I do that because if I know my circuit will function under those circumstances, then there's a problem only when the resistors are out of tolerance (and I can blame someone else).
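A minimal sketch of that corner enumeration. The comment does not say what circuit the two resistors sit in, so a simple voltage divider is assumed here purely for illustration; checking only the corners is adequate for it because the output is monotone in each resistance.

```python
from itertools import product

NOMINAL_OHMS = 220.0
TOLERANCE = 0.10  # the 10% rating from the example

def divider_ratio(r_top, r_bottom):
    """Output/input voltage ratio of an (assumed) two-resistor divider."""
    return r_bottom / (r_top + r_bottom)

# Each resistor is taken at its lowest and highest in-tolerance value.
corners = [NOMINAL_OHMS * (1 - TOLERANCE), NOMINAL_OHMS * (1 + TOLERANCE)]

# All four combinations, including the mixed low/high cases.
results = {
    (r_top, r_bottom): divider_ratio(r_top, r_bottom)
    for r_top, r_bottom in product(corners, repeat=2)
}

nominal = divider_ratio(NOMINAL_OHMS, NOMINAL_OHMS)
worst_low = min(results.values())
worst_high = max(results.values())

if __name__ == "__main__":
    for (r_top, r_bottom), ratio in results.items():
        print(f"{r_top:.0f} / {r_bottom:.0f} ohms -> ratio {ratio:.3f}")
    print("nominal", nominal, "range", worst_low, worst_high)
```

For this assumed divider the cases where both resistors drift the same way leave the ratio at its nominal value, and it is the mixed cases that set the worst-case bounds, which is why they are worth enumerating.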

In my view, average expected utility is the true metric. But there are circumstances where it's easier and cheaper to ignore the utility of anything other than the median case, and there are circumstances where it's easier and cheaper to ignore the utility of anything other than the worst cases.

Replies from: Houshalter

↑ comment by Houshalter · 2015-09-09T03:23:45.066Z · LW(p) · GW(p)

Worst case isn't a great metric either. E.g. you are required to pay the mugger, because it's the worst possible case. Average case doesn't solve it either, because the utility the mugger is promising is even greater than the improbability that he's right. Rare outliers can throw off the average case by a lot.

We need to invent some kind of policy to decide what actions to prefer, given a set of the utilities and probabilities of each possible outcome. Expected utility isn't good enough. Median utility isn't either. But there might be some compromise between them that gets what we want. Or a totally different algorithm altogether.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-09T09:22:24.746Z · LW(p) · GW(p)

That's why I find it interesting that mean and median converge in many cases of repeated choices.

comment by Larks · 2015-09-09T01:11:31.310Z · LW(p) · GW(p)

In finance we use medians a lot more than means.

Replies from: Lumifer

↑ comment by Lumifer · 2015-09-09T02:30:38.949Z · LW(p) · GW(p)

The rather important question is: For which purpose?

comment by entirelyuseless · 2015-09-08T17:27:14.786Z · LW(p) · GW(p)

"Assume that avoiding these choices has a trivial cost, incommensurable with dying (ie no matter how many times you have to buckle your seatbelt, it still better than a fatal accident)."

Suppose you had a choice: die in a plane crash, or listen to those plane safety announcements one million times. I choose dying in a plane crash.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2015-09-09T09:08:32.983Z · LW(p) · GW(p)

The incommensurability assumption is for illustration only, and is dropped later on.
