Coherent decisions imply consistent utilities
post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2019-05-12T21:33:57.982Z · LW · GW · 80 comments
(Written for Arbital in 2017.)
Introduction to the introduction: Why expected utility?
So we're talking about how to make good decisions, or the idea of 'bounded rationality', or what sufficiently advanced Artificial Intelligences might be like; and somebody starts dragging up the concepts of 'expected utility' or 'utility functions'.
And before we even ask what those are, we might first ask, Why?
There's a mathematical formalism, 'expected utility', that some people invented to talk about making decisions. This formalism is very academically popular, and appears in all the textbooks.
But so what? Why is that necessarily the best way of making decisions under every kind of circumstance? Why would an Artificial Intelligence care what's academically popular? Maybe there's some better way of thinking about rational agency? Heck, why is this formalism popular in the first place?
We can ask the same kinds of questions about probability theory:
Okay, we have this mathematical formalism in which the chance that X happens, aka \(P(X)\), plus the chance that X doesn't happen, aka \(P(\neg X)\), must be represented in a way that makes the two quantities sum to unity: \(P(X) + P(\neg X) = 1\).
That formalism for probability has some neat mathematical properties. But so what? Why should the best way of reasoning about a messy, uncertain world have neat properties? Why shouldn't an agent reason about 'how likely is that' using something completely unlike probabilities? How do you know a sufficiently advanced Artificial Intelligence would reason in probabilities? You haven't seen an AI, so what do you think you know and how do you think you know it?
That entirely reasonable question is what this introduction tries to answer. There are, indeed, excellent reasons beyond academic habit and mathematical convenience for why we would by default invoke 'expected utility' and 'probability theory' to think about good human decisions, talk about rational agency, or reason about sufficiently advanced AIs.
The broad form of the answer seems easier to show than to tell, so we'll just plunge straight in.
Why not circular preferences?
De gustibus non est disputandum, goes the proverb; matters of taste cannot be disputed. If I like onions on my pizza and you like pineapple, it's not that one of us is right and one of us is wrong. We just prefer different pizza toppings.
Well, but suppose I declare to you that I simultaneously:
 Prefer onions to pineapple on my pizza.
 Prefer pineapple to mushrooms on my pizza.
 Prefer mushrooms to onions on my pizza.
If we use \(>_P\) to denote my pizza preferences, with \(X >_P Y\) denoting that I prefer X to Y, then I am declaring:
\[\text{onions} >_P \text{pineapple} >_P \text{mushrooms} >_P \text{onions}\]
That sounds strange, to be sure. But is there anything wrong with that? Can we disputandum it?
We used the math symbol \(>\) which denotes an ordering. If we ask whether \(>_P\) can be an ordering, it naughtily violates the standard transitivity axiom \(x > y,\ y > z \implies x > z\).
Okay, so then maybe we shouldn't have used the symbol \(>_P\) or called it an ordering. Why is that necessarily bad?
We can try to imagine each pizza as having a numerical score denoting how much I like it. In that case, there's no way we could assign consistent numbers \(x\) to those three pizza toppings such that \(x_{\text{onions}} > x_{\text{pineapple}} > x_{\text{mushrooms}} > x_{\text{onions}}\).
So maybe I don't assign numbers to my pizza. Why is that so awful?
Are there any grounds besides "we like a certain mathematical formalism and your choices don't fit into our math," on which to criticize my three simultaneous preferences?
(Feel free to try to answer this yourself before continuing...)
Suppose I tell you that I prefer pineapple to mushrooms on my pizza. Suppose you're about to give me a slice of mushroom pizza; but by paying one penny ($0.01) I can instead get a slice of pineapple pizza (which is just as fresh from the oven). It seems realistic to say that most people with a pineapple pizza preference would probably pay the penny, if they happened to have a penny in their pocket.¹
After I pay the penny, though, and just before I'm about to get the pineapple pizza, you offer me a slice of onion pizza instead—no charge for the change! If I was telling the truth about preferring onion pizza to pineapple, I should certainly accept the substitution if it's free.
And then to round out the day, you offer me a mushroom pizza instead of the onion pizza, and again, since I prefer mushrooms to onions, I accept the swap.
I end up with exactly the same slice of mushroom pizza I started with... and one penny poorer, because I previously paid $0.01 to swap mushrooms for pineapple.
This seems like a qualitatively bad behavior on my part. By virtue of my incoherent preferences which cannot be given a consistent ordering, I have shot myself in the foot, done something self-defeating. We haven't said how I ought to sort out my inconsistent preferences. But no matter how it shakes out, it seems like there must be some better alternative: some better way I could reason that wouldn't spend a penny to go in circles. That is, I could at least have kept my original pizza slice and not spent the penny.
In a phrase you're going to keep hearing, I have executed a 'dominated strategy': there exists some other strategy that does strictly better.²
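The pizza story can be replayed mechanically. What follows is only a minimal sketch: the function name `run_money_pump`, the dictionary layout, and the rule "accept any swap I prefer, pay the fee on the first one" are illustrative assumptions, not anything defined in the original argument.

```python
# The three circular preferences from the text: each key is preferred to its value.
PREFERS_OVER = {"pineapple": "mushroom", "onion": "pineapple", "mushroom": "onion"}

def run_money_pump(start, offers, first_swap_fee_cents=1):
    """Accept every offered slice the agent prefers to the one it holds.

    Only the first swap costs money, as in the story (hypothetical helper).
    Returns (slice held at the end, total cents paid).
    """
    held, paid = start, 0
    for i, offered in enumerate(offers):
        if PREFERS_OVER.get(offered) == held:  # offered slice is preferred to held slice
            held = offered
            if i == 0:
                paid += first_swap_fee_cents
    return held, paid

print(run_money_pump("mushroom", ["pineapple", "onion", "mushroom"]))  # ('mushroom', 1)
```

The agent ends up holding the slice it started with, one cent poorer. Declining every offer would have left it with the same slice and the penny.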
Or as Steve Omohundro put it: If you prefer being in Berkeley to being in San Francisco; prefer being in San Jose to being in Berkeley; and prefer being in San Francisco to being in San Jose; then you're going to waste a lot of time on taxi rides.
None of this reasoning has told us that a non-self-defeating agent must prefer Berkeley to San Francisco or vice versa. There are at least six possible consistent orderings over pizza toppings, like \(\text{mushrooms} >_P \text{pineapple} >_P \text{onions}\), etcetera, and any consistent ordering would avoid paying to go in circles.³ We have not, in this argument, used pure logic to derive that pineapple pizza must taste better than mushroom pizza to an ideal rational agent. But we've seen that eliminating a certain kind of shoot-yourself-in-the-foot behavior, corresponds to imposing a certain coherence or consistency requirement on whatever preferences are there.
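To make the "six consistent orderings" concrete, here is a small brute-force check, offered as a sketch. It enumerates all strict rankings of the three toppings and keeps those that agree with a given list of pairwise preferences; the helper name `orderings_consistent_with` is made up for this illustration.

```python
from itertools import permutations

TOPPINGS = ("onions", "pineapple", "mushrooms")
# The declared (better, worse) pairs from the text; together they form a cycle.
CYCLIC = [("onions", "pineapple"), ("pineapple", "mushrooms"), ("mushrooms", "onions")]

def orderings_consistent_with(prefs):
    """Return every best-to-worst ranking that agrees with all pairs in prefs."""
    result = []
    for ranking in permutations(TOPPINGS):
        position = {t: i for i, t in enumerate(ranking)}  # 0 = most preferred
        if all(position[a] < position[b] for a, b in prefs):
            result.append(ranking)
    return result

print(len(orderings_consistent_with([])))     # 6
print(orderings_consistent_with(CYCLIC))      # []
print(orderings_consistent_with(CYCLIC[:2]))  # [('onions', 'pineapple', 'mushrooms')]
```

No ranking survives all three declared preferences, while dropping any one of them leaves exactly one ranking. Nothing here says which ranking is the right one.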
It turns out that this is just one instance of a large family of coherence theorems which all end up pointing at the same set of core properties. All roads lead to Rome, and all the roads say, "If you are not shooting yourself in the foot in sense X, we can view you as having coherence property Y."
There are some caveats to this general idea.
For example: In complicated problems, perfect coherence is usually impossible to compute—it's just too expensive to consider all the possibilities.
But there are also caveats to the caveats! For example, it may be that if there's a powerful machine intelligence that is not visibly to us humans shooting itself in the foot in way X, then from our perspective it must look like the AI has coherence property Y. If there's some sense in which the machine intelligence is going in circles, because not going in circles is too hard to compute, well, we won't see that either with our tiny human brains. In which case it may make sense, from our perspective, to think about the machine intelligence as if it has some coherent preference ordering.
We are not going to go through all the coherence theorems in this introduction. They form a very large family; some of them are a lot more mathematically intimidating; and honestly I don't know even 5% of the variants.
But we can hopefully walk through enough coherence theorems to at least start to see the reasoning behind, "Why expected utility?" And, because the two are a package deal, "Why probability?"
Human lives, mere dollars, and coherent trades
An experiment in 2000 (from a paper titled "The Psychology of the Unthinkable: Taboo Trade-Offs, Forbidden Base Rates, and Heretical Counterfactuals") asked subjects to consider the dilemma of a hospital administrator named Robert:
Robert can save the life of Johnny, a five year old who needs a liver transplant, but the transplant procedure will cost the hospital $1,000,000 that could be spent in other ways, such as purchasing better equipment and enhancing salaries to recruit talented doctors to the hospital. Johnny is very ill and has been on the waiting list for a transplant but because of the shortage of local organ donors, obtaining a liver will be expensive. Robert could save Johnny's life, or he could use the $1,000,000 for other hospital needs.
The main experimental result was that most subjects got angry at Robert for even considering the question.
After all, you can't put a dollar value on a human life, right?
But better hospital equipment also saves lives, or at least one hopes so.⁴ It's not like the other potential use of the money saves zero lives.
Let's say that Robert has a total budget of $100,000,000 and is faced with a long list of options such as these:
 $100,000 for a new dialysis machine, which will save 3 lives
 $1,000,000 for a liver for Johnny, which will save 1 life
 $10,000 to train the nurses on proper hygiene when inserting central lines, which will save an expected 100 lives
 ...
Now suppose—this is a supposition we'll need for our theorem—that Robert does not care at all about money, not even a tiny bit. Robert only cares about maximizing the total number of lives saved. Furthermore, we suppose for now that Robert cares about every human life equally.
If Robert does save as many lives as possible, given his bounded money, then Robert must behave like somebody assigning some consistent dollar value to saving a human life.
We should be able to look down the long list of options that Robert took and didn't take, and say, e.g., "Oh, Robert took all the options that saved more than 1 life per $500,000 and rejected all options that saved less than 1 life per $500,000; so Robert's behavior is consistent with his spending $500,000 per life."
Alternatively, if we can't view Robert's behavior as being coherent in this sense—if we cannot make up any dollar value of a human life, such that Robert's choices are consistent with that dollar value—then it must be possible to move around the same amount of money, in a way that saves more lives.
We start from the qualitative criterion, "Robert must save as many lives as possible; it shouldn't be possible to move around the same money to save more lives." We end up with the quantitative coherence theorem, "It must be possible to view Robert as trading dollars for lives at a consistent price."
We haven't proven that dollars have some intrinsic worth that trades off against the intrinsic worth of a human life. By hypothesis, Robert doesn't care about money at all. It's just that every dollar has an opportunity cost in lives it could have saved if deployed differently; and this opportunity cost is the same for every dollar because money is fungible.
An important caveat to this theorem is that there may be, e.g., an option that saves a hundred thousand lives for $200,000,000. But Robert only has $100,000,000 to spend. In this case, Robert may fail to take that option even though it saves 1 life per $2,000. It was a good option, but Robert didn't have enough money in the bank to afford it. This does mess up the elegance of being able to say, "Robert must have taken all the options saving at least 1 life per $500,000", and instead we can only say this with respect to options that are in some sense small enough or granular enough.
Similarly, if an option costs $5,000,000 to save 15 lives, but Robert only has $4,000,000 left over after taking all his other best opportunities, Robert's last selected option might be to save 8 lives for $4,000,000 instead. This again messes up the elegance of the reasoning, but Robert is still doing exactly what an agent would do if it consistently valued lives at 1 life per $500,000—it would buy all the best options it could afford that purchased at least that many lives per dollar. So that part of the theorem's conclusion still holds.
Another caveat is that we haven't proven that there's some specific dollar value in Robert's head, as a matter of psychology. We've only proven that Robert's outward behavior can be viewed as if it prices lives at some consistent value, assuming Robert saves as many lives as possible.
It could be that Robert accepts every option that spends less than $500,000/life and rejects every option that spends over $600,000, and there aren't any available options in the middle. Then Robert's behavior can equally be viewed as consistent with a price of $510,000 or a price of $590,000. This helps show that we haven't proven anything about Robert explicitly thinking of some number. Maybe Robert never lets himself think of a specific threshold value, because it would be taboo to assign a dollar value to human life; and instead Robert just fiddles the choices until he can't see how to save any more lives.
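As a rough illustration of how a range of prices falls out of the choices, here is a sketch under stated assumptions. The first three options are the ones listed earlier; the "screening program" and the $700,000 budget are hypothetical additions so that the budget actually binds. The greedy rule is only one simple way to spend the money, and with lumpy options it can fail to be optimal, which is the granularity caveat discussed above.

```python
# (name, cost in dollars, expected lives saved)
OPTIONS = [
    ("dialysis machine", 100_000, 3),
    ("liver for Johnny", 1_000_000, 1),
    ("hygiene training", 10_000, 100),
    ("screening program (hypothetical)", 500_000, 2),
]

def allocate(options, budget):
    """Greedy sketch: fund options in order of dollars per life while they fit."""
    funded, remaining = [], budget
    for name, cost, lives in sorted(options, key=lambda o: o[1] / o[2]):
        if cost <= remaining:
            funded.append(name)
            remaining -= cost
    return funded

def implied_price_range(options, funded):
    """Return (dearest funded, cheapest rejected) in dollars per life."""
    per_life = {name: cost / lives for name, cost, lives in options}
    dearest_funded = max(per_life[n] for n in funded)
    rejected = [v for n, v in per_life.items() if n not in funded]
    return dearest_funded, min(rejected, default=float("inf"))

funded = allocate(OPTIONS, budget=700_000)
print(funded)
print(implied_price_range(OPTIONS, funded))  # (250000.0, 1000000.0)
```

Any price between $250,000 and $1,000,000 per life is consistent with these particular choices, so the behavior pins down a range rather than a single number in anyone's head.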
We naturally have not proved by pure logic that Robert must want, in the first place, to save as many lives as possible. Even if Robert is a good person, this doesn't follow. Maybe Robert values a 10-year-old's life at 5 times the value of a 70-year-old's life, so that Robert will sacrifice five grandparents to save one 10-year-old. A lot of people would see that as entirely consistent with valuing human life in general.
Let's consider that last idea more thoroughly. If Robert considers a preteen equally valuable with 5 grandparents, so that Robert will shift $100,000 from saving 8 old people to saving 2 children, then we can no longer say that Robert wants to save as many 'lives' as possible. That last decision would decrease by 6 the total number of 'lives' saved. So we can no longer say that there's a qualitative criterion, 'Save as many lives as possible', that produces the quantitative coherence requirement, 'trade dollars for lives at a consistent rate'.
Does this mean that coherence might as well go out the window, so far as Robert's behavior is concerned? Anything goes, now? Just spend money wherever?
"Hm," you might think. "But... if Robert trades 8 old people for 2 children here... and then trades 1 child for 2 old people there..."
To reduce distraction, let's make this problem be about apples and oranges instead. Suppose:
 Alice starts with 8 apples and 1 orange.
 Then Alice trades 8 apples for 2 oranges.
 Then Alice trades away 1 orange for 2 apples.
 Finally, Alice trades another orange for 3 apples.
Then in this example, Alice is using a strategy that's strictly dominated across all categories of fruit. Alice ends up with 5 apples and one orange, but could've ended with 8 apples and one orange (by not making any trades at all). Regardless of the relative value of apples and oranges, Alice's strategy is doing qualitatively worse than another possible strategy, if apples have any positive value to her at all.
So the fact that Alice can't be viewed as having any coherent relative value for apples and oranges, corresponds to her ending up with qualitatively less of some category of fruit (without any corresponding gains elsewhere).
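Alice's bookkeeping is easy to check by hand, but a short sketch makes the two claims explicit: where she ends up, and that no single apples-per-orange value makes all three trades look acceptable. The helper names and the grid of candidate values are assumptions for illustration.

```python
def apply_trades(holdings, trades):
    """Apply (give, receive) trades in order; each side maps fruit -> count."""
    h = dict(holdings)
    for give, receive in trades:
        for fruit, n in give.items():
            h[fruit] -= n
        for fruit, n in receive.items():
            h[fruit] = h.get(fruit, 0) + n
    return h

START = {"apples": 8, "oranges": 1}
TRADES = [
    ({"apples": 8}, {"oranges": 2}),
    ({"oranges": 1}, {"apples": 2}),
    ({"oranges": 1}, {"apples": 3}),
]

end = apply_trades(START, TRADES)
print(end)  # {'apples': 5, 'oranges': 1}

def all_trades_acceptable(orange_in_apples):
    """True if no trade loses value when an orange is worth this many apples."""
    value = {"apples": 1.0, "oranges": orange_in_apples}
    worth = lambda side: sum(value[f] * n for f, n in side.items())
    return all(worth(receive) >= worth(give) for give, receive in TRADES)

# Candidate values 0.1, 0.2, ..., 10.0 apples per orange (an arbitrary grid).
print(any(all_trades_acceptable(k / 10) for k in range(1, 101)))  # False
```

The first trade needs an orange to be worth at least 4 apples, while the second needs it to be worth at most 2, so the grid search is only confirming what the inequalities already say.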
This remains true if we introduce more kinds of fruit into the problem. Let's say the set of fruits Alice can trade includes {apples, oranges, strawberries, plums}. If we can't look at Alice's trades and make up some relative quantitative values of fruit, such that Alice could be trading consistently with respect to those values, then Alice's trading strategy must have been dominated by some other strategy that would have ended up with strictly more fruit across all categories.
In other words, we need to be able to look at Alice's trades, and say something like:
"Maybe Alice values an orange at 2 apples, a strawberry at 0.1 apples, and a plum at 0.5 apples. That would explain why Alice was willing to trade 4 strawberries for a plum, but not willing to trade 40 strawberries for an orange and an apple."
And if we can't say this, then there must be some way to rearrange Alice's trades and get strictly more fruit across all categories in the sense that, e.g., we end with the same number of plums and apples, but one more orange and two more strawberries. This is a bad thing if Alice qualitatively values fruit from each category—prefers having more fruit to less fruit, ceteris paribus, for each category of fruit.
Now let's shift our attention back to Robert the hospital administrator. Either we can view Robert as consistently assigning some relative value of life for 10-year-olds vs. 70-year-olds, or there must be a way to rearrange Robert's expenditures to save either strictly more 10-year-olds or strictly more 70-year-olds. The same logic applies if we add 50-year-olds to the mix. We must be able to say something like, "Robert is consistently behaving as if a 50-year-old is worth a third of a ten-year-old". If we can't say that, Robert must be behaving in a way that pointlessly discards some saveable lives in some category.
Or perhaps Robert is behaving in a way which implies that 10-year-old girls are worth more than 10-year-old boys. But then the relative values of those subclasses of 10-year-olds need to be viewable as consistent; or else Robert must be qualitatively failing to save one more 10-year-old boy than could've been saved otherwise.
If you can denominate apples in oranges, and price oranges in plums, and trade off plums for strawberries, all at consistent rates... then you might as well take it one step further, and factor out an abstract unit for ease of notation.
Let's call this unit 1 utilon, and denote it €1. (As we'll see later, the letters 'EU' are appropriate here.)
If we say that apples are worth €1, oranges are worth €2, and plums are worth €0.5, then this tells us the relative value of apples, oranges, and plums. Conversely, if we can assign consistent relative values to apples, oranges, and plums, then we can factor out an abstract unit at will—for example, by arbitrarily declaring apples to be worth €100 and then calculating everything else's price in apples.
Have we proven by pure logic that all apples have the same utility? Of course not; you can prefer some particular apples to other particular apples. But when you're done saying which things you qualitatively prefer to which other things, if you go around making tradeoffs in a way that can be viewed as not qualitatively leaving behind some things you said you wanted, we can view you as assigning coherent quantitative utilities to everything you want.
And that's one coherence theorem—among others—that can be seen as motivating the concept of utility in decision theory.
Utility isn't a solid thing, a separate thing. We could multiply all the utilities by two, and that would correspond to the same outward behaviors. It's meaningless to ask how much utility you scored at the end of your life, because we could subtract a million or add a million to that quantity while leaving everything else conceptually the same.
You could pick anything you valued—say, the joy of watching a cat chase a laser pointer for 10 seconds—and denominate everything relative to that, without needing any concept of an extra abstract 'utility'. So (just to be extremely clear about this point) we have not proven that there is a separate thing 'utility' that you should be pursuing instead of everything else you wanted in life.
The coherence theorem says nothing about which things to value more than others, or how much to value them relative to other things. It doesn't say whether you should value your happiness more than someone else's happiness, any more than the notion of a consistent preference ordering tells us whether \(\text{onions} >_P \text{pineapple}\).
(The notion that we should assign equal value to all human lives, or equal value to all sentient lives, or equal value to all Quality-Adjusted Life Years, is utilitarianism. Which is, sorry about the confusion, a whole 'nother separate different philosophy.)
The conceptual gizmo that maps thingies to utilities—the whatchamacallit that takes in a fruit and spits out a utility—is called a 'utility function'. Again, this isn't a separate thing that's written on a stone tablet. If we multiply a utility function by 9.2, that's conceptually the same utility function because it's consistent with the same set of behaviors.
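A tiny sketch of the "multiply by 9.2" point, using the fruit values from earlier (apple €1, orange €2, plum €0.5). The two sample trades and the helper names are made up for illustration.

```python
UTILITY = {"apple": 1.0, "orange": 2.0, "plum": 0.5}

def bundle_value(bundle, utility):
    """Total utility of a bundle given as fruit -> count."""
    return sum(utility[fruit] * n for fruit, n in bundle.items())

def accepts(give, receive, utility):
    """Accept a trade iff the received bundle is worth strictly more."""
    return bundle_value(receive, utility) > bundle_value(give, utility)

# Hypothetical trades: (what is given up, what is received).
SAMPLE_TRADES = [
    ({"apple": 1}, {"plum": 3}),
    ({"orange": 1}, {"apple": 1, "plum": 1}),
]

scaled = {fruit: 9.2 * u for fruit, u in UTILITY.items()}
print([accepts(g, r, UTILITY) for g, r in SAMPLE_TRADES])  # [True, False]
print([accepts(g, r, scaled) for g, r in SAMPLE_TRADES])   # [True, False]
```

Both versions of the function make the same choices, which is the sense in which they are the same utility function.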
But in general: If we can sensibly view any agent as doing as well as qualitatively possible at anything, we must be able to view the agent's behavior as consistent with there being some coherent relative quantities of wantedness for all the thingies it's trying to optimize.
Probabilities and expected utility
We've so far made no mention of probability. But the way that probabilities and utilities interact, is where we start to see the full structure of expected utility spotlighted by all the coherence theorems.
The basic notion in expected utility is that some choices present us with uncertain outcomes.
For example, I come to you and say: "Give me 1 apple, and I'll flip a coin; if the coin lands heads, I'll give you 1 orange; if the coin comes up tails, I'll give you 3 plums." Suppose you relatively value fruits as described earlier: 2 apples / orange and 0.5 apples / plum. Then either possible outcome gives you something that's worth more to you than 1 apple. Turning down a so-called 'gamble' like that... why, it'd be a dominated strategy.
In general, the notion of 'expected utility' says that we assign certain quantities called probabilities to each possible outcome. In the example above, we might assign a 'probability' of \(0.5\) to the coin landing heads (1 orange), and a 'probability' of \(0.5\) to the coin landing tails (3 plums). Then the total value of the 'gamble' we get by trading away 1 apple is:
\[P(\text{heads}) \cdot U(\text{1 orange}) + P(\text{tails}) \cdot U(\text{3 plums}) = 0.5 \cdot €2 + 0.5 \cdot €1.5 = €1.75\]
Conversely, if we just keep our 1 apple instead of making the trade, this has an expected utility of \(1 \cdot U(\text{1 apple}) = €1\). So indeed we ought to trade (as the previous reasoning suggested).
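The same arithmetic as a minimal sketch, assuming the fruit utilities from earlier (apple €1, orange €2, plum €0.5) and a fair coin. The representation of a gamble as a list of (probability, bundle) pairs is an illustrative choice.

```python
UTILITY = {"apple": 1.0, "orange": 2.0, "plum": 0.5}

def expected_utility(lottery):
    """lottery: list of (probability, bundle) pairs; a bundle maps fruit -> count."""
    return sum(
        p * sum(UTILITY[fruit] * n for fruit, n in bundle.items())
        for p, bundle in lottery
    )

gamble = [(0.5, {"orange": 1}), (0.5, {"plum": 3})]
keep_apple = [(1.0, {"apple": 1})]

print(expected_utility(gamble))      # 1.75
print(expected_utility(keep_apple))  # 1.0
```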
"But wait!" you cry. "Where did these probabilities come from? Why is the 'probability' of a fair coin landing heads and not, say, or ? Who says we ought to multiply utilities by probabilities in the first place?"
If you're used to approaching this problem from a Bayesian standpoint, then you may now be thinking of notions like prior probability and Occam's Razor and universal priors...
But from the standpoint of coherence theorems, that's putting the cart before the horse.
From the standpoint of coherence theorems, we don't start with a notion of 'probability'.
Instead we ought to prove something along the lines of: if you're not using qualitatively dominated strategies, then you must behave as if you are multiplying utilities by certain quantitative thingies.
We might then furthermore show that, for non-dominated strategies, these utility-multiplying thingies must be between \(0\) and \(1\) rather than, say, negative or greater than \(1\).
Having determined what coherence properties these utility-multiplying thingies need to have, we decide to call them 'probabilities'. And then (once we know in the first place that we need 'probabilities' in order to not be using dominated strategies) we can start to worry about exactly what the numbers ought to be.
Probabilities summing to 1
Here's a taste of the kind of reasoning we might do:
Suppose that (having already accepted some previous proof that non-dominated strategies dealing with uncertain outcomes, must multiply utilities by quantitative thingies) you then say that you are going to assign a probability of \(0.6\) to the coin coming up heads, and a probability of \(0.7\) to the coin coming up tails.
If you're already used to the standard notion of probability, you might object, "But those probabilities sum to \(1.3\) when they ought to sum to \(1\)!"⁵ But now we are in coherence-land; we don't ask "Did we violate the standard axioms that all the textbooks use?" but "What rules must non-dominated strategies obey?" De gustibus non est disputandum; can we disputandum somebody saying that a coin has a 60% probability of coming up heads and a 70% probability of coming up tails? (Where these are the only 2 possible outcomes of an uncertain coinflip.)
Well, assuming you've already accepted that we need utility-multiplying thingies, I might then offer you a gamble. How about you give me one apple, and if the coin lands heads, I'll give you 0.8 apples; while if the coin lands tails, I'll give you 0.8 apples.
According to you, the expected utility of this gamble is:
\[P(\text{heads}) \cdot U(\text{0.8 apples}) + P(\text{tails}) \cdot U(\text{0.8 apples}) = 0.6 \cdot €0.8 + 0.7 \cdot €0.8 = €1.04\]
You've just decided to trade your apple for 0.8 apples, which sure sounds like one of 'em dominated strategies.
And that's why the thingies you multiply utilities by (the thingies that you use to weight uncertain outcomes in your imagination, when you're trying to decide how much you want one branch of an uncertain choice) must sum to 1, whether you call them 'probabilities' or not.
Well... actually we just argued⁶ that probabilities for mutually exclusive outcomes should sum to no more than 1. What would be an example showing that, for non-dominated strategies, the probabilities for exhaustive outcomes should sum to no less than 1?
Why exhaustive outcomes should sum to at least 1:
Suppose that, in exchange for 1 apple, I credibly offer:
* To pay you 1.1 apples if a coin comes up heads.
* To pay you 1.1 apples if a coin comes up tails.
* To pay you 1.1 apples if anything else happens.
If the probabilities you assign to these three outcomes sum to say 0.9, you will refuse to trade 1 apple for 1.1 apples.
(This is strictly dominated by the strategy of agreeing to trade 1 apple for 1.1 apples.)
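Both directions of the argument can be put side by side in a short sketch. The weights 0.6 and 0.7 come from the earlier example; the particular split 0.4 / 0.4 / 0.1 is a hypothetical way of getting weights that sum to 0.9, since only the sum is specified above. One apple is taken to be worth one unit of utility.

```python
def subjective_value(weights, payout_apples):
    """Weighted value of a gamble that pays the same amount in every outcome."""
    return sum(w * payout_apples for w in weights)

def accepts_trade(weights, payout_apples, cost_apples=1.0):
    """The agent trades iff its weighted value of the payout exceeds the cost."""
    return subjective_value(weights, payout_apples) > cost_apples

# Weights summing to 1.3: happily pays 1 apple for a certain 0.8 apples.
print(accepts_trade([0.6, 0.7], 0.8))       # True
# Weights summing to 0.9: refuses to pay 1 apple for a certain 1.1 apples.
print(accepts_trade([0.4, 0.4, 0.1], 1.1))  # False
# Weights summing to 1 get both cases right.
print(accepts_trade([0.5, 0.5], 0.8), accepts_trade([0.5, 0.5, 0.0], 1.1))  # False True
```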
Dutch book arguments
Another way we could have presented essentially the same argument as above, is as follows:
Suppose you are a market-maker in a prediction market for some event \(X\). When you say that your price for event \(X\) is \(x\), you mean that you will sell for \(\$x\) a ticket which pays \(\$1\) if \(X\) happens (and pays out nothing otherwise). In fact, you will sell any number of such tickets!
Since you are a market-maker (that is, you are trying to encourage trading in \(X\) for whatever reason), you are also willing to buy any number of tickets at the price \(\$x\). That is, I can say to you (the market-maker) "I'd like to sign a contract where you give me \(\$x\) now, and in return I must pay you \(\$1\) iff \(X\) happens;" and you'll agree. (We can view this as you selling me a negative number of the original kind of ticket.)
Let \(X\) and \(Y\) denote two events such that exactly one of them must happen; say, \(X\) is a coin landing heads and \(Y\) is the coin not landing heads.
Now suppose that you, as a market-maker, are motivated to avoid combinations of bets that lead into certain losses for you: not just losses that are merely probable, but combinations of bets such that every possibility leads to a loss.
Then if exactly one of \(X\) and \(Y\) must happen, your prices \(x\) and \(y\) must sum to exactly \(\$1\). Because:
 If \(x + y < \$1\), I buy both an \(X\) ticket and a \(Y\) ticket and get a guaranteed payout of \(\$1\) minus costs of \(x + y\). Since this is a guaranteed profit for me, it is a guaranteed loss for you.
 If \(x + y > \$1\), I sell you both tickets and will at the end pay you \(\$1\) after you have already paid me \(x + y\). Again, this is a guaranteed profit for me of \(x + y - \$1 > \$0\).
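Here is the two-ticket argument as a minimal sketch, with prices in whole cents to keep the arithmetic exact. The function names and the example prices are illustrative assumptions; the only thing taken from the argument is that each ticket pays $1 and exactly one of the two events happens.

```python
def trader_side(price_x_cents, price_y_cents):
    """Return +1 to buy one of each ticket, -1 to sell one of each, 0 to stay out."""
    total = price_x_cents + price_y_cents
    if total < 100:
        return 1
    if total > 100:
        return -1
    return 0

def market_maker_net_cents(price_x_cents, price_y_cents, quantity):
    """Market-maker's result when the trader takes `quantity` of both tickets.

    Exactly one event happens, so each pair of tickets pays out 100 cents in
    total whichever way the coin lands; negative quantity means the trader sells.
    """
    return quantity * (price_x_cents + price_y_cents - 100)

for prices in [(40, 50), (60, 70), (50, 50)]:
    side = trader_side(*prices)
    print(prices, side, market_maker_net_cents(*prices, side))
# (40, 50) 1 -10
# (60, 70) -1 -30
# (50, 50) 0 0
```

The loss does not depend on which event occurs, which is what makes it a sure loss rather than a bad bet.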
This is more or less exactly the same argument as in the previous section, with trading apples. Except that: (a) the scenario is more crisp, so it is easier to generalize and scale up much more complicated similar arguments; and (b) it introduces a whole lot of assumptions that people new to expected utility would probably find rather questionable.
"What?" one might cry. "What sort of crazy bookie would buy and sell bets at exactly the same price? Why ought anyone to buy and sell bets at exactly the same price? Who says that I must value a gain of $1 exactly the opposite of a loss of $1? Why should the price that I put on a bet represent my degree of uncertainty about the environment? What does all of this argument about gambling have to do with real life?"
So again, the key idea is not that we are assuming anything about people valuing every real-world dollar the same; nor is it in real life a good idea to offer to buy or sell bets at the same prices.⁷ Rather, Dutch book arguments can stand in as shorthand for some longer story in which we only assume that you prefer more apples to less apples.
The Dutch book argument above has to be seen as one more added piece in the company of all the other coherence theorems—for example, the coherence theorems suggesting that you ought to be quantitatively weighing events in your mind in the first place.
Conditional probability
With more complicated Dutch book arguments, we can derive more complicated ideas such as 'conditional probability'.
Let's say that we're pricing three kinds of gambles over two events \(Q\) and \(R\):
 A ticket that costs \(\$x\), and pays \(\$1\) if \(Q\) happens.
 A ticket that doesn't cost anything or pay anything if \(Q\) doesn't happen (the ticket price is refunded); and if \(Q\) does happen, this ticket costs \(\$y\), then pays \(\$1\) if \(R\) happens.
 A ticket that costs \(\$z\), and pays \(\$1\) if \(Q\) and \(R\) both happen.
Intuitively, the idea of conditional probability is that the probability of \(Q\) and \(R\) both happening, should be equal to the probability of \(Q\) happening, times the probability that \(R\) happens assuming that \(Q\) happens:
\[P(Q \wedge R) = P(Q) \cdot P(R \mid Q)\]
To exhibit a Dutch book argument for this rule, we want to start from the assumption of a qualitatively non-dominated strategy, and derive the quantitative rule \(z = x \cdot y\).
So let's give an example that violates this equation and see if there's a way to make a guaranteed profit. Let's say somebody:
 Prices at \(x = \$0.60\) the first ticket, aka \(P(Q)\).
 Prices at \(y = \$0.70\) the second ticket, aka \(P(R \mid Q)\).
 Prices at \(z = \$0.20\) the third ticket, aka \(P(Q \wedge R)\), which ought to be \(\$0.42\) assuming the first two prices.
The first two tickets are priced relatively high, compared to the third ticket which is priced relatively low, suggesting that we ought to sell the first two tickets and buy the third.
Okay, let's ask what happens if we sell 10 of the first ticket, sell 10 of the second ticket, and buy 10 of the third ticket.
 If \(Q\) doesn't happen, we get $6, and pay $2. Net +$4.
 If \(Q\) happens and \(R\) doesn't happen, we get $6, pay $10, get $7, and pay $2. Net +$1.
 If \(Q\) happens and \(R\) happens, we get $6, pay $10, get $7, pay $10, pay $2, and get $10. Net: +$1.
That is: we can get a guaranteed positive profit over all three possible outcomes.
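The three net figures can be double-checked with a short sketch. It assumes per-ticket prices of 60, 70, and 20 cents, which are the values implied by the cash flows listed above for ten tickets each, and it labels the two events `q` and `r` purely for convenience.

```python
def net_cents(prices, amounts, q_happens, r_happens):
    """Trader's net result in cents; a negative amount means selling that ticket."""
    x, y, z = prices
    a, b, c = amounts
    total = a * ((100 if q_happens else 0) - x)       # ticket 1 pays if the first event happens
    if q_happens:                                     # ticket 2 is refunded otherwise
        total += b * ((100 if r_happens else 0) - y)
    total += c * ((100 if (q_happens and r_happens) else 0) - z)  # ticket 3 pays if both happen
    return total

PRICES = (60, 70, 20)     # cents per ticket
AMOUNTS = (-10, -10, 10)  # sell 10, sell 10, buy 10

outcomes = [(False, False), (True, False), (True, True)]
print([net_cents(PRICES, AMOUNTS, q, r) for q, r in outcomes])  # [400, 100, 100]
```

That is +$4, +$1, and +$1, matching the three cases above.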
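More generally, let \(A, B, C\) be the (potentially negative) amounts of the first, second, and third ticket being bought (buying a negative amount is selling), and let \(x, y, z\) be the three ticket prices. Then the prices can be combined into a 'Dutch book' whenever the following three inequalities can be simultaneously true, with at least one inequality strict:
\[-Ax - Cz \geq 0\]
\[A(1 - x) - By - Cz \geq 0\]
\[A(1 - x) + B(1 - y) + C(1 - z) \geq 0\]
For \(x, y, z \in (0, 1)\) this is impossible exactly iff \(z = x \cdot y\). The proof via a bunch of algebra is left as an exercise to the reader.⁸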
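One half of that exercise can be sketched constructively. Assuming ticket amounts \((A, B, C)\) and prices \((x, y, z)\) in the sense just described, the choice \(A = By\), \(C = -B\) makes the payoff the same in all three outcomes, namely \(B(z - xy)\); picking the sign of \(B\) to match the sign of \(z - xy\) then gives a guaranteed profit whenever \(z \neq xy\). This is one possible book, not the only one, and the sketch does not show the converse (that no book exists when \(z = xy\)).

```python
from fractions import Fraction as F

def payoffs(x, y, z, a, b, c):
    """Net payoff in each outcome: first event fails; first only; both happen."""
    return (
        -a * x - c * z,
        a * (1 - x) - b * y - c * z,
        a * (1 - x) + b * (1 - y) + c * (1 - z),
    )

def dutch_book(x, y, z):
    """Return amounts (A, B, C) with equal positive payoff everywhere, or None if z == x*y."""
    gap = z - x * y
    if gap == 0:
        return None
    b = 1 if gap > 0 else -1
    return (b * y, b, -b)

x, y, z = F(3, 5), F(7, 10), F(1, 5)  # the mispriced example: 0.60, 0.70, 0.20
book = dutch_book(x, y, z)
print(book)                    # (Fraction(-7, 10), -1, 1)
print(payoffs(x, y, z, *book)) # all three equal Fraction(11, 50), i.e. $0.22
print(dutch_book(x, y, x * y)) # None
```

Scaling this book by about 14 and rounding gives roughly the "sell 10, sell 10, buy 10" book used in the worked example.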
The Allais Paradox
By now, you'd probably like to see a glimpse of the sort of argument that shows in the first place that we need expected utility: that a non-dominated strategy for uncertain choice must behave as if multiplying utilities by some kinda utility-multiplying thingies ('probabilities').
As far as I understand it, the real argument you're looking for is Abraham Wald's complete class theorem, which I must confess I don't know how to reduce to a simple demonstration.
But we can catch a glimpse of the general idea from a famous psychology experiment that became known as the Allais Paradox (in slightly adapted form).
Suppose you ask some experimental subjects which of these gambles they would rather play:
 1A: A certainty of $1,000,000.
 1B: 90% chance of winning $5,000,000, 10% chance of winning nothing.
Most subjects say they'd prefer 1A to 1B.
Now ask a separate group of subjects which of these gambles they'd prefer:
 2A: 50% chance of winning $1,000,000; 50% chance of winning $0.
 2B: 45% chance of winning $5,000,000; 55% chance of winning $0.
In this case, most subjects say they'd prefer gamble 2B.
Note that the $ sign here denotes real dollars, not utilities! A gain of five million dollars isn't, and shouldn't be, worth exactly five times as much to you as a gain of one million dollars. We can use the € symbol to denote the expected utilities that are abstracted from how much you relatively value different outcomes; $ is just money.
So we certainly aren't claiming that the first preference is paradoxical because 1B has an expected dollar value of $4.5 million and 1A has an expected dollar value of $1 million. That would be silly. We care about expected utilities, not expected dollar values, and those two concepts aren't the same at all!
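Nonetheless, the combined preferences 1A > 1B and 2A < 2B are not compatible with any coherent utility function. We cannot simultaneously have:
\[ U(\$1\text{M}) > 0.9 \cdot U(\$5\text{M}) + 0.1 \cdot U(\$0) \]
\[ 0.5 \cdot U(\$1\text{M}) + 0.5 \cdot U(\$0) < 0.45 \cdot U(\$5\text{M}) + 0.55 \cdot U(\$0) \]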
This was one of the earliest experiments seeming to demonstrate that actual human beings were not expected utility maximizers—a very tame idea nowadays, to be sure, but the first definite demonstration of that was a big deal at the time. Hence the term, "Allais Paradox".
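To see why no utility function can produce both preferences, it may help to write each one as an expected-utility comparison and rescale the second. The following is a sketch of that algebra, writing \(U(\cdot)\) for an arbitrary utility function over dollar outcomes (the notation is chosen here for illustration):

```latex
\begin{align*}
\text{1A} > \text{1B}: \quad & U(\$1\text{M}) > 0.9\,U(\$5\text{M}) + 0.1\,U(\$0) \\
\text{2A} < \text{2B}: \quad & 0.5\,U(\$1\text{M}) + 0.5\,U(\$0) < 0.45\,U(\$5\text{M}) + 0.55\,U(\$0) \\
\text{subtract } 0.5\,U(\$0): \quad & 0.5\,U(\$1\text{M}) < 0.45\,U(\$5\text{M}) + 0.05\,U(\$0) \\
\text{multiply by } 2: \quad & U(\$1\text{M}) < 0.9\,U(\$5\text{M}) + 0.1\,U(\$0)
\end{align*}
```

The last line is the exact reverse of the first, so whatever numbers \(U\) assigns to the three outcomes, at most one of the two preferences can hold.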
Now, by the general idea behind coherence theorems, since we can't view this behavior as corresponding to expected utilities, we ought to be able to show that it corresponds to a dominated strategy somehow—derive some way in which this behavior corresponds to shooting off your own foot.
In this case, the relevant idea seems nonobvious enough that it doesn't seem reasonable to demand that you think of it on your own; but if you like, you can pause and try to think of it anyway. Otherwise, just continue reading.
Again, the gambles are as follows:
 1A: A certainty of $1,000,000.
 1B: 90% chance of winning $5,000,000, 10% chance of winning nothing.
 2A: 50% chance of winning $1,000,000; 50% chance of winning $0.
 2B: 45% chance of winning $5,000,000; 55% chance of winning $0.
Now observe that Scenario 2 corresponds to a 50% chance of playing Scenario 1, and otherwise getting $0.
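That correspondence can be checked mechanically. The sketch below represents each gamble as a mapping from prize to probability and builds the compound gamble "50% chance of playing Scenario 1, otherwise $0"; the helper name `mix` and the dictionary representation are illustrative choices, not anything from the original experiment.

```python
from fractions import Fraction as F

def mix(p, first, second):
    """Compound lottery: play `first` with probability p, otherwise `second`."""
    combined = {}
    for lottery, weight in ((first, p), (second, 1 - p)):
        for prize, prob in lottery.items():
            combined[prize] = combined.get(prize, 0) + weight * prob
    return combined

# Gambles as {prize in dollars: probability}, using exact fractions.
GAMBLE_1A = {1_000_000: F(1)}
GAMBLE_1B = {5_000_000: F(9, 10), 0: F(1, 10)}
GAMBLE_2A = {1_000_000: F(1, 2), 0: F(1, 2)}
GAMBLE_2B = {5_000_000: F(45, 100), 0: F(55, 100)}
NOTHING = {0: F(1)}

# A fair coin decides between Scenario 1 and getting nothing.
half_1a = mix(F(1, 2), GAMBLE_1A, NOTHING)
half_1b = mix(F(1, 2), GAMBLE_1B, NOTHING)
print(half_1a == GAMBLE_2A, half_1b == GAMBLE_2B)
```

Exact fractions are used so that the equality test is not disturbed by floating-point rounding.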
This, in fact, is why the combination 1A > 1B; 2A < 2B is incompatible with expected utility. In terms of one set of axioms frequently used to describe expected utility, it violates the Independence Axiom: if a gamble \(L\) is preferred to \(M\) (that is, \(L > M\)), then we ought to be able to take a constant probability \(p > 0\) and another gamble \(N\) and have \(p \cdot L + (1 - p) \cdot N > p \cdot M + (1 - p) \cdot N\).
To put it another way, if I flip a coin to decide whether or not to play some entirely different game \(N\), but otherwise let you choose \(L\) or \(M\), you ought to make the same choice as if I just ask you whether you prefer \(L\) or \(M\). Your preference between \(L\) and \(M\) should be 'independent' of the possibility that, instead of doing anything whatsoever with \(L\) or \(M\), we will do something else instead.
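An expected utility maximizer satisfies this independence property automatically, because expected utility is linear in the mixing probability. Here is a minimal sketch that checks this numerically for the Allais gambles. The labels `L`, `M`, `N` for the two gambles and the unrelated alternative, and the three sample utility functions, are assumptions made purely for illustration.

```python
import math

def expected_utility(lottery, utility):
    """Probability-weighted average of utility over a {prize: probability} lottery."""
    return sum(prob * utility(prize) for prize, prob in lottery.items())

def mix(p, first, second):
    """Compound lottery: play `first` with probability p, otherwise `second`."""
    combined = {}
    for lottery, weight in ((first, p), (second, 1 - p)):
        for prize, prob in lottery.items():
            combined[prize] = combined.get(prize, 0.0) + weight * prob
    return combined

L = {1_000_000: 1.0}            # gamble 1A
M = {5_000_000: 0.9, 0: 0.1}    # gamble 1B
N = {0: 1.0}                    # the unrelated alternative: get nothing
p = 0.5

# Hypothetical utility functions over dollars, chosen only as examples.
UTILITIES = {"linear": float, "sqrt": math.sqrt, "log": math.log1p}

def gaps(utility):
    """Return (EU(L) - EU(M), EU(pL + (1-p)N) - EU(pM + (1-p)N))."""
    direct = expected_utility(L, utility) - expected_utility(M, utility)
    mixed = (expected_utility(mix(p, L, N), utility)
             - expected_utility(mix(p, M, N), utility))
    return direct, mixed

for name, utility in UTILITIES.items():
    direct, mixed = gaps(utility)
    print(f"{name}: direct gap = {direct:.4f}, mixed gap = {mixed:.4f}")
```

For every utility function the mixed gap is the direct gap scaled by `p`, so the two always have the same sign: mixing in the same alternative `N` on both sides can shrink a preference but cannot reverse it.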
And since this is an axiom of expected utility, any violation of that axiom ought to correspond to a dominated strategy somehow.
In the case of the Allais Paradox, we do the following:
First, I show you a switch that can be set to A or B, currently set to A.
In one minute, I tell you, I will flip a coin. If the coin comes up heads, you will get nothing. If the coin comes up tails, you will play the gamble from Scenario 1.
From your current perspective, that is, we are playing Scenario 2: since the switch is set to A, you have a 50% chance of getting nothing and a 50% chance of getting $1 million.
I ask you if you'd like to pay a penny to throw the switch from A to B. Since you prefer gamble 2B to 2A, and some quite large amounts of money are at stake, you agree to pay the penny. From your perspective, you now have a 55% chance of ending up with nothing and a 45% chance of getting $5M.
I then flip the coin, and luckily for you, it comes up tails.
From your perspective, you are now in Scenario 1B. Having observed the coin and updated on its state, you now think you have a 90% chance of getting $5 million and a 10% chance of getting nothing. By hypothesis, you would prefer a certainty of $1 million.
So I offer you a chance to pay another penny to flip the switch back from B to A. And with so much money at stake, you agree.
I have taken your two cents on the subject.
That is: You paid a penny to flip a switch and then paid another penny to switch it back, and this is dominated by the strategy of just leaving the switch set to A.
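The switch game can be simulated directly. The following is a minimal sketch, assuming a player who always acts on the Allais preferences (2B over 2A before the coin flip, 1A over 1B after it) and comparing that player with one who leaves the switch alone; the function name `play` and the bookkeeping in cents are illustrative.

```python
PENNY_CENTS = 1

def play(coin, follows_allais_preferences):
    """Return (final switch position, cents paid) for one round of the switch game."""
    switch, paid = "A", 0
    # Before the flip the player faces Scenario 2; Allais preferences favour 2B over 2A.
    if follows_allais_preferences:
        switch, paid = "B", paid + PENNY_CENTS
    if coin == "heads":
        # Heads ends the game with no prize, whatever the switch says.
        return switch, paid
    # After tails the player faces Scenario 1; Allais preferences favour 1A over 1B.
    if follows_allais_preferences:
        switch, paid = "A", paid + PENNY_CENTS
    return switch, paid

for coin in ("heads", "tails"):
    print(coin, "| Allais player:", play(coin, True), "| leave at A:", play(coin, False))
```

On tails both players end up facing gamble 1A, but the Allais player is two cents poorer; on heads neither wins anything and the Allais player is one cent poorer. In no branch does the switching do better, which is what "dominated" means here.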
And that's at least a glimpse of why, if you're not using dominated strategies, the thing you do with relative utilities is multiply them by probabilities in a consistent way, and prefer the choice that leads to a greater expectation of the variable representing utility.
From the Allais Paradox to real life
The real-life lesson about what to do when faced with Allais's dilemma might be something like this:
There's some amount that $1 million would improve your life compared to $0.
There's some amount that an additional $4 million would further improve your life after the first $1 million.
You ought to visualize these two improvements as best you can, and decide whether another $4 million can produce at least one-ninth as much improvement, as much true value to you, as the first $1 million.
If it can, you should consistently prefer 1B > 1A; 2B > 2A. And if not, you should consistently prefer 1A > 1B; 2A > 2B.
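The one-ninth threshold falls out of the expected-utility comparison once the value of $0 is set to zero. Here is a small sketch, where the two arguments are hypothetical numbers standing for how much the first $1 million and the next $4 million are worth to you in whatever units you like:

```python
from fractions import Fraction

def allais_choices(first_million_value, next_four_million_value):
    """Return the expected-utility choices (scenario 1, scenario 2), taking U($0) = 0."""
    u1 = Fraction(first_million_value)            # value of ending with $1 million
    u5 = u1 + Fraction(next_four_million_value)   # value of ending with $5 million
    # Scenario 1: certain $1M versus a 90% chance of $5M.
    choice_1 = "1B" if Fraction(9, 10) * u5 >= u1 else "1A"
    # Scenario 2: 50% chance of $1M versus a 45% chance of $5M.
    choice_2 = "2B" if Fraction(45, 100) * u5 >= Fraction(1, 2) * u1 else "2A"
    return choice_1, choice_2

print(allais_choices(9, 1))                    # next $4M worth exactly one-ninth
print(allais_choices(9, Fraction(99, 100)))    # next $4M worth slightly less
```

Whatever values are plugged in, the two scenarios get the same letter, and the switch from A to B happens exactly where the next $4 million is worth one-ninth of the first $1 million. Ties are resolved toward B here to match the "at least one-ninth" wording.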
The standard 'paradoxical' preferences in Allais's experiment are standardly attributed to a certainty effect: people value the certainty of having $1 million, while the difference between a 50% probability and a 55% probability looms less large. (And this ties in to a number of other results about certainty, need for closure, prospect theory, and so on.)
It may sound intuitive, in an Allais-like scenario, to say that you ought to derive some value from being certain about the outcome. In fact this is just the reasoning the experiment shows people to be using, so of course it might sound intuitive. But that does, inescapably, correspond to a kind of thinking that produces dominated strategies.
One possible excuse might be that certainty is valuable if you need to make plans about the future; knowing the exact future lets you make better plans. This is admittedly true and a phenomenon within expected utility, though it applies in a smooth way as confidence increases rather than jumping suddenly around 100%. But in the particular dilemma as described here, you only have 1 minute before the game is played, and no time to make other major life choices dependent on the outcome.
Another possible excuse for certainty bias might be to say: "Well, I value the emotional feeling of certainty."
In real life, we do have emotions that are directly about probabilities, and those little flashes of happiness or sadness are worth something if you care about people being happy or sad. If you say that you value the emotional feeling of being certain of getting $1 million, the freedom from the fear of getting $0, for the minute that the dilemma lasts and you are experiencing the emotion—well, that may just be a fact about what you value, even if it exists outside the expected utility formalism.
And this genuinely does not fit into the expected utility formalism. In an expected utility agent, probabilities are just thingies-you-multiply-utilities-by. If those thingies start generating their own utilities once represented inside the mind of the person who is an object of ethical value, you really are going to get results that are incompatible with the formal decision theory.
However, not being viewable as an expected utility agent does always correspond to employing dominated strategies. You are giving up something in exchange, if you pursue that feeling of certainty. You are potentially losing all the real value you could have gained from another $4 million, if that realized future actually would have gained you more than one-ninth the value of the first $1 million. Is a fleeting emotional sense of certainty over 1 minute, worth automatically discarding the potential $5-million outcome? Even if the correct answer given your values is that you properly ought to take the $1 million, treasuring 1 minute of emotional gratification doesn't seem like the wise reason to do that. The wise reason would be if the first $1 million really was worth that much more than the next $4 million.
The danger of saying, "Oh, well, I attach a lot of utility to that comfortable feeling of certainty, so my choices are coherent after all" is not that it's mathematically improper to value the emotions we feel while we're deciding. Rather, by saying that the most valuable stakes are the emotions you feel during the minute you make the decision, what you're saying is, "I get a huge amount of value by making decisions however humans instinctively make their decisions, and that's much more important than the thing I'm making a decision about." This could well be true for something like buying a stuffed animal. If millions of dollars or human lives are at stake, maybe not so much.
Conclusion
The demonstrations we've walked through here aren't the professional-grade coherence theorems as they appear in real math. Those have names like "Cox's Theorem" or "the complete class theorem"; their proofs are difficult; and they say things like "If seeing piece of information A followed by piece of information B leads you into the same epistemic state as seeing piece of information B followed by piece of information A, plus some other assumptions, I can show an isomorphism between those epistemic states and classical probabilities" or "Any decision rule for taking different actions depending on your observations either corresponds to Bayesian updating given some prior, or else is strictly dominated by some Bayesian strategy".
But hopefully you've seen enough concrete demonstrations to get a general idea of what's going on with the actual coherence theorems. We have multiple spotlights all shining on the same core mathematical structure, saying dozens of different variants on, "If you aren't running around in circles or stepping on your own feet or wantonly giving up things you say you want, we can see your behavior as corresponding to this shape. Conversely, if we can't see your behavior as corresponding to this shape, you must be visibly shooting yourself in the foot." Expected utility is the only structure that has this great big family of discovered theorems all saying that. It has a scattering of academic competitors, because academia is academia, but the competitors don't have anything like that mass of spotlights all pointing in the same direction.
So if we need to pick an interim answer for "What kind of quantitative framework should I try to put around my own decision-making, when I'm trying to check if my thoughts make sense?" or "By default and barring special cases, what properties might a sufficiently advanced machine intelligence look to us like it possessed, at least approximately, if we couldn't see it visibly running around in circles?", then there's pretty much one obvious candidate: Probabilities, utility functions, and expected utility.
Further reading
 To learn more about agents and AI: Consequentialist cognition; the orthogonality of agents' utility functions and capabilities; epistemic and instrumental efficiency; instrumental strategies sufficiently capable agents tend to converge on; properties of sufficiently advanced agents.
 To learn more about decision theory: The controversial counterfactual at the heart of the expected utility formula.
¹ It could be that somebody's pizza preference is real, but so weak that they wouldn't pay one penny to get the pizza they prefer. In this case, imagine we're talking about some stronger preference instead. Like your willingness to pay at least one penny not to have your house burned down, or something.
² This does assume that the agent prefers to have more money rather than less money. "Ah, but why is it bad if one person has a penny instead of another?" you ask. If we insist on pinning down every point of this sort, then you can also imagine the $0.01 as standing in for the time I burned in order to move the pizza slices around in circles. That time was burned, and nobody else has it now. If I'm an effective agent that goes around pursuing my preferences, I should in general be able to sometimes convert time into other things that I want. In other words, my circular preference can lead me to incur an opportunity cost denominated in the sacrifice of other things I want, and not in a way that benefits anyone else.
³ There are more than six possibilities if you think it's possible to be absolutely indifferent between two kinds of pizza.
⁴ We can omit the 'better doctors' item from consideration: The supply of doctors is mostly constrained by regulatory burdens and medical schools rather than the number of people who want to become doctors; so bidding up salaries for doctors doesn't much increase the total number of doctors; so bidding on a talented doctor at one hospital just means some other hospital doesn't get that talented doctor. It's also illegal to pay for livers, but let's ignore that particular issue with the problem setup or pretend that it all takes place in a more sensible country than the United States or Europe.
⁵ Or maybe a tiny bit less than , in case the coin lands on its edge or something.
⁶ Nothing we're walking through here is really a coherence theorem per se, more like intuitive arguments that a coherence theorem ought to exist. Theorems require proofs, and nothing here is what real mathematicians would consider to be a 'proof'.
⁷ In real life this leads to a problem of 'adversarial selection', where somebody who knows more about the environment than you can decide whether to buy or sell from you. To put it another way, from a Bayesian standpoint, if an intelligent counterparty is deciding whether to buy or sell from you a bet on , the fact that they choose to buy (or sell) should cause you to update in favor (or against) actually happening. After all, they wouldn't be taking the bet unless they thought they knew something you didn't!
⁸ The quick but advanced argument would be to say that the left-hand side must look like a singular matrix, whose determinant must therefore be zero.
80 comments
comment by johnswentworth · 2020-12-16T19:18:56.558Z
Things To Take Away From The Essay
First and foremost: Yudkowsky makes absolutely no mention whatsoever of the VNM utility theorem. This is neither an oversight nor a simplification. The VNM utility theorem is not the primary coherence theorem. It's debatable whether it should be considered a coherence theorem at all.
Far and away the most common mistake when arguing about coherence (at least among a technically-educated audience) is for people who've only heard of VNM to think they know what the debate is about. Looking at the top-voted comments on this essay:
 the first links to a post which argues against VNM on the basis that it assumes probabilities and preferences are already in the model
 the second argues that two of the VNM axioms are unrealistic
I expect that if these two commenters read the full essay, and think carefully about how the theorems Yudkowsky is discussing differ from VNM, then their objections will look very different.
So what are the primary coherence theorems, and how do they differ from VNM? Yudkowsky mentions the complete class theorem in the post, Savage's theorem comes up in the comments, and there are variations on these two and probably others as well. Roughly, the general claim these theorems make is that any system either (a) acts like an expected utility maximizer under some probabilistic model, or (b) throws away resources in a pareto-suboptimal manner. One thing to emphasize: these theorems generally do not assume any pre-existing probabilities (as VNM does); an agent's implied probabilities are instead derived. Yudkowsky's essay does a good job communicating these concepts, but doesn't emphasize that this is different from VNM.
One more common misconception which this essay quietly addresses: the idea that every system can be interpreted as an expected utility maximizer. This is technically true, in the sense that we can always pick a utility function which is maximized under whatever outcome actually occurred. And yet... Yudkowsky gives multiple examples in which the system is not a utility maximizer. What's going on here?
The coherence theorems implicitly put some stronger constraints on how we're allowed to "interpret" systems as utility maximizers. They assume the existence of some resources, and talk about systems which are pareto-optimal with respect to those resources - e.g. systems which "don't throw away money". Implicitly, we're assuming that the system generally "wants" more resources, and we derive the system's "preferences" over everything else (including things which are not resources) from that. The agent "prefers" X over Y if it expends resources to get from Y to X. If the agent reaches a world-state which it could have reached with strictly less resource expenditure in all possible worlds, then it's not an expected utility maximizer - it "threw away money" unnecessarily.
(Side note: as in Yudkowsky's hospital-administrator example, we need not assume that the agent "wants" more resources as a terminal goal; the agent may only want more resources in order to exchange them for something else. The theorems still basically work, so long as resources can be spent for something the agent "wants".)
Of course, we can very often find things which work like "resources" for purposes of the theorems even when they're not baked-in to the problem. For instance, in thermodynamics, energy and momentum work like resources, and we could use the coherence theorems to talk about systems which don't throw away energy and/or momentum in a pareto-suboptimal manner. Biological cells are a good example: presumably they make efficient use of energy, as well as other metabolic resources, therefore we should expect the coherence theorems to apply.
Some Problems With (Known) Coherence Theorems
Financial markets are the ur-example of inexploitability and pareto efficiency (in the same sense as the coherence theorems). They generally do not throw away resources in a pareto-suboptimal manner, and this can be proven for idealized mathematical markets. And yet, it turns out that even an idealized market is not equivalent to an expected utility maximizer, in general. (Economists call this "nonexistence of a representative agent".) That's a pretty big red flag.
The problem, in this case, is that the coherence theorems implicitly assume that the system has no internal state (or at least no relevant internal state). Once we allow internal state, subagents matter - see the essay "Why Subagents?" for more on that.
Another pretty big red flag: real systems can sometimes "change their mind" for no outwardly-apparent reason, yet still be pareto efficient. A good example here is a bookie with a side channel: when the bookie gets new information, the odds update, even though "from the outside" there's no apparent reason why the odds are changing - the outside environment doesn't have access to the side channel. The coherence theorems discussed here don't handle such side channels. Abram has talked about more general versions of this issue (including logical uncertainty connections) in his essays on Radical Probabilism.
An even more general issue, which Abram also discusses in his Radical Probabilism essays: while the coherence theorems make a decent argument for probabilistic beliefs and expected utility maximization at any one point in time, the coherence arguments for how to update are much weaker than the other arguments. Yudkowsky talks about conditional probability in terms of conditional bets - i.e. bets which only pay out when a condition triggers. That's fine, and the coherence arguments work for that use-case. The problem is, it's not clear that an agent's belief-update when new information comes in must be equivalent to these conditional bets.
Finally, there's the assumption that "resources" exist, and that we can use tradeoffs with those resources in order to work out implied preferences over everything else. I think instrumental convergence provides a strong argument that this will be the case, at least for the sorts of "agents" we actually care about (i.e. agents which have significant impact on the world). However, that's not an argument which is baked into the coherence theorems themselves, and there's some highly nontrivial steps to make the argument.
Side-Note: Probability Without Utility
At this point, it's worth noting that there are foundations for probability which do not involve utility or decision theory at all, and I consider these foundations much stronger than the coherence theorems. Frequentism is the obvious example. Another prominent example is information theory and the minimum description length foundation of probability theory.
The most fundamental foundation I know of is Cox's theorem, which is more of a meta-foundation explaining why the same laws of probability drop out of so many different assumptions (e.g. frequencies, bets, minimum description length, etc).
However, these foundations do not say anything at all about agents or utilities or expected utility maximization. They only talk about probabilities.
Towards A Better Coherence Theorem
As I see it, the real justification for expected utility maximization is not any particular coherence theorem, but rather the fact that there's a wide variety of coherence theorems (and some other kinds of theorems, and empirical results) which all seem to point in a similar direction. When that sort of thing happens, it's a pretty strong clue that there's something fundamental going on. I think the "real" coherence theorem has yet to be discovered.
What features would such a theorem have?
Following the "Why Subagents?" argument, it would probably prove that a system is equivalent to a market of expected utility maximizers rather than a single expected utility maximizer. It would handle side-channels. It would derive the notion of an "update" on incoming information.
As a starting point in searching for such a theorem, probably the most important hint is that "resources" should be a derived notion rather than a fundamental one. My current best guess at a sketch: the agent should make decisions within multiple loosely-coupled contexts, with all the coupling via some low-dimensional summary information - and that summary information would be the "resources". (This is exactly the kind of setup which leads to instrumental convergence.) By making pareto-resource-efficient decisions in one context, the agent would leave itself maximum freedom in the other contexts. In some sense, the ultimate "resource" is the agent's action space. Then, resource tradeoffs implicitly tell us how the agent is trading off its degree of control within each context, which we can interpret as something-like-utility.
↑ comment by rohinmshah · 2021-01-13T19:02:56.850Z
 the first links to a post which argues against VNM on the basis that it assumes probabilities and preferences are already in the model
I assume this is my comment + post; I'm not entirely sure what you mean here. Perhaps you mean that I'm not modeling the world as having "external" probabilities that the agent has to handle; I agree that is true, but that is because in the use case I'm imagining (looking at the behavior of an AI system and determining what it is optimizing) you don't get these "external" probabilities.
I expect that if these two commenters read the full essay, and think carefully about how the theorems Yudkowsky is discussing differ from VNM, then their objections will look very different.
I assure you I read this full post (well, the Arbital version of it) and thought carefully about it before making my post; my objections remain. I discussed VNM specifically because that's the best-understood coherence theorem and the one that I see misused in AI alignment most often. (That being said, I don't know the formal statements of other coherence theorems, though I predict with ~98% confidence that any specific theorem you point me to would not change my objection.)
Yes, if you add in some additional detail about resources, assume that you do not have preferences over how those resources are used, and assume that there are preferences over other things that can be affected using resources, then coherence theorems tell you something about how such agents act. This doesn't seem all that relevant to the specific, narrow setting which I was considering.
I agree that coherence arguments (including VNM) can be useful, for example by:
 Helping people make better decisions (e.g. becoming more comfortable with taking risk)
 Reasoning about what AI systems would do, given stronger assumptions than the ones I used (e.g. if you assume there are "resources" that the AI system has no preferences over).
Nonetheless, within AI alignment, prior to my post I heard the VNM argument being misused all the time (by rationalists / LWers, less so by others); this has gone down since then but still happens.
I think e.g. this talk is sneaking in the "resources" assumption, without arguing for it or acknowledging its existence, and this often misleads people (including me) into thinking that AI risk is implied by math based on very simple axioms that are hard to disagree with.

On the review: I don't think this post should be in the Alignment section of the review, without a significant rewrite / addition clarifying why exactly coherence arguments are useful or important for AI alignment. As such I will vote against it.
I would however support it being part of the non-Alignment section of the review; as I've said before, I generally really like coherence arguments and they influence my own decision-making a lot (in fact, a big part of the reason I started working in AI alignment was thinking about the Arrhenius paradox, which has a very similar coherence flavor).
↑ comment by johnswentworth · 2021-01-14T21:39:38.035Z
I assume this is my comment + post
I was referring mainly to Richard's post here. You do seem to understand the issue of assuming (rather than deriving) probabilities.
I discussed VNM specifically because that's the best-understood coherence theorem and the one that I see misused in AI alignment most often.
This I certainly agree with.
I don't know the formal statements of other coherence theorems, though I predict with ~98% confidence that any specific theorem you point me to would not change my objection.
Exactly which objection are you talking about here?
If it's something like "coherence theorems do not say that tool AI is not a thing", that seems true. Even today humans have plenty of useful tools with some amount of information processing in them which are probably not usefully modelable as expected utility maximizers.
But then you also make claims like "all behavior can be rationalized as EU maximization", which is wildly misleading. Given a system, the coherence theorems map a notion of resources/efficiency/outcomes to a notion of EU maximization. Sure, we can model any system as an EU maximizer this way, but only if we use a trivial/uninteresting notion of resources/efficiency/outcomes. For instance, as you noted, it's not very interesting when "outcomes" refers to "universe-histories". (Also, the "preferences over universe-histories" argument doesn't work as well when we specify the full counterfactual behavior of a system, which is something we can do quite well in practice.)
Combining these points: your argument largely seems to be "coherence arguments apply to any arbitrary system, therefore they don't tell us interesting things about which systems are/aren't <agenty/dangerous/etc>". (That summary isn't exactly meant to pass an ITT, but please complain if it's way off the mark.) My argument is that coherence theorems do not apply nontrivially to any arbitrary system, so they could still potentially tell us interesting things about which systems are/aren't <agenty/dangerous/etc>. There may be good arguments for why coherence theorems are the wrong way to think about goal-directedness, but "everything can be viewed as EU maximization" is not one of them.
Yes, if you add in some additional detail about resources, assume that you do not have preferences over how those resources are used, and assume that there are preferences over other things that can be affected using resources, then coherence theorems tell you something about how such agents act. This doesn't seem all that relevant to the specific, narrow setting which I was considering.
Just how narrow a setting are you considering here? Limited resources are everywhere. Even an e-coli needs to efficiently use limited resources. Indeed, I expect coherence theorems to say nontrivial things about an e-coli swimming around in search of food (and this includes the possibility that the nontrivial things the theorem says could turn out to be empirically wrong, which in turn would tell us nontrivial things about e-coli and/or selection pressures, and possibly point to better coherence theorems).
↑ comment by rohinmshah · 2021-01-14T22:28:21.078Z
Exactly which objection are you talking about here?
If it's something like "coherence theorems do not say that tool AI is not a thing", that seems true.
Yes, I think that is basically the main thing I'm claiming.
But then you also make claims like "all behavior can be rationalized as EU maximization", which is wildly misleading.
I tried to be clear that my argument was "you need more assumptions beyond just coherence arguments on universe-histories; if you have literally no other assumptions then all behavior can be rationalized as EU maximization". I think the phrase "all behavior can be rationalized as EU maximization" or something like it was basically necessary to get across the argument that I was making. I agree that taken in isolation it is misleading; I don't really see what I could have done differently to prevent there from being something that in isolation was misleading, while still being able to point out the-thing-that-I-believe-is-fallacious. Nuance is hard.
(Also, it should be noted that you are not in the intended audience for that post; I expect that to you the point feels obvious enough so as not to be worth stating, and so overall it feels like I'm just being misleading. If everyone were similar to you I would not have bothered to write that post.)
Also, the "preferences over universe-histories" argument doesn't work as well when we specify the full counterfactual behavior of a system, which is something we can do quite well in practice.
I agree that if you have counterfactual behavior EU maximization is not vacuous. I don't think that this meaningfully changes the upshot (which is "coherence arguments, by themselves without any other assumptions on the structure of the world or the space of utility functions, do not imply AI risk"). It might meaningfully change the title of the post (perhaps they do imply goal-directed behavior in some sense), though in that case I'd change the title to "Coherence arguments do not imply AI risk" and I think it's effectively the same post.
Mostly though, I'm wondering how exactly you use counterfactual behavior in an argument for AI risk. Like, the argument I was arguing against is extremely abstract, and just claims that the AI is "intelligent" / "coherent". How do you use that to get counterfactual behavior for the AI system?
I agree that for any given AI system, we could probably gain a bunch of knowledge about its counterfactual behavior, and then reason about how coherent it is and how goal-directed it is. But this is a fundamentally different thing than the thing I was talking about (which is just: can we abstractly argue for AI risk without talking about details of the system beyond "it is intelligent"?)
My argument is that coherence theorems do not apply nontrivially to any arbitrary system, so they could still potentially tell us interesting things about which systems are/aren't <agenty/dangerous/etc>.
I agree with this.
There may be good arguments for why coherence theorems are the wrong way to think about goal-directedness, but "everything can be viewed as EU maximization" is not one of them.
I actually also agree with this, and was not trying to argue that coherence arguments are irrelevant to "goal-directedness" or "being a good agent"; I've already mentioned that I personally do things differently thanks to my knowledge of coherence arguments.
Just how narrow a setting are you considering here? Limited resources are everywhere. Even an E. coli needs to efficiently use limited resources. Indeed, I expect coherence theorems to say nontrivial things about an E. coli swimming around in search of food (and this includes the possibility that the nontrivial things the theorem says could turn out to be empirically wrong, which in turn would tell us nontrivial things about E. coli and/or selection pressures, and possibly point to better coherence theorems).
I agree that if you take any particular system and try to make predictions, the necessary assumptions (such as "what counts as a limited resource") will often be easy and obvious and the coherence theorems do have content in such situations. It's the abstract argument that feels flawed to me.
I somewhat expect your response will be "why would anyone be applying coherence arguments in such a ridiculously abstract way rather than studying a concrete system", to which I would say that you are not in the intended audience.

Fwiw thinking this through has made me feel better about including it in the Alignment book than I did before, though I'm still overall opposed. (I do still think it is a good fit for other books.)
Replies from: johnswentworth↑ comment by johnswentworth · 20210114T23:21:51.790Z · LW(p) · GW(p)
I somewhat expect your response will be "why would anyone be applying coherence arguments in such a ridiculously abstract way rather than studying a concrete system", to which I would say that you are not in the intended audience.
Ok, this is a fair answer. I think you and I, at least, are basically aligned here.
I do think a lot of people took away from your post something like "all behavior can be rationalized as EU maximization", and in particular I think a lot of people walked away with the impression that usefully applying coherence arguments to systems in our particular universe is much more rare/difficult than it actually is. But I can't fault you much for some of your readers not paying sufficiently close attention [LW · GW], especially when my review at the top of this thread is largely me complaining about how people missed nuances in this post.
Replies from: Benito↑ comment by Ben Pace (Benito) · 20210115T00:43:58.181Z · LW(p) · GW(p)
(Once again, great use of that link)
↑ comment by ESRogs · 20210113T20:04:10.484Z · LW(p) · GW(p)
On the review: I don't think this post should be in the Alignment section of the review, without a significant rewrite / addition clarifying why exactly coherence arguments are useful or important for AI alignment.
Assuming that one accepts the arguments against coherence arguments being important for alignment (as I tentatively do), I don't see why that means this shouldn't be included in the Alignment section.
The motivation for this post was its relevance to alignment. People think about it in the context of alignment. If subsequent arguments indicate that it's misguided, I don't see why that means it shouldn't be considered (from a historical perspective) to have been in the alignment stream of work (along with the arguments against it).
(Though, I suppose if there's another category that seems like a more exact match, that seems like a fine reason to put it in that section rather than the Alignment section.)
Does that make sense? Is your concern that people will see this in the Alignment section, and not see the arguments against the connection, and continue to be misled?
Replies from: johnswentworth, rohinmshah↑ comment by johnswentworth · 20210114T20:50:20.760Z · LW(p) · GW(p)
I actually think it shouldn't be in the alignment section, though for different reasons than Rohin. There's lots of things which can be applied to AI, but are a lot more general, and I think it's usually better to separate the "here's the general idea" presentation from the "here's how it applies to AI" presentation. That way, people working on other interesting things can come along and notice the idea and try to apply it in their own area rather than getting scared off by the label.
For instance, I think there's probably gains to be had from applying coherence theorems to biological systems. I would love it if some rationalist biologist came along, read Yudkowsky's post, and said "wait a minute, cells need to make efficient use of energy/limited molecules/etc, can I apply that?". That sort of thing becomes less likely if this sort of post is hiding in "the alignment section".
Zooming out further... today, alignment is the only technical research area with a lot of discussion on LW, and I think it would be a near-Pareto improvement if more such fields were drawn in. Taking things which are alignment-relevant-but-not-just-alignment and lumping them all under the alignment heading makes that less likely.
Replies from: ESRogs↑ comment by rohinmshah · 20210113T23:31:48.644Z · LW(p) · GW(p)
It seems weird to include a post in the book if we believe that it is misguided, just because people historically believed it. If I were making this book, I would not include such posts; I'd want an "LW Review" to focus on things that are true and useful, rather than historically interesting.
That being said, I haven't thought much about the goals of the book, and if we want to include posts for the sake of history, then sure, include the post. That was just not my impression about the goal.
Is your concern that people will see this in the Alignment section, and not see the arguments against the connection, and continue to be misled?
I would have this concern, yes, but I'm happy to defer (in the sense of "not pushing", rather than the sense of "adopting their beliefs as my own") to the opinions of the people who have thought way more than me about the purpose of this review and the book, and have caused it to happen. If they are interested in including historically important essays that we now think are misguided, I wouldn't object. I predict that they would prefer not to include such essays but of course I could be wrong about that.
↑ comment by Vaniver · 20210114T22:52:36.623Z · LW(p) · GW(p)
I like this comment, but I feel sort of confused about it as a review instead of an elaboration. Yes, coherence theorems are very important, but did people get it from this post? To the extent that comments are evidence, they look like no, the post didn't quite make it clear to them what exactly is going on here.
↑ comment by Ben Pace (Benito) · 20201216T19:31:58.626Z · LW(p) · GW(p)
No need to think about editing at this point, we'll sort out all editing issues after the review. (And for this specific issue, all hyperlinks in the books have been turned into readable footnotes, which works out just fine in the vast majority of cases.)
comment by rohinmshah · 20190512T22:16:36.002Z · LW(p) · GW(p)
Obligatory: Coherence arguments do not imply goal-directed behavior [? · GW]
Also Coherent behaviour in the real world is an incoherent concept [AF · GW]
comment by Said Achmiz (SaidAchmiz) · 20190512T22:47:03.915Z · LW(p) · GW(p)
(Note: This comment mostly concerns the material in the first three sections of the post. I have not yet read, but only skimmed, the section titled “Probabilities and expected utility”. It seems to cover material I am familiar with, but I will read it in detail when I have more time.)
Eliezer, you speak here of reasons why an agent ought to behave as if its preferences satisfy the transitivity axiom, specifically; you discuss circular preferences, and their unfortunate effects, as the consequences of transitivity violations. You also discuss the independence axiom in the latter half of the post. You have discussed reasons to accept these two axioms before, in the Sequences.
However, the Von Neumann–Morgenstern utility theorem (the most commonly used, as far as I am aware, formalization of decision-theoretic utility) has four axioms; and an agent’s preferences must satisfy all four in order for a utility function to be constructable from them.
It so happens that the two axioms you do not discuss are precisely the two axioms that I (and many economists; see below) find most suspect. The case for transitivity is obvious; the case for independence is not obvious but nonetheless reasonably solid. What I should like to see you discuss is the case for continuity and, especially, for completeness.
Why “especially” completeness? Well, Robert Aumann had this to say on the matter:
Of all the axioms of utility theory, the completeness axiom is perhaps the most questionable.[8] Like others of the axioms, it is inaccurate as a description of real life; but unlike them, we find it hard to accept even from the normative viewpoint.
(From his 1962 paper “Utility Theory without the Completeness Axiom”.)
Also relevant is “Expected Utility Theory without the Completeness Axiom” (Dubra et al., 2001):
Before stating more carefully our goal and the contribution thereof, let us note that there are several economic reasons why one would like to study incomplete preference relations. First of all, as advanced by several authors in the literature, it is not evident if completeness is a fundamental rationality tenet the way the transitivity property is. Aumann (1962), Bewley (1986) and Mandler (1999), among others, defend this position very strongly from both the normative and positive viewpoints.
(See a previous comment of mine [LW(p) · GW(p)] for a longer quote.)
What unfortunate consequences follow from violations of completeness or continuity? What coherence theorems dictate the acceptance of those axioms? These, it seems to me, would make useful topics for any further posts along these lines…
Replies from: SaidAchmiz, adelelopez1, mikkelwilson, Sherrinford↑ comment by Said Achmiz (SaidAchmiz) · 20190513T03:12:22.202Z · LW(p) · GW(p)
Incidentally, there are also reasons for hesitation to accept the independence axiom [LW(p) · GW(p)].
↑ comment by Adele Lopez (adelelopez1) · 20190513T01:27:09.759Z · LW(p) · GW(p)
For continuity, it's reasonable to assume this because all computable functions are continuous. See theorem 4.4 of https://eccc.weizmann.ac.il/resources/pdf/ica.pdf
Edit: I realized that the continuity assumption is different (though related) from assuming the utility function is continuous. My guess is that computability is still a good justification for this, but I'd have to check that that actually follows.
Replies from: Chris_Leong, SaidAchmiz↑ comment by Chris_Leong · 20190513T04:19:36.345Z · LW(p) · GW(p)
"Because all computable functions are continuous": how does this make any sense? Why can't I just pick a value x=1 and, if its left limit and right limit are p, set the function to p+1 at x=1?
Replies from: Richard_Kennaway↑ comment by Richard_Kennaway · 20190513T21:05:25.591Z · LW(p) · GW(p)
Because equality of (computable) real numbers is uncomputable. So is calculating the limit of an infinite sequence of them.
In more detail: a computable real number must be represented by a Turing machine (or your favorite Turing-equivalent model) that generates some representation of it as a possibly infinite string. Equality of the infinite output of two Turing machines is uncomputable.
In fact, picking a representation for "computable" real numbers, and implementing basic arithmetic on them is nontrivial. The usual decimal or binary strings of digits won't work.
Replies from: Chris_Leong↑ comment by Chris_Leong · 20190514T02:09:04.395Z · LW(p) · GW(p)
Hmm, I'm still not following. Limits are uncomputable in general, but I just need one computational function where I know the limits at one point and then I can set it to p+1 instead. Why wouldn't this function still be computable? Maybe "computable function" is being defined differently than I would expect.
Replies from: Richard_Kennaway↑ comment by Richard_Kennaway · 20190514T16:04:01.281Z · LW(p) · GW(p)
To compute that function for an unknown argument x, you would have to determine whether x is equal to 1. But if real numbers are encoded as infinite strings, there is no way to tell whether x=1 in finite time. If x happens to be 1, then however long an initial segment of that representation you examined, you could never be sure that x was not very slightly different from 1. In the usual decimal representation, suppose you see 1.00000...: if the number is greater than 1 you will eventually know that, but if the zeroes go on forever, you can never know it. Similarly if you see 0.99999....
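A minimal sketch of that point, under simplifying assumptions that are not in the comment: the integer part is fixed at 1, a number is given only by a generator of its digits after the decimal point, and `one`, `one_plus_tiny`, and `looks_like_one` are hypothetical names. (As noted above, plain decimal strings are not a workable representation for full computable arithmetic; they are used here only to illustrate the equality problem.)

```python
from itertools import islice

def one():
    """Digits after the decimal point of 1.000..."""
    while True:
        yield 0

def one_plus_tiny(position):
    """Digits of a number slightly above 1: a single 1 at `position`, zeros elsewhere."""
    def digits():
        index = 0
        while True:
            yield 1 if index == position else 0
            index += 1
    return digits

def looks_like_one(digit_stream, budget):
    """Inspect only the first `budget` digits; True means 'no difference seen so far'."""
    return all(d == 0 for d in islice(digit_stream(), budget))

# With a budget of 1000 digits, exactly 1 and 1 + 10**-5001 are indistinguishable.
same_so_far = looks_like_one(one, 1000) and looks_like_one(one_plus_tiny(5000), 1000)
```

Any finite budget can be defeated by placing the nonzero digit later, so a procedure that must halt on every input cannot return p+1 exactly at x=1 and something else everywhere nearby.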
I'm not sure how relevant this is to the original context, see other replies to Adele Lopez's ancestor comment.
Replies from: Chris_Leong↑ comment by Chris_Leong · 20190514T18:06:29.991Z · LW(p) · GW(p)
Okay, so there is an additional assumption that these strings are all encoded as infinite sequences. Instead, they could be encoded with a system that starts by listing the number of digits, or -1 if the sequence is infinite, and then provides those digits. That's a pretty key property to not mention (then again, I can't criticise too much as I was too lazy to read the PDF). Thanks for the explanation!
↑ comment by Said Achmiz (SaidAchmiz) · 20190513T03:23:06.711Z · LW(p) · GW(p)
This seems to be a non sequitur.
Suppose it were true that preferences that violate the continuity axiom imply a utility function that is uncomputable. (This hardly seems worse or less convenient than the case—which is, in fact, the actual state of affairs—where continuity violations imply that no utility function can represent one’s preferences, computable or otherwise… but let’s set that aside, for now.)
How would this constitute a reason to have one’s preferences conform to the continuity axiom…?
Replies from: adelelopez1↑ comment by Adele Lopez (adelelopez1) · 20190513T04:19:44.364Z · LW(p) · GW(p)
Presumably, any agent which we manage to build will be computable. So to the extent our agent is using utility functions, they will be continuous.
If an agent is only capable of computable observations, but has a discontinuous utility function, then if the universe is in a state where the utility function is discontinuous, the agent will need to spend an infinite amount of time (or as long as the universe state remains at such a point) determining the utility of the current state. I think it might be possible to use this to create a more concrete exploit.
Replies from: SaidAchmiz↑ comment by Said Achmiz (SaidAchmiz) · 20190513T04:29:08.503Z · LW(p) · GW(p)
Presumably, any agent which we manage to build will be computable. So to the extent our agent is using utility functions, they will be continuous.
There are several objections one could make to this line of reasoning. Here are two.
First: do you believe that we humans are uncomputable? If we are uncomputable, then it is clearly possible to construct an uncomputable agent. If, conversely, we are computable, then whatever reasoning you apply to an agent we build can be applied to us as well. Do you think it does apply to us?
Second: supposing your reasoning holds, why should it not be a reason for our constructed agent not to use utility functions, rather than a reason for said agent to have continuous preferences?
(This is a good time to mention, again, that this entire tangent is moot, as violating the continuity axiom—or any of the axioms—means that no utility function, computable or not, can be constructed from your preferences. But even if that weren’t the case, the above objections apply.)
↑ comment by MikkW (mikkelwilson) · 20200321T04:54:57.658Z · LW(p) · GW(p)
As for completeness, I struggle to see any practical difference between being unwilling to choose between two outcomes, and finding them equally acceptable (which is allowed by completeness).
Or, one can imagine someone relentlessly flopping between two highly (un)desirable outcomes because they are unwilling to settle, and I think it's obvious what the problem there is.
Replies from: SaidAchmiz↑ comment by Said Achmiz (SaidAchmiz) · 20200321T08:35:12.448Z · LW(p) · GW(p)
Have you read the papers I linked (or the more directly relevant papers cited by those)? What do you think about Aumann’s commentary on this question, for instance?
↑ comment by Sherrinford · 20190513T08:29:13.730Z · LW(p) · GW(p)
While I don't find completeness so problematic, I got quite confused by Eliezer's post. Firstly, it would make much more sense to first explain what "utility" is, in the sense that it is used here. Secondly, the justification of transitivity is common, but using a word like "dominated strategy" there does not make much sense, because you can only evaluate strategies if you know the utility functions (and it also mixes up words). Thirdly, it's necessary to discuss all axioms and their implications. For example, in standard preferences theory under certainty, it's possible to have preferences that are complete and transitive but you cannot get a utility function from. Fourthly, I am still confused whether this talk about expected utility is only normative or also a positive description of humans, or kinda both.
Replies from: SaidAchmiz↑ comment by Said Achmiz (SaidAchmiz) · 20190513T19:18:48.482Z · LW(p) · GW(p)
Firstly, it would make much more sense to first explain what “utility” is, in the sense that it is used here.
He is referring to decision-theoretic utility, in the sense in which the term is used in economics and game theory.
For example, in standard preferences theory under certainty, it’s possible to have preferences that are complete and transitive but you cannot get a utility function from.
Such (“lexicographic”) preferences violate the continuity axiom.
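A minimal sketch of why lexicographic preferences fail continuity, under assumptions that are not in the comment: outcomes are pairs of numbers, a lottery is ranked by lexicographic comparison of its expected outcome vector, and `mix` and `prefers` are hypothetical helpers. Continuity would require some probability p at which the mixture of A and C is exactly as good as the intermediate option B; here no such p exists.

```python
from fractions import Fraction

def mix(p, a, c):
    """Expected outcome vector of the lottery 'a with probability p, otherwise c'."""
    return tuple(p * x + (1 - p) * y for x, y in zip(a, c))

def prefers(a, b):
    """Strict lexicographic preference; Python compares tuples lexicographically."""
    return a > b

A, B, C = (1, 0), (0, 1), (0, 0)  # A is preferred to B, and B to C

# Search a grid of probabilities for a point of indifference between the mixture and B.
probabilities = [Fraction(k, 1000) for k in range(1001)]
indifferent_at = [
    p for p in probabilities
    if not prefers(mix(p, A, C), B) and not prefers(B, mix(p, A, C))
]
```

The grid is only an illustration, but the reason is general: the mixture is (p, 0), which beats B for every p > 0 and loses to B at p = 0, so the preference jumps without ever passing through indifference.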
Fourthly, I am still confused whether this talk about expected utility is only normative or also a positive description of humans, or kinda both.
Eliezer is definitely speaking normatively; none of the VNM axioms reliably apply to humans in a descriptive sense. Eliezer is concerned with the design of artificial agents, for which task it is necessary to determine what axioms their preferences ought to conform to (among other things).
comment by johnswentworth · 20201202T18:46:23.178Z · LW(p) · GW(p)
I don't particularly like dragging out the old coherence discussions, but the annual review is partly about building common knowledge, so it's the right time to bring it up.
This currently seems to be the canonical reference post on the subject. On the one hand, I think there are major problems/missing pieces with it. On the other hand, looking at the top "objection"-style comment (i.e. Said's), it's clear that the commenter didn't even finish reading the post and doesn't understand the pieces involved. I think this is pretty typical among people who object to coherence results: most of them have only dealt with the VNM theorem, and correctly complain about the assumptions of that theorem being too strong, but don't know about the existence of all the other coherence theorems (including the complete class theorem mentioned in the post, and Savage's theorem mentioned in the comments). The "real" coherence theorems do have problems with them, but they're not the problems which a lot of people point to in VNM.
I'll leave a more detailed review later. The point of this nomination is to build common knowledge: I'd like to get to the point where the objections to coherence theorems are the right objections, rather than objections based in ignorance, and this post (and reviews of it) seem like a good place for that.
comment by Richard_Kennaway · 20190513T20:58:43.346Z · LW(p) · GW(p)
Meta: I'm unclear about the context in which this post is to be read, and its purpose. Googling phrases from below the fold tells me that it appears both here and on Arbital, although there is no indication here or there that this is a crosspost, and Arbital posts are not routinely posted here. It includes a link to a blog post of 2018, so appears to be of recent composition, but reads like a posting from the Sequences now long in the past, and I am not sure it contains any ideas not present there. It begins in medias res ("So, we're talking..."), yet does not refer back to any of the implied predecessors.
I notice that I am confused.
Replies from: SaidAchmiz↑ comment by Said Achmiz (SaidAchmiz) · 20190513T22:33:49.608Z · LW(p) · GW(p)
Huh, you’re right: this is just a repost of an Arbital article.
I must say I feel rather cheated. When I saw this, I was under the impression that Eliezer had composed this post for Less Wrong, and had posted it to Less Wrong; I assumed that there was therefore some chance he might respond to comments. But that seems not to be the case. (Is it even Eliezer who posted it? Or someone else using his account, as happened, IIRC, with Inadequate Equilibria?)
I, too, would like to know what the purpose of this post is.
Replies from: RobbBB, Benquo↑ comment by Rob Bensinger (RobbBB) · 20190514T03:44:21.316Z · LW(p) · GW(p)
I asked Eliezer if it made sense to crosspost this from Arbital, and did the crossposting when he approved. I'm sorry it wasn't clear that this was a crosspost! I intended to make this clearer, but my idea was bad (putting the information on the sequence page [? · GW]) and I also implemented it wrong (the sequence didn't previously display on the top of this post).
This post was originally written as a nontechnical introduction to expected utility theory and coherence arguments. Although it begins in medias res stylistically, it doesn't have any prereqs or context beyond "this is part of a collection of introductory resources covering a wide variety of technical and semi-technical topics."
Per the first sentence, the main purpose is for this to be a linkable resource for conversations/inquiry about human rationality and conversations/inquiry about AGI:
So we're talking about how to make good decisions, or the idea of 'bounded rationality', or what sufficiently advanced Artificial Intelligences might be like; and somebody starts dragging up the concepts of 'expected utility' or 'utility functions'. And before we even ask what those are, we might first ask, Why?
There have been loose plans for a while to crosspost content from Arbital to LW (maybe all of it; maybe just the best or most interesting stuff), but as I mentioned downthread [LW(p) · GW(p)], we're doing more crosspost experiments sooner than we would have because Arbital's been having serious performance issues.
Replies from: SaidAchmiz↑ comment by Said Achmiz (SaidAchmiz) · 20190514T03:57:59.435Z · LW(p) · GW(p)
I see, thanks. That does explain things.
Some questions occur to me, which I don’t expect you necessarily to answer at once, but hope you (and/or whoever is responsible for the Arbital content or the decisions to post it to LW) will consider:

1. In your opinion, does this post (still? ever?) work well as a “linkable resource for conversations about human rationality and … AGI”?
2. Are there plans (by Eliezer, or by anyone else) to revise this content? Or is it meant to stand unchanged, as a matter of “historical interest” only, so to speak?
3. Relatedly to #2, is it productive to engage with this post, by commenting, discussing, critiquing? (In any sense other than “it’s fun and/or personally edifying to do so”?) That is: is there anyone “on the other end”, so to speak, who might read (and possibly even participate in) such discussions, and take action (such as writing an updated version of this material, to pick a simple example) as a result?
4. For whom is this post intended, and by whom? Whose purposes does it serve, whom is it meant to benefit, and who may reasonably judge whether it is serving its purpose?
↑ comment by Benquo · 20190514T01:30:41.722Z · LW(p) · GW(p)
Presumably to keep morale up by making it look like the rightful Caliph is still alive and producing output.
Replies from: Raemon↑ comment by Raemon · 20190514T02:22:46.190Z · LW(p) · GW(p)
I believe the intention was for this post to appear as part of a sequence that more clearly situated it as part of a series of reposts from Arbital, but there were some mixups that made the sequence title not show up by default. I agree the current implementation is confusing.
comment by habryka (habryka4) · 20190628T18:27:59.916Z · LW(p) · GW(p)
Promoted to curated: This is a pretty key post that makes an argument that I think has been implicit in a lot of things on LessWrong for a long time, but hasn't actually been made this explicitly.
I do actually think that in the act of making it explicit, I've started to agree with some of the commenters that there is something missing in this argument (in particular as Said pointed out the treatment of the completeness axiom). It's not necessarily the case that I disagree with the conclusion, but I still think covering those arguments is something I would want someone to spend serious time on.
However, overall I still think this post does an exceptionally good job at introducing utility functions as a core abstraction in rationality, and expect it to be something I reference for a long time to come.
comment by Sniffnoy · 20191021T01:06:15.455Z · LW(p) · GW(p)
So this post is basically just collecting together a bunch of things you previously wrote in the Sequences, but I guess it's useful to have them collected together.
I must, however, take objection to one part. The proper noncircular foundation you want for probability and utility is not the complete class theorem, but rather Savage's theorem, which I previously wrote about on this website [LW · GW]. It's not short, but I don't think it's too inaccessible.
Note, in particular, that Savage's theorem does not start with any assumption baked in that R is the correct system of numbers to use for probabilities[0], instead deriving that as a conclusion. The complete class theorem, by contrast, has real numbers in the assumptions.
In fact (and it's possible I'm misunderstanding) it's not even clear to me that the complete class theorem does what you claim it does, at all. It seems to assume probability at the outset, and therefore cannot provide a grounding for probability. Unlike Savage's theorem, which does. Again, it's possible I'm misunderstanding, but that sure seems to be the case.
Now this has come up here before [LW(p) · GW(p)] (I'm basically in this comment just restating things I've previously written) and your reply when I previously pointed out some of these issues was, frankly, nonsensical (your reply [LW(p) · GW(p)], my reply [LW(p) · GW(p)]), in which you claimed that the statement that one's preferences form a partial preorder is a stronger assumption than "one prefers more apples to less apples", when, in fact, the exact reverse is the case.
(To restate it for those who don't want to click through: If one is talking solely about one's preferences over number of apples, then the statement that more is better immediately yields a total preorder. And if one is talking about preferences not just over number of apples but in general, then... well, it's not clear how what you're saying applies directly; and taken less literally, it just in general seems to me that the complete class theorem is making some very strong assumptions, much stronger than that of merely a total preorder (e.g., real numbers!).)
In short the use of the complete class theorem here in place of Savage's theorem would appear to be an error and I think you should correct it.
[0]Yes, it includes an Archimedean assumption, which you could argue is the same thing as baking in R; but I'd say it's not, because this Archimedean assumption is a direct statement about the agent's preferences, whereas it's not immediately clear what picking R as your number system means as a statement about the agent's preferences.
comment by Said Achmiz (SaidAchmiz) · 20190512T22:05:13.861Z · LW(p) · GW(p)
Meta: (some of?) the linked Arbital pages do not seem to work. For example, https://arbital.com/p/probability_theory/ shows me a blank page:
(There was also some sort of red box with something about a “pipeline error” or something, but it disappeared.)
I am using Chrome 74.0.3729.131 (the latest as of this writing) on a Mac.
Replies from: RobbBB, Ruby↑ comment by Rob Bensinger (RobbBB) · 20190514T02:01:21.742Z · LW(p) · GW(p)
Arbital has been getting increasingly slow and unresponsive. The LW team is looking for fixes or workarounds, but they aren't familiar with the Arbital codebase. In the meantime, I've been helping crosspost some content from Arbital to LW so it's available at all.
Replies from: SaidAchmiz, jimrandomh↑ comment by Said Achmiz (SaidAchmiz) · 20190514T02:21:52.511Z · LW(p) · GW(p)
Is it possible to create, and make available, a dump of the Arbital content? I’ve no doubt that there are people who’d be willing to host the entire thing, or convert it en masse into another format, etc.
Edit: Actually, if you could just post a complete list of Arbital page names, I could extract the content myself, as the API to request page content seems sufficiently straightforward.
Replies from: RobbBB↑ comment by Rob Bensinger (RobbBB) · 20190514T19:46:56.205Z · LW(p) · GW(p)
We'd talked about getting a dump out as well, and your plan sounds great to me! The LW team should get back to you with a list at some point (unless they think of a better idea).
↑ comment by jimrandomh · 20190514T02:32:20.616Z · LW(p) · GW(p)
While we have a long-term plan of importing Arbital's content into LessWrong (after LessWrong acquires some wiki-like features to make it make sense), we have not taken responsibility for the maintenance of Arbital itself.
Replies from: RobbBB↑ comment by Rob Bensinger (RobbBB) · 20190514T03:34:04.620Z · LW(p) · GW(p)
I assume you mean 'no one has this responsibility for Arbital anymore', and not that there's someone else who has this responsibility.
comment by SuddenCaution · 20190513T09:15:53.591Z · LW(p) · GW(p)
I find it confusing that the only thing that matters to a rational agent is the expectation of utility, i.e., that the details of the probability distribution of utilities do not matter.
I understand that the VNM theorem proves that from what seem reasonable axioms, but on the other hand it seems to me that there is nothing irrational about having different risk preferences. Consider the following two scenarios:
 A: you gain utility 1 with probability 1
 B: you gain utility 0 with probability 1/2 or utility 2 with probability 1/2
According to expected utility, it is irrational to be anything but indifferent between A and B. This seems wrong to me. I can even go a bit further, consider a third option:
 C: you gain utility 0.9 with probability 1
Expected utility says it is irrational to prefer C to B, but this seems perfectly reasonable to me. It's optimizing for the worst case instead of the average case. Is there a direct way of showing that preferring C to B is irrational?
Replies from: TheMajor, dxu, SaidAchmiz, Slider↑ comment by TheMajor · 20190513T14:14:24.615Z · LW(p) · GW(p)
This is part of the meaning of 'utility'. In real life we often have risk-averse strategies where, for example, 100% chance at 100 dollars is preferred to 50% chance of losing 100 dollars and 50% chance of gaining 350 dollars. But, under the assumption that our risk-averse tendencies satisfy the coherence properties from the post, this simply means that our utility is not linear in dollars. As far as I know this captures most of the situations where risk aversion comes into play: often you simply cannot tolerate extremely negative outliers, meaning that your expected utility is mostly dominated by some large negative terms, and the best possible action is to minimize the probability that these outcomes occur.
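A minimal sketch of that dollar example, under assumptions that are not in the comment: utility is the natural logarithm of final wealth, starting wealth is 200 dollars, and `expected_utility` and `expected_dollars` are hypothetical helpers. Any sufficiently concave utility would serve the same purpose; the logarithm is just a convenient choice.

```python
import math

def expected_utility(lottery, utility, wealth):
    """Sum of probability * utility(final wealth) over the lottery's outcomes."""
    return sum(p * utility(wealth + payoff) for p, payoff in lottery)

def expected_dollars(lottery):
    """Plain expected monetary payoff of the lottery."""
    return sum(p * payoff for p, payoff in lottery)

sure_thing = [(1.0, 100)]
gamble = [(0.5, -100), (0.5, 350)]
wealth = 200  # assumed starting wealth; must exceed 100 so the logarithm is defined

eu_sure = expected_utility(sure_thing, math.log, wealth)
eu_gamble = expected_utility(gamble, math.log, wealth)

# The gamble has the higher expected dollar value (125 versus 100),
# yet the sure thing has the higher expected log-utility.
prefers_sure_thing = eu_sure > eu_gamble
```

Under these assumptions the agent is an expected utility maximizer and still turns down the gamble, which is the sense in which risk aversion over dollars does not by itself conflict with the coherence properties.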
Also there is the following: consider the case where you are repeatedly offered bets of the example you give (B versus C). You know this in advance, and are allowed to redesign your decision theory from scratch (but you cannot change the definition of 'utility' or the bets being offered). What criteria would you use to determine if B is preferable to C? The law of large numbers(/central limit theorem) states that in the long run with probability 1 the option with higher expected value will give you more utilons, and in fact that this number is the only number you need to figure out which option is the better pick in the long run.
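A minimal sketch of the repeated-bet claim for B versus C, computed exactly rather than by simulation. The function name `prob_b_beats_c` is hypothetical, and the payoffs are taken to be in utility units as in the parent comment: B pays 2 with probability 1/2 and 0 otherwise, C pays 0.9 for sure.

```python
from math import comb

def prob_b_beats_c(n):
    """Exact probability that n plays of B total more utility than n plays of C.

    With k wins out of n, B totals 2*k and C totals 0.9*n,
    so B comes out ahead exactly when 20*k > 9*n.
    """
    favourable = sum(comb(n, k) for k in range(n + 1) if 20 * k > 9 * n)
    return favourable / 2 ** n

single_shot = prob_b_beats_c(1)     # one play: B wins only on the lucky half
many_shots = prob_b_beats_c(1000)   # many plays: B almost always comes out ahead
```

For a single play B comes out ahead only half the time, while for large n the probability approaches 1; this is the long-run behaviour the law of large numbers describes, and it leaves open the one-shot question raised in the reply.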
The tricky bit is the question whether this also applies to one-shot problems or not. Maybe there are rational strategies that use, say, the aggregate median instead of the expected value, which has the same limit behaviour. My intuition is that this clashes with what we mean by 'probability': even if this particular problem is a one-off, at least our strategy should generalise to all situations where we talk about probability 1/2, and then the law of large numbers applies again. I also suspect that any agent that uses more information than the expected value to make this decision (in particular, one that occasionally deliberately chooses the option with lower expected utility) can be cheated out of utilons with clever adversarial selections of offers, but this is just a guess.
Replies from: SuddenCaution↑ comment by SuddenCaution · 20190513T18:08:07.684Z · LW(p) · GW(p)
The tricky bit is the question whether this also applies to one-shot problems or not.
This is the crux. It seems to me that the expected utility framework means that if you prefer A to B in a one-time choice, then you must also prefer n repetitions of A to n repetitions of B, because the fact that you have larger variance for n=1 does not matter. This seems intuitively wrong to me.
Replies from: Pattern↑ comment by Pattern · 20190524T17:39:00.961Z · LW(p) · GW(p)
I'd hold that it's the reverse that seems more questionable. If n is a large number then the Law of Large Numbers may be applicable ("the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.").
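To make the dependence on n concrete, here is a minimal sketch that computes exactly how often n repetitions of B end up ahead of n repetitions of C, using the payoffs from the example upthread. The function name `prob_b_beats_c` is made up for this illustration.

```python
from math import comb

def prob_b_beats_c(n):
    """P(total of n draws of B > total of n draws of C).

    B pays 0 or 2 with probability 1/2 each; C pays 0.9 for certain.
    With k wins, B totals 2*k and C totals 0.9*n, so B is ahead iff 20*k > 9*n.
    """
    favourable = sum(comb(n, k) for k in range(n + 1) if 20 * k > 9 * n)
    return favourable / 2 ** n

for n in (1, 10, 100, 1000):
    print(n, prob_b_beats_c(n))
```

For n=1 the chance is exactly one half, and it climbs towards 1 as n grows. That is the Law of Large Numbers at work, though by itself it does not settle what to do in the one-shot case.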
↑ comment by Said Achmiz (SaidAchmiz) · 20190513T15:39:35.802Z · LW(p) · GW(p)
Robyn Dawes makes a more detailed version of precisely this argument in Rational Choice in an Uncertain World. I summarize his argument in an old comment of mine [LW(p) · GW(p)]. (The axiom you must reject, incidentally, if you find this sort of reasoning convincing, is the independence axiom.)
Replies from: SuddenCaution↑ comment by SuddenCaution · 20190513T18:01:37.337Z · LW(p) · GW(p)
Thanks, I looked at the discussion you linked with interest. I think I understand my confusion a little better, but I am still confused.
I can walk through the proof of the VNM theorem and see where the independence axiom comes in and how it leads to u(A)=u(B) in my example. The axiom of independence itself feels unassailable to me and I am not quite sure this is a strong enough argument against it. Maybe having a more direct argument from axiom of independence to unintuitive result would be more convincing.
Maybe the answer is to read Dawes's book; thanks for the reference.
Replies from: SaidAchmiz↑ comment by Said Achmiz (SaidAchmiz) · 20190513T19:14:12.073Z · LW(p) · GW(p)
The axiom of independence itself feels unassailable to me
Well, the axiom of independence is just that: an axiom. It doesn’t need to be assailed; we can take it as axiomatic, or not. If we do take it as axiomatic, certain interesting analyses become possible (depending on what other axioms we adopt). If we refuse to do so, then bad things happen—or so it’s claimed.
In any case, Dawes’ argument (and related ones) about the independence axiom fundamentally concerns the question of what properties of an outcome distribution we should concern ourselves with. (Here “outcome distribution” can refer to a probability distribution, or to some set of outcomes, distributed across time, space, individuals, etc., that is generated by some policy, which we may perhaps view as the output of a generator with some probability distribution.)
A VNM-compliant agent behaves as if it is maximizing the expectation of the utility of its outcome distribution. It is not concerned at all with other properties of that distribution, such as dispersion (i.e., standard deviation or some related measure) or skewness. (Or, to put it another way, a VNM-compliant agent is unconcerned with the form of the outcome distribution.)
What Dawes is saying is simply that, contra the assumptions of VNM-rationality, there seems to be ample reason to concern ourselves with, for instance, the skewness of the outcome distribution, and not just its expectation. But if we do prefer one outcome distribution to another, where the dispreferred distribution has a higher expectation (but a “better” skewness), then we violate the independence axiom.
Replies from: SuddenCaution↑ comment by SuddenCaution · 20190513T19:37:25.319Z · LW(p) · GW(p)
I get what you are saying. You have convinced me that the following two statements are contradictory:
 Axiom of Independence: preferring A to B implies preferring ApC to BpC for any p and C.
 The variance and higher moments of utility matter, not just the expected value.
My confusion is that intuitively it seems both must be true for a rational agent, but I guess my intuition is just wrong.
Thanks for your comments, they were very illuminating.
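The clash between the two statements can be checked on a small example. The sketch below assumes a hypothetical agent that scores a lottery over utilities as mean minus variance; the scoring rule, the penalty of 1, and the particular lotteries A, B and C are all choices made for this illustration, not taken from Dawes or from the post.

```python
def mix(p, first, second):
    """Compound lottery: `first` with probability p, otherwise `second`."""
    out = {}
    for weight, lottery in ((p, first), (1 - p, second)):
        for outcome, prob in lottery.items():
            out[outcome] = out.get(outcome, 0.0) + weight * prob
    return out

def mean_variance_score(lottery, penalty=1.0):
    """Hypothetical rule: expected utility minus a penalty times variance."""
    mean = sum(p * u for u, p in lottery.items())
    var = sum(p * (u - mean) ** 2 for u, p in lottery.items())
    return mean - penalty * var

A = {1.0: 1.0}            # utility 1 for certain
B = {0.0: 0.5, 4.0: 0.5}  # higher mean, much higher variance
C = {10.0: 1.0}           # the lottery mixed into both sides

# On their own, A scores higher than B...
print(mean_variance_score(A), mean_variance_score(B))
# ...but after mixing each 50/50 with C, the order flips.
print(mean_variance_score(mix(0.5, A, C)), mean_variance_score(mix(0.5, B, C)))
```

This agent prefers A to B but prefers the B-mixture to the A-mixture, so caring about variance in this way is incompatible with the independence axiom as stated.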
↑ comment by Slider · 20190513T17:25:18.884Z · LW(p) · GW(p)
I think you are not allowed to refer explicitly to utility in the options. That is, an option of "I do not choose this option" is self-defeating and ill-formed. In another post [LW(p) · GW(p)] I posited a risk-averse utility function that references the amount of paperclips. Maximising the utility function doesn't maximise the expected amount of paperclips. Even if the physical objects of interest are paperclips and we value them linearly, a paperclip is not synonymous with a utilon. It's not a thing you can give out in an option.
Replies from: SuddenCaution↑ comment by SuddenCaution · 20190513T18:19:58.077Z · LW(p) · GW(p)
I think you are not allowed to refer explicitly to utility in the options.
I was going to answer that I can easily reword my example to not explicitly mention any utility values, but when I tried to do that it very quickly led to something where it is obvious that u(A) = u(C). I guess my rewording was basically going through the steps of the proof of the VNM theorem.
I am still not sure I am convinced by your objection, as I don't think there's anything self-referential in my example, but that did give me some pause.
Replies from: Slider↑ comment by Slider · 20190515T13:15:59.417Z · LW(p) · GW(p)
In a case where you are going to pick less variance and less expected value over more variance and more expected value, it will mean that that option needs to have a bigger "utility number". In order to get that you need to mess with how utility is calculated. Then it becomes ambiguous whether the "utility-fruits" are redefined in the same go as we redefine how we compare options. If we name them "paperclips" it's clear that they are not touched by such redefining.
It triggered a "type-unsafety" trigger, but the operation overall might be safe as it doesn't actualise the danger. For example, having an option of "plum + 2 utility" could give one agent "plum + apple" if it valued apples and "plum + pear" if it valued pears. I guess if you consistently replace all physical items with their utility values it doesn't happen.
In the case of "gain 1 utility with probability 1", if your agent is risk-seeking it might give this option "actual" utility less than 1. In general, if we lose the distribution independence we might need to retain the information of our suboutcomes rather than collapsing it to be a single number. For if an agent is risk-seeking it's clear that it would prefer A=(5% 0, 90% 1, 5% 2) to B=(100% 1). But the same risk-seeking in combined lotteries would make it prefer C=(5% , 90% A, 5% A+A) over A. When comparing C and A it's not sufficient to know that their expected utilities are 1.
comment by Zvi · 20201218T15:35:11.710Z · LW(p) · GW(p)
The problem with evaluating a post like this is that long post is long and slow and methodical, and making points that I (and I'm guessing most others who are doing the review process) already knew even at the time it was written in 2017. So it's hard to know whether the post 'works' at doing the thing it is trying to do, and also hard to know whether it is an efficient means of transmitting that information.
Why can't the post be much shorter and still get its point across? Would it perhaps even get the point across better if it was much shorter, because eyes would not glaze over? It's really hard for me to tell. There definitely seems, to me, to be a (relatively) very short post here that seems like it would be much clearer to the me that did not yet know this than this post is, because it's boiling things down better. But that same person also likely could derive the whole result from the title, perhaps with the words "else dutch book QED" attached.
I'd also feel better about linking to a shorter post than this, as pointing here seems like a big ask, and also makes it harder for people to quickly conclude they do in fact know this already, if they know this already.
Another worry is that by being this long and careful, the absence of other things seems like missing pieces, in a way that it wouldn't in a shorter post, I think?
And it also relies heavily on the Dutch Book argument, in ways that realistic agents often have defenses against exactly because they're not coherent enough to go without them. Dutch Book attacks seem like the easy mode where proving everything is trivial via proof by contradiction, and therefore it kind of excludes all the 'interesting' cases? As in all we need is: "If not consistent utilities then inconsistent utilities so there exist two inconsistent utilities so Dutch Book so not coherent QED" and then we extend to probability by saying "If probabilities don't add to 1 then Dutch Book over all probabilities, selling a 1 payout for all possibilities for 1+epsilon if they sum to >1 and buying the 1 payout for 1-epsilon otherwise for some epsilon, then repeat, so again QED same way." Is that unfair? If so, why? Actually want to know.
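The probability half of that compressed argument can be sketched in a few lines. This is a simplified illustration, not the post's own construction: it assumes the agent will buy or sell a ticket paying 1 on each outcome at exactly its stated probability (so epsilon is taken to be zero), and the function name `dutch_book_profit` is made up.

```python
from fractions import Fraction as F

def dutch_book_profit(prices):
    """Guaranteed bookie profit against an agent with the given ticket prices.

    prices[i] is what the agent treats as a fair price for a ticket paying 1
    if outcome i occurs. The outcomes are assumed mutually exclusive and
    exhaustive, so exactly one ticket in a full set pays out.
    """
    total = sum(prices)
    if total > 1:
        return total - 1   # sell the agent the full set, then pay out 1
    if total < 1:
        return 1 - total   # buy the full set from the agent, then collect 1
    return F(0)            # prices sum to 1: no sure profit from this trick

print(dutch_book_profit([F(3, 5), F(3, 5)]))  # over-priced: sums to 6/5
print(dutch_book_profit([F(2, 5), F(2, 5)]))  # under-priced: sums to 4/5
print(dutch_book_profit([F(1, 2), F(1, 2)]))  # sums to exactly 1
```

Whichever outcome occurs, the bookie's profit is the same, which is what makes the loss "guaranteed" rather than merely expected.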
Does that short version exist? If not, should I just write it (better than I wrote it way too quickly here)?
The objections raised in the comments don't seem to hold any weight and also seem (as John confirms) to be mostly arguing against something different than what the post says.
So I guess my question would then be, who got the central point from this post, that didn't already have it, and what was that experience like? What parts of this seemed needed for that to happen versus not needed?
comment by Vaniver · 20210114T23:51:45.729Z · LW(p) · GW(p)
Incidentally, a handful of things have crossed my path at the same time, such that I think I have a better explanation for the psychology underlying the Allais Paradox. [I'm not sure this will seem new, but something about the standard presentation seems to be not giving it the emphasis it deserves, or speaking generally instead of particularly.]
The traditional explanation is that you're paying for certainty, which has some value (typically hugely overestimated). But I think 'certainty' should really be read as something more like "not being blameworthy." That is, connect it to handicapping so that you have an excuse for poor performance. The person who picks 1B and loses knows that they missed out on a certain $1M, whereas the person who picks 1A can choose to focus their attention on the possibility of losing the $1M they did get instead of the $4M they might have had and don't.
As Matt Levine puts it,
I admit that I occasionally envy the people who bought Bitcoin early for nothing and are now billionaires and retired. One thing that soothes this envy is reading about people who bought Bitcoin early for nothing and are now theoretical centimillionaires but lost their private keys and can’t access the money. I may have no Bitcoins, but at least I haven’t misplaced a fortune in Bitcoins.
"At least I haven't misplaced a fortune in Bitcoins"! Or, in other words, two different ways to "gain $0" with different Us.
[For what it's worth, I think this sort of "protecting yourself against updates" is mostly a mistake, and think it's better to hug reality as closely as possible, which means paying more attention to your mistakes instead of less, and being more open to making them instead of less. I think seeing the obstacles more clearly makes them easier to overcome.]
Replies from: Unnamed
comment by Zack_M_Davis · 20201212T22:07:17.142Z · LW(p) · GW(p)
This is the second nomination in order to get this in the official Review pool, in order for John S. Wentworth's future "more detailed review" [LW(p) · GW(p)] to be in the official Review pool.
comment by Chris_Leong · 20190513T04:29:45.463Z · LW(p) · GW(p)
My understanding of the arguments against using a utility maximiser is that proponents accept that this will lead to suboptimal or dominated outcomes, but they are happy to accept this because they believe that these AIs will be easier to align. This seems like a completely reasonable tradeoff to me. For example, imagine that choosing option A is worth 1 utility. Option B is worth 1.1 utility if 100 mathematical statements are all correct, but -1000 otherwise (we are ignoring the costs of reading through and thinking about all 100 mathematical statements). Even if each of the statements seems obviously correct, there is a decent chance that you messed up on at least 1 of them, so you'll most likely want to take the outside view and pick option A. So I don't think it's necessarily an issue if the AI is doing things that are obviously stupid from an inside view.
comment by Richard_Ngo (ricraz) · 20210107T15:46:40.154Z · LW(p) · GW(p)
It seems to me that there has been enough unanswered criticism of the implications of coherence theorems for making predictions about AGI that it would be quite misleading to include this post in the 2019 review.
In an earlier review, johnswentworth argues:
I think instrumental convergence provides a strong argument that...we can use tradeoffs with those resources in order to work out implied preferences over everything else, at least for the sorts of "agents" we actually care about (i.e. agents which have significant impact on the world).
I think this is a reasonable point, but also a very different type of argument from Eliezer's argument, since it relies on things like economic incentives. Instead, when Eliezer critiques Paul's concept of corrigibility, he says things like "deference is an unusually anti-natural shape for cognition [LW · GW]". How do coherence theorems translate to such specific claims about the "shape of cognition"; and why is grounding these theorems in "resources" a justifiable choice in this context? These are the types of follow-up arguments which seem necessary at this point in order for further promotion of this post to be productive rather than harmful.
Replies from: ESRogs, DanielFilan↑ comment by ESRogs · 20210113T20:13:53.944Z · LW(p) · GW(p)
It seems to me that there has been enough unanswered criticism of the implications of coherence theorems for making predictions about AGI that it would be quite misleading to include this post in the 2019 review.
If the post is the best articulation of a line of reasoning that has been influential in people's thinking about alignment, then even if there are strong arguments against it, I don't see why that means the post is not significant, at least from a historical perspective.
By analogy, I think Searle's Chinese Room argument is wrong and misleading, but I wouldn't argue that it shouldn't be included in a list of important works on philosophy of mind.
Would you (assuming you disagreed with it)? If not, what's the difference here?
(Put another way, I wouldn't think of the review as a collection of "correct" posts, but rather as a collection of posts that were important contributions to our thinking. To me this certainly qualifies as that.)
Replies from: ricraz, TAG↑ comment by Richard_Ngo (ricraz) · 20210113T23:07:56.407Z · LW(p) · GW(p)
Your argument is plausible. On the other hand, this review is for 2019, not 2017 (when this post was written) nor 2013 (when this series of ideas was originally laid out [LW · GW]). So it seems like it should reflect our current-ish thinking.
I note that the page for the review doesn't have anything about voting criteria. This seems like something of an oversight?
↑ comment by DanielFilan · 20210112T19:50:15.394Z · LW(p) · GW(p)
How do coherence theorems translate to such specific claims about the "shape of cognition"; and why is grounding these theorems in "resources" a justifiable choice in this context?
It occurs to me that one plausible answer here is that cognition requires computational resources, and therefore effective cognition will generically involve trading off these resources in a way that does not reliably lose them.
But my more relevant response is that in that section I don't see Eliezer saying that coherence theorems are the justification for his claim about the anti-naturalness of deference.
Replies from: ricraz↑ comment by Richard_Ngo (ricraz) · 20210113T18:01:15.876Z · LW(p) · GW(p)
I don't see Eliezer saying that coherence theorems are the justification for his claim about the anti-naturalness of deference.
If coherence theorems are consistent with deference being "natural", then I'm not sure what argument Eliezer is trying to make in this post, because then couldn't they also be consistent with other deontological cognition being natural, and therefore likely to arise in AGIs?
effective cognition will generically involve trading off these resources in a way that does not reliably lose them
In principle, maybe. In practice, if we'd been trying to predict how monkeys will evolve, what does this claim imply about human-monkey differences?
comment by Tetraspace Grouping (tetraspacegrouping) · 20201213T02:15:30.728Z · LW(p) · GW(p)
I have used this post quite a few times as a citation when I want to motivate the use of expected utility theory as an ideal for making decisions, because it explains how it's not just an elegant decision-making procedure from nowhere but a mathematical inevitability of the requirements to not leave money on the table or to accept guaranteed losses. I find the concept of coherence theorems a better foundation than the normal way this is explained, by pointing at the von Neumann-Morgenstern axioms and saying "they look true".
comment by Dan Tobias (dantobias) · 20190514T11:44:51.746Z · LW(p) · GW(p)
The hypothetical person with circular preferences in where to be reminds me of the hero of The Phantom Tollbooth, Milo, whose own location preferences are described this way: "When he was in school he longed to be out, and when he was out he longed to be in. On the way he thought about coming home, and coming home he thought about going. Wherever he was he wished he were somewhere else, and when he got there he wondered why he'd bothered."
comment by Tyrrell_McAllister · 20190711T19:29:41.229Z · LW(p) · GW(p)
Typo: "And that's why the thingies you multiply probabilities by—the thingies that you use to weight uncertain outcomes in your imagination,"
Here, "probabilities" should be "utilities".
comment by orthonormal · 20190514T00:01:56.252Z · LW(p) · GW(p)
Formatting request: can the footnote numbers be augmented with links that jump to the footnote text? (I presume this worked in Arbital but broke when it was moved here.)
comment by romeostevensit · 20190513T20:49:40.250Z · LW(p) · GW(p)
I have to trade off the cost of following high complexity decision theory against the risk of being dominated*the badness of being dominated.
comment by Chris_Leong · 20190513T04:15:54.303Z · LW(p) · GW(p)
"Is a fleeting emotional sense of certainty over 1 minute, worth automatically discarding the potential $5 million outcome?" I know it's mostly outside of what is being modelled here, but I suspect that someone who takes the 90% bet and wins nothing might experience much more than just a fleeting sense of disappointment, much more than someone who takes the 45% chance and doesn't win.
comment by Samuel Hapák (hleumas) · 20190512T22:43:16.428Z · LW(p) · GW(p)
There is one other explanation for the results of those experiments.
In the real world, it's quite uncommon that somebody tells you exact probabilities; no, you need to infer them from the situation around you. And we, the people, pretty much suck at assigning numeric values to probabilities. When I say 99%, it probably means something like 90%. When I say 90%, I'd guess 70% corresponds to that.
But that doesn't mean that people behave irrationally. If you view the proposed scenarios through the described lens, it's more like:
a) Certainty of a million or a ~60% chance of getting 5 million.
b) Slightly higher probability of getting a million but the difference is much smaller than the actual error in the estimation of probabilities themselves.
With this in mind, the actual behaviour of people makes much more sense.
Replies from: adelelopez1↑ comment by Adele Lopez (adelelopez1) · 20190513T01:35:02.312Z · LW(p) · GW(p)
I think you're right that this is part of where the intuition comes from. But it's still irrational in a context where you actually know the probabilities accurately enough.
Replies from: hleumas↑ comment by Samuel Hapák (hleumas) · 20190516T08:28:30.537Z · LW(p) · GW(p)
True, but that’s usually a very artificial context. Often when someone claims they know the probabilities accurately enough, they are mistaken or lying.
comment by TAG · 20190522T13:19:24.417Z · LW(p) · GW(p)
The food preference example is rather self-defeating. Most people don't mechanically and predictably choose X over Y and Z when all are available... they also have preferences for variety, trying new things, impressing people they are with, and so on. People whose preferences are both predictable and incoherent can be gamed... but that doesn't mean everyone has coherent preferences, because coherent preferences need to be defined against a limited framework (without randomness or meta-preferences)... and because having messy, unpredictable preferences protects you against being gamed as well as predictable and coherent preferences do.
A similar pattern emerges when considering ethics. Under the assumption that ethics=utilitarianism, the ethical person needs to have consistent preferences... but the assumption is doing a lot of the lifting. Utilitarianism is a latecomer, and WEIRD. Most people run on a mishmash of virtue theory and deontology.
Provable consistency and provable inconsistency both require unrealistically precise and predictable behaviour. There's an argument that if you are going to be precise and predictable, you should also be coherent, but it doesn't show that people actually have UFs.
comment by Slider · 20190513T15:59:44.960Z · LW(p) · GW(p)
The "damage" from shooting your own foot is defined in terms of the utility number.
Say I pick a dominated strategy that nets me 2 apples and the dominating strategy nets me 3 apples. If on another level of modelling I can know that the first apples are clean and the 2 apples in the dominating arrangement have worms I might be happy to be dominated. Apple-level damage is okay (while nutritional-level damage might not be). All deductive results are tautologies but "if you can't model the agent as trying to achieve goal X then it's inefficient at achieving X" seems very far from "incoherent agents are stupid".
Replies from: SaidAchmiz↑ comment by Said Achmiz (SaidAchmiz) · 20190513T19:01:37.940Z · LW(p) · GW(p)
If some of the apples are clean and others have worms, then that is modeled in your preference ordering: you prefer clean apples to wormy ones, perhaps at some exchange rate, etc. We then stipulate that all the apples are clean (or all are wormy, or all have an equal chance of being clean vs. wormy, etc.), and the analysis proceeds as before.
That said, your general point is worth exploring. If we suppose, as Eliezer says, that
Alice … prefers having more fruit to less fruit, ceteris paribus, for each category of fruit
… and if we further suppose that her preferences are intransitive, then we conclude that Alice’s strategy is strictly dominated by some other.
That is—Alice’s strategy is strictly dominated in terms of apples (or fruit in general). It can’t be dominated in utility, of course, because we cannot construct a utility function from Alice’s preferences (on account of their intransitivity)!
Well, and so what? Is this bad according to Alice’s own preferences? Can we show this? How would we do that? By asking Alice whether she prefers the outcome (5 apples and 1 orange) to the initial state (8 apples and 1 orange)? But what good is that? If Alice’s preferences are circular, then it’s entirely possible (in fact, it’s true) that the outcome (5 apples and 1 orange) both dominates, and is dominated by, the initial state (8 apples and 1 orange).
(More accurately, that’s true if we’re permitted to say that if strategy X dominates Y, and Y dominates Z, then X dominates Z. It’s not possible for an agent to prefer X to Y and, simultaneously, Y to X, however intransitive their preferences are, if they still obey the completeness axiom. Of course, if an agent’s preferences are intransitive and incomplete, then it can prefer X to Y, and also Y to X.)
The point is this: it’s not so easy to show that an agent’s strategy is suboptimal according to its own preferences if those preferences violate the axioms. We can gesture at some intuitive considerations like “well, that’s obviously stupid”, but these amount to little more than the fact that we find the violated axioms intuitively attractive in the given case.
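For concreteness, here is a minimal sketch of what "dominated in terms of apples" looks like mechanically. The three-item preference cycle and the fee of one apple per swap are hypothetical choices for this illustration and are not taken from the post; only the 8-apple starting point and 5-apple end point match the figures quoted above.

```python
# Hypothetical cycle: Alice will always give up the key item for the value item.
TRADES_UP_TO = {"orange": "pear", "pear": "plum", "plum": "orange"}

def run_cycle(item, apples, swaps, fee=1):
    """Apply `swaps` trades, each costing `fee` apples (an assumed fee)."""
    for _ in range(swaps):
        item = TRADES_UP_TO[item]
        apples -= fee
    return item, apples

print(run_cycle("orange", 8, 3))  # one full trip around the cycle
```

After one trip around the cycle Alice holds the same orange and three fewer apples. Whether that counts as a loss by Alice's own lights is exactly the question raised above; the sketch only shows the apple-level bookkeeping.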
Replies from: Slider↑ comment by Slider · 20190513T20:27:44.694Z · LW(p) · GW(p)
I was thinking of another agent judging my strategies and making a backed argument why I am wrong. If someone said "you were suboptimal on the fruit front, I fixed that mistake for you" and I arrived at a table with 2 wormy apples, I would be annoyed/pissed. I am assuming that the other agent can't evaluate their cleanness; it's all fruit to them. Moreover it might be that wormy apples are rare and, observing my trade activity, it might be inductively well supported that I seem to value "fruit-maximization" a great deal (nutrition maximisation with clean fruit is just fruit maximisation). And it might be important to understand that he didn't mean to cause wormy apples (he isn't even capable of meaning that) but his actions might have in fact caused it.
In the case that wormy apples are frequent, the hypothesis that I am a fruit-maximiser is violated clearly enough that he knows he is on shaky ground modelling me as a fruit-maximiser. Some very unskilled traders might confuse one type of fruit with another and be inconsistent because they can't get their fruit categories straight. At some mid-skill level "fruit-maximisation" peaks, and those that don't understand things beyond that point will confuse those that are yet to get to fruit-maximization with those that are past it. Expecting superintelligent things to be consistent kind of assumes that if a metric ever becomes a good goal, higher levels will never be weaker on that metric: that maximisation strictly grows and never decreases with ability for all submetrics.