# Math appendix for: "Why you must maximize expected utility"

post by Benya (Benja) · 2012-12-13T01:11:20.015Z · score: 8 (13 votes) · 4 comments

This is a mathematical appendix to my post "Why you must maximize expected utility", giving precise statements and proofs of some results about von Neumann-Morgenstern utility theory without the Axiom of Continuity. I wish I had the time to make this post more easily readable, giving more intuition; the ideas are rather straightforward and I hope they won't get lost in the line noise!

The work here is my own (though closely based on the standard proof of the VNM theorem), but I don't expect the results to be new.

*

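I represent preference relations as total preorders $\preceq$ on a simplex $\Delta_N := \{x \in \mathbb{R}^N : x_i \ge 0,\ \sum_i x_i = 1\}$; define $\succeq$, $\prec$, $\succ$ and $\sim$ in the obvious ways (e.g., $x \sim y$ iff both $x \preceq y$ and $y \preceq x$, and $x \prec y$ iff $x \preceq y$ but *not* $y \preceq x$). Write $e_i$ for the $i$'th unit vector in $\mathbb{R}^N$.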

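In the following, I will always assume that $\preceq$ *satisfies the independence axiom*: that is, for all $x, y, z \in \Delta_N$ and $p \in (0, 1]$, we have $x \prec y$ if and only if $px + (1-p)z \prec py + (1-p)z$. Note that the analogous statement with weak preferences follows from this: $x \preceq y$ holds iff $y \not\prec x$, which by independence is equivalent to $py + (1-p)z \not\prec px + (1-p)z$, which is just $px + (1-p)z \preceq py + (1-p)z$.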

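**Lemma 1** (more of a good thing is always better). *If $x \prec y$ and $0 \le p < q \le 1$, then $(1-p)x + py \prec (1-q)x + qy$.*

*Proof.* Let $r := q - p$. If $r = 1$, then $p = 0$ and $q = 1$, and the claim is just $x \prec y$. Otherwise, let $z := \frac{1-q}{1-r}x + \frac{p}{1-r}y$. Then, $(1-p)x + py = rx + (1-r)z$ and $(1-q)x + qy = ry + (1-r)z$. Thus, the result follows from independence applied to $x$, $y$, $z$, and $r$.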
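The decomposition used in this proof can be checked mechanically. The following is a minimal sketch, assuming lotteries are probability vectors, that the two mixtures in the lemma are $(1-p)x + py$ and $(1-q)x + qy$, and that the shared part is obtained by removing a share $r = q - p$ from either mixture; the helper names `mix` and `lemma1_decomposition` are illustrative only.

```python
from fractions import Fraction as F

def mix(weight_a, a, b):
    """Return weight_a * a + (1 - weight_a) * b for two lotteries a and b."""
    return [weight_a * ai + (1 - weight_a) * bi for ai, bi in zip(a, b)]

def lemma1_decomposition(x, y, p, q):
    """Split both mixtures into a share r = q - p that differs and a common rest z."""
    r = q - p
    assert 0 < r < 1, "need p < q, and not (p, q) == (0, 1)"
    # Common part z: what remains of either mixture once the share r is removed.
    z = [((1 - q) * xi + p * yi) / (1 - r) for xi, yi in zip(x, y)]
    low = mix(r, x, z)   # r * x + (1 - r) * z
    high = mix(r, y, z)  # r * y + (1 - r) * z
    return r, z, low, high

# Example lotteries over three outcomes, with exact rational arithmetic.
x = [F(1), F(0), F(0)]
y = [F(0), F(1, 2), F(1, 2)]
p, q = F(1, 4), F(2, 3)
r, z, low, high = lemma1_decomposition(x, y, p, q)

print(low == mix(1 - p, x, y))   # low is the mixture (1 - p) x + p y
print(high == mix(1 - q, x, y))  # high is the mixture (1 - q) x + q y
```

Since `low` and `high` differ only in whether the share `r` goes to `x` or to `y`, a single application of independence compares them.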

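**Lemma 2.** *If $x \preceq y \preceq z$ and $x \prec z$, then there is a unique $p \in [0, 1]$ such that $(1-q)x + qz \prec y$ for $q \in [0, p)$ and $y \prec (1-q)x + qz$ for $q \in (p, 1]$.*

*Proof.* Let $p$ be the supremum of all $r \in [0, 1]$ such that $(1-r)x + rz \preceq y$ (note that by assumption, this condition holds for $r = 0$). Suppose that $0 \le q < p$. Then there is an $r \in (q, p]$ such that $(1-r)x + rz \preceq y$. By Lemma 1, we have $(1-q)x + qz \prec (1-r)x + rz$, and the first assertion follows.

Suppose now that $p < q \le 1$. Then by definition of $p$, we do *not* have $(1-q)x + qz \preceq y$, which means that we have $y \prec (1-q)x + qz$, which was the second assertion.

Finally, uniqueness is obvious, because if both $p$ and $p' > p$ satisfied the condition, we would have $y \prec (1-q)x + qz \prec y$ for every $q \in (p, p')$.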
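To see what the threshold in Lemma 2 looks like when continuity fails, here is a small sketch under assumptions of my own choosing: three outcomes, and a lexicographic preference that first compares the probability of outcome 2 and only then the probability of outcome 1. Such a preference satisfies independence but not continuity. The function names are hypothetical, and the bisection relies on the monotonicity from Lemma 1.

```python
def key(lottery):
    """Lexicographic key: probability of outcome 2 first, then of outcome 1."""
    return (lottery[2], lottery[1])

def weakly_worse(a, b):
    """True if a is at most as good as b under the assumed lexicographic order."""
    return key(a) <= key(b)  # Python compares tuples lexicographically

def mixture(x, z, q):
    """Return the lottery (1 - q) * x + q * z."""
    return [(1 - q) * xi + q * zi for xi, zi in zip(x, z)]

def threshold(x, y, z, steps=60):
    """Approximate the supremum of all q with (1 - q) x + q z at most as good as y.

    Assumes x is at most as good as y, and y at most as good as z.
    """
    if weakly_worse(mixture(x, z, 1.0), y):
        return 1.0
    lo, hi = 0.0, 1.0  # invariant: the condition holds at lo and fails at hi
    for _ in range(steps):
        mid = (lo + hi) / 2
        if weakly_worse(mixture(x, z, mid), y):
            lo = mid
        else:
            hi = mid
    return lo

e0, e1, e2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
print(threshold(e0, e1, e2))               # threshold for the middle outcome
print(threshold(e0, [0.7, 0.0, 0.3], e2))  # threshold for a lottery with 0.3 on outcome 2
```

In this example the middle outcome is strictly better than the worst one, yet its threshold is 0: any positive chance of the best outcome already beats it, so the interval $[0, p)$ in the lemma is empty.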

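**Definition 3.** $x$ is *much better* than $y$, notation $x \succ^* y$ or $y \prec^* x$, if there are neighbourhoods $U$ of $x$ and $V$ of $y$ (in the relative topology of $\Delta_N$) such that we have $x' \succ y'$ for all $x' \in U$ and $y' \in V$. (In other words, the graph of $\succ^*$ is the interior of the graph of $\succ$.) Write $x \preceq^* y$ or $y \succeq^* x$ when $x \not\succ^* y$ ($x$ is *not much better* than $y$), and $x \sim^* y$ ($x$ is *about as good* as $y$) when both $x \preceq^* y$ and $y \preceq^* x$.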

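**Theorem 4** (existence of a utility function). *There is a $u \in \mathbb{R}^N$ such that for all $x, y \in \Delta_N$,*

$$x \prec^* y \iff \sum_i u_i x_i < \sum_i u_i y_i.$$

*Unless $x \sim y$ for all $x$ and $y$, there are $i, j$ such that $u_i \neq u_j$.*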

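*Proof.* Let $w$ be a worst and $b$ a best outcome, i.e. let $w, b \in \{1, \dots, N\}$ be such that $e_w \preceq e_i \preceq e_b$ for all $i$. If $e_w \sim e_b$, then $e_i \sim e_w$ for all $i$, and by repeated applications of independence we get $x \sim e_w$ for all $x \in \Delta_N$, and therefore $x \sim^* y$ again for all $x, y \in \Delta_N$, and we can simply choose $u := 0$.

Thus, suppose that $e_w \prec e_b$. In this case, let $u$ be such that for every $i$, $u_i$ equals the unique $p$ provided by Lemma 2 applied to $e_w \preceq e_i \preceq e_b$ and $e_w \prec e_b$. Because of Lemma 1, $u_w = 0$ and $u_b = 1$. Let $f(r) := (1-r)e_w + re_b$, and write $u(x) := \sum_i u_i x_i$.

We first show that $u(x) < u(y)$ implies $x \prec y$. For every $i$, we either have $u_i < 1$, in which case by Lemma 2 we have $e_i \prec f(u_i + \epsilon_i)$ for arbitrarily small $\epsilon_i > 0$, or we have $u_i = 1$, in which case we set $\epsilon_i := 0$ and find $e_i \preceq e_b = f(u_i + \epsilon_i)$. Set $\epsilon := \sum_i x_i \epsilon_i$. Now, by independence applied $N$ times, we have $x \preceq f(u(x) + \epsilon)$; analogously, we obtain $f(u(y) - \epsilon') \preceq y$ for arbitrarily small $\epsilon' \ge 0$. Thus, using $u(x) < u(y)$ and Lemma 1, $x \preceq f(u(x) + \epsilon) \prec f(u(y) - \epsilon') \preceq y$ and therefore $x \prec y$ as claimed. Now note that if $u(x) < u(y)$, then this continues to hold for $x'$ and $y'$ in a sufficiently small neighbourhood of $x$ and $y$, and therefore we have $x \prec^* y$.

Now suppose that $u(x) \ge u(y)$. Since we have $u(x) \le 1$ and $u(y) \ge 0$, we can find points $x'$ and $y'$ arbitrarily close to $x$ and $y$ such that the inequality becomes strict (either the left-hand side is smaller than one and we can increase it, or the right-hand side is greater than zero and we can decrease it, or else the inequality is already strict). Then, $x' \succ y'$ by the preceding paragraph. But this implies that $x \not\prec^* y$, which completes the proof.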

**Corollary 5.** *$\preceq^*$ is a preference relation (i.e., a total preorder) that satisfies independence and the von Neumann-Morgenstern continuity axiom.*

*Proof.* It is well-known (and straightforward to check) that this follows from the assertion of the theorem.

**Corollary 6.** *$u$ is unique up to affine transformations.*

*Proof.* Since $u$ is a VNM utility function for $\preceq^*$, this follows from the analogous result for that case.

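**Corollary 7.** *Unless $x \sim y$ for all $x, y \in \Delta_N$, for all $x \in \Delta_N$ the set $\{y \in \Delta_N : y \sim^* x\}$ has lower dimension than $\Delta_N$ (i.e., it is the intersection of $\Delta_N$ with a lower-dimensional subspace of $\mathbb{R}^N$).*

*Proof.* First, note that the assumption implies that $N \ge 2$. Let $n \in \mathbb{R}^N$ be given by $n_i = 1$, $i = 1, \dots, N$, and note that $\Delta_N$ is the intersection of the hyperplane $A := \{y \in \mathbb{R}^N : \sum_i n_i y_i = 1\}$ with the closed positive orthant $\mathbb{R}^N_{\ge 0}$. By the theorem, $u$ is not parallel to $n$, so the hyperplane $B := \{y \in \mathbb{R}^N : \sum_i u_i y_i = \sum_i u_i x_i\}$ is not parallel to $A$. It follows that $A \cap B$ has dimension $N - 2$, and therefore $\{y \in \Delta_N : y \sim^* x\}$ can have at most this dimension. (It can have smaller dimension or be the empty set if $A \cap B$ only touches or lies entirely outside the positive orthant.)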
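The dimension count in this proof is a rank computation, which the following sketch carries out with exact arithmetic. It assumes the indifference set is cut out by two linear equations, one for the simplex (coefficients all equal to one) and one for a fixed utility level with coefficient vector `u`; the function names are illustrative, and the result is the dimension of the affine plane only, since the part lying inside the simplex can be smaller or empty.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a small matrix, computed by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        # Look for a row at or below position rk with a nonzero entry in this column.
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk

def indifference_plane_dimension(u):
    """Dimension of {y : sum(y) = 1 and u . y = c} for a fixed level c.

    If u is not constant, the two equations are independent and the plane has
    dimension N - 2; if u is constant, the second equation adds nothing.
    """
    ones = [1] * len(u)
    return len(u) - rank([ones, list(u)])

print(indifference_plane_dimension([0, Fraction(1, 2), 1]))  # non-constant u, N = 3
print(indifference_plane_dimension([0, 0, 0]))               # constant u, N = 3
```

For $N = 3$ the simplex is a triangle (dimension 2), so a non-constant `u` gives indifference sets that are at most line segments.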

## 4 comments

Comments sorted by top scores.

Just some feedback: I'm probably about average in math skill here (or maybe below average; the most math I've done is calculus 10 years ago), and with some work I'm able to get through some of this. When I first looked at it I didn't understand anything, but after reading the Wikipedia article on the VNM utility theorem and the always-helpful List of Mathematical Symbols I was able to get through most of Lemma 1. I was able to prove it to my satisfaction using the solver in Excel and can follow most of the proof up until "Thus, the result follows"; I don't see how it follows.

Are there any recommendations for slowly improving math skills other than just trying to work through things like this when time permits? Are people willing to host a Google Hangout where they walk through things such as this for those of us who are curious but have difficulty working it out all on our own? (I know I probably could work it all out given enough time, but it's hard to be motivated enough to make the time. When I first found the site, I didn't know about Bayes' theorem or any of the probability theory notation, but I saw its importance and so made sure to spend the time so I could follow it and work it out on my own when needed.)

**[deleted]**· 2012-12-18T14:31:51.408Z · score: 0 (0 votes) · LW(p) · GW(p)

I think it's a general problem in the way mathematics is taught (at least around here in Finland, and I'm basing this on a fairly small number of empirical observations) that the *language* of mathematics is not very well elaborated: what each symbol stands for, and what the logical rule set is for using each symbol (for example, that sigma stands for summation). So even if the students *could* use their math skills *in principle*, they end up stumbling in practice because they don't know how to interpret some statement using symbols they're not entirely familiar with. Another similar problem, in my opinion, is the lack of emphasis on understanding what's actually happening on the abstract level. *Why does this work*? How do you exploit this rule to arrive at a truthful and more revealing answer?

I'm not sure if this is any good but here's how I personally like to go about learning math - which I've not really done much.

1. Try to understand what's going on at the abstract level. This involves questions like: Why does this work? What's the rule? What's the exception? Is there a fixed relation?

2. Understanding the computing part, the operation. What do you actually do to achieve the wanted outcome? Do you add numbers? How do you derive a function, for example?

3. Understanding the language of mathematics related to the concept. What are the symbols involved, for example with functions, derivatives, integrals? What are the rules used in the language in particular? (For example, if you have the symbol Sigma and below it is i=n, what do the symbols below it stand for? What's the rule with the symbol?)

4. Doing a full operation using the knowledge obtained so far to perform a computation. So you start with some kind of data and end up in a final position where the data has been transformed or simplified with the help of the mathematics. (If at this point you still don't know what's going on, try to think backwards to 1.)

5. Application of the developed skill. At this point you understand the math skill in question well enough to have attached a sort of "handles" to it. So you understand the concept well enough to recognize it in a different environment and use the ability to handle a subset of data from a larger sample. To give an example, if you understand trigonometry very well, then handling sectors and related angles with circles is much easier. Both 1. and 5. involve sufficient understanding to be able to predict, to some extent, what kind of change the calculation or operation has caused to the initial position.

If you don't understand 1. well enough, then you just jump to 2. and 3., which go mostly hand in hand: to actually execute the operation you usually need to know both the language and the operation, but not necessarily. Then once you arrive at point 4. you can try to reason backwards from the answer and get the idea at 1. The abstract information is crucial for effectively applying the information when needed, as described in part 5. So: (skip if you must) Step 1 - Step 2&3 - Step 4 - (back to) Step 1 or Step 5.

I also think very brilliant students obtain the contents of these subtopics automatically when studying mathematics, but at least personally I'm not one of them, and I think an analytical method like this comes in handy.

What does someone else think about this? In particular someone who already knows lots of math, does this make sense?

I'm sure there is a better place to post a wall of math than this forum. My guess is that >99% of the readers have not been able to get through the first paragraph. A link to a blog or a preprint, maybe? You state that this is original research, so publish it.

> I'm sure there is a better place to post a wall of math than this forum.

It's published as an *appendix*, in the discussion section. People aren't expected to read the appendices unless they are interested in why something is assumed in the main paper or the details happen to be important to their work somehow. I approve of splitting off the mathematical details into an appendix like this; the only downside is the cost to the author of producing the work.