Math appendix for: "Why you must maximize expected utility"

benja

Math appendix for: "Why you must maximize expected utility"

post by Benya (Benja) · 2012-12-13T01:11:20.015Z · LW · GW · Legacy · 4 comments

4 comments

This is a mathematical appendix to my post "Why you must maximize expected utility", giving precise statements and proofs of some results about von Neumann-Morgenstern utility theory without the Axiom of Continuity. I wish I had the time to make this post more easily readable, giving more intuition; the ideas are rather straight-forward and I hope they won't get lost in the line noise!

The work here is my own (though closely based on the standard proof of the VNM theorem), but I don't expect the results to be new.

I represent preference relations as total preorders $\preccurlyeq$ on a simplex $\Delta_N$ ; define $\prec$ , $\sim$ , $\succcurlyeq$ and $\succ$ in the obvious ways (e.g., $x\sim y$ iff both $x\preccurlyeq y$ and $y\preccurlyeq x$ , and $x\prec y$ iff $x\preccurlyeq y$ but not $y\preccurlyeq x$ ). Write $e^i$ for the $i$ 'th unit vector in $\mathbb{R}^N$ .

In the following, I will always assume that $\preccurlyeq$ satisfies the independence axiom: that is, for all $x,y,z\in\Delta_N$ and $p\in(0,1]$ , we have $x\prec y$ if and only if $px + (1-p)z \prec py + (1-p)z$ . Note that the analogous statement with weak preferences follows from this: $x\preccurlyeq y$ holds iff $y\not\prec x$ , which by independence is equivalent to $py + (1-p)z \not\prec px + (1-p)z$ , which is just $px + (1-p)z \preccurlyeq py + (1-p)z$ .

Lemma 1 (more of a good thing is always better). If $x\prec y$ and $0\le p < q \le 1$ , then $(1-p)x + py\prec (1-q)x + qy$ .

Proof. Let $r := q-p$ . Then, $(1-p)x + py = \big((1-q)x + py\big) + rx$ and $(1-q)x + qy = \big((1-q)x + py\big) + ry$ . Thus, the result follows from independence applied to $x$ , $y$ , $\textstyle\frac{1}{1-r}\big((1-q)x + py\big)$ , and $r$ . $\square$

Lemma 2. If $x\preccurlyeq y\preccurlyeq z$ and $x\prec z$ , then there is a unique $p\in[0,1]$ such that $(1-q)x + qz \prec y$ for $q\in[0,p)$ and $y\prec (1-q)x + qz$ for $q\in(p,1]$ .

Proof. Let $p$ be the supremum of all $r\in[0,1]$ such that $(1-r)x + rz\preccurlyeq y$ (note that by assumption, this condition holds for $r=0$ ). Suppose that $0\le q<p$ . Then there is an $r\in(q,p]$ such that $(1-r)x + rz\preccurlyeq y$ . By Lemma 1, we have $(1-q)x + qz \prec (1-r)x + rz$ , and the first assertion follows.

Suppose now that $p < q \le 1$ . Then by definition of $p$ , we do not have $(1-q)x + qz\preccurlyeq y$ , which means that we have $(1-q)x + qz\succ y$ , which was the second assertion.

Finally, uniqueness is obvious, because if both $p$ and $p'$ satisfied the condition, we would have $\textstyle y \prec \big(1 - \frac{p+p'}2\big)x + \frac{p+p'}2z \prec y$ . $\square$

Definition 3. $x$ is much better than $y$ , notation $x\succ_* y$ or $y\prec_* x$ , if there are neighbourhoods $U$ of $x$ and $V$ of $y$ (in the relative topology of $\Delta_N$ ) such that we have $x' \succ y'$ for all $x'\in U$ and $y'\in V$ . (In other words, the graph of $\succ_*$ is the interior of the graph of $\succ$ .) Write $x\preccurlyeq_* y$ or $y\succcurlyeq_* x$ when $x\nsucc_* y$ ( $x$ is not much better than $y$ ), and $x\sim_* y$ ( $x$ is about as good as $y$ ) when both $x\preccurlyeq_* y$ and $x\succcurlyeq_* y$ .

Theorem 4 (existence of a utility function). There is a $u\in\mathbb{R}^N$ such that for all $x,y\in\Delta_N$ ,

$\sum_i x_i\,u_i \;<\; \sum_i y_i\,u_i\;\;\iff\;\; x\prec_* y\;\;\implies\;\;x\prec y.$

Unless $x\sim y$ for all $x$ and $y$ , there are $i,j\in\{1,\dotsc,N\}$ such that $u_i\neq u_j$ .

Proof. Let $i$ be a worst and $j$ a best outcome, i.e. let $i,j\in\{1,\dotsc,N\}$ be such that $e^i\preccurlyeq e^k\preccurlyeq e^j$ for all $k\in\{1,\dotsc,N\}$ . If $e^i\sim e^j$ , then $e^i \sim e^k$ for all $k$ , and by repeated applications of independence we get $x\sim e^i\sim y$ for all $x,y\in\Delta_N$ , and therefore $x\sim_* y$ again for all $x,y\in\Delta_N$ , and we can simply choose $u=0$ .

Thus, suppose that $e^i\prec e^j$ . In this case, let $u$ be such that for every $k\in\{1,\dotsc,N\}$ , $u_k$ equals the unique $p$ provided by Lemma 2 applied to $e^i\preccurlyeq e^k\preccurlyeq e^j$ and $e^i\prec e^j$ . Because of Lemma 1, $u_i = 0 \neq 1 = u_j$ . Let $f(r) := (1-r)e^i + re^j$ .

We first show that $\textstyle p := \sum_k x_k\,u_k < \sum_k y_k\,u_k =: q$ implies $x\prec y$ . For every $k$ , we either have $u_k < 1$ , in which case by Lemma 2 we have $e^k \prec f(u_k + \epsilon_k)$ for arbitrarily small $\epsilon_k > 0$ , or we have $u_k = 1$ , in which case we set $\epsilon_k := 0$ and find $e^k\preccurlyeq e^j = f(u_k + \epsilon_k)$ . Set $\textstyle \epsilon := \sum_k x_k\,\epsilon_k$ . Now, by independence applied $N-1$ times, we have $\textstyle x = \sum_k x_k\,e^k \preccurlyeq \sum_k x_k f(u_k + \epsilon_k) = f(p+\epsilon)$ ; analogously, we obtain $y \succcurlyeq f(q-\delta)$ for arbitrarily small $\delta > 0$ . Thus, using $p<q$ and Lemma 1, $x\preccurlyeq f(p+\epsilon)\prec f(q-\delta)\preccurlyeq y$ and therefore $x\prec y$ as claimed. Now note that if $\textstyle\sum_k x_k\,u_k < \sum_k y_k\,u_k$ , then this continues to hold for $x'$ and $y'$ in a sufficiently small neighbourhood of $x$ and $y$ , and therefore we have $x\prec_* y$ .

Now suppose that $\textstyle \sum_k x_k\,u_k \ge \sum_k y_k\,u_k$ . Since we have $u_i = 0$ and $u_j = 1$ , we can find points $x'$ and $y'$ arbitrarily close to $x$ and $y$ such that the inequality becomes strict (either the left-hand side is smaller than one and we can increase it, or the right-hand side is greater than zero and we can decrease it, or else the inequality is already strict). Then, $x'\succ y'$ by the preceding paragraph. But this implies that $x\not\prec_* y$ , which completes the proof. $\square$

Corollary 5. $\preccurlyeq_*$ is a preference relation (i.e., a total preorder) that satisfies independence and the von Neumann-Morgenstern continuity axiom.

Proof. It is well-known (and straightforward to check) that this follows from the assertion of the theorem. $\square$

Corollary 6. $u$ is unique up to affine transformations.

Proof. Since $u$ is a VNM utility function for $\preccurlyeq_*$ , this follows from the analogous result for that case. $\square$

Corollary 7. Unless $x\sim y$ for all $x,y\in\Delta_N$ , for all $r\in\mathbb{R}$ the set $\textstyle \{x\in\Delta_N : \sum_i x_i\,u_i = r\}$ has lower dimension than $\Delta_N$ (i.e., it is the intersection of $\Delta_N$ with a lower-dimensional subspace of $\mathbb{R}^N$ ).

Proof. First, note that the assumption implies that $N\ge 2$ . Let $v\in\mathbb{R}^N$ be given by $v_i = 1$ , $\forall i$ , and note that $\Delta_N$ is the intersection of the hyperplane $A := \{x\in\mathbb{R}^N : x\cdot v = 1\}$ with the closed positive orthant $\mathbb{R}^N_+$ . By the theorem, $u$ is not parallel to $v$ , so the hyperplane $B_r := \{x\in\mathbb{R}^N : x\cdot u = r\}$ is not parallel to $A$ . It follows that $A\cap B_r$ has dimension $N-2$ , and therefore $\textstyle\{x\in\Delta_N : \sum_i x_i\,u_i = r\} \;=\; A\cap B_r\cap\mathbb{R}^N_+$ can have at most this dimension. (It can have smaller dimension or be the empty set if $A\cap B_r$ only touches or lies entirely outside the positive orthant.) $\square$

4 comments

Comments sorted by top scores.

comment by BlueSun · 2012-12-13T22:12:10.001Z · LW(p) · GW(p)

Just some feedback: I'm probably about average in math skill here (or maybe below average, the most math I've done is calculus 10 years ago) and with some work I'm able to get through some of this. When I first looked at it I didn't understand anything but reading the wikipedia on VNM utility theorem and the always helpful List of Mathematical Symbols I was able to get through most of Lemma 1. I was able to prove it to my satisfaction using the solver in Excel and can follow most of the proof up until "Thus, the result follows", I don't see how it follows.

Are there any recommendations for slowly improving math skills other than just trying to work through things like this when time permits? Are people willing to host a Google Hangout where they walk through things such as this for those of us who are curious but have difficulty working it out all on our own (I know I probably could work it all out given enough time, but its hard to be motivated enough to make the time. When I first found the site, I didn't know about Bayes theorem or any of the probability theory notation, but I saw its importance and so made sure to spend the time so I can follow it and work it out on my own when needed).

Replies from: None

↑ comment by [deleted] · 2012-12-18T14:31:51.408Z · LW(p) · GW(p)

I think it's a general problem in the way mathematics is taught (at least around here in Finland and I'm basing this on considerably low amount of empirical observations) that the language of mathematics is not very well elaborated: What each symbol stands for, what's the logical rule set for using each symbol, like for an example if you have the symbol for sigma to stand for summation - and so even if the students could use their math skills in principle they end up stumbling in practice due to not know how to interpet some statement using symbols they're not entirely familiar with. Another similar problem in my opinion is the lack of emphasis on understanding what's actually happening on the abstract level. Why does this work? How do you exploit this rule to arrive at a truthful and more revealing answer?

I'm not sure if this is any good but here's how I personally like to go about learning math - which I've not really done much.

Try and understand what's going on in the abstract level. Involves questions like: Why does this work? What's the rule? What's the exception? Is there a fixed relation?
Understanding the computing part, the operation. What do you actually do to achieve the wanted outcome. Do you add numbers? How do you derive a function for an example?
Understanding the language of mathematics related to the concept. What are the symbols involved? For an example involved with functions, derivals, integrals? What are the rules used in the language in particular? (For an example if you have the symbol of Sigma and below it are i=n what do the symbols below it stand for? What's the rule with the symbol?)
Doing a full operation using the so far obtained knowledge to perform a computation. So you start with some kind of data and end up into a final position where the data has been transformed or simplified with the help of the mathematics. ( If at this point you still don't know what's going on, try to think backwards to 1. )
Application of the developed skill At this point you understand the mathskill in question well enough to have attached sort of "handles" to it. So you can understand the concept well enough to recognize it in a different environment and use the ability to handle a subset of data from a larger sample. To make an example if you understand trigonometry very well then handling sectors and related angles with circles is much easier. Both 1. and 5. involve sufficient understanding to be able to predict what kind of change has been caused on the initial position to some extent after the calculation or the operation.

If you dont understand 1. well enough then you just jump to 2. and 3. which go mostly hand in hand, to actually execute the operation you usually need to know both the language and the operation, but not necessarily. Then once you arrive at point 4. you can try to reason backwards from the answer and get the idea at 1. The abstract information is crucial to be able to effectively apply the information when needed like described in part 5. So: (Skip if must) Step 1 - Step 2&3 - Step 4. - (back to) Step 1. or Step 5.

I also think very brilliant students obtain the contents of these subtopics automatically when studying mathematics, but at least personally I'm notne of them and I think an analytical method like this comes in handy.

What does someone else think about this? In particular someone who already knows lots of math, does this make sense?

comment by Shmi (shminux) · 2012-12-13T16:06:17.166Z · LW(p) · GW(p)

I'm sure there is a better place to post a wall of math than this forum. My guess is that >99% of the readers have not been able to get through the first paragraph. A link to a blog or a preprint, maybe? You state that this is original research, so publish it.

Replies from: wedrifid

↑ comment by wedrifid · 2012-12-13T22:56:04.721Z · LW(p) · GW(p)

I'm sure there is a better place to post a wall of math than this forum.

It's published as an appendix, in the discussion section. People aren't expected to read the appendices unless they are interested in why something is assumed in the main paper or the details happen to be important to their work somehow. I approve of splitting of the mathematical details into an appendix like this---the only downside is the cost to the author of producing the work.

Math appendix for: "Why you must maximize expected utility"

Contents

4 comments