Unconditionally Convergent Expected Utility

post by DanielLC · 2011-06-11T20:00:23.355Z · LW · GW · Legacy · 32 comments

Expected utility can be expressed as the sum ΣP(Xn)U(Xn). Suppose P(Xn) = 2^(-n), and U(Xn) = (-2)^n/n. Then expected utility = Σ2^(-n)(-2)^n/n = Σ(-1)^n/n = -1+1/2-1/3+1/4-... = -ln(2). Except there's no obvious order to add it. You could just as well say it's -1+1/2+1/4+1/6+1/8-1/3+1/10+1/12+1/14+1/16-1/5+... = 0. The sum depends on the order you add it. This is known as conditional convergence.
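The two orderings above can be checked numerically. The following is a minimal sketch, not part of the original post: the function names are made up for illustration, and the partial sums only approximate the limits.

```python
import math

def natural_order_sum(n_terms):
    """Partial sum of sum_{n>=1} (-1)^n / n, added in the natural order."""
    return sum((-1) ** n / n for n in range(1, n_terms + 1))

def rearranged_sum(n_blocks, positives_per_negative=4):
    """Partial sum of the same terms in the post's second ordering:
    one negative term -1/(2k-1), followed by four positive terms 1/(2j)."""
    total = 0.0
    next_even = 2  # denominator of the next unused positive term
    for k in range(1, n_blocks + 1):
        total -= 1.0 / (2 * k - 1)
        for _ in range(positives_per_negative):
            total += 1.0 / next_even
            next_even += 2
    return total

# Natural order approaches -ln(2); the rearrangement approaches 0.
print(natural_order_sum(100_000), -math.log(2))
print(rearranged_sum(100_000))
```

Both functions use exactly the same terms; only the order of addition differs, which is the whole point of conditional convergence.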

This is clearly something we want to avoid. Suppose my priors have an unconditionally convergent expected utility. This would mean that ΣP(Xn)|U(Xn)| converges. Now suppose I observe evidence Y. ΣP(Xn|Y)|U(Xn)| = Σ|U(Xn)|P(Xn∩Y)/P(Y) ≤ Σ|U(Xn)|P(Xn)/P(Y) = 1/P(Y)·ΣP(Xn)|U(Xn)|. As long as P(Y) is nonzero, this must also converge.

If my prior expected utility is unconditionally convergent, then given any finite amount of evidence, so is my posterior.
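As a sanity check on the inequality, here is a small truncated example. It is only a sketch: the utility and the likelihood P(Y|Xn) below are hypothetical choices made for illustration, and the infinite sums are cut off at 200 terms.

```python
# Truncated illustration of the bound
#   sum_n P(Xn|Y)|U(Xn)|  <=  (1/P(Y)) * sum_n P(Xn)|U(Xn)|.
# The utility and likelihood are hypothetical, chosen only for this example.

N_TERMS = 200  # truncation point; the neglected tail is negligible here

def prior(n):
    """P(Xn) = 2^(-n), as in the post's first example."""
    return 2.0 ** (-n)

def utility(n):
    """Hypothetical utility: alternating sign, quadratic growth."""
    return (-1) ** n * n ** 2

def likelihood(n):
    """Hypothetical P(Y | Xn), always strictly between 0 and 1."""
    return 1.0 - 1.0 / (n + 1)

ns = range(1, N_TERMS + 1)
p_y = sum(prior(n) * likelihood(n) for n in ns)            # P(Y)
posterior = [prior(n) * likelihood(n) / p_y for n in ns]   # P(Xn|Y)

prior_abs = sum(prior(n) * abs(utility(n)) for n in ns)
posterior_abs = sum(p * abs(utility(n)) for p, n in zip(posterior, ns))
bound = prior_abs / p_y

print(f"P(Y)                    = {p_y:.6f}")
print(f"sum P(Xn)|U(Xn)|        = {prior_abs:.6f}")
print(f"sum P(Xn|Y)|U(Xn)|      = {posterior_abs:.6f}")
print(f"bound (prior sum)/P(Y)  = {bound:.6f}")
```

The posterior sum stays below the bound because each P(Xn∩Y) is at most P(Xn), which is the only fact the argument uses.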

This means I only have to come up with a nice prior, and I'll never have to worry about evidence breaking expected utility.

I suspect that this can be made even more powerful, and given any amount of evidence, finite or otherwise, I will almost surely have an unconditionally convergent posterior. Anyone want to prove it?

Now let's look at Pascal's Mugging. The problem here seems to be that someone could very easily give you an arbitrarily powerful threat. However, in order for expected utility to converge unconditionally, either carrying out the threat must become unlikely faster than the disutility increases, or the probability of the threat itself must become unlikely that fast. In other words, either someone threatening 3^^^3 people is so unlikely to carry it out as to make it non-threatening, or the threat itself must be so difficult to make that you don't have to worry about it.
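To make "unlikely faster than the disutility increases" concrete, here is a rough sketch. It assumes, purely for illustration, that threats are indexed by n with probability 2^(-n) and disutility of magnitude growth^n; neither form comes from the post.

```python
def partial_abs_sum(growth, n_terms):
    """Partial sum of sum_n P(Xn)|U(Xn)| under the illustrative assumption
    P(Xn) = 2^(-n) and |U(Xn)| = growth^n."""
    return sum((2.0 ** -n) * (growth ** n) for n in range(1, n_terms + 1))

# growth < 2: probability shrinks faster than disutility grows, the sum settles.
# growth >= 2: the partial sums keep increasing without bound.
for growth in (1.5, 2.0, 3.0):
    print(growth, [round(partial_abs_sum(growth, n), 3) for n in (10, 50, 100)])
```

Under these assumptions the absolute sum converges exactly when growth is below 2, that is, when the probability of the threat shrinks faster than its size grows.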

32 comments

comment by Sniffnoy · 2011-06-12T01:09:24.222Z · LW(p) · GW(p)

Quick terminology notes: If the expected utility of an action would appear to be given by a conditionally convergent sum, it doesn't have an expected utility. Such a function isn't integrable. What you're talking about in general is properly/more generally put as a distinction between functions that are integrable and functions that aren't.

comment by jsteinhardt · 2011-06-12T10:55:00.315Z · LW(p) · GW(p)

I'm upvoting this post because it caused me to rethink some of my thoughts about utility and Pascal's mugging, something that very few of these posts [on Pascal's mugging] so far have done.

However, I am now somewhat confused. Doesn't the Archimedean property for utility (axiom P3' here) automatically imply that utility should be bounded? My argument for this would be as follows. I'm using the terminology from the Wikipedia page, but in case you don't want to click the link, u is utility, Eu is expected utility.

  • If Eu(N) was ever infinite, then there would be no epsilon > 0 such that (1-epsilon)L+epsilon N < M (here we start with some fixed L,M with L<M).

  • von Neumann utility is a statement about all lotteries, not just the ones induced by our beliefs. So in particular we have to consider the lottery A(E) where a fixed event E happens with probability 1. Since Eu(A(E)) is finite, and u(E) = Eu(A(E)), u(E) must be finite.

  • Finally, suppose that there was a sequence of events E1,E2,... whose utilities grew without bound. We can always pick out a subsequence F1,F2,... such that u(F(1)) > 0 and u(F(n+1)) >= 2u(F(n)) for all n. Then consider the lottery where Fn occurs with probability 2^(-n). This has infinite expected utility since its expected utility is sum(n=1 to infty) 2^(-n)u(F(n)) >= sum(n=1 to infty) u(F(1))/2 = infty.
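The divergence in the last bullet can be checked term by term. The derivation below is a sketch that assumes the subsequence is chosen with u(F_1) > 0, which is possible because the utilities grow without bound; the constant factor in the bound does not affect the conclusion.

```latex
\begin{align*}
u(F_{n+1}) \ge 2\,u(F_n) \ \text{for all } n
  &\;\Longrightarrow\; u(F_n) \ge 2^{\,n-1}\,u(F_1), \\
2^{-n}\,u(F_n)
  &\;\ge\; 2^{-n}\cdot 2^{\,n-1}\,u(F_1) \;=\; \tfrac{1}{2}\,u(F_1), \\
\sum_{n=1}^{N} 2^{-n}\,u(F_n)
  &\;\ge\; \frac{N}{2}\,u(F_1) \;\xrightarrow[N\to\infty]{}\; \infty .
\end{align*}
```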

Is there a flaw in this argument? Are the VNM utility axioms too strong (should preferences only need to be defined over lotteries that can actually be induced by updating our priors based on some amount of evidence; or alternately, is the Archimedean property too strong)?

Or is utility bounded? While I personally think that human preferences should lead to bounded utility functions, the conclusion that all rational agents must have bounded utility functions seems possibly too strong.

Replies from: Douglas_Knight, DanielLC
comment by Douglas_Knight · 2011-06-12T21:58:24.024Z · LW(p) · GW(p)

The vNM argument only uses preference over lotteries with finitely many possible outcomes to assign utilities to pure outcomes. One can use measure theory arguments to pin down the expected utilities of other lotteries. That some lotteries have infinite value does not seem to me to be a big problem.

comment by DanielLC · 2011-06-12T19:43:28.962Z · LW(p) · GW(p)

I suppose what it comes down to is that I don't believe a conditionally convergent expected utility is possible, and thus a lottery that has one does not count as a possible lottery.

I'd say that even this idea is too strong, and I'm only willing to go with it because if I break expected utility, I have no idea what to do.

Replies from: jsteinhardt
comment by jsteinhardt · 2011-06-12T20:07:35.473Z · LW(p) · GW(p)

Can you clarify what you mean by "this idea"?

Also, doesn't the set of "possible lotteries" have to be the space of lotteries about which VNM can be proved? Probably there are multiple such spaces, but I'm not sure what conditions they have to satisfy. Also, since utilities don't come until after we've chosen a space of lotteries, we can't define the space of lotteries to be "those for which expected utility converges", so I'm not quite sure what you are looking for.

Replies from: DanielLC
comment by DanielLC · 2011-06-12T22:58:55.678Z · LW(p) · GW(p)

Can you clarify what you mean by "this idea"?

Using priors that allow for unconditionally convergent expected utility.

Also, doesn't the set of "possible lotteries" have to be the space of lotteries about which VNM can be proved?

I don't understand what you mean. The set I'm using is the largest possible set for which all of those axioms are true. If I use one with utility that increases/decreases without limit, axiom 3 (along with 3') is false. If I use one with utility that doesn't converge unconditionally, axiom 2 is false.

comment by RHollerith (rhollerith_dot_com) · 2011-06-11T20:31:35.958Z · LW(p) · GW(p)

My eyes thank you for not specifying an image file for every mathematical expression in your post.

comment by endoself · 2011-06-11T23:13:55.419Z · LW(p) · GW(p)

However, in order for expected utility to converge unconditionally, either carrying out the threat must become unlikely faster than the disutility increases, or the probability of the threat itself must become unlikely that fast. In other words, either someone threatening 3^^^3 people is so unlikely to carry it out as to make it non-threatening, or the threat itself must be so difficult to make that you don't have to worry about it.

Except you get this result by making up probabilities rather than arriving at them through any rational process. This has been discussed here many times before, including in the sequences and very recently. Downvoted.

Replies from: None, DanielLC
comment by [deleted] · 2011-06-12T00:21:30.623Z · LW(p) · GW(p)

Except you get this result by making up probabilities rather than arriving at them through any rational process. This has been discussed here many times before, including in the sequences and very recently. Downvoted.

I disagree that the above is not a new contribution to thought on this. The issue at stake has to do with restricting the set of permissible utility functions. If we have a probability measure induced by our empirical observations, then it doesn't do any good from a rationalism standpoint to allow non-summable or non-integrable utility functions with respect to that probability measure.

This example shows one such case. Suppose Nature hands me a probability distribution over some sequence of events, P(Xn) = 2^{-n}. Then there is a meta-probability assignment over the space of utility functions I can assign to the events Xn and it involves the resulting expectations. You can think of it like a Dirichlet distribution.

It makes no sense to speak of utility functions that aren't L1(problem domain) (respectively, l1(problem domain)) under the probability measure you believe to be true about the situation.

I think Pascal's mugging suffers from this issue. For any valid probability distribution over the number of lives at stake, I can produce utility functions for valuing lives that produce arbitrarily different output decisions. In reality, though, you can't decouple the choice of a "permissible" utility function from the exact same processes that yield some knowledge or model about the probability distribution over lives threatened.

I could go get some evidence about probability of lives threatened, then internally reflect on how I should choose to assign value to lives, then compute joint probability distributions over both the threatened lives and all my different options for utility functions on the space of threatened lives, then internally reflect on how to value joint configurations of (threatened lives, utility functions over spaces of threatened lives), then compute joint probabilities over the 2-tuple consisting of ( (threatened lives, utility functions over threatened lives), utility functions over 2-tuples of (threatened lives, utility functions over threatened lives) ), and so on ad infinitum.

At some point, because brains have finite computing resources and (at least human brains) have a machine epsilon, I just have to stop this recursive computation, draw a line in the sand, accept some conditional probabilities at some deep ply of the computation, and then integrate my way back all the way down to the decision of choosing a utility function.

Nothing stops me from choosing a utility function that, when coupled with the probabilities that Nature gives me, causes my expectation to fail to be summable (integrable). I could, after all, act like The Ultimate Pessimist and assign a utility of -\infty to every outcome, for example. More realistically, I could choose a utility function that has the same shape as a Cauchy distribution. But in the landscape of meta-goals, or even just correspondence of utility functions to reality, this would be bad for me. How can I make decisions about which bets to accept if I am in a situation where Nature hands me an improper prior uniform probability of a set of different outcomes, and I choose to have a Cauchy distribution of personal utility over that set of outcomes? The idea of an expectation fails to even exist in that scenario. Hence, scalar multiples of Cauchy distributions don't make much sense viewed as potential utility functions.

The example here of conditional convergence is a very elementary one. More complicated issues like this arise when you think in terms of probability theory and functional analysis on the space of utility functions. But it's a salient example nonetheless. If we choose utility functions such that the resultant expectation calculation includes a conditionally convergent, or worse non-summable, series, then we can't accept or reject bets in a way that has meaningful correspondence to our perceived actual utility. Hence, implicitly, rationalists must make some time-saving admissibility criteria for what sorts of functions are even allowed to be utility functions.

Getting rid of conditional convergence, or issues of non-measurability and non-integrability, would seem like intuitively plausible first steps in forming utility functions. Similar to the way that Jaynes showed how consistent formulations of belief in terms of wagers were isomorphic to probability theory, we have similar constraints on consistent use of utility functions. But the Cauchy distribution example above shows that, for utility functions, the restrictions must actually be quite a bit more severe than mere summability.

Replies from: endoself
comment by endoself · 2011-06-12T01:53:20.572Z · LW(p) · GW(p)

The fact that this is a problem does not make anything in the post novel. In the grandparent, I linked to discussions of this problem that touched on everything that you discussed here.

I could go get some evidence about probability of lives threatened, then internally reflect on how I should choose to assign value to lives, then compute joint probability distributions over both the threatened lives and all my different options for utility functions on the space of threatened lives

Since utility functions are only unique modulo affine transforms, you can't combine them using naive expected utility. The correct method to do so is unknown.

Replies from: None, DanielLC
comment by [deleted] · 2011-06-12T02:03:32.769Z · LW(p) · GW(p)

Since utility functions are only unique modulo affine transforms, you can't combine them using naive expected utility. The correct method to do so is unknown.

I'm aware of this, but fail to see how it would change the ability to make probability distributions over the space of utility functions and then take expectations there. Sure, you'd be doing it over equivalence classes of functions, but that's hardly any difficulty. What I am saying is you can assign utility to choices of utility functions: utility functions must inherently be recursive in practice. And so their non-summability (or other technical difficulties) causes immediate problems.

Replies from: Perplexed
comment by Perplexed · 2011-06-12T15:32:42.630Z · LW(p) · GW(p)

Utility functions are not primitive. They are constructed using an algorithm specified by vN&M (or Savage, or A&A). Constructed from preferences over lotteries over outcomes. Preferences are primitive. Priors over states of nature are primitive. Utility functions are constructs. They are not arbitrary.

As has been mentioned, if you constrain preferences using one of the standard vN&M axioms, and if you assume that you can construct a lottery leading to any outcome, then you can prove that outcome utilities are bounded.

I think that the OP needs to be seen as a proposal for constraining the freedom to construct arbitrary lottery-probes. And, if the constraint is properly defined, we can have an algorithm that generates unbounded utilities but not poorly behaved ones: the resulting utilities could never be used to construct an expectation that fails to converge unconditionally.

comment by DanielLC · 2011-06-12T02:09:24.438Z · LW(p) · GW(p)

You had one link for changing the expected utility just to make Pascal's mugging go away, and another that seems to be based on the same idea, but has flawed reasoning and a different conclusion.

Replies from: endoself
comment by endoself · 2011-06-13T02:20:52.632Z · LW(p) · GW(p)

The first link was to the comment, not the post; I disagree with the post. The proposal in the second link was qualitatively similar to yours and it failed for the same reason.

comment by DanielLC · 2011-06-12T00:55:31.589Z · LW(p) · GW(p)

Using expected utility is implicitly using such a prior. If you want to use such a prior, how do you suggest replacing the concept of expected utility?

Replies from: endoself
comment by endoself · 2011-06-12T01:44:43.209Z · LW(p) · GW(p)

This is an open problem. I contest certain axioms (P6 and P7).

Replies from: jsteinhardt
comment by jsteinhardt · 2011-06-12T10:37:24.291Z · LW(p) · GW(p)

Do you also contest the Archimedean axiom for von Neumann's formulation of utility?

Replies from: endoself
comment by endoself · 2011-06-13T02:14:15.358Z · LW(p) · GW(p)

Yes. (Well, it's a bit more complicated than that; VNM utility theory doesn't extend to choices with an infinite number of possible outcomes, so I reject the whole system.) I discussed this in more detail in the comments in the linked article. In brief, there is a chance that my utility function is bounded, but I am definitely not willing to bet the universe on it.

Replies from: None, jsteinhardt
comment by [deleted] · 2011-06-14T02:35:15.799Z · LW(p) · GW(p)

VNM definitely does extend to the case of infinitely many outcomes. It requires a continuous utility function, and thus continuous preferences and a topology in outcome space. Why is this additional modeling assumption any more problematic than other VNM axioms?

Replies from: endoself
comment by endoself · 2011-06-14T03:13:01.017Z · LW(p) · GW(p)

In short, because utilities may not converge. The axioms do not claim to be applicable an infinite number of times; if they did, they would run into all the usual problems with infinite series. There are modifications of the VNM theorem that extend infinitely, but they all either must only work for certain infinite sets or must require bounded utility.

Replies from: None
comment by [deleted] · 2011-06-14T03:25:15.589Z · LW(p) · GW(p)

This is exactly the stuff I was talking about. I mean, basic measure theory determines what functions you can even talk about. If you have a probability measure P, then utilities that are not in L^{1}_{P}(outcome domain) make no sense. You may need some more restrictions than that, but one can't talk about expected utility if the utility is not at least L1. You cannot define a function w.r.t. a probability measure that has a support set of infinite Lebesgue measure, is unbounded, and has a defined expectation (the L1 norm)... unless you know that the rate of growth of the unbounded utility function behaves in certain nice ways when compared to the decay of the probability measure. You might be already saying this, but this much simply can't be changed, no matter what you do. If your utility function is unbounded, then the probabilities for certain outcomes must decay faster than your utility grows. Since probabilities are given by nature and utilities (sort of) aren't, my guess would be that utilities have to decay quickly (or, conversely, probabilities have to decay super quickly).

Replies from: endoself
comment by endoself · 2011-06-14T03:44:11.977Z · LW(p) · GW(p)

If your utility function is unbounded, then the probabilities for certain outcomes must decay faster than your utility grows. Since probabilities are given by nature and utilities (sort of) aren't, my guess would be that utilities have to decay quickly (or, conversely, probabilities have to decay super quickly).

Nature does not require that it is possible to make utility functions converge at all. Also, nature neither requires that taking expectations be the only way of comparing choices, nor that utilities be real.

Replies from: None
comment by [deleted] · 2011-06-14T04:11:26.546Z · LW(p) · GW(p)

I totally agree and never meant to imply otherwise. But just as any consistent system of degrees of belief can be put into correspondence with the axioms of probability, so there are certain stipulations about what can reasonably be called a utility function.

I would argue that if you meet a conscious agent and your model of their utility function says that it doesn't converge (in the appropriate L1 norm of the appropriate modeled probability space) then something's wrong with that model of utility function... not with the assumption that utility functions should converge. There are many subtleties, I'm sure, but non-integrable utility functions seem futile to me. If something can be well-modeled by a non-integrable utility function, then I'm fine updating my position, but in years of learning and teaching probability theory, I've never encountered anything that would convince me of that.

Replies from: endoself, None
comment by endoself · 2011-06-15T18:43:40.820Z · LW(p) · GW(p)

Doesn't this all assume that utility functions are real-valued?

Replies from: None
comment by [deleted] · 2011-06-16T00:53:51.317Z · LW(p) · GW(p)

No, all of the integrability theory (w.r.t. probability measures) extends straightforwardly to complex valued functions. See this and this.

Replies from: endoself
comment by endoself · 2011-06-19T17:56:24.651Z · LW(p) · GW(p)

Yes, good point. Is there any study of the most general objects to which integrability theory applies? Also, are you familiar with Martin Kruskal's work on generalizing calculus to the surreal numbers? I am having difficulty locating any of his papers.

Replies from: None
comment by [deleted] · 2011-06-19T18:16:02.186Z · LW(p) · GW(p)

What comes to my mind are Bochner integrals and random elements. I'm not sure how much integrability theory one can develop outside of a Banach space, although you can get interesting fractal type integrals when dealing with Hausdorff measure. Integrability theory is really just an extension of measure theory, which was pinned down in painstaking detail by Lebesgue, Caratheodory, Perron, Henstock, and Kurzweil (no relation to the singularity Kurzweil). The Henstock-Kurzweil (HK) integral is the most generalized integral over the reals and complex numbers that preserves certain nice properties, like the fundamental theorem of calculus. The name of the game in integration theory was never an attempt to find the most abstract workable definitions of integration, but rather to see under what general assumptions you could get physically meaningful results, like the mean value theorem or the fundamental theorem of calculus, to hold. Complex integration theory, especially in higher dimensions, shattered a lot of the preconceived notions of how functions should behave.

In looking up surreal numbers, it appears that Conway and Knuth invented them. I was surprised to learn that the hyperreal numbers (developed by Abraham Robinson) are contained in the surreals. To my knowledge, which is a bit limited because I focus more on applied math and so I am probably not as familiar with the literature on something like surreal numbers as other LWers may be, there hasn't been much work, if any, on defining an integral over the surreals. My guess, though, is that such an integral would wind up being an unsatisfyingly trivial extension of integration over the regular reals, as is the case for hyperreals.

I'll definitely take a look at Kruskal's papers and see what he's come up with.

Replies from: endoself
comment by endoself · 2011-06-19T22:14:01.597Z · LW(p) · GW(p)

I was surprised to learn that the hyperreal numbers (developed by Abraham Robinson) are contained in the surreals.

Every ordered field is contained within the surreals, which is why I find them promising for utility theory. The surreals themselves are not a field but a Field, since they form a proper class.

comment by [deleted] · 2011-06-14T04:21:20.787Z · LW(p) · GW(p)

Another point worth noting is that on a set D of finite measure (which any measurable subset of a probability space is), L^{N}(D) is contained in L^{N-1}(D), and so if the first moment fails to exist (non-integrable, no defined expectation) then all higher moments fail and computation of order statistics fails. Of course nature doesn't have to be modeled by statistics, but you'd be hard pressed to out-perform simple axiomatic formulations that just assume a topology, continuous preference functions, and get on with it and have access to higher order moments.
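The inclusion can be sketched with Hölder's inequality. The notation here is introduced only for this derivation: μ is the measure on D with μ(D) finite, and 1 ≤ p < q play the roles of N-1 and N.

```latex
\begin{align*}
\int_D |f|^{p}\,d\mu
  &= \int_D |f|^{p}\cdot 1\,d\mu
  \;\le\; \Bigl(\int_D |f|^{q}\,d\mu\Bigr)^{p/q}\,\mu(D)^{\,1-p/q}
  && \text{(H\"older, exponents } \tfrac{q}{p} \text{ and } \tfrac{q}{q-p}\text{)} \\
\|f\|_{p} &\le \mu(D)^{\,1/p-1/q}\,\|f\|_{q} .
\end{align*}
```

So f in L^{q}(D) implies f in L^{p}(D), and by contraposition a function with no finite first absolute moment has no finite higher moments either.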

comment by jsteinhardt · 2011-06-13T07:29:38.220Z · LW(p) · GW(p)

How do you construct utility without the VNM axioms? Are there less strong axioms for which a VNM-like result holds?

EDIT: Sorry if this is covered in the comments in the other article, I'm being a bit lazy here and not reading through all of your comments there in detail.

Replies from: endoself
comment by endoself · 2011-06-14T01:57:04.741Z · LW(p) · GW(p)

How do you construct utility without the VNM axioms?

I don't yet. :) I have a few reasons to think that it has a good chance of being possible, but it has not been done.

Replies from: jsteinhardt
comment by jsteinhardt · 2011-06-14T07:29:14.983Z · LW(p) · GW(p)

Okay. If you end up being successful, I would be quite interested to know about it. (A counterexample would also be interesting, actually probably more interesting since it is less expected.)