Kelly *is* (just) about logarithmic utility
post by abramdemski · 2021-03-01T20:02:08.300Z · LW · GW · 26 comments
This post is a response to SimonM's post, Kelly isn’t (just) about logarithmic utility [LW · GW]. It's an edited and extended version of some of my comments there.
To summarize the whole idea of this post: I'm going to argue that any argument in favor of the Kelly formula has to go through an implication that your utility is logarithmic in money, at some point. If it seems not to, it's either:
- mistaken
- cleverly hiding the implication
- some mind-blowing argument I haven't seen before.
Actually, the post I'm responding to already mentioned one argument in this third category, which I'll mention later. But for the most part I think the point still stands: the best reasons to suppose Kelly is a good heuristic go through arguing logarithmic utility.
The main point of this post is to complain about bad arguments for Kelly -- something which I apparently enjoy doing rather a lot. Take that as an attention-conservation warning.
The rest of this post will consider various arguments in favor of the Kelly criterion (either as a decent rule of thumb, or as the iron law of investment). Each section considers one argument, with a section title hopefully descriptive of the argument considered.
1: It's About Repeated Bets
This argument goes something like: "If you were to make just one bet, the right thing to do would be to maximize expected value; but for repeated bets, if you bet everything, you'll lose all your money quickly. The Kelly strategy adjusts for this."
A real example of this argument, from the comments [LW(p) · GW(p)]:
Kelly maximizes expected geometric growth rate. Therefore over enough bets Kelly maximizes expected, i.e. mean, wealth, not merely median wealth.
This just doesn't work out. Maximizing geometric growth rate is not the same as maximizing mean value. It turns out Kelly favors the first at a severe cost to the second.
Suppose you just want to maximize expected money in the single-bet case.
A Bayesian wants to maximize $E[x \cdot r_1]$, where $x$ is your starting money and $r_1$ is a random variable for the payoff-per-dollar of your strategy. In a two-step scenario, the Bayesian wants to maximize $E[x \cdot r_1 \cdot r_2]$. And so on.
If your preferred one-step strategy is one which maximizes expected money, this means $u(x) = x$ for you. But this allows us to push the expectation inwards. Look at the two-step case: $E[x \cdot r_1 \cdot r_2] = x \cdot E[r_1] \cdot E[r_2]$ (the last step holds because we assume the random variables are independent). So we maximize the total expected money by maximizing the expected money of $r_1$ and $r_2$ individually.
Similarly for any number of steps: you just maximize the expectation in each step individually.
Note that the resulting behavior will be crazy. If you had a 51% chance of winning a double-or-nothing bet, you'd want to bet all the money you have. By your own probability estimates, you stand a 49% chance of losing everything. From a standard-human perspective, this looks quite financially irresponsible. It gets even worse for repeated bets. The strategy is basically "bet all your money at every opportunity, until you lose everything." Losing everything would become a virtual certainty after only a few bets -- but the expectation maximizer doesn't care. The expectation maximizer happily trades away the majority of worlds, in return for amassing exponentially huge sums in the lucky world where they keep winning.
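Here's a minimal simulation sketch of this (the 10-round horizon, sample count, and seed are arbitrary choices of mine, not from any argument above):

```python
import random

# Bet everything, every round, on a 51% double-or-nothing bet.
def all_in_final_wealth(rounds: int = 10, p_win: float = 0.51, start: float = 100.0) -> float:
    wealth = start
    for _ in range(rounds):
        wealth = wealth * 2 if random.random() < p_win else 0.0
    return wealth

random.seed(0)
outcomes = sorted(all_in_final_wealth() for _ in range(100_000))
mean = sum(outcomes) / len(outcomes)
median = outcomes[len(outcomes) // 2]
ruined = sum(o == 0.0 for o in outcomes) / len(outcomes)
print(f"mean:   ${mean:,.2f}")    # ~ $100 * 1.02**10 ≈ $122: fine in expectation
print(f"median: ${median:,.2f}")  # $0.00: the typical world is ruined
print(f"ruined: {ruined:.4f}")    # ~ 1 - 0.51**10 ≈ 0.9988
```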
("And that's the right thing to do, for their values!" says the one.
"Is it, though?" says the other. "That's putting the cart before the horse. In Bayesian utility theory, you first figure out what the preferences are, and then you figure out a utility function to represent those preferences. You shouldn't just go from caring about money to naively maximizing expected money."
"True," says the one. "But there is a set of preferences which someone could have, which would imply that utility function.")
So, my conclusion? If you don't prefer maximizing expected money for repeated bets (and you probably don't), then you must not prefer it for a single-shot bet, either.
Nothing about expected value maximization breaks when we apply it to multiple decisions across time. The culprit is the utility function. If the Kelly criterion is appealing, it must be because your utility is approximately logarithmic.
(By the way, this section shouldn't be confused for arguing against every possible argument for Kelly that involves repeated bets. The current section is only arguing against the super naive argument which claims Kelly is some kind of adjustment to expectation-maximization to handle the repeated-bets case.)
2: It's About Optimizing Typical Outcomes
I won't fully go through the standard derivation of Kelly, but it goes something like this. First, we suppose a specific type of investment opportunity will pay out with probability $p$. Then, we suppose we face similar opportunities many times. We note that the fraction of successes must be very close to $p$. Then, under that assumption, we do some math to figure out what the optimal investment strategy is.
For example, suppose we play a game: you start with $100, and I start with effectively unlimited money. We'll make bets on a fair coin; whatever you wager, I'll multiply it by 3 if the coin comes up heads. However, if the coin comes up tails, I'll take it all. We will flip exactly 100 times. How will you decide how much to bet each time? The Kelly derivation is saying: choose your optimal strategy by assuming there will be exactly 50 heads and 50 tails. This won't be exactly true, but it's probably close; if we flipped even more times, then it would be more certain that we'd be very close to that ratio.
The main point I want to make about this is that it's not much of an argument for using the Kelly formula. Just because most worlds look very close to the 50-50 world, doesn't mean planning optimally for the 50-50 world is close to optimal in general.
Suppose you consider betting half your money every time, in our game. Evaluating this strategy the way the Kelly derivation suggests goes like this: when you win, you double your money (because you keep 1/2, and put 1/2 on the line; I triple that sum, to 3/2; combining that with the 1/2 you saved, you've doubled your money). When you lose, you halve your money. Since you'll win and lose equally many times, you'd break even with this strategy, keeping $100; so, it's no better than keeping all your money and never betting a cent. (The Kelly recommendation for this game is 1/4; 1/2 is far too much.)
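For reference, the standard Kelly fraction for a bet paying $b$-to-1 net odds, won with probability $p$, is $f^* = \frac{p(b+1) - 1}{b}$; here's a quick sketch checking the two fractions this post uses:

```python
# Kelly fraction for a bet paying b-to-1 net odds, won with probability p.
def kelly_fraction(p: float, b: float) -> float:
    return (p * (b + 1) - 1) / b

print(kelly_fraction(0.5, 2.0))   # 0.25: this game (a win returns 3x the wager)
print(kelly_fraction(0.51, 1.0))  # 0.02: the 51% double-or-nothing bet from section 1
```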
But consider: 51-49 and 49-51 are both quite probable as well, almost as probable as the 50-50 outcome. In one case, you double your money one more time, and halve it one less time. So you'll end with $400. In the other case, just the opposite, so you'll end with $25.
Do these two possibilities cancel out, so that we can act like the 50-50 case is all that matters? Not to an expected-money maximizer; the average between $400 and $25 is $212.50; a significant gain over $100. So now it sounds like this strategy might not be so close to breaking even after all.
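Here's that arithmetic as a short sketch, extended to the exact expectation over all 100 flips (the code just transcribes the setup above):

```python
from math import comb

start, flips = 100.0, 100

# Betting half your stack each flip: a win doubles your money, a loss halves it.
def final_wealth(heads: int) -> float:
    return start * 2.0**heads * 0.5**(flips - heads)

print(final_wealth(50))                    # 100.0: the 50-50 world breaks even
print(final_wealth(51), final_wealth(49))  # 400.0 and 25.0, as above

# Exact expected final wealth, weighting every head-count binomially:
mean = sum(comb(flips, h) * 0.5**flips * final_wealth(h) for h in range(flips + 1))
print(f"${mean:,.0f}")  # = $100 * 1.25**100: astronomically more than $100
```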
Generally speaking, although the ratio of successes to failures will converge to $p : (1-p)$, the absolute difference between the true number of successes and the number expected by the Kelly analysis won't converge to zero. And the small deviations in ratio will continue to make large differences in value, like those above. So why should we care that the ratio converges?
Ok. It's hard to justify taking only the single most probable world (like the 50-50 world) and planning for that one. But there are steelmen of the basic argument. As John Wentworth said [LW(p) · GW(p)]:
maximizing modal/median/any-fixed-quantile wealth will all result in the Kelly rule
The discussion above can be thought of as maximizing the mode (choosing the strategy which maximizes the most probable amount of money we might get). John points out that we can choose many other notions of "typical outcome", and get the same result. Just so long as we don't optimize the mean (which gets us the expected-money strategy again), we end up with the Kelly strategy.
Optimizing for the mode/median/quantile is usually a significantly worse idea than optimizing expected utility. For example, optimizing for median utility just means ranking every possibility from worst to best (with a number of copies based on its probability), and judging how well we're doing by looking at the possibility which ends up at the halfway point. This is perfectly consistent with a 49% chance of extreme failure; median-utility-optimization doesn't care how bad the worst 49% is. This is really implausible, as a normative (or descriptive) theory of risk management.
The fixed-quantile-maximizer allows us to tweak this. We can look at the bottom 2% mark (ie an outcome close to the bottom of the list), so that we can't be ignoring a terrible disaster that's got almost 50% probability. But this is insensitive to really good outcomes vs merely moderately good ones, until they cross the 98% probability line. For example, if a task just inherently has a 10% chance of bad-as-it-can-be failure (which there's nothing you can do about), the 2%-quantile-maximizer won't optimize at all; any option will look equally bad to it.
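As a toy illustration of that blind spot (the payouts and probabilities here are invented for the example):

```python
import random

random.seed(2)

def quantile(samples: list, q: float) -> float:
    ordered = sorted(samples)
    return ordered[int(q * len(ordered))]

# Two actions sharing an unavoidable 10% chance of total disaster (payout 0),
# differing enormously in how good the other 90% of outcomes are.
cautious  = [0.0 if random.random() < 0.1 else 50.0  for _ in range(100_000)]
ambitious = [0.0 if random.random() < 0.1 else 500.0 for _ in range(100_000)]

print(quantile(cautious, 0.02), quantile(ambitious, 0.02))  # 0.0 0.0: indifferent
print(sum(ambitious) / len(ambitious))  # ~450.0: expected value sees the difference
```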
If all of these choices are terrible in general, why should we find them at all plausible in the particular case of justifying the Kelly rule?
So no one should see the Kelly derivation and think "OK, Kelly maximizes long-run profits, great."
Instead, I think the Kelly derivation and related arguments should be seen as much more indirect. We look at this behavior Kelly recommends, and we say to ourselves, "OK, this seems pretty reasonable." And we look at the behavior which expected money-maximization recommends, and we say, "No, that looks entirely unreasonable." And we conclude that our preferences must be closer to those of a Kelly agent than those of an expected-money maximizer.
In other words, we conclude that our utility is approximately logarithmic in money, rather than linear.
(A conclusion which is, by the way, very plausible on other grounds [Economic Growth and Subjective Well-Being: Reassessing the Easterlin Paradox. Betsey Stevenson and Justin Wolfers.].)
3: It's About Time-Averaging Rather Than Ensemble-Averaging
A new approach to economic decision-making called Ergodicity Economics, primarily developed by Ole Peters, attempts to make a much more sophisticated argument similar to "Kelly is about repeated bets". It is not simply the naive argument I dismissed in the first section. I think it's much more interesting. But, ultimately, I think it's not that convincing.
I won't be able to explain the whole thing in this post, but one of the central ideas is time-averaging rather than ensemble-averaging. Ole Peters critiques Bayesians for averaging over possibilities. He states that ensemble averages are appropriate when a lot of things are happening in parallel, like insurance companies tabulating death rates to ensure their income is sufficient for what they'll have to pay out. However, when you're an individual, you only die once. When things happen sequentially, you should be taking the time-average.
Peters' approach addresses many more things than just the Kelly formula -- just to be clear. It's just one particular case we can analyze. But, here's roughly what Peters would do for that case. We can't time-average our profits, since those can keep increasing boundlessly. (As we accumulate more money to bet with, we can make larger bets, so the average winnings could just go to infinity.) So we look at the ratio of our money from one round to the next. This, it turns out, we can time-average. And what strategy maximizes that time-average? Kelly, of course!
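Here's a rough numerical sketch of that move (my own toy version, reusing the triple-or-lose game from section 2 and its Kelly fraction of 1/4, not Peters' own construction): wealth itself diverges, but the per-round wealth ratio has a well-defined time-average.

```python
import math
import random

random.seed(1)
f = 0.25  # the Kelly fraction for the triple-or-lose coin-flip game

# Per-round wealth ratios: 1 + 2f on a win (the wager is tripled), 1 - f on a loss.
ratios = [(1 + 2*f) if random.random() < 0.5 else (1 - f) for _ in range(100_000)]

# The time-average growth rate is the geometric mean of the ratios; working in
# logs avoids overflowing the (diverging) wealth itself.
time_avg = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
print(time_avg)  # -> (1.5 * 0.75) ** 0.5 ≈ 1.0607, the quantity Kelly maximizes
```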
My problem with this is mainly that it seems very ad-hoc. I would be somewhat more impressed if someone could prove that there was a unique correct choice of what to maximize, rather than just creatively coming up with something that can be time-averaged, and then declaring that we should maximize that. This seems suspiciously close to just taking a logarithm without any justification.
Not only do we have to choose a function to time-average, we also have to select an appropriate way to turn our situation into an iterated game. This isn't a difficulty in the Kelly case, but in principle, it's another degree of freedom in the analysis, which makes the results feel more arbitrary. (If you're a Bayesian who can represent your life as a big game tree where all the branches end in death, how would you abstract out isolated situations as infinitely-iterated games, in order to apply the Peters construction?)
4: It's About Convergent Instrumental Goals
The basic idea of this argument is similar to the naive first argument we discussed: argue that repeated bets bring you closer and closer to logarithmic utility. Unlike the first attempt, we now grant that linear utility doesn't work this way. But maybe linear utility is a very special case.
Suppose you need $5 to ride the bus. Nothing else is significant to you right now. We can think of your utility as 1u if you have $5 or more, and 0u otherwise.
Now suppose someone approaches you with a bet at the bus stop. It's a double-or-nothing bet. You yourself are 50-50 on the outcome, so ordinarily, it wouldn't be worth taking. In this case, however, the bet could save you: if you have $2.50 or more, the bet could give you a 50% chance at $5, so you could ride the bus!
So now your expected utility, as a function of the money in your pocket at the beginning of the scenario, is actually a two-step function: 0u for less than $2.50, 0.5u from $2.50 up to (but not including) $5, and 1u for $5 and up.
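A sketch of this derived function (the helper name is mine):

```python
def derived_utility(money: float) -> float:
    """Expected utility of pocket money at the bus stop, given the 50-50
    double-or-nothing offer (utility is 1 iff you end with >= $5)."""
    if money >= 5.00:
        return 1.0  # just ride the bus
    if money >= 2.50:
        return 0.5  # bet enough to reach $5: a 50% chance of riding
    return 0.0      # even a doubling can't reach $5

print(derived_utility(6.00), derived_utility(3.00), derived_utility(2.00))  # 1.0 0.5 0.0
```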
What's important about this scenario is that the bet changed your expected value function. Mossin (who I'll discuss more in a bit) calls this your derived utility function.
In the first section, I showed that this doesn't happen for linear utility functions: if your utility function is linear, your derived utility function is also linear. Mossin calls functions with this property myopic, because an agent with such a utility function can make each decision as if it were its last.
Log utility is also myopic, just like linear utility: $E[\log(x \cdot r_1 \cdot r_2)] = \log(x) + E[\log r_1] + E[\log r_2]$. Maximizing long-term log-money breaks down into maximizing the log-utility of each step.
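A small numerical check of this (the two-round grid search is my own construction, again using the triple-or-lose game): the fraction that greedily maximizes one round's expected log also maximizes the two-round expected log.

```python
import math
from itertools import product

# Expected log of final wealth in a two-round triple-or-lose game, betting
# fractions f1 and then f2 of current wealth (fair coin each round).
def exp_log_two_rounds(f1: float, f2: float) -> float:
    total = 0.0
    for win1, win2 in product([True, False], repeat=2):
        m1 = (1 + 2*f1) if win1 else (1 - f1)
        m2 = (1 + 2*f2) if win2 else (1 - f2)
        total += 0.25 * math.log(100.0 * m1 * m2)
    return total

fracs = [i / 100 for i in range(100)]
best = max(((f1, f2) for f1 in fracs for f2 in fracs),
           key=lambda pair: exp_log_two_rounds(*pair))
print(best)  # (0.25, 0.25): the single-round Kelly fraction, chosen both times
```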
If you know a little dynamical systems theory, you might be thinking: aha, we know these are fixed points, but is one of these points an attractor? Perhaps risk-averse functions which somewhat resemble logarithmic functions will have derived utility functions which are a bit closer to logarithmic, so that when we face many many bets, our derived utility function will become very close to logarithmic.
If true, this would be a significant vindication of the Kelly rule! Imagine that you're a stock trader who plans to retire at a specific date. Your utility is some function of the amount of money you retire with. The above argument would say: your derived utility function is the result of many, many bets. So, as long as your utility function meets some basic conditions (eg, isn't linear), your derived utility function will be a close approximation of a logarithm!
Until I read SimonM's post, I actually thought this was true. However, SimonM says the following:
"Optimal Multiperiod Portfolio Policies" (Mossin) shows that for a wide class of utilities, optimising utility of wealth at time t is equivalent to maximising utility at each time-step.
IE, Mossin shows that a lot of utility functions actually are myopic! Not all utility functions, by any means, but enough to break the hope that logarithmic utility is a strong attractor.
So, for a large class of utility functions, the "Kelly is about repeated bets" argument fails just as hard as it did for the linear case.
This is really surprising!
So it appears we can't argue that log utility is a convergent instrumental goal. It's not true that a broad variety of agents will want to Kelly-bet in the short term in order to maximize utility in the long term. This seems like a pretty bad sign for SimonM's argument that Kelly is about repeated bets.
If anyone thinks they can recover this argument, please let me know! It's still possible that some class of functions has this property. It's just that now we know we need to side-step a lot of functions, not just linear functions. So we won't be able to push the argument through with weak assumptions, EG, "any risk-averse function implies approximately logarithmic derived utility". However, it's still possible that all of Mossin's myopic functions are "unrealistic" in some way, so that we can still argue Kelly is an instrumentally convergent strategy for humans.
But I currently see no reason to suspect this.
5: It's About Beating Everyone Else
At the beginning of this post, I mentioned that SimonM did give one result which neither seems mistaken, nor seems to be about logarithmic utility. Here's what SimonM says:
"Competitive optimality". Any other strategy can only beat Kelly at most 1/2 the time. (1/2 is optimal since the other strategy could be Kelly)
This is true because Kelly optimizes median utility. No other strategy can have higher median utility; so, given any other strategy, Kelly must be better at least half the time.
Humans have a pretty big competitive component to our preferences. People enjoy being the richest person they know. So, this could plausibly be relevant for someone's betting strategy, and doesn't require logarithmic utility.
I've also heard it said that a market will evolve to be dominated by Kelly bettors. I think this basically refers to the idea that in the long run, you can expect Kelly bettors to have higher wealth than anyone else with arbitrarily high probability (because Kelly maximizes any quantile, not just median). However, I was curious if Kelly comes out on top in a more literally evolutionary model. The Growth of Relative Wealth and the Kelly Criterion examines this question. I haven't looked at it in-depth, but it appears the answer is "sometimes".
Conclusion: To Kelly, Or Not To Kelly?
My experience writing this post has been a progressive realization that the argument for the Kelly criterion is actually much weaker than I thought. I expected to mainly look at arguments for Kelly and show how they have to go through an assumption tantamount to log-utility. Instead, I spent more time finding that the arguments were just not very good.
- When I responded to ideas about optimizing mode/median/quantiles in the comment section to SimonM's post, my objection was just "it's important to point out that you're optimizing mode/median/quantile, rather than the more usual expected value". But now I'm like: optimizing mode/median/quantile is actually a pretty terrible principle, generally speaking! Why would we apply it here?
- I had thought that some form of "instrumental convergence" argument would work, as discussed in section 4. But it appears not!
So before writing this post, my position was: Kelly is optimal in a non-Bayesian sense, which is peculiar, but seems oddly compelling. Within a Bayesian framework, we can "explain" this compellingness by supposing logarithmic utility. So it seems like the utility of money is roughly logarithmic for humans, which, anyway, is plausible on other grounds. Furthermore, risk-averse agents will have approximately logarithmic derived utility in practice, anyway, due to instrumental convergence. So it's fair to say Kelly bets are approximately optimal for humans.
But now, I think: Kelly is optimal in a peculiar non-Bayesian sense, but it's pretty terrible. Furthermore, there's no instrumental convergence to Kelly, as far as I can tell. So all I'm left with is: human utility appears to be approximately logarithmic in money, on other grounds.
Overall, this still suggests Kelly is a decent rule of thumb!
I certainly haven't exhausted all the ways people have argued in favor of the Kelly criterion, either. If you think you know of an argument which isn't addressed by any of my objections, let me know.
Footnotes
1:
I should note that while SimonM says "a wide class", Mossin instead says:
it will be shown that the only utility functions allowing myopic decision making are the logarithmic and power functions which we have encountered earlier
IE, Mossin seems to think of it as a narrow class. However, Mossin's result is enough to block any approach I would have taken to proving some kind of convergence result. (I spent some time trying to prove a result while writing this, before I gave up and read Mossin.)
In case you're curious, Mossin's "power functions" are a family with two parameters, which appear to be fixed by the surrounding context in the paper (not free), but I haven't fully understood that part yet.
Mossin also discusses a broader class of weakly myopic functions. These utility functions aren't quite the same as their derived functions, but I'm guessing they're also going to be counterexamples to any attempted convergence result.
2:
SimonM realizes that Mossin's result poses a problem for his narrative, at least at a shallow level:
BUT HANG ON! I hear you say. Haven't you just spent the last 5 paragraphs saying that Kelly is about repeated bets? If it all reduces to one period, why all the effort? The point is this: legible utilities need to handle the multi-period nature of the world. I have no (real) sense of what my utility function is, but I do know that I want my actions to be repeatable without risking ruin!
At first, I thought this was waffling and excuses; but on reflection, I entirely agree. As I said in section 2, I think the right argument for Kelly as a heuristic is the fairly indirect one: Kelly seems like a sane way of managing risk of ruin, so my preferences must be closer to logarithmic than (eg) linear.
3:
I confess, although optimizing for mode/median/quantiles is not very good, I still find something interesting about the argument from section 2. The general principle "ignore extremely improbable extreme outcomes" seems like a hack, but it's an interesting hack, since it blocks many philosophical problems (such as Pascal's Wager). And, in this particular case, it seems oddly plausible: it intuitively seems like the expected-money-maximizer is doing something wrong, and a plausible analysis of that wrongness is that it happily trades away all its utility in increasingly many worlds, for a vanishing chance of happiness in tiny slivers of possibility-space. It would be nice to have solid principles which block this behavior. But mode/median/quantile maximization are not plausible as general principles.
Also, even though optimizing for mode/median/quantiles seem individually terrible, optimizing for them all at once is actually pretty good! My criticisms of the individual principles don't apply when they're all together. However, optimizing for all of them at once is not possible in general.
26 comments
comment by Oscar_Cunningham · 2021-03-04T21:37:12.865Z · LW(p) · GW(p)
One other argument I've seen for Kelly is that it's optimal if you start with $a and you want to get to $b as quickly as possible, in the limit of b >> a. (And your utility function is linear in time, i.e. -t.)
You can see why this would lead to Kelly. All good strategies in this game will have somewhat exponential growth of money, so the time taken will be proportional to the logarithm of b/a.
So this is a way in which a logarithmic utility might arise as an instrumental value while optimising for some other goal, albeit not a particularly realistic one.
comment by AlexMennen · 2022-11-12T19:36:32.334Z · LW(p) · GW(p)
Most of the arguments for Kelly betting that you address here seem like strawmen, except for (4), which can be rescued from your objection, and an interpretation of johnswentworth's version of (2), which you actually mention in footnote 3, but seem unfairly dismissive of.
The assumption under which your derived utility function is logarithmic is that expected utility doesn't get dominated by negligible-probability tail events. For instance, if you have a linear utility function and you act like it, you almost surely get 0 payout, but your expected payout is enormous because of the negligible-probability tail event in which you win every bet. Even if you do Kelly betting instead, the expected payout is going to be well outside the range of typical payouts, because of the negligible-probability tail event in which you win a statistically improbable number of bets. This won't happen if, for instance, you have a bounded utility function, for which typical payouts from Kelly betting will not get you infinitesimally close to the bounds. The class of myopic utility functions is infinite, yes, but in the grand scheme of things, compared to the space of possible utility functions, it is very tiny, and I don't think it should be surprising that there are relatively mild assumptions that imply results that aren't true of most of the myopic utility functions.
In footnote 3, you note that optimizing for all quantiles simultaneously is not possible. Kelly betting comes extremely close to doing this. Your implied objection is, I assume, that the quantifier order is backwards from what would make this really airtight: When comparing Kelly betting to a different strategy, for every quantile, Kelly betting is superior after sufficiently many iterations, but there is no single sufficient number of iterations after which Kelly betting is superior for every quantile; if you have enough iterations such that Kelly betting is better for quantiles 1% through 99%, the alternative strategy could still be so much better at the 99.9% quantile that it outweighs all this. This is where the assumption that negligible-probability tail events don't dominate expected value calculations makes this difference not matter so much. I think that this is a pretty natural assumption, and thus that this really is almost as good.
comment by Oskar Mathiasen (oskar-mathiasen) · 2021-03-02T13:20:53.829Z · LW(p) · GW(p)
One possible way to get at the hack of ignoring unlikely possibilities in a reasonable way might be to do something similar to the "typical set" found in information theory. Especially as utility function maximization can be reformulated as relative entropy minimization.
(Epistemic status: my brain saw a possible connection, I have not spent much time on this idea)
comment by habryka (habryka4) · 2023-01-16T05:45:16.764Z · LW(p) · GW(p)
This post in particular feels like it has aged well and became surprisingly very relevant in the FTX situation. Indeed post-FTX I saw a lot of people who were confidently claiming that you should not take a 51% bet to double or nothing your wealth, even if you have non-diminishing returns to money, and I sent this post to multiple people to explain why I think that criticism is not valid.
comment by SimonM · 2021-03-01T20:16:46.439Z · LW(p) · GW(p)
Thanks for writing this! I feel like we're now much closer to each other in terms of what we actually think. I roughly suspect we agree:
- Kelly is a litmus test for utilities
- For a Bayesian with log-utility Kelly is the end of the story
You think the important bit is the utility, I think the important bit is what it says about people's utilities.
Replies from: abramdemski
↑ comment by abramdemski · 2021-03-01T22:21:27.189Z · LW(p) · GW(p)
Hopefully it's clear to readers that I picked a contrarian title for fun, rather than because it's the best description of our disagreement.
Somehow I'm suspicious that we still have pretty big implicit disagreements about what kinds of arguments are OK to make; like if someone asked you to explain Kelly to them at a party, you might still rant about how it's not about utility, and you'd still say something along the lines of Ole Peters' time-averaging stuff. Or maybe I'm just reacting to the way you're not specifically saying you were wrong about any of the stuff you wrote. But I'm not saying "you're wrong and you should recant", I'm saying I'm hopeful that there's still a productive disagreement between us that we can learn from. EG, if you have in you any defense of Ole Peters or similar arguments, I would like to hear it.
Replies from: SimonM
↑ comment by SimonM · 2021-03-01T22:59:55.469Z · LW(p) · GW(p)
For sure - both my titles were clickbait compared to what I was saying.
I think if I was trying to explain Kelly, I would definitely talk in terms of time-averaging and maximising returns. I (hope) I wouldn't do this as an "argument for" Kelly. I think if I was to make an argument for Kelly which is trying to persuade people it would be something close to my post. (Whereby I would say "Here are a bunch of nice properties Kelly has + it's simple + there are easy modifications if it seems too aggressive" and try to gauge from their reactions what I need to talk about).
I will definitely be more careful about how I phrase this stuff though. I think if I wrote both posts again I would think harder about which bits were an "argument" and which bits were guides for intuition.
I actually wouldn't make very much of a defence for the Peters stuff. I (personally) put little stock in it. (At least, I haven't found the "Aha!" moment where what they seem to be selling clicks for me).
I think the most interesting thing about Kelly (which has definitely come through over our posts) is that Kelly is a very useful lens into preferences and utilities. (Regardless of which perspective you come from).
comment by Oliver Sourbut · 2022-06-30T09:12:06.103Z · LW(p) · GW(p)
Three ideas, not at all worked through
- quantilisation and robustness
- quantilising is generally considered 'robust'
- not sure what the best arguments are, but maybe a Bayesian almost always 'should' have rapidly-enough decaying tails that some quantile is equivalent to EV...?
- contra Pascal's wager style failures?
- finitude of evidence can't support arbitrarily large hypotheses...?
- discount rates
- maybe exponential or hyperbolic (or other) discount rate over time steps could lead to something like logarithmic preferences?
- my intuition says nope but I've not run the maths
- I would be surprised if this worked over lots of different scales, but maybe on particular configurations
- if those configurations happened to be plausible ancestrally then...?
- value of information
- maybe some heuristic relating to value of information makes it convergently instrumental to have roughly logarithmic preferences
- you don't learn anything more if you 'go to zero'...?
- maybe cashes out something like quantilising?
- maybe some heuristic relating to value of information makes it convergently instrumental to have roughly logarithmic preferences
comment by Vitor · 2021-03-02T11:52:18.308Z · LW(p) · GW(p)
Great post, I find it really valuable to engage in this type of meta-modeling, i.e., deriving when and why models are appropriate.
I think you're making a mistake in Section 2 though. You argue that a mode optimizer can be pretty terrible (agreed). Then, you argue that any other quantile optimizer can also be pretty terrible (also agreed). However, Kelly doesn't only optimize the mode, or 2% quantile, or whatever other quantile: it maximizes all those quantiles simultaneously! So, is there any distribution for which Kelly itself fails to optimize between meaningfully different states (as in your 2%-quantile with 10% bad outcome example)? I don't think such a distribution exists.
(Note: maybe I'm misunderstanding what johnswentworth said here [LW(p) · GW(p)], but if solving for any x%-quantile maximizer always yields Kelly, then Kelly maximizes for all quantiles, correct?)
Replies from: abramdemski
↑ comment by abramdemski · 2021-03-02T16:06:05.274Z · LW(p) · GW(p)
Yep, I actually note this in footnote 3. I didn't change section 2 because I still think that if each of these is individually bad, it's pretty questionable to use them as justification for Kelly.
Note that if a strategy $A$ is better or equal in every quantile, and strictly better in some, compared to some other strategy $B$, then expected utility maximization will prefer $A$ to $B$, no matter what the utility function is (so long as more money is considered better, ie utility is monotonic).
So all expected utility maximizers would endorse an all-quantile-optimizing strategy, if one existed. This isn't a controversial property from the EU perspective!
But it's easy to construct bets which prove that maximizing one quantile is not always consistent with maximizing another; there are trade-offs, so there's not generally a strategy which maximizes all quantiles.
So it's critically important that Kelly is only approximately doing this, in the limit. If Kelly had this property precisely, then all expected utility maximizers would use the Kelly strategy.
In particular, at a fixed finite time, there's a quantile for the all-win sequence. However, since this quantile becomes smaller and smaller, it vanishes in the limit. At finite time, the expected-money-maximizer is optimizing this extreme quantile, but the Kelly strategy is making trade-offs which are suboptimal for that quantile.
(Note: maybe I'm misunderstanding what johnswentworth said here [LW(p) · GW(p)], but if solving for any x%-quantile maximizer always yields Kelly, then Kelly maximizes for all quantiles, correct?)
That's my belief too, but I haven't verified it. It's clear from the usual derivation that it's approximately mode-maximizing. And I think I can see why it's approximately median-maximizing by staring at the wikipedia page for log-normal long enough and crossing my eyes just right [satire].
Replies from: Vitor
comment by Christopher King (christopher-king) · 2023-05-25T14:24:35.022Z · LW(p) · GW(p)
I think one thing a lot of these arguments for Kelly betting are missing: we already know that utility is approximately logarithmic with respect to money.
So if Kelly is maximizing the expected value of log(utility), doesn't that mean it should be maximizing the expected value of log(log(money)) instead of log(money)? 🤔
Replies from: green_leaf
↑ comment by green_leaf · 2023-05-25T14:39:26.257Z · LW(p) · GW(p)
I believe Kelly maximizes E(log(money)), no?
Replies from: christopher-king
↑ comment by Christopher King (christopher-king) · 2023-05-25T14:42:20.267Z · LW(p) · GW(p)
Correct, but the arguments given in this post for the Kelly bet are really about utility, not money. So if you believe that you should Kelly bet utility, that does not mean maximizing E(log(money)); it means maximizing E(log(log(money))). The arguments would need to focus on money specifically if they want to argue for maximizing E(log(money)).
comment by MorgneticField (motred) · 2023-05-24T01:13:39.460Z · LW(p) · GW(p)
I think you give short shrift to Ole Peters' ideas here. His argument is similar to the one about maximizing repeated bets, but it holds together a lot better. I particularly like his explanation in his paper about the St. Petersburg problem.
You say that "We can't time-average our profits [...] So we look at the ratio of our money from one round to the next." But that's not what Peters does! He looks at maximizing total wealth, in the limit as time goes to infinity.
In particular, we want to maximize $\lim_{N \to \infty} W_N$, where $W_N = W_0 \prod_{i=1}^N r_i$ is wealth after all the bets and $r_i$ is 1 plus the percent-increase from bet $i$. The unique correct thing to maximize is wealth after all your bets.
You want to know what choice to make for any given decision, so you want to maximize your rate of return for each individual bet, which is $r = (W_N / W_0)^{1/N}$. Peters does a few variable substitutions in the limit as $N \to \infty$ to get $r$ as a function of probabilities for outcomes of the bets (see the paper), and finds $r = \prod_j r_j^{p_j}$, where $r_j$ is the gain from one possible outcome of the bet and $p_j$ is the probability of that outcome.
Then you just choose how much to bet to maximize $r$. The argmax of a product is the same as the argmax of the sum of the logs, so choosing to maximize this time average will lead to the same observed behavior as choosing to maximize for log utility in ensemble averages (because $\log r = \sum_j p_j \log r_j$).
Replies from: abramdemski
↑ comment by abramdemski · 2023-05-29T18:00:23.882Z · LW(p) · GW(p)
It's been a while since I reviewed Ole Peters, but I stand by what I said -- by his own admission, the game he is playing is looking for ergodic observables. An ergodic observable is defined as a quantity such that the expectation is constant across time, and the time-average converges (with probability one) to this average.
This is very clear in, EG, this paper.
The ergodic observable in the case of kelly-like situations is the ratio of wealth from one round to the next.
The concern I wrote about in this post is that it seems a bit ad-hoc to rummage around until we find an ergodic observable to maximize. I'm not sure how concerning this critique should really be. I still think Ole Peters has done something great, namely, articulate a real Frequentist alternative to Bayesian decision theory.
It incorporates classic Frequentist ideas: you have to interpret individual experiments as part of an infinite sequence in order for probabilities and expectations to be meaningful; and, the relevant probabilities/expectations have to converge.
So it similarly inherits the same problems: how do you interpret one decision problem as part of an infinite sequence where expectations converge?
If you want my more detailed take written around the time I was reading up on these things, see here [LW(p) · GW(p)]. Note that I make a long comment underneath my long comment where I revise some of my opinions.
You say that "We can't time-average our profits [...] So we look at the ratio of our money from one round to the next." But that's not what Peters does! He looks at maximizing total wealth, in the limit as time goes to infinity.
In particular, we want to maximize $\lim_{N \to \infty} W_N$, where $W_N = W_0 \prod_{i=1}^N r_i$ is wealth after all the bets and $r_i$ is 1 plus the percent-increase from bet $i$.
Taken literally, this doesn't make mathematical sense, because the wealth does not necessarily converge to anything (indeed, it does not, so long as the amount risked in investment does not go to zero).
Since this intuitive idea doesn't make literal mathematical sense, we then have to do some interpretation. You jump from the ill-defined maximization of a limit to this:
You want to know what choice to make for any given decision, so you want to maximize your rate of return for each individual bet, which is $r = (W_N / W_0)^{1/N}$.
But this is precisely the ad-hoc decision I am worried about! Choosing to maximize rate of return (rather than, say, simple return) is tantamount to choosing to maximize log money instead of money!
So the argument can only be as strong as this step -- how well can we justify the selection of rate of return (IE, 1 + percentage increase in wealth, IE, the ratio of wealth from one round to the next)?
Ole Peters' answer for this is his theory of ergodic observables. You know that you've found the observable to maximize when it is ergodic (for your chosen infinite-sequence version of the decision problem).
One worry I have is that choice of ergodic observables may not be unique. I don't have an example where there are multiple choices, but I also haven't seen Ole Peters prove uniqueness. (But maybe I've read too shallowly.)
Another worry I have is that there may be no ergodic observable.
Another worry I have is that there will be many ways to interpret a decision problem as part of an infinite sequence of decision problems (akin to the classic reference class problem). How do you integrate these together?
I'm not claiming any of these worries are decisive.
comment by bluefalcon · 2021-03-02T09:35:29.601Z · LW(p) · GW(p)
You're leaving out geometric growth of successive bets. Kelly maximizes expected geometric growth rate. Therefore over enough bets Kelly maximizes expected, i.e. mean, wealth, not merely median wealth.
Replies from: abramdemski, GuySrinivasan
↑ comment by abramdemski · 2021-03-02T16:25:57.578Z · LW(p) · GW(p)
As GuySrinivasan says, do the math. It doesn't work out. Maximizing geometric growth rate is not the same as maximizing mean value. It turns out Kelly favors the first at a severe cost to the second.
This is my big motivator for writing stuff like this: discussions of Kelly usually prove an optimality notion like expected growth rate, and then leave it to the reader to notice that this doesn't at all imply more usual optimality notions. Most readers don't notice; it's very natural to assume that "Kelly maximizes growth rate" entails "Kelly maximizes expected wealth".
But if Kelly maximized expected wealth, then that would probably have been proved instead of this geometric-growth-rate property. You have to approach mathematics the same way you approach political debates, sometimes. Keep an eye out for when theorems answer something only superficially similar to the question you would have asked.
Replies from: bluefalcon
↑ comment by bluefalcon · 2021-03-03T06:03:53.962Z · LW(p) · GW(p)
A bettor who can make an infinite number of expected profitable bets is going to outperform one who can only make a finite number of bets.
(any number strictly between 0 and 1)^∞ = 0; i.e., for an infinite series of bets, the probability of ruin with naive EV maximization is 1. So, the expected value is actually −1× your bet size.
Replies from: eigil-rischel, abramdemski
↑ comment by Eigil Rischel (eigil-rischel) · 2021-03-03T15:16:42.941Z · LW(p) · GW(p)
The source of disagreement seems to be about how to compute the EV "in the limit of infinite bets". I.e., given $n$ bets with a $1/2$ chance of winning each, where you triple your stake with each bet, the naive EV maximization strategy gives you a total expected value of $(3/2)^n$ times your starting wealth, which is also the maximum achievable overall EV. Does this entail that the EV at infinite bets is $\infty$? No, because with probability one, you'll lose one of the bets and end up with zero money.
I don't find this argument for Kelly super convincing.
-
You can't actually bet an infinite number of times, and any finite bound on the number of bets, even if it's astronomically large, immediately collapses back to the above situation where naive EV-maximization also maximizes the overall expected value. So this argument doesn't actually support using Kelly over naive EV maximization in real life.
-
There are tons of strategies other than Kelly which achieve the goal of infinite EV in the limit. Looking at EV in the limit doesn't give you a way of choosing between these. You can compare them over finite horizons and notice that Kelly gives you better EV than others here (maximal geometric growth rate).... but then we're back to the fact that over finite time horizons, naive EV does even better than any of those.
↑ comment by SarahSrinivasan (GuySrinivasan) · 2021-03-03T20:36:15.971Z · LW(p) · GW(p)
It's worse than that. The EV at infinite bets is actually ∞ even for naive EV maximization. WolframAlpha link
Replies from: eigil-rischel
↑ comment by Eigil Rischel (eigil-rischel) · 2021-03-04T15:24:28.549Z · LW(p) · GW(p)
This argument doesn't work because limits don't commute with integrals (including expected values). (Since practical situations are finite, this just tells you that the limiting situation is not a good model).
To the extent that the experiment with infinite bets makes sense, it definitely has EV 0. We can equip the space $\{0,1\}^{\mathbb{N}}$ of infinite win/loss sequences with a probability measure corresponding to independent coinflips, then describe the payout using naive EV maximization as a function $f$ on this space - it is $\infty$ on the point $(1,1,1,\dots)$ and $0$ everywhere else. The expected value/integral of this function is zero.
EDIT: To make the "limit" thing clear, we can describe the payout after $n$ bets using naive EV maximization as a function $f_n$, which is $3^n$ if the first $n$ values are $1$, and $0$ otherwise. Then $\mathbb{E}[f_n] = (3/2)^n$, and $f_n \to f$ (pointwise), but $\lim_n \mathbb{E}[f_n] = \infty \neq 0 = \mathbb{E}[f]$.
The corresponding functions $g_n$ describing the payout under a Kelly strategy have $\mathbb{E}[g_n] < \mathbb{E}[f_n]$ for all $n$, but $\mathbb{E}[\lim_n g_n] > \mathbb{E}[\lim_n f_n]$.
Replies from: GuySrinivasan
↑ comment by SarahSrinivasan (GuySrinivasan) · 2021-03-04T17:51:19.165Z · LW(p) · GW(p)
lim(EV(fn)) != EV(lim(fn))
Oooooh. Neat. Thank you. I guess... how do we know EV(lim(fn))=0? I don't know enough analysis anymore to remember how to prove this. [reads internet] Well, Wikipedia tells me two functions with the same values everywhere but measure 0 even if those values are +inf have the same integral, so looks good. :D
↑ comment by abramdemski · 2021-03-03T21:35:34.208Z · LW(p) · GW(p)
Let's consider a max-expectation bettor on a double-or-nothing bet with an 80% probability of paying out.
My expected value per dollar in this bet is $1.60, whereas the expected value of a dollar in my pocket is $1. So I maximize expected value by putting all my money in. If I start with $100, my expected value after 1 round is $160. The expected value of playing this way for two rounds is $100 × 1.6 × 1.6 = $256. In general, the expected value of this strategy after $n$ rounds is $100 \times 1.6^n$.
The Kelly strategy puts 60% of its money down, instead. So in expectation, the Kelly strategy multiplies the money by $0.8 \times 1.6 + 0.2 \times 0.4 = 1.36$ each round.
So after one round, the Kelly bettor has $136 in expectation. After two rounds, about $185. In general, the Kelly strategy gets an expected value of $100 \times 1.36^n$.
So, after a large number of rounds, the all-in strategy will very significantly exceed the Kelly strategy in expected value.
I suspect you will object that I'm ignoring the probability of ruin, which is very close to 1 after a large number of rounds. But the expected value doesn't ignore the probability of ruin. It's already priced in: the expected value of 1.6 includes the 80% chance of success and the 20% chance of failure: $0.8 \times 2 + 0.2 \times 0 = 1.6$. Similarly, the $256 expected value for two rounds already accounts for the chance of zero; you can see how by multiplying out $(0.8 \times 2 + 0.2 \times 0)^2$ (which shows the three possibilities which have value zero, and the one which doesn't). Similarly for the $n$th round: the expected value of $100 \times 1.6^n$ already discounts the winnings by the (tiny) probability $0.8^n$ of winning every time. (Otherwise, the sum would be $100 \times 2^n$ instead.)
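A quick sketch checking these numbers (the horizons are arbitrary):

```python
p = 0.8  # chance the double-or-nothing bet pays out

for n in (1, 2, 10, 50):
    all_in = 100 * (p * 2) ** n                    # bet everything each round
    kelly  = 100 * (p * 1.6 + (1 - p) * 0.4) ** n  # bet the Kelly fraction, 60%
    ruin   = 1 - p ** n                            # all-in bettor's chance of ruin
    print(f"n={n:2d}  all-in EV=${all_in:,.2f}  Kelly EV=${kelly:,.2f}  ruin={ruin:.4f}")
```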
↑ comment by SarahSrinivasan (GuySrinivasan) · 2021-03-02T15:36:06.755Z · LW(p) · GW(p)
I thought something like this the first time I saw abramdemski's pushback. Then I actually did the math in some simple cases. Try doing the math to find a sequence where Kelly beats naive wealth maximization. You will convince either yourself or abramdemski!