Why Bet Kelly?

post by Joe Zimmerman (joe-zimmerman) · 2022-11-29T18:47:23.142Z · LW · GW · 4 comments

The Kelly criterion is an elegant, but often misunderstood, result in decision theory. To begin with, suppose you have some amount of some resource, which you would like to increase. (For example, the resource might be monetary wealth.) You are given the opportunity to make a series of identical bets. You determine some fraction $f$ of your wealth to wager; then, in each bet, you gain a fraction $f$ with probability $p$, and lose a fraction $f$ with probability $1-p$.[1]

In other words, suppose $W_n$ is your wealth after $n$ bets. We will define $q = 1 - p$, and we will suppose for simplicity that $W_0 = 1$. Then $W_n = W_{n-1} \cdot X_n$, where $X_n$ is a random variable defined as:

$$X_n = \begin{cases} 1 + f & \text{with probability } p, \\ 1 - f & \text{with probability } q. \end{cases}$$
Now suppose that, for some reason, we want to maximize $\mathbb{E}[\log W_n]$. By linearity of expectation, $\mathbb{E}[\log W_n] = \sum_{i=1}^{n} \mathbb{E}[\log X_i]$. Hence, we should simply maximize $\mathbb{E}[\log X_i]$. This amounts to solving:

$$\max_{0 \le f \le 1} \; \big[\, p \log(1+f) + q \log(1-f) \,\big].$$
Setting the derivative with respect to $f$ to zero yields the solution $f^* = p - q$, which is known as the Kelly bet. For example, it says that if you have a 60-40 edge, then you should bet $f^* = 0.6 - 0.4 = 0.2$, i.e., bet 20% of your current wealth on each bet.
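As a quick numerical sanity check, here is a short Python sketch (our own illustration; the function and variable names are not from the post). It evaluates $p \log(1+f) + q \log(1-f)$ on a grid of betting fractions for the 60-40 example and confirms that the maximum sits at $f^* = p - q = 0.2$.

```python
import numpy as np

def expected_log_growth(f, p):
    """E[log X] for an even-money bet: wager fraction f, win with probability p."""
    q = 1 - p
    return p * np.log(1 + f) + q * np.log(1 - f)

p = 0.6                          # the 60-40 edge from the example above
fs = np.linspace(0, 0.99, 991)   # candidate betting fractions (f = 1 excluded: log(0))
best = fs[np.argmax(expected_log_growth(fs, p))]

print(f"numerical argmax of E[log X]: {best:.3f}")         # ~0.200
print(f"Kelly formula p - q:          {p - (1 - p):.3f}")   # 0.200
```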

That all seems pretty reasonable. But why do we want to maximize $\mathbb{E}[\log W_n]$? If we were to simply maximize expected wealth, i.e., $\mathbb{E}[W_n]$, then a straightforward calculation shows[2] that we should not bet Kelly -- in fact, we should bet $f = 1$ ("YOLO"), wagering the entire bankroll on every bet. This seems extremely counterintuitive, since, after $n$ bets, our wealth would then be:

$$W_n = \begin{cases} 2^n & \text{with probability } p^n, \\ 0 & \text{otherwise.} \end{cases}$$
In other words, as $n$ grows large, we would almost surely go bankrupt! Nevertheless, this would be the way to maximize $\mathbb{E}[W_n]$. Kelly, whatever its merits, does not maximize $\mathbb{E}[W_n]$ -- not even in the long run. Especially not in the long run.
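To make this contrast concrete, here is a small Monte Carlo sketch (our own illustration; the horizon, path count, and seed are arbitrary choices, not values from the post). It compares the Kelly fraction with YOLO over a sequence of bets: YOLO wins on expected wealth, yet essentially every YOLO path ends at zero.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n_bets, n_paths = 0.6, 0.4, 20, 200_000   # illustrative parameters

def simulate(f):
    """Final wealth W_n (with W_0 = 1) along n_paths independent sequences of bets."""
    wins = rng.random((n_paths, n_bets)) < p
    return np.where(wins, 1 + f, 1 - f).prod(axis=1)

for name, f in [("Kelly, f = p - q = 0.2", 0.2), ("YOLO,  f = 1.0        ", 1.0)]:
    w = simulate(f)
    print(name)
    print("  exact E[W_n] = (1 + (p - q) f)^n :", (1 + (p - q) * f) ** n_bets)
    print("  simulated mean wealth            :", w.mean())
    print("  simulated median wealth          :", np.median(w))
    print("  fraction of paths bankrupt       :", (w == 0.0).mean())
```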

We now come to the perennial debate: why does Kelly seem "obviously right", and YOLO "obviously wrong"? There are many answers usually offered to this question.

First, what we believe to be the correct answer:

Kelly is optimal if, and only if, our utility function is logarithmic in wealth: $U(W) = \log(W)$.

In a certain sense, it is as simple as that. The von Neumann-Morgenstern utility theorem (vNM) tells us that we should be optimizing $\mathbb{E}[U(W)]$ for some utility function $U$. We know that the Kelly criterion always optimizes $\mathbb{E}[\log(W)]$. Therefore, if the Kelly criterion is optimal, it is because $U(W) = \log(W)$.

Now, there are many other answers to "why bet Kelly?" that initially seem plausible:

So, we claim, if Kelly is optimal then it is because our utility function is $U(W) = \log(W)$. However, this is not the whole story. The utility function $\log(W)$ refers to the utility of wealth at the moment after the betting experiment, not the terminal utility of wealth in general. We can imagine that this experiment is just the preamble to a much longer game, in which $U^*(W)$ is the ultimate terminal value of wealth (e.g., in number of lives saved), and we are investing over $T$ time steps where, in each step, we have the opportunity to place a bet with some statistical edge $p > q$. We can then use backward induction to determine the utility function that we should adopt for wealth at previous points in the game: $U_T = U^*$, and $U_{t-1}(W) = \max_f \big[\, p\, U_t(W(1+f)) + q\, U_t(W(1-f)) \,\big]$. It is this final function, $U_0$, that we should treat as our "utility function" in the preamble experiment.

Now, suppose we ultimately have something like this as our terminal utility function:

$$U^*(W) = \begin{cases} W & \text{if } W \le C, \\ C & \text{if } W > C, \end{cases}$$

for some cap $C > 0$.
In other words, number-of-lives-saved is linear in money up to a certain point, then flat -- an exaggerated version of the phenomenon of diminishing returns. As it turns out, when we apply backward induction for reasonably large values of $T$ and a modest statistical edge, we obtain a preamble utility function $U_0$ that looks quite different from $U^*$.
In general, this function "looks more like a logarithm" than the piecewise-linear function $U^*$, and falls off sharply as we approach zero. Clearly it is not actually a logarithm, as it is bounded above and below (and is, in fact, equal to $U^*$ for sufficiently large values of $W$). But, for a broad class of terminal utility functions $U^*$, the resulting function $U_0$ looks surprisingly logarithm-like.
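As a rough illustration of this claim, here is a Python sketch of the backward induction described above, applied to a capped-linear terminal utility (the cap $C$, horizon $T$, edge $p$, wealth grid, and interpolation scheme are all our own illustrative choices, not values from the post).

```python
import numpy as np

# Illustrative parameters: cap C, horizon T, win probability p.
C, T, p, q = 100.0, 50, 0.55, 0.45
wealth = np.linspace(0.0, 2 * C, 401)     # wealth levels at which U_t is tabulated
fractions = np.linspace(0.0, 0.99, 100)   # candidate betting fractions

def terminal_utility(w):
    """Capped-linear terminal utility: linear in wealth up to the cap C, then flat."""
    return np.minimum(w, C)

# Backward induction: U_T = U*, and
#   U_{t-1}(W) = max_f [ p * U_t(W (1 + f)) + q * U_t(W (1 - f)) ].
U = terminal_utility(wealth)
for _ in range(T):
    U_t = lambda w, table=U: np.interp(w, wealth, table)   # piecewise-linear lookup of the current table
    candidates = np.stack([
        p * U_t(wealth * (1 + f)) + q * U_t(wealth * (1 - f))
        for f in fractions
    ])
    U = candidates.max(axis=0)   # best achievable expected utility at each wealth level

# U now tabulates the induced preamble utility U_0.  Plotting it against wealth
# shows a concave curve that rises steeply near zero and flattens out at the cap.
for w in [1, 2, 5, 10, 25, 50, 100, 150]:
    print(f"U*({w:>3}) = {terminal_utility(w):6.1f}    U_0({w:>3}) = {np.interp(w, wealth, U):6.2f}")
```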

In summary, the Kelly criterion is an elegant, and surprisingly simple, formula for optimizing $\mathbb{E}[\log W]$. As a general strategy, optimizing $\mathbb{E}[\log W]$ is appealing in a number of ways:

However, we should remember that the Kelly bet, ultimately, is only an approximation. The true optimal bet -- the one that actually maximizes expected utility $\mathbb{E}[U(W)]$ -- may be significantly different, in either direction.

Acknowledgements: We would like to thank davidad for many helpful comments on earlier drafts of this article.

  1. ^

    Note that some definitions of the Kelly betting experiment are slightly more complicated, as they presume that one wins $b \cdot f$ with probability $p$ and loses $f$ with probability $1 - p$, for some payoff odds $b$. In this document, for simplicity, we take $b = 1$.

  2. ^

    To show this, note that $\mathbb{E}[X_i] = p(1+f) + q(1-f) = 1 + (p-q)f$, and hence (by independence of the $X_i$) $\mathbb{E}[W_n] = \big(1 + (p-q)f\big)^n$, which is maximized, given $p > q$, by taking $f$ as large as possible, i.e., $f = 1$.

4 comments


comment by philh · 2022-12-02T00:09:33.734Z · LW(p) · GW(p)

Kelly maximizes the expected growth rate, .

I... think this is wrong? It's late and I should sleep so I'm not going to double check, but this sounds like you're saying that you can take two sequences, one has a higher value at every element but the other has a higher limit.

If something similar to what you wrote is correct, I think it will be that Kelly maximizes . That feels about right to me, but I'm not confident.

comment by Dagon · 2022-11-29T21:21:50.925Z · LW(p) · GW(p)

Something I've often wondered - if utility for money is logarithmic, AND maximizing expected growth means logarithmic betting in the underlying resource, should we be actually thinking log(log(n))?  I think the answer is "no", because declining marginal utility is irrelevant to this - we still value more over less at all points.

Replies from: joe-zimmerman, philh
comment by Joe Zimmerman (joe-zimmerman) · 2022-11-29T23:19:07.355Z · LW(p) · GW(p)

No -- you should bet so as to maximize $\mathbb{E}[U(W)]$. If $U(W) = \log(W)$, and you are wagering $W$ (i.e., money), then bet Kelly, which optimizes $\mathbb{E}[\log(W)]$. However, if for some reason you are directly wagering $\log(W)$ (which seems very unlikely), then the optimal bet is actually YOLO, not Kelly.

comment by philh · 2022-12-01T23:55:29.661Z · LW(p) · GW(p)

I think the key thing to note here is that "maximizing expected growth" looks the same whether the thing you're trying to grow is money or log-money or sqrt-money or what. It "just happens" that (at least in this framework) the way one maximizes expected growth is the same as the way one maximizes expected log-money.

I've recently written about this [LW · GW] myself. My goal was partly to clarify this, though I don't know if I succeeded.

I think the post confuses things by motivating the Kelly bet as the thing that maximizes expected log-money, and also has other neat properties. To my mind, if you want to maximize expected log-money, you just... do the arithmetic to figure out what that means. It's not quite trivial, but it's stats-101 stuff. I don't think it seems more interesting to do the arithmetic that maximizes expected log-money compared to expected money or expected sqrt-money. Kelly certainly didn't introduce the criterion as "hey guys, here's a way to maximize expected log-money". (Admittedly, I don't much care about his framing either. The original paper is information-theoretic in a way that seems to be mostly forgotten about these days.)

To my mind, the important thing about the Kelly bet is the "almost certainly win more money than anyone using a different strategy, over a long enough time period" thing. (Which is the same as maximizing expected growth rate, when growth is exponential. If growth is linear you still might care if you're earning $2/day or $1/day, but the "growth rate" of both is 0 as defined here.) So I prefer to motivate the Kelly bet as being the thing that does that, and then say "and incidentally, turns out this also maximizes expected log-wealth, which is neat because...".