When and why should you use the Kelly criterion?
post by Garrett Baker (D0TheMath), philh, River (frank-bellamy) · 2023-11-05T23:26:38.952Z · LW · GW · 25 comments
Contents
- Possible justifications for Kelly
- Is Phil using words in a weird way?
- Should you bet everything every time in a finite game?
- What about infinite games?
- Returning to Phil's previous tl;dr
This is a dialogue made during the online dialogues party. Phil, River, and I (Garrett) talked about the Kelly criterion. I was under the mistaken impression that it would come up in finite games, even without discount factors. This turned out to be wrong! In such betting games, you always want to bet the maximal amount (assuming linear utility in money).
We then talked about how maybe you could save the criterion without bringing in log utility in money or discount rates in infinite games. The two conclusions were you could either have a finite utility function, or care intrinsically about not being broke/maximizing the probability of getting more money than other agents in your world.
The highlight in my opinion is me changing my mind about the finite games thing after using dynamic programming to try to show Phil to be wrong. The proof we came up with was both surprising and elegant to me.
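For readers who want to see the finite-game claim mechanically, here is a minimal sketch of a backward induction for linear utility. It is not the proof from the dialogue; the function name, the grid of candidate fractions, and the even-money example (decimal odds 2.0, win probability 0.6) are assumptions made here for illustration.

```python
def best_fraction_by_backward_induction(p, odds, rounds, grid=101):
    """Backward induction for terminal utility U(w) = w.

    With linear utility the value function stays linear in wealth,
    V_t(w) = c_t * w, so only the coefficient c_t needs tracking.
    """
    fractions = [i / (grid - 1) for i in range(grid)]
    coeff = 1.0  # V_T(w) = w at the final step
    policy = []
    for _ in range(rounds):
        def value(f):
            win = 1 - f + odds * f   # wealth multiplier on a win (decimal odds)
            lose = 1 - f             # wealth multiplier on a loss
            return coeff * (p * win + (1 - p) * lose)
        best = max(fractions, key=value)  # best fraction for this round
        policy.append(best)
        coeff = value(best)               # carry the coefficient one step back
    return policy, coeff

policy, expected_multiplier = best_fraction_by_backward_induction(p=0.6, odds=2.0, rounds=5)
print(policy, expected_multiplier)
```

In this example the chosen fraction is 1.0 in every round and the expected wealth multiplier is 1.2^5, because the per-round expectation 1 + 0.2f is increasing in f no matter how many rounds remain.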
Possible justifications for Kelly
Is Phil using words in a weird way?
Should you bet everything every time in a finite game?
What about infinite games?
Returning to Phil's previous tl;dr
25 comments
comment by sapphire (deluks917) · 2023-11-06T01:13:09.123Z · LW(p) · GW(p)
I have supported myself for almost a decade now via speculation / gambling / arbitrage. I almost never find the Kelly criterion all that useful in my own life. If a bet is really juicy go as hard as you can while finding the downside tolerable. If a bet isn't QUITE JUICY I usually pass.
Replies from: D0TheMath↑ comment by Garrett Baker (D0TheMath) · 2023-11-06T01:26:31.867Z · LW(p) · GW(p)
Yeah, I'd expect that for that strategy you would not want to use the Kelly criterion, and it seems more useful when you're relatively uncertain about the quality of your bet.
comment by Olli Järviniemi (jarviniemi) · 2023-11-06T14:39:08.829Z · LW(p) · GW(p)
The part about the Kelly criterion that has most attracted me is this:
That thing is that betting Kelly means that with probability 1, over time you'll be richer than someone who isn't betting Kelly. So if you want to achieve that, Kelly is great.
So with more notation, P(money(Kelly) > money(other)) tends to 1 as time goes to infinity (where money(policy) is the random score given by a policy).
This sounds kinda like strategic dominance - and you shouldn't use a dominated strategy, right? So you should Kelly bet!
The error in this reasoning is the "sounds kinda like" part. "Policy A dominates policy B" is not the same claim as P(money(A) >= money(B)) = 1. These are equivalent in "nice" finite, discrete games (I think), but not in infinite settings! Modulo issues with defining infinite games, the Kelly policy does not strategically dominate all other policies. So one shouldn't be too attracted to this property of the Kelly bet.
(Realizing this made me think "oh yeah, one shouldn't privilege the Kelly bet as a normatively correct way of doing bets".)
Replies from: philh↑ comment by philh · 2023-11-06T16:03:18.928Z · LW(p) · GW(p)
Yes, but there's an additional thing I'd point out here, which is that at any finite timestep, Kelly does not dominate. There's always a non-zero probability that you've lost every bet so far.
When you extend the limit to infinity, you run into the problem "probability zero events can't necessarily be discounted" (though in some situations it's fine to), which is the one you point out; but you also run into the problem "the limit of the probability distributions given by Kelly betting is not itself a probability distribution".
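A small sketch of the finite-time point, assuming two bettors who stake different fixed fractions on the same sequence of even-money coin flips (the function name and the example numbers are hypothetical). It computes exactly, via the binomial distribution, the probability that the Kelly bettor is strictly ahead after n rounds.

```python
from math import comb, log

def prob_kelly_ahead(p, f_other, n):
    """Exact P(Kelly wealth > other wealth) after n shared even-money bets."""
    f_kelly = 2 * p - 1  # Kelly fraction for an even-money bet

    def log_wealth(f, wins):
        return wins * log(1 + f) + (n - wins) * log(1 - f)

    total = 0.0
    for wins in range(n + 1):
        if log_wealth(f_kelly, wins) > log_wealth(f_other, wins):
            total += comb(n, wins) * p**wins * (1 - p)**(n - wins)
    return total

p10 = prob_kelly_ahead(0.6, 0.5, 10)
p500 = prob_kelly_ahead(0.6, 0.5, 500)
print(p10, p500)
```

Under these assumptions the probability climbs toward 1 as n grows but stays strictly below 1 at every finite n, which is the sense in which Kelly does not dominate at any finite timestep.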
comment by Minetta (Pixellation) · 2023-11-07T23:25:25.214Z · LW(p) · GW(p)
The expected value of the product of two independent random variables is the product of the expected values of each; this concludes my proof that betting everything on each round is expected value maximizing in a finite game (and infinite too, if you adopt the common ways to make "infinite" precise). I'm surprised the dialogue got that far without this being brought up!
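The product-of-expectations argument can be checked numerically. The sketch below (hypothetical function names, independent bets at fixed decimal odds assumed) enumerates every win/loss sequence and compares the result with the closed form, the per-round expected multiplier raised to the number of rounds.

```python
from itertools import product

def expected_wealth_by_enumeration(p, odds, f, rounds):
    """Sum probability * final wealth over all 2**rounds win/loss sequences."""
    total = 0.0
    for outcome in product([True, False], repeat=rounds):
        prob, wealth = 1.0, 1.0
        for won in outcome:
            prob *= p if won else 1 - p
            wealth *= (1 - f + odds * f) if won else (1 - f)
        total += prob * wealth
    return total

def expected_wealth_closed_form(p, odds, f, rounds):
    """Product of per-round expectations, valid because rounds are independent."""
    per_round = p * (1 - f + odds * f) + (1 - p) * (1 - f)
    return per_round ** rounds

for f in (0.0, 0.2, 1.0):
    print(f, expected_wealth_by_enumeration(0.6, 2.0, f, 8),
          expected_wealth_closed_form(0.6, 2.0, f, 8))
```

The two columns agree for each fraction, and since the per-round expectation grows with f whenever the bet has positive expected value, the closed form is largest at f = 1.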
Replies from: philh
comment by AlexMennen · 2023-11-06T01:42:27.706Z · LW(p) · GW(p)
The Kelly criterion can be thought of in terms of maximizing a utility function that depends on your wealth after many rounds of betting (under some mild assumptions about that utility function that rule out linear utility). See https://www.lesswrong.com/posts/NPzGfDi3zMJfM2SYe/why-bet-kelly
Replies from: philh↑ comment by philh · 2023-11-06T10:25:04.553Z · LW(p) · GW(p)
So I claim that Kelly won't maximize , or more generally for any , or , or , or , or even but it'll get asymptotically close when . Do you disagree?
Your "When to act like your utility is logarithmic" section sounds reasonable to me. Like, it sounds like the sort of thing one could end up with if one takes a formal proof of something and then tries to explain in English the intuitions behind the proof. Nothing in it jumps out at me as a mistake. Nevertheless, I think it must be mistaken somewhere, and it's hard to say where without any equations.
Replies from: AlexMennen↑ comment by AlexMennen · 2023-11-07T03:08:48.102Z · LW(p) · GW(p)
Correct. This utility function grows fast enough that it is possible for the expected utility after many bets to be dominated by negligible-probability favorable tail events, so you'd want to bet super-Kelly.
If you expect to end up with lots of money at the end, then you're right; marginal utility of money becomes negligible, so expected utility is greatly affected by negligible-probability unfavorable tail events, and you'd want to bet sub-Kelly. But if you start out with very little money, so that at the end of whatever large number of rounds of betting, you only expect to end up with money in most cases if you bet Kelly, then I think the Kelly criterion should be close to optimal.
(The thing you actually wrote is the same as log utility, so I substituted what you may have meant). The Kelly criterion should optimize this, and more generally (log money)^n for any n, if the number of bets is large. At least if n is an integer, then, if X = log(money) is normally distributed with mean μ and standard deviation σ, then E[X^n] is some polynomial in μ and σ that's homogeneous of degree n. After a large number N of bets, μ scales proportionally to N and σ scales proportionally to sqrt(N), so the value of this polynomial approaches its μ^n term, and maximizing it becomes equivalent to maximizing μ, which the Kelly criterion does. I'm pretty sure you get something similar when n is noninteger.
It depends how much money you could end up with compared to . If Kelly betting usually gets you more than at the end, then you'll bet sub-Kelly to reduce tail risk. If it's literally impossible to exceed even if you go all-in every time and always win, then this is linear, and you'll bet super-Kelly. But if Kelly betting will usually get you less than but not by too many orders of magnitude at the end after a large number of rounds of betting, then I think it should be near-optimal.
If there's many rounds of betting, and Kelly betting will get you as a typical outcome, then I think Kelly betting is near-optimal. But you might be right if .
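A worked version of the moment argument for powers of log wealth, as a sketch. The symbols are assumptions introduced here: X is log wealth after N bets, taken to be normal with mean μ and standard deviation σ, and m and s are the per-bet mean and standard deviation of the log return (with m nonzero), so that μ = Nm and σ² = Ns².

```latex
\begin{align*}
E[X]   &= \mu \\
E[X^2] &= \mu^2 + \sigma^2 \\
E[X^3] &= \mu^3 + 3\mu\sigma^2 \\
E[X^4] &= \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4 \\[4pt]
E[X^2] &= N^2 m^2 + N s^2 = \mu^2\left(1 + \frac{s^2}{N m^2}\right)
\end{align*}
```

Each moment is homogeneous of the stated degree in (μ, σ), and the correction factor in the last line tends to 1 as N grows, so for large N the moment is governed by its leading power of μ.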
Replies from: philh↑ comment by philh · 2023-11-07T07:23:58.531Z · LW(p) · GW(p)
Okay, "Kelly is close to optimal for lots of utility functions" seems entirely plausible to me. I do want to note though that this is different from "actually optimal", which is what I took you to be saying.
(The thing you actually wrote is the same as log utility, so I substituted what you may have meant)
Oops! I actually was just writing things without thinking much and didn't realize it was the same.
Replies from: AlexMennen↑ comment by AlexMennen · 2023-11-07T15:59:23.833Z · LW(p) · GW(p)
I do want to note though that this is different from "actually optimal"
By "near-optimal", I meant converges to optimal as the number of rounds of betting approaches infinity, provided initial conditions are adjusted in the limit such that whatever conditions I mentioned remain true in the limit. (e.g. if you want Kelly betting to get you a typical outcome of in the end, then when taking the limit as the number of bets goes to infinity, you better have starting money , where is the geometric growth rate you get from bets, rather than having a fixed starting money while taking the limit ). This is different from actually optimal because in practice, you get some finite amount of betting opportunities, but I do mean something more precise than just that Kelly betting tends to get decent outcomes.
Replies from: philh↑ comment by philh · 2023-11-08T12:46:14.756Z · LW(p) · GW(p)
Thanks for clarifying! Um, but to clarify a bit further, here are three claims one could make about these examples:
1. As t → ∞, the utility maximizing bet at given wealth will converge to the Kelly bet at that wealth. I basically buy this.
2. As t → ∞, the expected utility from utility-maximizing bets at timestep t converges to that from Kelly bets at timestep t. I'm unsure about this.
3. For some finite t, the expected utility at timestep t from utility-maximizing bets is no higher than that from Kelly bets. I think this is false. (In the positive: I think that for all finite t, the expected utility at timestep t from utility-maximizing bets is higher than that from Kelly bets. I think this is the case even if the difference converges to 0, which I'm not sure it does.)
I think you're saying (2)? But the difference between that and (3) seems important to me. Like, it still seems that to a (non-log-money) utility maximizer, the Kelly bet is strictly worse than the bet which maximizes their utility at any given timestep. So why would they bet Kelly?
Here's why I'm unsure about 2. Suppose we both have log-money utility, I start with $2 and you start with $1, and we place the same number of bets, always utility-maximizing. After any number of bets, my expected wealth will always be 2x yours, so my expected utility will always be more than yours. So it seems to me that "starting with more money" leads to "having more log-money in expectation forever".
Then it similarly seems to me that if I get to place a bet before you enter the game, and from then on our number of bets is equal, my expected utility will be forever higher than yours by the expected utility gain of that one bet.
Or, if we get the same number of bets, but my first bet is utility maximizing and yours is not, but after that we both place the utility-maximizing bet; then I think my expected utility will still be forever higher than yours. And the same for if you make bets that aren't utility-maximizing, but which converge to the utility-maximizing bet.
And if this is the case for log-money utility, I'd expect it to also be the case for many other utility functions.
...but something about this feels weird, especially with , so I'm not sure. I think I'd need to actually work this out.
Here's a separate thing I'm now unsure about. (Thanks for helping bring it to light!) In my terminology from "On Kelly and altruism", making a finite number of suboptimal bets doesn't change how rank-optimal your strategy is. In Kelly's terminology from his original paper, I think it won't change your growth rate.
And I less-confidently think the same is true of "making suboptimal bets all the time, but the bets converge to the optimal bet".
But if that's true... what actually makes those bets suboptimal, in those two frameworks? If Kelly's justification for the Kelly bet is that it maximizes your growth rate, but there are other bet sizes that do the same, why prefer the Kelly bet over them? If my justification for the Kelly bet (when I endorse using it) is that it's impossible to be more rank-optimal than it, why prefer the Kelly bet over other things that are equally rank-optimal?
Replies from: AlexMennen↑ comment by AlexMennen · 2023-11-08T18:00:49.290Z · LW(p) · GW(p)
Yeah, I was still being sloppy about what I meant by near-optimal, sorry. I mean the optimal bet size will converge to the Kelly bet size, not that the expected utility from Kelly betting and the expected utility from optimal betting converge to each other. You could argue that the latter is more important, since getting high expected utility in the end is the whole point. But on the other hand, when trying to decide on a bet size in practice, there's a limit to the precision with which it is possible to measure your edge, so the difference between optimal bet and Kelly bet could be small compared to errors in your ability to determine the Kelly bet size, in which case thinking about how optimal betting differs from Kelly betting might not be useful compared to trying to better estimate the Kelly bet.
Even in the limit as the number of rounds goes to infinity, by the time you get to the last round of betting (or last few rounds), you've left the limit, since you have some amount of wealth and some small number of rounds of betting ahead of you, and it doesn't matter how you got there, so the arguments for Kelly betting don't apply. So I suspect that Kelly betting until near the end, when you start slightly adjusting away from Kelly betting based on some crude heuristics, and then doing an explicit expected value calculation for the last couple rounds, might be a good strategy to get close to optimal expected utility.
Incidentally, I think it's also possible to take a limit where Kelly betting gets you optimal utility in the end by making the favorability of the bets go to zero simultaneously with the number of rounds going to infinity, so that improving your strategy on a single bet no longer makes a difference.
I think that for all finite t, the expected utility at timestep t from utility-maximizing bets is higher than that from Kelly bets. I think this is the case even if the difference converges to 0, which I'm not sure it does.
Why specifically higher? You must be making some assumptions on the utility function that you haven't mentioned.
Replies from: philh↑ comment by philh · 2023-11-11T13:12:28.574Z · LW(p) · GW(p)
You could argue that the latter is more important, since getting high expected utility in the end is the whole point. But on the other hand, when trying to decide on a bet size in practice, there's a limit to the precision with which it is possible to measure your edge, so the difference between optimal bet and Kelly bet could be small compared to errors in your ability to determine the Kelly bet size, in which case thinking about how optimal betting differs from Kelly betting might not be useful compared to trying to better estimate the Kelly bet.
So like, this seems plausible to me, but... yeah, I really do want to distinguish between
- This maximizes expected utility
- This doesn't maximize expected utility, but here are some heuristics that suggest maybe that doesn't matter so much in practice
If it doesn't seem important to you to distinguish these, then that's a different kind of conversation than us disagreeing about the math, but here are some reasons I want to distinguish them:
- I think lots of people are confused about Kelly, and speaking precisely seems more likely to help than hurt.
- I think "get the exact answer in spherical cow cases" is good practice, even if spherical cow cases never come up. "Here's the exact answer in the simple case, and here are some considerations that mean it won't be right in practice" seems better than "here's an approximate answer in the simple case, and here are some considerations that mean it won't be right in practice".
- Sometimes it's not worth figuring out the exact answer, but like. I haven't yet tried to calculate the utility-maximizing bet for those other utility functions. I haven't checked how much Kelly loses relative to them under what conditions. Have you? It seems like this is something we should at least try to calculate before going "eh, Kelly is probably fine".
- I've spent parts of this conversation confused about whether we disagree about the math or not. If you had reliably been making the distinction I want to make, I think that would have helped. If I had reliably not made that distinction, I think we just wouldn't have talked about the math and we still wouldn't know if we agreed or not. That seems like a worse outcome to me.
Why specifically higher? You must be making some assumptions on the utility function that you haven't mentioned.
Well, we've established the utility-maximizing bet gives different expected utility from the Kelly bet, right? So it must give higher expected utility or it wouldn't be utility-maximizing.
Replies from: AlexMennen↑ comment by AlexMennen · 2023-11-11T21:51:28.804Z · LW(p) · GW(p)
Yeah, I wasn't trying to claim that the Kelly bet size optimizes a nonlogarithmic utility function exactly, just that, when the number of rounds of betting left is very large, the Kelly bet size sacrifices a very small amount of utility relative to optimal betting under some reasonable assumptions about the utility function. I don't know of any precise mathematical statement that we seem to disagree on.
Well, we've established the utility-maximizing bet gives different expected utility from the Kelly bet, right? So it must give higher expected utility or it wouldn't be utility-maximizing.
Right, sorry. I can't read, apparently, because I thought you had said the utility-maximizing bet size would be higher than the Kelly bet size, even though you did not.
comment by PaulK · 2023-11-08T13:50:30.843Z · LW(p) · GW(p)
I wonder if you can recover Kelly from linear utility in money, plus a number of rounds unknown to you and chosen probabilistically from a distribution.
Replies from: SimonM↑ comment by SimonM · 2023-11-08T15:19:17.245Z · LW(p) · GW(p)
No, it's fairly straightforward to see this won't work.
Let N be the random variable denoting the number of rounds. Let x = p*w+(1-p)*l where p is the probability of winning and w=1-f+o*f, l=1-f are the factors by which our wealth is multiplied when we win or lose after betting a fraction f of it (with o the decimal odds).
Then the value we care about is E[x^N], which is the moment generating function of N evaluated at log(x). Since the mgf is increasing as a function of x, we want to maximise x, i.e. the all-in conclusion from linear utility doesn't change.
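A quick numerical sketch of this argument, using the same x, p, o and f as above. The particular distribution over the number of rounds and the grid of fractions are made up for illustration.

```python
def expected_final_wealth(f, p, odds, round_dist):
    """E[x**N] for a random number of rounds N, with x the per-round expected multiplier."""
    x = p * (1 - f + odds * f) + (1 - p) * (1 - f)
    return sum(prob * x**n for n, prob in round_dist.items())

round_dist = {1: 0.5, 3: 0.3, 10: 0.2}   # hypothetical P(N = n)
fractions = [i / 20 for i in range(21)]   # 0.00, 0.05, ..., 1.00
best = max(fractions, key=lambda f: expected_final_wealth(f, 0.6, 2.0, round_dist))
print(best)
```

With a positive-expectation bet, x increases with f and every term x**n increases with x, so the maximizing fraction on the grid is 1.0 whatever distribution is chosen for N.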
comment by RationalDino · 2023-11-06T03:40:06.481Z · LW(p) · GW(p)
The simple reason to use Kelly is this.
With 100% odds, any other strategy will lose to Kelly in the long run.
This can be shown by applying the strong law of large numbers to the random walk that is the log of your net worth.
Now what about a finite game? It takes surprisingly few rounds before Kelly, with median performance, pulls ahead of alternate strategies. It takes rather more rounds before, say, you have a 90% chance of beating another strategy. So in the short to medium run, Kelly offers the top of a plateau for median returns. You can deviate fairly far from it and still do well on average.
So should you still bet Kelly? Well, if you bet less than Kelly, you'll experience lower average returns and lower variance. If you bet more than Kelly, you'll experience lower average returns and higher variance. Variance in the real world tends to translate into, "I don't have enough left over for expenses and I'm broke." Reducing variance is generally good. That's why people buy insurance. It is a losing money bet that reduces variance. (And in a complex portfolio, can increase expected returns!) So it makes sense to bet something less than Kelly in practice.
There is a second reason to bet less than Kelly in practice. When we're betting, we estimate the odds. We're betting against someone else who is also estimating the odds. The average of many people betting is usually more accurate than individual bettors. We believe that we're well-informed and have a better estimate than others. But we're still likely biased towards overconfidence in our chances. That means that betting Kelly based on what we think the odds are means we're likely betting too much.
Ideally you would have enough betting history tracked to draw a regression line to figure out the true odds based on the combination of what you think, and what the market thinks. But most of us don't have enough carefully tracked history to accurately make such judgments.
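To make the "top of a plateau" picture concrete, here is a sketch that evaluates the per-round log growth along the typical path (the fraction of wins equal to p) for an even-money bet. The win probability 0.6 and the grid are assumptions chosen for illustration.

```python
from math import log

def typical_growth_rate(p, f):
    """Per-round log growth when the fraction of wins equals p (even-money bet)."""
    return p * log(1 + f) + (1 - p) * log(1 - f)

p = 0.6
fractions = [i / 100 for i in range(100)]                      # 0.00 .. 0.99
best = max(fractions, key=lambda f: typical_growth_rate(p, f))  # grid argmax

for f in (0.1, 0.2, 0.3, 0.4):
    print(f"f={f:.1f}  growth per round={typical_growth_rate(p, f):+.4f}")
print("best fraction on grid:", best)
```

In this example the grid maximum sits at the Kelly fraction 0.2, half-Kelly and one-and-a-half-Kelly keep roughly three quarters of the growth rate, and at twice Kelly the typical growth rate turns slightly negative.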
Replies from: AlexMennen↑ comment by AlexMennen · 2023-11-06T05:38:02.779Z · LW(p) · GW(p)
If you bet more than Kelly, you'll experience lower average returns and higher variance.
No. As they discovered in the dialog, average returns are maximized by going all-in on every bet with positive EV. It is typical returns that will be lower if you don't bet Kelly.
Replies from: RationalDino, SimonM↑ comment by RationalDino · 2023-11-06T11:55:53.638Z · LW(p) · GW(p)
Dang it. I meant to write that as,
If you bet more than Kelly, you'll experience lower returns on average and higher variance.
That said, both median and mode are valid averages, and Kelly wins both.
Replies from: AlexMennen, philh↑ comment by AlexMennen · 2023-11-07T04:38:07.427Z · LW(p) · GW(p)
The reason I brought this up, which may have seemed nitpicky, is that I think this undercuts your argument for sub-Kelly betting. When people say that variance is bad, they mean that because of diminishing marginal returns, lower variance is better when the mean stays the same. Geometric mean is already the expectation of a function that gets diminishing marginal returns, and when it's geometric mean that stays fixed, lower variance is better if your marginal returns diminish even more than that. Do they? Perhaps, but it's not obvious. And if your marginal returns diminish but less than for log, then higher variance is better. I don't think any of median, mode, or looking at which thing more often gets a higher value are the sorts of things that it makes sense to talk about trading off against lowering variance either. You really want mean for that.
Replies from: RationalDino↑ comment by RationalDino · 2023-11-07T21:47:25.594Z · LW(p) · GW(p)
The reason why variance matters is that high variance increases your odds of going broke. In reality, gamblers don't simply get to reinvest all of their money. They have to take money out for expenses. That process means that you can go broke in the short run, despite having a great long-term strategy.
Therefore instead of just looking at long-term returns you should also look at things like, "What are my returns after 100 trials if I'm unlucky enough to be at the 20th percentile?" There are a number of ways to calculate that. The simplest is to say that if p is your probability of winning, the expected number of times you'll win is 100p. The variance in a single trial is p(1-p). And therefore the variance of 100 trials is 100p(1-p). Your standard deviation in wins is the square root, or 10sqrt(p(1-p)). From the central limit theorem, at the 20th percentile you'll therefore win roughly 100p - 8.5sqrt(p(1-p)) times. Divide this by 100 to get the proportion q that you won. Your ideal strategy on this metric will be Kelly with p replaced by that q. This will always be less than Kelly. Then you can apply that to figure out what rate of return you'd be worrying about if you were that unlucky.
Any individual gambler should play around with these numbers. Base it on your bankroll, what you're comfortable with losing, how frequent and risky your bets are, and so on. It takes work to figure out your risk profile. Most will decide on something less than Kelly.
Of course if your risk profile is dominated by the pleasure of the adrenaline from knowing that you could go broke, then you might think differently. But professional gamblers who think that way generally don't remain professional gamblers over the long haul.
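The percentile calculation described above can be sketched as follows. The function names are hypothetical, the odds are taken as decimal odds, and the normal approximation to the win count is assumed. Using the exact normal quantile (about -0.84 at the 20th percentile) gives a coefficient of about 8.4 on sqrt(p(1-p)), close to the 8.5 quoted.

```python
from math import sqrt
from statistics import NormalDist

def kelly_fraction(p, odds):
    """Kelly fraction for win probability p at decimal odds, floored at zero."""
    return max(0.0, (p * odds - 1) / (odds - 1))

def unlucky_kelly(p, odds, trials=100, percentile=0.20):
    """Win rate at the given percentile and the Kelly fraction computed from it."""
    z = NormalDist().inv_cdf(percentile)                # about -0.84 at the 20th percentile
    wins = trials * p + z * sqrt(trials * p * (1 - p))  # normal approximation to the win count
    q = wins / trials                                   # win rate in the unlucky scenario
    return q, kelly_fraction(q, odds)

q, f_unlucky = unlucky_kelly(p=0.6, odds=2.0)
f_kelly = kelly_fraction(0.6, 2.0)
print(q, f_unlucky, f_kelly)
```

For p = 0.6 at even money this gives an unlucky win rate a little under 0.56 and a bet size noticeably below the full Kelly fraction of 0.2.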
↑ comment by philh · 2023-11-06T13:30:00.625Z · LW(p) · GW(p)
(Variance is "expected squared difference between observation and its prior expected value", i.e. variance as a concept is closely linked to the mean and not so closely linked to the median or mode. So if you're talking about "average" and "variance" and the average you're talking about isn't the mean, I think at best you're being very confusing, and possibly you're doing something mathematically wrong.)
Replies from: RationalDino↑ comment by RationalDino · 2023-11-07T20:55:36.811Z · LW(p) · GW(p)
I'm sorry that you are confused. I promise that I really do understand the math.
In repeated addition of random variables, all of these have a close relationship. The sum is approximately normal. The normal distribution has identical mean, median, and mode. Therefore all three are the same.
What makes Kelly tick is that the log of net worth gives you repeated addition. So with high likelihood the log of your net worth is near the mean of an approximately normal distribution, and both median and mode are very close to that. But your net worth is the exponential of the log. That creates an asymmetry that moves the mean away from the median and mode. With high probability, you will do worse than the mean.
The comment about variance is separate. You actually have to work out the distribution of returns after, say, 100 trials, and then calculate a variance from that. It turns out that for any finite n, variance monotonically increases as you increase the proportion that you bet. It ranges from 0 if you bet nothing to being dominated by the small chance of winning every bet if you bet everything.
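A sketch that computes the mean, median and variance of wealth exactly after n even-money bets, so the claims in this exchange can be checked side by side. The function name and the parameters (p = 0.6, n = 100, starting wealth 1) are assumptions for illustration.

```python
from math import comb

def wealth_stats(p, f, n):
    """Exact mean, median and variance of wealth after n even-money bets at fraction f."""
    outcomes = []  # (wealth, probability) indexed by number of wins
    for wins in range(n + 1):
        wealth = (1 + f) ** wins * (1 - f) ** (n - wins)
        prob = comb(n, wins) * p**wins * (1 - p) ** (n - wins)
        outcomes.append((wealth, prob))
    mean = sum(w * pr for w, pr in outcomes)
    var = sum((w - mean) ** 2 * pr for w, pr in outcomes)
    # Wealth is nondecreasing in the win count, so the median wealth
    # is the wealth at the median number of wins.
    cum, median = 0.0, outcomes[-1][0]
    for w, pr in outcomes:
        cum += pr
        if cum >= 0.5:
            median = w
            break
    return mean, median, var

stats = {f: wealth_stats(0.6, f, 100) for f in (0.1, 0.2, 0.4, 1.0)}
for f, (mean, median, var) in stats.items():
    print(f"f={f:.1f}  mean={mean:.3g}  median={median:.3g}  variance={var:.3g}")
```

In this example the mean and the variance both keep rising with the fraction bet, while the median peaks at the Kelly fraction 0.2 and falls to zero at all-in; at Kelly the mean is well above the median.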
↑ comment by SimonM · 2023-11-06T09:08:41.011Z · LW(p) · GW(p)
average returns
I think the disagreement here is on what "average" means. All-in maximises the arithmetic average return. Kelly maximises the geometric average. Which average is more relevant is equivalent to the Kelly debate though, so hard to say much more
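The two averages can be written out side by side. This is a sketch reusing the notation from the earlier comment in this thread (p the win probability, o the decimal odds, f the fraction bet, w = 1-f+o*f and l = 1-f); the names A and G for the per-round arithmetic and geometric averages are introduced here.

```latex
\begin{align*}
A(f) &= p\,w + (1-p)\,l = 1 + f\,(p\,o - 1) \\
\log G(f) &= p \log\bigl(1 + f(o-1)\bigr) + (1-p)\log(1-f) \\
\frac{d}{df}\log G(f) &= \frac{p(o-1)}{1 + f(o-1)} - \frac{1-p}{1-f} = 0
\;\Longrightarrow\; f^{*} = \frac{p\,o - 1}{o - 1}
\end{align*}
```

When p*o > 1, A(f) is increasing in f and so is largest at f = 1 (all-in), while G(f) peaks at the Kelly fraction f*. For p = 0.6 and o = 2 that gives A(1) = 1.2 and f* = 0.2.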