Tyranny of the Epistemic Majority
post by Scott Garrabrant · 2022-11-22T17:19:34.144Z · LW · GW · 13 comments
This post is going to mostly be propaganda for Kelly betting. However, the reasons presented in this post differ greatly from the reasons people normally use to argue for Kelly betting.
The Steward of Myselves
The curse of uncertainty is that I must make decisions that simultaneously affect many different versions of myself. When I close my eyes and then flip a coin, there are two potential versions of me: one sitting in front of a coin showing heads, the other sitting in front of a coin showing tails. Both of these potential versions of me are stakeholders in my current decisions. How can I make decisions on behalf of these multiple stakeholders?
If it were a fair coin, we could think of these two potential selves as equal stakeholders in my decisions. However, I know that it is not a fair coin. It has a 60 percent chance of coming up heads. Thus, heads-me is a 60 percent stakeholder in my current decisions, and tails-me is a 40 percent stakeholder. The amount of each one's stake is naturally in proportion to the probability that they actually exist.
You, however, do not know if it is a fair coin, and are offering me a fair bet. I only have 100 dollars to my name, and I can bet as much as I want (up to 100 dollars) in either direction at even odds.
If I bet 100 dollars on heads, heads-me gets 200 dollars, and tails-me gets nothing. If I bet 100 dollars on tails, tails-me gets 200 dollars, and heads-me gets nothing. If I bet nothing, both versions of me get 100 dollars.
However, every dollar in the hands of heads-me is worth 1.5 times as much as a dollar in the hands of tails-me, since heads-me exists 1.5 times as much. (I am ignoring here any diminishing returns in my value of money.)
Thus, to maximize value I should bet 100 dollars on heads. However, maybe it is better to think of tails-me as the rightful owner of 40 percent of my resources. When I bet 100 dollars on heads, I am seizing money from tails-me for the greater good, since heads-me has the (proportionally greater) existence necessary to better take advantage of it.
Alternatively, I could say that since 60 percent of me is heads-me, heads-me should only control 60 dollars, which can be bet on heads. Tails-me should control 40 dollars, which can be bet on tails. These two bets partially cancel each other out, and the net result is that I bet 20 dollars on heads.
If you are especially fast at maximizing expected logarithms, you might see where this is going.
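As a sanity check, here is a minimal sketch of this proportional-stakes rule in Python (the helper net_bet and its name are my own illustration, not anything defined in the post):

```python
def net_bet(wealth, p_heads):
    """Proportional representation at even odds: each potential self controls
    a share of wealth equal to its probability and stakes all of it on its
    own world. Returns the net bet on heads (negative = net bet on tails)."""
    stake_heads = wealth * p_heads        # heads-me's share, all on heads
    stake_tails = wealth * (1 - p_heads)  # tails-me's share, all on tails
    return stake_heads - stake_tails

print(net_bet(100, 0.6))  # 20.0: bet 20 dollars on heads
```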
Compositionality
Now, I am ready to introduce my friend, Kelly. Kelly also has her eyes closed, also has 100 dollars, and is sitting in front of the same coin. However, Kelly has different beliefs: she believes that the coin has a 90 percent chance of coming up tails.
I bet 20 dollars on heads, for the reasons described above. Kelly bets 80 dollars on tails for similar reasons (90 dollars on tails, partially nullified by 10 dollars on heads).
I have another friend, Marge. Marge is sitting on the other side of the table with her eyes closed. Marge has 200 dollars. Marge doesn't know much about coins, but knows my and Kelly's beliefs, and thinks Kelly and I are equally likely to be correct. Thus, Marge assigns a 65 percent chance that the coin comes up tails. Marge thus bets 60 dollars on tails (130 dollars on tails, partially nullified by 70 dollars on heads).
Note that the 60 dollars bet by Marge is the same as the net 60 dollar bet you get if you draw a box around me and Kelly. This is representing the compositionality of this betting policy. When you draw a box around me and Kelly, you can think of us as one agent whose wealth is the sum of our wealths, and whose beliefs are the weighted (by wealth) average of our beliefs.
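Here is that box-drawing as a quick numerical sketch (reusing the net_bet helper from the sketch above; negative numbers are net bets on tails):

```python
net_bet = lambda w, p: w * p - w * (1 - p)  # same rule as the earlier sketch

my_w, my_p = 100, 0.6        # my wealth and P(heads)
kelly_w, kelly_p = 100, 0.1  # Kelly's wealth and P(heads): 90 percent tails

# The box around me and Kelly: summed wealth, wealth-weighted average belief.
box_w = my_w + kelly_w
box_p = (my_w * my_p + kelly_w * kelly_p) / box_w  # 0.35, Marge's belief

print(net_bet(my_w, my_p) + net_bet(kelly_w, kelly_p))  # -60.0: 60 on tails
print(net_bet(box_w, box_p))                            # -60.0: Marge's bet
```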
If Kelly, Marge and I all implemented the other strategy, of putting all our money on the outcome we thought was most likely, this would not have happened. Marge would have put 200 dollars on tails, while Kelly and I would have, on net, bet nothing.
This should not be surprising. When you implement a majoritarian policy, it matters where you draw the boundaries. When you instead implement a proportional representation policy, it does not matter where you draw the boundaries. When you have an internal voting bloc, you have to be careful who you let into your voting bloc, since they might swing the whole bloc in the other direction. I think many phenomena that get labeled as politics are actually about fighting over where to draw the boundaries. Wouldn't it be nice if we didn't have to worry about where we draw the boundaries?
Bayesian Updating
We all open our eyes, and see that the coin came up heads. I am given 20 dollars, and now have 120 dollars. Kelly loses her 80 dollars, and is left with 20 dollars. Marge loses her 60 dollars, and is left with 140 dollars. Yay! Sorry, Kelly and Marge.
We all close our eyes, and the coin is flipped again, and we are offered the same bets.
I only had one hypothesis, that the coin was a biased coin with a 60 percent chance of coming up heads, so I do not update at all, and will bet similarly again. I have two potential selves, sitting in front of different coins: heads-me has 72 dollars, which are bet entirely on heads, while tails-me has 48 dollars, which are bet entirely on tails. These bets partially cancel out, and on net I bet 24 dollars on heads (20 percent more money than last time, since I have 20 percent more money). Note that these different versions of me are not the same as the ones from last round; there is a new coin flip, so there is a new branch in my future. Similarly, Kelly has 20 dollars, and so bets 16 dollars on tails (18 dollars on tails, partially nullified by 2 dollars on heads).
Marge's situation is more complicated. Marge had two different hypotheses about the coin: one in which I am right, and one in which Kelly is right. Marge has observed some Bayesian evidence that I am right, with an odds ratio of 6 to 1. This evidence that I am right translates into evidence that the coin will come up heads. Marge thus updates her 65 percent probability that the coin will come up tails to an approximately 53 percent probability that the coin will come up heads (a 37/70 chance of coming up heads, to be exact). Marge then bets exactly 8 of her 140 dollars on heads (74 dollars on heads, partially nullified by 66 dollars on tails).
Again, Marge's bet is exactly the same as the net bet of me and Kelly.
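A sketch of that bookkeeping (the 6-to-1 odds ratio is just the 0.6 versus 0.1 likelihoods of heads under the two hypotheses):

```python
net_bet = lambda w, p: w * p - w * (1 - p)  # same rule as the earlier sketch

# Marge's two hypotheses, updated on seeing heads.
prior_me, prior_kelly = 0.5, 0.5
like_me, like_kelly = 0.6, 0.1  # P(heads | each hypothesis)
post_me = prior_me * like_me / (prior_me * like_me + prior_kelly * like_kelly)

marge_p = post_me * 0.6 + (1 - post_me) * 0.1  # 37/70: her new P(heads)

print(net_bet(140, marge_p))                 # 8.0: Marge's bet on heads
print(net_bet(120, 0.6) + net_bet(20, 0.1))  # 8.0: me plus Kelly, netted
```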
Indeed, whenever Kelly and I bet, you can break this up into a net bet with the house, together with an internal bet that determines how much control we will each have over whatever money our collective ends up with. The internal bet will always exactly implement Bayesian updating on how much the collective trusts each of us.
As a Bayesian agent, you can think of yourself as a collection of bettors that implement this proportional representation betting strategy and bet with each other. Instead of betting with money, they are betting with a currency that represents your posterior beliefs. When used internally, it recreates Bayesian updating. Maybe as a society, we could get some pretty cool results if we also followed this strategy collectively.
Bargaining with Myself
The above analysis was a weird case, because you were offering both sides of the bet at a fair price. In practice, this is unrealistic. Let's instead look at what happens if I only win 95 cents for each dollar I stake. Heads-me wants to put his 60 dollars on heads, and will win 57 dollars, so I end up with 117 dollars if the coin lands heads. Tails-me puts his 40 dollars on tails, so I end up with 78 dollars if the coin lands tails. The fact that I am betting on both sides is wasteful. There is a Pareto improvement where I bet less on both sides, and end up only betting on heads. And I want to pick up this Pareto improvement.
There was no such Pareto improvement before, because my two selves were essentially in a zero-sum game: every dollar one of them got corresponded to a dollar the other one didn't get. Now, they are in a positive-sum game and need to split the gains they get from not betting on both sides. (However, if I am Bayesian updating, they might internally bet with each other without paying the 5 percent fee.)
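Concretely, a sketch of the arithmetic at the 95-cent payout:

```python
# Bet both sides (60 on heads, 40 on tails) versus netting down to 20 on heads.
both_sides = (100 + 0.95 * 60 - 40, 100 + 0.95 * 40 - 60)  # (117.0, 78.0)
netted     = (100 + 0.95 * 20,      100 - 20)              # (119.0, 80.0)
print(both_sides, netted)  # netted is better in both worlds: Pareto improvement
```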
How should I split the gains from trade between my two potential selves? Hmm, if only I had some strategy for fairly distributing utility in a Pareto optimal way when I have uncertainty about who I am.
I will have my different selves Nash bargain [LW · GW]! The 0 utility point will be no money. The utility functions will be linear in money, and the distribution on my potential selves will come from the uncertainty I already have.
When I Nash bargain, I end up maximizing the expected logarithm of expected utility. In this case, the outer expectation is over who I am, which I am thinking of as including the state of the coin. Since we moved our uncertainty about the world into our uncertainty about identity, the only thing left in the inner expectation is randomness coming from our action. However, since we can bet continuous amounts of money, and we are treating the utility of my various selves as linear in money, we don't ever have to actually randomize; we can mix between strategies by mixing between our betting amounts. Thus, I end up maximizing the expected logarithm of wealth.
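A numerical sketch of that maximization, using the 95-cent payout from above (the grid search stands in for the closed-form optimum):

```python
import numpy as np

# Expected log wealth when betting b of my 100 dollars on heads,
# with P(heads) = 0.6 and winning only 0.95 per dollar staked.
def expected_log_wealth(b):
    return 0.6 * np.log(100 + 0.95 * b) + 0.4 * np.log(100 - b)

bets = np.linspace(0, 99.9, 100_000)
print(bets[np.argmax(expected_log_wealth(bets))])
# ~17.9: less than the 20-dollar fair-odds bet, and only on heads
```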
Kelly Betting
This betting strategy, where you maximize the expected logarithm of your wealth, is known as Kelly betting. In the simplifying example where you can bet on anything, and fairly take either side of any bet (which should be approximately true given sufficiently large markets), it is equivalent to treating your various hypotheses as owning proportional portions of your wealth, which they bet entirely on the world that they are in.
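For the even-odds coin above, here is a quick check (in my own notation) that the two pictures agree. Let $f$ be the fraction of wealth bet on heads and $p$ the probability of heads. Then

$$\frac{d}{df}\Big[p\log(1+f) + (1-p)\log(1-f)\Big] = \frac{p}{1+f} - \frac{1-p}{1-f} = 0 \iff f^* = 2p - 1,$$

so with $p = 0.6$ the optimum is $f^* = 0.2$: bet 20 of 100 dollars on heads, which is exactly the proportional-stakes answer of $60 - 40 = 20$.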
I will leave it as an exercise to try to get an intuitive understanding for why maximizing the expected logarithm might be deeply entangled with proportional representation. *coughlogscoringrulecough* *coughminimizingcrossentropycough*
Again, this is not the standard argument for Kelly betting. The standard argument is very good, and is roughly that if you don't Kelly bet, then after enough time, you will, with probability approaching 1, have less money than if you had Kelly bet.
There is a nice parallel between what happens when you don't Kelly bet and when you don't Nash bargain. When you maximize expected wealth, you end up with more money in expectation; unfortunately, all that money ends up in the same world, which over time has smaller and smaller probability. In all other worlds, you are left with nothing. This would be fine if you had some channel to transfer the wealth from the one tiny world to all the other worlds, but you don't, so you just end up broke with probability 1.
Similarly, when you maximize total utility, rather than Nash bargaining, you end up with more total utility, but you might end up devoting all of your resources to one utility monster. This would be fine if you could transfer that utility to everyone else, but you can't, so almost everyone might end up with nothing.
Betting Even Less
Many claim that even Kelly betting is not risk averse enough. One major alternative considered is fractional Kelly betting [LW · GW]. For example, in half Kelly betting, you bet half as much as you would if you were Kelly betting. This may seem like a hack, but I think it kind of makes sense.
Let's say that I maintain two different probability distributions. Society has their market probability distribution $P_M$, which is updated using who-knows-what. I have my inside view probabilities $P_I$, which I try to update Bayesianly, but am obviously not perfect. However, I also have my outside view probabilities $P_O$. My inside view might be right, or the market might be right, so let's average between them: $P_O = \frac{P_I + P_M}{2}$. I want to keep my inside view and my outside view separate. I use my inside view to think, and I use my outside view to bet.
What happens when I Kelly bet according to my outside view? If you think of Kelly betting as maximizing an expected logarithm, you might start doing some crazy computations, but if you have been following this post thus far, you can just say:
I am the sum of two agents, each with half my wealth. The first Kelly bets according to my inside view, and the second doesn't bet at all. Thus, I bet half as much as I would if I were Kelly betting according to my inside view. Isn't compositionality nice?
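A sketch with numbers, assuming an even-odds market (so the market's implied P(heads) is 0.5) and the net_bet helper from earlier:

```python
net_bet = lambda w, p: w * p - w * (1 - p)  # same rule as the earlier sketch

inside, market, wealth = 0.6, 0.5, 100
outside = (inside + market) / 2  # 0.55

print(net_bet(wealth, outside))     # 10.0: Kelly betting my outside view
print(net_bet(wealth, inside) / 2)  # 10.0: half the inside-view Kelly bet
print(net_bet(wealth / 2, inside)
      + net_bet(wealth / 2, market))  # 10.0: two half-wealth agents, composed
```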
So, why isn't half Kelly betting just thought of as Kelly betting with different beliefs? The difference is in the updating. I do not update my outside view in a Bayesian way. I update my inside view in a Bayesian way, but I maintain the fact that I think there is a 50 percent chance the market is right instead of me. This is in spite of the observation that I seem to be making money. If on round one I make money, and on round two, I still only make half a Kelly bet, I am choosing to defer to the market more than a Bayesian update on my outside view would suggest.
I am subsidizing my deference to the market with a wealth transfer from my inside view. Given that I do not fully trust my reasoning and my ability to update my inside view correctly, this seems not entirely crazy to me, and I think it makes more sense when thought of as market deference than when thought of as just cutting my bet in half to be conservative.
Betting Less Still
People sometimes get confused looking at the standard argument for Kelly betting, and say "My utility is already logarithmic in dollars. Shouldn't I bet so as to maximize my expected log log wealth?" Firstly, your utility is not logarithmic in dollars. Utilities are bounded. But secondly, according to the standard argument, the answer is no. If you make enough bets, and continue disagreeing with the market, in the long run, you will, with probability approaching 1, wish you had maximized expected log wealth.
However, the arguments in this post are not about repeated bets. They are about respecting your epistemic subagents, and apply even if you only make one bet. If you have utility proportional to the logarithm of 1 dollar plus your wealth, and you Nash bargain across all your possible selves, you end up approximately maximizing the expected logarithm of the logarithm of 1 dollar plus your wealth. (I had to add in the dollar to avoid negative infinity madness.)
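A numerical sketch of that last claim, for a single even-odds bet of b dollars with wealth 100 and P(heads) = 0.6:

```python
import numpy as np

def expected_log_log(b):
    # Nash bargaining across selves with utility log(1 + wealth):
    # maximize the expected log of that utility, a log of a log.
    return 0.6 * np.log(np.log(101 + b)) + 0.4 * np.log(np.log(101 - b))

bets = np.linspace(0, 99, 100_000)
print(bets[np.argmax(expected_log_log(bets))])
# ~16.7: somewhat less than the Kelly bet of 20
```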
I would be careful here, though. I am not sure I endorse going this far. You are sacrificing Bayesian compositionality niceness, and I am not sure exactly what kind of introspecting I would have to do to verify that I actually have preferences logarithmic in wealth, and do not just think that I do because I have Kelly betting intuitions hard coded into me. Anyway, be careful, but again, not entirely crazy to me.
13 comments
Comments sorted by top scores.
comment by habryka (habryka4) · 2024-01-12T02:12:20.827Z · LW(p) · GW(p)
I assign a decent probability to this sequence (of which I think this is the best post) being the most important contribution of 2022. I am however really not confident of that, and I do feel a bit stuck on how to figure out where to apply and how to confirm the validity of ideas in this sequence.
Despite the abstract nature, I think if there are indeed arguments to do something closer to Kelly betting with one's resources, even in the absence of logarithmic returns to investment, then that would definitely have huge effects on how I think about my own life's plans, and about how humanity should allocate its resources.
Separately, I also think this sequence is pushing on a bunch of important seams in my model of agency and utility maximization in a way that I expect to become relevant to understanding the behavior of superintelligent systems, though I am even less confident of this than the rest of this review.
I do feel a sense of sadness that I haven't seen more built on the ideas of this sequence, or seen people give their own take on it. I certainly feel a sense that I would benefit a lot if I saw how the ideas in this sequence landed with people, and would appreciate figuring out the implications of the proof sketches outlined here.
Replies from: Jan_Kulveit
↑ comment by Jan_Kulveit · 2024-01-19T00:59:56.706Z · LW(p) · GW(p)
+1 on the sequence being one of the best things in 2022.
You may enjoy an additional/somewhat different take on this from population/evolutionary biology (and here). (To translate the map, you can think about yourself as the population of myselves. Or, in the opposite direction, from a gene-centric perspective it obviously makes sense to think about the population as a population of selves.)
Part of the irony here is that evolution landed on the broadly sensible solution (geometric rationality). However, after almost every human doing the theory got somewhat confused by the additive linear EV rationality maths, what most animals, and often humans at the S1 level, do got interpreted as 'cognitive bias' - in the spirit of assuming an obviously stupid evolution was not able to figure out linear argmax over utility algorithms in a few billion years.
I guess not much engagement is caused by:
- the relation between the 'additive' and 'multiplicative' pictures being deceptively simple in a formal way
- the conceptual understanding of what's going on and why being quite tricky; one reason is, I guess, that our S1 / brain hardware runs almost entirely in the multiplicative / log world, while people train their S2 understanding on the linear additive picture; as Scott explains, maths formalism fails us [? · GW]
comment by evhub · 2022-11-22T22:23:14.151Z · LW(p) · GW(p)
Sort of a side note, but one takeaway I've had from the whole FTX fiasco—particularly given SBF's comments here—is that being really careful about teaching and understanding Kelly betting is more important than I would have thought.
Replies from: Scott Garrabrant
↑ comment by Scott Garrabrant · 2022-11-23T00:47:31.826Z · LW(p) · GW(p)
Yep, I had been wanting to write this sequence for months, but FTX caused me to sprint for a week until it was all done, because it seems like now is the time people are especially hungry for this theory.
This sequence was going to be my main priority for December (and Kelly betting was going to be my most central example). I thought the main reason EAs needed it was to be able to not feel guilty every time they stop to have fun, to not get Pascal's mugged by calculations about the amount of matter in the universe, to not let longtermism take over the entire EA movement, to have fewer internal-politics-related issues, and to be more scout-mindset-like, to use Julia's term. The Kelly betting was supposed to be more of an analogy about putting all your eggs in one basket.
Then, I suddenly quickly updated on how much the EA community needed these memes.
comment by Matt Goldenberg (mr-hire) · 2022-11-23T01:34:59.308Z · LW(p) · GW(p)
>You, however, do not know if it is a fair coin, and are offering me a fair bet. I only have 100 dollars to my name, and I can bet as much as I want (up to 100 dollars) in either direction at even odds.
>If I bet 100 dollars on heads, heads-me gets 200 dollars, and tails-me gets nothing. If I bet 100 dollars on tails, tails-me gets 200 dollars, and heads-me gets nothing. If I bet nothing, both versions of me get 100 dollars.
>However, every dollar in the hands of heads-me is worth 1.5 times as much as a dollar in the hands of tails-me, since heads-me exists 1.5 times as much. (I am ignoring here any diminishing returns in my value of money.)
>Thus, to maximize value I should bet 100 dollars on heads. However, maybe it is better to think of tails-me as the rightful owner of 40 percent of my resources. When I bet 100 dollars on heads, I am seizing money from tails-me for the greater good, since heads-me has the (proportionally greater) existence necessary to better take advantage of it.
>Alternatively, I could say that since 60 percent of me is heads-me, heads-me should only control 60 dollars, which can be bet on heads. Tails-me should control 40 dollars, which can be bet on tails. These two bets partially cancel each other out, and the net result is that I bet 20 dollars on heads.
>If you are especially fast at maximizing expected logarithms, you might see where this is going.
Wow, I have been looking for an intuitive explanation of Kelly betting for years, and this is the first one that really hit from an intuitive mathematical perspective.
Replies from: Scott Garrabrant
↑ comment by Scott Garrabrant · 2022-11-23T01:49:49.113Z · LW(p) · GW(p)
Thanks.
Be warned that this explanation only applies if the environment is offering both sides of every event at the same odds.
Replies from: notfnofn, Scott Garrabrant, mr-hire
↑ comment by Scott Garrabrant · 2022-11-23T01:50:50.472Z · LW(p) · GW(p)
Where by the "same odds," I mean if you can take 3:2 for True, you can take 2:3 for False.
↑ comment by Matt Goldenberg (mr-hire) · 2022-11-23T01:58:50.020Z · LW(p) · GW(p)
Yes, I got down to the Nash bargaining part, which is a bit harder, and got confused again, but this helped as a very simple math intuition for why to Kelly bet, if not how to calculate it in most real-world betting situations.
comment by Caspar Oesterheld (Caspar42) · 2022-11-23T01:25:29.118Z · LW(p) · GW(p)
Nice!
I'd be interested in learning more about your views on some of the tangents:
>Utilities are bounded.
Why? It seems easy to imagine expected utility maximizers whose behavior can only be described with unbounded utility functions, for example.
>I think many phenomena that get labeled as politics are actually about fighting over where to draw the boundaries.
I suppose there are cases where the connection is very direct (drawing district boundaries, forming coalitions for governments). But can you say more about what you have in mind here?
Also:
>Not, they are in a positive sum
I assume the first word is a typo. (In particular, it's one that might make the post less readable, so perhaps worth correcting.)
Replies from: Scott Garrabrant↑ comment by Scott Garrabrant · 2022-11-23T01:46:37.980Z · LW(p) · GW(p)
1) So, the VNM utility theorem, assuming the space of lotteries is closed under arbitrary mixtures (where, e.g., you can specify a sequence of lotteries and take the mixture that assigns probability $2^{-n}$ to the $n$th lottery), implies bounded utilities, since otherwise you can construct a lottery with infinite utility, and violate continuity.
I think there are some reasons to not want to allow arbitrary lotteries, and then you could technically have unbounded utility, but then you get a utility function that can only assign utilities in such a way that you can't set up any St. Petersburg paradoxes. I think that this move makes sense, but it means you have to integrate your probability and utility, and modulo actually thinking of them as integrated in this way, I think "utilities are bounded" is a good approximation.
I think that almost everyone who talks about unbounded utility functions is not actually doing the above, and is actually violating the VNM axioms, and for me, the word "utility" means VNM.
2) I think that a lot of the behavior referred to as "soldier mindset" as opposed to "scout mindset" is related to the kind of boundaries we are talking about here. I think that e.g. politics within EA feels like it has a lot to do with coalition building, and conflicts about transparency, which fit into this soldier mindset thing.
I think that a lot of politics is about conflicts between respecting the rights of individuals vs the rights of groups comprised of those individuals. This is something like asking to what extent we want to think of various different levels as "people." Across humans, you get things like states' rights, corporations' rights, families' rights, immigration. Within humans, you get questions about how much you want to hold adults accountable for mistakes they made as children. I don't know, I am hesitant to get into object-level politics.
3) Yeah, I will fix it.
comment by Lorxus · 2024-04-29T18:26:03.940Z · LW(p) · GW(p)
>Firstly, your utility is not logarithmic in dollars. Utilities are bounded.
Ehn, the universe is finite and there's no way we can get anywhere near a dollar per atom of value out of the universe. There's well less than $10^{90}$ particles in the universe, and $\log(10^{90})$ is only a couple hundred, so if you were wrong about utility not being O(log(money)) because it has to be bounded, how could you ever tell, even in principle? (That said, I do think you're right, but that's because economium is likely as edible as dollar bills are.)
comment by Slider · 2022-11-23T11:28:38.772Z · LW(p) · GW(p)
Viewing income disparity as a problem that overrides expected wealth among your possible selves is a very interesting angle.
Does this mean that there are voting schemes that are structurally impossible to gerrymander? Do they inevitably fail other voting desiderata?
Wouldn't it also make sense to treat the outside view as something to be updated? To treat yourself as beating the market if you are beating the market. Or is it that 'unknown unknowns' and 'I know that I don't know' kinds of factors never shift? I read the recommendation as: when you are wrong, one should be less agentic and do the null behaviour (kind of like the action version of a null hypothesis). The angle I used to apply is that if you are wrong you should update to be more right. But this recommendation works even if you don't know how to improve: halt and do what you were previously doing, instead of totally freezing.
So am I correct that taking Kelly betting seriously leads to the recommendation that St. Petersburg should be rejected? I am also thinking of a continuous version of the setup where at each timestep you can stake the amount of money you want for double or nothing. If double is a teensy-tiny bit more probable than nothing, you only stake very little money. And at exactly even odds you stake exactly 0 money. Is this not a solution to the St. Petersburg blowup?
Seems there are recommendations here that violate maximising for expected value; for clarity (for myself and others) I will restate more explicitly. You have 100 money and are considering two bets. Bet A is a 2/3 chance of 2.1 (2+0.1) times the bet and a 1/3 chance of nothing; the bet-taking degree is 66.66... Bet B is a 2/6 chance of 4.2 (2*(2+0.1)) times the bet and a 4/6 chance of nothing; the bet-taking degree is 33.33. The expectation of both is 1.4, but the bets don't get treated the same; we are not indifferent between them. We prefer A, and can do so without providing a risk tolerance profile. This is probably mostly additional structure on top; most comparisons that go otherwise are overriding indifferences to favour one side. Same expected values point in the same direction but not at the same magnitude.
It is interesting to think about whether there are exceptions where this new scheme would recommend contrary to pure EV expectation. It would seem that less volatile scenarios move faster with differences in outcome intensity. As with expectation value 1.4 we had two bets with degrees 66.66 and 33.33: are there any bets with bet-taking degrees between those that have a lesser expected value?
I suspect that, offered each bet alone, we would engage to those 66.66 and 33.33 degrees, but offered both together we are not putting in the whole 100 (66.66+33.33).
If some hyper scenario happens at a probability p, then even if the utility shoots through the roof or is roofless, the maximum that scenario can command is that p fraction, and it can't go over that. You are not allowed to bet 1000 out of 100. And you can't recommend harder than "100% yes".