post by Stabilizer · 2017-01-05T21:10:44.132Z · LW · GW · Legacy · 27 comments

You are at a casino. You have $1. A table offers you a game: you have to bet all your money; a fair coin will be tossed; if it lands heads, you triple your money; if it lands tails, you lose everything. In the first round, it is rational to take the bet since the expected value of the bet is $1.50, which is greater than the $1 you started out with.

If you win the first round, you'll have $3. In the next round, it is rational to take the bet again, since the expected value is $4.50, which is larger than $3. If you win the second round, you'll have $9. In the next round, it is rational to take the bet again, since the expected value is $13.50, which is larger than $9.

You get the idea. At every round, if you won the previous round, it is rational to take the next bet.

But if you follow this strategy, it is guaranteed that you will eventually lose everything. You will go home with nothing. And that seems irrational.
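
A minimal simulation sketch of the "always bet" strategy described above. The function name, the seed, and the `max_rounds` safety cap are illustrative assumptions, not part of the original game; the point is only to show that every simulated player ends up with nothing, however long the winning streak was.

```python
import random

def play_until_broke(rng, start=1, max_rounds=10_000):
    """Bet everything each round until the coin comes up tails.

    Returns (final_wealth, peak_wealth, rounds_won).
    max_rounds is a safety cap for this sketch, not a rule of the game.
    """
    wealth = start
    peak = start
    rounds_won = 0
    for _ in range(max_rounds):
        if rng.random() < 0.5:   # heads: triple the stake
            wealth *= 3
            peak = wealth
            rounds_won += 1
        else:                    # tails: lose everything
            wealth = 0
            break
    return wealth, peak, rounds_won

rng = random.Random(0)  # fixed seed so reruns match
results = [play_until_broke(rng) for _ in range(10_000)]
print("players who went home with money:", sum(1 for w, _, _ in results if w > 0))
print("longest winning streak:", max(r for _, _, r in results))
```

The peak wealth along the way can be large, but under this strategy it is never cashed out.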

Intuitively, it feels that the rational thing to do is to quit while you are ahead, but how do you get that prediction out of the maximization of expected utility? Or does the above analysis only feel irrational because humans are loss-averse? Or is loss-aversion somehow optimal here?

comment by gwern · 2017-01-05T21:23:24.949Z · LW(p) · GW(p)

Isn't this just the St Petersburg paradox?

Replies from: korin43, tgb, Stabilizer, Qiaochu_Yuan
comment by korin43 · 2017-01-05T21:32:48.010Z · LW(p) · GW(p)

The Wikipedia page has a discussion of solutions. The simplest one seems to be "this paradox relies on having infinite time and playing against a casino with infinite money". If you assume the casino "only" has more money than anyone in the world, the expected value is not that impressive.

See also the Martingale betting system, which relies on the gambler having infinite money.

comment by tgb · 2017-01-07T15:40:00.494Z · LW(p) · GW(p)

I didn't like any of the proposed solutions when I glanced through the SEP article on it. They're all insightful but are sidestepping the hypothetical. Here's my take:

Compute the expected utility not of a choice BET/NO_BET but of a decision rule that tells you whether to bet. In this case, the OP proposed the rule "Always BET" which has expected utility of 0 and is bested by the rule "BET only once" which is in turn bested by the rule "BET twice if possible" and so on. The 'paradox' then is that there is a sequence of rules whose expected earnings are diverging to infinity. But then this is similar to the puzzle "Name a number; you get that much wealth." Which number do you name?
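
As a rough illustration of comparing decision rules rather than single choices, the sketch below computes exact figures for the family of rules "bet at most n times, then stop", starting from $1 and assuming utility is linear in money. The helper name `rule_stats` is made up for this example.

```python
from fractions import Fraction

def rule_stats(n):
    """Exact statistics for the rule "bet at most n times, then stop", starting from $1."""
    p_win_all = Fraction(1, 2) ** n   # all n flips must come up heads
    payout = 3 ** n                   # wealth if every bet is won
    expected = p_win_all * payout     # equals (3/2)**n; losing paths contribute 0
    return p_win_all, payout, expected

for n in range(0, 6):
    p, payout, ev = rule_stats(n)
    print(f"n={n}: P(keep money)={p}, payout=${payout}, expected=${float(ev):.2f}")
```

Each rule in the sequence has a higher expected value than the one before it, while the chance of leaving with any money halves each time; "Always BET" is the limiting rule where that chance reaches zero.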

(Actually I think the proposed rule is not "Always BET" but "Always make the choice which maximizes expected utility conditional on choosing NO_BET on the next choice". The fact that this strategy is flawed seems reasonable: you're computing the expectation assuming you choose NO_BET next but don't actually choose NO_BET next. Don't count your chickens before they hatch.)

comment by Stabilizer · 2017-01-05T21:31:44.628Z · LW(p) · GW(p)

Thanks! It looks very related, and is perhaps exactly the same. I hadn't heard about it till now. The Stanford Encyclopedia of Philosophy has a good article on this with different possible resolutions.

comment by Qiaochu_Yuan · 2017-01-12T06:15:42.902Z · LW(p) · GW(p)

No. In the St. Petersburg setup you don't get to choose when to quit, you only get to choose whether to play the game or not. In this game you can remove the option for the player to just keep playing, and force the player to pick a point after which to quit, and there's still something weird going on there.

comment by Qiaochu_Yuan · 2017-01-12T06:51:18.780Z · LW(p) · GW(p)

It's very annoying trying to have this conversation without downvotes. Anyway, here are some sentences.

1. This is not quite the St. Petersburg paradox; in the St. Petersburg setup, you don't get to choose when to quit, and the confusion is about how to evaluate an opportunity which apparently has infinite expected value. In this setup the option "always continue playing" has infinite expected value, but even if you toss it out there are still countably many options left, namely "quit playing after N victories," each of which has higher expected value than the last, and it's still unclear how to pick between them.

2. Utility not being linear in money is a red herring here; you can just replace money with utility in the problem directly, as long as your utility function is unbounded. One resolution is to argue that this sort of phenomenon suggests that utility functions ought to be bounded. (One way of concretizing what it means to have an unbounded utility function: you have an unbounded utility function if and only if there is a sequence of outcomes each of which is at least "twice as good" as the previous in the sense that you would prefer a 50% chance of the better outcome and a 50% chance of some fixed outcome to a 100% chance of the worse outcome.)

3. Thinking about your possible strategies before you start playing this game, there are infinitely many: for every nonnegative integer N, you can choose to stop playing after N rounds, or you can choose to never stop playing. Each strategy is more valuable than the previous one, and the last strategy has infinite expected value. If you state the question in terms of utilities, that means there's some sense in which the naive expected utility maximizer is doing the right thing, if it has an unbounded utility function.

4. On the other hand, the foundational principled argument for taking expected utility maximization seriously as an (arguably toy) model of good decision-making is the vNM theorem, and in the setup of the vNM theorem lotteries (probability distributions over outcomes) always have finite expected utility, because 1) the utility function always takes finite values; an infinite value violates the continuity axiom, and 2) lotteries are only ever over finitely many possible states of the world. In this setup, without a finite bound on the total number of rounds, the possible states of the world are given by possible sequences of coin flips, of which there are uncountably many, and the lottery over them you need to consider to decide how good it would be to never stop playing involves all of them. So, you can either reject the setup because the vNM theorem doesn't apply to it, or reject the vNM theorem because you want to understand decision making over infinitely many possible outcomes; in the latter case there's no reason a priori to talk about expected utility maximization. (This point also applies to the St. Petersburg paradox.)

5. If you want to understand decision making over infinitely many possible outcomes, you run into a much more basic problem which has nothing to do with expected values: suppose I offer you a sequence of possible outcomes, each of which is strictly more valuable than the previous one (and this can happen even with a bounded utility function as long as it takes infinitely many values, although, again, there's no reason a priori to talk about expected utility maximization in this setting). Which one do you pick?

Replies from: Stabilizer
comment by Stabilizer · 2017-01-13T21:22:06.308Z · LW(p) · GW(p)

Thank you for this clear and useful answer!

comment by Anders_H · 2017-01-05T22:52:14.657Z · LW(p) · GW(p)

The rational choice depends on your utility function. Your utility function is unlikely to be linear with money. For example, if your utility function is log (X), then you will accept the first bet, be indifferent to the second bet, and reject the third bet. Any risk-averse utility function (i.e. any monotonically increasing function with negative second derivative) reaches a point where the agent stops playing the game.

A VNM-rational agent with a linear utility function over money will indeed always take this bet. From this, we can infer that linear utility functions do not represent the utility of humans.

(EDIT: The comments by Satt and AlexMennen are both correct, and I thank them for the corrections. I note that they do not affect the main point, which is that rational agents with standard utility functions over money will eventually stop playing this game)

Replies from: satt, AlexMennen, Douglas_Knight
comment by satt · 2017-01-07T15:19:45.455Z · LW(p) · GW(p)

For example, if your utility function is log (X), then you will accept the first bet

Not even that. You start with $1 (utility = 0) and can choose between

1. walking away with $1 (utility = 0), and
2. accepting a lottery with a 50% chance of leaving you with $0 (utility = −∞) and a 50% chance of having $3 (utility = log(3)).

The first bet's expected utility is then −∞, and you walk away with the $1.
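
The arithmetic here can be checked with a short sketch. It assumes natural log and treats the utility of $0 as negative infinity (the limit of log(x) as x approaches 0 from above); the function names are illustrative.

```python
import math

def log_utility(wealth):
    """u(x) = log(x), with u(0) taken as -infinity (the limit as x -> 0+)."""
    return math.log(wealth) if wealth > 0 else -math.inf

def expected_utility_of_bet(wealth):
    # 50% chance of tripling, 50% chance of ending with $0
    return 0.5 * log_utility(3 * wealth) + 0.5 * log_utility(0)

print(log_utility(1))              # utility of walking away with $1
print(expected_utility_of_bet(1))  # expected utility of taking the bet
```

Because any positive chance of reaching $0 drags the expectation to negative infinity, a pure log-utility agent declines an all-in bet at every wealth level, not just the first round.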

comment by AlexMennen · 2017-01-05T23:47:26.239Z · LW(p) · GW(p)

Any risk-averse utility function (i.e. any monotonically increasing function with negative second derivative) reaches a point where the agent stops playing the game.

Not true. It is true, however, that any agent with a bounded utility function eventually stops playing the game.
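
One way to see the distinction, as a hedged sketch: the two utility functions below are examples chosen for illustration, not ones proposed in the thread. `x ** 0.7` is increasing, concave, and unbounded, and an agent with it accepts every bet, since 0.5 · 3^0.7 > 1. `x / (x + 100)` is increasing, concave, and bounded above by 1, and an agent with it stops after a few wins.

```python
def accepts_bet(utility, wealth):
    """True if a 50/50 shot at 3*wealth or 0 beats keeping wealth for sure."""
    return 0.5 * utility(3 * wealth) + 0.5 * utility(0) > utility(wealth)

def rounds_accepted(utility, start=1, max_rounds=50):
    """Count consecutive bets the agent would accept along the all-wins path."""
    wealth, count = start, 0
    while count < max_rounds and accepts_bet(utility, wealth):
        wealth *= 3
        count += 1
    return count

def concave_unbounded(x):
    return x ** 0.7          # increasing, concave, unbounded

def bounded(x):
    return x / (x + 100)     # increasing, concave, bounded above by 1

print(rounds_accepted(concave_unbounded))  # runs into the max_rounds cap
print(rounds_accepted(bounded))            # stops after finitely many bets
```

The `max_rounds` cap exists only so the loop terminates for the agent that never declines.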

Replies from: Anders_H
comment by Anders_H · 2017-01-05T23:59:28.417Z · LW(p) · GW(p)

Thanks for catching that, I stand corrected.

comment by Douglas_Knight · 2017-01-06T01:40:26.525Z · LW(p) · GW(p)

You are fighting the hypothetical.

In the St Petersburg Paradox the casino is offering a fair bet, the kind that casinos offer. It is generally an error for humans to take these.

In this scenario, the casino is magically tilting the bet in your favor. Yes, you should accept that bet and keep playing until the amount is an appreciable fraction of your net worth. But given that we are assuming the strange behavior of the casino, we could let the casino tilt the bet even farther each time, so that the bet has positive expected utility. Then the problem really is infinity, not utility. (Even agents with unbounded utility functions are unlikely to have them be unbounded as a function of money, but we could imagine a magical wish-granting genie.)

Replies from: AlexMennen, Jiro
comment by AlexMennen · 2017-01-07T04:30:21.433Z · LW(p) · GW(p)

He's not fighting the hypothetical; he merely responded to the hypothetical with a weaker claim than he should have. That is, he correctly claimed that realistic agents have utility functions that grow too slowly with respect to money to keep betting indefinitely, but this is merely a special case of the fact that realistic agents have bounded utility, and thus will eventually stop betting no matter how great the payoff of winning the next bet is.

Replies from: Douglas_Knight
comment by Douglas_Knight · 2017-01-07T06:51:32.355Z · LW(p) · GW(p)

This is a stupid comment. I would downvote it and move on, but I can't, so I'm making this comment.

Replies from: entirelyuseless
comment by entirelyuseless · 2017-01-07T17:22:05.518Z · LW(p) · GW(p)

I agree with this, if "this" refers to your own comment and not the one it replies to.

comment by Jiro · 2017-01-06T15:59:21.880Z · LW(p) · GW(p)

Assuming that the total time it takes to make all your bets is not infinite, this results in

comment by Dagon · 2017-01-05T22:01:09.289Z · LW(p) · GW(p)

Calculate the chance of breaking the casino, for any finite maximum payout. It's always non-zero, there are no infinities.
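
A small sketch of that calculation, assuming the player starts with $1 and that "breaking the casino" means reaching or exceeding some maximum payout. The function names and the example caps are hypothetical.

```python
from fractions import Fraction

def wins_to_break(max_payout, start=1):
    """Smallest number of consecutive wins after which wealth >= max_payout."""
    wealth, wins = start, 0
    while wealth < max_payout:
        wealth *= 3
        wins += 1
    return wins

def chance_of_breaking(max_payout):
    """Probability of winning every flip needed to reach the casino's limit."""
    return Fraction(1, 2) ** wins_to_break(max_payout)

for cap in (10**3, 10**6, 10**12):
    p = chance_of_breaking(cap)
    print(f"max payout ${cap:,}: wins needed={wins_to_break(cap)}, chance={float(p):.3e}")
```

For any finite cap the number of required wins is finite, so the chance is small but strictly positive.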

Replies from: entirelyuseless
comment by entirelyuseless · 2017-01-06T06:12:28.810Z · LW(p) · GW(p)

This is not anyone's true rejection, since no one would plan to play until they lost everything even if the casino had infinite wealth.

Replies from: Dagon
comment by Dagon · 2017-01-06T06:52:05.995Z · LW(p) · GW(p)

It's not a true offer, so it's hard to predict whether a rejection is true. I think I'd be willing to play for any amount they'd let me.

But that doesn't matter. No matter where you stop, the "paradox" doesn't happen for finite amounts.

comment by Mati_Roy (MathieuRoy) · 2020-03-22T10:58:52.315Z · LW(p) · GW(p)

Maybe a billion dollars can buy more utilons than 50% of 3 billion dollars because of diminishing returns of money.

If it's utilons, then I think I'd want to play one more time for all natural numbers.

comment by WalterL · 2017-01-09T23:38:26.143Z · LW(p) · GW(p)

Money doesn't work in a way that makes this interesting.

If I have one dollar, I basically have nothing. What can I buy for one dollar? Well, a shot at this guy's casino. Why not?

Now I'm broke, ah well, but I was mostly here before, OR Now I've got 3 dollars. Still can't buy anything, except for this crazy casino.

Now I'm broke, ah well, but I was mostly here before, OR Now I've got 9 dollars. That's a meal if I'm near a fast food joint, or another spin at the casino which seems to love me.

Now I'm broke, ah well, but I was mostly here before, OR Now I've got 27 dollars. That's a meal at a casino's prices. Do I want a burger or another spin?

And so on. At each level, compare the thrill of gambling with whatever you could buy for that money. You will eventually be able to get something that you value more than the chance to go back up to the table.

Always do the thing with this money that you will enjoy the most. Initially that is gonna be the casino, because one dollar. Before it gets to something interesting you will lose, because odds.

comment by Flinter · 2017-01-15T21:44:56.257Z · LW(p) · GW(p)

I have my own explanation for this but it will take time to compress. We are implying two different definitions or contexts of the word rational though imo which is sort of the crux of my argument. I think we are also conflating definitions of time, and also conflating different definitions of reality.

comment by selylindi · 2017-01-14T17:59:15.462Z · LW(p) · GW(p)

I think the most fun and empirical way to dissolve this confusion would be to hold a tourney. Remember the Prisoner's Dilemma competitions that were famously won, not by complex algorithms, but by simple variations on Tit-for-Tat? If somebody can host, the rules would be something like this:

1. Players can submit scripts which take only one input (their current money) and produce only one output (whether to accept the bet again). The host has infinite money since it's just virtual.
2. Each script gets run N times where N isn't told to the players in advance. The script with the highest winnings is declared Interesting.
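
A rough sketch of what such a tourney harness might look like. Everything here is an assumption made for illustration: the script interface (a function from current money to a yes/no decision), the example entrants, the seed, and the `max_rounds` safety cap.

```python
import random

def run_script(script, rng, start=1, max_rounds=1_000):
    """Play one game: ask the script each round whether to bet everything again."""
    money = start
    for _ in range(max_rounds):          # safety cap for this sketch only
        if money == 0 or not script(money):
            break
        money = money * 3 if rng.random() < 0.5 else 0
    return money

def tournament(scripts, n_runs, seed=0):
    """Run every script n_runs times and return total winnings per script name."""
    totals = {}
    for name, script in scripts.items():
        rng = random.Random(seed)        # same seed for every script
        totals[name] = sum(run_script(script, rng) for _ in range(n_runs))
    return totals

scripts = {
    "never_bet": lambda money: False,
    "stop_at_9": lambda money: money < 9,
    "stop_at_81": lambda money: money < 81,
    "always_bet": lambda money: True,
}
totals = tournament(scripts, n_runs=1_000)
print(totals)
```

Since a script sees only its current money, every entrant reduces to a threshold rule of the form "stop once money reaches X", so the contest is really about which threshold does best over the unknown number of runs.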

comment by MrMind · 2017-01-09T09:41:14.079Z · LW(p) · GW(p)

But if you follow this strategy, it is guaranteed that you will eventually lose everything. You will go home with nothing. And that seems irrational.

It is not irrational, just a case of revealed preference. It intuitively doesn't sound good because your utility function for money is not linear: otherwise you would be indifferent to the risk of losing money. Indeed, humans are more risk-averse than a linear utility function allows.

comment by MrMind · 2017-01-09T09:19:10.461Z · LW(p) · GW(p)

This is a classic theme that regularly comes back. Let me propose a different but related paradox: there's a box with one utilon inside. Every hour that the box stays closed, you get one more utilon inside.
When do you open it?

comment by RandomPasserby · 2017-01-08T17:09:21.138Z · LW(p) · GW(p)

Consider a sequence of numbers of the form $\left(1+\frac{1}{n}\right)^n$, where n is a natural number. Each number in the sequence is rational; the limit of the sequence is not.

Consider your opening question. Assume that your utility is linear in money or that the magical casino offers to triple your utility directly. Each step in the sequence of rounds is rational, the limit itself is not.

Infinities are weird.

comment by stanleywinford · 2017-01-08T05:10:39.802Z · LW(p) · GW(p)

Suppose that at the beginning of the game, you decide to play no more than N turns. If you lose all your money by then, oh well; if you don't, you call it a day and go home.

• After 1 turn, there's a 1/2 chance that you have 3 dollars; expected value = 3/2
• 2 turns, 1/4 chance that you have 9 dollars; expected value = (3/2)^2
• 3 turns, 1/8 chance of 27 dollars; E = (3/2)^3
• 4 turns, 1/16 chance of 81 dollars; E=(3/2)^4
• ...
• N turns, 1/2^N chance of 3^N dollars; E=(3/2)^N

So the longer you decide to play, the higher your expected value is. But is a 1/2^100 chance of winning 3^100 dollars really better than a 1/2 chance of winning 3 dollars? Just because the expected value is higher, doesn't mean that you should keep playing. It doesn't matter how high the expected value is if a 1/2^100 probability event is unlikely to happen in the entire lifetime of the Universe.
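
To put rough numbers on the N = 100 case, here is the expected value written out, assuming the $1 starting stake and that losing at any point leaves you with $0. The symbol $E_N$ is introduced here only as shorthand for the expected value of the "play at most N turns" plan.

```latex
E_N = \frac{1}{2^N}\cdot 3^N + \left(1-\frac{1}{2^N}\right)\cdot 0 = \left(\frac{3}{2}\right)^N

\text{For } N = 100:\quad
\frac{1}{2^{100}} \approx 7.9\times 10^{-31},\qquad
3^{100} \approx 5.2\times 10^{47},\qquad
E_{100} = \left(\frac{3}{2}\right)^{100} \approx 4.1\times 10^{17}
```

The whole of that expected value comes from a single outcome with probability on the order of $10^{-30}$; every other outcome pays nothing.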