Nash equilibriums can be arbitrarily bad

stuart_armstrong

Nash equilibriums can be arbitrarily bad

post by Stuart_Armstrong · 2019-05-01T14:58:21.765Z · LW · GW · 24 comments

  Go hungry with Almost Free Lunches
    Pareto optimal
    Minmax and maximin
  Arbitrary badness with two options
None
24 comments

Go hungry with Almost Free Lunches

Consider the following game, called "Almost Free Lunches" (EDIT: this seems to be a variant of the traveller dilemma). You name any pound-and-pence amount between £0 and £1,000,000; your opponent does likewise. Then you will both get whichever amount named was lowest.

On top of that, the person who named the highest amount must give £0.02 to the other. If you tie, no extra money changes hands.

What's the Nash equilibrium of this game? Well:

The only Nash equilibrium of Almost Free Lunches is for both of you to name £0.00.

Proof: Suppose player A has a probability distribution $p_{A}$ over possible amounts to name, and player $B$ has a probability distribution $p_{B}$ over possible amounts. Let $m_{A}$ be the highest amount such that $p_{A} (m_{A})$ is non-zero; let $m_{B}$ be the same, for $B$ . Assume that $(p_{A}, p_{B})$ is a Nash equilibrium.

Assume further that $m_{A} \geq m_{B}$ (if that's not the case, then just switch the labels $A$ and $B$ ). Then either $m_{A} >$ £0.00 or $m_{A} =$ £0.00 (and hence both players select £0.00).

We'll now rule out $m_{A} >$ £0.00. If $m_{B} >$ £0.00, then player $A$ can improve their score by replacing $m_{A}$ with $m_{A}^{'} = m_{B} - £ 0.01$ . To see this, assume that player $B$ has said $n_{B}$ , and player $A$ has said $m_{A}$ . If $n_{B} < m_{A}^{'} < m_{A}$ , then player $A$ can say $m_{A}^{'}$ just as well as $m_{A}$ - either choice gives them the same amount (namely, $n_{B} -$ £0.02).

There remain two other cases. If $n_{B} = m_{A}^{'}$ , then $m_{A}^{'}$ is superior to $m_{A}$ , getting $m_{A}^{'}$ (rather than $m_{A}^{'} -$ £0.02). And if $n_{B} = m_{B}$ , then $m_{A}^{'}$ gets $m_{A}^{'} +$ £0.02 $= m_{B} +$ £0.01, rather than $m_{B}$ (if $m_{A} = m_{B}$ ) or $m_{B} - £ 0.02$ (if $m_{A} > m_{B}$ ).

Finally, if $m_{B} =$ £0.00, then player $A$ gets -£0.02 unless they also say £0.00.

Hence if $m_{A} >$ £0.00, the $p_{A}$ cannot be part of a Nash Equilibrium. Thus $m_{A} =$ £0.00 and hence the only Nash Equilibrium is at both players saying £0.00.

Pareto optimal

There are three Pareto-optimal outcomes: (£1,000,000.00, £1,000,000.00), (£1,000,000.01, £999,999.97), and (£999,999.97, £1,000,000.01). All of them are very much above the Nash Equilibrium.

Minmax and maximin

The minmax and maximin values are also both terrible, and also equal to £0.00. This is not surprising, though, as minmax and maximin implicitly assume the other players are antagonistic to you, and are trying to keep your profits low.

Arbitrary badness with two options

This shows that choosing the Nash Equilibrium can be worse than almost every other option. We can of course increase the maximal amount, and get the Nash Equilibrium to be arbitrarily worse than any reasonable solution (I would just say either £1,000,000.00 or £999,999.99, and leave it at that).

But we can also make the Nash Equilibrium arbitrarily close to the worst possible outcome, and that without even requiring more than two options for each player.

Assume that there are four ordered amounts of money/utility: $n_{3} > n_{2} > n_{1} > n_{0}$ . Each player can name $n_{2}$ or $n_{1}$ . Then if they both name the same, they get that amount of utility. If they name different ones, then then player naming $n_{2}$ gets $n_{0}$ , and the player naming $n_{1}$ gets $n_{3}$ .

By the same argument as above, the only Nash equilibrium is for both to name $n_{1}$ . The maximum possible amount is $n_{3}$ ; the maximum they can get if they both coordinate is $n_{2}$ , the Nash equilibrium is $n_{1}$ , and the worst option is $n_{0}$ . We can set $n_{1} = n_{0} + ϵ$ and $n_{3} = n_{2} + ϵ$ for arbitrarily tiny $ϵ > 0$ , while setting $n_{2}$ to be larger than $n_{1}$ by some arbitrarily high amount.

So the situation is as bad as it could possibly be.

Note that this is a variant of the prisoner's dilemma with different numbers. You could describe it as "Your companion goes to a hideous jail if and only if you defect (and vice versa). Those that don't defect will also get a dust speck in their eye [LW · GW]."

24 comments

Comments sorted by top scores.

comment by cousin_it · 2019-05-01T21:33:17.470Z · LW(p) · GW(p)

Yeah. Usually the centipede game is used to teach this lesson, but your game is also very nice :-)

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2019-05-01T22:01:44.898Z · LW(p) · GW(p)

Cool! I prefer my example, though; it feels more intuitive (and has a single equilibrium).

comment by Taymon Beal (taymon-beal) · 2019-05-01T21:59:59.631Z · LW(p) · GW(p)

Nit: I think this game is more standardly referred to in the literature as the "traveler's dilemma" (Google seems to return no relevant hits for "almost free lunches" apart from this post).

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2019-05-02T06:53:54.136Z · LW(p) · GW(p)

That's useful; I added a link to the other game in the main text (as far as I can tell, I came up with this independently).

comment by Shmi (shminux) · 2019-05-02T06:05:47.144Z · LW(p) · GW(p)

So the situation is as bad as it could possibly be.

You mean, it is as bad as it could possibly be for the Nash equilibrium to be a good strategy and a good predictor in this setup? Yep, absolutely. All models tend to have their domain of validity, and this game shows the limits of the Nash equilibrium model of decision making.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2019-05-02T07:39:36.869Z · LW(p) · GW(p)

That is true, but I meant it as "as close as you want to the worst possible outcome, and as far as you want from the best mutual outcome".

comment by Unnamed · 2019-05-02T05:45:37.329Z · LW(p) · GW(p)

Unilateral precommitment lets people win at "Almost Free Lunches".

One way to model precommitment is as a sequential game: first player 1 chooses a number, then player 1 has the option of either showing that number to player 2 or keeping it hidden, then player 2 chooses a number. Optimal play is for player 1 to pick £1,000,000 and show that number, and then for player 2 to choose £999,999.99.

An interesting feature of this is that player 1's precommitment helped player 2 even more than it helped player 1. Player 1 is "taking one for the team", but still winning big. This distinguishes it from games like chicken, where precommitment is a threat that allows the precommitter to win the larger share. Though this means that if either player can precommit (rather than one being pre-assigned to go first as player 1) then they'd both prefer to have the other one be the precommitter.

This benefit of precommitment does not extend to the two option version (n2 vs. n1). In that version, player 2 is incentivized to say "n1" regardless of what player 1 commits to, so unilateral precommitment doesn't help them avoid the Nash Equilibrium. As in the prisoner's dilemma.

Replies from: Zvi, Stuart_Armstrong

↑ comment by Zvi · 2019-05-07T16:18:09.516Z · LW(p) · GW(p)

Is there a name for this type of equilibrium, where a player can pre-commit in a way where the best response leaves the first player very well-off, but not quite optimally well-off? What about if it is a mixed strategy (e.g. consider the version of this game where the player who gave the larger number gets paid nothing).

↑ comment by Stuart_Armstrong · 2019-05-02T10:30:07.082Z · LW(p) · GW(p)

I think another key difference between PD and traveller/AFL is that in the PD variant, (n2, n1) is a Pareto outcome - you can't improve the first player's outcome without making the second one worse off. However, in the other problem, (0,0) is very very far from being Pareto.

comment by quanticle · 2019-05-01T16:21:31.585Z · LW(p) · GW(p)

Doesn't the existence of the rule that says that no money changes hands if there's a tie alter the incentives? If we both state that we want 1,000,000 pounds, then we both get it and we both walk away happy. What incentive is there for either of the two agents to name a value that is lower than 1,000,000?

Replies from: GuySrinivasan, Dagon, Charlie Steiner

↑ comment by SarahNibs (GuySrinivasan) · 2019-05-01T17:21:59.579Z · LW(p) · GW(p)

If your strategy remains unchanged, I can change my strategy to "999,999.99 please" and come away with 1,000,000.01 in total, so that's not a Nash equilibrium.

Replies from: quanticle

↑ comment by quanticle · 2019-05-02T03:51:11.161Z · LW(p) · GW(p)

I see. And from then it follows the same pattern as a dollar auction, until the "winning" bet goes to zero.

↑ comment by Dagon · 2019-05-01T17:45:49.952Z · LW(p) · GW(p)

The maximum theoretical payout is 1000000.01 pounds. And since both players know this, and for any given tie amount, a player's net can be increased by a penny by reducing their bid by a penny, they will recursively calculate to 0.

Unless they have a theory of mind and can model ways to take risks in order to increase results in the cases where the other player ALSO takes risks. We call this "trust". and it can be greatly increased with communication, empathy, and side-agreements.

I think it's long been known that Nash equilibria are not necessarily optimal, only guaranteed that the other player's choices can't hurt you. It's perfectly defensive in adversarial games. This is great for zero-sum games, where the other player's increase is exactly your decrease. It's nearly irrelevant (except as a lower bound) for cooperative (variable-sum) games.

↑ comment by Charlie Steiner · 2019-05-01T17:30:53.379Z · LW(p) · GW(p)

Yup. Though you don't necessarily need to imagine the money "changing hands" - if both people get paid 2 extra pennies if they tie, and the person who bids less gets paid 4 extra pennies, the result is the same.

The point is exactly what it says in the title. Relative to the maximum cooperative payoff, the actual equilibrium payoff can be arbitrarily small. And as you change the game, the transition from low payoff to high payoff can be sharp - jumping straight from pennies to millions just by changing the payoffs by a few pennies.

Replies from: Dagon

↑ comment by Dagon · 2019-05-01T18:55:18.048Z · LW(p) · GW(p)

Relative to the maximum cooperative payoff, the actual equilibrium payoff can be arbitrarily small.

Please be careful to specify "Nash equilibrium" here, rather than just "equilibrium". Nash is not the only possible result that can be obtained, especially if players have some ability to cooperate or to model their opponent in a way where Nash's conditions don't hold.

In actual tests (admittedly not very complete, usually done in psychology or economics classes), almost nobody ends up at the Nash equilibrium in this style game (positive-sum where altruism or trust can lead one to take risks).

comment by Jay Molstad (jay-molstad) · 2019-05-02T10:47:26.899Z · LW(p) · GW(p)

It checks out empirically. War is a Nash semi-equilibrium - both sides would be better off if they could coordinate an end to it, but they usually can't (in the near term). If Stanislav Petrov had been a bit less chill, the Cold War would have ended in 1983 in the "everybody launches all the nukes; cockroaches win" Nash equilibrium. If that's not arbitrarily bad, it's plenty bad enough.

Replies from: RorscHak

↑ comment by RocksBasil (RorscHak) · 2019-05-03T05:32:42.214Z · LW(p) · GW(p)

I think "everybody launches all nukes" might not be a Nash Equilibrium.

We can argue that once one side launched their nukes the other side does not necessarily have an incentive to retaliate, given they won't really care whether the enemy got nuked or not after they themselves are nuked, and they probably will have an incentive to not launch the nukes to prevent the "everybody dies" outcome, which can be argued to be negative for someone who is about to die.

Replies from: jay-molstad

↑ comment by Jay Molstad (jay-molstad) · 2019-05-27T13:03:31.525Z · LW(p) · GW(p)

It seems to me that both parties to the Cold War favored the defect-defect outcome (launch all the nukes) over the cooperate-defect outcome (we die, they don't). It's hard to tell, though, because both sides had an incentive to signal that preference regardless of the truth.

But that's an extreme case. Any war you choose will have each side choosing between continuing to fight and surrendering. The cooperate-cooperate outcome (making peace in a way that approximates the likely outcome of a war) is probably best for all, but it's hard to achieve in practice. And it seems to me that at least part of the problem is that, if one side chooses to cooperate (sue for peace and refrain from maximally fighting), they run the risk that the other side will continue to defect (fight) and seize an advantage.

comment by RocksBasil (RorscHak) · 2019-05-02T09:54:27.162Z · LW(p) · GW(p)

It is interesting that experimental results of traveller's dilemma seems to give results which deviate strongly from the Nash Equilibrium, and in fact quite close to the Pareto Optimal Solution.

This is pretty strange for a game that has only one round and no collusion (you'd expect it to end as Prisoner's Dilemma, no?)

It is rather different from what we would see from the dollar auction, which has no Nash Equilibrium but always deviate far away from the Pareto optimal solution.

I suspect that the this game being one round-only actually improves the Pareto efficiency of its outcomes:

Maybe if both participants are allowed to change their bid after seeing each other's bid they WILL go into a downward spiral one cent by one cent until they reach zero or one player gives up at some point with a truce, just like how dollar auctions always stop at some point.

When there is only one round, however, there is no way for a player to make their bid exactly 1 or 2 cents less than the other player, and bidding any less than that is suboptimal compared to bidding more than the other player, so perhaps there is an incentive against lowering one's bidding indefinitely to 0 before the game even starts, just like no one would bid $1000 in the dollar auction's first round.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2019-05-02T13:08:30.052Z · LW(p) · GW(p)

you'd expect it to end as Prisoner's Dilemma, no?

I think a key difference is that in PD, (Defect, Cooperate) is a Pareto outcome (you can't make it better for the cooperator without making it worse for the defector). While (0, 0) is far from the Pareto boundary. So people can clearly see that naming numbers around 0 is a massive loss, so they focus on avoiding that loss rather than optimising their game vs the other player.

Replies from: RorscHak

↑ comment by RocksBasil (RorscHak) · 2019-05-03T05:28:43.091Z · LW(p) · GW(p)

I haven't found any information yet, but I suspect there is a mixed Nash somewhere in TD.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2019-05-03T09:19:12.380Z · LW(p) · GW(p)

There is no mixed Nash equilibrium in the TD example above (see the proof above).

Replies from: RorscHak

↑ comment by RocksBasil (RorscHak) · 2019-05-03T11:32:17.686Z · LW(p) · GW(p)

Thanks, I forgot the proof before replying your comment.

You are correct that in PD (D,C) is Pareto, and so the Nash Equilibrium (D,D) is much closer to a Pareto outcome than the Nash Equilibrium (0,0) of TD is to its Pareto outcomes (somewhere along each person getting a million pounds, give or take a cent)

It still strange to see a game with only one round and no collusion to land pretty close to the optimal, while its repeated version (dollar auction) seems to deviate badly from the Pareto outcome.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2019-05-03T11:40:37.349Z · LW(p) · GW(p)

It still strange to see a game with only one round and no collusion to land pretty close to the optimal, while its repeated version (dollar auction) seems to deviate badly from the Pareto outcome.

It is a bit strange. It seems this is because in the dollar auction, you can always make your position slightly better unilaterally, in a way that will make it worse once the other player reacts. Iterate enough, and all value is destroyed. But in a one-round game, you can't slide down that path, so you pick by looking at the overall picture.

Nash equilibriums can be arbitrarily bad

Contents

Go hungry with Almost Free Lunches

Pareto optimal

Minmax and maximin

Arbitrary badness with two options

24 comments