# Mixed strategy Nash equilibrium

post by Meni_Rosenfeld · 2010-10-16T16:00:05.537Z · LW · GW · Legacy · 47 comments**Inspired by:** Swords and Armor: A Game Theory Thought Experiment

Recently, nick012000 has posted Swords and Armor: A Game Theory Thought Experiment. I was disappointed to see many confused replies to this post, even after a complete solution was given by Steve_Rayhawk. I thought someone really ought to post an explanation about mixed strategy Nash equilibria. Then I figured that that someone may as well be me.

I will assume readers are familiar with the concepts of a game (a setting with several players, each having a choice of strategies to take and a payoff which depends on the strategies taken by all players) and of a Nash equilibrium (an "optimal" assignment of strategies such that, if everyone plays their assigned strategy, no player will have a reason to switch to a different strategy). Some games, like the famous prisoner's dilemma, have a Nash equilibrium in so-called "pure strategies" (as opposed to mixed strategies, to be introduced later). Consider, however, the following variant of the matching pennies game:

Player 1 is a general leading an attacking army, and player 2 is the general of the defending army. The attacker can attack from the east or west, and the defender can concentrate his defenses on the east or west. By the time each side learns the strategy of its enemy, it is too late to switch strategies. Attacking where the defenses aren't concentrated gives a great advantage; additionally, due to unspecified tactical circumstances, attacking from the east gives a slight advantage. The sides have no interest in cooperating, so this is a zero-sum game (what one side wins, the other loses).

This elaborate description can be summarized in the following payoff matrix (these payoffs are for the attacker; the defender's payoffs are their negatives):

2: East | 2: West | |

1: East | -1 | 2 |

1: West | 1 | -2 |

What strategy should each side play? The attacker can think, "Overall, going east is advantageous. So I'll go east." The defender, anticipating this, might say, "Surely they will go east, so that's where I'll wait for them." But after some deliberation, the attacker will realize this, and say "They will expect me from the east! I'll surprise them and go west." You can see where this is going - the defender will think "they'll try to be smart and go west; I'll be smarter and be ready for them", the attacker will think "I was right the first time, east is the way to go" and so on, ad infinitum.

Indeed, looking at the table does not reveal any obvious Nash equilibrium. Every assignment of strategies, represented as a square in the table, will leave one side preferring the alternative. So, are the sides deadlocked in an infinite recursion of trying to outsmart each other? No. They have the option of choosing a strategy randomly.

Suppose the attacker decides to toss a coin, and go east if it lands heads, and west for tails. Suppose also that the defender, with his mastery of psychology, manages to accurately predict the attacker's bizarre behavior. What can he do with this knowledge? He cannot predict the outcome of a coin toss, so he makes do with calculating the expected outcome for each of his (the defender's) strategies. And it can be easily seen that no matter what the defender does, the expectation is 0.

Similarly, suppose the defender consults his preferred (P)RNG so that he defends east with probability 2/3, west with probability 1/3, and suppose that the attacker anticipates this. Again, either of the attacker's strategies will give an expected gain of 0.

This randomized behavior, which is a combination of strategies from which one is randomly chosen with specified probabilities, is called a "mixed strategy". They are typically denoted as a vector listing the probability for choosing each pure strategy, so the defender's suggested strategy is (2/3, 1/3).

What we have seen here, is that by a clever choice of mixed strategy, each side can make sure they cannot be outsmarted! By constraining ourselves to rely on randomness for deciding on an action, we denied our opponent the ability to predict it and counter it. If yourself you don't know what you'll do, there's no way your opponent will know. We conclude that sometimes, acting randomly is the rational action to take.

As we've seen, for each player we have that if he uses his suggested mixed strategy, then it doesn't matter what the other player will do. The corollary is that if the attacker plays (1/2, 1/2) and the defender plays (2/3, 1/3), then no player will have a reason to switch to a different strategy - this is a Nash equilibrium!

Some more points of interest: "Just" randomizing won't do - you have to pick the right probabilities. The exact probabilities of the mixed strategy Nash equilibria, and the resulting payoff, depend on the specifics of the payoff matrix. In my example, the defender needs a high probability of defending east to prevent the attacker from exercising his advantage, but the symmetry is such that the attacker chooses with even odds. In games with more strategies per player, an equilibrium mixed strategy may be supported on (have positive probability for) all, several, or one of the pure strategies. If all players apply a Nash equilibrium, then any player can switch to any one of the pure strategies supporting his mixed strategy without changing his expected payoff; switching to any other strategy either decreases the payoff or leaves it unchanged.

Perhaps most interesting of all is Nash's theorem, saying that every finite game has a mixed strategy Nash equilibrium! Our original problem, that some games have no equilibrium, is solved completely once we move to mixed strategies.

One thing should be kept in mind. A Nash equilibrium strategy, much like a minimax strategy, is "safe". It makes sure your expected payoff won't be too low no matter how clever your opponent is. But what if you don't want to be safe - what if you want to **win**? If you have good reason to believe you are smarter than your opponent, that he will play a non-equilibrium strategy you'll be able to predict, then go ahead and counter that strategy. Nash equilibria are for smart people facing smarter people.

In fact, it is possible that some of the comments I called confused earlier were fully aware of these issues and coming from this "post-Nash" view. To them I apologize.

Examples where mixed strategies are crucial are plentiful. I'll give one more - the volunteer's dilemma. A group of people are about to suffer greatly from some misfortune, unless at least one of them volunteers to take a slightly inconvenient action. They cannot communicate to coordinate the volunteering. If everyone uses the same deterministic strategy, then either all or none will volunteer, neither of which is stable. But they have a mixed strategy equilibrium which, by carefully balancing the risks of needlessly volunteering and having nobody volunteer, leaves everyone with an expected penalty equal to just the inconvenience of volunteering. Not as good as having just one person volunteer, but at least it's stable.

For further reading, Wikipedia has many articles about game theory and Nash equilibria. Also, Perplexed made this related comment recently.

## 47 comments

Comments sorted by top scores.

## comment by cousin_it · 2010-10-14T21:40:06.803Z · LW(p) · GW(p)

It's indeed a mystery to me why anyone bothered to post and discuss "solutions" different from Rayhawk's in the swords and armor thread. This stuff is like arithmetic: one right answer, nothing to argue about.

As a bonus, I'll give an introduction to the notion of "correlated equilibrium" invented by Aumann, using a model game invented by Shapley. Imagine you're playing a variant of Rock Paper Scissors where a win gives you 1 point, but a lose or a draw give 0 points. (So the game is no longer zero-sum - this is essential.) Obviously, if you use some strategy more than 1/3 of the time, the other guy may adjust to that; therefore the only Nash equilibrium is mixed where you both play each strategy with probability 1/3, which gets you both an expected payoff of 1/3. But the problem with this result is that sometimes the game ends in a draw and no one wins any money. So it would be mutually beneficial to somehow arrange that you never play the same strategy. But doesn't the uniqueness of the Nash equilibrium mean that any such arrangement would be unstable?

Well, here's how you do it. Suppose you both ask a trusted third party to randomly pick one of the six non-draw outcomes of the game, and then privately *tell each of you which strategy to play* (without telling you what they said to the other guy). For example, they might randomly pick "Rock Scissors", tell you to play Rock, and tell your opponent to play Scissors. In this freaky situation, even though no one's forcing you to follow the advice, doing so is an *equilibrium*! This means that neither of you can gain anything by deviating from the advice - provided that the opponent doesn't deviate. And your expected payoff is now 1/2, because draws cannot happen, which is better then the Nash equilibrium payoff of 1/3. This is called a "correlated equilibrium". It's one of the examples that show how even *non-binding* agreements, "cheap talk", can still make people better off.

## ↑ comment by jmmcd · 2010-10-14T22:50:52.477Z · LW(p) · GW(p)

This stuff is like arithmetic: one right answer, nothing to argue about.

*Under the assumption of universal rationality.* Without that assumption (which would not be fulfilled in a real fantasy sword-fighting game), the best strategy/mixed strategy does not correspond to the Nash equilibrium, and there remains plenty to argue about.

What happens when some percentage of people are picking randomly, some people are "stylin", and some people are performing "misinformed" calculations and/or simulations? The fact that some people actually did the latter shows that the effect must be taken into account.

Replies from: Swimmy## ↑ comment by Swimmy · 2010-10-16T18:36:21.895Z · LW(p) · GW(p)

The effect of irrationality should be taken into account, but unless you have a good way to do so (like a solid model of the effect), adopting another strategy would be akin to betting on red sometimes.

## ↑ comment by JenniferRM · 2010-10-18T04:16:12.159Z · LW(p) · GW(p)

It's indeed a mystery to me why anyone bothered to post and discuss "solutions" different from Rayhawk's in the swords and armor thread. This stuff is like arithmetic: one right answer, nothing to argue about.

In some sense, I agree with you. The problem as posed had a clear answer that was calculable by a known method (if one had done the requisite reading in game theory). The thing I particularly liked about Rayhawk's post was the link to the a library of game theory software and tools for the construction and analysis of finite extensive and strategic games: gambit. That link was the kind of novel and useful pointer that is one of the many reasons I have for reading LW :-)

On the other hand, I find that the world frequently fails to present situations to me that are intelligible to the point that I can build a payoff matrix and run the numbers. So, as *a simple exercise standing in for a more complex world* there was potentially much more to say about the puzzle. In that vein I particularly liked Nominull's fast and frugal answer:

My general heuristic for these sorts of games is to play the option that beats the option that beats the option that looks best to me at first blush. In this case that means I play green sword, yellow armor. It's a reasonably fast heuristic that does reasonably well.

I expect that I would find it very difficult to mimic Rayhawk's application of gambit in the bulk of real life circumstances. Nominull's heuristic (which incidentally produced one of the options from the optimal mixed strategy) seems more generally applicable. I can imagine using Nominull's heuristic in much fuzzier contexts for much lower data gathering costs and getting pretty good results thereby. Not that I've tested it or anything... but it's the sort of thing I'll be looking for an opportunity to try out in the real world someday, and see if it helps :-)

## ↑ comment by Perplexed · 2010-10-15T02:32:50.963Z · LW(p) · GW(p)

As an *extra* added bonus, I'll mention the "revelation principle" which applies in games where there is some uncertainty regarding the other guy's payoffs or constraints. (No math or proofs here - I just want to sketch the problem and assert the solution. For proofs and details, see a good game theory textbook, such as Myerson.)

For example, say that Player1 has arthritis and can't form a fist for "rock" without some pain. As a result, this player suffers a penalty of -1/3 whenever he plays "rock" - meaning he only has a net payoff of 2/3 against "scissors" and actually loses 1/3 against "rock" or "paper". Player2 has heard rumors about Player1's arthritis, but doesn't know whether the pain occurs when Player 1 plays "rock" or whether it is "paper" that causes the problem. She assigns a Bayesian prior probability of 0.5 to each kind of arthritis. Player 1 knows about these priors - that is, these priors are common knowledge.

Clearly, Player1 wants to keep secret from Player2 exactly which form of arthritis he suffers from. Harsanyi (1967 or so) invented the concept of "Bayesian equilibrium" to characterize and calculate solutions to these kinds of games where the players are unsure of each other's payoffs and beliefs.

Cousin_it just told us about how the basic concept of Nash equilibrium can sometimes be improved to "correlated equilibrium" if the players can find a mutually trusted 3rd person to assist them. Actually, they don't need a person - a mutually trusted machine will do just fine. So the question naturally arises whether a Bayesian equilibrium can be improved to a correlated equilibrium just as the simpler Nash equilibrium could be improved. The answer is yes - but there is a catch. The catch is called the "revelation principle".

The principle is that both players have to trust the 3rd person or machine with their secrets. It turns out that there always is a correlated equilibrium in which both players have an incentive to *reveal their secrets honestly to the arbitrator* (though not to each other) **and** both players have an incentive to *carry out the instructions* of the arbitrator.

The algorithms used by the arbitrator in a correlated equilibrium are not rocket science. They can't be. They have to be understandable and understood by both players. How else can they be convinced that they will do better by trusting the arbitrator than by just playing the game without arbitration?

So if the game players are two powerful AIs, they will want to use a very simple and secure computer system as the jointly trusted arbiter. I can't help but wonder whether the best way to avoid uFAIs that secretly seek to take over the world is to somehow make use of the revelation principle.

Replies from: cousin_it## ↑ comment by cousin_it · 2010-10-15T03:27:55.337Z · LW(p) · GW(p)

I love the way Ken Binmore explains the revelation principle in "Fun and Games" in the context of auctions and mechanism design. Unfortunately a full explanation would take too much space here, but it's one of the really beautiful results in game theory. Here's an example of what it's good for: suppose you want to sell a house and you have a Bayesian prior over the prices that different buyers might be willing to pay. You're faced with the problem of setting up an auction that would maximize the expected selling price. Now there are many types of auctions: first price sealed bid, Vickrey, English, and to quote the book

...There are many others. She might set entry fees to be paid by all bidders. She might designate minimum bids, or otherwise restrict the bids that can be made. She might even seed the auction room with shills primed to push the bidding up if things look slow. The possibilities are bewilderingly large...

Nevertheless, the "revelation principle" offers the seller a rigorous way to maximize over this *whole* space of possible auctions and design the best possible mechanism (which will in fact be different for different priors over buyers). It's a startling result, but the trick is quite obvious once you understand it.

Overall, I feel that "Fun and Games" is the best book I've ever owned, even though it breaks my heart to relegate Cormen/Leiserson/Rivest and Schelling to 2nd and 3rd. I must've plugged it here once or twice already.

Replies from: paulfchristiano## ↑ comment by paulfchristiano · 2010-10-16T19:05:06.681Z · LW(p) · GW(p)

If I was a seller, I would never use what you term the "best possible mechanism". This notion of mechanism design, and more generally of rational play, is certainly interesting mathematics, but in practice it often leads to mechanisms that consistently perform very badly (sometimes it gives good mechanisms, but that is no thanks to the formalism). I don't think the revelation principle really has much to say about practical mechanisms. I'm not arguing that mechanism design shouldn't be approached theoretically; I'm just saying that you might use a different model if you re-examined the maxim "rational play ends at a Nash equilibrium" and its justification.

Replies from: Perplexed, noitanigami## ↑ comment by Perplexed · 2010-10-17T05:26:18.511Z · LW(p) · GW(p)

This notion of mechanism design, and more generally of rational play, is certainly interesting mathematics, but in practice it often leads to mechanisms that consistently perform very badly (sometimes it gives good mechanisms, but that is no thanks to the formalism).

When you say that a mechanism "performs badly", do you mean that it performs badly for one party (and hence very well for the other party) or do you mean that it performs badly for all parties to the attempted transaction?

I'm just saying that you might use a different model if you re-examined the maxim "rational play ends at a Nash equilibrium" and its justification.

Could *you* re-examined the maxim "rational play ends at a Nash equilibrium"? The usual justification is that rational play can not possibly *end* anywhere else - otherwise one rational player or the other would change strategies. What is wrong with that, in a two person game? For that matter, doesn't the justification still work when there are many players?

## ↑ comment by paulfchristiano · 2010-10-17T23:34:35.237Z · LW(p) · GW(p)

By performs badly, I meant that it fails to exhibit the properties its designers imagined, or "proved." For example, if the designers prove that this mechanism generates the maximum possible revenue and the mechanism ends up generating no revenue when deployed in practice, I would say it performs badly. Similarly, if the mechanism is intended to maximize the social welfare but then selects a pareto inefficient outcome, I would say that it performs badly.

When I say that rational play may not "end" at a Nash equilibrium, I mean that when rational players (fully aware of each others' rationality, etc.) sit down to play a game, we should not be too confident that they will play a Nash equilibrium. I think my objection to your reasoning is that the players are not sequentially given opportunities to deviate; they choose a strategy and then play it. That is the definition of a strategy; if you are allowed to change your strategy iteratively then you are playing a new game, in which the strategy set has simply been enlarged and to which a similar criticism applies. Here is an example which at least calls into doubt the normal justification.

Suppose that a mechanism with two Nash equilibria, A and B, is deployed commonly at auctions all around the world. Because the mechanism was carefully designed, the goods are allocated efficiently at both equilibria. In Italy, everyone plays equilibrium A. Knowing this, and aware of the rationality of the average Italian, all Italians participating at auctions in Italy select equilibrium A. In America, everyone plays equilibrium B. Knowing this, and aware of the rationality of the average American, all Americans participating at auctions in America select equilibrium B. Now a poor American tourist in Italy participates in an auction, and (ignorant of the world as Americans are) he tries to play equilibrium B. Consequently, the auction fails to allocate goods efficiently---the mechanism made no guarantees about what happened when some individuals played from one equilibrium and others played from a different equilibrium. The American's failure cannot be attributed to some failure of rationality; after playing, the Italians might also all wish that they had changed their strategy. This is also a real problem for mechanisms; there are certain classes of problems and mechanisms for which you can prove that this sort of thing will always be possible. You can try, as the mechanism designer, to suggest an equilibrium A to the players. But if one equilibrium is better for some players and worse for others, why should the players automatically accept your proposed equilibrium? If they all choose to play another Nash equilibrium B which is better for all of them, are they behaving irrationally?

(Of course this example does not apply to dominant strategy truthful mechanisms.)

Replies from: Perplexed## ↑ comment by Perplexed · 2010-10-18T01:01:51.114Z · LW(p) · GW(p)

By performs badly, I meant that it fails to exhibit the properties its designers imagined, or "proved." For example, if the designers prove that this mechanism generates the maximum possible revenue and the mechanism ends up generating no revenue when deployed in practice, I would say it performs badly.

Thx for the reference to Abreu-Matsushima (in your response to noitanigami). I wasn't familiar with that. Nonetheless, to the extent I understood, the mechanism only fails in the case of collusion among the bidders (violating an assumption of the proof - right?). And the seller can protect himself by making a bid himself on each item in the lot (a minimum selling price).

Similarly, if the mechanism is intended to maximize the social welfare but then selects a pareto inefficient outcome, I would say that it performs badly.

I assume you are referring to VCG here. Yeah, chalk up another failure for excessively iterated removal of dominated strategies. It seems we really do need a theory of "trembling brain equilibrium". But, then, short of a fully competitive market, *nothing* achieves Pareto optimality, so I don't think VCG should be judged too harshly. It is not a practical mechanism, but it is somewhat enlightening.

Regarding Nash equilibrium:

Here is an example which at least calls into doubt the normal justification.

But your example of the American playing B while the Italians play A *is not a Nash equilibrium*. Your example only demonstrates that it is foolish to promote a mechanism for which the equilibrium is not unique.

## ↑ comment by paulfchristiano · 2010-10-18T01:46:28.725Z · LW(p) · GW(p)

To clarify: Abreu-Matsushima fails in practice, regardless of whether there is collusion (and certainly it fails entirely if there is a coalition of even two players). VCG is dominant strategy truthful, but fails in the presence of even two colluding players. I agree that VCG is extremely interesting, but I also think that you should not consider the problem solved once you know VCG. Also, there are mechanisms which do much better than competitive markets can hope to. The question now is how well a benevolent dictator can allocate goods (or whatever you are trying to do).

I agree that my example is not a Nash equilibrium. The point was that rational players may not play a Nash equilibrium. If your notion of a reasonable solution is "it works at equilibria" then sure, this isn't a counterexample. But presumably the minimal thing you would want is "it works when the players are all perfectly rational and don't conclude" which this example shows isn't even satisfied if there are multiple Nash equilibria.

Most mechanisms don't have unique Nash equilibrium. The revelation principle also doesn't preserve the uniqueness of a Nash equilibrium, if you happened to have one at the beginning.

## ↑ comment by CronoDAS · 2010-10-17T05:41:26.223Z · LW(p) · GW(p)

A Nash equilibrium is frequently not Pareto efficient; if everyone changed their strategy at once, everyone could do better.

The Traveler's Dilemma is a game that's similar to the Prisoner's Dilemma, and humans usually don't play the Nash Equilibrium strategy.

In other words,

This notion of mechanism design, and more generally of rational play, is certainly interesting mathematics, but in practice it often leads to mechanisms that consistently perform very badly (sometimes it gives good mechanisms, but that is no thanks to the formalism)

means "people often don't behave the way game theory says they should, and assuming that they will is often foolish."

Replies from: Perplexed## ↑ comment by Perplexed · 2010-10-17T14:06:12.712Z · LW(p) · GW(p)

A Nash equilibrium is frequently not Pareto efficient; if everyone changed their strategy at once, everyone could do better.

If everyone does better at a *different* Nash equilibrium, then that just shows that being a NE is necessary, but not sufficient for mutual rationality.

If everyone does better at a joint strategy that is *not* an NE (PD, for example), then one of the players is not playing rationally - he could do better with another strategy, assuming the other player stands pat.

... people often don't behave the way game theory says they should, and assuming that they will is often foolish.

Assuming that they won't be rational can often be foolish too.

Rational-agent game theory is not claimed to have *descriptive* validity; its validity is *prescriptive* or *normative*. Or, to be more precise, it provides normatively valid advice to *you*, under the *assumption* that it is *descriptively valid for everyone else*.
And yes, I do appreciate that this is a *very weird* kind of validity for a body of theory to claim for itself.

## ↑ comment by noitanigami · 2010-10-17T02:53:57.181Z · LW(p) · GW(p)

Can you give examples of a mathematical proof leading to an ineffective mechanism?

Replies from: paulfchristiano## ↑ comment by paulfchristiano · 2010-10-17T03:59:32.633Z · LW(p) · GW(p)

For example, the Abreu-Matsushima mechanism implements essentially any implementable property at the unique rationalizable Nash equilibrium (a much stronger guarantee than anything preserved by the revelation principle). If you actually use the Abreu-Matsushima mechanism you will find that it basically never works unless the players want it to work (and often, not even then) as has been verified empirically.

The VCG mechanism maximizes social welfare whenever the players play undominated strategies. In practice, the possibility of even very weak collusion between even two players destroys this guarantee.

In general, the claim that rational players will always choose a Nash equilibrium has little empirical support, and in fact doesn't have very good theoretical support either outside of two-player zero sum games (of course in the OP the game is two-player and zero sum; there my complaint is that common knowledge of rationality is a bad assumption).

Replies from: timtyler## ↑ comment by timtyler · 2010-10-17T11:51:50.274Z · LW(p) · GW(p)

The Nash equilibrium suggests playing randomly in matching pennies - yet you can do much better than that if facing an irrational opponent - such as a typical unmodified human. The Nash equilibrium is for when *both* players play rationally.

## ↑ comment by Meni_Rosenfeld · 2010-10-14T22:25:50.850Z · LW(p) · GW(p)

I actually briefly considered mentioning correlated equilibria, but the post was getting long already.

## comment by paulfchristiano · 2010-10-16T19:18:20.039Z · LW(p) · GW(p)

The distribution of sword/armor choices in an MMO will not be the Nash equilibrium with overwhelming probability. In fact, it probably won't be anywhere close if the choice is at all complicated. If you are playing an MMO with random pvp, populated by people the people who play MMOs, and you choose the Nash equilibrium, you will probably do worse than me.

The point of the game isn't to do as well as possible against the worst possible distribution over opposing strategies. The goal is to do as well as possible over the actual distribution of strategies your opponents play. Reasoning about the distribution of strategies your opponents play is the correct approach to this problem, not reasoning about the Nash equilibrium. The two notions coincide in a certain rather rare limit, which certainly doesn't apply to any game played between anonymous people on the internet.

Of course, this doesn't lead to a very rigorously defined problem. But saying that someone who talks about what their opponents are likely to do is "wrong" is itself quite wrong. If 90% of American generals in the past have chosen to attack West, it is probably wrong to defend East with the "optimal" probability. You should not believe more about Nash equilibrium than is actually true.

Replies from: Meni_Rosenfeld## ↑ comment by Meni_Rosenfeld · 2010-10-17T06:01:41.581Z · LW(p) · GW(p)

Short answer: I already addressed this. Is your point that I didn't emphasize it enough?

One thing should be kept in mind. A Nash equilibrium strategy, much like a minimax strategy, is "safe". It makes sure your expected payoff won't be too low no matter how clever your opponent is. But what if you don't want to be safe - what if you want to win? If you have good reason to believe you are smarter than your opponent, that he will play a non-equilibrium strategy you'll be able to predict, then go ahead and counter that strategy. Nash equilibria are for smart people facing smarter people.

Long answer:

The distribution of sword/armor choices in an MMO will not be the Nash equilibrium with overwhelming probability. In fact, it probably won't be anywhere close if the choice is at all complicated.

Correct. Do you know what the distribution is? Can you gather statistics? Do you understand the mentality of your opponents so well that you can predict their actions? Can you put the game in some reference class and generalize from that? If any of the above, knock yourself out. Otherwise there's no justification to use anything but the equilibrium.

If you are playing an MMO with random pvp, populated by people the people who play MMOs, and you choose the Nash equilibrium, you will probably do worse than me.

If you assume I will continue to use the NE, even after collecting sufficient statistics to show that the players follow *some specific distribution* (as opposed to "not Nash equilibrium"), then this is a strawman. If you're just saying you're very good at understanding the mind of the MMOer then that's likely, but doesn't have much to do with the post.

But saying that someone who talks about what their opponents are likely to do is "wrong" is itself quite wrong.

I did not criticize the rare swords&armor posts that actually tried to profile their opponents and predict their actions. I criticized the posts that tried to do some fancy math to arrive at the "optimal" solution inherent to the game, and then failed to either acknowledge or reconcile the fact that their solution is unstable.

Reasoning about the distribution of strategies your opponents play is the correct approach to this problem, not reasoning about the Nash equilibrium.

One should first learn how to do Nash equilibrium, then learn how to not do Nash equilibrium. NE is the baseline default choice. If someone doesn't understand the problem well enough to realize that any suggested deterministic strategy will be unstable and that the NE is the answer to that, what hope does he have with the harder problem of reasoning about the distribution of opponents?

If 90% of American generals in the past have chosen to attack West, it is probably wrong to defend East with the "optimal" probability.

Even under the strong assumption that there is some clear reference class about which you can collect these statistics, this is true only if one of the following conditions holds:

- The attacker hasn't heard about those statistics.
- The attacker is stupid.
- The attacker is plagued with the same inside-view biases that made all his predecessors attack west.

The plausibility of each depends on the exact setting. If one holds, and we know it, and we really do expect to be attacked west with 90% probability, then this is no longer a two-player game. It's a one-player decision theory problem where we should take the action that maximizes our expected utility, based on the probability of each contingency.

But if all fail, we have the same problem all over again. The attacker will think, "paulfchristiano will look at the statistics and defend west! I'll go east". How many levels of recursion are "correct"?

Replies from: paulfchristiano, khafra## ↑ comment by paulfchristiano · 2010-10-18T00:14:44.934Z · LW(p) · GW(p)

I didn't really mean to insult your post (although I apparently did :). I was probably just as surprised as you at many of the comments in that thread. I agree that you should understand NE if you want to say anything useful about games (and that they are basically the complete story for two-player zero sum games from a theoretical perspective). The one thing I object to is the sentiment that "if you don't know exactly what is going on, you should just play NE." After re-reading your latest post this a bit of a straw man. I agree totally that if you know nothing else then you have nothing to do but play the NE, which is all that you actually said. However, you can put any game that you are faced with into the reference class of "games humans play," and so I don't think this fact is very relevant most of the time. In particular, if the question was "what should you do in an MMO with these properties" then there are many acceptable answers other than play the NE. It may or may not be the case that anyone in the thread in question actually gave one.

In particular, because I can put all games I have ever played into the reference class "games that I have ever played," I can apply, over the course of my entire life, an online learning algorithm which will allow me to significantly outperform the Nash equilibrium. In practice, I can do better in many games than is possible against perfectly rational opponents.

Replies from: Meni_Rosenfeld## ↑ comment by Meni_Rosenfeld · 2010-10-18T07:32:54.501Z · LW(p) · GW(p)

Okay, then it looks like we are in agreement.

## ↑ comment by khafra · 2010-10-18T15:19:21.783Z · LW(p) · GW(p)

The distribution of sword/armor choices in an MMO will not be the Nash equilibrium with overwhelming probability. In fact, it probably won't be anywhere close if the choice is at all complicated.

Correct. Do you know what the distribution is? Can you gather statistics? Do you understand the mentality of your opponents so well that you can predict their actions? Can you put the game in some reference class and generalize from that? If any of the above, knock yourself out. Otherwise there's no justification to use anything but the equilibrium.

To expand on palfchristiano's quibble with "if you don't know exactly what is going on, you should just play NE": The technically correct phraseology for that part may be too complicated for this post. But it would be closer to "if your belief that you can predict your opponent's non-ideal behavior, multiplied by your expected gain from exploitation of that prediction, exceeds your expected loss from failing to correctly predict your opponent's behavior, go ahead."

## comment by drc500free · 2010-10-18T03:11:15.226Z · LW(p) · GW(p)

Humans are really bad at acting according to an ideal, random strategy. See:

Wine in front of me - A term from the game Mafia, describing infinitely layered thinking, based on recursive predictions of an opponent's predictions.

Yomi (mind reading) - David Sirlin discusses the ability to predict non-ideal trends in an opponent's behavior, and how infinite layers of predictions actually repeat and only require a limited and finite set to consider.

## comment by marc · 2010-10-18T11:30:55.869Z · LW(p) · GW(p)

Although I didn't actually comment, I based my choice on the fact that most people only seem to be able cope with two or three recursions before they get bored and pick an option. The evidence for this was based on the game where you have to pick a number between 0-100 that is 2/3 of the average guess. I seem to recall that the average guess is about 30, way off true limit of 0.

Replies from: JenniferRM## ↑ comment by JenniferRM · 2010-10-21T23:26:35.676Z · LW(p) · GW(p)

The true limit would be 0 if everyone was rational, and aware of the rationality of everyone else, but rational people in the real world should be taking into account...

the fact that most people only seem to be able cope with two or three recursions before they get bored and pick an option

So what you should do, based on that, is try to figure out how many iterations "most people" will do, and then estimate the smaller percentage of the "rank one pragmatic rationalists" who will realize this and aim for 2/3 of "most people" and so on until you have accounted for 100% of the population. The trick is *knowing* that some people aren't logical means logical strategy (that will actually win) requires population modeling rather than game theory. Hearing your average of 30 makes me hypothesize that the distribution of people looks something like this:

`0.1% 100 Blind number repeaters`

`4.9% 66 Blind multipliers`

`17% 44 Beating the blind!`

`28% 30 "most people"`

`20% 20 Read a study or use a "recurse three times" heuristic`

`15% 13 Contrarian rationalists`

`10% 9 Meta-contrarian rationalists`

`4.9% 6 Economists (I kid!)`

`0.1% 0 Obstinate game theorists`

Suppose you run the experiment again (say, on a mixture of people from LW who have read this combined with others) where you expect that a lot of people might have actually seen this hypothesis, but a lot of people will naively play normally. I think the trick might be to figure out what the percentage of LWers you're dealing with is and then figure out what to do based on that.

I'm tempted (because it would be amusing) to try to estimate the percentage of LWers relative to naive, and then model the LWers as people who execute some variant of timeless decision theory in the presence of the baseline people. If I understand timeless decision theory correctly, the trick would be for everyone to independently derive the number *all the LWers* would have to pick in order for us all to tie at winning given the presence of baselines. It seems kind of far fetched, but it would be ticklesome if it ever happened :-D

OK, now I'm curious! I think I might try this at the next SoCal meetup.

**ETA:** ...or does utilitarian consequentialism mean we need to calculate the value we'd all have to vote to maximize the number of winners, whether or not they were expected to participate in the calculation? That sounds even harder.

## ↑ comment by JenniferRM · 2010-10-25T19:41:13.752Z · LW(p) · GW(p)

This is a follow up to my comment just above. At the meetup this game was played with six people: me, three other people deeply read in the sequences, and two people who hadn't read the sequences.

The results were: four people guessed 0, a deeply read person guessed 3, and one of the people not familiar with the sequences guessed 23. The target was (26/6)*(2/3) = 2 and 8/9ths, so the guess of 3 won with an error of only 1/9th.

I'll leave the other people to explain what their thinking was if they want, but I directly estimated "the obvious thing" for each of the other players and got an estimated sum of 36. Then I tried to imagine what would happen if some of us chose to do what I was doing (either with TDT or without). It became clear that even with TDT as an *intention*, differences in estimates could lead to slightly different fixed point calculations and someone winning over someone else "by accident".

I didn't personally have the mental arithmetic skills to even calculate the fixed point fast enough but I noticed that I expected at least one other 0 guess anyway. And people's guesses would be revealed so I thought maybe it would be an opportunity to signal, and I wanted to signal "non-defection" and figured other people who had this insight would also want to signal that way. So zero started to look like "the moral schelling point" that might be everyone's guess and would at least give me a good signaling outcome in compensation for not "winning along with all the other 0's", so I guessed 0.

In the immediate aftermath I felt pretty good about it when the numbers were revealed because I got the signaling outcome I wanted and my guesstimates for other people were off by enough that if I *had* defected I would have probably been at like 5 or 6 and lost anyway to the 3. So it appears that I was too close to the median to out-clever this group, and though I wasn't honest enough to *realize this at the time* my desire for a good signaling outcome left me with exactly the compensation for losing that I was hedging for, and I wasn't left high and dry :-)

Writing this, I think the "small group with guesses revealed" aspect of the situation definitely influenced me, may have been a major influence on other players as well.

If I were doing a study using a one or two shot version of this game as the paradigm for political cooperation I think I might try to experimentally vary small versus large groups, the mixture of players with game naivety and/or pre-existing relationships, and player's expectations of post-hoc analysis to see if they affect group outcomes.

My hypothesis would be that with at least one game-naive player and a significant mixture of pre-existing relationships, groups smaller than ~8 would usually converge towards signaling cooperation and game theoretic "optimal guesses", groups larger than ~20 would usually converge on vying for recognition as "the winner" with precisely calibrated "very small answers", and that post-hoc analysis expectations would positively correlate with convergence effects. My hypothesis is probably wrong in some details, but I think an experiment designed to test it would be fascinating :-)

## comment by timtyler · 2010-10-16T20:12:43.657Z · LW(p) · GW(p)

There are some nice videos about this - which I came across while researching my own matching pennies video recently:

Game Theory 101: Matching Pennies and Mixed Strategy Nash Equilibrium

The sequel discusses how to solve these kind of games - by finding the Mixed Strategy Nash Equilibrium:

## comment by whpearson · 2010-10-14T22:26:33.770Z · LW(p) · GW(p)

You can read the problem description as requiring you to only pick one set of sword + armour ( which makes sense in terms of cost of equipment etc ). So the answer "mixed strategy" may or may not be allowed. Although it can be used as reasoning for why you picked the colour combination you did.

The problem isn't well specified, are you allowed to know the current population, or history? If mixed strategies are not allowed, then all you should do is look at the current distribution of players before making a decision.

I suppose what I am saying is that while the problem statement mentioned game theory, the actual question indicated a non-game theoretic answer by not allowing mixed strategy.

Replies from: jmmcd, Meni_Rosenfeld## ↑ comment by jmmcd · 2010-10-14T22:57:47.828Z · LW(p) · GW(p)

I think you're making a distinction between one-shot and iterated games. This is a different distinction than that between mixed and non-mixed strategies. You can have a mixed strategy equilibrium in a one-shot game, as in both the swords-n-armour example and the attack/defend East/West example.

So: if the game is one-shot (and you can see the current distribution), the right decision depends only on the current distribution, as you said.

Replies from: whpearson## ↑ comment by whpearson · 2010-10-14T23:41:49.235Z · LW(p) · GW(p)

The question was

Which sword and armor combination do you choose, and why?

My point was that you can't say mixed strategy to the first question, although it is a valid answer to the second part.

You could say yellow/yellow, because that was the value selected by my RNG using the mixed strategy distribution Z.

But yes I was being imprecise with my language, I'm a little rusty. I'll fix once I have had some sleep.

## ↑ comment by Meni_Rosenfeld · 2010-10-14T22:38:21.279Z · LW(p) · GW(p)

But if you can't look at the current distribution, you still need to use the equilibrium for this single choice. Otherwise, you're at risk that everyone will think the same as you, except for a few smarter players who will counter it.

## comment by **[deleted]** ·
2010-10-17T03:22:08.169Z · LW(p) · GW(p)

Would you be willing to do a sequence on game theory? I, for one, would be eager to read it.

Replies from: Meni_Rosenfeld## ↑ comment by Meni_Rosenfeld · 2010-10-17T06:10:04.134Z · LW(p) · GW(p)

I'll consider it, But I don't know if I'm the right person for that, or if I'll have the time.

## comment by scadza · 2010-10-24T08:37:31.353Z · LW(p) · GW(p)

Please have a look at this.
Corax. Its is quite similar to the game that you have mentioned.

http://lumiere.ens.fr/~alphapsy/blog/?2006/10/11/84-coraxed

## ↑ comment by adsenanim · 2010-10-25T05:15:41.529Z · LW(p) · GW(p)

I would take the basic premise to be that we are trying the "guilty party" with the idea of "reasonable doubt".

I'm ok with "Agatha Christy" to the limit of fictional argument, but one would have to give a stronger argument than the "Corax" to find a plausible definition to a physical phenomena. After all, the whole point is to understand.

## comment by nerzhin · 2010-10-16T17:26:40.846Z · LW(p) · GW(p)

The payoffs in your matrix are for the attacker (player 1). This is probably some obvious convention to you, but I didn't know it, and I think you should say it explicitly.

Replies from: Meni_Rosenfeld## ↑ comment by Meni_Rosenfeld · 2010-10-16T17:31:49.634Z · LW(p) · GW(p)

Fixed.

## comment by Paul Crowley (ciphergoth) · 2010-10-15T11:22:42.039Z · LW(p) · GW(p)

This would make a fine top-level post. Not everything has to be super-advanced!

Replies from: Meni_Rosenfeld## ↑ comment by Meni_Rosenfeld · 2010-10-15T12:45:06.377Z · LW(p) · GW(p)

Thanks. I'll move it as soon as I have the required karma.

## comment by adsenanim · 2010-10-18T04:43:13.251Z · LW(p) · GW(p)

Nice Job!

Can you relate this to Parrondo's Paradox?

Replies from: Meni_Rosenfeld## ↑ comment by Meni_Rosenfeld · 2010-10-18T07:59:15.993Z · LW(p) · GW(p)

Now that I've looked it up, I don't think it really has the same intuitions behind it as mixed strategy NE. But it does have an interesting connection with swings. If you try to push a heavy pendulum one way, you won't get very far. Trying the other way you'll also be out of luck. But if you push and pull alternately at the right frequency, you will obtain an impressive amplitude and height. Maybe it is because I've had firsthand experience with this that I don't find Parrondo's paradox all that puzzling.

Replies from: adsenanim## ↑ comment by adsenanim · 2010-10-18T09:12:42.861Z · LW(p) · GW(p)

From what you are saying, with the mixed strategy NE, I get that possible moves increase in relation to the complexity of the equilibrium, so that it becomes increasingly likely that any possible action could have an added emphasis that would cause a specific outcome as the equilibrium increases in complexity.

e.g.

What you are describing with the pendulum motion, the pendulum does not require additional effort in both directions to increase, only one direction, and the effort need be only the smallest (or smaller in addition) in relation to the period, and direction. An action to large in the same direction, or against the direction will destabilize it.

Isn't it true that the more precise the equilibrium, the less effort is required to destabilize it?

I think that the main difference between our arguments is that while you are talking of simultaneous action, I am talking of sequential action...

Replies from: Meni_Rosenfeld## ↑ comment by Meni_Rosenfeld · 2010-10-18T19:30:32.721Z · LW(p) · GW(p)

Sorry, I'm not sure I know how to answer that.

Replies from: adsenanim## comment by Meni_Rosenfeld · 2010-10-14T20:34:38.822Z · LW(p) · GW(p)

Does anyone have an idea how to make the table formatted better?