Game theory and expected opponents

post by Manfred · 2013-11-14T23:26:19.693Z · score: 1 (2 votes) · LW · GW · Legacy · 9 comments


  Newcomb's Problem
  Newcomb's Game vs. an Ignorant opponent
  Human opponents

Thanks to V_V and Emile for some great discussion. Since writing up a post seems to reliably spark interesting comments, that's what I'll do!



If I wanted to write down a decision theory that gets the correct answer to game-theoretic problems (like playing the middle Nash equilibrium in a blind chicken-like game), it would have to, in a sense, implement all of game theory. This is hard because human-generated solutions to games use a lot of assumptions about what the other players will do, and putting those assumptions into our algorithm is a confusing problem. In order to tell what's really going on, we need to make that information more explicit. Once we do that, maybe we can get a UDT-like algorithm to make good moves in tricky games.


Newcomb's Problem

For an example of a game with unusually good information about our opponent, how about Newcomb's problem. Is it really a game, you ask? Sure, I say!

In the payoff matrix to the right, you play red and Omega plays blue. The numbers for Omega just indicate that Omega only wants to put in the million dollars if you will 1-box. If this was a normal game-theory situation, you wouldn't easily know what to do - your best move depends on Omega's move. This is where typical game theory procedure would be to say "well, that's silly, let's specify some extra nice properties the choice of both players should have so that we get a unique solution."

But the route taken in Newcomb's problem is different - we pick out a unique solution by increasing how much information the players have about each other. Omega knows what you will play, and you know that Omega knows what you will play. Now all we need to figure out what to do is some information like "If Omega has an available strategy that will definitely get it the highest possible payoff, it will take it." The best strategy, of course, is to one-box so that Omega puts in the million dollars.


Newcomb's Game vs. an Ignorant opponent

Consider another possible opponent in this game - one who has no information about what your move will be. Whereas Omega always knows your move, an ignorant opponent has no idea what you will play - they have no basis to think you're more likely to 1-box than 2-box, or vice versa.

Interestingly, for this particular payoff matrix this makes you ignorant too - you have no basis to think the ignorant opponent would rather put the money in than not, or vice versa. So you assign a 50% chance to each (probability being quantified ignorance) and find that two-boxing has the highest rewards. This didn't even require the sophistication of taking into account your own action, like the game against Omega did, since an ignorant opponent can't respond to your action.


Human opponents

Ok, so we've looked at a super-knowledgeable opponent, and a super-ignorant opponent, what does a more typical game theory situation look like? Well, it's when our opponent is more like us - someone trying to pick the strategy that gets them the best reward, with similar information to what we have. In typical games between humans, both know that the other is a human player - and they know that it's known, etc.

In terms of what we know, we know that our opponent is drawn from some distribution of opponents that are about as good as we are at games, and that they have the same information about us that we have about them.

What information do we mean we have when we say our opponent is "good at games"? I don't know. I can lay out some possibilities, but this is the crux of the post. I'll frame our possible knowledge in terms of past games, like how one could say of Newcomb's problem "you observe a thousand games, and Omega always predicts right."

Possibility 1: We know our opponent has played a lot of games against completely unknown opponents in the past, and has a good record, where "good" means "as good or better than the average opponent."

Possibility 2: We know our opponent played some games against a closed group of players who played each other, and that group collectively had a good record.

Possibility 3: We know our opponent is a neural net that's been trained in some standard way to be good at playing a variety of games, or some sort of hacked-together implementation of game theory, or a UDT agent if that's a good idea. (Seems more complicated than necessary, but on the other hand opponents are totally allowed to be complicated)


Suppose we know information set #2. I think it's the most straightforward. All we have to do to turn this information into a distribution over opponents is to figure out what mixtures of players get above-average group results, then average those together. Once we know who our opponent is on average, we just follow the strategy that on average gets the best average payoff.

Does the strategy picked this way look like what game theory would say? Not quite - it assumes that the opponent has a medium chance of being stupid.  And in some games, like the prisoner's dilemma, the best-payoff groups are actually the ones you can exploit the most.  So on closer examination, someone in a successful group isn't the game-theory opponent we're looking for.


Comments sorted by top scores.

comment by satt · 2013-11-16T16:48:56.829Z · score: 1 (1 votes) · LW(p) · GW(p)

Camerer, Ho & Chong's "cognitive hierarchy theory" of one-shot games might be of interest.

A quick sketch. Suppose the dumbest players engage in 0 steps of thought, so they just choose strategies uniformly at random. Then one can inductively define a hierarchy of ever "smarter" players who execute increasing steps of thinking: a k-step player is unaware of the existence of players with equal or higher k (Elster's "younger sibling syndrome"), but knows the relative proportion of players with lower k, and their moves are best responses to that perceived, truncated k distribution. All but the 0-step players are therefore rational, because their moves are optimal given what they believe, but because of incomplete information the game's outcome can deviate from the usual equilibria when some of the players are dumb.

comment by satt · 2013-11-16T16:10:39.566Z · score: 0 (0 votes) · LW(p) · GW(p)

If this was a normal game-theory situation, you wouldn't easily know what to do - your best move depends on Omega's move.

I would! If this were a normal game theory situation, I'd simply observe that I'm $1k better off regardless of Omega's decision if I 2-box, and I'd 2-box with a smile on my face.

Of course, Newcomb's problem complicates things. When Omega knows what I'll play, the payoff matrix in the post is no longer accurate, because Omega's ability to predict my play eliminates the ($1M, 2-box) and ($0, 1-box) moves. Hence the only possible plays are those on the main diagonal, and 1-boxing therefore dominates 2-boxing. (And, to head off the usual distraction at the pass, I'm pretty sure it doesn't matter if Omega only has a 99.9999999999% chance of correctly guessing my move; it just means you have to give up on the lossy normal form description of the game, and switch to the extensive form.)

comment by passive_fist · 2013-11-15T00:35:00.226Z · score: 0 (0 votes) · LW(p) · GW(p)

The best strategy, of course, is to one-box so that Omega puts in the million dollars.

I'm not sure I see how this follows. Care to explain?

comment by Manfred · 2013-11-15T01:08:46.248Z · score: 0 (0 votes) · LW(p) · GW(p)

You basically have two strategies - "1-box" or "2-box" (We'll ignore mixed strategies). So what you do is try to figure out what the payoff from each strategy is, given what you know about your opponent. Then you pick the strategy with the highest expected payoff.

In our Newcomb's problem game, "what you know about your opponent" includes that they know what move you take, and then get to make their own move. That is, their move actually depends on which move you make / strategy you use. So you should predict that if you use "1-box," Omega will take the action that maximizes their payoff, and if you use "2-box," Omega will take the action that maximizes their payoff.

Now that you have used your information to predict Omega's action for each of your strategies, you can read off the payoffs: $1M for "1-box" and $1k for "2-box." Then you use the strategy that results in the highest payoff.

comment by passive_fist · 2013-11-15T01:17:35.943Z · score: -1 (1 votes) · LW(p) · GW(p)

That is, their move actually depends on which move you make / strategy you use.

You're getting this the other way around. It is Omega who plays first. By the time you enter the room and the boxes are in place, it is already determined if $1000000 is in the box or not. Your move depends on Omega's, not the other way around. Omega's move depends on the knowledge it has about you.

comment by Manfred · 2013-11-15T01:59:31.608Z · score: 2 (2 votes) · LW(p) · GW(p)

Fair enough - replace"depends on" with "is ~100% correlated with."

comment by ThrustVectoring · 2013-11-15T04:21:39.366Z · score: 0 (2 votes) · LW(p) · GW(p)

Omega plays first if you take at as "what happens first", but chooses second if you take that as having the privileged position of knowing what your opponent does before you must make your choice.

comment by passive_fist · 2013-11-15T05:01:20.951Z · score: -1 (1 votes) · LW(p) · GW(p)

Not really; in Nozick's original formulation omega doesn't know what the player chooses, it can only guess. Sure it's guess may be highly likely to be correct, but it's still a guess. This distinction is important because if omega is absolutely certain what the player chooses it will always win. The player will only get $1000000 or $1000 but never $1001000.

comment by ThrustVectoring · 2013-11-15T14:57:14.238Z · score: 0 (2 votes) · LW(p) · GW(p)

Fine, replace "knowing what your opponent does" with "having a high degree of confidence in what the other player has or is going to choose". My point stands - "knowing" is pretty much just a short way of talking about the justified high degree of confidence, anyways.