Random thought: What is the optimal PD strategy under imperfect information?

post by RolfAndreassen · 2012-01-17T01:06:28.375Z · 5 comments

We know that Tit-for-Tat and its variants do very well in iterated-Prisoner's-Dilemma tournaments. However, such tournaments are a bit unrealistic in that they give the agents instant and complete information about each other's actions. What if this signal is obscured? Suppose, for example, that if I press "Cooperate", there is a small chance that my action is reported to you as "Defect", presumably causing you to retaliate; and conversely, if I press "Defect", there is a chance that you see "Cooperate", thus letting me get away with cheating. Does this affect the optimal strategy? Does the probability of getting wrong information matter? What if it is asymmetric, i.e. P(observe C | actual D) != P(observe D | actual C)?
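
To make the setup concrete, here is a toy simulation of such a noisy tournament. It assumes the standard payoffs (T, R, P, S) = (5, 3, 1, 0) and a symmetric misreporting probability; all names and parameter choices are illustrative, not part of the question itself.

```python
import random

# Payoff table for (my move, your move), using standard PD values
# (T, R, P, S) = (5, 3, 1, 0).
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(observed):
    """Cooperate first, then copy the opponent's last *observed* move."""
    return observed[-1] if observed else "C"

def misreport(move, noise):
    """With probability `noise`, report the opposite of the actual move."""
    if random.random() < noise:
        return "D" if move == "C" else "C"
    return move

def play(strategy_a, strategy_b, rounds=1000, noise=0.05):
    seen_by_a, seen_by_b = [], []  # each player's view of the other's moves
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(seen_by_a)
        b = strategy_b(seen_by_b)
        pa, pb = PAYOFFS[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        # Each side observes a possibly-corrupted report of the other's move.
        seen_by_a.append(misreport(b, noise))
        seen_by_b.append(misreport(a, noise))
    return score_a / rounds, score_b / rounds

print(play(tit_for_tat, tit_for_tat, noise=0.0))   # ~(3.0, 3.0)
print(play(tit_for_tat, tit_for_tat, noise=0.05))  # noticeably lower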

Comments

comment by Unnamed · 2012-01-17T04:03:31.538Z

Wikipedia actually has a pretty good (though brief) discussion of this. It mentions three alternative strategies that could deal with noise: tit for two tats (defect only after your opponent defects twice in a row), tit for tat with forgiveness (after an opponent's defection, cooperate anyway with probability p), and contrite tit for tat (cooperate extra after you accidentally defect); a rough code sketch of all three follows the references below. The article it cites for contrite tit for tat, Boyd (1989), looks promising. And googling "contrite tit for tat" turned up a relevant paper by Wu & Axelrod (1995) (pdf):

Noise, in the form of random errors in implementing a choice, is a common problem in real world interactions. Recent research has identified three approaches to coping with noise: adding generosity to a reciprocating strategy; adding contrition to a reciprocating strategy; and using an entirely different strategy, Pavlov, based on the idea of switching choice whenever the previous payoff was low. Tournament studies, ecological simulation, and theoretical analysis demonstrate: (1) A generous version of Tit for Tat is a highly effective strategy when the players it meets have not adapted to noise. (2) If the other players have adapted to noise, a contrite version of Tit for Tat is even more effective at quickly restoring mutual cooperation without the risk of exploitation. (3) Pavlov is not robust.

Boyd, R. (1989). Mistakes allow evolutionary stability in the repeated Prisoner's Dilemma game. Journal of Theoretical Biology, 136(1), 47-56.
Wu, J., & Axelrod, R. (1995). How to cope with noise in the iterated Prisoner's Dilemma. Journal of Conflict Resolution, 39(1), 183-189.
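
As a rough sketch, the three variants might look like this, assuming a simultaneous-move game where each strategy sees the opponent's possibly-misreported move history. The contrite version is simplified: Boyd's (1989) full formulation tracks "standing", which this does not.

```python
import random

def tit_for_two_tats(observed):
    """Defect only after two consecutive observed defections."""
    if len(observed) >= 2 and observed[-1] == observed[-2] == "D":
        return "D"
    return "C"

def generous_tit_for_tat(observed, p_forgive=0.1):
    """Tit for Tat, but forgive an observed defection with probability p_forgive."""
    if observed and observed[-1] == "D" and random.random() >= p_forgive:
        return "D"
    return "C"

def contrite_tit_for_tat(observed, own_realized):
    """Simplified contrition: if my own last *realized* move was an
    unprovoked defection (an implementation slip), accept the opponent's
    retaliation and keep cooperating rather than counter-punishing."""
    if not observed:
        return "C"
    slipped = (own_realized[-1] == "D" and
               (len(observed) < 2 or observed[-2] == "C"))
    if slipped:
        return "C"  # contrite: their punishment of my slip is justified
    return observed[-1]  # otherwise plain Tit for Tat
```

Note that contrition only helps when the noise is in implementation (I sometimes defect by accident) rather than in observation: with purely observational noise, a player never knows it "erred", so generosity is the more natural fix for the setup in the original post.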

comment by bentarm · 2012-01-17T17:03:16.904Z

If I remember correctly, it matters a lot exactly what the noise parameter is. As soon as things get noisy enough, Grim (start off cooperating, then defect if the opponent has ever defected) starts to dominate all of the clever Tit for Tat variants. Obviously, if you make things noisy enough, then Always Defect becomes the best strategy, but Grim does well long before that.

We had an IPD tournament with noise at our university recently, and I entered a variant of Downing (essentially, model your opponent as some sort of Markov process), which won quite convincingly (mostly because it could exploit Always Cooperate, which was in the initial pool of strategies, better than the TfT variants could).
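
For the curious, a Downing-style modeler along these lines might look like the sketch below. The class, payoffs, and myopic scoring rule are illustrative choices, not bentarm's actual entry.

```python
# Estimate P(opponent cooperates | my previous move) from play so far,
# then choose the move with the higher estimated payoff.
PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

class MarkovModeler:
    def __init__(self):
        # counts[m] = [opponent cooperations, observations] in rounds that
        # followed my playing m; seeded with a uniform prior.
        self.counts = {"C": [1, 2], "D": [1, 2]}
        self.history = []  # my own moves so far

    def _score(self, m):
        # Myopic approximation: value move m by the payoff against the
        # opponent's estimated reaction to m. The real Downing reasons
        # over a longer horizon.
        p_coop = self.counts[m][0] / self.counts[m][1]
        return p_coop * PAYOFFS[(m, "C")] + (1 - p_coop) * PAYOFFS[(m, "D")]

    def move(self):
        choice = "C" if not self.history else max(("C", "D"), key=self._score)
        self.history.append(choice)
        return choice

    def observe(self, opponent_move):
        # Simultaneous play: the opponent's move this round reacts to my
        # move from the *previous* round, so update that move's counts.
        if len(self.history) >= 2:
            key = self.history[-2]
            self.counts[key][1] += 1
            if opponent_move == "C":
                self.counts[key][0] += 1
```

Against Always Cooperate, the estimated cooperation probability stays high even after defections, so defecting dominates and the modeler exploits it, consistent with bentarm's observation; against retaliatory opponents, the estimates for "D" degrade and it settles into cooperation.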

comment by andreas · 2012-01-17T03:51:18.962Z

The game theory textbook "A Course in Microeconomic Theory" (Kreps) addresses this situation. Quoting from page 516:

We will give an exact analysis of this problem momentarily (in smaller type), but you should have no difficulty seeing the basic trade-off; too little punishment, triggered only rarely, will give your opponent the incentive to try to get away with the noncooperative strategy. You have to punish often enough and harshly enough so that your opponent is motivated to play [cooperate] instead of [defect]. But the more often/more harsh is the punishment, the less are the gains from cooperation. And even if you punish in a fashion that leads you to know that your opponent is (in her own interests) choosing [cooperate] every time (except when she is punishing), you will have to "punish" in some instances to keep your opponent honest.

comment by Manfred · 2012-01-17T02:24:05.058Z

Two-player games don't always have such a thing as an optimal strategy. However, given a certain opponent or group of opponents/allies, there can be an optimal strategy.

Personally, as a guess, I'd say you'd want some threshold of allowable defections, measured as a cumulative average.
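
That guess can be sketched in a few lines; the threshold and warm-up values here are illustrative choices, not anything from the thread.

```python
def threshold_strategy(observed, threshold=0.15, warmup=10):
    """Cooperate unless the opponent's cumulative observed defection rate
    exceeds `threshold`. The `warmup` rounds of unconditional cooperation
    keep one early misreported move from triggering permanent defection."""
    if len(observed) < warmup:
        return "C"
    defection_rate = observed.count("D") / len(observed)
    return "D" if defection_rate > threshold else "C"
```

One subtlety: once both sides cross their thresholds, the observed defection rates only grow, so this behaves like a slow-fuse Grim; a windowed average rather than a cumulative one would let cooperation recover after a burst of noise.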