Prisoner's Dilemma as a Game Theory Laboratory
post by prase · 2011-08-25T14:30:29.507Z · LW · GW · Legacy · 47 commentsContents
47 comments
Last year Yvain had organised a Diplomacy game between LessWrong users to test how well we perform in practical application of game theory. At least two games had been played, but as far as I know no analysis was made afterwards. One reason is probably that few games involving complex interactions between players constitute at most anecdotal evidence for whatever hypothesis one may test. The second one is lack of comparison to outside players. Although the games were fun, their value as a game theory experiment remains rather low. Could we test our game theoretic skills in a statistically more significant way?
Only recently I have learned about Robert Axelrod's experiment in which he run a competition of different strategies playing iteraded prisoner's dilemma, and got an idea to replicate it. I have already run a similar experiment with five contestants (all being my friends) and now a second run is being prepared, with at least nine strategies in the pool. I am interested in a third run, this time with strategies nominated by LessWrongers. The contestants of the second run which has identical rules are readers of my blog and neither of them is probably familiar with specific LW ideas. Therefore, they would serve as a fairly good control group to test LW's applied rationality skills (or a subset of). After matching the strategies in both groups separately, I plan to put all of them together and see who wins.
So, if you want to participate in this contest, feel free to send me your strategy. The rules are following.
- By a strategy I mean a program sent by a contestant or coded according to his/her instructions. The strategies compete in iterated prisoner's dilemmas. A single iteration I will call a turn. In each turn each strategy has to choose between cooperating and defecting. The payoffs are:
- if both cooperate, 4 points for each
- if both defect, 1 point for each
- else 7 points for the defector and 0 points for its cooperating opponent
- By a match I mean a series of 100 iterations between the same opponents.
- There will be two different competitions, the round-robin tournament and the evolutionary tournament. Two separate final standings will be made, one for each tournament. Any received strategy has to participate in both.
- In the round-robin tournament strategies will play one match against each other (not against a copy of themselves). The winner will be the strategy which acquires the highest total number of points. Number of won or lost matches is disregarded.
- The evolutionary tournament will simulate evolution of a population of strategies. In the beginning, equal number of copies of each strategy is present in the pool. Then the strategies are paired randomly (now a strategy may be paired against a copy of itself) and each pair plays one match. In the next generation pool, the strategies will be represented in numbers proportional to the total number of points won by all its copies in the present generation. The total population will be maintained at a constant level (probably 2,000 strategies, up to rounding errors). Strategies may go extinct. The strategy with the highest number of copies after the 100th generation will be considered a winner.
- A strategy has access to results of all previous turns in its current match and the number of current turn. It can decide randomly (a pseudo-random generator will be used). It does not have access to results of other matches (including its own previous matches), the number of current generation, population sizes, number of points won by any strategy in any phase of the tournament (except the number of points already won in the current match which can be calculated from the previous turns results) and its opponent's identity (namely it doesn't know whether it plays against a copy of itself). The strategies obviously can't read their opponent's source code.
- Each person can send only one strategy. Be honest.
- The strategy can be described in any comprehensible language. If I find problems in understanding, which will probably happen if the strategy is described in Lisp or Basque, but can happen even if it is written in English, I will ask.
- You should send your strategies by private message, not in comments to this post. Your opponents shouldn't know what you have prepared.
- The strategy needn't be original, but if I get two identical strategies, I will treat them as one.
- A Fully Random strategy is automatically included in the tournament. It plays defect or cooperate randomly with 50% chance each turn.
- Names of the authors of the strategies will be published by default. If you wish your name excluded, specify it in your message, and your strategy will compete anonymously.
The simulation will probably not be run before at least eight strategies are collected and before the beginning of September. The competition is closed, no new strategies are accepted at this moment. 21 different strategies were accepted, their implementations are now being tested. Results will be probably posted on Sunday 4th September.
[Edit: Found inconsistency in using words round and turn to denote the same thing. Now turn is used everywhere.]
47 comments
Comments sorted by top scores.
comment by RobertLumley · 2011-08-25T17:38:43.500Z · LW(p) · GW(p)
I wish I had been around when the initial Diplomacy game was organized. Diplomacy is a hobby of mine. And that project would have been much better organized (and recorded) on the website where I play, WebDiplomacy.
Furthermore, if you wanted a more easily analyzed metric, you can play "gunboat" diplomacy, where you're not allowed to communicate between countries, and the only communication are by moves - you can use support holds to indicate a request for an alliance, etc. I particularly enjoy this form of diplomacy, as it's far more like seven player chess.
A lot of the people on WebDiplomacy are math/science people; the two communities (if LessWrong could get over the HUGE number of trolls on WebDip) would actually get along really well. There are a number of metrics that they collect, but below is a link to a project that one person did, wherein, among other things, he analyzed the most statistically successful openings for the various countries. Also, below that, I have a link to my personal statistics, if you'd find those interesting.
http://tinyurl.com/DiplomacyReport
http://tinyurl.com/SmileysDipStats
I figure I've promoted LessWrong on WebDip enough, I ought to promote WebDip here too... :-)
Replies from: Randaly, prase↑ comment by Randaly · 2011-08-25T18:39:03.262Z · LW(p) · GW(p)
There was another game, mostly including the NY members, which took place on WebDiplomacy. The game is here; Zvi Mowshowitz won.
Replies from: RobertLumley↑ comment by RobertLumley · 2011-08-25T19:01:26.556Z · LW(p) · GW(p)
Errrrr.... Points Per Supply Center...
Certainly excusable for LWers, but any serious play on WebDip is WTA. :-P
This is a topic highly debated on WebDiplomacy, (I could go on) but Babak said it best when he said (paraphrasing) "Playing for a strong second in PPSC in diplomacy is like shooting it in your own goal in soccer just because you want to score a goal."
↑ comment by prase · 2011-08-25T18:32:04.703Z · LW(p) · GW(p)
And that project would have been much better organized (and recorded) on the website where I play, WebDiplomacy.
Perhaps it would, but Yvain's moderation and reports after each turn gave it a unique flavour which standard WebDip games lack. Anyway, it may be easy to organise another game for LWers on WebDip.
Replies from: RobertLumley↑ comment by RobertLumley · 2011-08-25T18:35:55.218Z · LW(p) · GW(p)
I'd love to play, if there's interest.
Replies from: Vaniver↑ comment by Vaniver · 2011-08-25T19:46:03.741Z · LW(p) · GW(p)
There will definitely be tons of interest. Since I claimed I was willing to set up another game after winning one of the LW games, I probably ought to go ahead and make this happen.
Replies from: RobertLumley↑ comment by RobertLumley · 2011-08-25T20:06:50.529Z · LW(p) · GW(p)
I'm happy to help if you want. I was considering just making a post myself, but I wanted to gauge whether or not there was any interest at all.
And here I was thinking I might take a bit of a break from diplomacy after my games finish up so I can focus more on my school work...
Replies from: Vaniver↑ comment by Vaniver · 2011-08-25T21:13:29.949Z · LW(p) · GW(p)
Thanks for the offer! I'm planning on mostly doing the administrative stuff of sorting people, as I've never used WebDiplomacy before. (I imagine it's pretty easy to use, but experience generally helps). The previous post has been updated with a link, if you'd like to sign up.
comment by Eugine_Nier · 2011-09-05T14:30:22.205Z · LW(p) · GW(p)
Well, it's already September 5th. Where are the results?
comment by malthrin · 2011-08-25T18:14:30.518Z · LW(p) · GW(p)
Nifty idea. Would folks be interested in this sort of thing as a web site?
Replies from: prase↑ comment by prase · 2011-08-25T18:34:35.947Z · LW(p) · GW(p)
What exactly do you have in mind?
Replies from: malthrin↑ comment by malthrin · 2011-08-25T18:50:16.393Z · LW(p) · GW(p)
I'm envisioning:
- A textual interface where people can enter strategies in psuedocode or possibly a programming language that I can sandbox.
- A scheduled job to run games along the lines you've described above every day? hour?
- A page to display the outcome of previous tournaments.
↑ comment by prase · 2011-08-25T19:19:28.824Z · LW(p) · GW(p)
I would expect it collapse into the Nash equilibrium quite soon.
Replies from: benelliott↑ comment by benelliott · 2011-09-01T16:55:53.938Z · LW(p) · GW(p)
This would be an interesting result, since the Nash equilibrium of fixed length prisoner's dilemma is defectbot, and defectbots have not had a great rate of success in past tournaments.
Replies from: prase↑ comment by prase · 2011-09-01T17:04:01.551Z · LW(p) · GW(p)
Yes. It would depend on concrete implementation and number of players. The numbers needed for defectbots to actually thrive are very big and it would take a long time to converge into Nash, so I was perhaps a bit overconfident with my prediction.
Replies from: red75↑ comment by red75 · 2011-09-01T20:03:46.338Z · LW(p) · GW(p)
Moreover, simulations I ran using your rules for evolutionary tournament show that one strategy quickly dominates and others go extinct. Defectbot is among strategies which are fastest to go extinct (even in presence of cooperatebot) as it feeds off overaltruist strategies, which in turn fail to compete with tit-for-tat. So I doubt that at least evolutionary tournament will converge into Nash.
I predict that strategy that tit-for-tats 99 turns and defects on 100-th one will win in evolutionary tournament, given that tit-for-tat is also in the population.
ETA: I've sent another strategy.
comment by jimrandomh · 2011-08-25T16:37:36.226Z · LW(p) · GW(p)
The strategy can be described in any comprehensible language. If I find problems in understanding, which will probably happen if the strategy is described in Lisp or Basque, but can happen even if it is written in English, I will ask.
You're going to be translating these all into computer programs so you can run them, right? You should specify which language it's going to be, so we can save you some work.
Replies from: prase↑ comment by prase · 2011-08-25T16:55:30.904Z · LW(p) · GW(p)
I have already a functional code written in Wolfram Mathematica which simulates the tournaments.
If the number of strategies will not be too big, it is easier for me to code them than to write down the instructions needed for the readers to know how I represent the data, not to speak about help for those unfamiliar with Mathematica's syntax. Simple strategies are usually one-liners, e.g. tit-for-tat has 29 characters.
Replies from: jimrandomh↑ comment by jimrandomh · 2011-08-25T17:07:53.165Z · LW(p) · GW(p)
Okay, but what about complicated strategies? Even someone playing tit-for-tat will want to augment it with a special case to switch to defection rock when they're playing against Fully Random.
Replies from: prasecomment by Caerbannog · 2011-08-25T15:54:55.307Z · LW(p) · GW(p)
This should be interesting. I've sent you my strategy by private message.
comment by red75 · 2011-08-31T07:22:59.009Z · LW(p) · GW(p)
In the meantime I've run my own simulation, studying a group of strategies, which perform as tit-for-tat except that at specific turn they defect and then they use result of this turn to switch to defect stone or continue tit-for-tatting. Thus they recognize copies of itself and cooperate with them. Such strategy can be exploited by switching to defect stone before it does, or by mimicking its behavior (second defect check after first. This case I didn't analyze).
It leads to interesting results in evolutionary tournament. Second fairest (second longest period of tit-for-tatting) strategy wins. It outperforms less fair strategies by longest cooperation with fairest strategy. And it outperforms fairest strategy by exploiting it.
Replies from: prase↑ comment by prase · 2011-09-01T20:17:17.431Z · LW(p) · GW(p)
Define strategy S[n] as TfT until turn n and defect ever since. In the limit of infinite population having non-zero initial number of S[n] for each n, S[0], i.e. DefectBot, eventually dominates. Starting with equal subpopulations, initially most successful is S[99] which preys on S[100] and finally drives it to extinction. But then, S[98] gains advantage over S[99] and so on.
With not so big population however, the more defectorish strategies die out sooner than the environment becomes suitable for them. (I have done it with population of 2000 strategies and the lowest surviving after several hundred generations was S[80] or so).
Replies from: red75, Eugine_Nier↑ comment by Eugine_Nier · 2011-09-02T02:53:44.780Z · LW(p) · GW(p)
Except that in the initial state S[0] will get driven to extinction long before s[100] will.
Replies from: prasecomment by Eugine_Nier · 2011-08-26T06:03:11.384Z · LW(p) · GW(p)
Submitted a strategy.
Replies from: lessdazed↑ comment by lessdazed · 2011-08-26T07:44:00.667Z · LW(p) · GW(p)
Does it have a name?
Replies from: Eugine_Nier↑ comment by Eugine_Nier · 2011-08-27T08:00:17.400Z · LW(p) · GW(p)
Haven't thought of one.
comment by ArisKatsaris · 2011-08-26T05:45:34.883Z · LW(p) · GW(p)
I've sent a strategy, coded in Java.
comment by [deleted] · 2011-08-25T20:16:06.531Z · LW(p) · GW(p)
I have submitted a strategy and look forward to seeing the result.
comment by Vaniver · 2011-08-25T19:40:43.103Z · LW(p) · GW(p)
So, anyone want to bet on what proportion of strategies will be tit-for-tat? I'm fairly confident it'll be more than half.
(I should be clear, I mean the number of submitted strategies, as duplicate strategies are treated as one strategy.)
Replies from: benelliott, jhuffman↑ comment by benelliott · 2011-08-25T20:09:06.769Z · LW(p) · GW(p)
In a tournament with a fixed number of rounds and a known random strategy, it would be a bit weird to just use vanilla tit-for-tat, but yeah. It will be a bit boring if all the strategies are nice.
Replies from: torekp↑ comment by torekp · 2011-08-25T22:42:31.023Z · LW(p) · GW(p)
tournament with a fixed number of rounds
That's a bit disappointing, but so be it.
Replies from: benelliott↑ comment by benelliott · 2011-08-25T23:28:23.104Z · LW(p) · GW(p)
It did say so in the OP.
Besides, its not such a big deal, the paradox of induction is an interesting problem, if I was very optimistic I might suggest that this could throw some light on it.