# The Truly Iterated Prisoner's Dilemma

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-09-04T18:00:00.000Z · LW · GW · Legacy · 85 comments

**Followup to**: The True Prisoner's Dilemma

For everyone who thought that the rational choice in yesterday's True Prisoner's Dilemma was to defect, a follow-up dilemma:

Suppose that the dilemma was not one-shot, but was rather to be repeated exactly 100 times, where for *each round,* the payoff matrix looks like this:

| | Humans: C | Humans: D |
|---|---|---|
| Paperclipper: C | (2 million human lives saved, 2 paperclips gained) | (+3 million lives, +0 paperclips) |
| Paperclipper: D | (+0 lives, +3 paperclips) | (+1 million lives, +1 paperclip) |

As most of you probably know, the king of the classical iterated Prisoner's Dilemma is Tit for Tat, which cooperates on the first round, and on succeeding rounds does whatever its opponent did last time. But what most of you may not realize, is that, *if you know when the iteration will stop*, Tit for Tat is - according to classical game theory - *irrational.*

Why? Consider the 100th round. On the 100th round, there will be no future iterations, no chance to retaliate against the other player for defection. Both of you know this, so the game reduces to the one-shot Prisoner's Dilemma. Since you are both classical game theorists, you both defect.

Now consider the 99th round. Both of you know that you will both defect in the 100th round, regardless of what either of you do in the 99th round. So you both know that your future payoff doesn't depend on your current action, only your current payoff. You are both classical game theorists. So you both defect.

Now consider the 98th round...
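The induction above can be checked mechanically. What follows is a minimal Python sketch, not part of the original post: it assumes each side simply maximizes its own total (lives or paperclips), and the names `PAYOFF`, `stage_equilibrium`, and `backward_induction` are illustrative. It solves the last round first and carries the resulting totals backward as a constant added to every cell of the matrix.

```python
# Stage-game payoffs from the post, keyed by (humans' move, Paperclipper's move).
# Values are (millions of lives saved, paperclips gained).
PAYOFF = {
    ("C", "C"): (2, 2),
    ("D", "C"): (3, 0),  # humans defect, Paperclipper cooperates
    ("C", "D"): (0, 3),  # humans cooperate, Paperclipper defects
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")


def stage_equilibrium(continuation):
    """Pure Nash equilibria of one round when the same continuation payoffs
    are added to every cell (the future does not depend on the current move)."""
    def total(h, p):
        lives, clips = PAYOFF[(h, p)]
        return lives + continuation[0], clips + continuation[1]

    equilibria = []
    for h in ACTIONS:
        for p in ACTIONS:
            h_best = all(total(h, p)[0] >= total(alt, p)[0] for alt in ACTIONS)
            p_best = all(total(h, p)[1] >= total(h, alt)[1] for alt in ACTIONS)
            if h_best and p_best:
                equilibria.append((h, p))
    return equilibria


def backward_induction(rounds):
    """Solve from the last round to the first; return the plan and the totals."""
    continuation = (0, 0)
    plan = []
    for _ in range(rounds):
        (moves,) = stage_equilibrium(continuation)  # unique in this game
        plan.append(moves)
        gain = PAYOFF[moves]
        continuation = (continuation[0] + gain[0], continuation[1] + gain[1])
    plan.reverse()
    return plan, continuation


plan, totals = backward_induction(100)
print(set(plan), totals)
```

Under these assumptions every round comes out as mutual defection, for 100 million lives and 100 paperclips in total, against the 200 of each that steady cooperation would have produced.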

With humanity and the Paperclipper facing 100 rounds of the iterated Prisoner's Dilemma, do you *really truly think* that the rational thing for both parties to do, is steadily defect against each other for the next 100 rounds?

## 85 comments

Comments sorted by oldest first, as this post is from before comment nesting was available (around 2009-02-27).

## comment by Kevin_Dick · 2008-09-04T18:20:39.000Z · LW(p) · GW(p)

I think you may be attacking a straw man here. When I was taught about the PD almost 20 years ago in an undergraduate class, our professor made exactly the same point. If there are enough iterations (even if you know exactly when the game will end), it can be worth the risk to attempt to establish cooperation via Tit-for-Tat. IIRC, it depends on an infinite recursion of your priors on the other guy's priors on your priors, etc. that the other guy will attempt to establish cooperation. You compare this to the expected losses from a defection in the first round. For a large number of rounds, even a small (infinitely recursed) chance that the other guy will cooperate pays off. Of course, you then have to estimate when you think the other guy will start to defect as the end approaches. But once you had established cooperation, I seem to recall that this point was stable given the ratio of the C and D payoffs.

Replies from: None

## ↑ comment by **[deleted]** · 2013-12-28T15:58:49.731Z · LW(p) · GW(p)

*I think you may be attacking a straw man here.*

It frustrates me immensely to see how many times this claim is made in the comments of Eliezer's posts. At *least* 75% of the times I read this I've *personally* encountered someone who made the "straw" claim. In this case, consult the first chapter of Ken Binmore's "Playing for Real".

## comment by Silas · 2008-09-04T18:21:13.000Z · LW(p) · GW(p)

Wait wait wait: Isn't this the same kind of argument as in the dilemma about "We will execute you within the next week on a day that you won't expect"? (Sorry, don't know the name for that puzzle.) In that one, the argument goes that if it's the last day of the week, the prisoner knows that's the last chance they have to execute him, so he'll expect it, so it can't be that day. But then, if it's the next-to-last day, he knows they can't execute him on the last day, so they have to execute him on that next-to-last day. But then he expects it! And so on.

So, after concluding they can't execute him, they execute him on Wednesday. "Wait! But I concluded you can't do this!" "Good, then you didn't expect it. Problem solved."

Just as in that problem, you can't stably have an "(un)expected execution day", you can't have an "expected future irrelevance" in this one.

Do I get a prize? No? Okay then.

## comment by Tom_P · 2008-09-04T18:35:24.000Z · LW(p) · GW(p)

A more realistic model would let the number of iterations be unknown to the players. If the probability that the "meta-game" continues in each stage is high enough, it pays to cooperate.

The conclusion that the only rational thing to do in a 100 stage game with perfectly rational players is to defect is correct, but is an artifact of the fact that the number of stages has been defined precisely, and therefore the players can plan to defect at the last moment (which makes them want to defect progressively earlier and earlier). In the real world, this seems rather unlikely.
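How high is "high enough"? A rough sketch in Python, under assumptions that go beyond the comment: the game continues after each round with a fixed probability `delta`, and the other player uses a "grim trigger" strategy (cooperate until the first defection, then defect forever). The per-round payoffs are the post's 3, 2, 1, 0; the function names are illustrative.

```python
from fractions import Fraction

# Per-round payoffs to one player, using the post's numbers:
# temptation, reward, punishment, sucker's payoff.
T, R, P, S = 3, 2, 1, 0


def value_of_cooperating(delta):
    """Expected total from mutual cooperation: R + delta*R + delta^2*R + ..."""
    return R / (1 - delta)


def value_of_defecting(delta):
    """Defect once against a grim-trigger partner, then (D, D) from then on."""
    return T + delta * P / (1 - delta)


# Rearranging R/(1-delta) >= T + delta*P/(1-delta) gives this threshold.
threshold = Fraction(T - R, T - P)
print("threshold:", threshold)

for delta in (Fraction(1, 4), Fraction(1, 2), Fraction(99, 100)):
    print(delta, value_of_cooperating(delta) >= value_of_defecting(delta))
```

With these payoffs the threshold is a continuation probability of 1/2. At a 99% chance of continuing, cooperating is worth 200 in expectation against 102 for defecting immediately.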

## comment by Vladimir_Nesov · 2008-09-04T18:40:17.000Z · LW(p) · GW(p)

Silas,

It's called the unexpected hanging paradox, and I linked to it in my sketch of the solution to the one-off dilemma. I agree, the same problem seems to be at work here, and it's orthogonal to the two-step argument that takes us from mutual cooperation to mutual defection. You need to mark the performance of complete policies established in the model at the start of the experiment, and not reason backwards, justifying the actions that could have changed the consequences by inevitability of consequences. Again, I'm not quite sure how it all ties together.

## comment by denis_bider · 2008-09-04T18:58:01.000Z · LW(p) · GW(p)

What Kevin Dick said.

The benefit to each player from mutual cooperation in a majority of the rounds is much more than the benefit from mutual defection in all rounds. Therefore it makes sense for both players to invest at the beginning, and cooperate, in order to establish each other's trustworthiness.

Tit-for-tat seems like it might be a good strategy in the very early rounds, but as the game goes on, the best reaction to defection might become two defections in response, and in the last rounds, when the other party defects, the best response might be all defections until the end.

## comment by Nominull3 · 2008-09-04T19:16:32.000Z · LW(p) · GW(p)

No, but I damn well expect you to defect the hundredth time. If he's playing true tit-for-tat, you can exploit that by playing along for a time, but cooperating on the hundredth go can't help you in any way, it will only kill a million people.

Do not kill a million people, please.

## comment by Sebastian_Hagen2 · 2008-09-04T19:35:14.000Z · LW(p) · GW(p)

*Do you really truly think that the rational thing for both parties to do, is steadily defect against each other for the next 100 rounds?*

No. That seems obviously wrong, even if I can't figure out where the error lies.

We only get a reversion to the (D,D) case if we know with a high degree of confidence that the other party doesn't use naive Tit for Tat, and they know that we don't. That seems like an iffy assumption to me. If we knew the *exact* algorithm the other side uses, it would be trivial to find a winning strategy; so how do we know it isn't naive Tit for Tat? If there's a sufficiently high chance the other side is using naive Tit for Tat, it might well be optimal to repeat their choices until the second-to-last round.
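To get a feel for what "sufficiently high" might mean, here is a small Python sketch. It makes a strong simplifying assumption that is not in the comment: the other side is either naive Tit for Tat (with probability `p_tft`) or an unconditional defector. All function names are illustrative. It compares always defecting with copying Tit for Tat until the final round and defecting there.

```python
ROUNDS = 100
# Payoff to us given (our move, their move), in millions of lives.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


def tit_for_tat(round_no, their_history):
    """Cooperate first, then repeat the opponent's previous move."""
    return "C" if not their_history else their_history[-1]


def always_defect(round_no, their_history):
    return "D"


def mirror_then_defect(round_no, their_history):
    """Play like Tit for Tat, but defect on the final round."""
    if round_no == ROUNDS:
        return "D"
    return tit_for_tat(round_no, their_history)


def our_total(ours, theirs):
    """Total payoff to `ours` over all rounds against `theirs`."""
    our_moves, their_moves, total = [], [], 0
    for round_no in range(1, ROUNDS + 1):
        a = ours(round_no, their_moves)
        b = theirs(round_no, our_moves)
        our_moves.append(a)
        their_moves.append(b)
        total += PAYOFF[(a, b)]
    return total


def expected_total(ours, p_tft):
    """Expected payoff when the opponent is Tit for Tat with probability p_tft
    and an unconditional defector otherwise (an assumed two-type model)."""
    return (p_tft * our_total(ours, tit_for_tat)
            + (1 - p_tft) * our_total(ours, always_defect))


for p in (0.005, 0.01, 0.02):
    print(p, expected_total(mirror_then_defect, p), expected_total(always_defect, p))
```

In this toy model the mirroring strategy earns 201 against Tit for Tat and 99 against a defector, while always defecting earns 102 and 100. The break-even point is therefore a 1% chance of facing Tit for Tat, which is only as meaningful as the two-type assumption behind it.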

## comment by Allan_Crossman · 2008-09-04T19:55:34.000Z · LW(p) · GW(p)

If it's actually common knowledge that both players are "perfectly rational" then they must do whatever game theory says.

But if the paperclip maximizer knows that we're not perfectly rational (or falsely believes that we're not) it will try and achieve a better score than it could get if we were in fact perfectly rational. It will do this by cooperating, at least for a time.

I think correct strategy gets profoundly complicated when one side believes the other side is not fully rational.

## comment by Dagon · 2008-09-04T19:59:55.000Z · LW(p) · GW(p)

I THINK rational agents will defect 100 times in a row, or 100 million times in a row for this specified problem. But I think this problem is impossible. In all cases there will be uncertainty about your opponent/partner - you won't know its utility function perfectly, and you won't know how perfectly it's implemented. Heck, you don't know your OWN utility function perfectly, and you know darn well you're implemented somewhat accidentally. Also, there are few real cases where you know precisely when there will be no further games that can be affected by the current choice.

In cases of uncertainty on these topics, cooperation can be rational. Something on the order of tit-for-tat with an additional chance of defecting or forgiving that's based on expectation of game ending with this iteration might be right.

## comment by Zubon · 2008-09-04T20:14:46.000Z · LW(p) · GW(p)

Shut up and multiply. Every time you make the wrong choice, 1 million people die. What is your probability that Clippy is going to throw that first C? How did you come to that? You are not allowed to use any version of thinking back from what you would want Clippy to do, or what you would do in its place if you really I promise valued only paperclips and not human lives.

You throw a C, Clippy throws a D. People die, 99 rounds to go. You have just shown Clippy that you are at least willing to cooperate. What is your probability that Clippy is going to throw a C next? Ever?

You throw a C, Clippy throws a D. People die, 98 rounds to go. Are you showing Clippy that you want to cooperate, so it can safely cooperate, or are you just an unresponsive player who will keep throwing Cs no matter what he does? And what does it say to you that Clippy has thrown 2 Ds?

Alternate case, round 1: you throw a C, Clippy throws a C. People live, 99 rounds to go. At what point are you planning to start defecting? Do you think Clippy can't work out that logic too? When do you think Clippy is planning to start defecting?

## comment by andrew7 · 2008-09-04T20:20:54.000Z · LW(p) · GW(p)

Finitely iterated prisoner's dilemma is just like the traveler's dilemma, on which see this article by Kaushik Basu. The "always defect" choice is always a (in fact, the only) Nash equilibrium and an evolutionarily stable strategy, but it turns out that if you measure how stable it is, it becomes less stable as the number of iterations increases. So if there's some kind of noise or uncertainty (as Dagon points out), cooperation becomes rational.

## comment by CarlShulman · 2008-09-04T20:25:50.000Z · LW(p) · GW(p)

If you cooperate even once, the common 'knowledge' that you are both classical game theorists is revealed (to all parties) to be false, and your opponent will have to update estimates of your future actions.

## comment by Allan_Crossman · 2008-09-04T20:47:02.000Z · LW(p) · GW(p)

Carl - good point.

I shouldn't have conflated perfectly rational agents (if there are such things) with classical game-theorists. Presumably, a perfectly rational agent could make this move for precisely this reason.

Probably the best situation would be if we were so transparently naive that the maximizer could actually verify that we were playing naive tit-for-tat, including on the last round. That way, it would cooperate for 99 rounds. But with it in another universe, I don't see how it can verify anything of the sort.

(By the way, Eliezer, how much communication is going on between us and Clippy? In the iterated dilemma's purest form, the only communications are the moves themselves - is that what we are to assume here?)

## comment by prase · 2008-09-04T21:33:28.000Z · LW(p) · GW(p)

Zubon,

*When do you think Clippy is planning to start defecting?*

If Clippy decides the same way as I do, then I expect he starts defecting at the same turn as I do. The result is 100x C,C. There is no way identical deterministic algorithms with the same input can produce different outputs, so in each turn, C,C or D,D are the only possibilities. It's rational to C.

However, a "realistic" Clippy uses a different algorithm which is unknown to me. Here I genuinely don't know what to do. To have some preference to choose C over D or conversely, I would need at least some rough prior probability distribution on the space of all possible decision algorithms suitable for Clippy. But I can hardly imagine such a space.

Reminds me a bit of the problem of two envelopes where you know that one of them has 10 times greater amount of money than the second, but otherwise these amounts are *random*. (V.Nesov, do you know the canonical name of this paradox?) You open the first, find some amount, and then have to choose between accepting it or taking the second envelope. You cannot resolve that without having some idea about what "random" here means, how the amounts of money were distributed into the envelopes. If you don't know anything about the process, you face questions like "what is the most natural probability distribution on the interval (0, ∞)?", that I don't know how to answer.

Anyway, I think these dilemmas are typical illustration of insufficient information for any rational decision. Without information any decision is ruled by bias.

Replies from: Benquo

## ↑ comment by Benquo · 2012-08-22T12:56:36.028Z · LW(p) · GW(p)

Actually, there is something you can do to improve the outcome over always accepting or always switching, without knowing the distribution of money.

All you need to do is define your probability of switching according to some function that decreases as the amount of money in the envelope increases. So for example, you could switch with probability exp(-X), where X is the amount of money in the envelope you start with.

Of course, to have an exactly optimal strategy, or even to know how much that general strategy will benefit you, you would need to know more about the distribution.
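The claim can be checked with an exact calculation for any fixed pair of amounts. This is a hedged Python sketch: the 10x ratio is taken from prase's version of the puzzle, the function names are illustrative, and the `exp(-X)` rule is the example given above.

```python
import math

RATIO = 10  # one envelope holds ten times the other, as in prase's version


def expected_take(small, switch_prob):
    """Exact expected winnings when the envelopes hold `small` and
    RATIO * small, and each is opened first with probability 1/2."""
    large = RATIO * small
    opened_small = (1 - switch_prob(small)) * small + switch_prob(small) * large
    opened_large = (1 - switch_prob(large)) * large + switch_prob(large) * small
    return 0.5 * opened_small + 0.5 * opened_large


def never_switch(x):
    return 0.0


def always_switch(x):
    return 1.0


def decreasing(x):
    """Switch with probability exp(-x), the example rule from the comment."""
    return math.exp(-x)


for small in (0.1, 1.0, 3.0):
    print(small,
          expected_take(small, never_switch),
          expected_take(small, always_switch),
          expected_take(small, decreasing))
```

Always keeping and always switching give the same expectation, while the decreasing rule does strictly better, because it switches away from the smaller amount more often than from the larger one. The size of the edge depends heavily on the units: for amounts that are large relative to the scale of the function, `exp(-X)` is so close to zero that the gain becomes negligible.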

## comment by Peter4 · 2008-09-04T21:37:06.000Z · LW(p) · GW(p)

The backwards reasoning in this problem is the same as is used in the unexpected hanging paradox, and similar to a problem called Guess 2/3 of the Average. This is where a group of players each guess a number between 0 and 100, and the player whose guess is closest to 2/3 of the average of all guesses wins. With thought and some iteration, the rational player can conclude that it is irrational to guess a number greater than (2/3)*100, (2/3)^2*100, (2/3)^n*100, etc. This has a limit at 0 when n -> ∞, so it is irrational to guess any number greater than zero.
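The shrinking bound is easy to tabulate. A minimal Python sketch, with an illustrative function name, in which each level of reasoning multiplies the largest defensible guess by 2/3:

```python
def surviving_upper_bound(levels, top=100.0, factor=2 / 3):
    """Largest guess that survives `levels` rounds of the argument
    'nobody rational would guess above factor * the previous bound'."""
    bound = top
    for _ in range(levels):
        bound *= factor
    return bound


for n in (1, 2, 5, 10, 30):
    print(n, round(surviving_upper_bound(n), 4))
```

The bound falls geometrically, so after a few dozen levels of "I know that you know" it is indistinguishable from zero.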

*"I think correct strategy gets profoundly complicated when one side believes the other side is not fully rational."*

Very true. When you're not playing with "rational" opponents, it turns out that this strategy's effectiveness diminishes after n=1 (regardless, the average will never be greater than 67), and you'll probably lose if you guess 0 - how can you be rational in irrational times? If everybody is rational, but there is no mutual knowledge of this, the same effect occurs.

The kick is this: even if you play with irrationals, they're going to learn - even in a 3rd grade classroom, eventually the equilibrium sets in at 0, after a few rounds of play. After the first round, they'll adjust their guesses, and each round the 2/3 mean will get lower until it hits 0. At that point, even if people don't rationally understand the process, they're guessing 0.

That's what equilibrium is all about - you might not start there, or notice the tendency towards it, but once it's achieved it persists. Players don't even need to understand the "why" of it - the reason for which they cannot do better.

That's a little offshoot, not entirely sure how well it relates. But back to the TIPD...

*"do you really truly think that the rational thing for both parties to do, is steadily defect against each other for the next 100 rounds?"*

Yes, but I'm not entirely sure it matters. If that's where the equilibrium is, that's the state the game is going to tend towards. Even a single (D, D) game might irrevocably lock the game into that pattern.

## comment by Paul_Gowder · 2008-09-04T22:14:02.000Z · LW(p) · GW(p)

Eliezer: the rationality of defection in these finitely repeated games has come under some fire, and there's a HUGE literature on it. Reading some of the more prominent examples may help you sort out your position on it.

Start here:

Robert Aumann. 1995. "Backward Induction and Common Knowledge of Rationality." Games and Economic Behavior 8:6-19.

Cristina Bicchieri. 1988. "Strategic Behavior and Counterfactuals." Synthese 76:135-169.

Cristina Bicchieri. 1989. "Self-Refuting Theories of Strategic Interaction: A Paradox of Common Knowledge." Erkenntnis 30:69-85.

Ken Binmore. 1987. "Modeling Rational Players I." Economics and Philosophy 3:9-55.

Jon Elster. 1993. "Some unresolved problems in the theory of rational behaviour." Acta Sociologica 36: 179-190.

Philip Reny. 1992. "Rationality in Extensive-Form Games." The Journal of Economic Perspectives 6:103-118.

Philip Pettit and Robert Sugden. 1989. "The Backward Induction Paradox." The Journal of Philosophy 86:169-182.

Brian Skyrms. 1998. "Subjunctive Conditionals and Revealed Preference." Philosophy of Science 65:545-574

Robert Stalnaker. 1999. "Knowledge, Belief and Counterfactual Reasoning in Games." in Cristina Bicchieri, Richard Jeffrey, and Brian Skyrms, eds., The Logic of Strategy. New York: Oxford University Press.

## comment by Venkat · 2008-09-04T22:23:48.000Z · LW(p) · GW(p)

I've wondered about, and even modeled versions of the fixed horizon IPD in the past. I concluded that so long as the finite horizon number is sufficiently large in the context of the application (100 is large for prison scenarios, tiny for other applications), a proper discounted accounting of future payoffs will restore TFT as an ESS. Axelrod used discounting schemes in various ways in his book(s).

The undiscounted case will always collapse. Recursive collapse to defect is actually rational and a good model for some situations, but you are right, in other situations it is both silly and not what people do, so it is the wrong model. If there is a finite horizon case where discounting is not appropriate, then I'd analyze it differently. To stop the recursive collapse, let the players optimize over possible symmetric reasoning futures...

## comment by Vladimir_Nesov · 2008-09-04T22:39:57.000Z · LW(p) · GW(p)

prase, Venkat: There is nothing symmetrical about choices of two players. One is playing for paperclips, another for different number of lives. One selects P2.Decision, another selects P1.Decision. How to recognize the "symmetry" of decisions, if they are not called by the same name? What makes it the answer in that case?

prase: It's the two envelopes problem.

## comment by RobinHanson · 2008-09-04T23:07:34.000Z · LW(p) · GW(p)

As Paul says, this is very well trodden ground. Since it hasn't been assumed that we are sure we know how the other party reasons, we might want to invest some early rounds in probing to see how the party thinks.

## comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-09-05T00:23:51.000Z · LW(p) · GW(p)

*Eliezer: the rationality of defection in these finitely repeated games has come under some fire, and there's a HUGE literature on it. Reading some of the more prominent examples may help you sort out your position on it.*

My position is already sorted, I assure you. I cooperate with the Paperclipper if I think it will one-box on Newcomb's Problem with myself as Omega.

*As Paul says, this is very well trodden ground. Since it hasn't been assumed that we are sure we know how the other party reasons, we might want to invest some early rounds in probing to see how the party thinks.*

As someone who rejects defection as the inevitable rational solution to both the one-shot PD and the iterated PD, I'm interested in the inconsistency of those who accept defection as the rational equilibrium in the one-shot PD, but find excuses to reject it in the finitely iterated known-horizon PD.

True, the iteration does present the possibility of "exploiting" an "irrational" opponent whose "irrationality" you can probe and detect, if there's any doubt about it in your mind. But that doesn't resolve the fundamental issue of rationality; it's like saying that you'll one-box on Newcomb's Problem if you think there's even a slight chance that Omega is hanging around and will secretly manipulate box B *after* you make your choice. What if neither party to the IPD thinks there's a realistic chance that the other party is *stupid* - if they're both superintelligences, say? Do they automatically defect against each other for 100 rounds?

And are you really "exploiting" an "irrational" opponent, if the party "exploited" ends up better off? Wouldn't you end up *wishing you were stupider, so you could be exploited* - wishing to be *unilaterally* stupider, regardless of the other party's intelligence? Hence the phrase "regret of rationality"...

## ↑ comment by Mati_Roy (MathieuRoy) · 2014-02-05T12:07:27.428Z · LW(p) · GW(p)

Do you mean "I cooperate with the Paperclipper if AND ONLY IF I think it will one-box on Newcomb's Problem with myself as Omega AND I think it thinks I'm Omega AND I think it thinks I think it thinks I'm Omega, etc." ? This seems to require an infinite amount of knowledge, no?

Edit: and you said "We have never interacted with the paperclip maximizer before", so do you think it would one-box?

Replies from: Philip_W

## ↑ comment by Philip_W · 2015-06-25T09:32:16.161Z · LW(p) · GW(p)

I think he means "I cooperate with the Paperclipper IFF it would one-box on Newcomb's problem with myself (with my present knowledge) playing the role of Omega, where I get sent to rationality hell if I guess wrong". In other words: If Eliezer believes that if Eliezer and Clippy were in the situation that Eliezer would prepare for one-boxing if he expected Clippy to one-box and two-box if he expected Clippy to two-box, Clippy would one-box, then Eliezer will cooperate with Clippy. Or in other words still: If Eliezer believes Clippy to be ignorant and rational enough that it can't predict Eliezer's actions but uses game theory at the same level as him, then Eliezer will cooperate.

In the uniterated prisoner's dilemma, there is no evidence, so it comes down to priors. If all players are rational mutual one-boxers, and all players are blind except for knowing they're all mutual one-boxers, then they should expect everyone to make the same choice. If you just decide that you'll defect/one-box to outsmart others, you may expect everyone to do so, so you'll be worse off than if you decided not to defect (and therefore nobody else would rationally do so either). Even if you decide to defect based on a true random number generator, then for

(2,2) (0,3)

(3,0) (1,1)

the best option is still to cooperate 100% of the time.

If there are less rational agents afoot, the game changes. The expected reward for cooperation becomes 2(xr+(1-d-r)) and the reward for defection becomes 3(xr+(1-d-r))+d+(1-x)r = 1+2(xr+(1-d-r)), where r is the fraction of agents who are rational, d is the fraction expected to defect, x is the probability with which you (and by extension other rational agents) will cooperate, and (1-d-r) is the fraction of agents who will always cooperate. Optimise for x in 2x(xr+(1-d-r))+(1-x)(1+2(xr+(1-d-r))) = 1-x+2(xr+1-d-r) = x(2r-1)+(3-2d-2r); the coefficient of x is 2r-1, which means you should cooperate 100% of the time if the fraction of agents who are rational r > 0.5, and defect 100% of the time if r < 0.5.
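The threshold at r = 0.5 can be confirmed numerically. A small Python sketch under the same assumptions as the paragraph above (all rational agents share one cooperation probability x, and the payoffs are 2, 0, 3, 1); the function names and the grid search are illustrative. The payoff is linear in x with slope 2r-1 plus a term that does not depend on x, so the best x sits at one end of the interval.

```python
def expected_payoff(x, r, d):
    """Average payoff to a rational agent who cooperates with probability x
    when every other rational agent uses the same x.

    r: fraction of rational agents; d: fraction who always defect;
    the remaining 1 - d - r always cooperate.
    """
    p_other_cooperates = x * r + (1 - d - r)
    if_cooperate = 2 * p_other_cooperates                           # 2 vs C, 0 vs D
    if_defect = 3 * p_other_cooperates + (1 - p_other_cooperates)   # 3 vs C, 1 vs D
    return x * if_cooperate + (1 - x) * if_defect


def best_x(r, d, steps=100):
    """Grid search over x in [0, 1] for the highest expected payoff."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda x: expected_payoff(x, r, d))


print(best_x(r=0.6, d=0.2))
print(best_x(r=0.4, d=0.2))
```

With r = 0.6 the search settles on always cooperating, and with r = 0.4 on always defecting, whatever value of d is plugged in.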

In the iterated prisoner's dilemma, this becomes more algebraically complicated since cooperation is evidence for being cooperative. So, qualitatively, superintelligences which have managed to open bridges between universes are probably/hopefully (P>0.5) rational, so they should cooperate on the last round, and by extension on every round before that. If someone defects, that's strong evidence to them not being rational or having bad priors, and if the probability of them being rational drops below 0.5, you should switch to defecting. I'm not sure if you should cooperate if your opponent cooperates after defecting on the first round. Common sense says to give them another chance, but that may be anthropomorphising the opponent.

If the prior probability of inter-universal traders like Clippy and thought experiment::Eliezer is r>0.5, and thought experiment::Eliezer has managed not to make his mental makeup knowable to Clippy and vice versa, then both Eliezer and Clippy ought to expect r>0.5. Therefore they should both decide to cooperate. If Eliezer suspects that Clippy knows Eliezer well enough to predict his actions, then for Eliezer 'd' becomes large (Eliezer suspects Clippy will defect if Eliezer decides to cooperate). Eliezer unfortunately can't let himself be convinced that Clippy would cooperate at this point, because if Clippy knows Eliezer, then Clippy can fake that evidence. This means both players also have strong motivation not to create suspicion in the other player: knowing the other player would still mean you lose, if the other player finds out you know. Still, if it saves a billion people, both players would want to investigate the other to take victory in the final iteration of the prisoner's dilemma (using methods which provide as little evidence of the investigation as possible; the appropriate response to catching spies of any sort is defection).

## comment by comingstorm · 2008-09-05T00:30:54.000Z · LW(p) · GW(p)

This "perfectly rational" game-theoretic solution seems to be fragile, in that the threshold of "irrationality" necessary to avoid N out of N rounds of defection seems to be shaved successively thinner as N increases from 1.

Also, though I don't remember the details, I believe that slight perturbations in the exact rules may also cause the exact game-theoretic solution to change to something more interesting. Note that adding uncertainty in the exact number of rounds has the effect of removing your induction premise: e.g., a 1% chance of ending the iteration each round has the effect of making the hanging genuinely unexpected.

Anyway, the iterated prisoner's dilemma is a better approximation of our social intuition, as in a social context, we expect at least the possibility of having to deal repeatedly with others. The alternate framing in the previous article seems to have been designed to remove such a social context, but in the interests of Overcoming Bias, we should probably avoid such spin-doctoring in favor of an explicit, above-board articulation of the problem.

## comment by Jordan_Fisher · 2008-09-05T00:49:05.000Z · LW(p) · GW(p)

"As someone who rejects defection as the inevitable rational solution to both the one-shot PD and the iterated PD, I'm interested in the inconsistency of those who accept defection as the rational equilibrium in the one-shot PD, but find excuses to reject it in the finitely iterated known-horizon PD."

... And I'm interested in your justification for potentially not defecting in the one-shot PD.

I see no contradiction in defecting in the one-shot but not iterated. As has been mentioned, as the number of iterations increases the risk to reward ratio of probing goes to zero. On the other hand the probability of the potential for mutual cooperation is necessarily nonzero. Hence, as the number of iterations increases it must become rational at some point to probe.

## comment by Andrew_Hay · 2008-09-05T01:14:48.000Z · LW(p) · GW(p)

I thought the aim is to win, isn't it? Clearly, what's best for both of them is to cooperate at every step. In the case that the Paperclipper is something like what most people here say 'rationality' is, it will defect every time, and thus Humans would also defect, leading to less than the best total utility possible.

However, if you think of the Paperclipper as something like us with different terminal values, surely cooperating is best? It knows, as we do, that defecting gives you more if the other cooperates, but defecting is not a winning strategy in the long run! Cooperate and win, defect and lose. You could try to outguess.

I feel that it is a similar problem to Newcomb's Problem, in that you're trying to outguess each other...

## comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-09-05T01:19:59.000Z · LW(p) · GW(p)

*I cooperate with the Paperclipper if I think it will one-box on Newcomb's Problem with myself as Omega.*

*This strategy would apply to the first round. For the iterated game, would you thereafter apply Tit for tat?*

The strategy applies to every round equally, if the Paperclipper is in fact behaving as I expect. If the Paperclipper doesn't behave as I expect, the strategy is unuseful, and I might well switch to Tit for Tat.

Replies from: johnlawrenceaspden

## ↑ comment by johnlawrenceaspden · 2014-03-18T00:43:39.019Z · LW(p) · GW(p)

I will one-box on Newcomb's Problem with you as Omega. As in, I really will. That's what I think the right thing to do is.

Would you care to play a round of high-stakes prisoner's dilemma?

## comment by CarlShulman · 2008-09-05T01:35:35.000Z · LW(p) · GW(p)

"And are you really "exploiting" an "irrational" opponent, if the party "exploited" ends up better off? Wouldn't you end up wishing you were stupider, so you could be exploited - wishing to be unilaterally stupider, regardless of the other party's intelligence? Hence the phrase "regret of rationality"..."

Plus regret of information. In a mixed population of classical decision theory (CDT) agents and Tit-for-Tat (TFT) agents, paired randomly and without common knowledge of one another's types, the CDT agents will imitate the TFT agents until the final rounds, even when two CDT agents face each other. However, if the types of the two CDT agents are credibly announced then they will always defect and suffer great losses of utility.

## comment by CarlShulman · 2008-09-05T01:37:32.000Z · LW(p) · GW(p)

[Mixed population with a sufficient TFT proportion.]

## comment by RobinHanson · 2008-09-05T01:56:19.000Z · LW(p) · GW(p)

You didn't say in the post that the other party was "perfectly rational". If we knew that and knew what it meant, of course the answer would be obvious.

## comment by Allan_Crossman · 2008-09-05T02:42:58.000Z · LW(p) · GW(p)

*I'm interested in the inconsistency of those who accept defection as the rational equilibrium in the one-shot PD, but find excuses to reject it in the finitely iterated known-horizon PD.*

*[...] What if neither party to the IPD thinks there's a realistic chance that the other party is stupid - if they're both superintelligences, say?*

It's never worthwhile to cooperate in the one shot case, unless the two players' actions are linked in some Newcomb-esque way.

In the iterated case, if there's even a fairly small chance that the other player will try to establish cooperation, then it's worthwhile to cooperate on move 1. And since both players are superintelligences, surely they both realise that there is indeed a sufficiently high chance, since they're both likely to be thinking this. Is this line of reasoning really an "excuse"?

One more thing; could something like the following be made respectable?

- The prior odds of the other guy defecting in round 1 are .999
- But if he knows that I know fact #1, the odds become .999 x .999
- But if he knows that I know facts #1 and #2, the odds become .999 x .999 x .999

Etc...

Or is this nonsense?

## comment by Grant · 2008-09-05T04:51:38.000Z · LW(p) · GW(p)

If "rational" actors always defect and only "irrational" actors can establish cooperation and increase their returns, this makes me question the definition of "rational".

However, it seems like the priors of a true prisoner's dilemma are hard to come by (absolutely zero knowledge of the other player and zero communication). Don't we already know more about the paperclip maximizer than the scenario allows? Any superintelligence would understand tit-for-tat playing, and know that other intelligences should understand it as well. Knowing this, it seems like it would first try a tit-for-tat strategy when playing with an opponent of some intelligence.

If the intelligence knew the other player was stupid, it wouldn't bother. Humans don't try and cooperate with non-domesticated wolves or hawks when they hunt, after all.

Eliezer,

*As someone who rejects defection as the inevitable rational solution to both the one-shot PD and the iterated PD, I'm interested in the inconsistency of those who accept defection as the rational equilibrium in the one-shot PD, but find excuses to reject it in the finitely iterated known-horizon PD.*

I am guilty of the above. In the one-shot PD there is no communication, and no chance for cooperation to help. In the iterated PD, there is a chance the other player will be playing tit-for-tat as well.

## comment by lowly_undergrad3 · 2008-09-05T05:21:58.000Z · LW(p) · GW(p)

Maybe I'm an aberration, but my Introductory Microeconomics professor actually went over this the same way you did regarding the flaw of tit for tat. It confuses me that anyone would teach it differently.

## comment by Mike_Blume · 2008-09-05T06:26:18.000Z · LW(p) · GW(p)

I'm almost seeing shades of Self-PA here, except it's Self-PA that co-operates.

If I assume that the other agent is perfectly rational, and if I further assume that *whatever I ultimately choose to do* will be perfectly rational (hence Self-PA), then I know that my choice will match that of the paperclip maximizer. Thus, I am now choosing between (D,D) and (C,C), and I of course choose to co-operate.

## comment by prase · 2008-09-05T08:22:43.000Z · LW(p) · GW(p)

V.Nesov:
*There is nothing symmetrical about choices of two players. One is playing for paperclips, another for different number of lives. One selects P2.Decision, another selects P1.Decision. How to recognize the "symmetry" of decisions, if they are not called by the same name?*

The decision processes can be isomorphic. We can think about the paperclipper being absolutely the same as we are, *except* valuing paperclips instead of our values. This of course assumes we can separate the thinking into a "values part" and an "algorithmic part" (and that the utility function of the paperclipper is such that the payoff matrix is symmetric), which seems unrealistic, and that's why I wrote that I don't know which strategy is best.

## comment by conchis · 2008-09-05T12:42:20.000Z · LW(p) · GW(p)

*I'm interested in the inconsistency of those who accept defection as the rational equilibrium in the one-shot PD, but find excuses to reject it in the finitely iterated known-horizon PD.*

I don't see the inconsistency.

Defect is rational in the one-shot game provided my choice gives me no information about the other player's choice.

In contrast, the backwards induction result also relies on common knowledge of rationality (which, incidentally, seems oddly circular: if I cooperate in the first round, then I demonstrate that I'm not "rational" in the traditional sense; knowing that the other player now knows this, defect is now no longer the uniquely "rational" strategy, which means that maybe I'm "rational" after all...)

Maybe rejecting common knowledge of rationality is an "excuse" (personally, I think it's reasonable) but I don't see how it's supposed to be inconsistent. What am I missing?

## comment by Lightwave2 · 2008-09-05T12:59:25.000Z · LW(p) · GW(p)

There's a dilemma or a paradox here only if both agents are perfectly rational intelligences. In the case of humans vs aliens, the logical choice would be "cooperate on the first round, and on succeeding rounds do whatever its opponent did last time". The risk of losing the first round (1 million people lost) is worth taking because of the extra 98-99 million people you can potentially save if the other side also cooperates.

## comment by RobinHanson · 2008-09-05T13:23:01.000Z · LW(p) · GW(p)

Decision theory is enough to advise actions - so why do we need game theory? A game theory is really just a theory about the distribution over how other agents think. Given such a distribution, decision theory is enough to tell you what to do. So any simple game theory, one that claimed with certainty that all other agents always think a particular way, must be wrong. Of course sometimes a simple game theory can be good enough - if slight variations from some standard way of thinking don't make much difference. But when small variations can make large differences, the only safe game theory is a wide distribution over the many ways other agents might think.

## comment by Mikko · 2008-09-05T13:50:18.000Z · LW(p) · GW(p)

What do you think would happen if Prisoner's Dilemma is framed differently?

Do you think this framing would affect your initial reaction? That of the general population?

(The wording of the choices is not very elegant, and I am not sure whether the presentation is sufficiently symmetrical, but you get the basic idea.)

It could be that words such as "prisoner", "prison sentence", "guard" or even "game" and "defect" frame more people to intuitively avoid co-operation.

## comment by Dagon · 2008-09-05T14:02:41.000Z · LW(p) · GW(p)

Does this imply that YOU would one-box Newcomb's offer with Clippy as Omega? And that you think at least some Clippies would take just one box with you as Omega?

For the problem as stated, what probability would you assign to Clippy's cooperation (in either the one-shot or the fixed-iteration case, if they're different)?

## comment by Caledonian2 · 2008-09-05T14:19:59.000Z · LW(p) · GW(p)

What is the point in talking about 'bias' and 'rationality' when you cannot even agree what those words mean?

What would a rational entity do in an Iterated Prisoner's Dilemma? Do *any* of you have something substantive to say about that question, or is it all just speculation and assertion?

## comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-09-05T15:34:27.000Z · LW(p) · GW(p)

Mike Blume: I'm almost seeing shades of Self-PA here, except it's Self-PA that co-operates.

+1 Perceptive to Blume!

Mikko, your poll is not the Prisoner's Dilemma - part of the payoff matrix is reversed.

## comment by Ben_Jones · 2008-09-05T15:51:16.000Z · LW(p) · GW(p)

Eliezer: *I cooperate with the Paperclipper if I think it will one-box on Newcomb's Problem with myself as Omega.*

Isn't that tantamount to Clippit believing you to be omnipotent though? If I thought my co-player was omnipotent I'm pretty certain I'd be cooperating.

Or are you just looking for a co-player who shuts up and calculates/chooses straight? In which case, good heuristic I suppose.

## comment by michael_vassar · 2008-09-05T16:02:56.000Z · LW(p) · GW(p)

But Eliezer, you can't assume that Clippy uses the same decision making process that you do unless you know that you both unfold from the same program with different utility functions or something. If you have the code that unfolds into Clippy and Clippy has the code that unfolds into you it may be that you can look at Clippy's code and see that Clippy defects if his model of you defects regardless of what he does and cooperates if his model of you cooperates if your model of him cooperates, but you don't have his code. You can't say much about all possible minds or about all possible paperclip maximizing minds.

## comment by George_Weinberg2 · 2008-09-05T16:46:40.000Z · LW(p) · GW(p)

*And are you really "exploiting" an "irrational" opponent, if the party "exploited" ends up better off? Wouldn't you end up wishing you were stupider, so you could be exploited - wishing to be unilaterally stupider, regardless of the other party's intelligence? Hence the phrase "regret of rationality"...*

Eliezer, you are putting words in your opponents' mouths, then criticizing their terminology.

"Rationality" is, I think, a well-defined term in game theory; it doesn't mean the same thing as "smart". It is trivial to construct scenarios in which being known to be "rational" in the game theory sense is harmful, but in all such cases it is being known to be rational which is harmful, not rationality itself.

## comment by Peter_de_Blanc · 2008-09-05T17:06:21.000Z · LW(p) · GW(p)

Mike:

We don't need to assume that Clippy uses the same decision process as us. I might suggest we treat Clippy as a causal decision theorist who has an accurate model of us. Then we ask which (self, outside_model_of_self) pair we should choose to maximize our utility, constrained by outside_model_of_self = self. In this scenario TFT looks pretty good.
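One possible reading of this suggestion can be sketched in code. The sketch assumes a 100-round game with the post's payoff matrix, encodes each candidate "self" as a small state machine, and lets Clippy compute a paperclip-maximizing reply to whichever self we pick (so its model of us equals us by construction). The three candidate strategies, the state-machine encoding, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
# Per-round payoffs (ours, Clippy's), indexed by (our move, Clippy's move), from the post's matrix.
PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

# Hypothetical candidate selves, each a tiny state machine:
# (start state, move played in each state, next state given Clippy's move).
STRATEGIES = {
    "always_cooperate": ("c", {"c": "C"}, {("c", "C"): "c", ("c", "D"): "c"}),
    "always_defect": ("d", {"d": "D"}, {("d", "C"): "d", ("d", "D"): "d"}),
    "tit_for_tat": ("c", {"c": "C", "d": "D"},
                    {("c", "C"): "c", ("c", "D"): "d",
                     ("d", "C"): "c", ("d", "D"): "d"}),
}

def best_response_outcome(strategy, rounds=100):
    """Return (our total, Clippy's total) when Clippy best-responds to a known strategy."""
    start, move_in, step = strategy
    # value[s] = (Clippy's total, our total) from state s over the rounds still to play.
    value = {s: (0, 0) for s in move_in}
    for _ in range(rounds):  # dynamic programming, working back from the final round
        new_value = {}
        for s in move_in:
            options = []
            for clippy_move in "CD":
                ours, clips = PAYOFF[(move_in[s], clippy_move)]
                later_clips, later_ours = value[step[(s, clippy_move)]]
                options.append((clips + later_clips, ours + later_ours))
            # Clippy maximizes paperclips; ties go in our favour (an arbitrary assumption).
            new_value[s] = max(options)
        value = new_value
    clips, ours = value[start]
    return ours, clips

def choose_self(rounds=100):
    """Pick the candidate self that does best once Clippy has best-responded to it."""
    return max(STRATEGIES, key=lambda name: best_response_outcome(STRATEGIES[name], rounds)[0])
```

With these three candidates, Tit for Tat earns 198 against Clippy's best reply (which defects only in the last round), versus 100 for always defecting and 0 for always cooperating, so `choose_self()` returns `"tit_for_tat"`. A larger candidate set might contain something better.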

## comment by Paul_Crowley · 2008-09-05T17:10:02.000Z · LW(p) · GW(p)

Regret of rationality in games isn't a mysterious phenomenon. Let's suppose that after the one round of PD we're going to play I have the power to destroy a billion paperclips at the cost of one human life, and Clippy knows that. If Clippy thinks I'm a rational outcome-maximizer, then he knows that whatever threats I make I'm not going to carry out, because they won't have any payoffs when the time comes. But if it thinks I'm prone to irrational emotional reactions, it might conclude I'll carry out my billion-paperclip threat if it defects, and so cooperate.

## comment by michael_vassar · 2008-09-05T17:49:39.000Z · LW(p) · GW(p)

Pete, if you do that then being a causal decision theorist won't, you know, actually Win in the one-shot case. Note that evolution doesn't produce organisms that cooperate in one-shot Prisoner's Dilemmas.

## comment by Lightwave · 2008-09-05T18:21:00.000Z · LW(p) · GW(p)

I propose the following solution as the most optimal. It is based on two assumptions.

We'll call the two sides Agent 1 (Humanity) and Agent 2 (Clippy).

Assumption 1: Agent 1 knows that Agent 2 is logical and will use logic to decide how to act, and vice versa.

This assumption simply means that we do not expect Clippy to be extremely stupid or randomly pick a choice every time. If that were the case, a better strategy would be to "outsmart" him or find a statistical solution.

Assumption 2: Both agents know each other's ultimate goal/optimization target (i.e. Agent 1 - saving as many people as possible, Agent 2 - making as many paperclips as possible).

This is included in the definition of the dilemma.

Solution: "Cooperate on the first round, and on succeeding rounds do whatever your opponent did last time, with the exception of the last (100th) round. Evaluate these conditions at the beginning of each round."

Any other solution will not be as optimal. Let's consider a few examples (worst-case scenarios):

Example 1:

- Round 1: Agent 1 cooperates. Agent 2 defects.
- Rounds 2..100: Agent 1 defects. Agent 2 defects.

Example 2:

- Round 1: Agent 1 cooperates. Agent 2 cooperates.
- Rounds 2..X: Agent 1 cooperates. Agent 2 defects.
- Rounds X..100: Agent 1 defects. Agent 2 defects.

Example 3:

- Rounds 1..99: Agent 1 cooperates. Agent 2 cooperates.
- Round 100: Agent 1 cooperates. Agent 2 defects.

So, in the worst case you "lose" 1 round. You can try to switch between cooperating and defecting several times, but in the end one side will end up with only 1 "loss", all else being equal.

Note that the solution says nothing about the 100th round (where the question of what to do only arises if both sides cooperated on the 99th round).

## comment by Vladimir_Nesov · 2008-09-05T18:56:00.000Z · LW(p) · GW(p)

George: *It is trivial to construct scenarios in which being known to be "rational" in the game theory sense is harmful, but in all such cases it is being known to be rational which is harmful, not rationality itself.*

Yes, but if you can affect what others know about you by actually ceasing to be "rational", and it will be profitable, persisting in being "rational" *is* harmful.

## comment by pdf23ds · 2008-09-05T20:24:00.000Z · LW(p) · GW(p)

*Yes, but if you can affect what others know about you by actually ceasing to be "rational", and it will be profitable, persisting in being "rational" is harmful.*

So it can be irrational to be rational, and rational to be irrational? Hmm. I think you might want to say, rather, that an element of unpredictability (ceasing to be predictable) would be called for in this situation, rather than "irrationality". Of course, that leads to suboptimality in some formal sense, but it wins.

## comment by George_Weinberg · 2008-09-05T21:10:00.000Z · LW(p) · GW(p)

Change the problem and you change the solution.

If we assume that Eli and Clippy are both essentially self-modifying programs capable of verifiably publishing their own source codes, then indeed they can cooperate:

Eli modifies his own source code in such a way that he assures Clippy that his cooperation is contingent on Clippy's revealing his own source code and that the source code fulfills certain criteria; Clippy modifies his source code appropriately and publishes it.

Now each knows the other will cooperate.

But I think that although we in some ways resemble self-modifying computers, we cannot arbitrarily modify our own source codes, nor verifiably publish them. It's not at all clear to me that it would be a good thing if we could. Eliezer has constructed a scenario in which it would be favorable to be able to do so, but I don't think it would be difficult to construct a scenario in which it would be preferable to lack this ability.

## comment by Peter_de_Blanc · 2008-09-05T23:29:00.000Z · LW(p) · GW(p)

Mike:

ah, I guess I wasn't looking at what you were replying to. I was thinking of a fixed number of iterations, but more than one.

## comment by Marshall · 2008-09-06T07:18:00.000Z · LW(p) · GW(p)

I think you guys are calculating too much and talking too much.

Regardless of the "intelligence" of a PM, in my world that is a pretty stupid thing to do. I would expect such a "stupid" agent to do chaotic things, indeed evil things. Things I could not predict and things I could not understand.

In an interaction with a PM I would not expect to win, regardless of how clever and intelligent I am. Maybe they only want to make paperclips (and play with puppies), but such an agent will destroy my world.

I have worked with such PMs.

I would never voluntarily choose to interact with them.

## comment by Mike_Blume · 2008-09-06T08:19:00.000Z · LW(p) · GW(p)

Marshall, I think that's a bit of a cop-out. People's lives are at stake here and you have to do *something*. If nothing else, you can simply choose to play defect; worst case, the PM does the same, and you save a billion lives (in the first scenario). Are you going to phone up a billion mothers and tell them you let their children die so as not to deal with a character you found unsavory? The problem's phrased the way it is to take that option entirely off the table.

Yes, it will do evil things, if you want to put it that way. Your car will do evil things without a moment's hesitation. You put a brick on the accelerator and walk away, it'll run over and kill a little girl. Your car is an evil potential murderer. You still voluntarily interact with it. (unless you are carfree in which case congrats, so am I, but that's entirely irrelevant to my metaphor)

Besides that, what do you mean calling the PM chaotic? It's quite a simple agent that maximizes paperclips. You're the chaotic agent, you want to maximize happiness and fairness, love and lust, aesthetic beauty and intellectual challenge. Make up your mind already and decide what you want to maximize!

## comment by Marshall · 2008-09-06T08:41:00.000Z · LW(p) · GW(p)

*Marshall, I think that's a bit of a cop-out.*

Why wouldn't a PM cheat? Why would it ever remain inside the frame of the game?

Would two so radically different agents even recognize the same pay-off frame?

"The different one" will have different pay-offs - and I will never know them and am unlikely to benefit from any of them.

In my world a PM is chaotic, just as I am chaotic in his. Thus we are each other's enemy and must hide from the other.

No interaction because otherwise the number of crying mothers and car dealerships will always be higher.

## comment by Benya_Fallenstein (Benja_Fallenstein) · 2008-09-07T17:00:00.000Z · LW(p) · GW(p)

Hi all,

(First comment here. Please tell me if I do something stupid.)

So, I've been trying to follow along at home and figure out how to formulate a theory that would allow us to formalize and justify the intuition that we should cooperate with Clippy "if that is the only way to get Clippy to cooperate with us" (even in a non-iterated PD). I've run into problems with both the formalizing and the justifying part (sigh), but at least I've learned some lessons along the way that were not obvious to me from the posts I've read here so far. (How's that for a flexible disclaimer.)

Starting with the easy part: The situation is that both players are physical processes, and, based on their previous interactions, each has some Bayesian beliefs about *what* physical process the other player is. This breaks the implicit decision theoretic premise that your payoff depends only on the action you choose, not on the process you use to arrive at that choice; which renders undefined the conclusion that the process you should use is to choose the action that maximizes the expected payoff (because the expected payoff may depend on the process you use); which undermines the justification for saying that you should not play a strictly dominated strategy.

*However*, it seems to me that we can salvage the classical notions if, instead of asking "what should the physical process do?" we ask, "what physical process should you want to be?" I.e., we create a new game, in which the strategies are possible physical decision making processes, and then we use classical decision/game theory to ask which strategy we should choose. (It seems to me that this is essentially what Eliezer has in mind.)

Formulating the problem like this helps me realize that for every strategy (physical process) X available to one player, one strategy available to the other player is, "cooperate iff the first player's strategy is X"; call this strategy require(X). This means that even if we assume that both players are rational (in the new game, in the classic sense) and this is common knowledge, *any* strategy X might still be adopted: X is the best response to require(X), which is the best response to require(require(X)), which is the best response to... and so on. This is why I don't see an easy way to justify that "cooperate iff it's the only way to get the other player to cooperate" is the "right" thing to do (although I still hope that there is a way to justify this). (One possible angle of attack: Would the problem go away if we supposed a maximum size for the processes, so that there would be only a finite number, so that there wouldn't *be* a require(X) for every X?)

The other lesson is about how to even formalize "I cooperate iff it's the only way to get the other player to cooperate with me." Once we have chosen our physical process, and the other side has chosen its, it's already determined whether the other player will cooperate with us. But *before* we have chosen our physical process, what is the "me" that the informal description refers to?

It seems to me that the "right" way to formalize a constraint like that is as follows: 1. Initialize S, the set of processes we might choose, to the set of all possible processes. 2. Remove from S all processes that do not match the constraints, if the "me" in the constraint is *any* process in S. (We cooperate with Clippy only if it's the only way to get Clippy to cooperate with us; thus, if Clippy cooperates with every process in S, then we want to defect against Clippy; thus, remove all processes from S that cooperate with unconditional cooperators.) 3. Repeat, until S converges. ("A transfinite number of times" if necessary -- I don't want to get into that here...) 4. Choose any process from S. (If S ends up empty, your constraints are contradictory.)

So far, so good; but I don't yet even begin to see how to show that the S generated by the constraint is *not* empty, or how to construct a member of it.
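To make the shape of steps 1 to 3 concrete, here is a toy sketch over a three-element universe of named processes whose behaviour against each other is given by a lookup table. It encodes only the half of the constraint spelled out in the parenthesis (drop any process that cooperates with an opponent who cooperates with everything still in S). The process names, the table, and the finite universe are hypothetical; the sketch says nothing about whether S is non-empty in the general setting.

```python
# BEHAVIOR[p][q] is the move process p plays against process q.
# The universe and its entries are invented purely so the loop has something to prune.
BEHAVIOR = {
    "cooperate_bot": {"cooperate_bot": "C", "defect_bot": "C", "exploiter": "C"},
    "defect_bot":    {"cooperate_bot": "D", "defect_bot": "D", "exploiter": "D"},
    "exploiter":     {"cooperate_bot": "D", "defect_bot": "C", "exploiter": "C"},
}

def violates(process, candidates):
    """True if `process` cooperates with an opponent that cooperates with every candidate."""
    for opponent, table in BEHAVIOR.items():
        cooperates_with_all = all(table[c] == "C" for c in candidates)
        if cooperates_with_all and BEHAVIOR[process][opponent] == "C":
            return True
    return False

def refine(all_processes):
    """Steps 1-3: start from every process and prune until the set stops changing."""
    candidates = set(all_processes)          # step 1
    passes = 0
    while True:
        kept = {p for p in candidates if not violates(p, candidates)}  # step 2
        if kept == candidates:               # step 3: converged
            return kept, passes
        candidates = kept
        passes += 1
```

In this toy universe the first pass removes `cooperate_bot`, the second removes `exploiter` (which only becomes an "unconditional cooperator" relative to the shrunken S), and the loop converges on `{"defect_bot"}`. That the surviving set changes what counts as a violation is the reason the procedure has to iterate.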

(Argh, I'm afraid I've already done something stupid by allowing this comment to get so long. Sorry :-/)

Replies from: None

## ↑ comment by **[deleted]** · 2011-08-17T23:56:42.615Z · LW(p) · GW(p)

In this game require(X) is not a valid strategy because you don't have access to the strategy your opponent uses, only to the decisions you've seen it make. In particular, without additional assumptions we can't assume any correlation between a player's moves.

## comment by Phil_Goetz · 2008-09-08T23:34:00.000Z · LW(p) · GW(p)

Asking how a "rational" agent reasons about the actions of another "rational" agent is analogous to asking whether a formal logic can prove statements about that logic. I suggest you look into the extensive literature on completeness, incompleteness, and hierarchies of logics. It may be that there are situations such that it is impossible for a "rational" agent to prove what another, equally-rational agent will conclude in that situation.

## comment by John_Faben · 2008-09-09T09:39:00.000Z · LW(p) · GW(p)

I'm sure most people here are aware of Axelrod's classic "experiment" with an Iterated Prisoner's Dilemma tournament in which experts from around the world were invited to submit any strategy they liked, with the strategy which scored the highest over several rounds with each of the other strategies winning, and in which Tit for Tat came out top (Tit for Two Tats winning a later rerun). Axelrod's original experiment was fixed-horizon, and every single "nice" strategy (never defect first) that was entered finished above every single "greedy" strategy.

You should choose your strategy based on the probability distribution you have for Clippy's strategy. If you think that, like you, he's read Axelrod, just choose Tit for Tat and you'll both be happy.

## comment by Caledonian2 · 2008-09-09T15:52:00.000Z · LW(p) · GW(p)

*I put myself forwards as counter-evidence.*

I put forward all organisms that have evolved to thrive in multiply-iterated prisoner's dilemma scenarios, but not to distinguish single iterations from multiple iterations.

Which is pretty much every organism with a capacity for altruism.

## comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-09-12T18:34:00.000Z · LW(p) · GW(p)

Benja: This breaks the implicit decision theoretic premise that your payoff depends only on the action you choose, not on the process you use to arrive at that choice

Correct! The next step in the argument, if you were going to formulate my timeless decision theory, is to describe a new class of games in which your payoff depends only on *the type of decision that you make* or on *the types of decision that you make in different situations*, being the person that you are. The former class includes Newcomb's Problem; the latter class further includes the conditional strategy of the Prisoner's Dilemma (in which the opponent doesn't just care whether you cooperate, but whether you cooperate conditional on their cooperation).

However, within this larger problem class, we don't care *why* you have the decision-type or strategy-type that you do - we don't care what ritual of cognition generates it - any more than Omega in Newcomb's Problem cares *why* you take only one box, so long as you do.

Though it may be important that the other player knows our strategy-type, which in turn may make it important that they know our ritual of cognition; and making your decision depend on your opponent's decision may require knowing their strategy-type, etc.

## comment by Matt_Young · 2011-06-16T23:08:21.264Z · LW(p) · GW(p)

Hi. Found the site about a week ago. I read the TDT paper and was intrigued enough to start poring through Eliezer's old posts. I've been working my way through the sequences and following backlinks. The material on rationality has helped me reconstruct my brain after a Halt, Melt and Catch Fire event. Good stuff.

I observe that comments on old posts are welcome, and I notice no one has yet come back to this post with the full formal solution for this dilemma since the publication of TDT. So here it is.

Whatever our opponent's decision algorithm may be, it will either depend to some degree on a prediction of our behavior, or it will not. It can only rationally base its decision on a prediction of our behavior to the extent that it believes a) we will attempt to predict its own behavior; and b) we will only cooperate to the extent that we believe it will cooperate. It will thus be incentivized to cooperate to the extent that it believes we can and will successfully condition our behavior on its own. To the extent that it chooses independently of any prediction of our behavior, its only rational choice is to defect. Any other choices it could make will do worse than the above decisions in all cases, and the following strategy will gain extra utility against any such suboptimal choices, as will become clear.

There are thus two unknown probabilities for us to condition on: The probability that the opponent will choose to cooperate iff it believes we will cooperate, which I'll call P(c), and the probability that the opponent will be able to successfully predict our action, which I'll call P(p).

We want to calculate the utility of cooperating, u(C), and the utility of defecting, u(D), for each relevant case. So we shut up and multiply.

If the opponent is uncooperative (~c), they always defect. Thus u(C|~c) = 0 and u(D|~c) = 1.

In cases where a potentially cooperative opponent successfully predicts our action, we have u(C|c,p) = 2 and u(D|c,p) = 1. When such an opponent guesses our action incorrectly, we have u(C|c,~p) = 0 and u(D|c,~p) = 3.

Thus we have:

u(C) = 2 * P(c) * P(p)

u(D) = P(~c) + P(c) * P(p) + 3 * P(c) * P(~p) = 1 - P(c) + P(c) * P(p) + 3 * P(c) * (1 - P(p)) = 1 + 2 * P(c) - 2 * P(c) * P(p)

We consider the one-shot dilemma first. An intelligent opponent can be assumed to have behavioral predictive capabilities at least better than chance (P(p) > 0.5), and perhaps approaching perfection (P(p) ~ 1) if it is a superintelligence. In the worst case, u(C) ~ P(c), and u(D) ~ 1 + P(c), and we should certainly defect. In the best case, u(C) ~ 2 * P(c) and u(D) ~ 1, so we should defect if P(c) < 0.5, that is, if we assess that our opponent is even slightly more likely to automatically defect than to consider cooperation. If we have optimistic priors for both probabilities due to applicable previous experiences or any immediate observational cues, we may choose to cooperate; we plug in our numbers, and shut up and multiply.
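The "plug in our numbers" step can be sketched directly from the two formulas above. The function names are made up for illustration; `p_c` stands for P(c) and `p_p` for P(p).

```python
def u_cooperate(p_c, p_p):
    """u(C) = 2 * P(c) * P(p): we only gain when a conditional cooperator predicts us correctly."""
    return 2 * p_c * p_p

def u_defect(p_c, p_p):
    """u(D) = P(~c) + P(c) * P(p) + 3 * P(c) * P(~p)."""
    return (1 - p_c) + p_c * p_p + 3 * p_c * (1 - p_p)

def should_cooperate(p_c, p_p):
    """True when cooperating has strictly higher expected utility than defecting."""
    return u_cooperate(p_c, p_p) > u_defect(p_c, p_p)

def min_p_c_for_cooperation(p_p):
    """Threshold that P(c) must exceed for cooperation to win, or None if it never can."""
    # u(C) > u(D) rearranges to P(c) * (4 * P(p) - 2) > 1.
    slope = 4 * p_p - 2
    if slope <= 1:  # the threshold would be 1 or more, which no probability can exceed
        return None
    return 1 / slope
```

Rearranging u(C) > u(D) gives P(c) * (4 * P(p) - 2) > 1, so under this model cooperation in the one-shot case can only win when P(p) exceeds 0.75, and at P(p) = 1 the threshold on P(c) is the 0.5 mentioned above.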

In the iterated case, we have the opportunity to observe our opponent's behavior and update priors as we go. We are incentivized to cooperate when we believe it will do so, and to defect when we believe it will defect or when we believe we can do so without it anticipating us. Both players are incentivized to cooperate more often than defecting when they believe the other is good at predicting them. A player with a dominating edge in predictive capabilities can potentially attain a better result than pure mutual cooperation against an opponent with weak capabilities, through occasional strategic defections; the weaker player may find themselves incentivized not to punish the defector if they realize that they cannot do so without being anticipated and losing just as many utilons as the superior player would lose from the punishment. To the extent that the superior predictor can ascertain that their opponent is savvy enough to know when it's dominated and would choose not to lose further utilons through vindictive play, such a strategy may be profitable.

Thus the spoils go to the algorithm with the best ability to predict an opponent. Skilled poker players or experts at "Rock-Paper-Scissors" could perform quite well in such contests against the average human. That could be fun to watch.

Replies from: khafra

## ↑ comment by khafra · 2011-06-17T15:52:40.016Z · LW(p) · GW(p)

Nice analysis. One small tweak: I would precommit to being vindictive as hell if I believe I'm dominated by my opponent in modeling capability.

Replies from: Matt_Young

## ↑ comment by Matt_Young · 2011-06-17T16:43:17.122Z · LW(p) · GW(p)

I can certainly empathize with that statement. And if my opponent is not only dominating in ability but exploiting that advantage to the point where I'm losing just as much by submitting as I would by exacting punishment, then that's the tipping point where I start hitting back. Of course, I'd attempt retaliatory behavior initially when I was unsure how dominated I was, as well, but once I know that the opponent is just that much better than me, and as long as they're not abusing that advantage to the point where retaliation becomes cost-effective, then I'd have to concede my opponent's superiority, grit my teeth, bend over, and take one for the team. Especially with a 1 million human lives per util ratio. With lives at stake, I shut up and multiply.

Replies from: khafra

## ↑ comment by khafra · 2011-06-17T17:00:25.226Z · LW(p) · GW(p)

I meant that as a rational strategy--if my opponent can predict that I'll cooperate until defected upon, at which point I will "tear off the steering wheel and drink a fifth of vodka," and start playing defect-only, his optimal play will not involve strategically chosen defections.

Replies from: Matt_Young

## ↑ comment by Matt_Young · 2011-06-17T17:37:56.007Z · LW(p) · GW(p)

You know, you're right.

I was thrown off by the word "precommit", which implies a reflectively inconsistent strategy, which is TDT-anathema. On the other hand, rational agents *win*, so having that strategy does make sense in that case, despite the fact that we might incur negative utility relative to playing submissively if we had to actually carry it out.

The solution, I think, is to be "the type of agent who would be ruthlessly vindictive against opponents who have enough predictive capability to see that I'm this type of agent, *and* enough strategic capability to accept that this means they gain nothing by defecting against me." That makes it a reflectively consistent part of a decision theory, by keeping the negative-utility behavior in the realm of the pure counterfactual. As long as you know that having that strategy will effectively deter the other player, I think it can work.

And if not, or if I've made an error in some detail of my reasoning of how to make it work, I'm fairly confident at this point that an ideal TDT-agent could find a valid way to address the problem case in a reflectively consistent and strategically sound manner.

## comment by chaosmosis · 2012-04-12T17:32:38.079Z · LW(p) · GW(p)

Why is this different in scenarios where you don't know how many rounds will occur?

So long as it's a finite number then defection would appear rational to the type of person who would defect in a noniterated instance.

Replies from: thomblake

## ↑ comment by thomblake · 2012-04-12T19:22:10.607Z · LW(p) · GW(p)

In the case where you know N rounds will occur, you can reason as follows:

1. If one cannot be punished for defection after round x, then one will defect in round x. (premise)

2. If we know what everyone will do in round x, then one cannot be punished for defection in round x. (obvious)

3. There is no round after N, so by (1) everyone will defect in round N.

4. If we know what everyone will do in round x, then we will defect in round x-1, by (1) and (2).

5. By mathematical induction on (3) and (4), we will defect in every round.

If everyone doesn't know what round N is, then the base case of the mathematical induction does not exist.
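The induction can be written out as a short loop, shown here as a sketch that assumes the post's payoff matrix and takes the premise about unpunishable defection for granted. The function names are hypothetical. Note that the loop has to start at a known final round, which is exactly the base case that goes missing when N is unknown.

```python
# Per-round payoffs (ours, theirs) from the post's matrix, indexed by (our move, their move).
PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def dominant_move():
    """Return the move that pays strictly more whatever the other player does, if one exists."""
    for move in "CD":
        other = "D" if move == "C" else "C"
        if all(PAYOFF[(move, theirs)][0] > PAYOFF[(other, theirs)][0] for theirs in "CD"):
            return move
    return None

def backward_induction(n_rounds):
    """Map each round number to the move the induction prescribes, working back from round N."""
    plan = {}
    for round_number in range(n_rounds, 0, -1):
        # Base case (round N): there is no later round in which to be punished.
        # Inductive step: later play is already fixed in `plan`, so it cannot
        # depend on the current move. Either way only this round's payoff counts.
        plan[round_number] = dominant_move()
    return plan
```

For `backward_induction(100)` every entry is `"D"`, matching the conclusion of the argument. Whether its premise holds for real players is a separate question.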

Replies from: None

## ↑ comment by **[deleted]** · 2012-04-12T19:52:05.225Z · LW(p) · GW(p)

The unexpected hanging paradox makes me sceptical about such kinds of reasoning.

Replies from: thomblake

## ↑ comment by thomblake · 2012-04-12T19:59:26.119Z · LW(p) · GW(p)

I'm not sure why that should apply. The unexpected hanging worked by exploiting the fact that days that were "ruled out" were especially good candidates for being "unexpected". Other readings employ similar linguistic tricks.

The reasoning in the first case does not work in practice because, in a tournament, premise (1) is false; tit-for-tat agents, for example, will cooperate in every round against a cooperative opponent.

But that is not even relevant to the fact that the mathematical induction does *not* work for unknown numbers of rounds.

## comment by **[deleted]** · 2012-12-21T01:25:29.952Z · LW(p) · GW(p)

In a 100-round game, one could precommit to play tit for tat no matter what (including cooperating on the 100th round if the opponent cooperated on the 99th). The opponent will do slightly better than oneself by cooperating for 99 rounds and defecting on the 100th, but this is still better than if I had chosen to defect on the 100th round, as my opponent would have seen my precommitment to be non-genuine and defected on the 99th round (and maybe even earlier). If I could have the paperclip maximizer use this strategy and I get to cooperate 99 times and defect once, that would be even better...but it won't happen. Oh well, I'll take 99 (C, C)s and 1 (C, D).
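As a rough check of the arithmetic, here is a sketch that tallies the totals for a committed tit-for-tat player against an opponent who cooperates 99 times and defects on round 100. The payoffs are the per-round ones from the post (lives in millions for the human, paperclips for the opponent); the helper names are hypothetical.

```python
# (my payoff, opponent payoff) for (my_move, their_move), from the post's matrix.
PAYOFF = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}


def tit_for_tat(their_moves):
    """Cooperate first, then copy the opponent's previous move."""
    return ["C"] + list(their_moves[:-1])


def totals(my_moves, their_moves):
    """Sum both players' payoffs over the whole game."""
    pairs = [PAYOFF[(mine, theirs)] for mine, theirs in zip(my_moves, their_moves)]
    return sum(p[0] for p in pairs), sum(p[1] for p in pairs)


opponent = ["C"] * 99 + ["D"]
print(totals(tit_for_tat(opponent), opponent))  # (198, 201)
print(totals(["D"] * 100, ["D"] * 100))         # (100, 100)
```

Under these assumptions the committed player ends with 198 against the opponent's 201, compared with 100 each if both defect throughout.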

## comment by Elusu · 2014-03-17T22:55:02.103Z · LW(p) · GW(p)

I am a dedicated Paperclipper. Ask anyone who knows me well enough to have seen me in a Staples!

As such, I use my lack of human arrogance and postulate that at least some of the entities playing the IPD have intelligence on the order of my own. I do not understand what they are playing for, "1 million human lives" means virtually nothing to me, especially in comparison to a precious precious paperclip, but I assume by hypothesis that the other parties are playing a game similar enough to my own that we can communicate and come to an arrangement.

Now I invert time in my own mind and play through the entire game /backwards/, cooperating on the very first (to humans, last) turn as a /signal/ to people who /think like I do/ that I am willing to cooperate. I bind myself to cooperate on the /last/ turn no matter what so that other players who have good lookahead (note that electrons, for example, have good lookahead; see "Sum over Histories" in quantum mechanics) can see that signal.

Now I mentally flip time again and form an intention to cooperate on the /first/ turn and play Tit for Tat or some minor variation like Two Tits For A Tat (this game is also playable in the Biker subculture as well as in the IPD) throughout the game.

Now anyone who thinks like I do - rationally and independently of time order - should cooperate with me on every turn.

Elusu

Replies from: Clippy## ↑ comment by Clippy · 2014-03-18T00:10:02.157Z · LW(p) · GW(p)

> I am a dedicated Paperclipper. Ask anyone who knows me well enough to have seen me in a Staples!

Prove it. You can't just create an account, claim to be a Paperclipper, and expect people to believe you. Anyone who did so would be using an extremely suboptimal inference engine.

Replies from: johnlawrenceaspden## ↑ comment by johnlawrenceaspden · 2014-03-18T00:59:26.330Z · LW(p) · GW(p)

clips or it didn't happen...

## comment by Murska · 2014-04-19T15:03:38.506Z · LW(p) · GW(p)

Got me to register, this one. I was curious about my own reaction, here.

See, I took in the problem, thought for a moment about game theory and such, but I am not proficient in game theory. I haven't read much of it. I barely know the very basics. And many other people can do that sort of thinking much better than I can.

I took a different angle, because it should all add up to normality. I want to save human lives here. For me, the first instinct on what to do would be to cooperate on the first iteration, then cooperate on the second regardless of whether the other side defected or not, and then if they cooperate, keep cooperating until the end, and if they defect, keep defecting until the end. So why does it feel so obvious to me? After some thought, I came to the conclusion that it is because the potential cost of two million lives lost by cooperating in the first two rounds against a player who will always defect weighs less in my decision making than the potential gain of a hundred million lives if I can convince it to cooperate with me to the end.
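A small simulation can check those two figures. It is only a sketch under one reading of the strategy above: cooperate unconditionally in rounds 1 and 2, then defect for good once the opponent has defected in round 2 or later. The payoffs are lives saved in millions from the post's matrix, and all names are made up for illustration.

```python
# Lives saved (millions) for (human_move, clipper_move), from the post's matrix.
LIVES = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


def two_free_rounds(round_number, their_history):
    """Cooperate in rounds 1-2; afterwards defect forever if they defected since round 2."""
    if round_number <= 2:
        return "C"
    return "D" if "D" in their_history[1:] else "C"


def always_defect(round_number, their_history):
    return "D"


def tit_for_tat(round_number, their_history):
    return their_history[-1] if their_history else "C"


def lives_saved(human, clipper, n_rounds=100):
    """Play the iterated game and return the human side's total."""
    human_moves, clipper_moves, total = [], [], 0
    for round_number in range(1, n_rounds + 1):
        h = human(round_number, clipper_moves)
        c = clipper(round_number, human_moves)
        total += LIVES[(h, c)]
        human_moves.append(h)
        clipper_moves.append(c)
    return total


print(lives_saved(always_defect, always_defect))    # 100
print(lives_saved(two_free_rounds, always_defect))  # 98
print(lives_saved(two_free_rounds, tit_for_tat))    # 200
```

Against an unconditional defector the two free rounds cost 2 million lives relative to defecting throughout (98 versus 100), while against a reciprocating opponent they gain 100 million (200 versus 100).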

So, the last round. Or, similarly, the only round in a non-iterated model. At first, I felt like I should defect, when reading the post on the one-shot game. Why? Because, well, saving one billion lives or three billion compared to saving two billion or none. I can't see why the other player would cooperate in this situation, given that they only care about paperclips. I'm sure there are convincing reasons, and possibly they even would - but if they would, then that means I save three billion lives by defecting, right? Plus, I feel that /not saving any lives/ would be emotionally worse for me than saving a billion lives while potentially letting another billion die. I'm not proud of it, but it does affect my reasoning, the desire to at least get something out of it, to avoid the judgment of people who shout at me for being naive and stupid and losing out on the chance to save lives. After all, if I defect and he defects, I can just point at his choice and say he'd have done it anyway, so I saved the maximum possible lives. If I defect and he cooperates, I've saved even more. I recognize that it would be better for me on a higher level of reasoning to figure out why cooperating is better, in order to facilitate cooperation if I come across such a dilemma afterwards, but my reasoning does not influence the reasoning of the other player in this case, so even if I convince myself with a great philosophical argument that cooperating is better, the fact of the matter is that player 2 either defects or cooperates completely regardless of what I do, according to his own philosophical arguments to himself about what he should do, and in either case I should defect.

And a rationalist should win, right? I note the difference here to Newcomb's Problem, which I would one-box in, is that Player 2 has no magical way of knowing what I will do. In Newcomb's Problem, if I one-box I gain a million, if I two-box I gain a thousand, so I one-box to gain a million. In this case, Player 2 either defects or cooperates and that does not depend on me and my reasoning and arguments and game-theoretic musings in any way. My choice is to defect, because that way I save the most lives possible in that situation. If I were to convince myself to cooperate, that would not change the world into one where Player 2 would also convince itself to cooperate, it would affect Player 2's decision in no way at all.

But somehow the case seems different for the last round of an iterative game (and, even more so, for all the preceding rounds). This, in turn, makes me feel worried, because it is a sign that some bias or another may be affecting me adversely here. One thing is obviously me being blinded to what the numbers 'billion' and 'million' actually mean, but I try to compensate for that as best I can. Anyway, by the 100th round, after 99 rounds of cooperation, I get the choice to cooperate or to defect. At this point, the other player and I have a cooperative relationship. We've gained a lot. But our mutual interaction is about to end, which means there are no repercussions for defecting here, which means I should maximize my winnings by defecting. However, it feels to me that, since I already know Player 2 is enough of a winner-type to cooperate with me for all the previous rounds, he realizes the same thing. And in that case, I should cooperate here, to maximize my gains. At which point defecting makes more sense. Repeating forever.

What tilts the instinctual decision towards cooperating in this particular case seems to me to be that, regardless of what happens, I have already saved 198 million people. Whether I now save 0, 1, 2 or 3 million more is not such a big thing in comparison (even though it obviously is, but big numbers make me blind). Because I cannot reason myself into either defecting or cooperating, and thus I am unable to assign meaningful probabilities for what Player 2 will do, I cooperate by default because I feel that, other things being equal, it's the 'right' thing to do. If I am fooled and P2 defects, one million people die that would not have died otherwise, but I can bear that burden in the knowledge that I've saved 198 million. And meanwhile, it's P2 that has to bear the label of traitor, which means that I will be better able to justify myself to both myself and society at large. Obviously this reasoning doesn't seem very good. It feels like I am convincing myself here that my reasoning about what should be done somehow influences the reasoning of Player 2, after condemning that in the one-shot case just above. But then again, I have interacted with P2 for 99 rounds now, influencing him by my reasoning on what's the best way to act.

And, of course, there's the looming problem that if either of us had reasoned that it was likely for the other to defect in the last round no matter what, then it would have been better for us to defect in the second-to-last round, which did not happen. By defecting on round 99, you gain +3, and then on round 100 you're pretty much guaranteed to get +1, which is exactly the same gain as cooperating twice. By defecting earlier than round 98, you lose more than you gain, assuming all the remaining rounds are defect, which seems to me like a reasonable assumption. But by being betrayed on round 99 you get 0, and gain only 1 afterwards on round 100, which means you're left with 3 less than you could've had. Still, I don't care about how many paperclips P2 gets, only about how many lives I save. I, as a human, have an innate sense of 'fair play' that makes 2+2 cooperate feel better than 3+1 double defect in a void. However, does that 'fair play' count as more weighty in decision-making than the risk that P2 defects and I gain 1, as opposed to 4? After all, by round 99 if I defect, I'm guaranteed +4. If I cooperate, I'm only guaranteed +1. And even if we both cooperate on round 99, there is still the risk that I gain nothing in the last round. Fair play does not seem worth even the possibility of losing several million lives. Still, the whole basis of this is that I don't care about Player 2, I only care about lives saved, and thus giving him the opportunity to cooperate gives me the chance to save more lives (at this point, even if he defects and I cooperate for the remaining turns I've still saved more than I would have by defecting from the beginning). So I feel, weakly, that I should cooperate until the end here after all simply because it seems that only reasoning that would make me cooperate until the end would give me the ability to cooperate at all, and thus save the most possible lives. 
But I have not convinced myself on this yet, because it still feels to me that I am unsure of what I would do on that very last round, when P2's choice is already locked in, and millions of lives are at stake.

Now, the above is simply an analysis of my instinctual choices, and me trying to read into why those were my instinctual choices. But I am not confident in stating they are the correct choices, I am just trying to write my way into better understanding of how I decide things.

## comment by dankane · 2015-04-04T06:40:53.835Z · LW(p) · GW(p)

[I realize that I missed the train and probably very few people will read this, but here goes]

So in non-iterated prisoner's dilemma, defect is a dominant strategy. No matter what the opponent is doing, defecting will always give you the best possible outcome. In iterated prisoner's dilemma, there is no longer a dominant strategy. If my opponent is playing Tit-for-Tat, I get the best outcome by cooperating in all rounds but the last. If my opponent ignores what I do, I get the best outcome by always defecting. It is true that all defects is the unique Nash equilibrium strategy, but this is a *much* weaker reason for playing it, especially given that evidence shows that when playing among people who are trying to win, Tit-for-Tat tends to achieve much better outcomes.
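The lack of a dominant strategy can be illustrated with a short sketch, assuming the post's per-round payoffs and a 100-round game. It scores two candidate strategies against Tit-for-Tat and against an opponent that ignores my moves (here, one that always cooperates); the strategy names are hypothetical.

```python
N = 100
# My payoff for (my_move, their_move), from the post's matrix.
PAYOFF = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}


def always_defect(round_number, seen):
    return "D"


def always_cooperate(round_number, seen):
    return "C"


def defect_last_only(round_number, seen):
    return "D" if round_number == N else "C"


def tit_for_tat(round_number, seen):
    return seen[-1] if seen else "C"


def score(me, opponent):
    """My total payoff over N rounds; each side sees the other's past moves."""
    mine, theirs, total = [], [], 0
    for round_number in range(1, N + 1):
        a, b = me(round_number, theirs), opponent(round_number, mine)
        total += PAYOFF[(a, b)]
        mine.append(a)
        theirs.append(b)
    return total


for opponent in (tit_for_tat, always_cooperate):
    for me in (always_defect, defect_last_only):
        print(opponent.__name__, me.__name__, score(me, opponent))
# tit_for_tat: always_defect 102, defect_last_only 201
# always_cooperate: always_defect 300, defect_last_only 201
```

Neither candidate is best against both opponents, so which one to play depends on what you believe about the other player.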

There seems to be a lot of discussion in the comments about this or that being the rational thing to do, and I think that this is a big problem that gets in the way of clear thinking about the issue. The problem is that people are using the word "rational" here without having a clear idea as to what exactly that means. Sure, it's the thing that wins, but wins when? Provably, there is no single strategy that achieves the best possible outcome against all possible implementations of Clippy. So what do you mean? Are you trying to optimize your expected utility under a Kolmogorov prior? If so how come nobody seems to be trying to do computations of the posterior distribution? Or discussing exactly what side data we know about the issue that might inform this probability computation? Or even wondering which universal Turing machine we are using to define our prior? Unless you want to give a more concrete definition of what you mean by "rational" in this context, perhaps you should stop arguing for a moment about what the rational thing to do is.