Newcomb's Problem as an Iterated Prisoner's Dilemma

post by Jerdle (daniel-amdurer) · 2022-01-05T22:48:57.661Z · LW · GW · 12 comments

The intuitively paradoxical aspect of Newcomb's Problem is that it contains a loop: your decision determines Omega's prediction, which determines the options you decide between. Many decision theories solve this problem, usually by recommending one-boxing, but they are less than intuitive because of their acausal nature. This post offers a hopefully more intuitive analysis of Newcomb's Problem that unrolls the loop and reveals the Iterated Prisoner's Dilemma within.

First, unroll the loop. To do this, have Omega run purely on past data. To keep the prediction accurate, the past data must also determine your decision at that point. This is easiest to implement with a bot (in the sense of PrudentBot, but much simpler). Specifically, you choose once, and that choice sets up a bot that performs only that action in every round. After unrolling, we have an iterated game, because Omega's dependence on your choice in the same round has been replaced with a dependence on your choice in the previous round.

Now, from your perspective, your income is as follows (rows are your choices, columns are Omega's).

You \ Omega	Fill B	Leave B empty
Take A and B	$1,001,000	$1,000
Take B	$1,000,000	$0

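Now relabel the moves. For you, taking only B is Cooperate and taking both boxes is Defect; for Omega, filling B is Cooperate and leaving it empty is Defect. Rows are still your moves and columns Omega's, with the rows swapped into the usual order.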

You \ Omega	Cooperate	Defect
Cooperate	$1,000,000	$0
Defect	$1,001,000	$1,000

This is just the Prisoner's Dilemma! And as it was iterated by unrolling, it is the Iterated Prisoner's Dilemma.
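The claim can be checked against the usual Prisoner's Dilemma ordering. The sketch below is only a sanity check on your side of the matrix; the names T, R, P and S (temptation, reward, punishment, sucker's payoff) are the standard textbook labels, not something the post defines, and Omega's payoffs are left unspecified, as they are in the post.

```python
# Your payoffs from the relabelled table, in dollars.
# The T/R/P/S labels are the conventional ones and are an assumption here.
T = 1_001_000  # you defect, Omega cooperates (take A and B, B is full)
R = 1_000_000  # both cooperate (take B, B is full)
P = 1_000      # both defect (take A and B, B is empty)
S = 0          # you cooperate, Omega defects (take B, B is empty)

# Row player's Prisoner's Dilemma ordering: T > R > P > S.
is_prisoners_dilemma = T > R > P > S

# Extra condition often required in the iterated game, so that mutual
# cooperation beats taking turns exploiting each other: 2R > T + S.
rewards_mutual_cooperation = 2 * R > T + S

print(is_prisoners_dilemma, rewards_mutual_cooperation)
```

Both conditions hold for these numbers, so your half of the game has the Prisoner's Dilemma shape.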

Omega plays TitForTat, while you can choose between CooperateBot and DefectBot. As CooperateBot performs better against TitForTat than DefectBot, you're best off cooperating. As cooperating corresponds to one-boxing, you should one-box.
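A minimal simulation of this comparison, as a sketch under stated assumptions: the function name `total_income`, the round count, and Omega's opening move are all hypothetical choices made for illustration. The post does not say how Omega plays the first round, so the opening move is a parameter that defaults to cooperating, as standard TitForTat does.

```python
# Your payoff per round, indexed by (your_move, omega_move).
# "C" = take only B / fill B; "D" = take A and B / leave B empty.
PAYOFF = {
    ("C", "C"): 1_000_000,
    ("C", "D"): 0,
    ("D", "C"): 1_001_000,
    ("D", "D"): 1_000,
}


def total_income(your_move, rounds, omega_first_move="C"):
    """Total income of a constant bot against an Omega playing TitForTat.

    your_move: "C" for CooperateBot, "D" for DefectBot.
    omega_first_move: Omega's move in round one (an assumption; the post
    leaves the first round unspecified).
    """
    omega_move = omega_first_move
    total = 0
    for _ in range(rounds):
        total += PAYOFF[(your_move, omega_move)]
        # TitForTat: Omega's next move copies your move from this round.
        omega_move = your_move
    return total


cooperate_bot_income = total_income("C", rounds=10)
defect_bot_income = total_income("D", rounds=10)
print(cooperate_bot_income, defect_bot_income)
```

With a cooperative opening move, DefectBot comes out ahead only in a single-round game, where it collects the temptation payoff once. From two rounds onward CooperateBot earns more, and if Omega's opening move is instead set to match the bot's constant action, CooperateBot is ahead in every round.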

Rerolling the loop, you should one-box in the original problem.

12 comments

comment by Shmi (shminux) · 2022-01-06T02:39:18.307Z · LW(p) · GW(p)

A converse statement was discussed over 50 years ago: https://www.jstor.org/stable/2265034

↑ comment by Jerdle (daniel-amdurer) · 2022-01-06T02:47:31.362Z · LW(p) · GW(p)

Thought the connection seemed obvious enough that I couldn't be the first to see it! Although there are some differences. Lewis sees the one-shot PD as a really weak Newcomb (weak as in the predictor is inaccurate), while I see the iterated PD as equivalent to a far stronger Newcomb.

comment by Gunnar_Zarncke · 2022-01-07T22:25:52.921Z · LW(p) · GW(p)

You should specify both payoffs. Otherwise, it is hard to compare to the real PD.

↑ comment by Jerdle (daniel-amdurer) · 2022-01-08T19:28:05.825Z · LW(p) · GW(p)

It doesn't actually matter. We already know Omega's strategy choice, and it can't be changed.

↑ comment by Gunnar_Zarncke · 2022-01-08T20:45:35.646Z · LW(p) · GW(p)

Yes, but it doesn't make sense to call it a Prisoner's Dilemma if it doesn't fit that payoff matrix.

↑ comment by Jerdle (daniel-amdurer) · 2022-01-08T21:21:46.374Z · LW(p) · GW(p)

It is half of an iterated PD, and the other half is invisible to you.

↑ comment by Gunnar_Zarncke · 2022-01-08T21:53:09.583Z · LW(p) · GW(p)

I guess you refer to the class of games having a payoff matrix. I was referring to a specific instance. See https://www.lesswrong.com/posts/KwbJFexa4MEdhJbs4/classifying-games-like-the-prisoner-s-dilemma

comment by Jerdle (daniel-amdurer) · 2025-02-04T15:27:20.019Z · LW(p) · GW(p)

I have found a separate post that firms up the dodgy second paragraph. I would have linked to it, but while there's a time loop involved in the theory, there wasn't enough of one to link to a post written in 2024 in my post in 2022.

The post is "In Strategic Time, Open-Source Games Are Loopy".

comment by JBlack · 2022-01-07T06:08:26.599Z · LW(p) · GW(p)

The essence of Prisoner's Dilemma is that it is symmetric, and both players individually have incentive to defect if the other cooperates. How does Omega gain from defecting if you cooperate? Or indeed, how does Omega gain or lose at all?

↑ comment by Viliam · 2022-01-07T14:29:33.553Z · LW(p) · GW(p)

Not sure whether this makes sense, but maybe Omega gets 1 utility if it correctly predicts your behavior (in the unrolled version), and 0 utility otherwise?

comment by Zach Stein-Perlman · 2022-01-06T00:50:21.463Z · LW(p) · GW(p)

I don't understand the second paragraph.

I buy (what I understand of) this if Omega makes its prediction by simulating you (and not if it makes its prediction by, say, scanning your DNA).

↑ comment by Jerdle (daniel-amdurer) · 2022-01-06T09:46:15.963Z · LW(p) · GW(p)

The second paragraph is a bit handwavey. It's basically the bit that turns Newcomb into an iterated game. As there's this causal loop, it can be unlooped by converting into an iterated game, and using your action in the previous round as a proxy for your action in that round. So Omega plays based on your previous action, which is the same as your next one.
