Why one-box?

philosophystudent

Why one-box?

post by PhilosophyStudent · 2013-06-30T02:38:00.967Z · LW · GW · Legacy · 99 comments

99 comments

I have sympathy with both one-boxers and two-boxers in Newcomb's problem. Contrary to this, however, many people on Less Wrong seem to be staunch and confident one-boxers. So I'm turning to you guys to ask for help figuring out whether I should be a staunch one-boxer too. Below is an imaginary dialogue setting out my understanding of the arguments normally advanced on LW for one-boxing and I was hoping to get help filling in the details and extending this argument so that I (and anyone else who is uncertain about the issue) can develop an understanding of the strongest arguments for one-boxing.

One-boxer: You should one-box because one-boxing wins (that is, a person that one-boxes ends up better off than a person that two-boxes). Not only does it seem clear that rationality should be about winning generally (that a rational agent should not be systematically outperformed by irrational agents) but Newcomb's problem is normally discussed within the context of instrumental rationality, which everyone agrees is about winning.

Me: I get that and that's one of the main reasons I'm sympathetic to the one-boxing view but the two-boxers has a response to these concerns. The two-boxer agrees that rationality is about winning and they agree that winning means ending up with the most utility. The two-boxer should also agree that the rational decision theory to follow is one that will one-box on all future Newcomb's problems (those where the prediction has not yet occurred) and can also agree that the best timeless agent type is a one-boxing type. However, the two-boxer also claims that two-boxing is the rational decision.

O: Sure, but why think they're right? After all, two-boxers don't win.

M: Okay, those with a two-boxing agent type don't win but the two-boxer isn't talking about agent types. They're talking about decisions. So they are interested in what aspects of the agent's winning can be attributed to their decision and they say that we can attribute the agent's winning to their decision if this is caused by their decision. This strikes me as quite a reasonable way to apportion the credit for various parts of the winning. (Of course, it could be said that the two-boxer is right but they are playing a pointless game and should instead be interested in winning simpliciter rather than winning decisions. If this is the claim then the argument is dissolved and there is no disagreement. But I take it this is not the claim).

O: But this is a strange convoluted definition of winning. The agent ends up worse off than one-boxing agents so it must be a convoluted definition of winning that says that two-boxing is the winning decision.

M: Hmm, maybe... But I'm worried that relevant distinctions aren't being made here (you've started talking about winning agents rather than winning decisions). The two-boxer relies on the same definition of winning as you and so agrees that the one-boxing agent is the winning agent. They just disagree about how to attribute winning to the agent's decisions (rather than to other features of the agent). And their way of doing this strikes me as quite a natural one. We credit the decision with the winning that it causes. Is this the source of my unwillingness to jump fully on board with your program? Do we simply disagree about the plausibility of this way of attributing winning to decisions?

Meta-comment (a): I don't know what to say here? Is this what's going on? Do people just intuitively feel that this is a crazy way to attribute winning to decisions? If so, can anyone suggest why I should adopt the one-boxer perspective on this?

O: But then the two-boxer has to rely on the claim that Newcomb's problem is "unfair" to explain why the two-boxing agent doesn't win. It seems absurd to say that a scenario like Newcomb's problem is unfair.

M: Well, the two-boxing agent means something very particular by "unfair". They simply mean that in this case the winning agent doesn't correspond to the winning decision. Further, they can explain why this is the case without saying anything that strikes me as crazy. They simply say that Newcomb's problem is a case where the agent's winnings can't entirely be attributed to the agent's decision (ignoring a constant value). But if something else (the agent's type at time of prediction) also influences the agent's winning in this case, why should it be a surprise that the winning agent and the winning decision come apart? I'm not saying the two-boxer is right here but they don't seem to me to be obviously wrong either...

Meta-comment (b): Interested to know what response should be given here.

O: Okay, let's try something else. The two-boxer focuses only on causal consequences but in doing so they simply ignore all the logical non-causal consequences of their decision algorithm outputting a certain decision. This is an ad hoc, unmotivated restriction.

M: Ah hoc? I'm not sure I see why. Think about the problem with evidential decision theory. The proponent of EDT could say a similar thing (that the proponent of two-boxing ignores all the evidential implications of their decision). The two-boxer will respond that these implications just are not relevant to decision making. When we make decisions we are trying to bring about the best results, not get evidence for these results. Equally, they might say, we are trying to bring about the best results, not derive the best results in our logical calculations. Now I don't know what to make of the point/counter-point here but it doesn't seem to me that the one-boxing view is obviously correct here and I'm worried that we're again going to end up just trading intuitions (and I can see the force of both intuitions here).

Meta-comment: Again, I would love to know whether I've understood this argument and whether something can be said to convince me that the one-boxing view is the clear cut winner here.

End comments: That's my understanding of the primary argument advanced for one-boxing on LW. Are there other core arguments? How can these arguments be improved and extended?

99 comments

Comments sorted by top scores.

comment by Qiaochu_Yuan · 2013-06-30T02:59:41.650Z · LW(p) · GW(p)

Two-boxers think that decisions are things that can just fall out of the sky uncaused. (This can be made precise by a suitable description of how two-boxers set up the relevant causal diagram; I found Anna Salamon's explanation of this particularly clear.) This is a view of how decisions work driven by intuitions that should be dispelled by sufficient knowledge of cognitive and / or computer science. I think acquiring such background will make you more sympathetic to the perspective that one should think in terms of winning agent types and not winning decisions.

I also think there's a tendency among two-boxers not to take the stakes of Newcomb's problem seriously enough. Suppose that instead of offering you a million dollars Omega offers to spare your daughter's life. Now what do you do?

Replies from: framsey, PhilosophyStudent, None, buybuydandavis, Dan_Moore

↑ comment by framsey · 2013-07-01T16:25:02.018Z · LW(p) · GW(p)

Two-boxers think that decisions are things that can just fall out of the sky uncaused.

But don't LW one-boxers think that decision ALGORITHMS are things that can just fall out of the sky uncaused?

As an empirical matter, I don't think humans are psychologically capable of time-consistent decisions in all cases. For instance, TDT implies that one should one-box even in a version of Newcomb's in which one can SEE the content of the boxes. But would a human being really leave the other box behind, if the contents of the boxes were things they REALLY valued (like the lives of close friends), and they could actually see their contents? I think that would be hard for a human to do, even if ex ante they might wish to reprogram themselves to do so.

Replies from: None, None, ChristianKl

↑ comment by [deleted] · 2013-07-05T12:18:00.603Z · LW(p) · GW(p)

For instance, TDT implies that one should one-box even in a version of Newcomb's in which one can SEE the content of the boxes. But would a human being really leave the other box behind, if the contents of the boxes were things they REALLY valued (like the lives of close friends), and they could actually see their contents?

Probably not, and thus s/he would probably never see the second box as anything but empty. His/her loss.

↑ comment by [deleted] · 2013-07-05T12:16:53.107Z · LW(p) · GW(p)

For instance, TDT implies that one should one-box even in a version of Newcomb's in which one can SEE the content of the boxes. But would a human being really leave the other box behind, if the contents of the boxes were things they REALLY valued (like the lives of close friends), and they could actually see their contents?

Probably not, and thus they would probably never see the second box as anything but empty. His/her loss.

↑ comment by ChristianKl · 2013-07-02T12:07:16.341Z · LW(p) · GW(p)

I think that would be hard for a human to do, even if ex ante they might wish to reprogram themselves to do so.

I think it's hard because most human's don't live their lives according to principles. They care more about the lives of close friends than they care about their principles.

In the end reprograming yourself in that way is about being a good stoic.

↑ comment by PhilosophyStudent · 2013-06-30T03:14:09.972Z · LW(p) · GW(p)

Thanks for the reply, more interesting arguments.

Two-boxers think that decisions are things that can just fall out of the sky uncaused.

I'm not sure that's a fair description of two-boxers. Two-boxers think that the best way to model the causal effects of a decision are by intervention or something similar. At no point do two-boxers need to deny that decisions are caused. Rather, they just need to claim that the way you figure out the causal effects of an action are by intervention like modelling.

I also think there's a tendency among two-boxers not to take the stakes of Newcomb's problem seriously enough. Suppose that instead of offering you a million dollars Omega offers to spare your daughter's life. Now what do you do?

I don't claim to be a two-boxer so I don't know. But I don't think this point really undermines the strength of the two-boxing arguments.

Replies from: Qiaochu_Yuan

↑ comment by Qiaochu_Yuan · 2013-06-30T03:30:19.477Z · LW(p) · GW(p)

Two-boxers think that the best way to model the causal effects of a decision are by intervention or something similar.

Yes, that's what I mean by decisions falling out of the sky uncaused. When a two-boxer models the causal effects of deciding to two-box even if Omega predicts that they one-box, they're positing a hypothetical in which Omega's prediction is wrong even though they know this to be highly unlikely or impossible depending on the setup of the problem. Are you familiar with how TDT sets up the relevant causal diagram?

But I don't think this point really undermines the strength of the arguments I outline above.

I think it undermines their attractiveness. I would say unhesitatingly that one-boxing is the correct decision in that scenario because it's the one that saves my daughter, and I would furthermore say this even if I didn't have a decision theory that returned that as the correct decision.

If I write down a long argument that returns a conclusion I know is wrong, I can conclude that there's something wrong with my argument even if I can't point to a particular step in my argument I know to be wrong.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T03:40:49.029Z · LW(p) · GW(p)

Yes, that's what I mean by decisions falling out of the sky uncaused. When a two-boxer models the causal effects of deciding to two-box even if Omega predicts that they one-box, they're positing a hypothetical in which Omega's prediction is wrong even though they know this to be highly unlikely or impossible depending on the setup of the problem.

The two-boxer claims that causal consequences are what matters. If this is false, the two-boxer is already in trouble but if this is true then it seems unclear (to me) that the fact that the correct way of modelling causal consequences involves interventions should be a problem. So I'm unclear as to whether there's really an independent challenge here. But I will have to think on this more so don't have anything more to say for now (and my opinion may change on further reflection as I can see why this argument feels compelling).

And yes, I'm aware of how TDT sets up the causal diagrams.

I think it undermines their attractiveness. I would say unhesitatingly that one-boxing is the correct decision in that scenario because it's the one that saves my daughter, and I would furthermore say this even if I didn't have a decision theory that returned that as the correct decision.

In response, the two-boxer would say that it isn't your decision that saves your daughter (it's your agent type) and they're not talking about agent type. Now I'm not saying they're right to say this but I don't think that this line advances the argument (I think we just end up where we were before).

Replies from: Qiaochu_Yuan, ChristianKl

↑ comment by Qiaochu_Yuan · 2013-06-30T03:46:49.177Z · LW(p) · GW(p)

Okay, but why does the two-boxer care about decisions when agent type appears to be what causes winning (on Newcomblike problems)? Your two-boxer seems to want to split so many hairs that she's willing to let her daughter die for it.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T03:52:51.195Z · LW(p) · GW(p)

No argument here. I'm very open to the suggestion that the two-boxer is answering the wrong question (perhaps they should be interested in rational agent type rather than rational decisions) but it is often suggested on LW that two-boxers are not answering the wrong question but rather are getting the wrong answer (that is, it is suggested that one-boxing is the rational decision, not that it is uninteresting whether this is the case).

Replies from: Qiaochu_Yuan

↑ comment by Qiaochu_Yuan · 2013-06-30T03:57:55.616Z · LW(p) · GW(p)

One-boxing is the rational decision; in LW parlance "rational decision" means "the thing that you do to win." I don't think splitting hairs about this is productive or interesting.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T04:05:36.814Z · LW(p) · GW(p)

I agree. A semantic debate is uninteresting. My original assumption about the differences between two-boxing philosophers and one-boxing LWers was that the two groups used words differently and were engaged in different missions.

If you think the difference is just:

(a) semantic; (b) a difference of missions; (c) a different view of which missions are important

then I agree and I also agree that a long hair splitting debate is uninteresting.

However, my impression was that some people on LW seem to think there is more than a semantic debate going on (for example, my impression was that this is what Eliezer thought). This assumption is what motivated the writing of this post. If you think this assumption is wrong, it would be great to know as if this is the case, I now understand what is going on.

Replies from: Qiaochu_Yuan

↑ comment by Qiaochu_Yuan · 2013-06-30T04:10:44.326Z · LW(p) · GW(p)

There is more than a semantic debate going on to the extent that two-boxers are of the opinion that if they faced an actual Newcomb's problem, then what they should actually do is to actually two-box. This isn't a disagreement about semantics but about what you should actually do in a certain kind of situation.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T04:27:02.298Z · LW(p) · GW(p)

Okay. Clarified, so to return to:

Okay, but why does the two-boxer care about decisions when agent type appears to be what causes winning (on Newcomblike problems)?

The two-boxer cares about decisions because they use the word decision to refer to those things we can control. So they say that we can't control our past agent type but can control our taking of the one or two boxes. Of course, a long argument can be held about what notion of "control" we should appeal to here but it's not immediately obvious to me that the two-boxer is wrong to care about decisions in their sense. So they would say that what thing we care about depends not only on what things can cause the best outcome but also on whether we can exert control over these things. The basic claim here seems reasonable enough.

Replies from: Qiaochu_Yuan

↑ comment by Qiaochu_Yuan · 2013-06-30T05:01:55.920Z · LW(p) · GW(p)

The basic claim here seems reasonable enough.

Yes, and then their daughters die. Again, if a long argument outputs a conclusion you know is wrong, you know there's something wrong with the argument even if you don't know what it is.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T05:06:18.592Z · LW(p) · GW(p)

It's not clear to me that the argument outputs the wrong conclusion. Their daughters die because of their agent type at time of prediction not because of their decision and they can't control their agent type at this past time so they don't try to. It's unclear that someone is irrational for exerting the best influence they can. Of course, this is all old debate so I don't think we're really progressing things here.

Replies from: Qiaochu_Yuan

↑ comment by Qiaochu_Yuan · 2013-06-30T05:18:51.361Z · LW(p) · GW(p)

they can't control their agent type at this past time so they don't try to.

But if they didn't think this, then their daughters could live. You don't think, in this situation, you would even try to stop thinking this way? I'm trying to trigger a shut up and do the impossible intuition here, but if you insist on splitting hairs, then I agree that this conversation won't go anywhere.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T05:27:57.971Z · LW(p) · GW(p)

Yes, if the two boxer had a different agent type in the past then their daughters would live. No disagreement there. But I don't think I'm splitting hairs by thinking this doesn't immediately imply that one-boxing is the rational decision (rather, I think you're failing to acknowledge the possibility of potentially relevant distinctions).

I'm not actually convinced by the two-boxing arguments but I don't think they're as obviously flawed as you seem to. And yes, I think we now agree on one thing at least (further conversation will probably not go anywhere) so I'm going to leave things at that.

Replies from: CoffeeStain

↑ comment by CoffeeStain · 2013-07-02T06:36:03.736Z · LW(p) · GW(p)

As the argument goes, you can't control your past selves, but that isn't the form of the experiment. The only self that you're controlling is the one deciding whether to one-box (equivalently, whether to be a one-boxer).

See, that is the self that past Omega is paying attention to in order to figure out how much money to put in the box. That's right, past Omega is watching current you to figure out whether or not to kill your daughter / put money in the box. It doesn't matter how he does it, all that matters is whether or not your current self decides to one box.

To follow a thought experiment I found enlightening here, how is it that past Omega knows whether or not you're a one-boxer? In any simulation he could run of your brain, the simulated you could just know it's a simulation and then Omega wouldn't get the correct result, right? But, as we know, he does get the result right, almost all of the time. Ergo, the simulated you looks outside, it sees a bird on a tree. If it uses the bathroom, the toilet might clog. Any giveaway might make the selfish you try to two-box while still one-boxing in real life.

The point? How do you know that current you isn't the simulation past Omega is using to figure out whether to kill your daughter? Are philosophical claims about the irreducibility of intentionality enough to take the risk?

↑ comment by ChristianKl · 2013-07-02T10:43:16.642Z · LW(p) · GW(p)

In response, the two-boxer would say that it isn't your decision that saves your daughter (it's your agent type) and they're not talking about agent type.

I think that's again about decisions falling out of the sky. The agent type causes decisions to happen. People can't make decisions that are inconsistent with their own agent type.

↑ comment by [deleted] · 2013-07-01T14:22:42.914Z · LW(p) · GW(p)

Thank you for referencing Anna Salamon's diagrams. I would have one boxed in the first place, but I really think that those help make it much more clear in general.

↑ comment by buybuydandavis · 2013-06-30T08:58:39.953Z · LW(p) · GW(p)

Two-boxers think that decisions are things that can just fall out of the sky uncaused.

Yes, every two boxer I've ever known has said exactly that a thousand times.

↑ comment by Dan_Moore · 2013-07-01T15:47:40.140Z · LW(p) · GW(p)

Two-boxers think that decisions are things that can just fall out of the sky uncaused.

It seems that 2-boxers make this assumption, whereas some 1-boxers (including me) apply a Popperian approach to selecting a model of reality consistent with the empirical evidence.

comment by paulfchristiano · 2013-06-30T09:14:43.639Z · LW(p) · GW(p)

Basically: EDT/UDT has simple arguments in its favor and seems to perform well. There don't seem to be any serious arguments in favor of CDT, and the human intuition in its favor seems quite debunkable. So it seems like the burden of proof is on CDT, to justify why it isn't crazy. It may be that CDT has met that burden, but I'm not aware of it.

A. The dominance arguments in favor of two-boxing seem quite weak. They tend to apply verbatum to playing prisoner's dilemmas against a mirror (If the mirror cooperates you'd prefer defect, if the mirror defects you'd prefer defect, so regardless of the state of nature you'd prefer defect). So why do you not accept the dominance argument for a mirror, but accept it in the case of Newcomb-like problems? To discriminate the cases it seems you need to make an assumption of no causal connection, or a special role for time, in your argument.

This begs the question terribly---why is a causal connection privileged? Why is the role of time privileged? As far as I can tell these two things are pretty arbitrary and unimportant. I'm not aware of any strong philosophical arguments for CDT, besides "it seems intuitively sensible to a human," and see below for the debunking of those intuitions. (Again, maybe there are better arguments here, but I've never encountered one. Basically I'm looking for any statement of a kind of dominance principle over states of nature, which doesn't look completely arbitrary and is also at all plausible.)

B. A sophisticated interpretation of EDT (called UDT around here) seems to perform well in all cases we've considered, in the sense that an agent making good decisions will achieve good outcomes. I think this is strong evidence in favor of a theory which purports to say which actions are good, since good decisions ought to lead to good outcomes; I agree its not a knock-down argument, but again I know of no serious counterarguments.

C. It seems that EDT is supported by the simplest philosophical arguments. We need to choose between outcomes in which we make decision A vs. decision B. It makes sense to choose between outcomes which we consider to be possible (in which we make decision A or decision B). CDT doesn't do this, and considers outcomes which are inconsistent with our knowledge of the situation. This isn't enough to pin down EDT uniquely (though further arguments can), but it does seem like a strong point in favor of EDT over CDT.

D. An agent living in an environment like humans' will do fine by using CDT, because the only effects of their decisions are causal. CDT is much simpler to run than EDT because it doesn't rely on a strong self-model (doing EDT without a good self-model results in worse decisions than CDT in reasonable situations; this is basically what the claims that EDT performs badly in such-and-such a situation amount to, at least the ones I have seen). So it seems like we can pretty easily explain why humans have an intuition in favor of CDT, and it seems like extremely weak evidence against EDT/UDT.

Replies from: Qiaochu_Yuan, Robert_Unwin, Protagoras

↑ comment by Qiaochu_Yuan · 2013-06-30T09:23:27.020Z · LW(p) · GW(p)

I'm happy to learn that you consider UDT a variant of EDT, because after thinking about these issues for awhile my current point of view is that some form of EDT is obviously the correct thing to do, but in standard examples of EDT failing the relevant Bayesian updates are being performed incorrectly. The problem is that forcing yourself into a reference class by performing an action doesn't make it reasonable for you to reason as if you were a random sample from that reference class, because you aren't: you introduced a selection bias. Does this agree with your thoughts?

↑ comment by Robert_Unwin · 2013-06-30T21:16:04.317Z · LW(p) · GW(p)

"why is a causal connection privileged?" I agree with everything here. What follows is merely history.

Historically, I think that CDT was meant to address the obvious shortcomings of choosing to bring about states that were merely correlated with good outcomes (as in the case of whitening one's teeth to reduce lung cancer risk). When Pearl advocates CDT, he is mainly advocating acting based on robust connections that will survive the perturbation of the system caused by the action itself. (e.g. Don't think you'll cure lung cancer by making your population brush their teeth, because that is a non-robust correlation that will be eliminated once you change the system). The centrality of causality in decision making was obvious intuitively but wasn't reflected in formal Bayesian decision theory. This was because of the lack of a good formalism linking probability and causality (and some erroneous positivistic scruples against the very idea of causality). Pearl and SGS's work on causality has done much to address this, but I think there is much to be done.

There is a very annoying historical accident where EDT was taken to be the 'one-boxing' decision theory. First, any use of probability theory in the NP with infallible predictor is suspicious, because the problem can be specified in a logically complete way with no room for empirical uncertainty. (This is why dominance reasoning is brought in for CDT. What should the probabilities be?). Second, EDT is not easy to make coherent given an agent who knows they follow EDT. (The action that EDT disfavors will have probability zero and so the agent cannot condition on it in traditional probability theory). Third, EDT just barely one-boxes. It doesn't one-box on Double Transparent Newcomb, nor on Counterfactual Mugging. It's also obscure what it does on PD. (Again, I can play the PD against a selfish clone of myself, with both agents having each other's source code. There is no empirical uncertainty here, and so applying probability theory immediate raises deep foundational problems).

If TDT/UDT had come first (including the logical models and deep connections to Godel's theorem), the philosophy discussion of NP would have been very different. EDT (which brings into the NP very dubious empirical probability distributions) would not have been considered at all for NP. I don't see that CDT would have held much interest if its alternative was not as feeble as EDT.

It is important to understand why economists have done so much work with Nash Equilibria (e.g. on the PD) rather than invent UDT. This is explained by the fact that the assumption of logical correlation and perfect empirical knowledge between agents in the PD is not the practical reality. This doesn't mean that UDT is not relevant to practical situations, but only that these situations involve many additional elements that may be complex to deal with in UDT. Causal based theories would have been interesting independently, for the reasons noted above concerning robust correlations.

EDIT: I realize the comment by Paul Christiano sometimes describes UDT as a variant of EDT. When I used the term "EDT" I mean the theory discussed in the philosophy literature which involves choosing the action that maximizes P(outcomes / action). This is a theory which essentially makes use of vanilla conditional probability. In what I say, I assume that UDT/TDT, despite some similarity to EDT in spirit, are not limited to regular conditioning and do not fail on smoking lesion.

↑ comment by Protagoras · 2013-07-01T03:56:37.115Z · LW(p) · GW(p)

I wonder if David Lewis (perhaps the most notorious philosophical two-boxer) was skeptical that any human had a sufficiently strong self-model. I think there are very who few have better self-models than he did, so it's quite interesting if he did think this. His discussion of the "tickle defence" in his paper "Causal Decision Theory" may point that way.

comment by Shmi (shminux) · 2013-06-30T04:39:32.567Z · LW(p) · GW(p)

There are no two-boxers in foxholes.

comment by ThrustVectoring · 2013-06-30T03:00:18.491Z · LW(p) · GW(p)

The intuition pump that got me to be a very confident one-boxer is the idea of submitting computer code that makes a decision, rather than just making a decision.

In this version, you don't need an Omega - you just need to run the program. It's a lot more obvious that you ought to submit a program that one-boxes than it is obvious that you ought to one-box. You can even justify this choice on causal decision-theory grounds.

With the full Newcomb problem, the causality is a little weird. Just think of yourself as a computer program with partial self-awareness. Deciding whether to one-box or two-box updates the "what kind of decision-making agent am I" node, which also caused Omega to either fill or not fill the opaque box.

Yes, it's wonky causality - usually the future doesn't seem to affect the past. Omega is just so unlikely that given that you're talking to Omega, you can justify all sorts of slightly less unlikely things.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T03:18:35.506Z · LW(p) · GW(p)

Okay. As a first point, it's worth noting that the two-boxer would agree that you should submit one-boxing code because they agree that one-boxing is the rational agent type. However, they would disagree that one-boxing is the rational decision. So I agree that this is a good intuition pump but it is not one that anyone denies.

But you go further, you follow this claim up by saying that we should think of causation in Newcomb's problem as being a case where causality is weird (side note: Huw Price presents an argument of this sort, arguing for a particular view of causation in these cases). However, I'm not sure I feel any "intuition pump" force here (I don't see why I should just intuitively find these claims plausible).

Replies from: ThrustVectoring

↑ comment by ThrustVectoring · 2013-06-30T03:26:30.925Z · LW(p) · GW(p)

it's worth noting that the two-boxer would agree that you should submit one-boxing code because they agree that one-boxing is the rational agent type.

Running one-boxing code is analogous to showing Omega your decision algorithm and then deciding to one-box. If you think you should run code that one-boxes, then by analogy you should decide to one-box.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T03:29:34.874Z · LW(p) · GW(p)

Yes. Personally, I think the analogy is too close to pump intuitions (or it doesn't pump my intuitions though perhaps this is just my failure).

The two-boxer will say that if you can choose what code to submit, you should submit one-boxing code but that you shouldn't later run this code. This is the standard claim that you should precommit to one-boxing but should two-boxing in Newcomb's problem itself.

Replies from: Creutzer

↑ comment by Creutzer · 2013-06-30T07:52:57.531Z · LW(p) · GW(p)

But the very point is that you can't submit one piece of code and run another. You have to run what you submitted. That, again, is the issue that decisions don't fall from the sky uncaused. The reason why CDT can't get Newcomb's right is that due to its use of surgery on the action node, it cannot conceive of its own choice as predetermined. You are precommitted already just in virtue of what kind of agent/program you are.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T08:28:55.169Z · LW(p) · GW(p)

But the very point is that you can't submit one piece of code and run another. You have to run what you submitted.

Yes. So the two-boxer says that you should precommit to later making an irrational decision. This does not require them to say that the decision you are precommitting to is later rational. So the two-boxer would submit the one-boxing code despite the fact that one unfortunate effect of this would be that they would later irrationally run the code (because there are other effects which counteract this).

I'm not saying your argument is wrong (nor am I saying it's right). I'm just saying that the analogy is too close to the original situation to pump intuitions. If people don't already have the one-boxing intuition in Newcomb's problem then the submitting code analogy doesn't seem to me to make things any clearer.

Replies from: pjeby

↑ comment by pjeby · 2013-06-30T23:01:06.445Z · LW(p) · GW(p)

the two-boxer says that you should precommit to later making an irrational decision

I think the piece that this hypothetical two-boxer is missing is that they are acting as though the problem is cheating, or alternatively, that the premises can be cheated. That is, that you are able to make a decision that wasn't predictable beforehand. If your decision is predictable, two boxing is irrational, even considered as a single decision.

Try this analogy: instead of predicting your decision in advance, Omega simply scans your brain to determine what to put in the boxes, at the very moment you make the decision.

Does your hypothetical two-boxer still argue that one-boxing in this scenario is "irrational"?

If so, I cannot make sense of their answer. But if not, then the burden falls on the two boxer to explain how this scenario is any different from a prediction made a fraction of a millisecond sooner. How far before or after the point of decision does the decision become "rational" or "irrational" in their mind? (I use quotes here because I cannot think of any coherent definition of those terms that's still consistent with the hypothetical usage.)

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T23:38:03.773Z · LW(p) · GW(p)

The two-boxer never assumes that the decision isn't predictable. They just say that the prediction can no longer be influenced and so you may as well gain the $1000 from the transparent box.

In terms of your hypothetical scenario, the question for the two-boxer will be whether the decision causally influences the result of this brain scan. If yes, then, the two-boxer will one-box (weird sentence). If no, the two-boxer will two-box.

Replies from: pjeby

↑ comment by pjeby · 2013-07-01T00:02:41.985Z · LW(p) · GW(p)

the question for the two-boxer will be whether the decision causally influences the result of this brain scan. If yes, then, the two-boxer will one-box (weird sentence). If no, the two-boxer will two-box.

How would it not causally influence the brain scan? Are you saying two-boxers can make decisions without using their brains? ;-)

In any event, you didn't answer the question I asked, which was at what point in time does the two-boxer label the decision "irrational". Is it still "irrational" in their estimation to two-box, in the case where Omega decides after they do?

Notice that in both cases, the decision arises from information already available: the state of the chooser's brain. So even in the original Newcomb's problem, there is a causal connection between the chooser's brain state and the boxes' contents. That's why I and other people are asking what role time plays: if you are using the correct causal model, where your current brain state has causal influence over your future decision, then the only distinction two-boxers can base their "irrational" label on is time, not causality.

The alternative is to argue that it is somehow possible to make a decision without using your brain, i.e., without past causes having any influence on your decision. You could maybe do that by flipping a coin, but then, is that really a "decision", let alone "rational"?

If a two-boxer argues that their decision cannot cause a past event, they have the causal model wrong. The correct model is one of a past brain state influencing both Omega's decision and your own future decision.

For me, the simulation argument made it obvious that one-boxing is the rational choice, because it makes clear that your decision is algorithmic. "Then I'll just decide differently!" is, you see, still a fixed algorithm. There is no such thing as submitting one program to Omega and then running a different one, because you are the same program in both cases -- and it's that program that is causal over both Omega's behavior and the "choice you would make in that situation". Separating the decision from the deciding algorithm is incoherent.

As someone else mentioned, the only way the two-boxer's statements make any sense is if you can separate a decision from the algorithm used to arrive at that decision. But nobody has presented any concrete theory by which one can arrive at a decision without using some algorithm, and whatever algorithm that is, is your "agent type". It doesn't make any sense to say that you can be the type of agent who decides one way, but when it actually comes to deciding, you'll decide another way.

How does your hypothetical two-boxer respond to simulation or copy arguments? If you have no way of knowing whether you're the simulated version of you, or the real version of you, which decision is rational then?

To put it another way, a two-boxer is arguing that they ought to two-box while simultaneously not being the sort of person who would two-box -- an obvious contradiction. The two-boxer is either arguing for this contradiction, or arguing about the definitions of words by saying "yes, but that's not what 'rational' means".

Indeed, most two-boxers I've seen around here seem to alternate between those two positions, falling back to the other whenever one is successfully challenged.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-07-01T00:40:58.373Z · LW(p) · GW(p)

In any event, you didn't answer the question I asked, which was at what point in time does the two-boxer label the decision "irrational". Is it still "irrational" in their estimation to two-box, in the case where Omega decides after they do?

Time is irrelevant to the two-boxer except as a proof of causal independence so there's no interesting answer to this question. The two-boxer is concerned with causal independence. If a decision cannot help but causally influence the brain scan then the two-boxer would one-box.

Notice that in both cases, the decision arises from information already available: the state of the chooser's brain. So even in the original Newcomb's problem, there is a causal connection between the chooser's brain state and the boxes' contents. That's why I and other people are asking what role time plays: if you are using the correct causal model, where your current brain state has causal influence over your future decision, then the only distinction two-boxers can base their "irrational" label on is time, not causality.

Two-boxers use a causal model where your current brain state has causal influence on your future decisions. They are interested in the causal effects of the decision not the brain state and hence the causal independence criterion does distinguish the cases in their view and they need not appeal to time.

If a two-boxer argues that their decision cannot cause a past event, they have the causal model wrong. The correct model is one of a past brain state influencing both Omega's decision and your own future decision.

They have the right causal model. They just disagree about which downstream causal effects we should be considering.

For me, the simulation argument made it obvious that one-boxing is the rational choice, because it makes clear that your decision is algorithmic. "Then I'll just decide differently!" is, you see, still a fixed algorithm. There is no such thing as submitting one program to Omega and then running a different one, because you are the same program in both cases -- and it's that program that is causal over both Omega's behavior and the "choice you would make in that situation". Separating the decision from the deciding algorithm is incoherent.

No-one denies this. Everyone agrees about what the best program is. They just disagree about what this means about the best decision. The two-boxer says that unfortunately the best program leads us to make a non-optimal decision which is a shame (but worth it because the benefits outweigh the cost). But, they say, this doesn't change the fact that two-boxing is the optimal decision (while acknowledging that the optimal program one-boxes).

How does your hypothetical two-boxer respond to simulation or copy arguments? If you have no way of knowing whether you're the simulated version of you, or the real version of you, which decision is rational then?

I suspect that different two-boxers would respond differently as anthropic style puzzles tend to elicit disagreement.

To put it another way, a two-boxer is arguing that they ought to two-box while simultaneously not being the sort of person who would two-box -- an obvious contradiction. The two-boxer is either arguing for this contradiction, or arguing about the definitions of words by saying "yes, but that's not what 'rational' means".

Well, they're saying that the optimal algorithm is a one-boxing algorithm while the optimal decision is two-boxing. They can explain why as well (algorithms have different causal effects to decisions). There is no immediate contradiction here (it would take serious argument to show a contradiction like, for example, an argument showing that decisions and algorithms are the same thing). For example, imagine a game where I choose a colour and then later choose a number between 1 and 4. With regards to the number, if you pick n, you get $n. With regards to the colour, if you pick red, you get $0, if you pick blue you get $5 but then don't get a choice about the number (you are presumed to have picked 1). It is not contradictory to say that the optimal number to pick is 1 but the optimal colour to pick is blue. The two-boxer is saying something pretty similar here.

What "ought" you do, according to the two-boxer. Well that depends what decision you're facing. If you're facing a decision about what algorithm to adopt, then adopt the optimal algorithm (which one-boxers on all future versions of NP though not ones where the prediction has occurred). If you are not able to choose between algorithms but are just choosing a decision for this occasion then choose two-boxing. They do not give contradictory advice.

Replies from: pjeby

↑ comment by pjeby · 2013-07-01T18:48:27.772Z · LW(p) · GW(p)

two-boxing is the optimal decision

Taboo "optimal".

The problem here is that this "optimal" doesn't cash out to anything in terms of real world prediction, which means it's alberzle vs. bargulum all over again. A and B don't disagree about predictions of what will happen in the world, meaning they are only disagreeing over which definition of a word to use.

In this context, a two boxer has to have some definition of "optimal" that doesn't cash out the same as LWers cash out that word. Because our definition is based on what it actually gets you, not what it could have gotten you if the rules were different.

If you're facing a decision about what algorithm to adopt, then adopt the optimal algorithm (which one-boxers on all future versions of NP though not ones where the prediction has occurred). If you are not able to choose between algorithms but are just choosing a decision for this occasion then choose two-boxing.

And what you just described is a decision algorithm, and it is that algorithm which Omega will use as input to decide what to put in the boxes. "Decide to use algorithm X" is itself an algorithm. This is why it's incoherent to speak of a decision independently - it's always being made by an algorithm.

"Just decide" is a decision procedure, so there's actually no such thing as "just choosing for this occasion".

And, given that algorithm, you lose on Newcomb's problem, because what you described is a two-boxing decision algorithm: if it is ever actually in the Newcomb's problem situation, an entity using that decision procedure will two-box, because "the prediction has occurred". It is therefore trivial for me to play the part of Omega here and put nothing under the box when I play against you. I don't need any superhuman predictive ability, I just need to know that you believe two boxing is "optimal" when the prediction has already been made. If you think that way, then your two-boxing is predictable ahead of time, and there is no temporal causation being violated.

Barring some perverse definition of "optimal", you can't think two-boxing is coherent unless you think that decisions can be made without using your brain - i.e. that you can screen off the effects of past brain state on present decisions.

Again, though, this is alberzle vs bargulum. It doesn't seem there is any argument about the fact that your decision is the result of prior cause and effect. The two-boxer in this case seems to be saying "IF we lived in a world where decisions could be made non-deterministically, then the optimal thing to do would be to give every impression of being a one-boxer until the last minute." A one boxer agrees that this conditional statement is true... but entirely irrelevant to the problem at hand, because it does not offer such a loophole.

So, as to the question of whether two boxing is optimal, we can say it's alberzle-optimal but not bargulum-optimal, at which point there is nothing left to discuss.

comment by wedrifid · 2013-06-30T06:09:48.000Z · LW(p) · GW(p)

So I'm turning to you guys to ask for help figuring out whether I should be a staunch one-boxer too.

Only if you like money.

comment by Ishaan · 2013-06-30T07:52:54.457Z · LW(p) · GW(p)

The optimal thing would be to have Omega think that you will one-box, but you actually two box. You'd love to play Omega for a fool, but the problem explicitly tells you that you can't, and that Omega can somehow predict you.

Omega has extremely good predictions. if you've set your algorithm in such a state that Omega will predict that you one-box, you will be unable to do anything but one-box - your neurons are set in place, causal lines have already insured your decision, and free will doesn't exist in the sense that you can change your decision after the fact.

Replies from: Decius

↑ comment by Decius · 2013-07-03T06:10:14.644Z · LW(p) · GW(p)

In the strictest sense, that requires breaking the speed barrier to information. Otherwise I'm going to bring in a cosmic ray detector and two box iff the time between the second and third detection is less than the time between the first and second.

comment by David_Gerard · 2013-06-30T22:22:13.603Z · LW(p) · GW(p)

The problem is no free lunch. Any decision theory is going to fail somewhere. The case for privileging Newcomb as a success goal over all other considerations has not, in fact, been made.

Replies from: None

↑ comment by [deleted] · 2013-06-30T22:47:08.519Z · LW(p) · GW(p)

So I raised this problem too, and I got a convincing answer to it. The way I raised it was to say that it isn't fair to fault CDT for failing to maximise expected returns in Newcomb's problem, because Newcomb's problem was designed to defeat CDT and we can design a problem to defeat any decision theory. So that can't be a standard.

The response I got (at least, my interpretation of it) was this: It is of course possible to construct a problem in which any decision theory is defeated, but not all such problems are equal. We can distinguish in principle between problems that can defeat any decision procedure (such as 'omega gives you an extra million for not using X', where X is the decision procedure you wish to defeat) and problems which defeat certain decision procedures but cannot be constructed so as to defeat others. Call the former type 1 problems, and the latter type 2 problems. Newcomb's problem is a type 2 problem, as is the prisoner's dilemma against a known psychological twin. Both defeat CDT, but not TDT, and cannot be constructed so as to defeat TDT without becoming type 1. TDT is aimed (though I think not yet successful) at being able to solve all type 2 problems.

So if we have two decision theories, both of which fail type 1 problems, but only one of which fails type 2 problems, we should prefer the one that never fails type 2 problems. This would privilege Newcomb's problem (as a type 2 problem) over any type 1 problems. It would cease to be an argument for the privileging of type 2 problems over type 1 problems if it turned out that every decision theory will always fail some set of type 2 problems. But that would, I think, be a hard case to make.

Replies from: Decius

↑ comment by Decius · 2013-07-03T05:59:21.829Z · LW(p) · GW(p)

Can you construct a problem that defeats TDT that cannot be constructed to defeat CDT? (I think I can- The Pirates' problem against psychological twins).

Replies from: None

↑ comment by [deleted] · 2013-07-03T09:17:30.369Z · LW(p) · GW(p)

No, I don't have any such thing in mind. Could you explain how TDT and CDT get different results?

Replies from: Decius

↑ comment by Decius · 2013-07-03T22:17:16.631Z · LW(p) · GW(p)

The CDT result is pretty well known: the first pirate gets almost everything.

The TDT result is hard for me, but if the first pirate gets anything then more than half of the other pirates had a strategy that could be trivially improved.

comment by Ronak · 2013-06-30T19:12:47.490Z · LW(p) · GW(p)

[Saying same thing as everyone else, just different words. Might work better, might not.]

Suppose once Omega explains everything to you, you think 'now either the million dollars are there or aren't and my decision doesn't affect shit.' True, your decision now doesn't affect it - but your 'source code' (neural wiring) contains the information 'will in this situation think thoughts that support two-boxing and accept them.' So, choosing to one-box is the same as being the type of agent who'll one-box.
The distinction between agent type and decision is artificial. If your decision is to two-box, you are the agent-type who will two-box. There's no two ways about it. (As others have pointed out, this has been formalised by Anna Salomon.)

The only way you can get out of this is if you believe in free will as something that exists in some metaphysical sense. Then to you, Omega being this accurate is beyond the realm of possibility and therefore the question is unfair.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T23:35:38.073Z · LW(p) · GW(p)

Two-boxing definitely entails that you are a two-boxing agent type. That's not the same claim as the claim that the decision and the agent type are the same thing. See also my comment here. I would be interested to know your answer to my questions there (particularly the second one).

Replies from: Ronak

↑ comment by Ronak · 2013-07-03T20:47:29.411Z · LW(p) · GW(p)

When I said 'A and B are the same,' I meant that it is not possible for one of A and B to have a different truth-value from the other. Two-boxing entails you are a two-boxer, but being a two-boxer also entails that you'll two-box. But let me try and convince you based on your second question, treating the two as at least conceptually distinct.

Imagine a hypothetical time when people spoke about statistics in terms of causation rather than correlation (and suppose no one had done Pearl's work). As you can imagine, the paradoxes would write themselves. At one point, someone would throw up his/her arms and tell everyone to stop talking about causation. And then the causalists would rebel, because causality is a sacred idea. The correlators would reply probably by constructing a situation where a third, unmeasured C caused both A and B.
Newcomb's is that problem for decision theory. CDT is in a sense right when it says one-boxing doesn't cause there to be a million dollars in the box, that what does cause the money to be there is being a one-boxer. But, it ignores the fact that the same thing that caused there to be the million dollars also causes you to one-box - so, there may not be a causal link there very definitely is a correlation.
'C causing both A and B' is an instance of the simplest and most intuitive way in which correlation can be not causation, and CDT fails. EDT is looking at correlations between decisions and consequences and using that to decide.

Aside: You're right, though, that the LW idea of a decision is somewhat different from the CDT idea. You define it as "a proposition that the agent can make true or false at will." That definition has this really enormous black box called will - and if Omega has an arbitrarily high predictive accuracy, then it must be the case that that black box is a causal link going from Omega's raw material for prediction (brain state) to decision. CDT, when it says that you ought to only look at causal arrows that begin at the decision, assumes that there can be no causal arrow that points to the decision (because the moment you admit that there can be a causal arrow that begins somewhere and ends at your decision, you have to admit that there can exist C that causes both your decision and a consequence without your decision actually causing the consequence).
In short, the new idea of what a decision is itself causes the requirement for a new decision theory.

comment by DanielLC · 2013-06-30T04:22:43.144Z · LW(p) · GW(p)

My problem with causal decision theory is that it treats the past different from the future for no good reason. If you read the quantum physics sequence, particularly the part about timeless physics, you will find that time is most likely not even an explicit dimension. The past is more likely to be known, but it's not fundamentally different from the future.

The probability of box A having money in it is significantly higher given that you one box then the probability given that you do not. What more do you need to know?

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T04:35:11.606Z · LW(p) · GW(p)

This seems like an interesting point. If either time or causation doesn't work in the way we generally tend to think it does then the intuitions in favour of CDT fall pretty quickly. However, timeless physics is hardly established science and various people are not very positive about the QM sequence. So while this seems interesting I don't know that it helps me personally to come to a final conclusion on the matter.

Replies from: DanielLC

↑ comment by DanielLC · 2013-06-30T05:00:27.341Z · LW(p) · GW(p)

Consider this altered form of the problem:

Omega offers you two boxes. One is empty and the other has one thousand dollars. He offers you a choice of taking just the empty box or both boxes. If you just take the empty box, he will put a million dollars in it. You decide that you can't change the big bang, and given the big bang his choice of whether or not to put a million dollars in the box is certain, so you can't influence his decision to put the money in the box. As such, you might as well take both boxes.

How can you have control over the future but not the past if the two are correlated?

comment by Richard_Kennaway · 2013-07-01T08:39:54.374Z · LW(p) · GW(p)

Does Omega one-box against Omega?

comment by Sly · 2013-06-30T08:48:09.632Z · LW(p) · GW(p)

Here is another way to think about this problem.

Imagine if instead of Omega you were on a futuristic game show. As you go onto the show, you enter a future-science brain scanner that scans your brain. After scanning, the game show hosts secretly put the money into the various boxes behind stage.

You now get up on stage and choose whether to one or two box.

Keep in mind that before you got up on the show, 100 other contestants played the game that day. All of the two-boxers ended up with less money than the one-boxers. As an avid watcher of the show, you clearly remember that in every previous broadcast (one a day for ten years) the one-boxers did better than the two-boxers.

Can you honestly tell me that the superior move here is two-boxing? Where does the evidence point? If one strategy clearly and consistently produces inferior results compared to another strategy, that should be all we need to discard it as inferior.

Replies from: Decius

↑ comment by Decius · 2013-07-03T06:07:43.470Z · LW(p) · GW(p)

If one strategy clearly and consistently produc[ed] inferior results compared to another strategy, that should be all we need to discard it as inferior.

I disagree. Just because Rock lost every time it was played doesn't mean that it's inferior to Paper or Scissors, to use a trivial example.

Replies from: Sly

↑ comment by Sly · 2013-07-04T03:17:12.795Z · LW(p) · GW(p)

I disagree.

If rock always lost when people used it, that would be evidence against using rock.

Just like if you flip a coin 1000000 times and keep getting heads that is evidence of a coin that won't be coming up tails anytime soon.

Replies from: Decius

↑ comment by Decius · 2013-07-04T06:29:51.303Z · LW(p) · GW(p)

Playing your double: Evidence that your opponent will not use rock is evidence that you should not use paper. If you don't use rock, and don't use paper, then you must use scissors and tie with your opponent who followed the same reasoning.

Updating on evidence that rock doesn't win when it is used means rock wins.

EDIT: consider what you would believe if you tried to call a coin a large number of times and were always right. Then consider what you would believe if you were always wrong.

Replies from: Sly

↑ comment by Sly · 2013-07-04T09:06:28.501Z · LW(p) · GW(p)

"Rock lost every time it was played "

"rock doesn't win when it is used means rock wins."

One of these things is not like the other.

Replies from: Decius

↑ comment by Decius · 2013-07-05T01:28:02.715Z · LW(p) · GW(p)

Those aren't both things that I said.

For rock to lose consistently means that somebody isn't updating properly, or is using a failing strategy, or a winning strategy.

For example, if I tell my opponent "I'm going to play only paper", and I do, rock will always lose when played. That strategy can still win over several moves, if I am not transparent; all I have to do is correctly predict that my opponent will predict that the current round is the one in which I change my strategy.

If they believe (through expressed preferences, assuming that they independently try to win each round) that rock will lose against me, rock will win against them.

Replies from: Sly

↑ comment by Sly · 2013-07-07T05:11:04.329Z · LW(p) · GW(p)

Don't edit your post and then say you didn't say what you said. I literally just copy pasted what you wrote and added quotes around it.

Replies from: Decius

↑ comment by Decius · 2013-07-07T05:49:20.582Z · LW(p) · GW(p)

"I literally just ... edit your post ... and then say ... you said ... what ... you didn't say."

I can play the selective quotation game too. It doesn't make it valid.

What I originally wrote was "Just because Rock lost every time it was played doesn't mean that it's inferior to Paper or Scissors"

What you misquoted was the statement Updating on evidence that rock doesn't win when it is used means rock wins. (emphasis on added context)

That's standard behavior in the simple simultaneous strategy games; figure out what your opponent's move is and play the maneuver which counters it. If you are transparent enough that I can correctly determine that you will play the maneuver that would have won the most prior rounds, I can beat you in the long run. The correct update to seeing RPS is to update the model of your opponent's strategy, and base the winning percentages off what you believe your opponent's strategy is.

That's why I can win with "I always throw rock", stated plainly at the start. Most people (if they did the reasoning), would have very a very low prior that I was telling the truth, and the first round ties. The next round I typically win, with a comment of "I see what you did there".

What are your priors that my actual strategy, given that I had said I would always throw rock and threw rock the first time, would fall into either category: "Throw rock for N rounds and then change" or "Throw rock until it loses N times (in a row) and then change"? (Keep in mind conservation of probability: The sum of all N across each possible strategy must total 1)

If you don't ascribe a significant chance of me telling the truth, there is some N at which you stop throwing paper, even while it is working. The fact that throwing scissors would have lost you every prior match is not strong evidence that it will lose the next one.

Replies from: Sly

↑ comment by Sly · 2013-07-09T05:12:07.150Z · LW(p) · GW(p)

"I can play the selective quotation game too. It doesn't make it valid."

Except I didn't break things up with ellipses to make things up like you just did. Nice false equivocation.

Either rock always wins or it doesn't. I was pointing out the lack of consistency in what you said.

If you are proposing that rock does actually win, then that is completely different that what I setup in my scenario. A more accurate representation would be if paper was ALWAYS thrown by your opponents.

Then you come along and say that "no rock will actually win guys! Look at my theory that says so" before you get up and predictably lose. Just like everyone before you.

Replies from: wedrifid, Decius

↑ comment by wedrifid · 2013-07-09T06:35:35.716Z · LW(p) · GW(p)

Except I didn't break things up with ellipses to make things up like you just did. Nice false equivocation.

Your quotations were of sentence fragments that did not preserve meaning. There was exaggeration for emphasis but no false equivocation.

Replies from: Sly

↑ comment by Sly · 2013-07-10T08:22:37.834Z · LW(p) · GW(p)

I don't think this is even close to accurate.

His post was a blatant misrepresentation, a joke of an example.

My post took the exact words posted in order, showing a direct contradiction in his scenario. He then edited the quote that I had and removed it.

Beforehand it said that Rock always lost. After his edit that line was entirely removed, and then he said that I misquoted him. Sure, of course it looks like much more of a misquote after an edit. But I think that is highly deceptive, so I said so.

Beforehand he said that Rock always lost, and then said that Rock didn't actually lose. If his second statement was correct, then his first statement would be trivially false.

Let's dig further.

Original line: "Just because Rock lost every time it was played doesn't mean that it's inferior to Paper or Scissors"

My quote: ""Rock lost every time it was played "

Showing that he was talking about a scenario where Rock lost every time it was played. I highlighted the relevant part. The part about determining inferiority is irrelevant to the scenario.

Second Original Quote: "Updating on evidence that rock doesn't win when it is used means rock wins."

Second My Quote: "rock doesn't win when it is used means rock wins."

He is outlining a situation in which he thinks that Rock does win, even though the scenario contradicts that.

Comparing: "I literally just ... edit your post ... and then say ... you said ... what ... you didn't say."

And saying it is equivalent is ludicrous.

↑ comment by Decius · 2013-07-09T12:23:00.208Z · LW(p) · GW(p)

Suppose your opponent has thrown paper N (or X%) times and won every time they did. Is that evidence for, or evidence against, the proposition that they will play paper in the next trial? (or does the direction of evidence vary with N or X?)

Replies from: Sly

↑ comment by Sly · 2013-07-10T08:08:34.629Z · LW(p) · GW(p)

"Suppose your opponent has thrown paper N (or X%) times and won every time they did. Is that evidence for, or evidence against, the proposition that they will play paper in the next trial? (or does the direction of evidence vary with N or X?)"

All of this is irrelevant.

So I will admit I am frustrated here. I don't think that your analogy is even close to equivalent,

I think you are thinking about this in the wrong way.

So let's say you were an adviser advising one of the players on what to choose. Every time you told him to throw rock over the last million games, he lost. Yet every time you told him to throw Scissors he won. Now you have thought very much about this problem, and all of your theorizing keeps telling you that your player should play Rock (the theorycrafting has told you this for quite a while now).

At what point is this evidence that you are reasoning incorrectly about the problem, and really you should just tell the player to play scissors? Would you actually continue to tell him to throw Rock if you were losing $1 every time the player you advised lost?

Now if this advising situation had been a game that you played with your strategy and I had separately played with my strategy, who would have won?

Replies from: Decius

↑ comment by Decius · 2013-07-10T19:03:06.395Z · LW(p) · GW(p)

Suppose my strategy made an equal (enough) number of suggestions for each option over the last 1m trials, while the opponent played paper every time. My current strategy suggests that playing rock on the next game is the best move. The opponent's move is defined to not be dependent on my prior moves (because otherwise things get too complicated for brief analysis)

There are two major competing posterior strategies at this point: "Scissors for the first 1M trials, then rock" and "Scissors for the first 1M trials" It is not possible for my prior probability for "Scissors for the first N, then rock" to be higher than my probability for "Scissors forever" for an infinite number of N, so there is some number of trials after which any legal prior probability distribution favors "Scissors forever", if it loses only a finite number of times.

At this point I'm going to try to save face by pointing out that for each N, there is a legal set of prior probabilities of the optimum strategy to suggest each option an equal number of time. They would have to be arranged such that "My opponent will play paper X times then something else" is more likely than "My opponent will play paper X times then play paper again" for 2/3 of X from 0 to N. Given that "My opponent will always play paper" is a superset of the latter, and each time I am wrong I must eliminate a probability space larger than it from consideration, and that I have been wrong 700k times, I obviously must have assigned less than ~1e-6 initial probability to all estimates that my opponent will play paper 1M+1 times in a row, but higher than that to ~700k cases of supersets of "my opponent will play paper X times in a row then change" where X is less than 1M. While a legal set of priors, I think it would be clearly unreasonable in practice to fail to adapt to a strategy of strict paper within 10.

Strangely, many of the strategies which are effective against humans for best-of-seven seem to be ineffective against rational agents for long-term performance. Would it be interesting to have a RPS best-of competition between programs with and without access to their opponent's source code, or even just between LW readers who are willing to play high-stakes RPS?

Replies from: Sly

↑ comment by Sly · 2013-07-13T19:52:14.958Z · LW(p) · GW(p)

Cool, sounds like we are converging.

I would be interested in seeing a RPS competition between programs, sounds interesting.

Replies from: Decius

↑ comment by Decius · 2013-07-13T22:24:36.137Z · LW(p) · GW(p)

Unweighted random wins 1/3 of the time; nobody can do better than that versus unweighted random. The rules would have to account for that.

I saw a long time ago a website that would play RPS against you using a genetic algorithm; it had something like a 80% win rate against casual human players.

comment by Manfred · 2013-06-30T05:23:34.262Z · LW(p) · GW(p)

Didn't we have a thread about this really recently?

Anyhow, to crib from the previous thread - an important point is reflective equilibrium. I shouldn't be able to predict that I'll do badly - if I know that, and the problem is "fair" in that it's a decision-determined, I can just make the other decision. Or if I'm doing things a particular way, and I know that another way of doing things would be better, and the problem is "fair" in that I can choose how to do things, I can just do things the better way. To sit and stew and lose anyhow is just cray talk.

A totally different and very good example problem where this shows up was covered by Wei Dai here.

Replies from: Nornagest

↑ comment by Nornagest · 2013-06-30T06:22:30.901Z · LW(p) · GW(p)

Yeah, CarlSchulman put up a couple of threads on Newcomb a couple weeks ago, here and here. The original Newcomb's Problem and Regret of Rationality thread has also been getting some traffic recently.

Offhand I don't see anything in this thread that hasn't been covered by those, but I may be missing relevant subtleties; I don't find this debate especially interesting past the first few rounds.

comment by Odis AQW (odis-aqw) · 2025-02-23T00:30:49.769Z · LW(p) · GW(p)

Based on your discussion of agent types, you seem to be aware that two-boxers (apparently) make these assumptions:
1) that your decision at game time has no causal impact on the agent type at the time of prediction
2) that your agent type at the time of prediction causes the prediction

To see why the two boxers assume the first point, consider if it were false that two boxers assume this. Then it would mean that two boxers entertain the possibility that your decision at game time has a causal impact on the agent type you display at the time of prediction. But if your decision at game time causally impacts your agent type at the time of prediction, then by assumption #2, it would mean that your decision at game time causally impacts the very thing that causes the prediction, in which case two boxers would, by their own logic, commit to one boxing, since your decision to one box would causally impact your ability to get a million dollars. But they are in fact two boxers, so they must assume the first point after all.

As for why two boxers assume the second point, I don't think I'll get much pushback on the core idea, though maybe I could word it better. Basically, as you argued in your article, two boxers believe the predictor sizes your agent type up, and makes a prediction off of that.

OK, so now the big question is whether these assumptions are sound.

It seems quite intuitive that your decision at game time has no causal impact on your agent type at the time of prediction, because to believe otherwise would mean that a decision in the future has a causal impact on your agent type in the past. But the past is unchangeable. If the past is unchangeable, then future events cannot influence past events, for if they did, then the future could cause the past to change, which would contradict the notion that the past is unchangeable.

But notice, then, that the question of whether to one box or two box boils down to the more fundamental question of whether retro-causality is possible.

Quantum mechanics suggests that it might be. However, many interpretations of the evidence have been created precisely to sidestep the implication that retro-causality is a feature of the universe. So now a two boxer has to believe those interpretations of quantum mechanics that are consistent with ruling out retro-causality. And this is totally a feasible thing to do. I just wanted to bring this up as an empirical basis outside of philosophizing to suggest that retro-causality is possible (and hence that the first assumption of two boxers is mistaken).

To really see that assumption #1 might be mistaken, imagine someone is a two-boxer agent type at the time of prediction. (By assumption #2, this means that the prediction is made accordingly to only place a total of $1000.)

Further imagine that this person who was a two boxer agent type at prediction time somehow decides to one box at game time anyway. Is it really possible to one box at game time even though you were a two boxer agent type at prediction time? I wager that, yes, this is completely possible because your agent type at game time could have evolved to be a different agent type at game time than the agent type you were at prediction time.

I mean, perhaps in the time they were mulling over which decision to make, they took the time to read some ardent one boxers' posts in these LessWrong forums, and became convinced to one box, maybe even because of this very post.

So it is possible for someone to one box despite having been, at prediction time, of the two boxing agent type.

But then, ask yourself this: if someone was the type of person who would look up information about Newcomb's problem, and furthermore have been the type of person who, upon examining the information that they looked up, would end up one boxing, then wouldn't that mean that they were of the one boxing agent type all along? Then that means that the act of choosing to one box caused them to be the type of person who was a one boxer all along!

"Not so fast," the two boxer says. The character traits that caused this person to not only seek out LessWrong forums, but also be receptive to being convinced by them, were present at prediction time, even if they hadn't been manifested yet. So that would mean that, contrary to our initial assumption, they were never truly a two boxer agent type to begin with. They always had the traits of a one-boxer-in-the-making.

And that is precisely my point. If the initially two-boxing agent-type player somehow one boxes anyway, then that decision rests on reasons that can eventually be traced back to something fundamental in their character as of prediction time. But then the very decision they make at game time reveals what agent type they were all along.

The two boxer might object by saying instead that you can't decide to do otherwise than what your agent type "initially" was. But if you believe this, then that's all the more reason to agree with me that your decision at game time reveals what agent type you were all along. But if your decision at game time reveals what agent type you were all along, then whatever decision you make ends up being who you were all along.

comment by redlizard · 2013-07-04T19:26:06.213Z · LW(p) · GW(p)

Okay, those with a two-boxing agent type don't win but the two-boxer isn't talking about agent types. They're talking about decisions. So they are interested in what aspects of the agent's winning can be attributed to their decision and they say that we can attribute the agent's winning to their decision if this is caused by their decision. This strikes me as quite a reasonable way to apportion the credit for various parts of the winning.

Do I understand it correctly that you're trying to evaluate the merits of a decision (to two-box) in isolation of the decision procedure that produced it? Because that's simply incoherent if the payoffs of the decision depend on your decision procedure.

comment by Username · 2013-07-02T04:09:27.697Z · LW(p) · GW(p)

To put it succinctly, Omega knows me far better than I know myself. I'm not going to second guess him/her.

comment by MrMind · 2013-07-01T07:56:32.106Z · LW(p) · GW(p)

The case of CDT vs Newcomb-like problems to me has a lot of similarity with the different approaches to probability theory.
In CDT you are considering only one type of information, i.e. causal dependency, to construct decision trees. This is akin to define probability as the frequency of some process, so that probability relations become causal ones. Other approaches like TDT construct decisions using causal and logical dependency, as the induction logic approach to probability does.
The Newcomb is not designed to be "unfair" to CDT, it is designed to show the limits of causal approach, exactly like calculating past sampling distribution from future extractions is a problem solvable only from the second approach (see the third chapter of Jaynes' book).

That said, we can argue about what rationality should really be: just the correct execution of whatever type of agency a system has? Or the general principle that an agent should be able to reason about whatever situation is at hand and correctly deal with it, using all the available information?
My sympathy goes to the second approach, not just because it seems to be more intuitively appealing, but also because it will be a fundamental necessity of a future AI.

comment by fubarobfusco · 2013-06-30T10:27:39.338Z · LW(p) · GW(p)

Okay, those with a two-boxing agent type don't win but the two-boxer isn't talking about agent types. They're talking about decisions.

The problem doesn't care whether you are the type of agent who talks about agent types or the type of agent who talks about decisions. The problem only cares about which actions you choose.

Replies from: Creutzer

↑ comment by Creutzer · 2013-06-30T12:48:19.697Z · LW(p) · GW(p)

The problem only cares about which actions you choose.

The problem does care about what kind of agent you are, because that's what determined Omega's prediction. It's just that kinds of agents are defined by what you (would) do in certain situations.

Replies from: None

↑ comment by [deleted] · 2013-06-30T19:14:08.320Z · LW(p) · GW(p)

Right. If you can be a one-boxer without one-boxing, that's obviously what you do. Problem is, Omega is a superintelligence and you aren't.

Replies from: Creutzer

↑ comment by Creutzer · 2013-07-01T08:30:27.459Z · LW(p) · GW(p)

I don't see how being a superintelligence would help. Even a superintelligence can't do logically impossible things: you can't be a one-boxer without one-boxing, because one-boxing is what constitutes being a one-boxer.

Replies from: None

↑ comment by [deleted] · 2013-07-01T18:54:58.292Z · LW(p) · GW(p)

Omega is just a superintelligence. Presumably, he can't see the future and he's not omniscient; so it's hypothetically possible to trick him, to make him think you'll one-box when in reality you're going to two-box.

I'm not sure if I have the vocabulary yet to solve the problem of identity vs. action, and I study philosophy, not decision theory, so for me that's a huge can of worms. (I've already had to prevent myself from connecting the attempted two-boxer distinction between 'winning' and 'rational' to Nietzsche's idea of a Hinterwelt -- but that's totally something that could be done, by someone less averse to sounding pretentious.) But I think that, attempting to leave the can closed, the distinction I drew above between one-boxing and being a one-boxer really refers to the distinction between actually one-boxing when it comes time to open the box and making Omega think you'll one-box -- which may or may not be identical to making Omega think you're the sort of person who will one-box.

And the problem I raised above is that nobody's managed to trick him yet, so by simple induction, it's not reasonable to bet a million dollars on your being able to succeed where everyone else failed. So maybe the superintelligence thing doesn't even enter into it...? (Would it make a difference if it were just a human game show, that still displayed the same results? Would anyone one-box for Omega but two-box in the game show?)

comment by anotherblackhat · 2013-07-04T17:54:07.938Z · LW(p) · GW(p)

Consider the following two mechanisms for a Newcomb-like problem.

A. T-Omega offers you the one or two box choice. You know that T-Omega used a time machine to see if you picked one or two boxes, and used that information to place/not place the million dollars.

B. C-Omega offers you the one or two box choice. You know that C-Omega is con man, that pretends great predictive powers on each planet he visits. Usually he fails, but on Earth he gets lucky. C-Omega uses a coin flip to place/not place the million dollars.

I claim the correct choice is to one box T-Omega, and two box C-Omega.

Can someone explain how it is in the “original” problem?
That is, what mechanism does the “real” Omega use for making his decision?

Replies from: shminux

↑ comment by Shmi (shminux) · 2013-07-04T18:04:09.366Z · LW(p) · GW(p)

Usually he fails, but on Earth he gets lucky. C-Omega uses a coin flip to place/not place the million dollars.

There is a contradiction here between "lucky" and "coin flip". Why does he get lucky on Earth?

Can someone explain how it is in the “original” problem?

In the original problem Omega runs a simulation of you, which is equivalent to T-Omega.

Replies from: anotherblackhat

↑ comment by anotherblackhat · 2013-07-05T06:51:57.550Z · LW(p) · GW(p)

There is a contradiction here between "lucky" and "coin flip". Why does he get lucky on Earth?

I don't see the contradiction. C-Omega tries the same con on billions and billions of planets, and it happens that out of those billions of trials, on Earth his predictions all came true.

Asking why Earth is rather like asking why Regina Jackson won the lottery - it was bound to happen somewhere, where ever that was you could ask the same question.

In the original problem Omega runs a simulation of you, which is equivalent to T-Omega.

I could not find the word "simulation" mentioned in any of the summaries nor the full restatements that are found on LessWrong, in particular Newcomb's problem. Nor was I able to find that word in the formulation as it appeared in Martin Gardner's column published in Scientific American, nor in the rec.puzzles archive. Perhaps it went by some other term?

Can you cite something that mentions simulation as the method used (or for that matter, explicitly states any method Omega uses)?

comment by Strilanc · 2013-06-30T08:45:53.104Z · LW(p) · GW(p)

[Two boxers] are interested in what aspects of the agent's winning can be attributed to their decision and they say that we can attribute the agent's winning to their decision if this is caused by their decision. This strikes me as quite a reasonable way to apportion the credit for various parts of the winning.

What do you mean by "the agent's winning can be attributed to their decision"? The agent isn't winning! Calling losing winning strikes me as a very unreasonable way to apportion credit for winning.

It would be helpful to me if you defined how you're attributing winning to decisions. Maybe taboo the words winning and decision. At the moment I really can't get my head around what you're trying to say.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T09:20:22.028Z · LW(p) · GW(p)

I was using winning to refer to something that comes in degrees.

The basic idea is that each agent ends up with a certain amount of utility (or money) and the question is which bits of this utility can you attribute to the decision. So let's say you wanted to determine how much of this utility you can attribute to the agent having blue hair. How would you do so? One possibility (that used by the two-boxer) is that you ask what causal effect the agent's blue hair had on the amount of utility received. This doesn't seem an utterly unreasonable way of determining how the utility received should be attributed to the agent's hair type.

Replies from: Strilanc

↑ comment by Strilanc · 2013-06-30T09:27:25.236Z · LW(p) · GW(p)

I still don't follow. The causal effect of two-boxing is getting 1000$ instead of 1000000$. That's bad. How are you interpreting it, so that it's good? Because they're following a rule of thumb that's right under different circumstances?

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T09:31:54.985Z · LW(p) · GW(p)

One-boxers end up with 1 000 000 utility Two-boxers end up with 1 000 utility

So everyone agrees that one-boxers are the winning agents (1 000 000 > 1 000)

The question is, how much of this utility can be attributed to the agent's decision rather than type. The two-boxer says that to answer this question we ask about what utility the agent's decision caused them to gain. So they say that we can attribute the following utility to the decisions:

One-boxing: 0 Two-boxing: 1000

And the following utility to the agent's type (there will be some double counting because of overlapping causal effects):

One-boxing type: 1 000 000 Two-boxing type: 1 000

So the proponent of two-boxing says that the winning decision is two-boxing and the winning agent type is a one-boxing type.

I'm not interpreting it so that it's good (for a start, I'm not necessarily a proponent of this view, I'm just outlining it). All I'm discussing is the two-boxer's response to the accusation that they don't win. They say they are interested not in winning agents but winning decisions and that two boxing is the winning decision (because 1000 > 0).

Replies from: Robert_Unwin, Creutzer, Strilanc, Qiaochu_Yuan

↑ comment by Robert_Unwin · 2013-06-30T20:43:55.785Z · LW(p) · GW(p)

The LW approach has focused on finding agent types that win on decision problems. Lots of the work has been in trying to formalize TDT/UDT, providing sketches of computer programs that implement these informal ideas. Having read a fair amount of the philosophy literature (including some of the recent stuff by Egan, Hare/Hedden and others), I think that this agent/program approach has been extremely fruitful. It has not only given compelling solutions to a large number of problems in the literature (Newcomb's, trivial coordination problems like Stag Hunt that CDT fails on, PD playing against a selfish copy of yourself) but it also has elucidated the deep philosophical issues that the Newcomb Problem dramatizes (concerning pre-commitment, free will / determinism and uncertainty about purely apriori/logical question). The focus on agents as programs has brought to light the intricate connection between decision making, computability and logic (esp. Godelian issues) --- something merely touched on in the philosophy literature.

These successes provide a sufficient reason to push the agent-centered approach (even if there were no compelling foundational argument that the 'decision' centered approach was incoherent). Similarly, I think there is no overwhelming foundational argument for Bayesian probability theory but philosophers should study it because of its fruitfulness in illuminating many particular issues in the philosophy of science and the foundations of statistics (not to mention its success in practical machine learning and statistics).

This response may not be very satisfying but I can only recommend the UDT posts (http://wiki.lesswrong.com/wiki/Updateless_decision_theory) and the recent MIRI paper http://intelligence.org/files/RobustCooperation.pdf.)

Rough arguments against the decision-centered approach:

Point 1

Suppose I win the lottery after playing 10 times. My decision of which numbers to pick on the last lottery was the cause of winning money. (Whereas previous decisions over numbers produced only disutility). But it's not clear there's anything interesting about this distinction. If I lost money on average, the important lesson is the failing of my agent-type (i.e. the way my decision algorithm makes decisions on lottery problems).

And yet in many practical cases that humans face, it is very useful to look back at which decisions led to high utility. If we compare different algorithms playing casino games, or compare following the advice of a poker expert vs. a newbie, we'll get useful information by looking at the utility caused by each decision. But this investigation of decisions that cause high utility is completely explainable from the agent-centered approach. When simulation and logical correlations between agents are not part of the problem, the optimal agent will make decisions that cause the most utility. UDT/TDT and variants all (afaik) act like CDT in these simple decision problems. If we came upon a Newcomb problem without being told the setup (and without any familiarity with these decision theory puzzles), we would see that the CDTer's decisions were causing utility and the EDTer's decisions were not causing any utility. The EDTer would look like lunatic with bizarrely good luck. Here we are following a local causal criterion in comparing actions. While usually fine, we would clearly be missing out on an important part of the story in the Newcomb problem.

Point 2

In AI, we want to build decision making agents that win. In life, we want to improve our decision making so that we win. Thinking about the utility caused by individual decisions may be a useful subgoal in coming up with winning agents, but it seems hard to see it as the central issue. The Newcomb problem (and the counterfactual mugging and Parfit's Hitchhiker) make clear that a local Markovian criterion (e.g. choose the action that will cause the highest utility, ignoring all previous actions/commitments) is inadequate for winning.

Point 3

The UDT one-boxer's agent type does not cause utility in the NP. However it does logically determine the utility. (More specifically, we could examine the one-boxing program as a formal system and try to isolate which rules/axioms lead to its one boxing in this type of problem). Similarly, if two people were using different sets of axioms (where one set is inconsistent), we might point to one of the axioms and say that its inclusion is what determines the inconsistency of the system. This is a mere sketch, but it might be possible to develop a local criterion by which "responsibility" for utility gains can be assigned to particular aspects of an agent.

It's clear that we can learn about good agent types by examining particular decisions. We don't have to always work with a fully specified program. (And we don't have the code of any AI that can solve decision problems the way humans can). So the more local approach may have some value.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T23:31:55.626Z · LW(p) · GW(p)

Generally agree. I think there are good arguments for focusing on decision types rather than decisions. A few comments:

Point 1: That's why rationality of decisions is evaluated in terms of expected outcome, not actual outcome. So actually, it wasn't just your agent type that was flawed here but also your decisions. But yes, I agree with the general point that agent type is important.

Point 2: Agree

Point 3: Yes. I agree that there could be ways other than causation to attribute utility to decisions and that these ways might be superior. However, I also think that the causal approach is one natural way to do this and so I think claims that the proponent of two-boxing doesn't care about winning are false. I also think it's false to say they have a twisted definition of winning. It may be false but I think it takes work to show that (I don't think they are just obviously coming up with absurd definitions of winning).

↑ comment by Creutzer · 2013-06-30T18:23:20.847Z · LW(p) · GW(p)

The question is, how much of this utility can be attributed to the agent's decision rather than type.

That's the wrong question, because it presupposes that the agent's decision and type are separable.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T23:24:02.261Z · LW(p) · GW(p)

By decision, the two-boxer means something like a proposition that the agent can make true or false at will (decisions don't need to be analysed in terms of propositions but it makes the point fairly clearly). In other words, a decision is a thing that an agent can bring about with certainty.

By agent type, in the case of Newcomb's problem, the two-boxer is just going to mean *the thing that Omega based their prediction on". Let's say the agent's brain state at the time of prediction.

Why think these are the same thing?

If these are the same thing, CDT will one-box. Given that, is there any reason to think that the LW view is best presented as requiring a new decision theory rather than as requiring a new theory of what constitutes a decision?

Replies from: Creutzer

↑ comment by Creutzer · 2013-07-01T08:24:57.638Z · LW(p) · GW(p)

Why think these are the same thing?

They are not the same thing, but they aren't independent. And they are not only causally dependent, but logically - which is why CDT intervention at the action node, leaving the agent-type node untouched, makes no sense. CDT behaves as if it were possible to be one agent type for the purpose of Omega's prediction, and then take an action corresponding to another agent type, even though that is logically impossible. CDT is unable to view its own action as predetermined, but its action is predetermined by the algorithm that is the agent. TDT can take this into account and reason with it, which is why it's such a beautiful idea.

↑ comment by Strilanc · 2013-06-30T12:49:25.198Z · LW(p) · GW(p)

In that case: the two-boxer isn't just wrong, they're double-wrong. You can't just come up with some related-but-different function ("caused gain") to maximize. The problem is about maximizing the money you receive, not "caused gain".

For example, I've seen some two-boxers justify two-boxing as a moral thing. They're willing to pay 999000$ for the benefit of throwing being predicted in the predictors face, somehow. Fundamentally, they're making the same mistake: fighting the hypothetical by saying the payoffs are different than what was stated in the problem.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T23:15:49.345Z · LW(p) · GW(p)

The two-boxer is trying to maximise money (utility). They are interested in the additional question of which bits of that money (utility) can be attributed to which things (decisions/agent types). "Caused gain" is a view about how we should attribute the gaining of money (utility) to different things.

So they agree that the problem is about maximising money (utility) and not "caused gain". But they are interested in not just which agents end up with the most money (utility) but also which aspects of those agents is responsible for them receiving the money. Specifically, they are interested in whether the decisions the agent makes are responsible for the money they receive. This does not mean they are trying to maximise something other than money (utility). It means they are interested in maximising money and then also in how you can maximise money via different mechanisms.

Replies from: Robert_Unwin

↑ comment by Robert_Unwin · 2013-07-01T01:07:19.364Z · LW(p) · GW(p)

An additional point (discussed intelligence.org/files/TDT.pdf‎) is that CDT seems to recommend modifying oneself to a non-CDT based decision theory. (For instance, imagine that the CDTer contemplates for a moment the mere possibility of encountering NPs and can cheaply self-modify). After modification, the interest in whether decisions are responsible causally for utility will have been eliminated. So this interest seems extremely brittle. Agents able to modify and informed of the NP scenario will immediately lose the interest. (If the NP seems implausible, consider the ubiquity of some kind of logical correlation between agents in almost any multi-agent decision problem like the PD or stag hunt).

Now you may have in mind a two-boxer notion distinct from that of a CDTer. It might be fundamental to this agent to not forgo local causal gains. Thus a proposed self-modification that would preclude acting for local causal gains would always be rejected. This seems like a shift out of decision theory into value theory. (I think it's very plausible that absent typical mechanisms of maintaining commitments, many humans would find it extremely hard to resist taking a large 'free' cash prize from the transparent box. Even prior schooling in one-boxing philosophy might be hard to stick to when face to face with the prize. Another factor that clashes with human intuitions is the predictor's infallibility. Generally, I think grasping verbal arguments doesn't "modify" humans in the relevant sense and that we have strong intuitions that may (at least in the right presentation of the NP) push us in the direction of local causal efficacy.)

EDIT: fixeds some typos.

↑ comment by Qiaochu_Yuan · 2013-06-30T09:40:25.383Z · LW(p) · GW(p)

The question is, how much of this utility can be attributed to the agent's decision rather than type.

To many two-boxers, this isn't the question. At least some two-boxing proponents in the philosophical literature seem to distinguish between winning decisions and rational decisions, the contention being that winning decisions can be contingent on something stupid about the universe. For example, you could live in a universe that specifically rewards agents who use a particular decision theory, and that says nothing about the rationality of that decision theory.

Replies from: PhilosophyStudent

↑ comment by PhilosophyStudent · 2013-06-30T09:50:30.890Z · LW(p) · GW(p)

I'm not convinced this is actually the appropriate way to interpret most two-boxers. I've read papers that say things that sound like this claim but I think the distinction that it generally being gestured at is the distinction I'm making here (with different terminology). I even think we get hints of that with the last sentence of your post where you start to talk about agent's being rewards for their decision theory rather than their decision.

comment by halcyon · 2013-07-01T07:36:22.297Z · LW(p) · GW(p)

The one problem I had with Yudkowsky's TDT paper (which I didn't read very attentively, mind you, so correct me if I'm wrong) was the part where he staged a dramatic encounter where a one-boxer was pleading with a wistful two-boxing agent who wished he was a one-boxer to change his algorithm to choose just one box. It occurred to me that even if the two-boxer listened to her, then his algorithm would have been altered by totally external factors. For the superintelligence setting up the problem to have predicted his change of mind, he would have had to simulate not a single agent, but the whole two agent system in order to predict this particular scenario correctly. Future settings might be different.

PS. One-boxers tend to make errors like this quite often, actually, illegitimately reducing the whole problem to one algorithm run by a single agent. (Actually a lot of decision theorists do that IMO. I've just been introduced to the world of decision theory by one-boxers.)

Why one-box?

Contents

99 comments