Extracting Money from Causal Decision Theorists 2021-01-28T17:58:44.129Z
Moral realism and AI alignment 2018-09-03T18:46:44.266Z
The law of effect, randomization and Newcomb’s problem 2018-02-15T15:31:56.033Z
Naturalized induction – a challenge for evidential and causal decision theory 2017-09-22T08:15:09.999Z
A survey of polls on Newcomb’s problem 2017-09-20T16:50:08.802Z
Invitation to comment on a draft on multiverse-wide cooperation via alternatives to causal decision theory (FDT/UDT/EDT/...) 2017-05-29T08:34:59.311Z
Are causal decision theorists trying to outsmart conditional probabilities? 2017-05-16T08:01:27.426Z
Publication on formalizing preference utilitarianism in physical world models 2015-09-22T16:46:54.934Z
Two-boxing, smoking and chewing gum in Medical Newcomb problems 2015-06-29T10:35:58.162Z
Request for feedback on a paper about (machine) ethics 2014-09-28T12:03:05.500Z


Comment by Caspar42 on Formalizing Objections against Surrogate Goals · 2021-09-07T20:12:56.296Z · LW · GW

Not very important, but: Despite having spent a lot of time on formalizing SPIs, I have some sympathy for a view like the following:

> Yeah, surrogate goals / SPIs are great. But if we want AI to implement them, we should mainly work on solving foundational issues in decision and game theory with an aim toward AI. If we do this, then AI will implement SPIs (or something even better) regardless of how well we understand them. And if we don't solve these issues, then it's hopeless to add SPIs manually. Furthermore, believing that surrogate goals / SPIs work (or, rather, make a big difference for bargaining outcomes) shouldn't change our behavior much (for the reasons discussed in Vojta's post).

On this view, it doesn't help substantially to understand / analyze SPIs formally.

But I think there are sufficiently many gaps in this argument to make the analysis worthwhile. For example, I think it's plausible that the effective use of SPIs hinges on subtle aspects of the design of an agent that we might not think much about if we don't understand SPIs sufficiently well.

Comment by Caspar42 on Formalizing Objections against Surrogate Goals · 2021-09-07T19:47:39.842Z · LW · GW

Great to see more work on surrogate goals/SPIs!

>Personally, the author believes that SPI might “add up to normality” --- that it will be a sort of reformulation of existing (informal) approaches used by humans, with similar benefits and limitations.

I'm a bit confused by this claim. To me it's a bit unclear what you mean by "adding up to normality". (E.g.: Are you claiming that A) humans in current-day strategic interactions shouldn't change their behavior in response to learning about SPIs (because 1) they are already using them or 2) doing things that are somehow equivalent to them)? Or are you claiming that B) they don't fundamentally change game-theoretic analysis (of any scenario/most scenarios)? Or C) are you saying they are irrelevant for AI v. AI interactions? Or D) that the invention of SPIs will not revolutionize human society, make peace in the Middle East, ...) Some of the versions seem clearly false to me. (E.g., re C, even if you think that the requirements for the use of SPIs are rarely satisfied in practice, it's still easy to construct simple, somewhat plausible scenarios / assumptions (see our paper) under which SPIs do seem to matter substantially for game-theoretic analysis.) Some just aren't justified at all in your post. (E.g., re A1, you're saying that (like myself) you find this all confusing and hard to say.) And some are probably not contrary to what anyone else believes about surrogate goals / SPIs. (E.g., I don't know anyone who makes particularly broad or grandiose claims about the use of SPIs by humans.)

My other complaint is that in some places you state some claim X in a way that (to me) suggests that you think that Tobi Baumann or Vince and I (or whoever else is talking/writing about surrogate goals/SPIs) have suggested that X is false, when really Tobi, Vince and I are very much aware of X and have (although perhaps to an insufficient extent) stated X. Here are three instances of this (I think these are the only three), the first one being most significant.

The main objection of the post is that while adopting an SPI, the original players must keep a bunch of things (at least approximately) constant(/analogous to the no-SPI counterfactual) even when they have an incentive to change that thing, and they need to do this credibly (or, rather, make it credible that they aren't making any changes). You argue that this is often unrealistic. Well, my initial reaction was: "Sure, I know these things!" (Relatedly: while I like the bandit v caravan example, this point can also be illustrated with any of the existing examples of SPIs and surrogate goals.) I also don't think the assumption is that unrealistic. It seems that one substantial part of your complaint is that besides instructing the representative/self-modifying, the original player/principal can do other things about the threat (like advocating a ban on real or water guns). I agree that this is important. If in 20 years I instruct an AI to manage my resources, it would be problematic if in the meantime I make tons of decisions (e.g., about how to train my AI systems) differently based on my knowledge that I will use surrogate goals anyway. But it's easy to come up with scenarios where this is not a problem. E.g., when an agent considers immediate self-modification, *all* her future decisions will be guided by the modified utility function. Or when the SPI is applied to some isolated interaction. When everything is in the representative's hands, we only need to ensure that the *representative* always acts in the same way it would act in a world where SPIs aren't a thing.

And I don't think it's that difficult to come up with situations in which the latter thing can be comfortably achieved. Here is one scenario. Imagine the two of us play a particular game G with SPI G'. The way in which we play this is that we both send a lawyer to a meeting and then the lawyers play the game in some way. Then we could mutually commit (by contract) to pay our lawyers in proportion to the utilities they obtain in G' (and to not make any additional payments to them). The lawyers at this point may know exactly what's going on (that we don't really care about water guns, and so on) -- but they are still incentivized to play the SPI game G' to the best of their ability. You might even beg your lawyer to never give in (or the like), but the lawyer is incentivized to ignore such pleas. (Obviously, there could still be various complications. If you hire the lawyer only for this specific interaction and you know how aggressive/hawkish different lawyers are (in terms of how they negotiate), you might be inclined to hire a more aggressive one with the SPI. But you might hire the lawyer you usually hire. And in practice I doubt that it'd be easy to figure out how hawkish different lawyers are.)

Overall I'd have appreciated more detailed discussion of when this is realistic (or of why you think it rarely is realistic). I don't remember Tobi's posts very well, but our paper definitely doesn't spend much space on discussing these important questions.

On SPI selection, I think the point from Section 10 of our paper is quite important, especially in the kinds of games that inspired the creation of surrogate goals in the first place. I agree that in some games, the SPI selection problem is no easier than the equilibrium selection problem in the base game. But there are games where it does fundamentally change things because *any* SPI that cannot further be Pareto-improved upon drastically increases your utility from one of the outcomes.

Re the "Bargaining in SPI" section: For one, the proposal in Section 9 of our paper can still be used to eliminate the zeroes!

Also, the "Bargaining in SPI" and "SPI Selection" sections to me don't really seem like "objections". They are limitations. (In a similar way as "the smallpox vaccine doesn't cure cancer" is useful info but not an objection to the smallpox vaccine.)

Comment by Caspar42 on Can you control the past? · 2021-08-29T20:40:31.100Z · LW · GW

Nice post! As you can probably imagine, I agree with most of the stuff here.

>VII. Identity crises are no defense of CDT

On 1 and 2: This is true, but I'm not sure one-boxing / EDT alone solves this problem. I haven't thought much about selfish agents in general, though.

Random references that might be of interest:

>V. Monopoly money

As far as I can tell, this kind of point was first made on p. 108 here:

Gardner, Martin (1973). “Free will revisited, with a mind-bending prediction paradox by William Newcomb”. In: Scientific American 229.1, pp. 104–109.


>the so-called “Tickle Defense” of EDT.

I have my own introduction to the tickle defense, aimed more at people in this community than at philosophers:

>Finally, consider a version of Newcomb’s problem in which both boxes are transparent

There's a lot of discussion of this in the philosophical literature. From what I can tell, the case was first proposed in Sect. 10 of:

Gibbard, Allan and William L. Harper (1981). “Counterfactuals and Two Kinds of Expected Utility”. In: Ifs. Conditionals, Belief, Decision, Chance and Time. Ed. by William L. Harper, Robert Stalnaker, and Glenn Pearce. Vol. 15. The University of Western Ontario Series in Philosophy of Science. A Series of Books in Philosophy of Science, Methodology, Epistemology, Logic, History of Science, and Related Fields. Springer, pp. 153–190. doi: 10.1007/978-94-009-9117-0_8

>There is a certain broad class of decision theories, a number of which are associated with the Machine Intelligence Research Institute (MIRI), that put resolving this type of inconsistency in favor of something like “the policy you would’ve wanted to adopt” at center stage.

Another academic, early discussion of updatelessness is:

Gauthier, David (1989). “In the Neighbourhood of the Newcomb-Predictor (Reflections on Rationality)”. In: Proceedings of the Aristotelian Society, New Series, 1988–1989. Vol. 89, pp. 179–194.

Comment by Caspar42 on Extracting Money from Causal Decision Theorists · 2021-02-12T17:06:45.140Z · LW · GW

Sorry for taking some time to reply!

>You might wonder why am I spouting a bunch of wrong things in an unsuccessful attempt to attack your paper.

Nah, I'm a frequent spouter of wrong things myself, so I'm not too surprised when other people make errors, especially when the stakes are low, etc.

Re 1,2: I guess a lot of this comes down to convention. People have found that one can productively discuss these things without always giving the formal models (in part because people in the field know how to translate everything into formal models). That said, if you want mathematical models of CDT and Newcomb-like decision problems, you can check the Savage or Jeffrey-Bolker formalizations. See, for example, the first few chapters of Arif Ahmed's book, "Evidence, Decision and Causality". Similarly, people in decision theory (and game theory) usually don't specify what is common knowledge, because usually it is assumed (implicitly) that the entire problem description is common knowledge / known to the agent (Buyer). (Since this is decision and not game theory, it's not quite clear what "common knowledge" means. But presumably to achieve 75% accuracy on the prediction, the seller needs to know that the buyer understands the problem...)

3: Yeah, *there exist* agent models under which everything becomes inconsistent, though IMO this just shows these agent models to be unimplementable. For example, take the problem description from my previous reply (where Seller just runs an exact copy of Buyer's source code). Now assume that Buyer knows his source code and is logically omniscient. Then Buyer knows what his source code chooses and therefore knows the option that Seller is 75% likely to predict. So he will take the other option. But of course, this is a contradiction. As you'll know, this is a pretty typical logical paradox of self-reference. But to me it just says that this logical omniscience assumption about the buyer is implausible and that we should consider agents who aren't logically omniscient. Fortunately, CDT doesn't assume knowledge of its own source code and such.

Perhaps one thing to help sell the plausibility of this working: For the purpose of the paper, the assumption that Buyer uses CDT in this scenario is pretty weak, formally simple and doesn't have much to do with logic. It just says that the Buyer assigns some probability distribution over box states (i.e., some distribution over the mutually exclusive and collectively exhaustive s1="money only in box 1", s2= "money only in box 2", s3="money in both boxes"); and that given such distribution, Buyer takes an action that maximizes (causal) expected utility. So you could forget agents for a second and just prove the formal claim that for all probability distributions over three states s1, s2, s3, it is for i=1 or i=2 (or both) the case that
(P(si)+P(s3))*$3 - $1 > 0.
I assume you don't find this strange/risky in terms of contradictions, but mathematically speaking, nothing more is really going on in the basic scenario.
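
The formal claim can also be checked numerically. The following is a minimal sketch, not taken from the paper: it sweeps a grid of distributions over s1, s2, s3 and reports the smallest value that the better of the two boxes can have. The function names and the grid resolution are illustrative assumptions. The reason the claim holds is that (P(s1)+P(s3)) + (P(s2)+P(s3)) = 1 + P(s3) ≥ 1, so at least one of the two sums is at least 1/2, and 1/2 * $3 - $1 = $0.50 > 0.

```python
from fractions import Fraction

PRICE = Fraction(1)   # cost of a box, in dollars
PRIZE = Fraction(3)   # money in a filled box, in dollars


def cdt_utilities(p1, p2, p3):
    """Causal expected utility of buying box 1 and of buying box 2.

    p1, p2, p3 are the buyer's unconditional probabilities of
    s1 = "money only in box 1", s2 = "money only in box 2",
    s3 = "money in both boxes".
    """
    return (p1 + p3) * PRIZE - PRICE, (p2 + p3) * PRIZE - PRICE


def worst_case_best_box(steps=50):
    """Minimum, over a grid of distributions, of the better box's utility."""
    worst = None
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            p1 = Fraction(a, steps)
            p2 = Fraction(b, steps)
            p3 = 1 - p1 - p2
            best = max(cdt_utilities(p1, p2, p3))
            if worst is None or best < worst:
                worst = best
    return worst


print(worst_case_best_box())  # 1/2, attained at P(s1) = P(s2) = 1/2
```

On this grid the better box is never worth less than $0.50 in causal expected utility, so buying at least one of the boxes always looks strictly profitable to the buyer.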

The idea is that everyone agrees (hopefully) that orthodox CDT satisfies the assumption. (I.e., assigns some unconditional distribution, etc.) Of course, many CDTers would claim that CDT satisfies some *additional* assumptions, such as the probabilities being calibrated or "correct" in some other sense. But of course, if "A=>B", then "A and C => B". So adding assumptions cannot help the CDTer avoid the loss of money conclusion if they also accept the more basic assumptions. Of course, *some* added assumptions lead to contradictions. But that just means that they cannot be satisfied in the circumstances of this scenario if the more basic assumption is satisfied and if the premises of the Adversarial Offer hold. So they would have to either adopt some non-orthodox CDT that doesn't satisfy the basic assumption or require that their agents cannot be copied/predicted. (Both of which I also discuss in the paper.)

>you assume that Buyer knows the probabilities that Seller assigned to Buyer's actions.

No, if this were the case, then I think you would indeed get contradictions, as you outline. So Buyer does *not* know what Seller's prediction is. (He only knows her prediction is 75% accurate.) If Buyer uses CDT, then of course he assigns some (unconditional) probabilities to what the predictions are, but of course the problem description implies that these predictions aren't particularly good. (Like: if he assigns 90% to the money in box 1, then it immediately follows that *no* money is in box 1.)

Comment by Caspar42 on How to formalize predictors · 2021-02-02T16:50:21.128Z · LW · GW

As I mentioned elsewhere, I don't really understand...

>I think (1) is a poor formalization, because the game tree becomes unreasonably huge

What game tree? Why represent these decision problems as any kind of trees or game trees in particular? At least some problems of this type can be represented efficiently, using various methods to represent functions on the unit simplex (including decision trees)... Also: Is this decision-theoretically relevant? That is, are you saying, a good decision theory doesn't have to deal with 1 because it is cumbersome to write out (some) problems of this type? But *why* is this decision-theoretically relevant?

>some strategies of the predictor (like "fill the box unless the probability of two-boxing is exactly 1") leave no optimal strategy for the player.

Well, there are less radical ways of addressing this. E.g., expected utility-type theories just assign a preference order to the set of available actions. We could be content with that and accept that in some cases, there is no optimal action. As long as our decision theory ranks the available options in the right order... Or we could restrict attention to problems where an optimal strategy exists despite this dependence.

>And (3) seems like a poor formalization because it makes the predictor work too hard. Now it must predict all possible sources of randomness you might use, not just your internal decision-making.

For this reason, I always assume that predictors in my Newcomb-like problems are compensated appropriately and don't work on weekends! Seriously, though: what does "too hard" mean here? Is this just the point that it is in practice easy to construct agents that cannot be realistically predicted in this way when they don't want to be predicted? If so: I find that at least somewhat convincing, though I'd still be interested in developing theory that doesn't hinge on this ability.

Comment by Caspar42 on Extracting Money from Causal Decision Theorists · 2021-02-02T16:42:33.669Z · LW · GW

On the more philosophical points. My position is perhaps similar to Daniel K's. But anyway...

Of course, I agree that problems that punish the agent for using a particular theory (or using float multiplication or feeling a little wistful or stuff like that) are "unfair"/"don't lead to interesting theory". (Perhaps more precisely, I don't think our theory needs to give algorithms that perform optimally in such problems in the way I want my theory to "perform optimally" in Newcomb's problem. Maybe we should still expect our theory to say something about them, in the way that causal decision theorists feel like CDT has interesting/important/correct things to say about Newcomb's problem, despite Newcomb's problem being designed to (unfairly, as they allege) reward non-CDT agents.)

But I don't think these are particularly similar to problems with predictions of the agent's distribution over actions. The distribution over actions is behavioral, whereas performing floating point operations or whatever is not. When randomization is allowed, the subject of your choice is which distribution over actions you play. So to me, which distribution over actions you choose in a problem with randomization allowed, is just like the question of which action you take when randomization is not allowed. (Of course, if you randomize to determine which action's expected utility to calculate first, but this doesn't affect what you do in the end, then I'm fine with not allowing this to affect your utility, because it isn't behavioral.)

I also don't think this leads to uninteresting decision theory. But I don't know how to argue for this here, other than by saying that CDT, EDT, UDT, etc. don't really care whether they choose from/rank a set of distributions or a set of three discrete actions. I think ratificationism-type concepts are the only ones that break when allowing discontinuous dependence on the chosen distribution and I don't find these very plausible anyway.

To be honest, I don't understand the arguments against predicting distributions and predicting actions that you give in that post. I'll write a comment on this to that post.

Comment by Caspar42 on Extracting Money from Causal Decision Theorists · 2021-02-02T16:30:21.797Z · LW · GW

Let's start with the technical question:

>Can your argument be extended to this case?

No, I don't think so. Take the following class of problems: The agent can pick any distribution over actions. The final payoff is determined only as a function of the implemented action and some finite number of samples generated by Omega from that distribution. Note that the expectation is continuous in the distribution chosen. It can therefore be shown (using e.g. Kakutani's fixed-point theorem) that there is always at least one ratifiable distribution. See Theorem 3 at .

(Note that the above is assuming the agent maximizes expected vNM utility. If, e.g., the agent maximizes some lexical utility function, then the predictor can just take, say, two samples and if they differ use a punishment that is of a higher lexicality than the other rewards in the problem.)

Comment by Caspar42 on Extracting Money from Causal Decision Theorists · 2021-01-29T20:03:45.998Z · LW · GW

>Excellent - we should ask THEM about it.

Yes, that's the plan.

Some papers that express support for CDT:

In case you just want to know why I believe support for CDT/two-boxing to be wide-spread among academic philosophers, see , which is a survey of academic philosophers, where more people preferred two-boxing than one-boxing in Newcomb's problem, especially among philosophers with relevant expertise. Some philosophers have filled out this survey publicly, so you can e.g. go to , click on a name and then on "My Philosophical Views" to find individuals who endorse two-boxing. (I think there's also a way to download the raw data and thereby get a list of two-boxers.)

Comment by Caspar42 on Extracting Money from Causal Decision Theorists · 2021-01-29T18:41:18.286Z · LW · GW

Note that while people on this forum mostly reject orthodox, two-boxing CDT, many academic philosophers favor CDT. I doubt that they would view this problem as out of CDT's scope, since it's pretty similar to Newcomb's problem.

>How does this CDT agent reconcile a belief that the seller's prediction likelihood is different from the buyer's success likelihood?

Good question!

Comment by Caspar42 on Extracting Money from Causal Decision Theorists · 2021-01-29T18:36:02.024Z · LW · GW

I agree with both of Daniel Kokotajlo's points (both of which we also make in the paper in Sections IV.1 and IV.2): Certainly for humans it's normal to not be able to randomize; and even if it was a primarily hypothetical situation without any obvious practical application, I'd still be interested in knowing how to deal with the absence of the ability to randomize.

Besides, as noted in my other comment, insisting on the ability to randomize doesn't get you that far (cf. Sections IV.1 and IV.4 on Ratificationism): even if you always have access to some nuclear decay noise channel, your choice of whether to consult that channel (or of whether to factor the noise into your decision) is still deterministic. So you can set up scenarios where you are punished for randomizing. In the particular case of the Adversarial Offer, the seller might remove all money from both boxes if she predicts the buyer to randomize.

The reason why our main scenario just assumes that randomization isn't possible is that our target of attack in this paper is primarily CDT, which is fine with not being allowed to randomize.

Comment by Caspar42 on Extracting Money from Causal Decision Theorists · 2021-01-29T18:18:27.116Z · LW · GW

I think some people may have their pet theories which they call CDT and which require randomization. But CDT as it is usually/traditionally described doesn't ever insist on randomizing (unless randomizing has a positive causal effect). In this particular case, even if a randomization device were made available, CDT would either uniquely favor one of the boxes or be indifferent between all distributions over . Compare Section IV.1 of the paper.

What you're referring to are probably so-called ratificationist variants of CDT. These would indeed require randomizing 50-50 between the two boxes. But one can easily construct scenarios which trip these theories up. For example, the seller could put no money in any box if she predicts that the buyer will randomize. Then no distribution is ratifiable. See Section IV.4 for a discussion of Ratificationism.

Comment by Caspar42 on Extracting Money from Causal Decision Theorists · 2021-01-29T18:08:56.201Z · LW · GW

Yeah, basically standard game theory doesn't really have anything to say about the scenarios of the paper, because they don't fit the usual game-theoretical models.

By the way, the paper has some discussion of what happens if you insist on having access to an unpredictable randomization device; see Section IV.1 and the discussion of Ratificationism in Section IV.4. (The latter may be of particular interest because Ratificationism is somewhat similar to Nash equilibrium. Unfortunately, the section doesn't explain Ratificationism in detail.)

Comment by Caspar42 on Extracting Money from Causal Decision Theorists · 2021-01-29T17:42:32.263Z · LW · GW

>I think information "seller's prediction is accurate with probability 0,75" is supposed to be common knowledge.

Yes, correct!

>Is it even possible for a non-trivial probabilistic prediction to be a common knowledge? Like, not as in some real-life situation, but as in this condition not being logical contradiction? I am not a specialist on this subject, but it looks like a logical contradiction. And you can prove absolutely anything if your premise contains contradiction.

Why would it be a logical contradiction? Do you think Newcomb's problem also requires a logical contradiction? Note that in neither of these cases does the predictor tell the agent the result of a prediction about the agent.

>What kinds of mistakes does seller make?

For the purpose of the paper it doesn't really matter what beliefs anyone has about how the errors are distributed. But you could imagine that the buyer is some piece of computer code and that the seller has an identical copy of that code. To make a prediction, the seller runs the code. Then she flips a coin twice. If the coin does not come up Tails twice, she just uses that prediction and fills the boxes accordingly. If the coin does come up Tails twice, she uses a third coin flip to determine whether to (falsely) predict one of the two other options that the agent can choose from. And then you get the 0.75, 0.125, 0.125 distribution you describe. And you could assume that this is common knowledge.
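
The coin procedure can be written out to confirm the 0.75, 0.125, 0.125 split. This is a minimal sketch with illustrative names (`OPTIONS`, `prediction_distribution`); it enumerates all eight outcomes of three fair flips, which gives the same distribution as only flipping the third coin after two Tails.

```python
from fractions import Fraction
from itertools import product

OPTIONS = ("B1", "B2", "None")  # the buyer's three available choices


def prediction_distribution(true_choice):
    """Exact distribution of the seller's prediction under the coin procedure.

    true_choice is what running the copy of the buyer's code outputs.
    """
    others = [o for o in OPTIONS if o != true_choice]
    dist = {o: Fraction(0) for o in OPTIONS}
    # Enumerate the eight equally likely outcomes of three fair coin flips.
    for first, second, third in product("HT", repeat=3):
        if (first, second) != ("T", "T"):
            predicted = true_choice  # keep the simulation's output
        else:
            # Two Tails: the third flip picks one of the two wrong options.
            predicted = others[0] if third == "H" else others[1]
        dist[predicted] += Fraction(1, 8)
    return dist


print(prediction_distribution("B1"))
```

For a buyer whose code outputs B1, the seller predicts B1 with probability 3/4 and each of the other two options with probability 1/8.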

Of course, for the exact CDT expected utilities, it does matter how the errors are distributed. If the errors are primarily "None" predictions, then the boxes should be expected to contain more money and the CDT expected utilities of buying will be higher. But for the exploitation scheme, it's enough to show that the CDT expected utilities of buying are strictly positive.

>When you write "$1−P (money in Bi | buyer chooses Bi ) · $3 = $1 − 0.25 · $3 = $0.25.", you assume that P(money in Bi | buyer chooses Bi )=0.75.

I assume you mean that I assume P(money in Bi | buyer chooses Bi )=0.25? Yes, I assume this, although really I assume that the seller's prediction is accurate with probability 0.75 and that she fills the boxes according to the specified procedure. From this, it then follows that P(money in Bi | buyer chooses Bi )=0.25.

>That is, if buyer chooses the first box, seller can't possibly think that buyer will choose none of the boxes.

I don't assume this / I don't see how this would follow from anything I assume. Remember that if the seller predicts the buyer to choose no box, both boxes will be filled. So even if all false predictions would be "None" predictions (when the buyer buys a box), then it would still be P(money in Bi | buyer chooses Bi )=0.25.
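
To see why the split of the errors does not matter for this conditional probability, here is a small sketch. It assumes the filling rule implied by the discussion above: the seller puts $3 in every box she predicts the buyer will not take, so a "None" prediction fills both boxes. The function and parameter names are hypothetical.

```python
from fractions import Fraction


def p_money_in_chosen_box(accuracy, none_share_of_errors):
    """P(money in Bi | buyer chooses Bi) for a buyer who buys box Bi.

    accuracy: probability that the seller correctly predicts "Bi".
    none_share_of_errors: fraction of the wrong predictions that are "None";
        the remaining wrong predictions are of the other box.
    """
    p_predict_other_box = (1 - accuracy) * (1 - none_share_of_errors)
    p_predict_none = (1 - accuracy) * none_share_of_errors
    # Bi is empty only if the seller predicted Bi. A predicted "other box"
    # fills Bi, and a predicted "None" fills both boxes, so Bi contains
    # money after either kind of error.
    return p_predict_other_box + p_predict_none


for share in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(share, p_money_in_chosen_box(Fraction(3, 4), share))
```

Whether none, half, or all of the errors are "None" predictions, the result is 1/4, because the chosen box is filled exactly when the prediction is wrong.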

Comment by Caspar42 on Extracting Money from Causal Decision Theorists · 2021-01-29T17:16:12.363Z · LW · GW

>Then the CDTheorist reasons:

>(1-0.75) = .25

>.25*3 = .75

>.75 - 1 = -.25

>'Therefore I should not buy a box - I expect to lose (expected) money by doing so.'

Well, that's not how CDT as it is typically specified reasons about this decision. The expected value 0.25*3=0.75 is the EDT expected amount of money in box Bi for both i=1 and i=2. That is, it is the expected content of box Bi, conditional on taking Bi. But when CDT assigns an expected utility to taking box Bi it doesn't condition on taking Bi. Instead, because it cannot causally affect how much money is in box Bi, it uses its unconditional estimate of how much is in box Bi. As I outlined in the post, this must be at least .

Comment by Caspar42 on Extracting Money from Causal Decision Theorists · 2021-01-29T17:08:11.641Z · LW · GW

>If I win I get $6. If I lose, I get $5.

I assume you meant to write: "If I lose, I lose $5."

Yes, these are basically equivalent. (I even mention rock-paper-scissors bots in a footnote.)

Comment by Caspar42 on Predictors exist: CDT going bonkers... forever · 2021-01-28T17:05:17.153Z · LW · GW

Apologies, I only saw your comment just now! Yes, I agree, CDT never strictly prefers randomizing. So there are agents who abide by CDT and never randomize. As our scenarios show, these agents are exploitable. However, there could also be CDT agents who, when indifferent between some set of actions (and when randomization is not associated with any cost), do randomize (and choose the probability according to some additional theory -- for example, you could have the decision procedure: "follow CDT, but when indifferent between multiple actions, choose a distribution over these actions that is ratifiable".). The updated version of our paper -- which has now been published Open Access in The Philosophical Quarterly -- actually contains some extra discussion of this in Section IV.1, starting with the paragraph "Nonetheless, what happens if we grant the buyer in Adversarial Offer access to a randomisation device...".

Comment by Caspar42 on In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy · 2020-09-11T22:52:49.385Z · LW · GW

Sorry for taking an eternity to reply (again).

On the first point: Good point! I've now finally fixed the SSA probabilities so that they sum up to 1, which really they should, to really have a version of EDT.

>prevents coordination between agents making different observations.

Yeah, coordination between different observations is definitely not optimal in this case. But I don't see an EDT way of doing it well. After all, there are cases where given one observation, you prefer one policy and given another observation you favor another policy. So I think you need the ex ante perspective to get consistent preferences over entire policies.

>(Oh, I ignored the splitting up of probabilities of trajectories into SSA probabilities and then adding them back up again, which may have some intuitive appeal but ends up being just a null operation. Does anyone see a significance to that part?)

The only significance is to get a version of EDT, which we would traditionally assume to have self-locating beliefs. From a purely mathematical point of view, I think it's nonsense.

Comment by Caspar42 on Predictors exist: CDT going bonkers... forever · 2020-01-27T15:50:35.764Z · LW · GW

>Caspar Oesterheld and Vince Conitzer are also doing something like this

That paper can be found at . And yes, it is structurally essentially the same as the problem in the post.

Comment by Caspar42 on Pavlov Generalizes · 2019-05-17T00:16:04.817Z · LW · GW

Not super important but maybe worth mentioning in the context of generalizing Pavlov: the strategy Pavlov for the iterated PD can be seen as an extremely shortsighted version of the law of effect, which basically says: repeat actions that have worked well in the past (in similar situations). Of course, the LoE can be applied in a wide range of settings. For example, in their reinforcement learning textbook, Sutton and Barto write that LoE underlies all of (model-free) RL.
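
As a rough illustration of Pavlov as a one-step law of effect, here is a minimal sketch. The payoff numbers, the aspiration threshold, and the cooperative opening move are illustrative assumptions, not something specified in the post.

```python
# Payoffs to the row player in a prisoner's dilemma (illustrative values).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ASPIRATION = 2  # payoffs above this count as "worked well" (assumed threshold)


def pavlov(my_last, their_last):
    """Win-stay, lose-shift: repeat the last action iff it paid off."""
    if my_last is None:
        return "C"  # assumed opening move
    if PAYOFF[(my_last, their_last)] > ASPIRATION:
        return my_last
    return "D" if my_last == "C" else "C"


def play(opponent_moves):
    """Run Pavlov against a fixed sequence of opponent moves."""
    mine, my_last, their_last = [], None, None
    for their_move in opponent_moves:
        my_move = pavlov(my_last, their_last)
        mine.append(my_move)
        my_last, their_last = my_move, their_move
    return mine


print(play(["C", "D", "D", "C", "C"]))  # ['C', 'C', 'D', 'C', 'C']
```

The "shortsighted" part is that only the most recent round's payoff is consulted. A fuller law-of-effect learner would aggregate over many past rounds and similar situations.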

Comment by Caspar42 on In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy · 2019-01-16T23:54:51.593Z · LW · GW

Elsewhere, I illustrate this result for the absent-minded driver.

Comment by Caspar42 on CDT=EDT=UDT · 2019-01-16T23:50:26.486Z · LW · GW

> I tried to understand Caspar’s EDT+SSA but was unable to figure it out. Can someone show how to apply it to an example like the AMD to help illustrate it?

Sorry about that! I'll try to explain it some more. Let's take the original AMD. Here, the agent only faces a single type of choice -- whether to EXIT or CONTINUE. Hence, in place of a policy we can just condition on $p$, the probability of choosing CONTINUE, when computing our SSA probabilities. Now, when using EDT+SSA, we assign probabilities to being a specific instance in a specific possible history of the world. For example, we assign probabilities of the form $P_{SSA}(\text{intersection}_1 \text{ in } CE \mid p)$, which denotes the probability that given I choose to CONTINUE with probability $p$, history $CE$ (a.k.a. CONTINUE, EXIT) is actual and that I am the instance $\text{intersection}_1$ (i.e., the first intersection). Since we're using SSA, these probabilities are computed as follows:

$$P_{SSA}(\text{intersection}_1 \text{ in } CE \mid p) = P(CE \mid p) \cdot P(\text{intersection}_1 \mid CE) = p(1-p) \cdot \frac{1}{2}.$$

That is, we first compute the probability that the history itself is actual (given $p$). Then we multiply it by the probability that within that history I am the instance at $\text{intersection}_1$, which is just 1 divided by the number of instances of myself in that history, i.e. 2.

Now, the expected value according to EDT + SSA given $p$ can be computed by just summing over all possible situations, i.e. over all combinations of a history and a position within that history and multiplying the probability of that situation with the utility given that situation:

$$\sum_{h} \sum_{i \in h} P_{SSA}(i \text{ in } h \mid p)\, U(h) = \sum_{h} \sum_{i \in h} \frac{P(h \mid p)}{n_h}\, U(h) = \sum_{h} P(h \mid p)\, U(h),$$

where $n_h$ is the number of instances of myself in history $h$.

And that's exactly the ex ante expected value (or UDT-expected value, I suppose) of continuing with probability $p$. Hence, EDT+SSA's recommendation in AMD is the ex ante optimal policy (or UDT's recommendation, I suppose). This realization is not original to myself (though I came up with it independently in collaboration with Johannes Treutlein) -- the following papers make the same point:

  • Rachael Briggs (2010): Putting a value on Beauty. In Tamar Szabo Gendler and John Hawthorne, editors, Oxford Studies in Epistemology: Volume 3, pages 3–34. Oxford University Press, 2010.
  • Wolfgang Schwarz (2015): Lost memories and useless coins: revisiting the absentminded driver. In: Synthese.

My comment generalizes these results a bit to include cases in which the agent faces multiple different decisions.
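The AMD calculation can be checked numerically. The following is a minimal sketch, assuming the payoffs commonly used for the absent-minded driver (0 for exiting at the first intersection, 4 for exiting at the second, 1 for continuing twice); these numbers and all function names are illustrative assumptions, not something stated in this comment.

```python
from fractions import Fraction

# Assumed payoffs per history (illustrative; not given in the comment).
UTILITY = {
    ("EXIT",): 0,
    ("CONTINUE", "EXIT"): 4,
    ("CONTINUE", "CONTINUE"): 1,
}


def history_probability(history, p):
    """Probability of a history when CONTINUE is chosen with probability p."""
    prob = Fraction(1)
    for action in history:
        prob *= p if action == "CONTINUE" else 1 - p
    return prob


def edt_ssa_value(p):
    """Sum over (history, position) pairs of SSA probability times utility."""
    total = Fraction(0)
    for history, utility in UTILITY.items():
        n_instances = len(history)  # one instance of the driver per intersection reached
        for _position in range(n_instances):
            total += history_probability(history, p) / n_instances * utility
    return total


def ex_ante_value(p):
    """Ordinary expected utility of the policy 'CONTINUE with probability p'."""
    return sum(history_probability(h, p) * u for h, u in UTILITY.items())
```

Under these assumed payoffs both functions reduce to 4p(1-p) + p^2, so they agree for every p and are maximized at p = 2/3.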

Comment by Caspar42 on Dutch-Booking CDT · 2019-01-16T22:37:51.098Z · LW · GW

>Caspar Oesterheld is working on similar ideas.

For anyone who's interested, Abram here refers to my work with Vincent Conitzer which we write about here.

ETA: This work has now been published in The Philosophical Quarterly.

Comment by Caspar42 on Reflexive Oracles and superrationality: prisoner's dilemma · 2018-11-26T00:28:57.072Z · LW · GW

My paper "Robust program equilibrium" (published in Theory and Decision) discusses essentially NicerBot (under the name ϵGroundedFairBot) and mentions Jessica's comment in footnote 3. More generally, the paper takes strategies from iterated games and transfers them into programs for the corresponding program game. As one example, tit for tat in the iterated prisoner's dilemma gives rise to NicerBot in the "open-source prisoner's dilemma".
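A minimal sketch of the idea behind ϵGroundedFairBot, assuming a simplified calling convention in which a program is a Python function that receives its opponent and a random number generator. This convention, the function names, and the value of eps are illustrative assumptions rather than the paper's formalism.

```python
import random


def defect_bot(opponent, rng):
    """Baseline program that always defects."""
    return "D"


def eps_grounded_fair_bot(opponent, rng, eps=0.2):
    """With probability eps cooperate; otherwise copy what the opponent does against us."""
    if rng.random() < eps:
        return "C"  # the "grounding" step that stops the mutual simulation
    return opponent(eps_grounded_fair_bot, rng)


rng = random.Random(0)
self_play = {eps_grounded_fair_bot(eps_grounded_fair_bot, rng) for _ in range(1000)}
vs_defect = [eps_grounded_fair_bot(defect_bot, rng) for _ in range(1000)]
```

In self-play the nested simulation stops with probability eps at every level, so it terminates with probability 1 and always ends in cooperation. Against `defect_bot` the program cooperates only in the eps-fraction of runs where the grounding step fires, which mirrors tit for tat cooperating in the first round and copying the opponent afterwards.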

Comment by Caspar42 on Naturalized induction – a challenge for evidential and causal decision theory · 2018-05-29T14:48:23.686Z · LW · GW

Link to relevant agent foundations forum comment

Comment by Caspar42 on Idea: OpenAI Gym environments where the AI is a part of the environment · 2018-04-14T18:27:26.760Z · LW · GW

I list some relevant discussions of the "anvil problem" etc. here. In particular, Soares and Fallenstein (2014) seem to have implemented an environment in which such problems can be modeled.

Comment by Caspar42 on Announcement: AI alignment prize winners and next round · 2018-03-28T12:19:44.214Z · LW · GW

For this round I submit the following entries on decision theory:

Robust Program Equilibrium (paper)

The law of effect, randomization and Newcomb’s problem (blog post) (I think James Bell's comment on this post makes an important point.)

A proof that every ex-ante-optimal policy is an EDT+SSA policy in memoryless POMDPs (IAFF comment) (though see my own comment to that comment for a caveat to that result)

Comment by Caspar42 on Causal Universes · 2018-02-16T17:09:26.439Z · LW · GW

(RobbBB seems to refer to what philosophers call the B-theory of time, whereas CronoDAS seems to refer to the A-theory of time.)

Comment by Caspar42 on In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy · 2018-02-11T19:10:10.000Z · LW · GW

Since Briggs [1] shows that EDT+SSA and CDT+SIA are both ex-ante-optimal policies in some class of cases, one might wonder whether the result of this post transfers to EDT+SSA. I.e., in memoryless POMDPs, is every (ex ante) optimal policy also consistent with EDT+SSA in a similar sense? I think it is, as I will try to show below.

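Given some existing policy $\pi$, EDT+SSA recommends that upon receiving observation $o$ we should choose an action from

$$\arg\max_a \sum_{s_1,\dots,s_n} \sum_{i=1}^{n} SSA(s_i \text{ in } s_1,\dots,s_n \mid o, \pi_{o \to a})\, U(s_n).$$

(For notational simplicity, I'll assume that policies are deterministic, but, of course, actions may encode probability distributions.) Here, $\pi_{o \to a}(o') = a$ if $o' = o$ and $\pi_{o \to a}(o') = \pi(o')$ otherwise. $SSA(s_i \text{ in } s_1,\dots,s_n \mid o, \pi_{o \to a})$ is the SSA probability of being in state $s_i$ of the environment trajectory $s_1,\dots,s_n$ given the observation $o$ and the fact that one uses the policy $\pi_{o \to a}$.

The SSA probability $SSA(s_i \text{ in } s_1,\dots,s_n \mid o, \pi_{o \to a})$ is zero if $m(s_i) \neq o$ (i.e., if $o$ is not the observation made in $s_i$) and

$$SSA(s_i \text{ in } s_1,\dots,s_n \mid o, \pi_{o \to a}) = \frac{P(s_1,\dots,s_n \mid o, \pi_{o \to a})}{\#(o, s_1,\dots,s_n)}$$

otherwise. Here, $\#(o, s_1,\dots,s_n)$ is the number of times $o$ occurs in $s_1,\dots,s_n$. Note that this is the minimal reference class version of SSA, also known as the double-halfer rule (because it assigns 1/2 probability to tails in the Sleeping Beauty problem and sticks with 1/2 if it's told that it's Monday). $P(s_1,\dots,s_n \mid o, \pi_{o \to a})$ is the (regular, non-anthropic) probability of the sequence of states $s_1,\dots,s_n$, given that $\pi_{o \to a}$ is played and $o$ is observed at least once. If (as in the sum above) $o$ is observed at least once in $s_1,\dots,s_n$, we can rewrite this as

$$P(s_1,\dots,s_n \mid o, \pi_{o \to a}) = \frac{P(s_1,\dots,s_n \mid \pi_{o \to a})}{P(o \mid \pi_{o \to a})}.$$

Importantly, note that $P(o \mid \pi_{o \to a})$ is constant in $a$, i.e., the probability that you observe $o$ at least once cannot (in the present setting) depend on what you would do when you observe $o$.

Inserting this into the above, we get

$$\arg\max_a \sum_{s_1,\dots,s_n} \sum_{i=1}^{n} SSA(s_i \text{ in } s_1,\dots,s_n \mid o, \pi_{o \to a})\, U(s_n) = \arg\max_a \sum_{s_1,\dots,s_n \text{ with } o} \; \sum_{i:\, m(s_i) = o} \frac{P(s_1,\dots,s_n \mid \pi_{o \to a})}{\#(o, s_1,\dots,s_n)}\, U(s_n),$$

where the first sum on the right-hand side is over all histories that give rise to observation $o$ at some point. Dividing by the number of agents with observation $o$ in a history and setting the policy for all agents at the same time cancel each other out, such that this equals

$$\arg\max_a \sum_{s_1,\dots,s_n \text{ with } o} P(s_1,\dots,s_n \mid \pi_{o \to a})\, U(s_n).$$

Obviously, any optimal policy chooses in agreement with this. But the same disclaimers apply; if there are multiple observations, then multiple policies might satisfy the right-hand side of this equation and not all of these are optimal.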
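The double-halfer behavior of minimal-reference-class SSA can be illustrated with a short sketch on the standard Sleeping Beauty setup (heads: one awakening on Monday; tails: awakenings on Monday and Tuesday). The data layout and function name are assumptions made for this illustration.

```python
from fractions import Fraction


def ssa_probabilities(histories, observation):
    """Minimal-reference-class SSA probabilities of (history, position) pairs.

    histories maps a history name to (prior probability, list of the
    observations made by each instance in that history).
    """
    # Regular conditioning on "the observation occurs at least once".
    consistent = {h: (p, obs) for h, (p, obs) in histories.items() if observation in obs}
    normalizer = sum(p for p, _ in consistent.values())
    result = {}
    for h, (p, obs) in consistent.items():
        count = obs.count(observation)  # instances sharing this observation
        for position, seen in enumerate(obs):
            if seen == observation:
                result[(h, position)] = p / normalizer / count
    return result


half = Fraction(1, 2)
# Before learning the day, every awakening looks the same.
awake = ssa_probabilities(
    {"heads": (half, ["awake"]), "tails": (half, ["awake", "awake"])}, "awake"
)
# After being told "it is Monday".
monday = ssa_probabilities(
    {"heads": (half, ["mon"]), "tails": (half, ["mon", "tue"])}, "mon"
)

p_tails_awake = sum(v for (h, _), v in awake.items() if h == "tails")
p_tails_monday = sum(v for (h, _), v in monday.items() if h == "tails")
```

Both `p_tails_awake` and `p_tails_monday` come out as 1/2: the history is first conditioned on the observation occurring at least once, and only then is its probability split evenly among the instances that share the observation.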

[1] Rachael Briggs (2010): Putting a value on Beauty. In Tamar Szabo Gendler and John Hawthorne, editors, Oxford Studies in Epistemology: Volume 3, pages 3–34. Oxford University Press, 2010.

Comment by Caspar42 on In memoryless Cartesian environments, every UDT policy is a CDT+SIA policy · 2018-02-11T19:09:11.000Z · LW · GW

Caveat: The version of EDT provided above only takes dependences between instances of EDT making the same observation into account. Other dependences are possible because different decision situations may be completely "isomorphic"/symmetric even if the observations are different. It turns out that the result is not valid once one takes such dependences into account, as shown by Conitzer [2]. I propose a possible solution in . Roughly speaking, my solution is to identify with all objects in the world that are perfectly correlated with you. However, the underlying motivation is unrelated to Conitzer's example.

[2] Vincent Conitzer: A Dutch Book against Sleeping Beauties Who Are Evidential Decision Theorists. Synthese, Volume 192, Issue 9, pp. 2887-2899, October 2015.

Comment by Caspar42 on Prisoner's dilemma tournament results · 2018-02-02T16:03:03.186Z · LW · GW

I tried to run this with racket and #lang scheme (as well as #lang racket) but didn't get it to work (though I didn't try for very long), perhaps because of backward compatibility issues. This is a bit unfortunate because it makes it harder for people interested in this topic to profit from the results and submitted programs of this tournament. Maybe you or Alex could write a brief description of how one could get the program tournament to run?

Comment by Caspar42 on The Absent-Minded Driver · 2018-01-26T11:13:44.499Z · LW · GW

I wonder what people here think about the resolution proposed by Schwarz (2014). His analysis is that the divergence from the optimal policy also goes away if one combines EDT with the halfer position a.k.a. the self-sampling assumption, which, as shown by Briggs (2010), appears to be the right anthropic view to combine with EDT, anyway.

Comment by Caspar42 on A model I use when making plans to reduce AI x-risk · 2018-01-20T10:14:15.591Z · LW · GW

I think this is a good overview, but most of the views proposed here seem contentious and the arguments given in support shouldn't suffice to change the mind of anyone who has thought about these questions for a bit or who is aware of the disagreements about them within the community.

>Getting alignment right accounts for most of the variance in whether an AGI system will be positive for humanity.

If your values differ from those of the average human, then this may not be true/relevant. E.g., I would guess that for a utilitarian current average human values are worse than, e.g., 90% "paperclipping values" and 10% classical utilitarianism.

Also, if gains from trade between value systems are big, then a lot of value may come from ensuring that the AI engages in acausal trade ( ). This is doubly persuasive if you already see your own policies as determining what agents with similar decision theories but different values do elsewhere in the universe. (See, e.g., section 4.6.3 of "Multiverse-wide Cooperation via Correlated Decision Making".)

>Given timeline uncertainty, it's best to spend marginal effort on plans that assume / work in shorter timelines.
>Stated simply: If you don't know when AGI is coming, you should make sure alignment gets solved in worlds where AGI comes soon.

I guess the question is what "soon" means. I agree with the argument provided in the quote. But there are also some arguments to work on longer timelines, e.g.:

  • If it's hard and most value comes from full alignment, then why even try to optimize for very short timelines?
  • Similarly, there is a "social" difficulty of getting people in AI to notice your (or the AI safety community's) work. Even if you think you could write down within a month a recipe for increasing the probability of AI being aligned by a significant amount, you would probably need much more than a month to make it significantly more likely to get people to consider applying your recipe.

It seems obvious that most people shouldn't think too much about extremely short timelines (<2 years) or the longest plausible timelines (>300 years). So, these arguments together probably point to something in the middle of these and the question is where. Of course, it also depends on one's beliefs about AI timelines.

To me it seems that the concrete recommendations (aside from the "do AI safety things") don't have anything to do with the background assumptions.

>As one datapoint, fields like computer science, engineering and mathematics seem to make a lot more progress than ones like macroeconomics, political theory, and international relations.

For one, "citation needed". But also: the alternative to doing technical AI safety work isn't to do research in politics but to do political activism (or lobbying or whatever), i.e. to influence government policy.

As your "technical rather than political" point currently stands, it's applicable to any problem, but it is obviously invalid at this level of generality. To argue plausibly that technical work on AI safety is more important than AI strategy (which is plausibly true), you'd have to refer to some specifics of the problems related to AI.

Comment by Caspar42 on Prediction Markets are Confounded - Implications for the feasibility of Futarchy · 2018-01-16T14:20:22.938Z · LW · GW

The issue with this example (and many similar ones) is that to decide between interventions on a variable X from the outside, EDT needs an additional node representing that outside intervention, whereas Pearl-CDT can simply do(X) without the need for an additional variable. If you do add these variables, then conditioning on that variable is the same as intervening on the thing that the variable intervenes on. (Cf. section 3.2.2 "Interventions as variables" in Pearl's Causality.)
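A minimal sketch of the "interventions as variables" construction on a toy model with a confounder L that influences both X and Y. All probabilities, variable names, and the three-valued intervention node I are assumptions chosen for illustration.

```python
from fractions import Fraction as F
from itertools import product

P_L = {0: F(1, 2), 1: F(1, 2)}          # confounder
P_X1_GIVEN_L = {0: F(1, 5), 1: F(4, 5)}  # natural behavior of X
P_Y1_GIVEN_LX = {(0, 0): F(1, 10), (0, 1): F(3, 10), (1, 0): F(1, 2), (1, 1): F(7, 10)}


def p_x_given_l_i(x, l, i):
    """Mechanism of X once the parentless intervention node I is added."""
    if i == "idle":
        p1 = P_X1_GIVEN_L[l]
        return p1 if x == 1 else 1 - p1
    forced = 1 if i == "set1" else 0
    return F(1) if x == forced else F(0)


def p_y1_given_i(i):
    """p(Y=1 | I=i); I has no parents, so conditioning on it leaves p(L) unchanged."""
    return sum(
        P_L[l] * p_x_given_l_i(x, l, i) * P_Y1_GIVEN_LX[(l, x)]
        for l, x in product((0, 1), repeat=2)
    )


def p_y1_do_x(x):
    """p(Y=1 | do(X=x)) in the original graph (truncated factorization)."""
    return sum(P_L[l] * P_Y1_GIVEN_LX[(l, x)] for l in P_L)


def p_y1_given_x(x):
    """p(Y=1 | X=x) in the original graph, without any intervention node."""
    num = sum(P_L[l] * p_x_given_l_i(x, l, "idle") * P_Y1_GIVEN_LX[(l, x)] for l in P_L)
    den = sum(P_L[l] * p_x_given_l_i(x, l, "idle") for l in P_L)
    return num / den
```

With these numbers, conditioning on `I = "set1"` gives 1/2, the same as `p_y1_do_x(1)`, whereas plain conditioning on X = 1 gives 31/50 because observing X = 1 is evidence about L.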

Comment by Caspar42 on Niceness Stealth-Bombing · 2018-01-14T11:48:39.364Z · LW · GW

This advice is very similar to Part 1, ch. 3; Part 3, ch. 5; Part 4, ch. 1, 6 in Dale Carnegie's classic How to Win Friends and Influence People.

Comment by Caspar42 on Writing Down Conversations · 2017-12-31T08:50:52.069Z · LW · GW

Another classic on this topic by a community member is Brian Tomasik's Turn Discussions Into Blog Posts.

Comment by Caspar42 on The expected value of the long-term future · 2017-12-30T23:40:40.615Z · LW · GW

I looked at the version 2017-12-30 10:48:11Z.

Overall, I think it's a nice, systematic overview. Below are some comments.

I should note that I'm not very expert on these things. This is also why the additional literature I mention is mostly weakly related stuff from FRI, the organization I work for. Sorry about that.

An abstract would be nice.

Locators in the citations would be useful, i.e. "Beckstead (2013, sect. XYZ)" instead of just "Beckstead (2013)" when you talk about some specific section of the Beckstead paper. (Cf. section “Pageless Documentation” of the humorous Academic Citation Practice: A Sinking Sheep? by Ole Bjørn Rekdal.)

>from a totalist, consequentialist, and welfarist (but not necessarily utilitarian) point of view

I don't think much of your analysis assumes welfarism (as I understand it)? Q_w could easily denote things other than welfare (e.g., how virtue ethical, free, productive, autonomous, natural, the mean person is), right? (I guess some of the discussion sections are fairly welfarist, i.e. they talk about suffering, etc., rather than freedom and so forth.)

>an existential risk as one where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.

Maybe some people would interpret this definition as excluding some of the "shrieks" and "whimpers", since in some of them, "humanity's potential is realized" in that it colonizes space, but not in accordance with, e.g., the reader's values. Anyway, I think this definition is essentially a quote from Bostrom (maybe use quotation marks?), so it's alright.

>The first is the probability P of reaching time t.

Maybe say more about why you separate N_w(t) (in the continuous model) into P(t) and N(t)?

I also don't quite understand whether equation 1 is intended as the expected value of the future or as the expected value of a set of futures w that all have the same N_w(t) and Q_w(t). The problem is that if it's the expected value of the future, I don't get how you can simplify something like

into the right side of your equation 1. (E.g., you can't just let N(t) and Q(t) denote expected numbers of moral patients and expected mean qualities of life, because the mean qualities in larger worlds ought to count for more, right?)

I suspect that when reading the start of sect. 3.1, a lot of readers will wonder whether you endorse all the assumptions underlying your model of P(t). In particular, I would guess that people would disagree with the following two assumptions:

-> Short term x-risk reduction (r_1) doesn't have any effect on long-term risk (r). Perhaps this is true for some fairly specific work on preventing extinction but it seems less likely for interventions like building up the UN (to avoid all kinds of conflict, coordinate against risks, etc.).

-> Long-term extinction risk is constant. I haven't thought much about these issues but I would guess that extinction risk becomes much lower, once there is a self-sustaining colony on Mars.

Reading further, I see that you address these in sections 3.2 and 3.3. Maybe you could mention/refer to these somewhere near the start of sect. 3.1.

On page 3, you say that the derivative of -P(t) w.r.t. r_1 denotes the value of reducing r_1 by one unit. This is true in this case because P(t) is linear in r_1. But in general, the value of reducing r_1 by one unit is just P(t,r_1-1)-P(t,r_1), right?

Is equation 3, combined with the view that the cost of one unit of f_1 is constant, consistent with Ord's "A plausible model would be that it is roughly as difficult to halve the risk per century, regardless of its starting probability, and more generally, that it is equally difficult to reduce it by some proportion regardless of its absolute value beforehand."? With your model, it looks like bringing f_1 from 0 to 0.5 and thus halving r_1 is just as expensive as bringing f_1 from 0.5 to 1.

On p. 7, "not to far off" -- probably you mean "too"?

>For example, perhaps we will inevitably develop some hypothetical weapons that give so large an advantage to offence over defence that civilisation is certain to be destroyed.

AI risk is another black ball that will become more accessible. But maybe you would rather not model it as extinction. At least AI risk doesn't necessarily explain the Fermi paradox and AIs may create sentient beings.

>Ord argues that we may be able to expect future generations to be more interested in risk reduction, implying increasing f_i

I thought f_i was meant to model the impact that we can have on r_i? So, to me it seems more sensible to model the involvement of future generations, to the extent that we can't influence it, as "a kind of event E" (as you propose) or, more generally, as implying that the non-intervention risk levels r_i decrease.

>This would only reinforce the case for extinction risk reduction.

It seems that future generations caring about ERR makes short-term ERR more important (because the long-term future is longer and thus can contain more value). But it makes long-term ERR less important, because future generations will, e.g., do AI safety research anyway. (In section "Future resources" of my blog post Complications in evaluating neglectedness, I make the general point that for evaluating the neglectedness of an intervention, one has to look at how many resources future generations will invest into that intervention.)

>There is one case in which it clearly is not: if space colonisation is in fact likely to involve risk-independent islands. Then high population goes with low risk, increasing the value of the future relative to the basic model

(I find risk-independent islands fairly plausible.)

>The expected number of people who will live in period t is

You introduced N(t) as the number of morally relevant beings (rather than "people").

>However, this increase in population may be due to stop soon,

Although it is well-known that some predict population to stagnate at 9 billion or so, a high-quality citation would be nice.

>The likelihood of space colonisation, a high-profile issue on which billions of dollars is spent per year (Masters, 2015), also seems relatively hard to affect. Extinction risk reduction, on the other hand, is relatively neglected (Bostrom, 2013; Todd, 2017), so it could be easier to achieve progress in this area.

I have only briefly (in part due to the lack of locators) checked the two sources, but it seems that this varies strongly between different extinction risks. For instance, according to Todd (2017), >300bn (and thus much more than on space colonization) is spent on climate change, 1-10bn on nuclear security, 1bn on extreme pandemic prevention. So, overall much more money goes into extinction risk reduction than into space colonization. (This is not too surprising. People don't want to die, and they don't want their children or grandchildren to die. They don't care nearly as much about whether some elite group of people will live on Mars in 50 years.)

Of course, there are a lot of complications to this neglectedness analysis. (All three points I discuss in Complications in evaluating neglectedness seem to apply.)

>Some people believe that it’s nearly impossible to have a consistent impact on Q(t) so far into the future.

Probably a reference would be good. I guess to the extent that we can't affect far future Q(t), we also can't affect far future r_i.

>However, this individual may be biased against ending things, for instance because of the survival instinct, and so could individuals or groups in the future. The extent of this bias is an open question.

It's also a bit unclear (at least based on what you write) what legitimizes calling this a bias, rather than simply a revealed preference not to die (even in cases in which you or I as outside observers might think it to be preferable not to live) and thus evidence that their lives are positive. Probably one has to argue via status quo bias or sth like that.

>We may further speculate that if the future is controlled by altruistic values, even powerless persons are likely to have lives worth living. If society is highly knowledgeable and technologically sophisticated, and decisions are made altruistically, it’s plausible that many sources of suffering would eventually be removed, and no new ones created unnecessarily. Selfish values, on the other hand, do not care about the suffering of powerless sentients.

This makes things sound more binary than they actually are. (I'm sure you're aware of this.) In the usual sense of the word, people could be "altruistic" but in a non-consequentialist way. There may be lots of suffering in such worlds. (E.g., some libertarians may regard intervening in the economy as unethical even if companies start creating slaves. A socialist, on the other hand, may view capitalism as fundamentally unjust, try to regulate/control the economy and thus cause a lot of poverty.) Also, even if someone is altruistic in a fairly consequentialist way, they may still not care about all beings that you/I/the reader cares about. E.g., economists tend to be consequentialists but rarely consider animal welfare.

I think for the animal suffering (both wild animals and factory farming) it is worth noting that it seems fairly unlikely that this will be economically efficient in the long term, but that the general underlying principles (Darwinian suffering and exploiting the powerless) might carry over to other beings (like sentient AIs).

Another way in which the future may be negative would be the Malthusian trap btw. (Of course, some would regard at least some Malthusian trap scenarios as positive, see, e.g., Robin Hanson's The Age of Em.) Presumably this belongs to 5.2.1, since it's a kind of coordination failure.

As you say, I think the option value argument isn't super persuasive, because it seems unlikely that the people in power in a million years share my (meta-)values (or agree with the way I do compromise).

Re 5.2.3: Another relevant reference on why one should cooperate -- which is somewhat separate from the point that if mutual cooperation works out the gains from trade are great -- is Brian Tomasik's Reasons to Be Nice to Other Value Systems.

>One way to increase Q(t) is to advocate for positive value changes in the direction of greater consideration for powerless sentients, or to promote moral enhancement (Persson and Savulescu, 2008). Another approach might be to work to improve political stability and coordination, making conflict less likely as well as increasing the chance that moral progress continues.


Comment by Caspar42 on Announcing the AI Alignment Prize · 2017-12-21T16:10:18.655Z · LW · GW

You don't mention decision theory in your list of topics, but I guess it doesn't hurt to try.

I have thought a bit about what one might call the "implementation problem of decision theory". Let's say you believe that some theory of rational decision making, e.g., evidential or updateless decision theory, is the right one for an AI to use. How would you design an AI to behave in accordance with such a normative theory? Conversely, if you just go ahead and build a system in some existing framework, how would that AI behave in Newcomb-like problems?

There are two pieces that I uploaded/finished on this topic in November and December. The first is a blog post noting that futarchy-type architectures would, per default, implement evidential decision theory. The second is a draft titled "Approval-directed agency and the decision theory of Newcomb-like problems".

For anyone who's interested in this topic, here are some other related papers and blog posts:

So far, my research and the papers by others I linked have focused on classic Newcomb-like problems. One could also discuss how existing AI paradigms relate to other issues of naturalized agency, in particular self-locating beliefs and naturalized induction, though here it seems more as though existing frameworks just lead to really messy behavior.

Send comments to firstnameDOTlastnameATfoundational-researchDOTorg. (Of course, you can also comment here or send me a LW PM.)

Comment by Caspar42 on Superintelligence via whole brain emulation · 2017-12-17T08:49:34.764Z · LW · GW

I wrote a summary of Hanson's The Age of Em, in which I focus on the bits of information that may be policy-relevant for effective altruists. For instance, I summarize what Hanson says about em values and also have a section about AI safety.

Comment by Caspar42 on Intellectual Hipsters and Meta-Contrarianism · 2017-11-11T08:56:34.630Z · LW · GW

Great post, obviously.

You argue that signaling often leads to distribution of intellectual positions following this pattern: in favor of X with simple arguments / in favor of Y with complex arguments / in favor of something like X with simple arguments

I think it’s worth noting that the pattern of positions often looks different. For example, there is: in favor of X with simple arguments / in favor of Y with complex arguments / in favor of something like X with surprising and even more sophisticated and hard-to-understand arguments

In fact, I think many of your examples follow the latter pattern. For example, the market efficiency arguments in favor of libertarianism seem harder-to-understand and more sophisticated than most arguments for liberalism. Maybe it fits your pattern better if libertarianism is justified purely on the basis of expert opinion.

Similarly, the justification for the “meta-contrarian” position in "don't care about Africa / give aid to Africa / don't give aid to Africa" is more sophisticated than the reasons for the contrarian or naive positions.

>But as has been pointed out, along with the gigantic cost, death does have a few small benefits. It lowers overpopulation, it allows the new generation to develop free from interference by their elders, it provides motivation to get things done quickly.

I’m not sure whether the overpopulation is a good example. I think in many circles that point would signal naivety and people would respond by something deep-sounding about how life is sacred. (The same is true for “it’s good if old people die because that saves money and allows the government to build more schools”.) Here, too, I would argue that your pattern doesn’t quite describe the set of commonly held positions, as it omits the naive pro-death position.

Comment by Caspar42 on Are causal decision theorists trying to outsmart conditional probabilities? · 2017-10-06T16:00:55.732Z · LW · GW

>I agree that in situations where A only has outgoing arrows, p(s | do(a)) = p(s | a), but this class of situations is not the "Newcomb-like" situations.

What I meant to say is that the situations where A only has outgoing arrows are all not Newcomb-like.

>Maybe we just disagree on what "Newcomb-like" means? To me what makes a situation "Newcomb-like" is your decision algorithm influencing the world through something other than your decision (as happens in the Newcomb problem via Omega's prediction). In smoking lesion, this does not happen, your decision algorithm only influences the world via your action, so it's not "Newcomb-like" to me.

Ah, okay. Yes, in that case, it seems to be only a terminological dispute. As I say in the post, I would define Newcomb-like-ness via a disagreement between EDT and CDT which can mean either that they disagree about what the right decision is, or, more naturally, that their probabilities diverge. (In the latter case, the statement you commented on is true by definition and in the former case it is false for the reason I mentioned in my first reply.) So, I would view the Smoking lesion as a Newcomb-like problem (ignoring the tickle defense).

Comment by Caspar42 on Principia Compat. The potential Importance of Multiverse Theory · 2017-10-06T14:59:01.055Z · LW · GW

Yes, the paper is relatively recent, but in May I published a talk on the same topic. I also asked on LW whether someone would be interested in giving feedback a month or so before actually publishing the paper.

Do you think your proof/argument is also relevant for my multiverse-wide superrationality proposal?

Comment by Caspar42 on Are causal decision theorists trying to outsmart conditional probabilities? · 2017-10-06T14:53:44.219Z · LW · GW

So, the class of situations in which p(s | do(a)) = p(s | a) that I was alluding to is the one in which A has only outgoing arrows (or all the values of A’s predecessors are known). (I guess this could be generalized to: p(s | do(a)) = p(s | a) if A d-separates its predecessors from S?) (Presumably this stuff follows from Rule 2 of Theorem 3.4.1 in Causality.)

All problems in which you intervene in an isolated system from the outside are of this kind and so EDT and CDT make the same recommendations for intervening in a system from the outside. (That’s similar to the point that Pearl makes in section 3.2.2 of Causality: You can model the do-interventions by adding action nodes without predecessors and conditioning on these action nodes.)

The Smoking lesion is an example of a Newcomb-like problem where A has an inbound arrow that leads p(s | do(a)) and p(s | a) to differ. (That said, I think the smoking lesion does not actually work as a Newcomb-like problem, see e.g. chapter 4 of Arif Ahmed’s Evidence, Decision and Causality.)
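A small numerical sketch of this contrast, using a toy Smoking lesion model in which the lesion influences both smoking (A) and cancer (S) while smoking has no causal effect on cancer. All probabilities and names are assumptions chosen for illustration.

```python
from fractions import Fraction as F

P_LESION = {0: F(1, 2), 1: F(1, 2)}
P_CANCER_GIVEN_LESION = {0: F(1, 10), 1: F(9, 10)}  # smoking itself has no effect


def p_cancer_given_smoke(p_smoke_given_lesion):
    """p(s | a): ordinary conditioning on the agent smoking."""
    joint = {l: P_LESION[l] * p_smoke_given_lesion[l] for l in P_LESION}
    return sum(joint[l] * P_CANCER_GIVEN_LESION[l] for l in joint) / sum(joint.values())


def p_cancer_do_smoke():
    """p(s | do(a)): the arrow into A is cut, so the lesion keeps its prior."""
    return sum(P_LESION[l] * P_CANCER_GIVEN_LESION[l] for l in P_LESION)


with_arrow = p_cancer_given_smoke({0: F(1, 5), 1: F(4, 5)})     # lesion -> smoking
without_arrow = p_cancer_given_smoke({0: F(1, 2), 1: F(1, 2)})  # A has no inbound arrow
```

With the inbound arrow, conditioning gives 37/50 while intervening gives 1/2. Once smoking no longer depends on the lesion, conditioning also gives 1/2 and the two quantities coincide.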

Similarly, you could model Newcomb’s problem by introducing a logical node as a predecessor of your decision and the result of the prediction. (If you locate “yourself” in the logical node and the logical node does not have any predecessors, then CDT and EDT agree again.)

Of course, in the real world, all problems are in theory Newcomb-like because there are always some ingoing arrows into your decision. But in practice, most problems are nearly non-Newcomb-like because, although there may be an unblocked path from my action to the value of my utility function, that path is usually too long/complicated to be useful. E.g., if I raise my hand now, that would mean that the state of the world 1 year ago was such that I raise my hand now. And the world state 1 year ago causes how much utility I have. But unless I’m in Arif Ahmed’s “Betting on the Past”, I don’t know which class of world states 1 year ago (the ones that lead to me raising my hand or the ones that cause me not to raise my hand) causes me to have more utility. So, EDT couldn't try to exploit that way of changing the past.

Comment by Caspar42 on A survey of polls on Newcomb’s problem · 2017-09-28T07:15:02.673Z · LW · GW

>I claim that one-boxers do not believe b and c are possible because Omega is cheating or a perfect predictor (same thing)

Note that Omega isn't necessarily a perfect predictor. Most one-boxers would also one-box if Omega is a near-perfect predictor.

>Aside from "lizard man", what are the other reasons that lead to two-boxing?

I think I could pass an intellectual Turing test (the main arguments in either direction aren't very sophisticated), but maybe it's easiest to just read, e.g., p. 151ff. of James Joyce's The Foundations of Causal Decision Theory and note how Joyce understands the problem in pretty much the same way that a one-boxer would.

In particular, Joyce agrees that causal decision theorists would want to self-modify to become one-boxers. (I have heard many two-boxers admit to this.) This doesn't make sense if they don't believe in Omega's prediction abilities.

Comment by Caspar42 on Naturalized induction – a challenge for evidential and causal decision theory · 2017-09-28T06:58:23.514Z · LW · GW

I hadn’t seen these particular discussions, although I was aware of the fact that UDT and other logical decision theories avoid building phenomenological bridges in this way. I also knew that others (e.g., the MIRI people) were aware of this.

I didn't know you preferred a purely evidential variant of UDT. Thanks for the clarification!

As for the differences between LZEDT and UDT:

  • My understanding was that there is no full formal specification of UDT. The counterfactuals seem to be given by some unspecified mathematical intuition module. LZEDT, on the other hand, seems easy to specify formally (assuming a solution to naturalized induction). (That said, if UDT is just the updateless-evidentialist flavor of logical decision theory, it should be easy to specify as well. I haven’t seen people characterize UDT in this way, but perhaps this is because MIRI’s conception of UDT differs from yours?)
  • LZEDT isn’t logically updateless.
  • LZEDT doesn’t do explicit optimization of policies. (Explicit policy optimization is the difference between UDT1.1 and UDT1.0, right?)

(Based on a comment you made on an earlier post of mine, it seems that UDT and LZEDT reason similarly about medical Newcomb problems.)

Anyway, my reason for writing this isn’t so much that LZEDT differs from other decision theories. (As I say in the post, I actually think LZEDT is equivalent to the most natural evidentialist logical decision theory — which has been considered by MIRI at least.) Instead, it’s that I have a different motivation for proposing it. My understanding is that the LWers’ search for new decision theories was not driven by the BPB issue (although some of the motivations you listed in 2012 are related to it). Instead it seems that people abandoned EDT — the most obvious approach — mainly for reasons that I don’t endorse. E.g., the TDT paper seems to give medical Newcomb problems as the main argument against EDT. It may well be that looking beyond EDT to avoid naturalized induction/BPB leads to the same decision theories as these other motivations.

Comment by Caspar42 on Naturalized induction – a challenge for evidential and causal decision theory · 2017-09-25T23:55:56.610Z · LW · GW

Yes, I share the impression that the BPB problem implies some amount of decision theory relativism. That said, one could argue that decision theories cannot be objectively correct, anyway. In most areas, statements can only be justified relative to some foundation. Probability assignments are correct relative to a prior, the truth of theorems depends on axioms, and whether you should take some action depends on your goals (or meta-goals). Priors, axioms, and goals themselves, on the other hand, cannot be justified (unless you have some meta-priors, meta-axioms, etc., but I think the chain has to end at some point, see ). Perhaps decision theories are similar to priors, axioms and terminal values?

Comment by Caspar42 on Naturalized induction – a challenge for evidential and causal decision theory · 2017-09-23T06:06:41.620Z · LW · GW

No, I actually mean that world 2 doesn't exist. In this experiment, the agent believes that either world 1 or world 2 is actual and that they cannot be actual at the same time. So, if the agent thinks that it is in world 1, world 2 doesn't exist.

Comment by Caspar42 on Are causal decision theorists trying to outsmart conditional probabilities? · 2017-09-23T04:01:51.370Z · LW · GW

(Sorry again for being slow to reply to this one.)

> "Note that in non-Newcomb-like situations, P(s|do(a)) and P(s|a) yield the same result, see ch. 3.2.2 of Pearl’s Causality."
>
> This is trivially not true.

Is this because I define "Newcomb-ness" via disagreement about the best action between EDT and CDT in the second paragraph? Of course, the distance d(P(s|do(a)), P(s|a)) could be so small that EDT and CDT agree on what action to take. The two distributions could even differ in such a way that CDT-EV and EDT-EV are the same.

But it seems that instead of comparing the argmaxes or the EVs, one could also use the term Newcomb-ness on the basis of the probabilities themselves. Or is there some deeper reason why the sentence is false?
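As a rough illustration of the difference between P(s|do(a)) and P(s|a), here is a small sketch with a hypothetical binary model in which a common cause C influences both the action A and the state S, while A has no causal effect on S. All numbers and names are assumptions chosen for the example, and the absolute difference is used as one possible choice for the distance d.

```python
def p_s_given_a(p_c, p_a_given_c, p_s_given_c, a):
    """P(S=1 | A=a): observing A=a is evidence about the common cause C."""
    prior = {0: 1 - p_c, 1: p_c}
    # Likelihood P(A=a | C=c) for each value of c.
    likelihood = {
        c: p_a_given_c[c] if a == 1 else 1 - p_a_given_c[c] for c in (0, 1)
    }
    p_a = sum(prior[c] * likelihood[c] for c in (0, 1))
    joint = sum(p_s_given_c[c] * prior[c] * likelihood[c] for c in (0, 1))
    return joint / p_a


def p_s_given_do_a(p_c, p_s_given_c):
    """P(S=1 | do(A=a)): intervening cuts the arrow C -> A, so C keeps its prior.

    The result is the same for every a because A does not cause S here.
    """
    return p_c * p_s_given_c[1] + (1 - p_c) * p_s_given_c[0]


if __name__ == "__main__":
    p_c = 0.5
    p_s_given_c = {0: 0.2, 1: 0.8}

    # Case 1: C strongly influences A (an "ingoing arrow" into the decision).
    confounded = {0: 0.1, 1: 0.9}
    # Case 2: A is independent of C.
    unconfounded = {0: 0.5, 1: 0.5}

    for label, p_a_given_c in (("confounded", confounded), ("unconfounded", unconfounded)):
        observed = p_s_given_a(p_c, p_a_given_c, p_s_given_c, a=1)
        intervened = p_s_given_do_a(p_c, p_s_given_c)
        print(f"{label}: P(s|a)={observed:.3f}  P(s|do(a))={intervened:.3f}  "
              f"d={abs(observed - intervened):.3f}")
```

In this toy model the two quantities coincide exactly when A carries no information about C, and they come apart as the arrow from C into A gets stronger. Whether the gap is large enough to change the preferred action depends on the utilities, which are left out of the sketch.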

Comment by Caspar42 on Naturalized induction – a challenge for evidential and causal decision theory · 2017-09-23T03:52:29.501Z · LW · GW

I apologize for not replying to your earlier comment. I do engage with comments a lot. E.g., I recall that your comment on that post contained a link to a ~1h talk that I watched after reading it. There are many obvious reasons that sometimes cause me not to reply to comments, e.g., if I don't feel like I have anything interesting to say, or if the comment indicates a lack of interest in discussion (e.g., your "I am not actually here, but ... Ok, disappearing again"). Anyway, I will reply to your comment now. Sorry again for not doing so earlier.

Comment by Caspar42 on Naturalized induction – a challenge for evidential and causal decision theory · 2017-09-22T20:57:59.260Z · LW · GW

I just remembered that in "Naive TDT, Bayes nets, and counterfactual mugging", Stuart Armstrong made the point that it shouldn't matter whether you are simulated (in a way that you might be the simulation) or just predicted (in such a way that you don't believe that you could be the simulation).

Comment by Caspar42 on Splitting Decision Theories · 2017-09-22T20:45:27.038Z · LW · GW

Interesting post! :)

I think the process is hard to formalize because specifying step 2 seems to require specifying a decision theory almost directly. Recall that causal decision theorists argue that two-boxing is the right choice in Newcomb’s problem. Similarly, some would argue that not giving the money in counterfactual mugging is the right choice from the perspective of the agent who already knows that it lost, whereas others argue for the opposite. Or take a look at the comments on the Two-Boxing Gene. Generally, the kinds of decision problems that put decision theories to a serious test also tend to be ones in which it is non-obvious what the right choice is. The same applies to meta-principles. Perhaps people agree with the vNM axioms, but desiderata that could shed light on Newcomblike problems appear to be more controversial. For example, irrelevance of impossible outcomes and reflective stability both seem desirable but actually contradict each other.

TL;DR: It seems to be really hard to specify what it means for a decision procedure to "win"/fail in a given thought experiment.