# Timeless Decision Theory and Meta-Circular Decision Theory

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-08-20T22:07:46.662Z · score: 27 (29 votes) · LW · GW · Legacy · 37 comments

(This started as a reply to Gary Drescher's comment here in which he proposes a Metacircular Decision Theory (MCDT); but it got way too long so I turned it into an article, which also contains some amplifications on TDT which may be of general interest.)

*Part 1:* How timeless decision theory does under the sort of problems that Metacircular Decision Theory talks about.

Say we have an agent embodied in the universe. The agent knows some facts about the universe (including itself), has an inference system of some sort for expanding on those facts, and has a preference scheme that assigns a value to the set of facts, and is wired to select an action--specifically, the/an action that implies (using its inference system) the/a most-preferred set of facts.

But without further constraint, this process often leads to a contradiction. Suppose the agent's repertoire of actions is A1, ...An, and the value of action Ai is simply i. Say the agent starts by considering the action A7, and dutifully evaluates it as 7. Next, it contemplates the action A6, and reasons as follows: "Suppose I choose A6. I know I'm a utility-maximizing agent, and I already know there's another choice that has value 7. Therefore, it follows from my (hypothetical) choice of A6 that A6 has a value of at least 7." But that inference, while sound, contradicts the fact that A6's value is 6.

This is why timeless decision theory is a causality-based decision theory. I don't recall if you've indicated that you've studied Pearl's synthesis of Bayesian networks and causal graphs(?) (though if not you should be able to come up to speed on them pretty quickly).

So in the (standard) formalism of causality - just causality, never mind decision theory as yet - causal graphs give us a way to formally compute counterfactuals: We set the value of a particular node *surgically*. This means we *delete* the structural equations that would ordinarily give us the value at the node N_i as a function of the parent values P_i and the background uncertainty U_i at that node (which U_i must be uncorrelated to all other U, or the causal graph has not been fully factored). We delete this structural equation for N_i and make N_i parentless, so we don't send any likelihood messages up to the former parents when we update our knowledge of the value at N_i. However, we do send prior-messages from N_i to all of *its* descendants, maintaining the structural equations for the children of which N_i is a parent, and their children, and so on.
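The surgery described above can be sketched on a toy model. This is only an illustration under stated assumptions: the three-node chain N1 -> N2 -> N3, the noise probabilities, and the helper names `do`, `joint`, and `prob` are all hypothetical and not taken from Pearl's text.

```python
from itertools import product

# Toy structural model (illustrative assumption): N1 -> N2 -> N3, all binary.
# Each node maps to (parents, structural equation f(parent_values, own_noise_bit)).
EQUATIONS = {
    "N1": ((), lambda pa, u: u),
    "N2": (("N1",), lambda pa, u: pa["N1"] ^ u),  # copies N1 unless its noise flips it
    "N3": (("N2",), lambda pa, u: pa["N2"] ^ u),  # copies N2 unless its noise flips it
}
NOISE = {"N1": 0.5, "N2": 0.1, "N3": 0.1}  # P(U_i = 1); the U_i are independent
ORDER = ["N1", "N2", "N3"]                 # a topological order of the graph


def do(equations, node, value):
    """Surgery: delete the node's structural equation, leaving it parentless and fixed."""
    cut = dict(equations)
    cut[node] = ((), lambda pa, u: value)
    return cut


def joint(equations):
    """Enumerate every setting of the noise bits; yield (probability, node values)."""
    for bits in product([0, 1], repeat=len(ORDER)):
        p, vals = 1.0, {}
        for name, u in zip(ORDER, bits):
            p *= NOISE[name] if u else 1 - NOISE[name]
            parents, f = equations[name]
            vals[name] = f({q: vals[q] for q in parents}, u)
        yield p, vals


def prob(equations, query, given=None):
    """P(query | given) by brute-force enumeration."""
    given = given or {}
    num = den = 0.0
    for p, vals in joint(equations):
        if all(vals[k] == v for k, v in given.items()):
            den += p
            if all(vals[k] == v for k, v in query.items()):
                num += p
    return num / den


# Observing N2 sends a likelihood message up to its parent N1 ...
p_parent_observed = prob(EQUATIONS, {"N1": 1}, {"N2": 1})
# ... but setting N2 surgically leaves N1 at its prior,
p_parent_surgery = prob(do(EQUATIONS, "N2", 1), {"N1": 1})
# while the descendant N3 is still recomputed from the new value of N2.
p_child_surgery = prob(do(EQUATIONS, "N2", 1), {"N3": 1})
```

In this toy model, conditioning on N2 changes the probability of the parent N1, while `do` leaves N1 at its prior and still propagates to N3.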

That's the standard way of computing counterfactuals in the Pearl/Spirtes/Verma synthesis of causality, as found in "Causality: Models, Reasoning, and Inference" and "Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference".

Classical causal decision theory says that your expected utility formula is over the *counterfactual* expectation of your *physical* act. Now, although the CDTs I've read have *not* in fact talked about Pearl - perhaps because it's a relatively recent mathematical technology, or perhaps because I last looked into the literature a few years back - and have just taken the counterfactual distribution as intuitively obvious manna rained from heaven - nonetheless it's pretty clear that their intuitions are operating pretty much the Pearlian way, via counterfactual surgery on the physical act.

So in *calculating* the "expected utility" of an act - the computation that classical CDT uses to *choose* an action - CDT assumes the act to be *severed from its physical causal parents*. Let's say that there's a Smoking Lesion problem, where the same gene causes a taste for cigarettes and an increased probability of cancer. Seeing someone else smoke, we would infer that they have an increased probability of cancer - this sends a likelihood-message upward to the node which represents the probability of having the gene, and this node in turn sends a prior-message downward to the node which represents the probability of getting cancer. But the counterfactual surgery that CDT performs on its physical acts means that it calculates the expected utility as though the physical act is severed from its parent nodes. So CDT calculates the expected utility as though it has the base-rate probability of having the cancer gene regardless of its act, and so chooses to smoke, since it likes cigarettes. This is the common-sense and reflectively consistent action, so CDT appears to "win" here in terms of giving the winning answer - but it's worth noting that the *internal* calculation performed is *wrong*; if you act to smoke cigarettes, your probability of getting cancer is *not* the base rate.
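To make the gap between the two numbers concrete, here is a small sketch of the Smoking Lesion arithmetic. Every probability below is a made-up assumption chosen only for illustration; the problem statement gives no figures.

```python
# Hypothetical numbers for the Smoking Lesion (assumptions, not from the article).
P_GENE = 0.2                              # base rate of the gene
P_SMOKE_GIVEN = {True: 0.8, False: 0.2}   # P(smoke | gene present?)
P_CANCER_GIVEN = {True: 0.5, False: 0.1}  # P(cancer | gene present?)


def p_gene_given_smoke():
    """Likelihood message: seeing smoking raises the probability of the gene."""
    with_gene = P_GENE * P_SMOKE_GIVEN[True]
    without_gene = (1 - P_GENE) * P_SMOKE_GIVEN[False]
    return with_gene / (with_gene + without_gene)


def p_cancer(p_gene):
    """Prior message: probability of cancer given a belief about the gene."""
    return p_gene * P_CANCER_GIVEN[True] + (1 - p_gene) * P_CANCER_GIVEN[False]


# What we infer about a smoker we merely observe.
p_cancer_observed_smoker = p_cancer(p_gene_given_smoke())
# What CDT's interior calculation uses: the act is severed from the gene node,
# so the gene stays at its base rate.
p_cancer_cdt_interior = p_cancer(P_GENE)
```

With these assumed numbers the observed smoker has a cancer probability of about 0.30, while CDT's interior calculation uses about 0.18, the base rate.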

And on Newcomb's Problem this internal error comes out into the open; the inside of CDT's counterfactual expected utility calculation, expects box B to contain a million dollars at the base rate, since it surgically severs the act of taking both boxes from the parent variable of your source code, which correlates to your previous source code at the moment Omega observed it, which correlates to Omega's decision whether to leave box B empty.

Now turn to timeless decision theory, in which the (Godelian diagonal) expected utility formula is written as follows:

Argmax[A in Actions] in Sum[O in Outcomes] (Utility(O)*P(this computation yields A []-> O|rest of universe))

The interior of this formula performs counterfactual surgery to sever the *logical output* of the expected utility formula from the *initial conditions* of the expected utility formula. So we do *not* conclude, *in the inside of the formula as it performs the counterfactual surgery*, that if-counterfactually A_6 is chosen over A_7 then A_6 must have higher expected utility. If-evidentially A_6 is chosen over A_7, then A_6 has higher expected utility - but this is not what the interior of the formula computes. As we *compute* the formula, the logical output is divorced from all parents; we cannot infer anything about its immediate logical precedents. This counterfactual surgery may be *necessary*, in fact, to stop an infinite regress in the formula, as it tries to model its own output in order to decide its own output; and this, arguably, is exactly *why* the decision counterfactual has the form it does - it is *why* we have to talk about counterfactual surgery within decisions in the first place.
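One way to picture the shape of this formula is the sketch below. It assumes the counterfactual probability is handed in as an ordinary function, here called `p_counterfactual` (a hypothetical name), so the argmax never has to model its own output. It illustrates the formula's structure only, not how the counterfactual distribution would actually be obtained.

```python
def tdt_choose(actions, outcomes, utility, p_counterfactual):
    """Argmax over actions of Sum over outcomes of Utility(O) * P(output = A []-> O).

    p_counterfactual(action, outcome) is assumed to already encode the surgery:
    the logical output is set to `action`, its parents are left untouched,
    and only its descendants are recomputed.
    """
    def expected_utility(action):
        return sum(utility(o) * p_counterfactual(action, o) for o in outcomes)

    return max(actions, key=expected_utility)


# The A_1 ... A_n example with n = 7: the value of action A_i is simply i.
actions = list(range(1, 8))
outcomes = list(range(1, 8))
choice = tdt_choose(
    actions,
    outcomes,
    utility=lambda o: o,
    # Setting the output to A_i yields value i; nothing is inferred from
    # "a utility maximizer chose this", so A_6 keeps its value of 6.
    p_counterfactual=lambda a, o: 1.0 if o == a else 0.0,
)
```

Because the hypothetical choice of A_6 is evaluated by surgery rather than by evidential inference, A_6 is scored as 6 and the contradiction from the A6/A7 example does not arise.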

*Descendants* of the logical output, however, continue to update their values within the counterfactual, which is why TDT one-boxes on Newcomb's Problem - both your current self's physical act, and Omega's physical act in the past, are logical-causal *descendants* of the computation, and are recalculated accordingly inside the counterfactual.
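A rough numerical sketch of the two interior calculations on Newcomb's Problem follows. The predictor accuracy, the base rate CDT assigns to a full box B, and the payoffs are assumptions picked for illustration.

```python
# Hypothetical parameters (assumptions for illustration).
BOX_A = 1_000
BOX_B = 1_000_000
ACCURACY = 0.99        # how often Omega's prediction matches the logical output
BASE_RATE_FULL = 0.5   # CDT's act-independent probability that box B is full
ACTIONS = ["one-box", "two-box"]


def payoff(action, b_full):
    return (BOX_B if b_full else 0) + (BOX_A if action == "two-box" else 0)


def expected(action, p_full):
    return p_full * payoff(action, True) + (1 - p_full) * payoff(action, False)


def cdt_eu(action):
    # Surgery on the physical act: Omega's past act stays at the base rate.
    return expected(action, BASE_RATE_FULL)


def tdt_eu(action):
    # Surgery on the logical output: Omega's act is a descendant and is recomputed.
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    return expected(action, p_full)


cdt_choice = max(ACTIONS, key=cdt_eu)
tdt_choice = max(ACTIONS, key=tdt_eu)
```

Under CDT's calculation two-boxing comes out ahead by exactly the contents of box A whatever base rate is assumed, whereas recomputing Omega's act as a descendant makes one-boxing the higher-valued output.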

If you desire to smoke cigarettes, this would be observed and screened off by conditioning on the *fixed initial conditions* of the computation - the fact that the utility function had a positive term for smoking cigarettes, would already tell you that you had the gene. (Eells's "tickle".) If you can't observe your own utility function then you are actually taking a step outside the timeless decision theory as formulated.

So from the perspective of Metacircular Decision Theory - what is done with various facts - timeless decision theory can state very definitely how it treats the various facts, within the interior of its expected utility calculation. It does not *update* any physical or logical parent of the logical output - rather, it *conditions* on the initial state of the computation, in order to screen off outside influences; then no further inferences about them are made. And if you already know anything about the consequences of your logical output - its descendants in the logical causal graph - you will *re*compute what they *would have been* if you'd had a different output.

This last codicil is important for cases like Parfit's Hitchhiker, in which Omega (or perhaps Paul Ekman), driving a car through the desert, comes across yourself dying of thirst, and will give you a ride to the city only if they expect you to pay them $100 *after* you arrive in the city. (With the whole scenario being trued by strict selfishness, no knock-on effects, and so on.) There is, of course, no way of forcing the agreement - so will you compute, *in the city*, that it is *better for you* to give $100 to Omega, after having *already* been saved? Both evidential decision theory and causal decision theory will give the losing (dying in the desert, hence reflectively inconsistent) answer here; but TDT answers, "*If I had decided not to pay,* then Omega *would have* left me in the desert." So the expected utility of not paying $100 remains lower, *even after you arrive in the city,* given the way TDT computes its counterfactuals inside the formula - which is the dynamically and reflectively consistent and winning answer. And note that this answer is arrived at in one natural step, without needing explicit reflection, let alone precommitment - you will answer this way even if the car-driver Omega made its prediction without you being aware of it, so long as Omega can credibly establish that it was predicting you with reasonable accuracy rather than making a pure uncorrelated guess. (And since it's not a very complicated calculation, Omega knowing that you are a timeless decision theorist is credible enough.)
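The in-the-city calculation can be sketched as follows. The utilities and Omega's accuracy are assumed figures, and the CDT function models only the already-rescued situation; evidential decision theory is not modeled here.

```python
# Hypothetical figures (assumptions for illustration).
ACCURACY = 0.95         # chance that Omega's prediction matches your logical output
U_DEATH = -1_000_000    # left in the desert
U_PAY = -100            # rescued, then pay $100
U_FREE_RIDE = 0         # rescued, then refuse to pay
OUTPUTS = ["pay", "refuse"]


def tdt_eu(output):
    """Recompute the already-observed rescue as a descendant of the logical output."""
    p_rescued = ACCURACY if output == "pay" else 1 - ACCURACY
    u_if_rescued = U_PAY if output == "pay" else U_FREE_RIDE
    return p_rescued * u_if_rescued + (1 - p_rescued) * U_DEATH


def cdt_eu_in_city(output):
    """Treat the rescue as a fixed past fact: only the $100 is still at stake."""
    return U_PAY if output == "pay" else U_FREE_RIDE


tdt_choice = max(OUTPUTS, key=tdt_eu)
cdt_choice = max(OUTPUTS, key=cdt_eu_in_city)
```

Even though the rescue has already been observed, the TDT-style calculation re-derives it from each candidate output, so refusing is still scored as mostly ending in the desert.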

I wonder if it might be open to the criticism that you're effectively postulating the favored answer to Newcomb's Problem (and other such scenarios) by postulating that when you surgically alter one of the nodes, you correspondingly alter the nodes for the other instances of the computation.

This is where one would refer to the omitted extended argument about a calculator on Mars and a calculator on Venus, where both calculators were manufactured at the same factory on Earth and observed before being transported to Mars and Venus. If we manufactured two envelopes on Earth, containing the same letter, and transported them to Mars and Venus without observing them, then indeed the contents of the two envelopes would be correlated in our probability distribution, even though the Mars-envelope is not a cause of the Venus-envelope, nor the Venus-envelope a cause of the Mars-envelope, because they have a common cause in the background. But if we *observe* the common cause - look at the message as it is written, before being Xeroxed and placed into the two envelopes - then the standard theory of causality *requires* that our remaining uncertainty about the two envelopes be *uncorrelated*; we have observed the common cause and screened it off. If N_i is not a cause of N_j or vice versa, and you *know* the state of all the common ancestors A_ij of N_i and N_j, and you do *not* know the state of any mutual descendants D_ij of N_i and N_j, then the standard rules of causal graphs (D-separation) show that your probabilities at N_i and N_j must be independent.
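The screening-off claim for the envelopes can be checked by brute force on a toy version. The sketch assumes a one-bit letter and a small independent copying error per envelope; both the numbers and the names are illustrative.

```python
from itertools import product

# Hypothetical parameters (assumptions for illustration).
P_LETTER = 0.5      # prior that the original letter reads "1"
P_COPY_ERROR = 0.1  # independent chance that each photocopy flips the bit


def worlds():
    """Enumerate letter and copy-error settings; yield (probability, contents)."""
    for letter, err_mars, err_venus in product([0, 1], repeat=3):
        p = P_LETTER if letter else 1 - P_LETTER
        p *= P_COPY_ERROR if err_mars else 1 - P_COPY_ERROR
        p *= P_COPY_ERROR if err_venus else 1 - P_COPY_ERROR
        yield p, {"letter": letter, "mars": letter ^ err_mars, "venus": letter ^ err_venus}


def prob(query, given):
    """P(query | given) by enumeration."""
    num = den = 0.0
    for p, w in worlds():
        if all(w[k] == v for k, v in given.items()):
            den += p
            if all(w[k] == v for k, v in query.items()):
                num += p
    return num / den


# Common cause unobserved: opening the Mars envelope changes our belief about Venus.
venus_prior = prob({"venus": 1}, {})
venus_given_mars = prob({"venus": 1}, {"mars": 1})
# Common cause observed: the Mars envelope carries no further information.
venus_given_letter = prob({"venus": 1}, {"letter": 1})
venus_given_letter_and_mars = prob({"venus": 1}, {"letter": 1, "mars": 1})
```

Once the letter itself is observed, conditioning additionally on the Mars envelope leaves the Venus probability unchanged, which is the independence that D-separation requires.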

However, if you manufacture on Earth two calculators both set to calculate 123 * 456, and you have not yet performed this calculation in your head, then you can *observe completely the physical state of the two calculators* before they leave Earth, and yet still have *correlated* uncertainty about what result will flash on the screen on Mars and the screen on Venus. So this situation is simply *not* compatible with the mathematical axioms on causal graphs if you draw a causal graph in which the only common ancestor of the two calculators is the physical factory that made them and produced their correlated initial state. If you are to preserve the rules of causal graphs at all, you must have an additional node - which would logically seem to represent one's logical uncertainty about the abstract computation 123 * 456 - which is the parent of both calculators. Seeing the Venusian calculator flash the result 56,088, this physical event sends a likelihood-message to its parent node representing the logical result of 123 * 456, which sends a prior-message to its child node, the physical message flashed on the screen at Mars.
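A minimal sketch of the extra logical node is given below, assuming an observer who has not done the multiplication and spreads credence over three candidate answers. The candidates and their credences are invented for illustration, and both calculators are assumed to work perfectly.

```python
# Hypothetical credences over the logical node "result of 123 * 456"
# (assumed values for an observer who has not yet done the arithmetic).
PRIOR = {56088: 0.6, 56078: 0.2, 55088: 0.2}


def p_display(shown, belief):
    """Prior-message: a working calculator displays whatever the logical result is."""
    return belief.get(shown, 0.0)


def update_on_display(shown, belief):
    """Likelihood-message from one calculator's display up to the logical node."""
    unnormalized = {r: (p if r == shown else 0.0) for r, p in belief.items()}
    total = sum(unnormalized.values())
    return {r: w / total for r, w in unnormalized.items()}


# Before either screen is seen, Mars showing 56088 has probability 0.6.
mars_before = p_display(56088, PRIOR)
# Seeing Venus flash 56088 updates the shared logical parent, and with it Mars.
mars_after = p_display(56088, update_on_display(56088, PRIOR))
```

The two displays are correlated only through the shared logical parent: the physical states were fully observed, yet the Venus display still changes the prediction for Mars.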

A similar argument shows that if we have completely observed our own *initial* source code, and perhaps observed Omega's *initial* source code which contains a copy of our source code and the intention to simulate it, but we do not yet know our own decision, then the only way in which our uncertainty about our own physical act can possibly be correlated *at all* with Omega's past act to fill or leave empty the box B - given that neither act physically causes the other - is if there is some common ancestor node unobserved; and having already seen that our causal graph must include logical uncertainty if it is to stay factored, we can (must?) interpret this unobserved common node as the logical output of the known expected utility calculation.

From this, I would argue, TDT follows. But of course it's going to be difficult to exhibit an algorithm that computes this - guessing unknown causal networks is an extremely difficult problem in machine learning, and only small such networks can be learned. In general, determining the causal structure of reality is AI-complete. And by interjecting logical uncertainty into the problem, we really are heading far beyond the causal networks that known machine algorithms can *learn.* But it *is* the case that if you rely on humans to learn the causal algorithm, then it is pretty clear that the Newcomb's Problem setup, if it is to be analyzed in causal terms at all, must have nodes corresponding to logical uncertainty, on pain of violating the axioms governing causal graphs. Furthermore, in being told that Omega's leaving box B full or empty correlates to our *decision* to take only one box or both boxes, *and* that Omega's act lies in the past, *and* that Omega's act is not directly influencing us, *and* that we have not found any other property which would screen off this uncertainty even when we inspect our own source code / psychology in advance of knowing our actual decision, *and* that our computation is the only *direct* ancestor of our logical output, then we're being told in unambiguous terms (I think) to make our own physical act and Omega's act a common descendant of the unknown logical output of our known computation. (A counterexample in the form of another causal graph compatible with the same data is welcome.) And of course we could make the problem very clear by letting the agent be a computer program and letting Omega have a copy of the source code with superior computing power, in which case the logical interpretation is very clear.

So these are the facts which TDT takes into account, and the facts which it ignores. The Nesov-Dai updateless decision theory is even stranger - as far as I can make out, it ignores *all* facts except for the fact about which inputs have been received by the logical version of the computation it implements. If combined with TDT, we would interpret UDT as having a never-updated weighting on all possible universes, and a causal structure (causal graph, presumably) on those universes. Any given logical computation in UDT will count all instantiations of itself in all universes which have received exactly the same inputs - even if those instantiations are being imagined by Omega in universes which UDT would ordinarily be interpreted as "known to be logically inconsistent", like universes in which the third decimal digit of pi is 3. Then UDT calculates the counterfactual consequences, weighted across all imagined universes, using its causal graphs on each of those universes, of setting the logical act to A_i. Then it maximizes on A_i.

I would ask if, applying Metacircular Decision Theory from a "common-sense human base level", you see any case in which additional facts should be taken into account, or other facts ignored, apart from those facts used by TDT (UDT). If not, and if TDT (UDT) are reflectively consistent, then TDT (UDT) is the fixed point of MCDT starting from a human baseline decision theory. Of course this can't actually be the case because TDT (UDT) are incomplete with respect to the open problems cited earlier, like logical ordering of moves, and choice of conditional strategies in response to conditional strategies. But it would be the way I'd pose the problem to you, Gary Drescher - MCDT is an interesting way of looking at things, but I'm still trying to wrap my mind around it.

*Part 2: Metacircular Decision Theory as reflection criterion.*

MCDT's proposed criterion is this: the agent makes a meta-choice about which facts to omit when making inferences about the hypothetical actions, and selects the set of facts which lead to the best outcome if the agent then evaluates the original candidate actions with respect to that choice of facts. The agent then iterates that meta-evaluation as needed (probably not very far) until a fixed point is reached, i.e. the same choice (as to which facts to omit) leaves the first-order choice unchanged. (It's ok if that's intractable or uncomputable; the agent can muddle through with some approximate algorithm.)

...In other words, metacircular consistency isn't just a *test* that we'd like the decision theory to pass. Metacircular consistency *is* the theory; it *is* the algorithm.

But it looks to me like MCDT has to start from some particular base theory, and different base theories may have different fixed points (or conceivably, cycles). In which case we can't yet call MCDT itself a complete theory specification. When you talk about which facts *would* be wise to take into account, or ignore, (or recompute counterfactually even if they already have known values?), then you're imagining different source codes (or MCDT specifications?) that an agent could have; and calculating the benefits of adopting these different source codes, relative to the way the *current* base theory computes "adopting" and "benefit".

For example, if you start with CDT and apply MCDT at 7am, it looks to me like "use TDT (UDT) for all cases where my source code has a physical effect after 7am, and use CDT for all cases where the source code had a physical effect before 7am or a correlation stemming from common ancestry" is a reflectively stable fixed point of MCDT. Whenever CDT asks "*What if* I took into account these different facts?", it will say, "But Omega would not be physically affected by my self-modification, so clearly it can't benefit me in any way." If the MCDT criterion is to be applied in a different and intuitively appealing way that has only one fixed point (up to different utility functions) then this would establish MCDT as a good candidate for *the* decision theory, but right now it does look to me like *a* reflective consistency test. But maybe this is because I haven't yet wrapped my mind around the MCDT's fact-treatment-based decomposition of decision theories, or because you've already specified further mandatory structure in the base theory how the *effect of* ignoring or taking into account some particular fact is to be computed.

## 37 comments


Thanks, Eliezer--that's a clear explanation of an elegant theory. So far, TDT (I haven't looked carefully at UDT) strikes me as more promising than any other decision theory I'm aware of (including my own efforts, past and pending). Congratulations are in order!

I agree, of course, that TDT doesn't make the A6/A7 mistake. That was just a simple illustration of the need, in counterfactual reasoning (broadly construed), to specify somehow what to hold fixed and what not to, and that different ways of doing so specify different senses of counterfactual inference (i.e., that there are different kinds of 'if-counterfactually'). If counterfactual inference is construed a la Pearl, for example, then such inferences (causal-counterfactual) correspond to causal links (if-causally).

As you say, TDT's utility formula doesn't perform general logical inferences (or evidential-counterfactual inferences) from the antecedents it evaluates (i.e. the candidate outputs of the Platonic computation). Rather, the utility formula performs causal-counterfactual inferences from the set of nodes that designate the outputs of the Platonic computation, in all places where that Platonic computation is approximately physically instantiated.

However, it seems to me we can, if we wish, use TDT to define what we can call a TDT-counterfactual that tells us what would be true 'if-timelessly' a particular physical agent's particular physical action were to occur. In particular, whereas CDT says that what would be true (if-causally) consists of what's causally downstream from that action, TDT says that what would be true (if-timelessly) consists of what's causally downstream from the output of the suitably-specified Platonic computation that the particular physical agent approximately implements, and also what's causally downstream from that same Platonic computation in all other places where that computation is approximately physically instantiated. (And the physical TDT agent argmaxes over the utilities of the TDT-counterfactual consequences of that agent's candidate actions.)

I think there are a few reasons we might sometimes find it useful to think in terms of the TDT-counterfactual consequences of a physical agent's actions, rather than directly in terms of the standard TDT formulation (even though they're merely two different ways of expressing the same decision theory, unless I've misunderstood).

The TDT-counterfactual perspective places TDT in a common framework with other decision theories that (implicitly or explicitly) use other kinds of counterfactual reasoning, starting with a physical agent's action as the antecedent. Then we can apply some meta-criterion to ask which of those alternative theories is correct, and why. (That was the intuition behind my MCDT proposal, although MCDT itself was hastily specified and too simpleminded to be correct.)

Plausibly, people are agents who think in terms of the counterfactual consequences of an action, rather than being hardwired to use TDT. If we are to choose to act in accordance with TDT from now on (or, equivalently, if we are to build AIs who act in accordance with TDT), we need to be persuaded that doing so is for the best (even if e.g. a Newcomb snapshot was already taken before we became persuaded). (I'm assuming here that our extant choice machinery allows us the flexibility to be persuaded about what sort of counterfactual to use; if not, alas, we can't necessarily get there from here).

In the standard formulation of TDT, you effectively view yourself as an abstract computation with one or more approximate physical instantiations, and you ask what you (thus construed) cause (i.e. what follows causal-counterfactually). In the alternative formulation, I view myself as a particular physical agent that is among one or more approximate instantiations of an abstract computation, and I ask what follows TDT-counterfactually from what I (thus construed) choose.

The original formulation seems to require a precommitment to identify oneself with all instantiations (in the causal net) of the abstract computation (or at least seems to require that in order for us non-TDT agents to decide to emulate TDT). And that identification is indeed plausible in the case of fairly exact replication. But consider, say, a 1-shot PD game between Eliezer and me. Our mutual understanding of reflexive consistency would let us win. And I agree that we both approximately instantiate, at some level of abstraction, a common decision computation, which is what lets the TDT framework apply and lets us both win.

But (in contrast with an exact-simulation case) that common computation is at a level of abstraction that does not preserve our respective personal identities. (That's kind of the point of the abstraction. My utility function for the game places value on Gary's points and not Eliezer's points; the common abstract computation lacks that bias.) So I would hesitate to identify either of us with the common abstraction. (And I see in other comments that Eliezer explicitly agrees.) Rather, I'd like to reason that if-timelessly I, Gary, choose 'Cooperate', then so does Eliezer. That way, "I am you as you are me" emerges as a (metaphorical) conclusion about the situation (we each have a choice about the other's action in the game, and are effectively acting together) rather than being needed as the point of departure.

Again, the foregoing is just an alternative but equivalent (unless I've erred) way of viewing TDT, an alternative that may be useful for some purposes.

[In TDT] If you desire to smoke cigarettes, this would be observed and screened off by conditioning on the fixed initial conditions of the computation - the fact that the utility function had a positive term for smoking cigarettes, would already tell you that you had the gene. (Eells's "tickle".) If you can't observe your own utility function then you are actually taking a step outside the timeless decision theory as formulated.

Consider a different scenario where people with and without the gene both desire to smoke, but the gene makes that desire stronger, and the stronger it is, the more likely one is to smoke. Even when you observe your own utility function, you don't necessarily have a clue whether the utility assigned to smoking is the level caused by the gene or else by the gene's absence. So your observation of your utility function doesn't necessarily help you to move away from the base-level probability of having cancer here.

Does

Argmax[A in Actions] in Sum[O in Outcomes] (Utility(O)*P(this computation yields A []-> O|rest of universe))

evaluate to:

    def Tdt(Actions,Outcomes):
        currentMax = 0
        output = null
        for A in Actions:
            sum = 0
            for O in Outcomes:
                sum += U(O)*P(O|Tdt(Actions,Outcomes) == A and background knowledge)
            if sum >= currentMax:
                currentMax = sum
                output = A
        return output

Or am I missing some subtlety? I am assuming that "P" and "U" have been defined elsewhere, and that python can deal with referencing the outcome of a computation inside itself before it has been completed (or at least that the probability function halts when given a yet to be computed function evaluating to a certain output as its input statement). (edit): couldn't get the tabs to work, it's supposed to be pseudo python, but it's probably just as readable. Is there a way to typeset tabs in the comments?

From the "Comment formatting" page on the wiki:

To make a paragraph where your indentation is preserved and no characters are treated specially, precede each line with (at least) four spaces. This is commonly used for computer program source code.

Quick stupid question: what does "A[]->O" stand for? specifically "[]->"? Is that a material implication? Should I read it as "P(O|A is the output of this computation and rest of universe)"?

(edit): could someone please help with this?

I suck at symbolic logic or computer logic or whatever so I'm commenting in the hope that someone else sees my comment and answers your question.

It means "If A were true, then O would be true." Note that this is a counterfactual statement.

I could be wrong about this, but I believe the arrow is intended to indicate a functional mapping, and the [] is some noise about types. So: The probability that (this computation yields a lambda mapping A to O) given (rest of universe).

It would be nice if someone weighed in with something more definitive. Various reference materials, along with search tools such as Google and Symbolhound, are not particularly helpful.

Bravo Eliezer! The material is extremely crispy, and I have never seen anyone who can explain technical material as well as you.

Quick question, please!

if we have completely observed our own initial source code, and perhaps observed Omega's initial source code which contains a copy of our source code and the intention to simulate it, but we do not yet know our own decision, then the only way in which our uncertainty about our own physical act can possibly be correlated at all with Omega's past act to fill or leave empty the box B - given that neither act physically causes the other - is if there is some common ancestor node unobserved;

Identifying this common ancestor node as the logical output of the expected-utility calculation is what you referred to earlier as Godelian diagonalization; is it not?

No, the Godelian diagonal is the self-replicating recipe you use to have the computation talk about itself when it says "my own result". See p.3 of here.

Bravo Eliezer! The material is extremely crispy

Really? I thought I was frantically blurting out a huge blog-comment response that I didn't really have time to edit all that well.

Seconding Richard's comment. You seem hesitant to explain technical things here for fear of being imprecise, but you're actually very good at explaining yourself and many of the folks here can fill in the gaps.

Really? I thought I was frantically blurting out a huge blog-comment response that I didn't really have time to edit all that well.

I was taking the signs that the response was blurted out quickly into account in my evaluation of skill level.

Maybe I should have put a "probably" in my statement. Certainly you are particularly good at explaining technical material to *me*.

Eli, you are doing an amazingly good job of putting Pearl's calculus into a verbal form, but I can't help feeling that this would be clearer if you had a few graphs. Do you have tools that would let you draw the causal diagrams? Why not use them? Is it that the move from Pearl's causal calculus to TDT is hard to express in the graph notation? I still think, in that case, that the causal surgery part of the argument would be clearer in Pearl's notation.

Do you have tools that would let you draw the causal diagrams?

No. Do you have recommendations?

I looked through a paper of Pearl's to see what causal diagrams look like, and what I saw seemed like a good match for Graphviz. I noticed that Shalizi used it for many of the diagrams in his thesis too.

Graphviz is the LaTeX of graph-drawing tools. You'll get professional-looking output immediately, but the customization options aren't as discoverable as they would be in a visual editor.

If you plan on making lots of graphs or want them to look very pretty, I'd recommend it. If you're just looking for a quick way to draw a graph or two explaining TDT vs. CDT it may not be worth the time relative to a generic (vector) drawing program.

(The Python bindings might make things marginally easier if you know Python and don't want to learn more syntax.)

I think you're exaggerating how difficult it is to use graphviz for simple things by comparing it to LaTeX. Consider this diagram in the gallery and look at how trivially simple the source file that generates that image is.

I don't disagree that doing complex things can be difficult, but for graphs that consist of a handful of nodes and edges with assorted labels, and some boxes to group nodes together, it's hard to beat graphviz.

If you're under Windows, Microsoft Visio will do just fine. Also, there are tools like Smartdraw and Gliffy, but I don't have any experience with them.

I use OmniGraffle for such things on a Mac. Many people seem happy with the drawing packages in their word processor or presentation program, though. The advantage of an object based editing program is that you can keep arrows connected as you drag things around.

As a graphics doofus, I found Inkscape relatively easy to pick up the basics. But honestly, even a MS Paint/GNU Paint diagram would be better than nothing.

Gary Drescher wrote: "Unsurprisingly, a false premise leads to a contradiction. To avoid contradiction, ..."

I was under the impression that the relevant logicians (e.g. Anderson, Belnap, Dunn, Meyer) had solved this problem (of having to avoid irrelevant contradictions) decisively. Instead, EY uses the gadgetry of surgery on causal Bayesian networks to address this. Is there a sense in which relevant logics are doing screening and/or surgery? Does anyone know of an exposition that connects relevant logics to Pearl's counterfactuals?

Here's how my initial formulation of UDT (let's call it UDT1 for simplicity) would solve Drescher's problem.

Among the world programs embedded (and given a weight) in S, would be the following:

```
def P():
    # World program: calls the agent S on this input, then scores its output.
    action = S("the value of action Ai is simply i")
    S_utility = ActionToValue(action)  # maps Ai to i
```

If this is the only world program that calls S with "the value of action Ai is simply i", and S's utility function has a component for S_utility at the end of this P, then upon that input, S would iterate over the Ai's, and for each Ai, compute what S_utility would be at the end of P under the assumption that S returns Ai. Finally it returns An since that maximizes S_utility.
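The iteration described above can be sketched as follows. This is only a toy under stated assumptions: actions are represented by their index i, `world_program` stands in for P with the call to S replaced by a fixed return value, and the "mathematical intuition" step is reduced to simply running that program. The names `world_program` and `action_to_value` are hypothetical.

```python
from typing import Callable, Sequence


def action_to_value(action: int) -> int:
    """Maps Ai to i, as in Drescher's example."""
    return action


def world_program(agent_output: int) -> int:
    """P with the call to S replaced by a fixed return value; returns S_utility."""
    action = agent_output
    return action_to_value(action)


def S(actions: Sequence[int], world: Callable[[int], int] = world_program) -> int:
    """For each Ai, compute S_utility assuming S returns Ai; return the best Ai."""
    best_action, best_utility = None, None
    for a in actions:
        u = world(a)  # utility at the end of P if S were to return a
        if best_utility is None or u > best_utility:
            best_action, best_utility = a, u
    return best_action


if __name__ == "__main__":
    print(S(range(1, 8)))
```

Note that nothing in the loop uses "S is a utility maximizer" as a premise while evaluating a candidate output, which is why evaluating A6 yields 6 rather than "at least 7".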

Eliezer, the way you described it is:

If combined with TDT, we would interpret UDT as having a never-updated weighting on all possible universes, and a causal structure (causal graph, presumably) on those universes. Any given logical computation in UDT will count all instantiations of itself in all universes which have received exactly the same inputs - even if those instantiations are being imagined by Omega in universes which UDT would ordinarily be interpreted as "knowing to be logically inconsistent", like universes in which the third decimal digit of pi is 3. Then UDT calculates the counterfactual consequences, weighted across all imagined universes, using its causal graphs on each of those universes, of setting the logical act to A_i. Then it maximizes on A_i.

The "causal graph" part doesn't *sound* like UDT1. Is it equivalent?

ETA: To respond to Drescher's

"Suppose I choose A6. I know I'm a utility-maximizing agent, and I already know there's another choice that has value 7. Therefore, if follows from my (hypothetical) choice of A6 that A6 has a value of at least 7."

S is simply not programmed to think that. For A6 it would simulate P with "return A6" substituting for S, and calculate the utility of A6 that way.

ETA2: The previous sentence assumes that's what the "mathematical intuition" black box does.

Wei, if you want to calculate the *consequence* of an action, you need to know that this computation outputting A1 has something to do with box B containing a million dollars (and being obtained by you, for that matter) or that A2 has something to do with the driver in Parfit's Hitchhiker deciding to pick you up and take you to the city. (And yet hypothetically choosing A6 is *not* used to infer, inside the counterfactual, that A6 actually was better than A7.)

This is what I am saying would get computed via the causal graphs, and which may require actual counterfactual surgery a la Pearl - at least the part where you don't believe that A6 actually was better than A7 or that (hypothetically) deciding to cross the road makes it safe - though you may not need to *re*compute Parfit's Hitchhiker, since this is an updateless decision theory to begin with.

I'm afraid I don't understand you. Can you look at my solution to Drescher's problem and point out which part is wrong or problematic? Or give a sample problem that UDT1 can't deal with because it doesn't use causal graphs?

Last time I tried to read Pearl's book, I didn't get very far. I'll try again if given sufficient motivation. I guess you can either explain to me some more about what problem it solves, or I can just take your word for it, if you think it's really a necessary component for UDT, and I'll understand that after I comprehend Pearl.

We're taking apart your "mathematical intuition" into something that invents a causal graph (this part is still magic) and a part that updates a causal graph "given that your output is Y" (Pearl says how to do this).

If you literally have the ability to run all of reality excluding yourself as a computer program, I suppose the causal graph part might be moot, since you could just simulate elementary particles directly, instead of approximating them with a high-level causal model. But then it's not clear how to literally simulate out the whole universe in perfect detail when the inside of your computer is casting gravitational influences outward based on transistors whose exact value you haven't yet computed (since you can't compute all of yourself in advance of computing yourself!).

With different physics and a perfect Cartesian embedding (a la AIXI) you could do this, perhaps. With a perfect Cartesian embedding and knowledge of the rest of the universe outside yourself, there would be no need for causal graphs of any sort within the theory, I think. But you would still have to factor out your logical uncertainty in a way which prevented you from concluding "if I choose A6, it must have had higher utility than A7" when considering A6 as an option (as Drescher observes). After all, if you suffered a brief bout of amnesia afterward, and I told you with trustworthy authority that you *really had* chosen A6, you would conclude that you really must have calculated higher expected utility for it relative to your probability distribution and utility function.

If I believably tell you that Lee Harvey Oswald really didn't shoot JFK, you conclude that someone else did. But in the counterfactual on our standard causal model, if LHO hadn't shot JFK, no one else would have. So when postulating that your output is A6 inside the decision function, you've got to avoid certain conclusions that you would in fact come to, if you observed in reality that your output really was A6, like A6 having higher expected utility than A7. This sort of thing is the domain of causal graphs, which is why I'm assuming that the base model is a causal graph with some logical uncertainty in it. Perhaps you could come up with a similar but non-causal formalism for pure logical uncertainty, and then this would be very interesting.
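The difference between the two Oswald inferences can be made concrete with a tiny structural model. This is a sketch under assumptions that are not in the comment above: two independent background variables (whether Oswald shoots, whether someone else shoots), made-up prior probabilities, and the equation "JFK dies if either shoots". The counterfactual follows Pearl's recipe of conditioning on what actually happened, then replacing Oswald's equation by a constant.

```python
from itertools import product

# Hypothetical priors, chosen only for illustration.
P_OSWALD = 0.9
P_OTHER = 0.01


def worlds():
    """Enumerate (u_oswald, u_other, probability) over the background variables."""
    for u_o, u_x in product([True, False], repeat=2):
        p = (P_OSWALD if u_o else 1 - P_OSWALD) * (P_OTHER if u_x else 1 - P_OTHER)
        yield u_o, u_x, p


def model(u_o, u_x, do_oswald=None):
    """Structural equations; do_oswald replaces Oswald's equation (surgery)."""
    oswald = u_o if do_oswald is None else do_oswald
    other = u_x
    dead = oswald or other
    return oswald, other, dead


def p_other_given_observed_no_oswald():
    """Observation: JFK is dead and we learn that Oswald did not shoot."""
    num = den = 0.0
    for u_o, u_x, p in worlds():
        oswald, other, dead = model(u_o, u_x)
        if dead and not oswald:
            den += p
            num += p if other else 0.0
    return num / den


def p_dead_if_oswald_had_not_shot():
    """Counterfactual: keep worlds where Oswald shot and JFK died, then set Oswald to False."""
    num = den = 0.0
    for u_o, u_x, p in worlds():
        oswald, other, dead = model(u_o, u_x)
        if oswald and dead:  # abduction on the actual facts
            den += p
            _, _, dead_cf = model(u_o, u_x, do_oswald=False)  # surgery
            num += p if dead_cf else 0.0
    return num / den
```

Under these assumptions the observational query says someone else certainly shot, while the counterfactual query leaves the probability of JFK dying at the small prior for another shooter. The same asymmetry is what keeps "my output is A6" inside the decision function from implying that A6 had the higher expected utility.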

Eliezer, one of your more recent comments finally prodded me into reading http://bayes.cs.ucla.edu/IJCAI99/ijcai-99.pdf (don't know why I waited so long), and I can now understand this comment much better. Except this part:

But you would still have to factor out your logical uncertainty in a way which prevented you from concluding "if I choose A6, it must have had higher utility than A7" when considering A6 as an option (as Drescher observes).

Under UDT1, when I'm trying to predict the consequences of choosing A6, I *do* want to assume that it has higher expected utility than A7. Because suppose my prediction subroutine sees that there will be another agent who is very similar to me, about to make the same decision, it should predict that it will also choose A6, right?

Now when the prediction subroutine returns, that assumption pops off the stack and goes away. I then call my utility evaluation routine to compute a utility for those predictions. There is no place for me to conclude "if I choose A6, it must have had higher utility than A7" in a form that would cause any problems.

Am I missing something here?

Why bother predicting the counterfactual consequences of choosing A6 since you already "know" the EU is higher than A7 and all the other options?

On the other hand, if you actually do see a decision process similar to yours choose A6, then you know that A6 *really does* have EU higher than A7.

Why bother predicting the counterfactual consequences of choosing A6 since you already "know" the EU is higher than A7 and all the other options?

Are you sure you're not anthropomorphizing the decision procedure? If I actually run through the steps that it specifies in my head, I don't see any place where it would say "why bother" or fail to do the prediction.

On the other hand, if you actually do see a decision process similar to yours choose A6, then you know that A6 really does have EU higher than A7.

No, in UDT1 you don't update on outside computations like that. You just recompute the EU.

In any case, you shouldn't *know* wrong things at any point. The trick is to be able to consider what's going on *without* assuming (knowing) that you result from an actual choice.

No, in UDT1 you don't update on outside computations like that. You just recompute the EU.

This doesn't seem right. You update just fine, in the sense that you'd prefer a strategy where observing a utility-maximizer choose X leads you to conclude that X is the highest-utility choice, in the sense that all the subsequent actions are chosen as if it's so.

Looking over this... maybe this is stupid, but... isn't this sort of a use/mention issue?

When simulating "if I choose A6", then simulate "*THEN* I would have believed A6 has higher EU", *without* having to escalate that to "actual I (not simulated I) actually, currently believe A6 has higher EU".

Just don't have a TDT agent consider the beliefs of the counterfactual simulated versions of itself to be a reliable authority on actual noncounterfactual reality.

Am I missing the point? Am I skimming over the hard part, or...?

That's one possible approach. But then you have to define what exactly constitutes a "use" and what constitutes a "mention" with respect to inferring facts about the universe. Compare the crispness of Pearl's counterfactuals to classical causal decision theory's counterfactual distributions falling from heaven, and you'll see why you want more formal rules saying which inferences you can carry out.

Seems to me that it ought to be treatable as "perfectly ordinary"...

That is, if you run a simulation, there's no reason for you to believe the same things that the modeled beings believe, right? If one of the modeled beings happens to be a version of you that's acting and believing in terms of a counterfactual that is the premise of the simulation, then... why would that automatically lead to you believing the same thing in the first place? If you simulate a piece of paper that has written upon it "1+1=3", does that mean that you actually believe "1+1=3"? So if instead you simulate a version of yourself that gets confused and believes that "1+1=3"... well, that's just a simulation. If there's a risk of that escalating into your actual model of reality, that would suggest something is very wrong somewhere in how you set up a simulation in the first place, right?

ie, *simulated* you is allowed to make all the usual inferences from, well, other stuff in the simulated world. It's just that *actual* you doesn't get to automatically equate simulated you's beliefs with actual you's beliefs.

So allow the simulated version to make all the usual inferences. I don't see why any restriction is needed other than the level separation, which doesn't need to treat this issue as a special case.

ie, simulated you in the counterfactual in which A6 was chosen believes that, well, A6 is what the algorithm in question would choose as the best choice. So? You calmly observe/model the actions simulated you takes *if* it believes that and so on without having to actually believe that yourself. Then, once all the counterfactual modelings are done and you apply your utility function to each of those to determine their actual expected utility, thus finding that A7 produces the highest EU, you actually do A7.
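A toy version of this level separation, offered only as a sketch: the `SimulatedWorld` record and the rule that each simulated self believes its own choice is best are assumptions for illustration, and utilities follow Drescher's "the value of Ai is i" example. The simulated belief is stored as data about the simulated world and is never copied into the outer agent's own conclusions.

```python
from dataclasses import dataclass


@dataclass
class SimulatedWorld:
    chosen: int                 # action fixed by the counterfactual premise
    simulated_belief_best: int  # what the simulated agent believes is best
    utility: int                # outcome as scored by the outer agent


def simulate(action: int) -> SimulatedWorld:
    """Model the world in which the algorithm outputs `action`."""
    # Inside the counterfactual, the simulated agent infers that its own
    # choice was the best one; the outer agent merely records that belief.
    return SimulatedWorld(chosen=action, simulated_belief_best=action, utility=action)


def decide(actions):
    """Score every counterfactual world, then pick by the outer agent's own utility."""
    worlds = [simulate(a) for a in actions]
    best = max(worlds, key=lambda w: w.utility)
    return best.chosen, worlds


if __name__ == "__main__":
    choice, worlds = decide(range(1, 8))
    print(choice, [w.simulated_belief_best for w in worlds])
```

This shows what the proposal would look like if the separation could be taken for granted; it does not settle which inferences are allowed to cross from the simulated level to the real one.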

It simply happens to be that most of the versions of you from the counterfactual models that arose in the process of doing the TDT computation had *false* beliefs about what the actual output of the computation actually is in actual reality.

Am I missing the point still, or...?

(wait... I'm understanding this issue to be something that you consider an unsolved issue in TDT and I'm saying "no, seems to me to be simple to make TDT do the right thing here. The Pearl style counterfactual stuff oughtn't cause any problem here, no special cases, no forbidden inferences need to be hard coded here", but now, looking at your comment, maybe you meant "This issue justifies TDT because TDT actually does the right thing here", in which case there was no need for me to say any of this at all. :))

The belief that A6 is highest-utility must come from somewhere. A strategy that includes A6 is not guaranteed to be real (game semantics: winning, ludics: without a daemon); that is, it's not guaranteed to hold without assuming facts for no reason. The action of A6 is exactly such an assumption that is given no reason to be actually found in the strategy, and the activity of the decision-making algorithm is exactly in proving (implementing) one of the actions to be actually carried out. Of course, the fact that A6 is highest-utility may also be considered counterfactually, but then you are just doing something not directly related to proving this particular choice.

Sorry, I'm not sure I follow what you're saying.

I meant when dealing with the logical uncertainty of not yet knowing the outcome of the calculation that your decision process consists of, and counterfactually modelling each of the outcomes it "could" output, then when modeling the results of your own actions/beliefs as a result of that, simply don't escalate that from a model of you to, well, actually you. The simulated you that conditions based on you (counterfactually) having decided A6 would presumably believe A6 has higher utility. So? You, who are also running the simulation for if you had chosen A7, etc etc, would compare and conclude that A7 has highest utility, even though simulated you believes (incorrectly) A6. Just keep separate levels, don't do use/mention style errors, and (near as I can tell) there wouldn't be a problem.

Or am I utterly missing the point here?

Remember the counterfactual zombie principle: you are only an implication; your decision or your knowledge only says what it would be if you exist, but you can't assume that you do exist.

When you counterfactual-consider A6, you consider how the world-with-A6 will be, but don't assume that it exists, and so can't infer that it's of highest utility. You are right that your copy in world-with-A6 would also choose A6, but that also doesn't have to be an action of maximum utility, since it's not guaranteed the situation will exist. For the action that you do choose, you may know that you've chosen it, but for the action you counterfactually-consider, you don't assume that you do choose it. (In causal networks, this seems to correspond to cutting off the action-node from yourself before setting it to a value.)

But then it's not clear how to literally simulate out the whole universe in perfect detail when the inside of your computer is casting gravitational influences outward based on transistors whose exact value you haven't yet computed (since you can't compute all of yourself in advance of computing yourself!).

Somewhat tangentially, this is a way to grok how the information-processing capabilities of markets are computationally intractable to simulate (or predict their outputs via experts).