## Comments

**gary_drescher** on An approach to the Agent Simulates Predictor problem · 2016-04-22T15:20:39.000Z · score: 0 (0 votes) · LW · GW

For the simulation-output variant of ASP, let's say the agent's possible actions/outputs consist of all possible simulations Si (up to some specified length), concatenated with "one box" or "two boxes". To prove that any given action has utility greater than zero, the agent must prove that the associated simulation of the predictor is correct. Where does your algorithm have an opportunity to commit to one-boxing before completing the simulation, if it's not yet aware that any of its available actions has nonzero utility? (Or would that commitment require a further modification to the algorithm?)

For the simulation-as-key variant of ASP, what principle would instruct a (modified) UDT algorithm to redact some of the inferences it has already derived?

**gary_drescher** on An approach to the Agent Simulates Predictor problem · 2016-04-20T16:36:09.000Z · score: 1 (1 votes) · LW · GW

Suppose we amend ASP to require the agent to output a full simulation of the predictor before saying "one box" or "two boxes" (or else the agent gets no payoff at all). Would that defeat UDT variants that depend on stopping the agent before it overthinks the problem?

(Or instead of requiring the agent to output the simulation, we could use the entire simulation, in some canonical form, as a cryptographic key to unlock an encrypted description of the problem itself. Prior to decrypting the description, the agent doesn't even know what the rules are; the agent is told in advance only that the decryption will reveal the rules.)

**gary_drescher** on Open Thread, April 27-May 4, 2014 · 2014-05-20T17:16:29.013Z · score: 3 (3 votes) · LW · GW

According to information his family graciously posted to his blog, the cause of death was occlusive coronary artery disease with cardiomegaly.

**gary_drescher** on Reflection in Probabilistic Logic · 2013-04-09T21:00:04.615Z · score: 1 (1 votes) · LW · GW

It occurs to me that my references above to "coherence" should be replaced by "coherence & P(T)=1 & reflective consistency". That is, there exists (if I understand correctly) a P that has all three properties, and that assigns the probabilities listed above. Therefore, those three properties would not suffice to characterize a suitable P for a UDT agent. (Not that anyone has claimed otherwise.)

**gary_drescher** on Reflection in Probabilistic Logic · 2013-03-26T20:17:57.345Z · score: 13 (13 votes) · LW · GW

Wow, this is great work--congratulations! If it pans out, it bridges a really fundamental gap.

I'm still digesting the idea, and perhaps I'm jumping the gun here, but I'm trying to envision a UDT (or TDT) agent using the sense of subjective probability you define. It seems to me that an agent can get into trouble even if its subjective probability meets the coherence criterion. If that's right, some additional criterion would have to be required. (Maybe that's what you already intend? Or maybe the following is just muddled.)

Let's try invoking a coherent P in the case of a simple decision problem for a UDT agent. First, define G <--> P("G") < 0.1. Then consider the 5&10 problem:

If the agent chooses A, payoff is 10 if ~G, 0 if G.

If the agent chooses B, payoff is 5.

And suppose the agent can prove the foregoing. Then unless I'm mistaken, there's a coherent P with the following assignments:

P(G) = 0.1

P(Agent()=A) = 0

P(Agent()=B) = 1

P(G | Agent()=B) = P(G) = 0.1

And P assigns 1 to each of the following:

P("Agent()=A") < epsilon

P("Agent()=B") > 1-epsilon

P("G & Agent()=B") / P("Agent()=B") = 0.1 +- epsilon

P("G & Agent()=A") / P("Agent()=A") > 0.5

The last inequality is consistent with the agent indeed choosing B, because the postulated conditional probability of G makes the expected payoff given A less than the payoff given B.

Is that P actually incoherent for reasons I'm overlooking? If not, then we'd need something beyond coherence to tell us which P a UDT agent should use, correct?
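To check the arithmetic behind that last step, here's a quick sketch (the value 0.6 for P(G | Agent()=A) is my own illustrative pick of a number above 0.5; the payoffs are the 5&10 values stated above):

```python
# Expected utilities under the postulated conditional probabilities.
# P(G | Agent()=A) can be any value > 0.5 per the last inequality above.
p_G_given_A = 0.6
p_G_given_B = 0.1

eu_A = 10 * (1 - p_G_given_A) + 0 * p_G_given_A   # payoff 10 if ~G, 0 if G
eu_B = 5                                          # payoff 5 regardless

# eu_A = 4.0 < eu_B = 5, so this P is consistent with the agent choosing B
```

So the postulated P does rationalize choosing B, which is why coherence alone doesn't seem to pin down a suitable P.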

(edit: formatting)

**gary_drescher** on The Cognitive Science of Rationality · 2011-09-11T20:34:25.984Z · score: 14 (14 votes) · LW · GW

If John's physician prescribed a burdensome treatment because of a test whose false-positive rate is 99.9999%, John needs a lawyer rather than a statistician. :)

**gary_drescher** on Example decision theory problem: "Agent simulates predictor" · 2011-05-27T13:17:23.044Z · score: 4 (4 votes) · LW · GW

In April 2010 Gary Drescher proposed the "Agent simulates predictor" problem, or ASP, that shows how agents with lots of computational power sometimes fare worse than agents with limited resources.

Just to give due credit: Wei Dai and others had already discussed Prisoner's Dilemma scenarios that exhibit a similar problem, which I then distilled into the ASP problem.

**gary_drescher** on Discussion for Eliezer Yudkowsky's paper: Timeless Decision Theory · 2011-01-12T14:30:39.872Z · score: 0 (0 votes) · LW · GW

and for an illuminating reason - the algorithm is only run with one set of information

That's not essential, though (see the dual-simulation variant in Good and Real).

**gary_drescher** on Discussion for Eliezer Yudkowsky's paper: Timeless Decision Theory · 2011-01-11T21:38:52.196Z · score: 4 (4 votes) · LW · GW

Just to clarify, I think your analysis here doesn't apply to the transparent-boxes version that I presented in Good and Real. There, the predictor's task is not necessarily to predict what the agent does for real, but rather to predict what the agent would do in the event that the agent sees $1M in the box. (That is, the predictor simulates what--according to physics--the agent's configuration would do, if presented with the $1M environment; or equivalently, what the agent's 'source code' returns if called with the $1M argument.)

If the agent would one-box if $1M is in the box, but the predictor leaves the box empty, then the predictor has not predicted correctly, even if the agent (correctly) two-boxes upon seeing the empty box.

**gary_drescher** on Another attempt to explain UDT · 2010-11-18T17:54:18.068Z · score: 1 (1 votes) · LW · GW

2) "Agent simulates predictor"

This basically says that the predictor is a rock, doesn't depend on agent's decision,

True, it doesn't "depend" on the agent's decision in the specific sense of "dependency" defined by currently-formulated UDT. The question (as with any proposed DT) is whether that's in fact the right sense of "dependency" (between action and utility) to use for making decisions. Maybe it is, but the fact that UDT itself says so is insufficient reason to agree.

[EDIT: fixed typo]

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-28T20:40:19.828Z · score: 1 (1 votes) · LW · GW

I assume (please correct me if I'm mistaken) that you're referring to the payout-value as the output of the world program. In that case, a P-style program and a P1-style program can certainly give different outputs for some hypothetical outputs of S (for the given inputs). However, both programs' payout-outputs will be the same for whatever turns out to be the *actual* output of S (for the given inputs).

P and P1 have the same causal structure. And they have the same output with regard to (whatever is) the *actual* output of S (for the given inputs). But P and P1 differ *counterfactually* as to what the payout-output *would be* if the output of S (for the given inputs) were different than whatever it actually is.

So I guess you could say that what's unspecified are the counterfactual consequences of a hypothetical decision, given the (fully specified) physical structure of the scenario. But figuring out the counterfactual consequences of a decision is the main thing that the decision theory itself is supposed to do for us; that's what the whole Newcomb/Prisoner controversy boils down to. So I think it's the solution that's underspecified here, not the problem itself. We need a theory that takes the physical structure of the scenario as input, and generates counterfactual consequences (of hypothetical decisions) as outputs.

PS: To make P and P1 fully comparable, drop the "E*1e9" terms in P, so that both programs model the conventional transparent-boxes problem without an extraneous pi-preference payout.

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-28T18:22:20.756Z · score: 1 (1 votes) · LW · GW

My concern is that there may be several world-programs that correspond faithfully to a given problem description, but that correspond to different analyses, yielding different decision prescriptions, as illustrated by the P1 example above. (Upon further consideration, I should probably modify P1 to include "S()=S1()" as an additional input to S and to Omega_Predict, duly reflecting that aspect of the problem description.)

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-28T16:10:30.172Z · score: 2 (2 votes) · LW · GW

That's very elegant! But the trick here, it seems to me, lies in the rules for setting up the world program in the first place.

First, the world-program's calling tree should match the structure of TDT's graph, or at least match the graph's (physically-)causal links. The physically-causal part of the structure tends to be uncontroversial, so (for present purposes) I'm ok with just stipulating the physical structure for a given problem.

But then there's the choice to use the same variable S in multiple places in the code. That corresponds to a choice (in TDT) to splice in a logical-dependency link from the Platonic decision-computation node to other Platonic nodes. In both theories, we need to be precise about the criteria for this dependency. Otherwise, the sense of dependency you're invoking might turn out to be wrong (it makes the theory prescribe incorrect decisions) or question-begging (it implicitly presupposes an answer to the key question that the theory itself is supposed to figure out for us, namely what things are or are not counterfactual consequences of the decision-computation).

So the question, in UDT1, is: under what circumstances do you represent two real-world computations as being tied together via the same variable in a world-program?

That's perhaps straightforward if S is implemented by literally the same physical state in multiple places. But as you acknowledge, you might instead have distinct Si's that diverge from one another for some inputs (though not for the actual input in this case). And the different instances need not have the same physical substrate, or even use the same algorithm, as long as they give the same answers when the relevant inputs are the same, for some mapping between the inputs and between the outputs of the two Si's. So there's quite a bit of latitude as to whether to construe two computations as "logically equivalent".

So, for example, for the conventional transparent-boxes problem, what principle tells us to formulate the world program as you proposed, rather than having:

```
def P1(i):
    const S1;
    E = (Pi(i) == 0)
    D = Omega_Predict(S1, i, "box contains $1M")
    if D ^ E:
        C = S(i, "box contains $1M")
        payout = 1001000 - C * 1000
    else:
        C = S(i, "box is empty")
        payout = 1000 - C * 1000
```

(along with a similar program P2 that uses constant S2, yielding a different output from Omega_Predict)?

This alternative formulation ends up telling us to two-box. In this formulation, if S and S1 (or S and S2) are in fact the same, they would (counterfactually) differ if a different answer (than the actual one) were output from S—which is precisely what a causalist asserts. (A similar issue arises when deciding what facts to model as “inputs” to S—thus forbidding S to “know” those facts for purposes of figuring out the counterfactual dependencies—and what facts to build instead into the structure of the world-program, or to just leave as implicit background knowledge.)

So my concern is that UDT1 may covertly beg the question by selecting, among the possible formulations of the world-program, a version that turns out to presuppose an answer to the very question that UDT1 is intended to figure out for us (namely, what counterfactually depends on the decision-computation). And although I agree that the formulation you've selected in this example is correct and the above alternative formulation isn't, I think it remains to explain why.
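To make the contrast vivid, here's a toy restatement of the two formulations (entirely my own construction, with the Pi(i) term dropped for simplicity): in `world_shared` the same S appears in both the prediction and the actual choice, while in `world_split` the prediction is a free constant that merely happens to match S's actual output.

```python
def world_shared(S):
    # same variable S appears in both the prediction and the actual choice
    prediction = S("box contains $1M")
    box_full = (prediction == "one-box")
    choice = S("box contains $1M" if box_full else "box is empty")
    return (1000000 if box_full else 0) + (1000 if choice == "two-box" else 0)

def world_split(S, S1_output):
    # the prediction is a free constant S1 that merely equals S's actual output
    box_full = (S1_output == "one-box")
    choice = S("box contains $1M" if box_full else "box is empty")
    return (1000000 if box_full else 0) + (1000 if choice == "two-box" else 0)

one_boxer = lambda obs: "one-box"
two_boxer = lambda obs: "two-box"

world_shared(one_boxer)             # 1000000: varying S varies the prediction too
world_shared(two_boxer)             # 1000
world_split(one_boxer, "one-box")   # 1000000
world_split(two_boxer, "one-box")   # 1001000: with S1 held fixed, two-boxing "wins"
```

Holding the constant fixed while counterfactually varying S is exactly the causalist analysis; which formulation is licensed is the very question at issue.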

(As with my comments about TDT, my remarks about UDT1 are under the blanket caveat that my grasp of the intended content of the theories is still tentative, so my criticisms may just reflect a misunderstanding on my part.)

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-07T12:33:27.233Z · score: 2 (2 votes) · LW · GW

Ok. I think it would be very helpful to sketch, all in one place, what TDT2 (i.e., the envisioned avenue-2 version of TDT) looks like, taking care to pin down any needed sense of "dependency". And similarly for TDT1, the avenue-1 version. (These suggestions may be premature, I realize.)

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-07T00:43:23.237Z · score: 2 (2 votes) · LW · GW

The link between the Platonic decision C and the physical decision D

No, D was the Platonic simulator. That's why the nature of the C->D dependency is crucial here.

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-07T00:02:10.040Z · score: 2 (2 votes) · LW · GW

No, but whenever we see a

physical fact F that depends on a decision C/D we're still in the process of making plus Something Else (E),

Wait, F depends on decision computation C in what sense of “depends on”? It can't quite be the originally defined sense (quoted from your email near the top of the OP), since that defines dependency between Platonic computations, not between a Platonic computation and a physical fact. Do you mean that D depends on C in the original sense, and F in turn depends on D (and on E) in a different sense?

then we express our uncertainty in the form of a

causal graph with directed arrows from C to D, D to F, and E to F.

Ok, but these arrows can't be used to define the relevant sense of dependency above, since the relevant sense of dependency is what tells us we need to draw the arrows that way, if I understand correctly.

Sorry to keep being pedantic about the meaning of “depends”; I know you're in thinking-out-loud mode here. But the theory gives wildly different answers depending (heh) on how that gets pinned down.

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-06T16:27:33.061Z · score: 3 (3 votes) · LW · GW

If we go down avenue (1), then we give primacy to our intuition that if-counterfactually you make a different decision, this logically controls the mathematical fact (D xor E) with E held constant, but does not logically control E with (D xor E) held constant. While this does sound intuitive in a sense, it isn't quite nailed down - after all, D is ultimately just as constant as E and (D xor E), and to change any of them makes the model equally inconsistent.

I agree this sounds intuitive. As I mentioned earlier, though, nailing this down is tantamount to circling back and solving the full-blown problem of (decision-supporting) counterfactual reasoning: the problem of how to distinguish which facts to “hold fixed”, and which to “let vary” for consistency with a counterfactual antecedent.

In any event, is the idea to try to build a separate graph for math facts, and use that to analyze “logical dependency” among the Platonic nodes in the original graph, in order to carry out TDT's modified “surgical alteration” of the original graph? Or would you try to build one big graph that encompasses physical and logical facts alike, and then use Pearl's decision procedure without further modification?

If we view the physical observation of $1m as telling us the raw mathematical fact (D xor E), and then perform mathematical inference on D, we'll find that we can affect E, which is not what we want.

Wait, isn't it decision-computation C—rather than simulation D—whose “effect” (in the sense of logical consequence) on E we're concerned about here? It's the logical dependents of C that get surgically altered in the graph when C gets surgically altered, right? (I know C and D are logically equivalent, but you're talking about inserting a physical node after D, not C, so I'm a bit confused.)

I'm having trouble following the gist of avenue (2) at the moment. Even with the node structure you suggest, we can still infer E from C and from the physical node that matches (D xor E)—unless the new rule prohibits relying on that physical node, which I guess is the idea. But what exactly is the prohibition? Are we forbidden to infer any mathematical fact from any physical indicator of that fact? Or is there something in particular about node (D xor E) that makes it forbidden? (It would be circular to cite the node's dependence on C in the very sense of "dependence" that the new rule is helping us to compute.)

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T19:17:00.464Z · score: 2 (2 votes) · LW · GW

I already saw the $1M, so, by two-boxing, aren't I just choosing to be one of those who see their E module output True?

Not if a counterfactual consequence of two-boxing is that the large box (probably) would be empty (even though in fact it is not empty, as you can already see).

That's the same question that comes up in the original transparent-boxes problem, of course. We probably shouldn't try to recap that whole debate in the middle of this thread. :)

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T18:24:51.193Z · score: 5 (5 votes) · LW · GW

2) Treat differently mathematical knowledge that we learn by genuinely mathematical reasoning and by physical observation. In this case we know (D xor E) not by mathematical reasoning, but by physically observing a box whose state we believe to be correlated with D xor E. This may justify constructing a causal DAG with a node descending from D and E, so a counterfactual setting of D won't affect the setting of E.

Perhaps I'm misunderstanding you here, but D and E are Platonic computations. What does it mean to construct a causal DAG among Platonic computations? [EDIT: Ok, I may understand that a little better now; see my edit to my reply to (1).] Such a graph links together general mathematical facts, so the same issues arise as in (1), it seems to me: Do the links correspond to logical inference, or something else? What makes the graph acyclic? Is mathematical causality even coherent? And if you did have a module that can detect (presumably timeless) causal links among Platonic computations, then why not use that module directly to solve your decision problems?

Plus I'm not convinced that there's a meaningful distinction between math knowledge that you gain by genuine math reasoning, and math knowledge that you gain by physical observation.

Let's say, for instance, that I feed a particular conjecture to an automatic theorem prover, which tells me it's true. Have I then learned that math fact by genuine mathematical reasoning (performed by the physical computer's Platonic abstraction)? Or have I learned it by physical observation (of the physical computer's output), and am I hence barred from using that math fact for purposes of TDT's logical-dependency-detection? Presumably the former, right? (Or else TDT will make even worse errors.)

But then suppose the predictor has simulated the universe sufficiently to establish that U (the universe's algorithm, including physics and initial conditions) leads to there being $1M in the box in this situation. That's a mathematical fact about U, obtained by (the simulator's) mathematical reasoning. Let's suppose that when the predictor briefs me, the briefing includes mention of this mathematical fact. So even if I keep my eyes closed and never physically see the $1M, I can rely instead on the corresponding mathematically derived fact.

(Or more straightforwardly, we can view the universe itself as a computer that's performing mathematical reasoning about how U unfolds, in which case any physical observation is intrinsically obtained by mathematical reasoning.)

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T17:03:45.899Z · score: 5 (5 votes) · LW · GW

1) Construct a full-blown DAG of math and Platonic facts, an account of which mathematical facts make other mathematical facts true, so that we can compute mathematical counterfactuals.

“Makes true” means logically implies? Why would that graph be acyclic? [EDIT: Wait, maybe I see what you mean. If you take a pdf of your beliefs about various mathematical facts, and run Pearl's algorithm, you should be able to construct an acyclic graph.]

Although I know of no worked-out theory that I find convincing, I believe that counterfactual inference (of the sort that's appropriate to use in the decision computation) makes sense with regard to events in universes characterized by certain kinds of physical laws. But when you speak of mathematical counterfactuals more generally, it's not clear to me that that's even coherent.

Plus, if you did have a general math-counterfactual-solving module, why would you relegate it to the logical-dependency-finding subproblem in TDT, and then return to the original factored causal graph? Instead, why not cast the whole problem as a mathematical abstraction, and then directly ask your math-counterfactual-solving module whether, say, (Platonic) C's one-boxing counterfactually entails (Platonic) $1M? (Then do the argmax over the respective math-counterfactual consequences of C's candidate outputs.)

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T15:23:45.291Z · score: 1 (1 votes) · LW · GW

Have some Omega thought experiments been one shot, never to be repeated type deals or is my memory incorrect?

Yes, and that's the intent in this example as well. Still, it can be useful to look at the expected distribution of outcomes over a large enough number of trials that have the same structure, in order to infer the (counterfactual) probabilities that apply to a single trial.

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T15:11:51.202Z · score: 1 (1 votes) · LW · GW

The backward link isn't causal. It's a logical/Platonic-dependency link, which is indeed how TDT handles counterfactuals (i.e., how it handles the propagation of "surgical alterations" to the decision node C).

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T15:07:55.941Z · score: 2 (2 votes) · LW · GW

(I refrained from doing this for the problem described in Gary's post, since it doesn't mention UDT at all, and therefore I'm assuming you want to find a TDT-only solution.)

Yes, I was focusing on a specific difficulty in TDT, but I certainly have no objection to bringing UDT into the thread too. (I myself haven't yet gotten around to giving UDT the attention I think it deserves.)

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T14:19:18.161Z · score: 0 (0 votes) · LW · GW

By "unsolvable" I mean that you're screwed over in final outcomes, not that TDT fails to have an output.

Oh ok. So it's unsolvable in the same sense that "Choose red or green. Then I'll shoot you." is unsolvable. Sometimes choice really *is* futile. :) [EDIT: Oops, I probably misunderstood what you're referring to by "screwed over".]

The interesting part of the problem is that, whatever you decide, you deduce facts about the background such that you know that what you are doing is the wrong thing.

Yes, assuming that you're the sort of algorithm that can (without inconsistency) know its own choice here before the choice is executed.

If you're the sort of algorithm that may revise its intended action in response to the updated deduction, and if you have enough time left to perform the updated deduction, then the (previously) intended action may not be reliable evidence of what you will actually do, so it fails to provide sound reason for the update in the first place.

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T13:46:43.940Z · score: 1 (1 votes) · LW · GW

When:

D(M) = true, D(!M) = true, E = true

Omega fails.

No, but it seems that way because I neglected in my OP to supply some key details of the transparent-boxes scenario. See my new edit at the end of the OP.

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T13:31:25.888Z · score: 0 (0 votes) · LW · GW

In the setup in question, D goes into an infinite loop (since in the general case it must call a copy of C, but because the box is transparent, C takes as input the output of D).

No, because by stipulation here, D *only* simulates the hypothetical case in which the box contains $1M, which does *not* necessarily correspond to the output of D (see my earlier reply to JGWeissman:

http://lesswrong.com/lw/1qo/a_problem_with_timeless_decision_theory_tdt/1kpk).

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T02:12:29.541Z · score: 5 (4 votes) · LW · GW

I think this problem is based (at least in part) on an incoherence in the basic transparent box variant of Newcomb's problem.

If the subject of the problem will two-box if he sees the big box has the million dollars, but will one-box if he sees the big box is empty. Then there is no action Omega could take to satisfy the conditions of the problem.

The rules of the transparent-boxes problem (as specified in *Good and Real*) are: the predictor conducts a simulation that tentatively presumes there will be $1M in the large box, and then puts $1M in the box (for real) iff the simulation showed one-boxing. So the subject you describe gets an empty box and one-boxes, but that doesn't violate the conditions of the problem, which do not require the empty box to be predictive of the subject's choice.
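A minimal sketch of that rule, to make the no-contradiction point concrete (the agent interface and the $1M/$1K payoff amounts are my assumptions, following the usual setup):

```python
ONE_BOX, TWO_BOX = "one-box", "two-box"

def run_trial(agent):
    # The predictor tentatively presumes $1M is in the box...
    simulated_choice = agent(sees_million=True)
    # ...and fills the box for real iff that simulation one-boxed.
    box_full = (simulated_choice == ONE_BOX)
    # Only then does the agent choose, seeing the box's actual contents.
    actual_choice = agent(sees_million=box_full)
    big = 1000000 if box_full else 0
    small = 1000 if actual_choice == TWO_BOX else 0
    return box_full, actual_choice, big + small

# The subject described above: two-boxes on seeing $1M, one-boxes on seeing none.
defiant = lambda sees_million: TWO_BOX if sees_million else ONE_BOX
box_full, choice, payout = run_trial(defiant)
# The box ends up empty and the subject one-boxes; the predictor's rule is
# satisfied, since the empty box makes no claim about the subject's choice.
```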

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T01:53:20.649Z · score: 3 (3 votes) · LW · GW

For now, let me just reply to your incidental concluding point, because that's brief.

I disagree that the red/green problem is unsolvable. I'd say the solution is that, with respect to the available information, both choices have equal (low) utility, so it's simply a toss-up. A correct decision algorithm will just flip a coin or whatever.

Having done so, will a correct decision algorithm try to revise its choice in light of its (tentative) new knowledge of what its choice is? Only if it has nothing more productive to do with its remaining time.

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T01:19:07.044Z · score: 2 (2 votes) · LW · GW

Actually, you're in a different camp than Laura: she agrees that it's incorrect to two-box regardless of any preference you have about the specified digit of pi. :)

The easiest way to see why two-boxing is wrong is to imagine a large number of trials, with a different chooser, and a different value of i, for each trial. Suppose each chooser strongly prefers that their trial's particular digit of pi be zero. The proportion of two-boxer simulations that end up with the digit equal to zero is no different than the proportion of one-boxer simulations that end up with the digit equal to zero (both are approximately .1). But the proportion of the one-boxer simulations that end up with an actual $1M is much higher (.9) than the proportion of two-boxer simulations that end up with an actual $1M (.1).
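The many-trials argument can be checked with a quick Monte Carlo sketch (my own construction; it assumes each trial's digit of pi is zero about 10% of the time and that the box contains $1M iff D xor E, per this thread's variant):

```python
import random

def trial(one_boxer, rng):
    E = rng.random() < 0.1   # this trial's digit of pi happens to be zero
    D = one_boxer            # predictor's simulation of the $1M case
    box_full = (D != E)      # box contains $1M iff (D xor E)
    return E, box_full

rng = random.Random(0)
n = 100000
one_E = one_full = two_E = two_full = 0
for _ in range(n):
    E, full = trial(True, rng);  one_E += E; one_full += full
    E, full = trial(False, rng); two_E += E; two_full += full

# Digit-is-zero rate is ~0.1 for both populations, but the one-boxers
# get the $1M about 90% of the time versus about 10% for the two-boxers.
```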

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T00:48:07.176Z · score: 2 (2 votes) · LW · GW

Everything you just said is true.*

Everything you just said is also consistent with everything I said in my original post.

*Except for one typo: you wrote (D or E) instead of (D xor E).

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-05T00:26:19.857Z · score: 1 (1 votes) · LW · GW

If D=false and E=true and there's $1M in the box and I two-box, then (in the particular Newcomb's variant described above) the predictor is not wrong. The predictor correctly computed that (D xor E) is true, and set up the box accordingly, as the rules of this particular variant prescribe.

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-04T23:51:03.945Z · score: 5 (5 votes) · LW · GW

Sorry, the above post omits some background information. If E "depends on" C in the particular sense defined, then the TDT algorithm mandates that when you "surgically alter" the output of C in the factored causal graph, you must correspondingly surgically alter the output of E in the graph.

So it's not at all a matter of any intuitive connotation of "depends on". Rather, "depends on", in this context, is purely a technical term that designates a particular test that the TDT algorithm performs. And the algorithm's prescribed use of that test culminates in the algorithm making the wrong decision in the case described above (namely, it tells me to two-box when I should one-box).

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-04T20:44:50.627Z · score: 1 (1 votes) · LW · GW

Better now?

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-04T20:22:51.200Z · score: 0 (0 votes) · LW · GW

Hm, sorry, it's displaying for me in the same size as the rest of the site, so I'm not sure what you're seeing. I'll strip the formatting and see if that helps.

**gary_drescher** on A problem with Timeless Decision Theory (TDT) · 2010-02-04T19:56:58.532Z · score: 2 (2 votes) · LW · GW

Done.

**gary_drescher** on Timeless Decision Theory and Meta-Circular Decision Theory · 2009-08-26T21:01:09.092Z · score: 5 (5 votes) · LW · GW

[In TDT] If you desire to smoke cigarettes, this would be observed and screened off by conditioning on the fixed initial conditions of the computation - the fact that the utility function had a positive term for smoking cigarettes, would already tell you that you had the gene. (Eells's "tickle".) If you can't observe your own utility function then you are actually taking a step outside the timeless decision theory as formulated.

Consider a different scenario where people with and without the gene both desire to smoke, but the gene makes that desire stronger, and the stronger it is, the more likely one is to smoke. Even when you observe your own utility function, you don't necessarily have a clue whether the utility assigned to smoking is the level caused by the gene or else by the gene's absence. So your observation of your utility function doesn't necessarily help you to move away from the base-level probability of having cancer here.
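To make that concrete, here's a small numeric sketch (all the distributions and numbers are my own illustrative assumptions): both populations desire to smoke; the gene only shifts the strength of the desire, so observing a middling desire level leaves you near the base rate.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma=1.0):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

p_gene = 0.5                    # base rate of having the gene (assumed)
mu_gene, mu_no_gene = 2.0, 1.0  # gene shifts mean desire strength upward

def posterior_gene(observed_desire):
    # Bayes: P(gene | observed strength of the desire to smoke)
    num = normal_pdf(observed_desire, mu_gene) * p_gene
    den = num + normal_pdf(observed_desire, mu_no_gene) * (1 - p_gene)
    return num / den

# Observing a desire strength of 1.5 (midway between the two means) returns
# exactly the 0.5 prior: the observation doesn't tell you whether you have
# the gene, so the "tickle" doesn't fully screen it off.
posterior_gene(1.5)
```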

**gary_drescher** on Timeless Decision Theory and Meta-Circular Decision Theory · 2009-08-26T20:26:12.665Z · score: 13 (13 votes) · LW · GW

Thanks, Eliezer--that's a clear explanation of an elegant theory. So far, TDT (I haven't looked carefully at UDT) strikes me as more promising than any other decision theory I'm aware of (including my own efforts, past and pending). Congratulations are in order!

I agree, of course, that TDT doesn't make the A6/A7 mistake. That was just a simple illustration of the need, in counterfactual reasoning (broadly construed), to specify somehow what to hold fixed and what not to, and that different ways of doing so specify different senses of counterfactual inference (i.e., that there are different kinds of 'if-counterfactually'). If counterfactual inference is construed a la Pearl, for example, then such inferences (causal-counterfactual) correspond to causal links (if-causally).

As you say, TDT's utility formula doesn't perform general logical inferences (or evidential-counterfactual inferences) from the antecedents it evaluates (i.e. the candidate outputs of the Platonic computation). Rather, the utility formula performs causal-counterfactual inferences from the set of nodes that designate the outputs of the Platonic computation, in all places where that Platonic computation is approximately physically instantiated.

However, it seems to me we can, if we wish, use TDT to define what we can call a TDT-counterfactual that tells us what would be true 'if-timelessly' a particular physical agent's particular physical action were to occur. In particular, whereas CDT says that what would be true (if-causally) consists of what's causally downstream from that action, TDT says that what would be true (if-timelessly) consists of what's causally downstream from the output of the suitably-specified Platonic computation that the particular physical agent approximately implements, and also what's causally downstream from that same Platonic computation in all other places where that computation is approximately physically instantiated. (And the physical TDT agent argmaxes over the utilities of the TDT-counterfactual consequences of that agent's candidate actions.)

I think there are a few reasons we might sometimes find it useful to think in terms of the TDT-counterfactual consequences of a physical agent's actions, rather than directly in terms of the standard TDT formulation (even though they're merely two different ways of expressing the same decision theory, unless I've misunderstood).

The TDT-counterfactual perspective places TDT in a common framework with other decision theories that (implicitly or explicitly) use other kinds of counterfactual reasoning, starting with a physical agent's action as the antecedent. Then we can apply some meta-criterion to ask which of those alternative theories is correct, and why. (That was the intuition behind my MCDT proposal, although MCDT itself was hastily specified and too simpleminded to be correct.)

Plausibly, people are agents who think in terms of the counterfactual consequences of an action, rather than being hardwired to use TDT. If we are to choose to act in accordance with TDT from now on (or, equivalently, if we are to build AIs who act in accordance with TDT), we need to be persuaded that doing so is for the best (even if e.g. a Newcomb snapshot was already taken before we became persuaded). (I'm assuming here that our extant choice machinery allows us the flexibility to be persuaded about what sort of counterfactual to use; if not, alas, we can't necessarily get there from here).

In the standard formulation of TDT, you effectively view yourself as an abstract computation with one or more approximate physical instantiations, and you ask what you (thus construed) cause (i.e. what follows causal-counterfactually). In the alternative formulation, I view myself as a particular physical agent that is among one or more approximate instantiations of an abstract computation, and I ask what follows TDT-counterfactually from what I (thus construed) choose.

The original formulation seems to require a precommitment to identify oneself with all instantiations (in the causal net) of the abstract computation (or at least seems to require that in order for us non-TDT agents to decide to emulate TDT). And that identification is indeed plausible in the case of fairly exact replication. But consider, say, a 1-shot PD game between Eliezer and me. Our mutual understanding of reflexive consistency would let us win. And I agree that we both approximately instantiate, at some level of abstraction, a common decision computation, which is what lets the TDT framework apply and lets us both win.

But (in contrast with an exact-simulation case) that common computation is at a level of abstraction that does not preserve our respective personal identities. (That's kind of the point of the abstraction. My utility function for the game places value on Gary's points and not Eliezer's points; the common abstract computation lacks that bias.) So I would hesitate to identify either of us with the common abstraction. (And I see in other comments that Eliezer explicitly agrees.) Rather, I'd like to reason that if-timelessly I, Gary, choose 'Cooperate', then so does Eliezer. That way, "I am you as you are me" emerges as a (metaphorical) conclusion about the situation (we each have a choice about the other's action in the game, and are effectively acting together) rather than being needed as the point of departure.

Again, the foregoing is just an alternative but equivalent (unless I've erred) way of viewing TDT, an alternative that may be useful for some purposes.

**gary_drescher**on Towards a New Decision Theory · 2009-08-21T14:16:29.519Z · score: 1 (1 votes) · LW · GW

> If you could spend a day with any living person

I think you'd find me anticlimactic. :) But I do appreciate the kind words.

**gary_drescher**on Ingredients of Timeless Decision Theory · 2009-08-20T21:23:57.653Z · score: 0 (0 votes) · LW · GW

I agree that "choose" connotes multiple alternatives, but they're counterfactual antecedents, and when construed as such, are not inconsistent with determinism.

I don't know about being *ontologically* basic, but (what I think of as) physical/causal laws have the important property that they compactly specify the entirety of space-time (together with a specification of the initial conditions).

**gary_drescher**on Ingredients of Timeless Decision Theory · 2009-08-20T16:25:54.790Z · score: 2 (2 votes) · LW · GW

Just as a matter of terminology, I prefer to say that we can *choose* (or that we *have a choice about*) the output, rather than that we *control* it. To me, *control* has too strong a connotation of *cause*.

It's tricky, of course, because the concepts of choice-about and causal-influence-over are so thoroughly conflated that most people will use the same word to refer to both without distinction. So my terminology suggestion is kind of like most materialists' choice to relinquish the word *soul* to refer to something extraphysical, retaining *consciousness* to refer to the actual physical/computational process. (Causes, unlike souls, are real, but still distinct from what they're often conflated with.)

Again, this is just terminology, nothing substantive.

EDIT: In the (usual) special case where a means-end link is causal, I agree with you that we control something that's ultimately mathematical, even in my proposed sense of the term.

**gary_drescher**on Ingredients of Timeless Decision Theory · 2009-08-20T14:51:54.578Z · score: 1 (1 votes) · LW · GW

To clarify: the agent in MCDT is a particular physical instantiation, rather than being timeless/Platonic (well, except insofar as physics itself is Platonic).

**gary_drescher**on Ingredients of Timeless Decision Theory · 2009-08-20T13:44:16.969Z · score: 23 (24 votes) · LW · GW

This is very cool, and I haven't digested it yet, but I wonder if it might be open to the criticism that you're effectively postulating the favored answer to Newcomb's Problem (and other such scenarios) by postulating that when you surgically alter one of the nodes, you correspondingly alter the nodes for the other instances of the computation. After all, the crux of the counterfactual-reasoning dilemma in Newcomb's Problem (and similarly in the Prisoner's Dilemma) is to justify the inference "If I choose both boxes, then (probably) so does the simulation (even if in fact I/it do not)" rather than "If I choose both boxes, then the simulation doesn't necessarily match my choice (even though in fact it does)". It could be objected that your formalism postulates the desired answer rather than giving a basis for deriving it--an objection that becomes more important when we move away from identical or functionally equivalent source code and start to consider approximate similarities. (See my criticism of Leslie (1991)'s proposal that you should make your choice as though you were also choosing on behalf of other agents of similar causal structure. If I'm not mistaken, your proposal seems to be a formalization of that idea.)

Here's an alternative proposal.

Metacircular Decision Theory (MCDT)

For purposes of this discussion, let me just stipulate that subjective probabilities will be modeled as though they were quantum under MWI--that is, we'll regard the entire distribution as part of the universe. That move will help with dual-simulation/counterfactual-mugging scenarios; but also, as I argued in Good and Real, we effectively make that move whenever we assign value to probabilistic outcomes even in non-esoteric situations (so we may as well avail ourselves of that move in the weird scenarios too, though eventually we need to justify the move).

Say we have an agent embodied in the universe. The agent knows some facts about the universe (including itself), has an inference system of some sort for expanding on those facts, and has a preference scheme that assigns a value to the set of facts, and is wired to select an action--specifically, the/an action that implies (using its inference system) the/a most-preferred set of facts.

But without further constraint, this process often leads to a contradiction. Suppose the agent's repertoire of actions is A1, ..., An, and the value of action Ai is simply i. Say the agent starts by considering the action A7, and dutifully evaluates it as 7. Next, it contemplates the action A6, and reasons as follows: "Suppose I choose A6. I know I'm a utility-maximizing agent, and I already know there's another choice that has value 7. Therefore, it follows from my (hypothetical) choice of A6 that A6 has a value of at least 7." But that inference, while sound, contradicts the fact that A6's value is 6.
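The A6 contradiction can be sketched in code. This is a toy encoding of my own (the fact representation and function names are made up for illustration), not anything from TDT or MCDT proper:

```python
# Toy setup: actions A1..A7 with val(Ai) = i, plus the fact
# "the agent maximizes utility".
values = {f"A{i}": i for i in range(1, 8)}
facts = {f"val({a})={v}" for a, v in values.items()} | {"maximizer"}

def sound_inferences(hypothetical_action, facts):
    """Inferences licensed by the full fact set about a hypothetical choice."""
    inferred = set()
    if "maximizer" in facts:
        best_known = max(values.values())  # 7, from val(A7)=7
        # "If I choose this action, then, since I maximize and some action
        # is worth 7, this action must be worth at least 7."
        inferred.add((hypothetical_action, ">=", best_known))
    return inferred

derived = sound_inferences("A6", facts)
# The inference val(A6) >= 7 contradicts the retained fact val(A6) = 6:
contradicts = ("A6", ">=", 7) in derived and values["A6"] < 7
print(contradicts)  # True
```

The contradiction arises precisely because the agent reasons from both its maximizer-hood and the fact val(A6)=6 at once; omitting either one dissolves it.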

Unsurprisingly, a false premise leads to a contradiction. To avoid contradiction, we need to limit the set of facts that the agent is allowed to reason from when making inferences about a hypothetical action. But which facts do we omit? Different choices yield different preferred actions. If we omit the fact that val(A6)=6, then we can infer val(A6)>=7; if instead we omit the fact that the agent utility-maximizes, then we can infer val(A6)=6 without contradiction (or at least without the particular contradiction above).

So this is the usual full-blown problem of counterfactual inference: which things do we "hold fixed" when contemplating a counterfactual antecedent, and which do we "let vary" for consistency with that antecedent? Different choices here correspond to different decision theories. If the agent allows inferences (only) from all facts about physical law as applied to the future, and all facts about the past and present universe-state, except for facts about the agent's internal decision-making state, then we get CDT. If we leave the criteria unspecified/ambiguous, we get EDT. If we allow the agent to reason from facts about the future as well as the past and present, we get FDT (Fatalist Decision Theory: choice is futile, which most people think follows from determinism).

MCDT's proposed criterion is this: the agent makes a meta-choice about which facts to omit when making inferences about the hypothetical actions, and selects the set of facts which lead to the best outcome if the agent then evaluates the original candidate actions with respect to that choice of facts. The agent then iterates that meta-evaluation as needed (probably not very far) until a fixed point is reached, i.e. the same choice (as to which facts to omit) leaves the first-order choice unchanged. (It's ok if that's intractable or uncomputable; the agent can muddle through with some approximate algorithm.)

EDIT1: The algorithm also needs to check, when it evaluates a meta-level choice candidate, that the winning choice at the next level down is consistent with all known facts. If not, the meta-level candidate is eliminated from consideration. (Otherwise, the A6 choice could remain stable in the example above.)

EDIT2: Or rather, that consistency check can probably *substitute for* the additional meta-iterations.
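Here is a minimal sketch of the meta-choice loop on the A1..A7 toy problem, under assumptions of my own (the omission candidates, the inference rule, and the consistency check are hand-coded stand-ins for the general machinery described above):

```python
# Toy problem: actions A1..A7 with true values val(Ai) = i.
values = {f"A{i}": i for i in range(1, 8)}
MAXIMIZER = "maximizer"

def infer_value(action, omitted):
    """Value inferred for hypothetically choosing `action`, reasoning only
    from the facts not in `omitted`. Returns None on contradiction."""
    face = None if f"val({action})" in omitted else values[action]
    if MAXIMIZER not in omitted:
        # "I maximize, and some action is worth 7, so my choice is worth >= 7."
        if face is not None and face < 7:
            return None  # contradicts the retained fact val(action) = face
        return 7 if face is None else face
    return face

def first_order_choice(omitted):
    """Best action by inferred value, skipping contradictory ones."""
    scored = {a: infer_value(a, omitted) for a in values}
    viable = {a: v for a, v in scored.items() if v is not None}
    return max(viable, key=viable.get) if viable else None

def consistent(action):
    """EDIT1's check: the winning choice must square with ALL known facts;
    here, a maximizer's choice must actually have maximal true value."""
    return action is not None and values[action] == max(values.values())

# Meta-choice: among candidate omission sets, keep those whose induced
# first-order choice survives the consistency check; pick the best outcome.
candidates = [frozenset({MAXIMIZER}), frozenset({"val(A6)"})]
admissible = {om: first_order_choice(om)
              for om in candidates if consistent(first_order_choice(om))}
best = max(admissible, key=lambda om: values[admissible[om]])
print(admissible[best])  # A7
```

Omitting val(A6) lets the agent infer val(A6) >= 7 and pick A6, but that winner fails the consistency check (A6's true value isn't maximal), so that omission set is eliminated; omitting the maximizer fact survives and yields A7.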

So e.g. in Newcomb's Problem or the Prisoner's Dilemma, the agent can calculate that it does better if it retains the fact that its dispositional-state/source-code is functionally equivalent to the simulation's/other's (while omitting facts about which particular choice is made by both) than if it makes the CDT choice, omitting the fact about equivalence but keeping the facts (or some probability distribution) about the simulation's/other's choice.

In other words, metacircular consistency isn't just a *test* that we'd like the decision theory to pass. Metacircular consistency *is* the theory; it *is* the algorithm.

**gary_drescher**on Ingredients of Timeless Decision Theory · 2009-08-19T23:34:21.827Z · score: 1 (1 votes) · LW · GW

I didn't really get the purpose of the paper's analysis of "rationality talk". Ultimately, as I understood the paper, it was making a prescriptive argument about how people (as actually implemented) should behave in the scenarios presented (i.e., the "rational" way for them to behave).

**gary_drescher**on Ingredients of Timeless Decision Theory · 2009-08-19T23:08:11.174Z · score: 4 (4 votes) · LW · GW

Exactly. Unless "cultivating a disposition" amounts to a (subsequent-choice-circumventing) precommitment, you still need a reason, when you make that subsequent choice, to act in accordance with the cultivated disposition. And there's no good explanation for why that reason should care about whether or not you previously cultivated a disposition.

**gary_drescher**on Ingredients of Timeless Decision Theory · 2009-08-19T19:58:22.259Z · score: 3 (3 votes) · LW · GW

I don't think DBDT gives the right answer if the predictor's snapshot of the local universe-state was taken before the agent was born (or before humans evolved, or whatever), because the "critical point", as Fisher defines it, occurs too late. But a one-box chooser can still expect a better outcome.

**gary_drescher**on Towards a New Decision Theory · 2009-08-19T11:29:36.556Z · score: 12 (12 votes) · LW · GW

Just to elaborate a bit, Nesov's scenario and mine share the following features:

- In both cases, we argue that an agent should forfeit a smaller sum for the sake of a larger reward that would have been obtained (counterfactually, contingent on that forfeiture) if a random event had turned out differently than in fact it did (and than the agent knows it did).

- We both argue for using the original coin-flip probability distribution (i.e., not-updating, if I've understood that idea correctly) for purposes of this decision, and indeed in general, even in mundane scenarios.

- We both note that the forfeiture decision is easier to justify if the coin-toss was quantum under MWI, because then the original probability distribution corresponds to a real physical distribution of amplitude in configuration-space.

Nesov's scenario improves on mine in several ways. He eliminates some unnecessary complications (he uses one simulation instead of two, and just tells the agent what the coin-toss was, whereas my scenario requires the agent to deduce that). So he makes the point more clearly, succinctly and dramatically. Even more importantly, his analysis (along with Yudkowsky, Dai, and others here) is more formal than my ad hoc argument (if you've looked at Good and Real, you can tell that formalism is not my forte :)).

I too have been striving for a more formal foundation, but it's been elusive. So I'm quite pleased and encouraged to find a community here that's making good progress focusing on a similar set of problems from a compatible vantage point.

**gary_drescher**on Towards a New Decision Theory · 2009-08-17T02:56:30.181Z · score: 10 (10 votes) · LW · GW

My book discusses a similar scenario: the dual-simulation version of Newcomb's Problem (section 6.3), in the case where the large box is empty (no $1M) and (I argue) it's still rational to forfeit the $1K. Nesov's version nicely streamlines the scenario.