# Problematic Problems for TDT

post by drnickbone · 2012-05-29T15:41:37.964Z · score: 36 (48 votes) · LW · GW · Legacy · 301 comments

A key goal of Less Wrong's "advanced" decision theories (like TDT, UDT and ADT) is that they should out-perform standard decision theories (such as CDT) in contexts where another agent has access to the decider's code, or can otherwise predict the decider's behaviour. In particular, agents who run these theories will one-box on Newcomb's problem, and so generally make more money than agents which two-box. Slightly surprisingly, they may well continue to one-box even if the boxes are transparent, and even if the predictor Omega makes occasional errors (a problem due to Gary Drescher, which Eliezer has described as equivalent to "counterfactual mugging"). More generally, these agents behave the way a CDT agent would wish it had pre-committed itself to behaving before being faced with the problem.

However, I've recently thought of a class of Omega problems where TDT (and related theories) appears to under-perform compared to CDT. Importantly, these are problems which are "fair" - at least as fair as the original Newcomb problem - because the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems. This contrasts with clearly "unfair" problems like the following:

**Discrimination**: Omega presents the usual two boxes. Box A always contains $1000. Box B contains nothing if Omega detects that the agent is running TDT; otherwise it contains $1 million.

So what are some *fair* "problematic problems"?

**Problem 1**: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."

**Analysis**: Any agent who is themselves running TDT will reason as in the standard Newcomb problem. They'll prove that their decision is linked to the simulated agent's, so that if they two-box they'll only win $1000, whereas if they one-box they will win $1 million. So the agent will choose to one-box and win $1 million.

However, any CDT agent can just take both boxes and win $1001000. In fact, any other agent who is *not* running TDT (e.g. an EDT agent) will be able to re-construct the chain of logic and reason that the simulation one-boxed and so box B contains the $1 million. So any other agent can safely two-box as well.

Note that we can modify the contents of Box A so that it contains anything up to $1 million; the CDT agent (or EDT agent) can in principle win up to twice as much as the TDT agent.
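The payoffs in Problem 1 can be sketched in a few lines. This is only an illustration under the analysis above: it assumes the simulated TDT agent one-boxes, and the function names (`fill_box_b`, `payoff`, `tdt_linked_payoff`) are hypothetical, not part of any TDT formalism.

```python
BOX_A = 1_000
PRIZE = 1_000_000

def fill_box_b(sim_choice: str) -> int:
    """Omega's rule: Box B is full only if the simulated TDT agent one-boxed."""
    return PRIZE if sim_choice == "one-box" else 0

def payoff(choice: str, box_b: int) -> int:
    """Winnings for taking only Box B ("one-box") or both boxes ("two-box")."""
    return box_b if choice == "one-box" else BOX_A + box_b

def tdt_linked_payoff(choice: str) -> int:
    """A TDT agent treats its own choice as linked to the simulation's choice."""
    return payoff(choice, fill_box_b(choice))

# Assumption from the analysis: the simulated TDT agent one-boxes.
box_b = fill_box_b("one-box")

tdt_payoff = payoff("one-box", box_b)      # TDT agent, linked to the simulation
non_tdt_payoff = payoff("two-box", box_b)  # CDT or EDT agent, not linked

print(tdt_linked_payoff("one-box"), tdt_linked_payoff("two-box"))
print(tdt_payoff, non_tdt_payoff)
```

From the TDT agent's linked point of view, one-boxing beats two-boxing ($1 million against $1000), yet the unlinked agent facing the same filled boxes walks away with $1000 more.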

**Problem 2**: Our ever-reliable Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "Exactly one of these boxes contains $1 million; the others contain nothing. You must take exactly one box to win the money; if you try to take more than one, then you won't be allowed to keep any winnings. Before you entered the room, I ran multiple simulations of this problem as presented to an agent running TDT, and determined the box which the agent was least likely to take. If there were several such boxes tied for equal-lowest probability, then I just selected one of them, the one labelled with the smallest number. I then placed $1 million in the selected box. Please choose your box."

**Analysis**: A TDT agent will reason that whatever it does, it cannot have more than 10% chance of winning the $1 million. In fact, the TDT agent's best reply is to pick each box with equal probability; after Omega calculates this, it will place the $1 million under box number 1 and the TDT agent has exactly 10% chance of winning it.

But any non-TDT agent (e.g. CDT or EDT) can reason this through as well, and just pick box number 1, so winning $1 million. By increasing the number of boxes, we can ensure that TDT has arbitrarily low chance of winning, compared to CDT which always wins.
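Omega's selection rule in Problem 2 can be sketched as follows. This is an idealised illustration: it assumes Omega knows the TDT agent's exact choice probabilities (rather than estimating them from repeated simulations), and the helper names `omega_box` and `tdt_win_chance` are made up for the example.

```python
from fractions import Fraction

def omega_box(probabilities):
    """Return the 1-based number of the least likely box; lowest number wins ties."""
    lowest = min(probabilities)
    return probabilities.index(lowest) + 1

def tdt_win_chance(probabilities):
    """Chance that an agent mixing with these probabilities picks Omega's box."""
    return probabilities[omega_box(probabilities) - 1]

n = 10
uniform = [Fraction(1, n)] * n                       # TDT's best reply
skewed = [Fraction(1, 2)] + [Fraction(1, 18)] * 9    # any other mix does worse

print(omega_box(uniform), tdt_win_chance(uniform))
print(omega_box(skewed), tdt_win_chance(skewed))

# A non-TDT agent who works out that the simulated TDT mixes uniformly
# simply takes the box Omega must have chosen.
non_tdt_pick = omega_box(uniform)
```

The win chance is the minimum of the probabilities, which can never exceed 1/n, so the uniform mix is the best the TDT agent can do.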

**Some questions:**

1. Have these or similar problems already been discovered by TDT (or UDT) theorists, and if so, is there a known solution? I had a search on Less Wrong but couldn't find anything obviously like them.

2. Is the analysis correct, or is there some subtle reason why a TDT (or UDT) agent would choose differently from described?

3. If a TDT agent believed (or had reason to believe) that Omega was going to present it with such problems, then wouldn't it want to self-modify to CDT? But this seems paradoxical, since the whole idea of a TDT agent is that it doesn't have to self-modify.

4. Might such problems show that there cannot be a single TDT algorithm (or family of provably-linked TDT algorithms) so that when Omega says it is simulating a TDT agent, it is quite ambiguous what it is doing? (This objection would go away if Omega revealed the source-code of its simulated agent, and the source-code of the choosing agent; each particular version of TDT would then be out-performed on a specific matching problem.)

5. Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.

6. Finally, is it more likely that Omegas - or things like them - will present agents with Newcomb and Prisoner's Dilemma problems (on which TDT succeeds) rather than problematic problems (on which it fails)?

**Edit:** I tweaked the explanation of Box A's contents in Problem 1, since this was causing some confusion. The idea is that, as in the usual Newcomb problem, Box A always contains $1000. Note that Box B depends on what the simulated agent chooses; it doesn't depend on Omega predicting what the actual deciding agent chooses (so Omega doesn't put less money in any box just because it sees that the actual decider is running TDT).

## 301 comments

You can construct a "counterexample" to any decision theory by writing a scenario in which it (or the decision theory you want to have win) is named explicitly. For example, consider Alphabetic Decision Theory, which writes a description of each of the options, then chooses whichever is first alphabetically. ADT is bad, but not so bad that you can't make it win: you could postulate an Omega which checks to see whether you're ADT, gives you $1000 if you are, and tortures you for a year if you aren't.

That's what's happening in Problem 1, except that it's a little bit hidden. There, you have an Omega which says: if you are TDT, I will make the content of these boxes depend on your choice in such a way that you can't have both; if you aren't TDT, I filled both boxes.

You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.

Right, but this is exactly the insight of this post put another way. The possibility of an Omega that rewards eg ADT is discussed in Eliezer's TDT paper. He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are. What's interesting about this is that this is a "fair" test by that definition, yet it acts like an unfair test.

Because it's a fair test, it doesn't matter whether Omega thinks TDT and TDT-prime are the same - what matters is whether TDT-prime thinks so.

> He sets out an idea of a "fair" test, which evaluates only what you do and what you are predicted to do, not what you are.

Two questions. First, how is this distinction justified? What a decision theory *is* is a strategy for responding to decision tasks, and simulating agents performing the right decision tasks tells you what kind of decision theory they're using. Why does it matter if it's done implicitly (as in Newcomb's discrimination against CDT) or explicitly? And second, why should we care about it? Why is it important for a decision theory to pass fair tests but not unfair tests?

> Why is it important for a decision theory to pass fair tests but not unfair tests?

Well, on unfair tests a decision theory still needs to do as well as possible. If we had a version of the original Newcomb's problem, with the one difference that a CDT agent gets $1 billion just for showing up, it's still incumbent upon a TDT agent to walk away with $1000000 rather than $1000. The "unfair" class of problems is that class where "winning as much as possible" is distinct from "winning the most out of all possible agents".

Real-world unfair tests could matter, though it's not clear if there are any. However, hypothetical unfair tests aren't very informative about what is a good decision theory, because it's trivial to cook one up that favours one theory and disfavours another. I think the hope was to invent a decision theory that does well on all fair tests; the example above seems to show that may not be possible.

> Because it's a fair test

No, not even by Eliezer's standard, because TDT is not given the same problem as other decision theories.

As stated in comments below, everyone but TDT has the information "I'm not in the simulation" (or more precisely, in one of the simulations of the infinite regress that is implied by Omega's formulation). The reason TDT does not have this extra piece of information comes from the fact that it is TDT, not from any decision it may make.

Right, and this is an unfairness that Eliezer's definition fails to capture.

At this point, I need the text of that definition.

The definition is in Eliezer's TDT paper, although a quick grep for "fair" didn't immediately find it.

This variation of the problem was invented in the follow-up post (I think it was called "Sneaky strategies for TDT" or something like that):

Omega tells you that earlier it flipped a coin. If the coin came down heads, it simulated a CDT agent facing this problem. If the coin came down tails, it simulated a TDT agent facing this problem. In either case, if the simulated agent one-boxed, there is $1000000 in Box-B; if it two-boxed, Box-B is empty. In this case TDT still one-boxes (50% chance of $1000000 dominates a 100% chance of $1000), and CDT still two-boxes (because that's what CDT does). In this case, even though both agents have an equal chance of being simulated, CDT out-performs TDT (average payoffs of 501000 vs. 500000) - CDT takes advantage of TDT's prudence and TDT suffers for CDT's lack of it. Notice also that TDT cannot do better by behaving like CDT (both would get payoffs of 1000). This shows that the class of problems we're concerned with is not so much "fair" vs. "unfair", but more like "those problems on which the best *I* can do is not necessarily the best anyone can do". We can call it "fairness" if we want, but it's not like Omega is discriminating against TDT in this case.

This is not a zero-sum game. CDT does not outperform TDT here. It just makes a stupid mistake, and happens to pay for it less dearly than TDT.

Let's say Omega submits the same problem to 2 arbitrary decision theories, a and b. Each will either 1-box or 2-box. Here is the average payoff matrix:

- Both a and b 1-box -> They both get the million.
- Both a and b 2-box -> They both get 1000 only.
- One 1-boxes, the other 2-boxes -> The 1-boxer gets half a million, the other gets 1000 more.

Clearly, 1-boxing still dominates 2-boxing. Whatever the other does, you personally get about half a million more by 1-boxing. TDT may have less utility than CDT for 1-boxing, but CDT is still stupid here, while TDT is not.
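The average payoff matrix for the coin-flip variant can be checked with a short calculation. This is a sketch that assumes the stakes from the original post ($1000 in Box A, $1 million in Box B) and a fair coin deciding which of the two agents gets simulated; `expected_payoffs` is a name invented for the example.

```python
BOX_A = 1_000
PRIZE = 1_000_000

def expected_payoffs(choice_a: int, choice_b: int):
    """Average winnings of agents a and b; each choice is 1 (one-box) or 2 (two-box)."""
    totals = [0, 0]
    # Heads: a's choice is the simulated one. Tails: b's choice is.
    for simulated_choice in (choice_a, choice_b):
        box_b = PRIZE if simulated_choice == 1 else 0
        for i, choice in enumerate((choice_a, choice_b)):
            totals[i] += box_b + (BOX_A if choice == 2 else 0)
    # Two equally likely coin outcomes, so halve the totals.
    return totals[0] // 2, totals[1] // 2

for a in (1, 2):
    for b in (1, 2):
        print(a, b, expected_payoffs(a, b))
```

Under these assumed stakes, switching from two-boxing to one-boxing gains an agent $499,000 on average whatever the other agent does, which is the "about half a million" in the comment above.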

Not exactly. Because the problem statement says that it simulates "TDT", if you were to expand the problem statement out into code it would have to contain source code to a complete instantiation of TDT. When the problem statement is run, TDT or TDT-prime can look at that instantiation and compare it to its own source code. TDT will see that they're the same, but TDT-prime will notice that they are different, and thereby infer that it is not the simulated copy. (Any difference whatsoever is proof of this.)

Consider an alternative problem. Omega flips a coin, and asks you to guess what it was, with a prize if you guess correctly. If the coin was heads, he shows you a piece of paper with TDT's source code. If the coin was tails, he shows you a piece of paper with your source code, whatever that is.

I'm not sure the part about comparing source code is correct. TDT isn't supposed to search for exact copies of itself, it's supposed to search for parts of the world that are logically equivalent to itself.

The key question is whether it could have been you that was simulated. If all you know is that you're a TDT agent and that what Omega simulated is a TDT agent, then it could have been you. Therefore you have to act as if your decision now may be either real or simulated. If you know you are not what Omega simulated (for any reason), then you know that you only have to worry about the 'real' decision.

Suppose that Omega doesn't reveal the full source code of the simulated TDT agent, but just reveals enough logical facts about the simulated TDT agent to imply that it uses TDT. Then the "real" TDT Prime agent cannot deduce that it is different.

Yes. I think that as long as there is any chance of you being the simulated agent, you need to one-box. So you one-box if Omega tells you 'I simulated some agent', and one-box if Omega tells you 'I simulated an agent that uses the same decision procedure as you', but two-box if Omega tells you 'I simulated an agent that had a different copyright comment in its source code to the comment in your source code'.

This is just a variant of the 'detect if I'm in a simulation' function that others mention. i.e. if Omega gives you access to that information in any way, you can two box. Of course, I'm a bit stuck on what Omega has told the simulation in that case. Has Omega done an infinite regress?

That's an interesting way to look at the problem. Thanks!

Indeed. These are all scenarios of the form "Omega looks at the source code for your decision theory, and intentionally creates a scenario that breaks it." Omega could do this with any possible decision theory (or at least, anything that could be implemented with finite resources), so what exactly are we supposed to learn by contemplating specific examples?

It seems to me that the valuable Omega thought experiments are the ones where Omega's omnipotence is simply used to force the player to stick to the rules of the given scenario. When you start postulating that an impossible, acausal superintelligence is actively working against you, it's time to hang up your hat and go home, because no strategy you could possibly come up with is going to do you any good.

The trouble is when another agent wins in this situation *and* in the situations you usually encounter. For example, an anti-traditional-rationalist, that always makes the opposite choice to a traditional rationalist, will one-box; it just fails spectacularly when asked to choose between different amounts of cake.

> You can see that something funny has happened by postulating TDT-prime, which is identical to TDT except that Omega doesn't recognize it as a duplicate (e.g., it differs in some way that should be irrelevant). TDT-prime would two-box, and win.

I don't think so. If TDT-prime two-boxes, the TDT simulation two-boxes, so only one box is full, so TDT-prime walks away with $1000. Omega doesn't check what decision theory you're using at all - it just simulates TDT and bases its decision on that. I do think that this ought to fall outside a rigorously defined class of "fair" problems, but it doesn't matter whether Omega can recognise you as a TDT-agent or not.

> I don't think so. If TDT-prime two-boxes, the TDT simulation two-boxes, so only one box is full, so TDT-prime walks away with $1000.

No, if TDT-prime two-boxes, the TDT simulation still one-boxes.

Hmm, so TDT-prime would reason something like, "The TDT simulation will one-box because, not knowing that it's the simulation, but also knowing that the simulation will use exactly the same decision theory as itself, it will conclude that the simulation will do the same thing as itself and so one-boxing is the best option. However, I'm *different* to the TDT-simulation, and therefore I can safely two-box without affecting its decision." In which case, does it matter how inconsequential the difference is? Yep, I'm confused.

I also had thoughts along these lines - variants of TDT could logically separate themselves, so that T-0 one-boxes when it is simulated, but T-1 has proven that T-0 will one-box, and hence T-1 two-boxes when T-0 is the sim.

But a couple of difficulties arise. The first is that if TDT variants can logically separate from each other (i.e. can prove that their decisions aren't linked) then they won't co-operate with each other in Prisoner's Dilemma. We could end up with a bunch of CliqueBots that only co-operate with their exact clones, which is not ideal.

The second difficulty is that for each specific TDT variant, one with algorithm T' say, there will be a specific problematic problem on which T' will do worse than CDT (and indeed worse than all the other variants of TDT) - this is the problem with T' being the exact algorithm running in the sim. So we still don't get the - desirable - property that there is some sensible decision theory called TDT that is optimal across fair problems.

The best suggestion I've heard so far is that we try to adjust the definition of "fairness", so that these problematic problems also count as "unfair". I'm open to proposals on that one...

> But a couple of difficulties arise. The first is that if TDT variants can logically separate from each other (i.e. can prove that their decisions aren't linked) then they won't co-operate with each other in Prisoner's Dilemma. We could end up with a bunch of CliqueBots that only co-operate with their exact clones, which is not ideal.

I think this is avoidable. Let's say that there are two TDT programs called Alice and Bob, which are exactly identical except that Alice's source code contains a comment identifying it as Alice, whereas Bob's source code contains a comment identifying it as Bob. Each of them can read their own source code. Suppose that in problem 1, Omega reveals that the source code it used to run the simulation was Alice. Alice has to one-box. But Bob faces a different situation than Alice does, because he can find a difference between his own source code and the one Omega simulated, whereas Alice could not. So Bob can two-box without affecting what Alice would do.

However, if Alice and Bob play the prisoner's dilemma against each other, the situation is much closer to symmetric. Alice faces a player identical to itself except with the "Alice" comment replaced with "Bob", and Bob faces a player identical to itself except with the "Bob" comment replaced with "Alice". Hopefully, their algorithm would compress this information down to "The other player is identical to me, but has a comment difference in its source code", at which point each player would be in an identical situation.

You might want to look at my follow-up article which discusses a strategy like this (among others). It's worth noting that slight variations of the problem remove the opportunity for such "sneaky" strategies.

Ah, thanks. I had missed that, somehow.

In a prisoner's dilemma, Alice and Bob affect each other's outcomes. In the Newcomb problem, Alice affects Bob's outcome, but Bob doesn't affect Alice's outcome. That's why it's OK for Bob to consider himself different in the second case as long as he knows he is definitely not Alice (because otherwise he might actually be in a simulation), but not OK for him to consider himself different in the prisoner's dilemma.

> However, if Alice and Bob play the prisoner's dilemma against each other, the situation is much closer to symmetric. Alice faces a player identical to itself except with the "Alice" comment replaced with "Bob", and Bob faces a player identical to itself except with the "Bob" comment replaced with "Alice". Hopefully, their algorithm would compress this information down to "The other player is identical to me, but has a comment difference in its source code", at which point each player would be in an identical situation.

Why doesn't that happen when dealing with Omega?

Because if Omega uses Alice's source code, then Alice sees that the source code of the simulation is exactly the same as hers, whereas Bob sees that there is a comment difference, so the situation is not symmetric.

So why doesn't that happen in the prisoner's dilemma?

Because Alice sees that Bob's source code is the same as hers except for a comment difference, and Bob sees that Alice's source code is the same as his except for a comment difference, so the situation is symmetric.

Newcomb:

> Bob sees that there is a comment difference, so the situation is not symmetric.

Prisoner's Dilemma:

> Bob sees that Alice's source code is the same as his except for a comment difference, so the situation is symmetric.

Do you see the contradiction here?

Newcomb, Alice: The simulation's source code and available information is literally exactly the same as Alice's, so if Alice 2-boxes, the simulation will too. There's no way around it. So Alice one-boxes.

Newcomb, Bob: The simulation was in the situation described above. Bob thus predicts that it will one-box. Bob himself is in an entirely different situation, since he can see a source code difference, so if he two-boxes, it does not logically imply that the simulation will two-box. So Bob two-boxes and the simulation one-boxes.

Prisoner's Dilemma: Alice sees Bob's source code, and summarizes it as "identical to me except for a different comment". Bob sees Alice's source code, and summarizes it as "identical to me except for a different comment". Both Alice and Bob run the same algorithm, and they now have the same input, so they must produce the same result. They figure this out, and cooperate.

Ignore Alice's perspective for a second. Why is Bob acting differently? He's seeing the *same code* both times.

Don't ignore Alice's perspective. Bob knows what Alice's perspective is, so since there is a difference in Alice's perspective, there is by extension a difference in Bob's perspective.

Bob looks at the same code both times. In the PD, he treats it as identical to his own. In NP, he treats it as different. Why?

The source code that Bob is looking at is the same in each case, but the source code that [the source code that Bob is looking at] is looking at is different in the two situations.

NP: Bob is looking at Alice, who is looking at Alice, who is looking at Alice, ...

PD: Bob is looking at Alice, who is looking at Bob, who is looking at Alice, ...

Clarifying edit: In both cases, Bob concludes that the source code he is looking at is functionally equivalent to his own. But in NP, Bob treats the *input* to the program he is looking at as different from his input, whereas in PD, Bob treats the input to the program he is looking at as functionally equivalent to his input.

> PD: Bob is looking at Alice, who is looking at Bob, who is looking at Alice, ...

But you said Bob concludes that their decision theories are functionally identical, and thus it reduces to:

PD: TDT is looking at TDT, who is looking at TDT, who is looking at TDT, ...

And yet this does not occur in NP.

EDIT:

> The source code that Bob is looking at is the same in each case, but the source code that [the source code that Bob is looking at] is looking at is different in the two situations.

The point is that his judgement of the source code changes, from "some other agent" to "another TDT agent".

Looks like my edit was poorly timed.

> Clarifying edit: In both cases, Bob concludes that the source code he is looking at is functionally equivalent to his own. But in NP, Bob treats the input to the program he is looking at as different from his input, whereas in PD, Bob treats the input to the program he is looking at as functionally equivalent to his input.

One way of describing it is that the comment is extra information that is distinct from the decision agent, and that Bob can make use of this information when making his decision.

Oops, didn't see that.

What's the point of adding comments if Bob's just going to conclude their code is functionally identical anyway? Doesn't that mean that you might as well use the same code for Bob and Alice, and call it TDT?

In NP, the comments are to provide Bob an excuse to two-box that does not result in the simulation two-boxing. In PD, the comments are there to illustrate that TDT needs a sophisticated algorithm for identifying copies of itself that can recognize different implementations of the same algorithm.

Do you understand why Bob acts differently in the two situations, now?

> In NP, the comments are to provide Bob an *excuse* to two-box that does not result in the simulation two-boxing.

I was assuming Bob was an AI, lacking a ghost to look over his code for reasonableness. If he's not, then he isn't strictly implementing TDT, is he?

Bob is an AI. He's programmed to look for similarities between other AIs and himself so that he can treat their action and his as logically linked when it is to his advantage to do so. I was arguing that a proper implementation of TDT should consider Bob's and Alice's decisions linked in PD and nonlinked in the NP variant. I don't really understand your objection.

My objection is that an AI looking at the same question - **is Alice functionally identical to me** - can't look for excuses why they're not *really* the same when this would be useful, if *they actually behave the same way*. His answer should be the same in both cases, because **they are either functionally identical or not.**

The proper question is "In the context of the problems each of us face, is there a logical connection between my actions and Alice's actions?", not "Is Alice functionally identical to me?"

I think those terms both mean the same thing.

For reference, by "functionally identical" I meant "likely to choose the same way I do". Thus, an agent that will abandon the test to eat beans is functionally identical when beans are unavailable.

I guess my previous response was unhelpful. Although "Is Alice functionally identical to me?" is not the question of primary concern, it is a relevant question. But another relevant question is "Is Alice facing the same problem that I am?" Two functionally identical agents facing different problems may make different choices.

In the architecture I've been envisioning, Alice and Bob can classify other agents as "identical to me in both algorithm and implementation" or "identical to me in algorithm, with differing implementation", or one of many other categories. For each of the two categories I named, they would assume that an agent in that category will make the same decision as they would when presented with the same problem (so they would both be subcategories of "functionally identical"). In both situations, each agent classifies the other as identical in algorithm and differing in implementation.

In the prisoners' dilemma, each agent is facing the same problem, that is, "I'm playing a prisoner's dilemma with another agent that is identical to me in algorithm but differing in implementation". So they treat their decisions as linked.

In the Newcomb's problem variant, Alice's problem is "I'm in Newcomb's problem, and the predictor used a simulation that is identical to me in both algorithm and implementation, and which faced the same problem that I am facing." Bob's problem is "I'm in Newcomb's problem, and the predictor used a simulation that is identical to me in algorithm but differing in implementation, and which faced the same situation as Alice." There was a difference in the two problem descriptions even before the part about what problem the simulation faced, so when Bob notes that the simulation faced the same problem as Alice, he finds a difference between the problem that the simulation faced and the problem that he faces.
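The classification scheme described in this comment can be sketched roughly as below. This is a toy model under strong assumptions: agent "source code" is a short string, comment lines start with `#`, and two agents count as sharing an algorithm when their code matches after comment lines are dropped. The names `classify`, `pd_problem`, `newcomb_problem` and `decisions_linked` are hypothetical.

```python
ALICE = "# I am Alice\ndecide()"
BOB = "# I am Bob\ndecide()"

def strip_comments(source: str) -> str:
    """Drop comment lines, keeping only the (toy) algorithm."""
    return "\n".join(ln for ln in source.splitlines() if not ln.strip().startswith("#"))

def classify(mine: str, other: str) -> str:
    if mine == other:
        return "same algorithm, same implementation"
    if strip_comments(mine) == strip_comments(other):
        return "same algorithm, different implementation"
    return "different algorithm"

def pd_problem(me: str, opponent: str) -> str:
    return "PD against an agent with: " + classify(me, opponent)

def newcomb_problem(me: str, sim: str) -> str:
    return "Newcomb, simulation has: " + classify(me, sim)

def decisions_linked(my_problem: str, their_problem: str, relation: str) -> bool:
    """Link decisions only for functionally identical agents facing the same problem."""
    return relation != "different algorithm" and my_problem == their_problem

# Prisoner's dilemma: Bob and Alice describe their problems identically.
bob_pd = decisions_linked(pd_problem(BOB, ALICE), pd_problem(ALICE, BOB), classify(BOB, ALICE))
# Newcomb variant with Alice as the simulated code: the simulation faces Alice's problem.
sim_problem = newcomb_problem(ALICE, ALICE)
alice_nc = decisions_linked(newcomb_problem(ALICE, ALICE), sim_problem, classify(ALICE, ALICE))
bob_nc = decisions_linked(newcomb_problem(BOB, ALICE), sim_problem, classify(BOB, ALICE))

print(bob_pd, alice_nc, bob_nc)
```

In this toy model Bob's decision is linked to Alice's in the prisoner's dilemma but not in the Newcomb variant, because only there does his problem description differ from the one the simulation faced.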

> For each of the two categories I named, they would assume that an agent in that category will make the same decision as they would when presented with the same problem (so they would both be subcategories of "functionally identical").

Then why are we talking about "Bob" and "Alice" when they're both just TDT agents?

Because if Bob does not ignore the implementation difference, he ends up with more money in the Newcomb's problem variant.

But there *is* no difference between "Bob looking at Alice looking at Bob" and "Alice looking at Alice looking at Alice". That's the whole point of TDT.

There is a difference. In the first one, the agents have a slight difference in their source code. In the second one, the source code of the two agents is identical.

If you're claiming that TDT does not pay attention to such differences, then we only have a definitional dispute, and by your definition, an agent programmed the way I described would not be TDT. But I can't think of anything about the standard descriptions of TDT that would indicate such a restriction. It is certainly not the "whole point" of TDT.

For now, I'm going to call the thing you're telling me TDT is "TDT1", and I'm going to call the agent architecture I was describing "TDT2". I'm not sure if this is good terminology, so let me know if you'd rather call them something else.

Anyway, consider the four programs Alice1, Bob1, Alice2, and Bob2. Alice1 and Bob1 are implementations of TDT1, and are identical except for having a different identifier in the comments (and this difference changes nothing). Alice2 and Bob2 are implementations of TDT2, and are identical except for having a different identifier in the comments.

Consider the Newcomb's problem variant with the first pair of agents (Alice1 and Bob1). Alice1 is facing the standard Newcomb's problem, so she one-boxes and gets $1,000,000. As far as Bob1 can tell, he also faces the standard Newcomb's problem (there is a difference, but he ignores it), so he one-boxes and gets $1,000,000.

Now consider the same problem, but with all instances of Alice1 replaced with Alice2, and all instances of Bob1 replaced with Bob2. Alice2 still faces the standard Newcomb's problem, so she one-boxes and gets $1,000,000. But Bob2 two-boxes and gets $1,001,000.

The problem seems pretty fair; it doesn't specifically reference either TDT1 or TDT2 in an attempt to discriminate. However, when we replace the TDT1 agents with TDT2 agents, one of them does better and neither of them does worse, which seems to indicate a pretty serious deficiency in TDT1.
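The comparison between the two architectures can be tabulated with a small sketch. It assumes, as in the comment above, that the simulated code is always the "Alice" copy, that TDT1 ignores the comment difference, and that TDT2 two-boxes whenever it can tell it is not an exact copy of the simulation. The `choice` and `payoff` helpers are illustrative only.

```python
BOX_A = 1_000
PRIZE = 1_000_000

def choice(variant: str, is_exact_copy_of_sim: bool) -> str:
    if variant == "TDT1":
        return "one-box"  # ignores the comment difference entirely
    # TDT2 one-boxes only when it cannot rule out being the simulation.
    return "one-box" if is_exact_copy_of_sim else "two-box"

def payoff(variant: str, is_exact_copy_of_sim: bool) -> int:
    # The simulation is an exact copy of itself, so it one-boxes under either variant.
    box_b = PRIZE if choice(variant, True) == "one-box" else 0
    mine = choice(variant, is_exact_copy_of_sim)
    return box_b if mine == "one-box" else box_b + BOX_A

results = {
    "Alice1": payoff("TDT1", True),
    "Bob1": payoff("TDT1", False),
    "Alice2": payoff("TDT2", True),
    "Bob2": payoff("TDT2", False),
}
print(results)
```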

Either TDT decides if something is identical based on its actions, in which case I am right, or on its source code, in which case you are wrong, because such an agent would not cooperate in the Prisoner's Dilemma.

They decide using the source code. I already explained why this results in them cooperating in the Prisoner's Dilemma.

In the architecture I've been envisioning, Alice and Bob can classify other agents as "identical to me in both algorithm and implementation" or "identical to me in algorithm, with differing implementation", or one of many other categories. For each of the two categories I named, they would assume that an agent in that category will make the same decision as they would when presented with the same problem (so they would both be subcategories of "functionally identical"). In both situations, each agent classifies the other as identical in algorithm and differing in implementation.

In the prisoners' dilemma, each agent is facing the same problem, that is, "I'm playing a prisoner's dilemma with another agent that is identical to me in algorithm but differing in implementation". So they treat their decisions as linked.
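To make the classification step concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: agents are represented as source strings, "same algorithm" is approximated by "same source after stripping `#` comments" (real algorithmic equivalence is much harder to establish), and the `decide` bodies are placeholders rather than an actual TDT. The names `strip_comments`, `classify`, `ALICE1` and `BOB1` are hypothetical.

```python
def strip_comments(source):
    """Drop '#' comments and blank lines (assumes no '#' inside string literals)."""
    lines = []
    for line in source.splitlines():
        code = line.split("#", 1)[0].rstrip()
        if code:
            lines.append(code)
    return "\n".join(lines)


def classify(my_source, other_source):
    """Sort another agent into one of the categories named above."""
    if my_source == other_source:
        return "identical in algorithm and implementation"
    if strip_comments(my_source) == strip_comments(other_source):
        return "identical in algorithm, differing in implementation"
    return "other"


# Placeholder agents: identical except for the identifier in the comment.
ALICE1 = "# id: Alice1\ndef decide(problem):\n    return 'one-box'\n"
BOB1 = "# id: Bob1\ndef decide(problem):\n    return 'one-box'\n"

print(classify(ALICE1, BOB1))
print(classify(ALICE1, ALICE1))
```

With these placeholders, Alice1 and Bob1 each put the other in the "differing in implementation" category, which is the situation described for both the Newcomb variant and the prisoner's dilemma.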

Wait! I think I get it! In a Prisoner's Dilemma, both agents are facing another agent, whereas in Newcomb's Problem, Alice is facing an infinite chain of herself, whereas Bob is facing an infinite chain of someone else. It's like the "favorite number" example in the followup post.

Yes.

Well that took embarrassingly long.

The right place to introduce the separation is not in between TDT and TDT-prime, but in between TDT-prime's output and TDT-prime's decision. If its output is a *strategy*, rather than a number of boxes, then that strategy can include a byte-by-byte comparison; and if TDT and TDT-prime both do it that way, then they both win as much as possible.

But doesn't that make cliquebots, in general?

I'm thinking hard about this one...

Can all the TDT variants adopt a common *strategy*, but with different execution results, depending on source-code self-inspection and sim-inspection? Can that approach really work in general without creating CliqueBots? Don't know yet without detailed analysis.

Another issue is that Omega is not obliged to reveal the source-code of the sim; it could instead provide some information about the method used to generate / filter the sim code (e.g. a distribution the sim was drawn from) and still lead to a well-defined problem. Each TDT variant would not then know whether it was the sim.

I'm aiming for a follow-up article addressing this strategy (among others).

Can all the TDT variants adopt a common strategy, but with different execution results, depending on source-code self-inspection and sim-inspection?

This sounds equivalent to asking "can a Turing machine generate non-deterministically random numbers?" Unless you're thinking about coding TDT agents one at a time and setting some constant differently in each one.

Well, I've had a think about it, and I've concluded that it *would* matter how great the difference between TDT and TDT-prime is. If TDT-prime is almost the same as TDT, but has an extra stage in its algorithm in which it converts all dollar amounts to yen, it should still be able to prove that it is isomorphic to Omega's simulation, and therefore will not be able to take advantage of "logical separation".

But if TDT-prime is different in a way that makes it non-isomorphic, i.e. it sometimes gives a different output given the same inputs, that may still not be enough to "separate" them. If TDT-prime acts the same as TDT, except when there is a walrus in the vicinity, in which case it tries to train the walrus to fight crime, it is still the case in this walrus-free problem that it makes exactly the same choice as the simulation (?). It's as if you need the ability to prove that two agents necessarily give the same output for the particular problem you're faced with, without proving what output those agents actually give, and *that* sure looks crazy-hard.

EDIT: I mean crazy-hard for the general case, but much, much easier for all the cases where the two agents are actually the same.

EDIT 2: On the subject of fairness, my first thoughts: A fair problem is one in which if you had arrived at your decision by a coin flip (which is as transparently predictable as your actual decision process - i.e. Omega can predict whether it's going to come down heads or tails with perfect accuracy), you would be rewarded or punished no more or less than you would be using your actual decision algorithm (and this applies to every available option).

EDIT 3: Sorry to go on like this, but I've just realised that won't work in situations where some other agent bases their decision on whether you're predicting what their decision will be, i.e. Prisoner's Dilemma.

Yep, I'm confused.

Sounds like you have it exactly right.

I think we could generalise problem 2 to be problematic for any decision theory XDT:

There are 10 boxes, numbered 1 to 10. You may only take one. Omega has (several times) run a simulated XDT agent on this problem. It then put a prize in the box which it determined was least likely to be taken by such an agent - or, in the case of a tie, in the box with the lowest index.

If agent X follows XDT, it has at best a 10% chance of winning. Any sufficiently resourceful YDT agent, however, could run a simulated XDT agent themselves, and figure out what Omega's choice was without getting into an infinite loop.

Therefore, YDT performs better than XDT on this problem.
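A small Python sketch of why the 10% bound holds, assuming Omega knows the XDT agent's exact choice probabilities (rather than estimating them) and that the agent's real draw is independent of anything Omega sees. The function names and the two example distributions are made up for illustration.

```python
from fractions import Fraction

NUM_BOXES = 10


def omega_prize_box(choice_probs):
    """Box (1-based) least likely to be chosen; ties go to the lowest index."""
    least = min(choice_probs)
    return choice_probs.index(least) + 1


def win_probability(choice_probs):
    """Chance that an agent with this distribution picks the prize box."""
    return choice_probs[omega_prize_box(choice_probs) - 1]


def ydt_choice(xdt_probs):
    """A YDT agent that can model XDT simply repeats Omega's calculation."""
    return omega_prize_box(xdt_probs)


# Two hypothetical XDT mixed strategies over boxes 1..10.
uniform = [Fraction(1, NUM_BOXES)] * NUM_BOXES
skewed = [Fraction(0)] + [Fraction(1, 9)] * 9  # never picks box 1

print(win_probability(uniform))  # 1/10
print(win_probability(skewed))   # 0
print(ydt_choice(uniform))       # 1
```

The smallest of ten probabilities summing to 1 can never exceed 1/10, so no choice of distribution gets the XDT agent above 10%, while the YDT agent takes the prize box every time.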

If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Godelian incompleteness). If I'm wrong, what have I missed?

You're right about problem 2 being a fully general counterargument, but your philosophical conclusion seems to be stopping too early. For example, can we define a class of "fair" problems that excludes problem 2?

One possible place to look is that we're allowing Omega access not just to a particular simulated decision of TDT, but to the probabilities with which it makes these decisions. If we force it to simulate TDT many times and sample to learn what the probabilities are, it can't detect the exact balance for which it does deterministic symmetry breaking, and the problem goes away.

This solution occurred to me because this forces Omega to have something like a continuous behaviour response to changes in the probabilities of different TDT outputs, and it seems possible given that to imagine a proof that a fixed point must exist.

Fair point - how does Omega tell when the sim's choosing probabilities are exactly equal? Well I was thinking that Omega could prove they are equal (by analysing the simulation's behaviour, and checking where it calls on random bits). Or if it can't do that, then it can just check that the choice frequencies are "statistically equal" (i.e. no significant differences after a billion runs, say) and treat them as equal for the tie-breaker rule. The "statistically equal" approach might give the TDT agent a very slightly higher than 10% chance of winning the money, though I haven't analysed this in any detail.
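A rough sketch of the "statistically equal" fallback, in Python. The fixed `tolerance` is a stand-in I've assumed in place of a real significance test, the run count is far below a billion, and the sampled agent is a hypothetical uniform chooser; none of this is from the problem statement.

```python
import random
from collections import Counter


def estimate_frequencies(agent, num_boxes, runs, rng):
    """Run the simulated agent many times and return its choice frequencies."""
    counts = Counter(agent(rng) for _ in range(runs))
    return [counts.get(box, 0) / runs for box in range(1, num_boxes + 1)]


def prize_box_statistical(freqs, tolerance):
    """Lowest-numbered box whose frequency is within `tolerance` of the minimum."""
    least = min(freqs)
    for index, freq in enumerate(freqs):
        if freq - least <= tolerance:
            return index + 1


def uniform_agent(rng):
    """Hypothetical sim that picks one of 10 boxes uniformly at random."""
    return rng.randint(1, 10)


freqs = estimate_frequencies(uniform_agent, 10, 20000, random.Random(0))
print(prize_box_statistical(freqs, tolerance=0.02))
```

Because sampling noise is absorbed by the tolerance, a near-uniform sim is treated as an exact tie and the prize goes to box 1. An agent that shades its probabilities by less than the tolerance is the case where the win chance might creep slightly above 10%.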

If the subject can know the exact code of TDT, Omega can know the exact code of TDT, and analyse it however it likes. That means it can know exactly where randomness is invoked - why would it have to sample?

This was my first thought: Omega can just prove the choosing probabilities are equal. However, it's not totally straightforward, because the sim could sample more random bits depending on the results of its first random bits, and so on, leading to an exponentially growing outcome tree of possibilities, with no upper size bound to the length of the tree. There might not be an easy proof of equality in that case. Sampling and statistical equality is the next best approach...

It looks like the issue here is that while Omega is ostensibly not taking into account your decision theory, it implicitly is by simulating an XDT agent. So a first patch would be to define simulations of a specific decision theory (as opposed to simulations of a given agent) as "unfair".

On the other hand, we can't necessarily know if a given computation is effectively equivalent to simulating a given decision theory. Even if the string "TDT" is never encoded anywhere in Omega's super-neurons, it might still be simulating a TDT agent, for example.

On the first hand again, it might be easy for most problems to figure out whether anyone is implicitly favouring one DT over another, and thus whether they're "fair".

If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Godelian incompleteness). If I'm wrong, what have I missed?

I would say that any such problem doesn't show that there is no best decision theory, it shows that that class of problem cannot be used in the ranking.

Edited to add: Unless, perhaps, one can show that an instantiation of the problem with particular choice of (in this case decision theory, but whatever is varied) is particularly likely to be encountered.

To draw out the analogy to Godelian incompleteness, any computable decision theory is subject to the suggested attack of being given a "Godel problem" like problem 1, just as any computable set of axioms for arithmetic has a Godel sentence. You can always make a new decision theory TDT' that is TDT plus "do the right thing for the Godel problem". But TDT' has its own Godel problem, of course. You can't make a computable theory that says "do the right thing for all Godel problems"; if you tried, the result would not be computable. I'm sure this is all just restating what you had in mind, but I think it's worth spelling out.

If you have some sort of oracle for the halting problem (i.e. a hypercomputer) and Omega doesn't, he couldn't simulate you, so you would presumably be able to always win fair problems. Otherwise the best thing you could hope for is to get the right answer whenever your computation halts, but fail to halt in your computation for some problems, such as your Godel problem. (A decision theory like this can still be given a Godel problem if Omega can solve the halting problem, "I simulated you and if you fail to halt on this problem..."). I wonder if TDT fails to halt for its Godel problem, or if some natural modification of it might have this property, but I don't understand it well enough to guess.

I am less optimistic about revising "fair" to exclude Godel problems. The analogy would be proving Peano arithmetic is complete "except for things that are like Godel sentences." I don't know of any formalizations of the idea of "being a Godel sentence".

If I'm right, we may have shown the impossibility of a "best" decision theory, no matter how meta you get (in a close analogy to Godelian incompleteness). If I'm wrong, what have I missed?

You're right. However, since all decision theories fail when confronted with their personal version of this problem, but may or may not fail in other problems, then some decision theories may be better than others. The one that is better than all the others is thus the "best" DT.

My sense is that question 6 is a better question to ask than 5. That is, what's important isn't drawing some theoretical distinction between fair and unfair problems, but finding out what problems we and/or our agents will actually face. To the extent that we are ignorant of this now but may know more in the future when we are smarter and more powerful, it argues for not fixing a formal decision theory to determine our future decisions, but instead making sure that we and/or our agents can continue to reason about decision theory the same way we currently can (i.e., via philosophy).

Consider **Problem 3**: Omega presents you with two boxes, one of which contains $100, and says that it just ran a simulation of *you* in the present situation and put the money in the box the simulation didn't choose.

This is a standard diagonal construction, where the environment is set up so that you are punished for the actions you choose, and rewarded for those you don't choose, irrespective of the actions. This doesn't depend on the decision algorithm you're implementing. A possible escape strategy is to make yourself unpredictable to the environment. The difficulty would also go away if the thing being predicted wasn't you, but something else you could predict as well (like a different agent that doesn't simulate you).

The correct solution to this problem is to choose each box with equal probability; this problem is the reason why decision theories have to be non-deterministic. It comes up all the time in real life: I try and guess what safe combination you chose, try that combination, and if it works I take all your money. Or I try to guess what escape route you'll use and post all the guards there.
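Here is a short Python sketch of the arithmetic behind that claim. It assumes Omega's simulation and the real agent draw their random bits independently, i.e. Omega cannot predict the coin itself; the function names are made up for illustration.

```python
from fractions import Fraction


def other(box):
    """The box that was not chosen."""
    return "B" if box == "A" else "A"


def payoff_deterministic(agent):
    """Omega simulates the agent, then funds the box the simulation skipped."""
    predicted = agent()
    funded = other(predicted)
    actual = agent()  # same code, same answer as the simulation
    return 100 if actual == funded else 0


def expected_payoff_mixed(p_a):
    """Agent picks A with probability p_a; the simulated run draws independently."""
    p_a = Fraction(p_a)
    p_b = 1 - p_a
    # The agent wins exactly when its real choice differs from the simulated one.
    return 100 * (p_a * p_b + p_b * p_a)


print(payoff_deterministic(lambda: "A"))       # 0
print(expected_payoff_mixed(Fraction(1, 2)))   # 50
print(expected_payoff_mixed(Fraction(1, 4)))   # 75/2
```

Any deterministic agent gets $0, and the expected payoff 200·p·(1−p) peaks at p = 1/2 with $50. If Omega can also predict the random bits, the independence assumption fails and the payoff drops back to zero.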

What's interesting about Problem 2 is that it makes what would be the normal game-theoretic strategy unstable by choosing deterministically where the probabilities are exactly equal.

this problem is the reason why decision theories have to be non-deterministic. It comes up all the time in real life: I try and guess what safe combination you chose, try that combination, and if it works I take all your money.

Of course, you can just set up the thought experiment with the proviso that "be unpredictable" is not a possible move - in fact that's the whole point of Omega in these sorts of problems. If Omega's trying to break into your safe, he takes your money. In Nesov's problem, if you can't make yourself unpredictable, then you win *nothing* - it's not even worth your time to open the box. In both cases, a TDT agent does strictly as well as it possibly could - the fact that there's $100 somewhere in the vicinity doesn't change that.

BTW, general question about decision theory. There appears to have been an academic study of decision theory for over a century, and causal and evidential decision theory were set out in 1981. Newcomb's paradox was set out in 1969. Yet it seems as though no-one thought to explore the space beyond these two decision theories until Eliezer proposed TDT, and it seems as if there is a 100% disconnect between the community exploring new theories (which is centered around LW) and the academic decision theory community. This seems really, really odd - what's going on?

Yet it seems as though no-one thought to explore the space beyond these two decision theories until Eliezer proposed TDT...

This is simply not true. Robert Nozick (who introduced Newcomb's problem to philosophers) compared/contrasted EDT and CDT at least as far back as 1993. Even back then, he noted their inadequacy on several decision-theoretic problems and proposed some alternatives.

Me being ignorant of something seemed like a likely part of the explanation - thanks :) I take it you're referencing "The Nature of Rationality"? I haven't read that, I'm afraid. If you can spare the time I'd be interested to know what he proposes - thanks!

I haven't read *The Nature of Rationality* in quite a long time, so I won't be of much help. For a very simple and short introduction to Nozick's work on decision theory, you should read this (PDF).

There were plenty of previous theories trying to go beyond CDT or EDT, they just weren't satisfactory.

This paper talks about reflexive decision models and claims to develop a form of CDT which one boxes.

It's in my to-read list but I haven't got to it yet so I'm not sure whether it's of interest but I'm posting it just in case (it could be a while until I have time to read it so I won't be able to post a more informed comment any time soon).

Though this theory post-dates TDT and so isn't interesting from *that* perspective.

Dispositional decision theory :P

... which I cannot find a link to the paper for, now. Hm. But basically it was just TDT, with less awareness of why.

EDIT: Ah, here it was. Credit to Tim Tyler.

I checked it. Not the same thing.

It should be noted that Newcomb's problem was considered interesting in Philosophy in 1969, but decision theories were studied more in other fields - so there's a disconnect between the sorts of people who usually study formal decision theories and that sort of problem.

(Deleting comments seems not to be working. Consider this a manual delete.)

Decision Theory is and can be applied to a variety of problems here. It's just that AI may face Newcomb-like problems and in particular we want to ensure a 1-boxing-like behavior on the part of AI.

The rationale for TDT-like decision theories is even more general, I think. There's no guarantee that our world contains only one copy of something. We want a decision theory that would let the AI cooperate with its copies or logical correlates, rather than wage pointless wars.

We want a decision theory that would let the AI cooperate with its copies or logical correlates, rather than wage pointless wars.

Constructing rigorous mathematical foundation of decision theory to explain what a decision problem or a decision or a goal are, is potentially more useful than resolving any given informally specified class of decision problems.

What is an example of such a real-world problem?

Negotiations with entities who can read the AI's source code.

Given the week+ delay in this response, it's probably not going to see much traffic, but I'm not convinced "reading" source code is all that helpful. Omega is posited to have nearly god-like abilities in this regard, but since this is a rationalist discussion, we probably have to rule out actual omnipotence.

If Omega intends to simply run the AI on spare hardware it has, then it has to be prepared to validate (in finite time and memory) that the AI hasn't so obfuscated its source as to be unintelligible to rational minds. It's also possible that the source to an AI is rather simple but is dependent on a large amount of input data in the form of a vast sea of numbers. I.e., the AI in question could be encoded as an ODE system integrator that's reliant on a massive array of parameters to get from one state to the next. I don't see why we should expect Omega to be better at picking out the relevant, predictive parts of these numbers than we are.

If the AI can hide things in its code or data, then it can hide functionality that tests to determine if it is being run by Omega or on its own protected hardware. In such a case it can lie to Omega just as easily as Omega can lie to the "simulated" version of the AI.

I think it's time we stopped positing an omniscient Omega in these complications to Newcomb's problem. They're like epicycles on Ptolemaic orbital theory in that they continue a dead-end line of reasoning. It's better to recognize that Newcomb's problem is a red herring. Newcomb's problem doesn't demonstrate problems that we should expect AIs to solve in the real world. It doesn't tease out meaningful differences between decision theories.

That is, what decisions on real-world problems do we expect to be different between two AIs that come to different conclusions about Newcomb-like problems?

You should note that every problem you list is a special case. Obviously, there are ways of cheating at Newcomb's problem if you're aware of salient details beforehand. You could simply allow a piece of plutonium to decay, and do whatever the resulting Geiger counter noise tells you to. That does not, however, support your thesis that Newcomb's problem is a totally artificial problem with no logical intrusions into reality.

As a real-world example, imagine an off-the-shelf stock market optimizing AI. Not sapient, to make things simpler, but smart. When any given copy begins running, there are already hundreds or thousands of near-identical copies running elsewhere in the market. If it fails to predict their actions from its own, it will do objectively worse than it might otherwise do.

I don't see how your example is apt or salient. My thesis is that Newcomb-like problems are the wrong place to be testing decision theories because they do not represent realistic or relevant problems. We should focus on formalizing and implementing decision theories and throw real-world problems at them rather than testing them on arcane logic puzzles.

Well... no, actually. A good decision theory ought to be universal. It ought to be correct, and it ought to work. Newcomb's problem is important, not because it's ever likely to happen, but because it shows a case in which the normal, commonly accepted approach to decision theory (CDT) failed miserably. This 'arcane logic puzzle' is illustrative of a deeper underlying flaw in the model, which needs to be addressed. It's also a flaw that'd be much harder to pick out by throwing 'real world' problems at it over and over again.

Seems unlikely to work out to me. Humans evolved intelligence without Newcomb-like problems. As the only example of intelligence that we know of, it's clearly possible to develop intelligence without Newcomb-like problems. Furthermore, the general theory seems to be that AIs will start dumber than humans and iteratively improve until they're smarter. Given that, why are we so interested in problems like these (which humans don't universally agree about the answers to)?

I'd rather AIs be able to help us with problems like "what should we do about the economy?" or even "what should I have for dinner?" instead of worrying about what we should do in the face of something godlike.

Additionally, human minds aren't universal (assuming that universal means that they give the "right" solutions to all problems), so why should we expect AIs to be? We certainly shouldn't expect this if we plan on iteratively improving our AIs.

Harsh crowd.

It might be nice to be able to see the voting history (not the voters' names, but the number of up and down votes) on a comment. I can't tell if my comments are controversial or just down-voted by two people. Perhaps even just the number of votes would be sufficient (e.g. -2/100 vs. -2/2).

If it helps: it's a fairly common belief in this community that a general-purpose optimization tool is both far superior to and more interesting to talk about than a variety of special-purpose tools.

Of course, that doesn't mean you have to be interested in general-purpose optimization tools; if you're more interested in decision theory for dinner-menu or economic planners, by all means post about that if you have something to say.

But I suspect there are relatively few communities in which "why are you all so interested in such a stupid and uninteresting topic?" will get you much community approval, and this isn't one of them.

I'm interested in general purpose optimizers, but I bet that they will be evolved from AIs that were more special purpose to begin with. E.g., IBM Watson moving from Jeopardy!-playing machine to medical diagnostic assistant with a lot of the upfront work being on rapid NLP for the J! "questions".

Also, there's no reason that I've seen here to believe that Newcomb-like problems give insights into how to develop decision theories that allow us to solve real-world problems. It seems like arguing about corner cases. Can anyone establish a practical problem that TDT fails to solve because it fails to solve these other problems?

Beyond this, my belief is that without formalization and programming of these decision frameworks, we learn very little. Asking what xDT does in some abstract situation seems, so far, very hand-wavy. Furthermore, it seems to me that the community is drawn to these problems because they are deceptively easy to state and talk about online, but minds are inherently complex, opaque, and hard to reason about.

I'm having a hard time understanding how correctly solving Newcomb-like problems is expected to advance the field of general optimizers. It seems out of proportion to the problems at hand to expect a decision theory to solve problems of this level of sophistication when the current theories don't seem to obviously "solve" questions like "what should we have for lunch?". I get the feeling that supporters of research on these theories assume that, of course, xDT can solve the easy problems so let's do the hard ones. And, I think evidence for this assumption is very lacking.

That's fair.

Again, if you are interested in more discussion about automated optimization on the level of "what should we have for lunch?" I encourage you to post about it; I suspect a lot of other people are interested as well.

Yeah, I might, but here I was just surprised by the down-voting for a contrary opinion. It seems like the thing we ought to foster, not hide.

As I tried to express in the first place, I suspect what elicited the disapproval was not the contrary opinion, but the rudeness.

Sorry. It didn't seem rude to me. I'm just frustrated with where I see folks spending their time.

My apologies to anyone who was offended.

This seems really, really odd - what's going on?

What you'd expect? The usual: half-educated people making basic errors, not making sure their decision theories work on 'trivial' problems, not doing the due work to find flaws in their own ideas, hence announcing solutions to hard problems that others don't announce. Same as asking why only some cold fusion community solved the world's energy problems.

edit: actually, in all fairness, I think there may be some not-bad ideas to explore in the work you see on LW. It is just that what you see normally published as 'decision theory' is pretty well formalized and structured in such a way that one wouldn't have to search an enormous space of possible flaws and possible steel-men and possible flaws in the steel-men etc. to declare something invalid (that is the point of writing things formally and making mathematical proofs: you can expect to see if it's wrong). I don't see any to-the-point formal papers on TDT here.

Crackpot Decision Theories popular around here do not solve any real problem arising from laws of causality operating normally, so there's no point studying them seriously.

Your question is like asking why there's no academic interest in Harry Potter Physics or Geography of Westeros.

Err, this would also predict no academic interest in Newcomb's Problem, and that isn't so.

Not counting philosophers, where's this academic interest in Newcomb's paradox?

Why are we not counting philosophers? Isn't that like saying, "Not counting physicists, where's this supposed interest in gravity?"

I think taw's point was that Newcomb's Problem has no practical applications, and would answer your question by saying that engineers are very interested in gravity. My answer to taw would be that Newcomb's Problem is just an abstraction of Prisoner's Dilemma, which is studied by economists, behavior biologists, evolutionary psychologists, and AI researchers.

Prisoner's Dilemma relies on causality, Newcomb's Paradox is anti-causality. They're as close to each other as astronomy and astrology.

I actually have some sympathy for your position that Prisoner's Dilemma is useful to study, but Newcomb's Paradox isn't. The way I would put it is, as the problems we study increase in abstraction from real world problems, there's the benefit of isolating particular difficulties and insights, and making it easier to make theoretical progress, but also the danger that the problems we pay attention to are no longer relevant to the actual problems we face. (See another recent comment of mine making a similar point.)

Given that we have little more than intuition to guide us on "how much abstraction is too much?", it doesn't seem unreasonable for people to disagree on this topic and pursue different approaches, as long as the possibility of real-world irrelevance isn't completely overlooked.

Prisoner's Dilemma relies on causality, Newcomb's Paradox is anti-causality.

The contents of Newcomb's boxes are caused by the kind of agent you are -- which are (effectively by definition of what 'kind of agent' means) mapped directly to what decision you will take.

Newcomb's paradox can be called anti-causality only in some confused anti-compatibilist sense in which determinism is opposed to free will and therefore "the kind of agent you are" must be opposed to "the decisions you make" -- instead of absolutely correlating to them.

In what way is Newcomb's Problem "anti-causality"?

If you don't like the superpowerful predictor, it works for human agents as well. Imagine you need to buy something but don't have cash on you, so you tell the shopkeeper you'll pay him tomorrow. If he thinks you're telling the truth, he'll give you the item now and let you come back tomorrow. If not, you lose a day's worth of use, and so some utility.

So your best bet (if you're selfish) is to tell him you'll pay tomorrow, take the item, and never come back. But what if you're a bad liar? Then you'll blush or stammer or whatever, and you won't get your good.

A regular Causal agent, however, having taken the item, will not come back the next day - and you know it, and it will show on your face. So in order to get what you want, you have to actually *be* the kind of person who respects their past selves' decisions - a TDT agent, or a CDT agent with some pre-commitment system.

The above has the same attitude to causality as Newcomb's Problem - specifically, it includes another agent rewarding you based on that agent's calculations of your future behaviour. But it's a situation I've been in several times.

EDIT: Grammar.

This example is much like Parfit's Hitchhiker in less extreme form.

Prisoner's Dilemma relies on causality, Newcomb's Paradox is anti-causality.

So, you consider this notion of "causality" more important than actually succeeding? If I showed up in a time machine, would you complain I was cheating?

Also, dammit, karma toll. Sorry, anyone who wants to answer me.

**[deleted]**· 2012-05-28T16:09:28.131Z · score: 3 (3 votes) · LW · GW

"Not counting physicists, where's this supposed interest in gravity?"

Engineering.

Philosophy contains some useful parts, but it also contains massive amounts of bullshit. Starting let's say here.

Decision theory is studied very seriously by mathematicians and others, and they don't care at all for Newcomb's Paradox.

Newcomb himself was not a philosopher.

I think Newcomb introduced it as a simplification of the prisoner's dilemma. The game theory party line is that you should 2-box and defect. But the same logic says that you should defect in iterated PD, if the number of rounds is known. This third problem is popular in academia, outside of philosophy. It is not so popular in game theory, but the game theorists admit that it is problematic.

**[deleted]**· 2012-05-28T16:34:37.501Z · score: 0 (6 votes) · LW · GW

CDT eats the donut "just this once" every time and gets fat. TDT says "I shouldn't eat donuts" and does not get fat.

**[deleted]**· 2012-05-29T11:15:54.612Z · score: 6 (8 votes) · LW · GW

You might want to link to http://lesswrong.com/lw/4sh/how_i_lost_100_pounds_using_tdt/.

**[deleted]**· 2012-05-29T17:56:26.552Z · score: 2 (4 votes) · LW · GW

Or I can just lazily allude to it and then upvote you for linking it.

**[deleted]**· 2012-05-29T18:30:23.539Z · score: 2 (2 votes) · LW · GW

Yeah, I guessed that you were alluding to it, but I thought that people who hadn't read it wouldn't get the allusion.

TDT says "I shouldn't eat donuts" and does not get fat.

The deontological agent might say that. The TDT agent just decides "I will not eat this particular donut now" and it so happens that it would also make decisions not to eat other donuts in similar circumstances.

The use of the term TDT or "timeless" is something that gets massively inflated to mean anything noble sounding. All because there is one class of contrived circumstance in which the difference between CDT and TDT is that TDT will cooperate.

**[deleted]**· 2012-05-29T11:15:15.368Z · score: 2 (2 votes) · LW · GW

It might not be rigorous, but it's still a good analogy IMO. Akrasia can be seen as you and your future self playing a non-zero-sum game, which in some cases has PD-like payoffs.

**[deleted]**· 2012-05-28T17:08:02.050Z · score: 2 (4 votes) · LW · GW

The TDT agent just decides "I will not eat this particular donut now" and it so happens that it would also make decisions not to eat other donuts in similar circumstances.

Right. I was being a bit messy with describing the TDT thought process. The point is that TDT considers all donut-decisions as a single decision.

**[deleted]**· 2012-05-28T16:08:44.735Z · score: 0 (2 votes) · LW · GW

Crackpot Decision Theories popular around here do not solve any real problem arising from laws of causality operating normally, so there's no point studying them seriously.

Yeah, assuming a universe where causality only goes forward in time *and where your decision processes are completely hidden from outside*, CDT works; but humans are not perfect liars, so they leak out information about the decision they're about to make *before* they start to consciously act upon it, so the assumptions of CDT are only approximately true, and in some cases TDT may return better results.

I think it's right to say that these aren't really "fair" problems, but they are unfair in a very interesting new way that Eliezer's definition of fairness doesn't cover, and it's not at all clear that it's possible to come up with a nice new definition that avoids this class of problem. They remind me of "Lucas cannot consistently assert this sentence".

Problem 2 reminds me strongly of playing GOPS.

For those who aren't familiar with it, here's a description of the game. Each player receives a complete suit of standard playing cards, ranked Ace low through King high. Another complete suit, the diamonds, is shuffled (or not, if you want a game of complete information) and put face down on the table; these diamonds have point values Ace=1 through King=13. In each trick, one diamond is flipped face-up. Each player then chooses one card from their own hand to bid for the face-up diamonds, and all bids are revealed simultaneously. Whoever bids highest wins the face-up diamonds, but if there is a tie for the highest bid (even when other players did not tie), then no one wins them and they remain on the table to be won along with the next trick. All bids are discarded after every trick.
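The tie rule is the part that drives the second-guessing, so here is a minimal sketch of how a single trick resolves under the rules as described above. The function name `resolve_trick` and the dictionary representation of bids are illustrative assumptions, not part of any standard GOPS implementation.

```python
def resolve_trick(bids, pot):
    """Resolve one GOPS trick.

    bids: dict mapping player name -> card rank bid (Ace=1 .. King=13).
    pot:  list of diamond point values currently on the table,
          including any carried over from earlier tied tricks.
    Returns (winner or None, points won, pot carried to the next trick).
    """
    top = max(bids.values())
    leaders = [player for player, bid in bids.items() if bid == top]
    if len(leaders) == 1:
        # A unique highest bid takes everything on the table.
        return leaders[0], sum(pot), []
    # A tie for the highest bid means nobody wins, even if other
    # players bid lower; the diamonds stay on the table.
    return None, 0, list(pot)
```

For example, if two players both bid their King for the King of diamonds, `resolve_trick({"a": 13, "b": 13, "c": 2}, [13])` leaves the 13 points on the table and all three bids are discarded.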

Especially when the King comes up early, you can see everyone looking at each other trying to figure out how many levels deep to evaluate "What will the other players do?".

(1) Play my King to be likely to win. (2) Everyone else is likely to do (1) also, which will waste their Kings. So instead play low while they throw away their Kings. (3) If the players are paying attention, they might all realize they should do (2), in which case I should play the highest low card - the Queen. (4+) The 4th+ levels could repeat (2) and (3) mutatis mutandis until every card has been the optimal choice at some level. In practice, players immediately recognize the futility of that line of thought and instead shift to the question: How far down the chain of reasoning are the other players likely to go? And that tends to depend on knowing the people involved and the social context of the game.

Maybe playing GOPS should be added to the repertoire of difficult decision theory puzzles alongside the prisoner's dilemma, Newcomb's problem, Pascal's mugging, and the rest of that whole intriguing panoply. We've had a Prisoner's Dilemma competition here before - would anyone like to host a GOPS competition?

I'm going to play this game at LW meetups in future. Hopefully some insights will arise out of it.

I also think I might try to generalise this kind of problem, in the vein of trolley problems being a generalisation of some types of decisions and Parfit's Hitchhiker being a generalisation of precommitment-favouring situations.

The problems look like a kind of an anti-Prisoner's Dilemma. An agent plays against an opponent, and gets a reward iff they played differently. Then any agent playing against itself is screwed.
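A minimal sketch of that observation, assuming a two-move game and a reward of 1 for playing differently (both the move set and the payoff size are illustrative assumptions). A deterministic agent facing an exact copy always matches it, and even two copies randomising independently cannot do better than 1/2 on average.

```python
def payoff(my_move, their_move):
    """Anti-Prisoner's-Dilemma payoff: reward 1 iff the two moves differ."""
    return 1 if my_move != their_move else 0

def self_play_deterministic(agent):
    """Payoff when a deterministic agent (a zero-argument callable)
    plays against an exact copy of itself."""
    return payoff(agent(), agent())

def expected_self_play_mixed(p):
    """Expected payoff when two copies each play move "A" with
    probability p, using independent randomness.
    They differ with probability p*(1-p) + (1-p)*p."""
    return 2 * p * (1 - p)
```

The mixed case peaks at `p = 0.5`, so the most a self-playing agent can hope for here is an expected payoff of 0.5, and only if its copy's randomness is not shared with it.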

The more I think about it, the more interesting these problems get! Problem 1 seems to re-introduce all the issues that CDT has on Newcomb's Problem, but for TDT. I first thought to introduce the ability to 'break' with past selves, but that doesn't actually help with the simulation problem.

It did lead to a cute observation, though. Given that TDT cares about all sufficiently accurate simulations of itself, *it's actually winning*.

- It one-boxes in Problem 1; thus ensuring that its simulacrum one-boxed in Omega's pre-game simulation, so TDT walked away with $2,000,000 (whereas CDT, unable to derive utility from a simulation of TDT, walked away with $1,001,000.) This is proofed against increasing the value of the second box; TDT still gains at least 1 dollar more (when the second box is $999,999), and simply two-boxes when the second box is as or more valuable.
- In Problem 2, it picks in such a way that Omega must run at least 10 trials and the game itself; this means 11 TDT agents have had a 10% shot at $1,000,000. With an expected value of $1,100,000 it is doing better than the CDT agents walking away with $1,000,000.
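The arithmetic behind both bullets can be written out as follows. This is only a sketch of the accounting used in this comment: it assumes that payouts to simulated TDT agents count towards TDT's total, and it reads "the second box" as Box A, both of which are interpretations rather than part of the problem statement. The function names are hypothetical.

```python
PRIZE = 1_000_000

def problem1_totals(box_a):
    """Totals for Problem 1 under this comment's accounting.

    TDT: the simulated one-boxer and the real one-boxer each get PRIZE.
    CDT: a single real two-boxer gets PRIZE plus the contents of Box A.
    """
    tdt_total = PRIZE + PRIZE
    cdt_total = PRIZE + box_a
    return tdt_total, cdt_total

def problem2_expected_total(instances=11, chance_num=1, chance_den=10):
    """Expected total for Problem 2: each of `instances` TDT agents
    (10 simulations plus the real game) has a 1-in-10 shot at PRIZE."""
    return instances * PRIZE * chance_num // chance_den
```

With Box A at $1000 this gives $2,000,000 against $1,001,000; with Box A at $999,999 the margin shrinks to exactly $1; and the Problem 2 figure is 11 × 0.1 × $1,000,000 = $1,100,000.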

It doesn't seem very relevant, but I think if we explored Richard's point that we need to actually formalise this, we'd find that any simulation high-fidelity enough to actually bind a TDT agent to its previous actions would necessarily give the agent the utility from the simulations, and vice versa, any simulation not accurate enough to give utility would be sufficiently different from TDT to allow our agent to two-box when that agent one-boxed.

**[deleted]**· 2012-05-23T11:15:39.291Z · score: 8 (8 votes) · LW · GW

Omega doesn't need to simulate the agent actually getting the reward. After the agent has made its choice, the simulation can just end.

**[deleted]**· 2012-05-28T09:31:42.063Z · score: 1 (1 votes) · LW · GW

Omega is supposed to be always truthful, so either he rewards the sims as well, or you know something the sims don't and hence it's not obvious you'll do the same as them.

**[deleted]**· 2012-05-28T10:15:10.034Z · score: 0 (0 votes) · LW · GW

I thought Omega was allowed to lie to sims.

Even if he's not, after he's given a $1m simulated reward, does he then have to keep up a simulated environment for the sim to actually spend the money?

**[deleted]**· 2012-05-28T11:14:57.479Z · score: 1 (1 votes) · LW · GW

If he can lie to sims, then you can't know he's not lying to you unless you know you're not a sim. If you do, it's not obvious you'd choose the same way as if you didn't.

For instance, if you think Omega is lying and completely ignore everything he says, you obviously two-box.

Why not zero-box in this case? I mean, what reason would I have to expect any money at all?

Well, as long as you believe Omega enough to think no box contains sudden death or otherwise negative utility, you'd open them to see what was inside. But yes, you might not believe Omega at all.

General question: suppose we encounter an alien. We have no idea what its motivations, values, goals, or abilities are. On the other hand, it may have observed any amount of human comm traffic from wireless EM signals since the invention of radio, and from actual spy-probes before the human invention of high tech that would detect them.

It signals us in Morse code from its remote starship, offering mutually beneficial trade.

What prior should we have about the alien's intention? Should we use a naive uniform prior that would tell us it's as likely to mean us good as harm, and so never reply because we don't know how it will try to influence our actions via communications? Should it tell us different agents who don't explicitly value one another will conflict to the extent their values differ, and so since value-space is vast and a randomly selected alien is unlikely to share many values with us, we should prepare for war? Should it tell us we can make some assumptions (which?) about naturally evolved agents or their Friendly-to-themselves creations? How safe are we if we try to "just read" English text written by an unknown, possibly-superintelligence which may have observed all our broadcast traffic since the age of radio? What does our non-detection of this alien civ until they chose to initiate contact tell us? Etc.

A 50% chance of meaning us good vs harm isn't a prior I find terribly compelling.

There's a lot to say here, but my short answer is that this is both an incredibly dangerous and incredibly valuable situation, in which both the potential opportunity costs and the potential actual costs are literally astronomical, and in which there are very few things I can legitimately be confident of.

The best I can do in such a situation is to accept that my best guess is overwhelmingly likely to be wrong, but that it's slightly *less* likely to be wrong than my second-best guess, so I should operate on the basis of my best guess *despite* expecting it to be wrong. Where "best guess" here is the thing I consider most likely to be true, *not* the thing with the highest expected value.

I should also note that my priors about aliens in general -- that is, what I consider likely about a randomly selected alien intelligence -- are less relevant to this scenario than what I consider likely about *this particular* intelligence, given that it has observed us for long enough to learn our language, revealed itself to us, communicated with us in Morse code, offered mutually beneficial trade, etc.

The most tempting belief for me is that the alien's intentions are essentially similar to ours. I can even construct a plausible sounding argument for that as my best guess... we're the only *other* species I know capable of communicating the desire for mutually beneficial trade in an artificial signalling system, so our behavior constitutes strong evidence for their behavior. OTOH, it's pretty clear to me that the *reason* I'm tempted to believe that is because I can *do* something with that belief; it gives me a lot of traction for thinking about what to do next. (In a nutshell, I would conclude from that assumption that it means to exploit us for its long-term benefit, and whether that's good or bad for us depends entirely on what our most valuable-to-it resources are and how it can most easily obtain them and whether we benefit from that process.) Since that has almost nothing to do with the *likelihood* of it being true, I should distrust my desire to believe that.

Ultimately, I think what I do is reply that I value mutually beneficial trade with them, but that I don't actually trust them and must therefore treat them as a potential threat until I have gathered more information about them, while at the same time refraining from doing anything that would significantly reduce our chances of engaging in mutually beneficial trade in the future, and what do they think about all that?

I thought Omega was allowed to lie to sims.

He can certainly give them counterfactual 'realities'. It would seem that he should be assumed to at least provide counterfactual realities wherein information provided by the simulation's representation of Omega indicates that he is perfectly trustworthy.

Even if he's not, after he's given a $1m simulated reward, does he then have to keep up a simulated environment for the sim to actually spend the money?

No. But if for whatever reason the simulated environment persists it should be one that is consistent with Omega keeping his word. Or, if part of the specification of the problem or the declarations made by Omega directly pertain to claims about what He will do regarding simulation then he will implement that policy.

Omega (who experience has shown is always truthful)

Omega doesn't need to simulate the agent actually getting the reward. After the agent has made its choice, the simulation can just end.

If we are assuming that Omega is trustworthy, then Omega needs to be assumed to be trustworthy in the simulation too. If they didn't allow the simulated version of the agent to enjoy the fruits of their choice, then they would not be trustworthy.

Actually, I'm not sure this matters. If the simulated agent knows he's not getting a reward, he'd still want to choose so that the nonsimulated version of himself gets the best reward.

So the problem is that the best answer is unavailable to the simulated agent: in the simulation you should one box and in the 'real' problem you'd like to two box, but you have no way of knowing whether you're in the simulation or the real problem.

Agents that Omega didn't simulate don't have the problem of worrying whether they're making the decision in a simulation or not, so two boxing is the correct answer for them.

The decisions being made are very different between an agent that has to make the decision twice and the first decision will affect the payoff of the second versus an agent that has to make the decision only once, so I think that in reality perhaps the problem does collapse down to an 'unfair' one because the TDT agent is presented with an essentially different problem to a nonTDT agent.

Then the simulated TDT agent will one-box in Problem 1 so that the real TDT agent can two-box and get $1,001,000. The simulated TDT agent will pick a box randomly with a uniform distribution in Problem 2, so that the real TDT agent can select box 1 like CDT would.

(If the agent is not receiving any reward, it will act in a way that maximises the reward agents sufficiently similar to it would receive. In this situation of 'you get no reward', CDT would be completely indifferent and could not be relied upon to set up a good situation for future actual CDT agents.)

Of course, this doesn't work if the simulated TDT agent is not aware that it won't receive a reward. This strays pretty close to "Omega is all-powerful and out to make sure you lose"-type problems.

Of course, this doesn't work if the simulated TDT agent is not aware that it won't receive a reward.

The simulated TDT agent is not aware that it won't receive a reward, and therefore it does not work.

This strays pretty close to "Omega is all-powerful and out to make sure you lose"-type problems.

Yeah, it doesn't seem right to me that the decision theory being tested is used in the setup of the problem. But I don't think that the ability to simulate without rewarding the simulation is what pushes it over the threshold of "unfair".

I don't think that the ability to simulate without rewarding the simulation is what pushes it over the threshold of "unfair".

It only seems that way because you're thinking from the non-simulated agent's point of view. How do you think you'd feel if you were a simulated agent, and after you made your decision Omega said 'Ok, cheers for solving that complicated puzzle, I'm shutting this reality down now because you were just a simulation I needed to set a problem in another reality'. That sounds pretty unfair to me. Wouldn't you be saying 'give me my money you cheating scum'?

And as has been already pointed out, they're very different problems. If Omega actually is trustworthy, integrating across all the simulations gives infinite utility for all the (simulated) TDT agents and a total $1001000 utility for the (supposedly non-simulated) CDT agent.

It only seems that way because you're thinking from the non-simulated agent's point of view. How do you think you'd feel if you were a simulated agent, and after you made your decision Omega said 'Ok, cheers for solving that complicated puzzle, I'm shutting this reality down now because you were just a simulation I needed to set a problem in another reality'. That sounds pretty unfair to me. Wouldn't you be saying 'give me my money you cheating scum'?

We were discussing if it is a "fair" test of the decision theory, not if it provides a "fair" experience to any people/agents that are instantiated within the scenario.

And as has been already pointed out, they're very different problems. If Omega actually is trustworthy, integrating across all the simulations gives infinite utility for all the (simulated) TDT agents and a total $1001000 utility for the (supposedly non-simulated) CDT agent.

I am aware that they are different problems. That is why the version of the problem in which simulated agents get utility that the real agent cares about does nothing to address the criticism of TDT that it loses in the version where simulated agents get no utility. Postulating the former in response to the latter was a fail in using the Least Convenient Possible World.

The complaints about Omega being untrustworthy are weak. Just reformulate the problem so Omega says to all agents, simulated or otherwise, "You are participating in a game that involves simulated agents and you may or may not be one of the simulated agents yourself. The agents involved in the game are the following: <describes agents' roles in third person>".

The complaints about Omega being untrustworthy are weak. Just reformulate the problem so Omega says to all agents, simulated or otherwise, "You are participating in a game that involves simulated agents and you may or may not be one of the simulated agents yourself. The agents involved in the game are the following: <describes agents' roles in third person>".

Good point.

That clears up the summing utility across possible worlds possibility, but it still doesn't address the fact that the TDT agent is being asked to (potentially) make two decisions while the non-TDT agent is being asked to make only one. That seems to me to make the scenario unfair (it's what I was trying to get at in the 'very different problems' statement).

The simulated TDT agent is not aware that it won't receive a reward, and therefore it does not work.

This raises an interesting problem, actually. Omega could pose the following question:

Here are two boxes, A and B; you may choose either box, or take both. You are in one of two states of nature, with equal probability: one possibility is that you're in a simulation, in which case you will receive no reward, no matter what you choose. The other possibility is that a simulation of this problem was presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please make your choice.

The solution for a TDT agent seems to be choosing box B, but there may be similar games where it makes sense to run a mixed strategy. I don't think that it makes much sense to rule out the possibility of running mixed strategies across simulations, because in most models of credible precommitment the other players do *not* have this kind of foresight (although Omega possibly does).
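A rough expected-value check for the pure strategies, under the assumption that the simulated and the real TDT agent follow the same deterministic policy, so that the real agent's Box B is filled according to its own policy. The function name `expected_payoff` and the restriction to the two policies "one-box" (B only) and "two-box" are illustrative simplifications.

```python
from fractions import Fraction

def expected_payoff(policy, p_simulated=Fraction(1, 2)):
    """Expected reward for an agent that cannot tell whether it is the
    simulation (reward 0) or the real player.

    policy: "one-box" (take only Box B) or "two-box" (take both).
    Assumes the simulated agent used the same policy, so Box B holds
    $1 million exactly when the policy is "one-box".
    """
    box_a = 1000
    box_b = 1_000_000 if policy == "one-box" else 0
    real_payoff = box_b if policy == "one-box" else box_a + box_b
    # The simulated branch contributes nothing to the expectation.
    return (1 - p_simulated) * real_payoff
```

Under these assumptions one-boxing is worth $500,000 in expectation and two-boxing $500, which matches the choice of box B; mixed strategies across simulations are not modelled here.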

And yes, it is still the case that a CDT agent can outperform TDT, as long as the TDT agent knows that *if* she is in a simulation, her choice will influence a real game played by a TDT, with some probability. Nevertheless, as the probability of "leaking" to CDT increases, it does become more profitable (AIUI) for TDT to two-box with low probability.

The simulated TDT agent is not aware that it won't receive a reward, and therefore it does not work. ... I don't think that the ability to simulate without rewarding the simulation is what pushes it over the threshold of "unfair".

I do agree. I think my previous post was still exploring the "can TDT break with a simulation of itself?" question, which is interesting but orthogonal.

**[deleted]**· 2012-05-23T11:34:34.437Z · score: 2 (2 votes) · LW · GW

Corollary: Omega can statically analyse the TDT agent's decision algorithm.

"Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. ...."

This needs some serious mathematics underneath it. Omega is supposed to run a simulation of how an agent of a certain sort handled a certain problem, the result of that simulation being a part of the problem itself. I don't think it's possible to tell, just from these English words, that there is a solution to this fixed-point formulation. And TDT itself hasn't been formalised, although I assume there are people (Eliezer? Marcello? Wei Dai?) working on that.

Cf. the construction of Gödel sentences: you can't just assume that a proof-system can talk about itself, you have to explicitly construct a way for it to talk about itself and show precisely what "talking about itself" means, before you can do all the cool stuff about undecidable sentences, Löb's theorem, and so on.

This seems well-specified to me: Since the agent is not told its own output in advance, it is possible to run the "simulation" and the "real version" in finite time. If you hand me a computer program that is the agent, I will hand you a computer program that is Omega and the environment.
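As a minimal sketch of that exchange for Problem 1: agents are modelled as zero-argument callables returning `"one-box"` or `"two-box"`, and `tdt_agent` is a hypothetical stand-in, since TDT itself has not been formalised. Because neither call depends on the other's output, the whole thing terminates.

```python
def omega_problem_1(agent, tdt_agent):
    """Omega plus environment for Problem 1.

    agent:     the real player, a callable returning "one-box" or "two-box".
    tdt_agent: a stand-in for "an agent running TDT" (an assumption; any
               callable with the same return values will do).
    Returns the real player's payout in dollars.
    """
    # Step 1: run the simulation before the real player enters the room.
    simulated_choice = tdt_agent()

    # Step 2: fill the boxes based only on the simulated choice.
    box_a = 1000
    box_b = 1_000_000 if simulated_choice == "one-box" else 0

    # Step 3: let the real player choose and pay out accordingly.
    choice = agent()
    return box_b if choice == "one-box" else box_a + box_b
```

If the stand-in one-boxes, a real player that is the same one-boxing program gets $1,000,000, while an unconditional two-boxer gets $1,001,000, which is the gap the post is about.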

Here is a useful program:

if (being_simulated()) pick_box_b(); else pick_both_boxes();

being_simulated() is an interesting function; it would presumably consist of things like comparing star maps across as broad a range of EM frequencies as possible with historical knowledge, the positions of planetary bodies, Earthly satellites, quasar signatures, and inputs of other individuals and networks. It could also be as simple as "if (get_current_average_cosmic_ray_bit_flips() != historical_bit_flips_average) return true;" Since my brain tries to play Omega with me every night (well, on the nights I get good sleep) I think we all have some practical experience with being_simulated(). While our brains are nowhere near as capable as Omega, they are fairly good at suspending our own disbelief during dreams. What generally breaks disbelief is trying to behave rationally, including observing/testing the environment which in reality would be the outside world but in a dream is simulated. Ultimately Omega is forced to simulate a very large subset of the Universe in order to cause being_simulated() to return false, or to alter the program itself so that being_simulated vacuously returns true.

To prevent the latter attack the program should execute a fully homomorphically encrypted sub-function and base all its decisions on the outputs of that function. In other words its true decision theory should be hidden even from the hardware running it. This pushes Omega's capabilities into the cryptographic realm and the ability to solve arbitrarily hard cryptography problems. Once this is done, Omega becomes an oracle for solving any hard problem, e.g. by running this function as an input to the final decision theory:

if (strange_paradoxical_decision_is_required()) { bank_secret = factor_omegas_bank_account_public_key(); return sign_and_cash_check_for_one_billion_dollars(bank_secret); }

This restricts Omega to putting a time limit (or computation limit) on its simulations to avoid being suckered by its own computing power. Presumably Omega also puts a time limit on the real time a being can spend thinking before choosing which boxes it wants, but presumably that time is long enough for some fairly serious philosophizing and likely more than enough time for a reliable being_simulated() call.

Another approach is to consider how much information an intelligent being can gather about its environment. A human would have trouble determining whether two boxes are empty or full, but additional sensors could detect the presence of money in the boxes fairly reliably. What matters for Omega is making sure that a being cannot determine the contents of the boxes before picking them. From the perspective of a rational being this is equivalent to the boxes being filled with cash after making a decision. If Omega has the capability to obscure the contents of boxes then Omega certainly has the ability to obscure the placement of money into the boxes as they are chosen (just a glorified magic trick). Given that interpretation, CDT will one-box.

EDIT: I apologize for the formatting, I am not very good at escaping/formatting apparently.

if (being_simulated()) pick_box_b(); else pick_both_boxes()

This strategy is discussed in the follow-up article.

In general it's difficult, because by assumption Omega has the computational power to simulate more or less anything (including an environment matching the world as you remember it; this might be like the real world, or you might have spent your whole life so far as a sim). And the usual environment for these problems is a sealed room, so that you can't look at the stars etc.

But TDT already has this problem - TDT is all about finding a fixed point decision.

**[deleted]**· 2012-05-28T09:08:53.926Z · score: 6 (6 votes) · LW · GW

Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT.

If he's always truthful, then he didn't lie to the simulation either and this means that he did infinitely many simulations before that. So assume he says "Either before you entered the room I ran a simulation of this problem as presented to an agent running TDT, or you are such a simulation yourself and I'm going to present this problem to the real you afterwards", or something similar. If he says different things to you and to your simulation instead, then it's not obvious you'll give the same answer.

Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair?

Well, a TDT agent has indexical uncertainty about whether or not they're in the simulation, whereas a CDT or EDT agent doesn't. But I haven't thought this through yet, so it might turn out to be irrelevant.

This question of "Does Omega lie to sims?" was already discussed earlier in the thread. There were several possible answers from cousin_it and myself, any of which will do.

So assume he says "Either before you entered the room I ran a simulation of this problem as presented to an agent running TDT, or you are such a simulation yourself and I'm going to present this problem to the real you afterwards", or something similar.

...

Well, a TDT agent has indexical uncertainty about whether or not they're in the simulation, whereas a CDT or EDT agent doesn't.

Say you have a CDT agent in the world, affecting the world via a set of robotic hands, a robotic voice, and so on. If you wire up two robot bodies to one computer (in parallel, so that all movements are done by both bodies), that is just a somewhat peculiar robotic manipulator. Handling this doesn't require any changes to CDT.

Likewise, when you have two robot bodies controlled by an identical mathematical equation, provided that your world model in the CDT utility calculation accounts for all the known manipulators which are controlled by the chosen action, you get the correct result.

Likewise, you can have CDT control a multitude of robots, either from one computer, or from multiple computers that independently determine optimal, identical actions (but each computer only acts on the robot body assigned to that computer).

CDT is formally defined using mathematics; the mathematics is already 'timeless', and the fact that the chosen action affects the contents of the boxes is a part of the world model, not the decision theory (and so are physical time and physical causality a part of the world model, not the decision theory. Even though the decision theory is called causal, that's some other 'causal').
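To make the claim concrete, here is a sketch that applies the same "maximise payoff over actions" rule to two different world models of Newcomb's problem. Treating the predictor's simulation as a second body driven by the same action is this comment's framing, not a standard presentation of CDT, and the function names are hypothetical.

```python
ACTIONS = ("one-box", "two-box")

def newcomb_payoff(predicted, actual):
    """Payout given what the predictor's copy did and what the agent does."""
    box_b = 1_000_000 if predicted == "one-box" else 0
    return box_b if actual == "one-box" else 1000 + box_b

def best_action_linked():
    """World model in which the chosen action drives both the predicted
    copy and the real body (two manipulators, one controller)."""
    return max(ACTIONS, key=lambda a: newcomb_payoff(a, a))

def best_action_unlinked(assumed_prediction):
    """World model in which the prediction is a fixed fact that the
    chosen action does not influence."""
    return max(ACTIONS, key=lambda a: newcomb_payoff(assumed_prediction, a))
```

With the linked model the maximiser picks one-boxing; with the unlinked model it picks two-boxing whatever prediction is assumed. The decision rule is identical in both cases and only the world model differs.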

He can't have done literally infinitely many simulations. If that is really required it would be a way out by saying the thought experiment stipulates an impossible situation. I haven't yet considered whether the problem can be changed to give the same result and not require infinitely many simulations.

ETA: no wait, that can't be right, because it would apply to the original Newcomb's problem too. So there must be a way to formalize this correctly. I'll have to look it up but don't have the time right now.

**[deleted]**· 2012-05-28T16:03:14.379Z · score: 1 (1 votes) · LW · GW

In the original Newcomb's problem it's not specified that Omega performs simulations -- for all we know, he might use magic, closed timelike curves, or quantum magic whereby Box A is in a superposition of states entangled with your mind whereby if you open Box B, A ends up being empty and if you hand B back to Omega, A ends up being full.

We should take this seriously: a problem that cannot be instantiated in the physical world should not affect our choice of decision theory.

Before I dig myself in deeper, what does existing wisdom say? What is a practical possible way of implementing Newcomb's problem? For instance, simulation is eminently practical as long as Omega knows enough about the agent being simulated. OTOH, macro quantum entanglement of an arbitrary agent's arbitrary physical instantiation with a box prepared by Omega doesn't sound practical to me, but maybe I'm just swayed by incredulity. What do the experts say? (Including you if you're an expert, obviously.)

**[deleted]**· 2012-05-28T16:37:15.448Z · score: -1 (1 votes) · LW · GW

cannot

0 is not a probability, and even tiny probabilities can give rise to Pascal's mugging.

Unless your utility function is bounded.

0 is not a probability, and even tiny probabilities can give rise to Pascal's mugging.

Even? I'd go as far as to say *only*. Non-tiny probabilities aren't Pascal's muggings. They are just expected utility calculations.

If a problem statement has an internal logical contradiction, there is still a tiny probability that I and everyone else are getting it wrong, due to corrupted hardware or a common misconception about logic or pure chance, and the problem can still be instantiated. But it's so small that I shouldn't give it preferential consideration over other things I might be wrong about, like the nonexistence of a punishing god or that the food I'm served at the restaurant today is poisoned.

Either of those if true could trump any other (actual) considerations in my actual utility function. The first would make me obey religious strictures to get to heaven. The second threatens death if I eat the food. But I ignore both due to symmetry in the first case (the way to defeat Pascal's wager in general) and to trusting my estimation of the probability of the danger in the second (ordinary expected utility reasoning).

AFAICS both apply to considering an apparently self-contradictory problem statement as really not possible with effective probability zero. I might be misunderstanding things so much that it really is possible, but I might also be misunderstanding things so much that the book I read yesterday about the history of Africa really contained a fascinating new decision theory I must adopt or be doomed by Omega.

All this seems to me to fail due to standard reasoning about Pascal's mugging. What am I missing?

**[deleted]**· 2012-05-28T18:16:50.495Z · score: 0 (0 votes) · LW · GW

If a problem statement has an internal logical contradiction

AFAIK Newcomb's dilemma does not logically contradict itself, it just contradicts the physical law that causality cannot go backwards in time.

AFAIK Newcomb's dilemma does not logically contradict itself, it just contradicts the physical law that causality cannot go backwards in time.

It certainly doesn't contradict itself, and I would also assert that it doesn't contradict the physical law that causality cannot go backwards in time. Instead I would say that giving the sane answer to Newcomb's problem requires abandoning the assumption that one's decision must be based only on what it affects through forward-in-time causal, physical influence.

Consider making both boxes transparent to illustrate some related issue.

If that is really required it would be a way out by saying the thought experiment stipulates an impossible situation.

This might be better stated as "incoherent", as opposed to mere impossibility which can be resolved with magic.

I assumed the sims weren't conscious - they were abstract implementations of TDT.

**[deleted]**· 2012-12-25T17:59:29.988Z · score: 0 (0 votes) · LW · GW

Well, then there's stuff you know and the sims don't, which you could take into account when deciding and thence decide something different from what they did.

What stuff? The color of the walls? Memories of your childhood? Unless you have information that *alters your decision* or you're not a perfect implementer of TDT, in which case you get lumped into the category of "CDT, EDT etc."

**[deleted]**· 2012-12-25T23:47:36.162Z · score: 1 (1 votes) · LW · GW

The fact that you're not a sim, and unlike the sims you'll actually be given the money.

Why the hell would Omega program the sim not to value the simulated reward? It's almost certainly just abstract utility anyway.

Thanks for the post! Your problems look a little similar to Wei's 2TDT-1CDT, but much simpler. Not sure about the other decision theory folks, but I'm quite puzzled by these problems and don't see any good answer yet.

I've looked a bit at that thread, and the related follow-ups, and my head is now really spinning. You are correct that my problems were simpler!

My immediate best guess on 2TDT-1CDT is that the human player would do better to submit a simple defect-bot (rather than either CDT or TDT), and this is irrespective of whether the player themselves is running TDT or CDT. If the player has to submit his/her own decision algorithm (source-code) instead of a bot, then we get into a colossal tangle about "who defects first", "whose decision is logically prior to whose" and whether the TDT agents will threaten to defect if they detect that the submitted agent may defect, or has already self-modified into unconditionally defecting, or if the TDT agents will just defect unconditionally anyway to even the score (e.g. through some form of utility trading / long term consequentialism principle that TDT has to beat CDT in the long run, therefore it had better just get on and beat CDT wherever possible...)

In short, I observe I am confused.

With all this logical priority vs temporal priority, and long term consequences feeding into short-term utilities, I'm reminded of the following from HPMOR Chapter 61:

There was a narrowly circulated proverb to the effect that only one Auror in thirty was qualified to investigate cases involving Time-Turners; and that of those few, the half who weren't already insane, soon would be.

Thanks for this, and for the reference. I'll have a look at 2TDT-1CDT to see if there are any insights there which could resolve these problems. I've got a couple of ideas myself, but will check up on the other work.

Here's another similar problem; see also the solution.

Someone may already have mentioned this, but doesn't the fact that these scenarios include self-referencing components bring Goedel's Incompleteness Theorem into play somehow? I.e. As soon as we let decision theories become self-referencing, it is impossible for a "best" decision theory to exist at all.

There was some discussion of much the same point in this comment thread.

One important thing to consider is that there may be a sensible way to define "best" that is not susceptible to this type of problem. Most notably, there may be a suitable, solvable, and realistic subclass of problems over which to evaluate performance. Also, even if there is no "best", there can still be better and worse.

doesn't the fact that these scenarios include self-referencing components bring Goedel's Incompleteness Theorem into play somehow?

Self-reference and the like is necessary for Goedel sentences but not sufficient. It's certainly plausible that this scenario could have a Goedel sentence, but whether the current problem is isomorphic to a Goedel sentence is not obvious, and seems unlikely.

Perhaps referring directly to Goedel was not apt. What Goedel showed was that Hilbert/Russell's efforts were futile. And what Hilbert and Russell were trying to do was create a formal system where actual self-reference was impossible. And the reason they were trying to do that, finally, was that self-reference creates paradoxes which reduce to either incompleteness or inconsistency. And the same is true of these more advanced decision theories. Because they are self-referencing, they create an infinite regress that precludes the existence of a "best" decision theory at all.

So, finding a best decision theory is impossible once self-reference is allowed, because of the nature of self-reference, but not quite because of Goedel's theorems, which are the stronger declaration that any formal system by necessity contains self-referential aspects that make it incomplete or inconsistent.

Can someone answer the following: Say someone implemented an AGI using CDT. What exactly would go wrong that a better decision theory would fix?

It will defect on all prisoner's dilemmas, even if they're iterated. So, for example, if we'd left it in charge of our nuclear arsenal during the cold war, it would have launched missiles as fast as possible.

But I think the main motivation was that, when given the option to self-modify, a CDT agent will self-modify as a method of precommitment - CDT isn't "reflectively consistent." And so if you want to predict an AI's behavior, if you predict based on CDT with no self-modification you'll get it wrong, since it doesn't stay CDT. Instead, you should try to find out what the AI wants to self-modify to, and predict based on that.

A more correct analysis is that CDT defects against *itself* in iterated Prisoner's Dilemma, provided there is any finite bound to the number of iterations. So two CDTs in charge of nuclear weapons would reason "Hmm, the sun's going to go Red Giant at some point, and even if we escape that, there's still that Heat Death to worry about. Looks like an upper bound to me". And then they'd immediately nuke each other.

A CDT playing against a "RevengeBot" - if you nuke it, it nukes back with an all out strike - would never fire its weapons. But then the RevengeBot could just take out one city at a time, without fear of retaliation.

Since CDT was the "gold standard" of rationality developed during the time of the Cold War, I am somewhat puzzled why we're still here.

Well, it's good that you're puzzled, because it wasn't - see Schelling's "The Strategy of Conflict."

I get the point that a CDT would pre-commit to retaliation if it had time (i.e. self-modify into a RevengeBot).

The more interesting question is why it bothers to do that re-wiring when it is expecting the nukes from the other side any second now...

So two CDTs in charge of nuclear weapons would reason "Hmm, the sun's going to go Red Giant at some point, and even if we escape that, there's still that Heat Death to worry about. Looks like an upper bound to me". And then they'd immediately nuke each other.

This assumes that the mutual possession of nuclear weapons constitutes a prisoner's dilemma. There isn't necessarily a positive payoff to nuking folks. (You know, unless they are really jerks!)

Well nuking the other side eliminates the chance that they'll ever nuke you (or will attack with conventional weapons), so there is arguably a slight positive for nuking first as opposed to keeping the peace.

There were some very serious thinkers arguing for a first strike against the Soviet Union immediately after WW2, including (on some readings) Bertrand Russell, who later became a leader of CND. And a pure CDT (with selfish utility) would have done so. I don't see how Schelling theory could have modified that... just push the other guy over the cliff before the ankle-chains get fastened.

Probably the reason it didn't happen was the rather obvious "we don't want to go down in history as even worse than the Nazis" - also there was complacency about how far behind the Soviets actually were. If it had been known that they would explode an A-bomb as little as 4 years after the war, then the calculation would have been different. (Last ditch talks to ban nuclear weapons completely and verifiably - by thorough spying on each other - or bombs away. More likely bombs away I think.)

**[deleted]**· 2012-05-29T09:37:18.853Z · score: 1 (1 votes) · LW · GW

It will defect on all prisoner's dilemmas, even if they're iterated. So, for example, if we'd left it in charge of our nuclear arsenal during the cold war, it would have launched missiles as fast as possible.

I don't think MAD is a prisoner's dilemma: in the prisoner's dilemma, if I know you're going to cooperate no matter what, I'm better off defecting, and if I know you're going to defect no matter what, I'm better off defecting. This doesn't seem to be the case here: bombing you *doesn't* make me better off all things being equal, it just makes you worse off. If anything, it's a game of Chicken where bombing the opponent corresponds to going straight and not bombing them corresponds to swerving. And CDTists don't always go straight in Chicken, do they?

Hm, I disagree - if nuking the Great Enemy never made you any better off, why was anyone ever afraid of anyone getting nuked in the first place? It might not grow your crops for you or buy you a TV, but gains in security and world power are probably enough incentive to at least make people worry.

**[deleted]**· 2012-05-29T11:24:08.216Z · score: 1 (1 votes) · LW · GW

Still better modelled by Chicken (where the utility of winning is assumed to be much smaller than the negative of the utility of dying, but still non-zero) than by PD.

(edited to add a link)

I don't understand what you mean by "modeled better by chicken" here.

I expect army1987's talking about Chicken, the game of machismo in which participants rush headlong at each other in cars or other fast-moving dangerous objects and whoever swerves first loses. The payoff matrix doesn't resemble the Prisoner's Dilemma all that much: there's more than one Nash equilibrium, and by far the worst outcome from either player's perspective occurs when both players play the move analogous to defection (i.e. don't swerve). It's probably most interesting as a vehicle for examining precommitment tactics.

The game-theoretic version of Chicken *has* often been applied to MAD, as the Wikipedia page mentions.
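To make the contrast concrete, here is a small sketch that enumerates the pure-strategy Nash equilibria of the two games. The payoff numbers are illustrative assumptions (they do not come from the thread); only their ordering matters, and the function name `pure_nash` is made up for this example.

```python
from itertools import product

def pure_nash(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs[(row_move, col_move)] = (row player's payoff, column player's payoff)
    """
    moves = sorted({row_move for row_move, _ in payoffs})
    equilibria = []
    for a, b in product(moves, repeat=2):
        # Neither player can gain by unilaterally switching moves.
        row_ok = all(payoffs[(a, b)][0] >= payoffs[(alt, b)][0] for alt in moves)
        col_ok = all(payoffs[(a, b)][1] >= payoffs[(a, alt)][1] for alt in moves)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria

# Assumed, illustrative payoffs.
PD = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
CHICKEN = {
    ("swerve", "swerve"): (0, 0), ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1), ("straight", "straight"): (-100, -100),
}

print(pure_nash(PD))       # mutual defection is the only equilibrium
print(pure_nash(CHICKEN))  # two equilibria: exactly one player swerves
```

With these numbers the Prisoner's Dilemma has mutual defection as its single equilibrium, while Chicken has two, and mutual "straight" is the worst cell for both players rather than an equilibrium.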

**[deleted]**· 2012-05-30T10:22:06.774Z · score: 0 (0 votes) · LW · GW

I was. I should have linked to it, and I have now.

even if they're iterated.

That doesn't seem right. Defecting causes the opponent to defect next time. It's a bad idea with any decision theory.

Instead, you should try to find out what the AI wants to self-modify to, and predict based on that.

It won't self-modify to TDT. It will self-modify to something similar, but using its beliefs at the time of modification as the priors. For example, it will use the doomsday argument immediately to find out how long the world is likely to last, and it will use that information from then on, rather than redoing it as its future self (getting a different answer).

That doesn't seem right. Defecting causes the opponent to defect next time. It's a bad idea with any decision theory.

Fair enough. I guess I had some special-case stuff in mind - there are certainly ways to get a CDT agent to cooperate on prisoner's-dilemma-ish problems.

That doesn't seem right. Defecting causes the opponent to defect next time. It's a bad idea with any decision theory.

Reason backwards from the inevitable end of the iteration. Defecting makes sense there, so defecting one turn earlier makes sense, so one turn earlier...
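The backward-induction argument can be sketched in a few lines, under the assumptions that the number of rounds is common knowledge and that both players reason identically. The payoff table and the function names are hypothetical, chosen only to illustrate the argument.

```python
# Payoff to "me" for (my_move, their_move); assumed standard PD ordering.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def dominant_move(continuation):
    """Return the move that is strictly best against every opponent move.

    `continuation` is the value of all later rounds. Later play is already
    fixed by the induction, so it is the same whichever move is played now
    and cancels out of the comparison.
    """
    for mine in ("C", "D"):
        other = "D" if mine == "C" else "C"
        if all(PAYOFF[(mine, theirs)] + continuation
               > PAYOFF[(other, theirs)] + continuation
               for theirs in ("C", "D")):
            return mine
    return None

def backward_induction(n_rounds):
    """Solve the finitely iterated game from the last round to the first."""
    plan = []
    continuation = 0
    for _ in range(n_rounds):
        move = dominant_move(continuation)
        plan.append(move)
        # Both players make the same move, so add the symmetric payoff.
        continuation += PAYOFF[(move, move)]
    return plan[::-1]

print(backward_induction(5))
```

Every round comes out as "D", because at each step the future is already settled and the round reduces to a one-shot dilemma. The argument leans entirely on the known horizon, which is what the replies below pick up on.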

That depends on if it's known what the last iteration will be.

Also, I think any deviation from CDT in common knowledge (such as if you're not sure that they're sure that you're sure that they're a perfect CDT) would result in defecting a finite, and small, number of iterations from the end.

Ah, that second paragraph makes perfect sense. Thanks.

**[deleted]**· 2012-05-28T09:14:09.397Z · score: 1 (1 votes) · LW · GW

I think TDT reduces to CDT if there's no other agent with similar or greater intelligence than you around. (You also mustn't have any dynamical inconsistency such as akrasia, otherwise your future and past selves count as ‘other’ as well.) So I don't think it'd make much of a difference for a singleton -- but I'd rather use an RDT just in case.

I think TDT reduces to CDT if there's no other agent with similar or greater intelligence than you around.

It isn't the absolute level of intelligence that is required, but rather that the other agent is capable of making a specific kind of reasoning. Even this can be relaxed to things that can only dubiously be said to qualify as being classed "agent". The requirement is that some aspect of the environment has (utility-relevant) behavior that is entangled with the output of the decision to be made in a way that is other than a forward in time causal influence. This almost always implies that some agent is involved but that need not necessarily be the case.

Caveat: Maybe TDT is dumber than I remember and artificially limits itself in a way that is relevant here. I'm more comfortable making assertions about what a correct decision theory would do than about what some specific attempt to specify a decision theory would do.

but I'd rather use an RDT just in case.

You make me happy! RDT!

There's a different version of these problems for each decision theory, depending on what Omega simulates. For CDT, all agents two-box and all agents get $1000. However, on problem 2, it seems like CDT doesn't have a well-defined decision at all; the effort to work out what Omega's simulator will say won't terminate.

(I'm spamming this post with comments - sorry!)

You raise an interesting question here - what would CDT do if a CDT agent were in the simulation?

It looks to me that CDT just doesn't have the conceptual machinery to handle this problem properly, so I don't really know. One thing that could happen is that the simulated CDT agent tries to simulate itself and gets stuck in an infinite loop. I didn't specify exactly what would happen in that case, but if Omega can prove that the simulated agent is caught in a loop, then it knows the sim will choose each box with probability zero, and so (since these are all equal), it will fill box 1. But now can a real-life CDT agent also work this out, and beat the game by selecting box 1? But if so, why won't the sim do that, and so on? Aargh!!!

Another thought I had is that CDT could try tossing a logical coin, like computing the googolth digit of pi, and if it is even choose box 1, whereas if it is odd, choose box 2. If it runs out of time before computing (which the real-life agent will do), then it just picks box 1 or 2 with equal probability. The simulated CDT agent will however get to the end of the computation (Omega has arbitrary computational resources) and definitely pick 1 or 2 with certainty, so the money is definitely in one of those two boxes, which looks like the probability of the actual agent winning is raised to 50%. TDT might do the same.

However this looks like cheating to me, for both CDT and TDT.

EDIT: On reflection, it seems clear that CDT would never do anything "creatively sneaky" like tossing a logical coin; but it is the sort of approach that TDT (or some variant thereof) might come up with. Though I still think it's cheating.

I don't think your "detect infinite resources and cheat" strategy is really worth thinking about. Instead of strategies like CDT and TDT whose applicability to limited compute resources is unclear, suppose you have an anytime strategy X, which you can halt at any time and get a decision. Then there's really a family of algorithms X-t, where t is the time you're going to give it to run. In this case, if you are X-t, we can consider the situation where Omega fields X-t against you.

The version of CDT that I described explicitly should arrive at the uniformly random solution. You don't have to be able to simulate a program all the way through, just able to prove things about its output.

**EDIT:** Wait, this is wrong. It won't be able to consistently derive an answer, because of the way it acts given such an answer, and so it will go with whatever its default Nash equilibrium is.

Re: your EDIT. Yes, I've had that sort of reaction a couple of times today!

I'm shifting around between "CDT should pick at random, no CDT should pick Box 1, no CDT should use a logical coin, no CDT should pick its favourite number in the set {1, 2} with probability 1, and hope that the version in the sim has a different favourite number, no, CDT will just go into a loop or collapse in a heap."

I'm also quite clueless how a TDT is supposed to decide if it's told there's a CDT in the sim... This looks like a pretty evil decision problem in its own right.

Well, the thing is that CDT doesn't *completely* specify a decision theory. I'm confident now that the specific version of CDT that I described would fail to deduce anything and go with its default, but it's hard to speak for CDTs in general on such a self-referential problem.

Intuitively this doesn't feel like a 'fair' problem. A UDT agent would ace the TDT formulation and vice versa. Any TDT agent that found a way of distinguishing between 'themselves' and Omega's TDT agent would also ace the problem. It feels like an acausal version of something like:

"I get agents A and B to choose one or two boxes. I then determine the contents of the boxes based on my best guess of A's choice. Surprisingly, B succeeds much better than A at this."

Still an intriguing problem, though.

Problems 1 and 2 both look - to me - like fancy versions of the Discrimination problem. **edit: I am much less sure of this.** That is, Omega changes the world based on whether the agent implements TDT. **This bit I am still sure of, but it might be the case that TDT can overcome this anyway.**

**Discrimination problem**: Money Omega puts in room if you're TDT = $1,000. Money Omega puts in room if you're not = $1,001,000.

**Problem 1**: Money Omega puts in room if you're TDT = $1,000 or **$1,001,000. Edit: made a mistake. The error in this problem may be subtler than I first claimed**. Money Omega puts in room if you're not = $1,001,000.

**Problem 2**: $1,000,000 either way. This problem is different but also uninteresting. Due to Omega caring about TDT again, it is just the smallest interesting number paradox for TDT agents only. Other decision theories get a free ride because you're just asking them to reason about an algorithm (easy to show it produces a uniform distribution) and then a maths question (which box has the smallest number on it?).

You claim the rewards are

independent of the method that the agent uses to choose

but they're not. They depend on whether the agent uses TDT to choose or not.

I've edited the problem statement to clarify Box A slightly. Basically, Omega will put $1,001,000 in the room ($1000 for box A and $1 million for Box B) regardless of the algorithm run by the actual deciding agent. The contents of the boxes depend only on what the simulated agent decides.

Agree. You use process X to determine the setup and agents instantiating X are going to be constrained. Any decision theory would be at a disadvantage when singled out like this.

Problem 1: Money Omega puts in room if you're TDT = $1,000 or $1,000,000.

Sorry, shouldn't it be "$1,000 or $1,001,000"?

Right, but $1,001,000 only in the case where you restrict yourself to picking $1,000,000. I oversimplified and it might not actually be accurate.

I think we need a 'non-problematic problems for CDT' thread.

For example, it is not problematic for a CDT-based robot controller to have the control values in the action A represent multiple servos in its world model, as if you wired multiple robot arms to 1 controller in parallel. You may want to do this if you want the robot arms to move in unison and pass along the balls in the real world imitation of http://blueballmachine2.ytmnd.com/

It is likewise not problematic if you ran out of wire and decided to make the '1 controller' be physically 2 controllers running identical code from above, or if you ran out of time machines and decided to control yesterday's servo with 1 controller yesterday, and today's servo with same controller in same state today. It's simply low level, irrelevant details.

A mathematical formalization of CDT (such as robot software) will one-box or two-box in Newcomb depending on the world model within which CDT decides. If the world model has the 'prediction' as a second servo represented by the same variable, then it'll one-box.
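A minimal sketch of that claim, assuming the standard Newcomb payoffs and two hypothetical world models: one in which the prediction is a separate, fixed fact, and one in which the prediction is "wired" to the same control variable as the action. The function names are made up for illustration.

```python
def newcomb_payoff(action, prediction):
    """action/prediction: 1 = take only the opaque box, 2 = take both boxes."""
    opaque = 1_000_000 if prediction == 1 else 0
    return opaque if action == 1 else 1000 + opaque

def best_action_independent(prediction):
    """World model where the prediction is a fixed fact the action cannot move."""
    return max((1, 2), key=lambda a: newcomb_payoff(a, prediction))

def best_action_linked():
    """World model where the prediction is driven by the same variable as the action."""
    return max((1, 2), key=lambda a: newcomb_payoff(a, a))

print(best_action_independent(1), best_action_independent(2))  # two-box either way
print(best_action_linked())                                    # one-box
```

The maximisation step is identical in both functions; only the world model differs, which is the point being made about where the one-box/two-box disagreement actually lives.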

Philosophical maxims like "act based on consequences of my actions", whether they one-box or two-box, depend in turn solely on philosophical questions like "what is self". E.g. if "self" means the physical meat, then two-box; if "self" means the algorithm (a higher level concept), then one-box if you assume that the thing in the predictor is "self" too.

edit: another thing. Stuff outside the robot's senses is naturally uncertain. Upon hearing the explanation in Newcomb's paradox, one has to update the estimates of what is outside the senses; outside might be that the money is fake, and there's some external logic and wiring and servos that will put a real million into a box if you choose to 1-box. If the money is to pay for, I dunno, your child's education, clearly one has got to 1-box. I'm pretty sure Causal Deciding General Thud can 1-box just fine, if he needs the money to buy the real weapons for the real army, and suspects that outside his senses there may be the predictor spying. General Thud knows that the best option is to 1-box inside the predictor and 2-box outside. The goal is never to two-box outside the predictor.

Let's say that TDT agents can be divided into two categories, TDT-A and TDT-B, based on a single random bit added to their source code in advance. Then TDT-A can take the strategy of always picking the first box in Problem 2, and TDT-B can always pick the second box.

Now, if you're a TDT agent being offered the problem; with the aforementioned strategy, there's a 50% chance that the simulated agent is different than you, netting you $1 million. This also narrows down the advantage of the CDT agent - now they only have a 50% chance of winning the money, which is equal to yours.

Actually, the way the problem is specified, Omega puts the money in box 3.

The argument is that the simulation is either TDT-A in this case, or TDT-B. Either way, the simulated agent will pick a single favourite box (1 or 2) with certainty, so the money is in either Box 2 or Box 1.

Though I can see an interpretation which leads to Box 3. Omega simulates a "new-born" TDT (which is neither -A nor -B) and watches as it differentiates itself to one variant or the other, each with equal probability. So the new-born picks boxes 1 and 2 with equal frequency over multiple simulations, and Box 3 contains the money. Is that what you were thinking?

Is that what you were thinking?

Yes. I was thinking that Omega would have access to the agent's source code, and be running the "play against yourself, if you pick a different number than yourself you win" game. Omega is a jerk :D

If it's your own exact source being simulated, then it's probably impossible to do better than 10%, and the problem isn't interesting anymore.

That's not too bad, actually. One of my ideas while thrashing about here was that an agent should have a "favourite" number in the set {1, 2} and pick that number with certainty. That way, Omega will definitely put the $1 million in Box 1 or Box 2 and each agent will have 50% chance that their favourite number disagrees with the simulated agent's.

This won't work if Omega describes the source-code of the simulation (or otherwise reveals the simulation's favourite number) - since then any agent with that exact code knows it can't choose deterministically, and its best chance is to pick each box with equal chance, as described in the original analysis.
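The cases discussed in this sub-thread can be checked with a short sketch. It assumes the placement rule as the commenters describe it (the money goes in the lowest-numbered box among those the simulated agent picks least often) and ten boxes, which is inferred from the "10%" figure above. The helper names are hypothetical.

```python
from fractions import Fraction

def omega_box(sim_probs):
    """Lowest-numbered box among those the simulated agent picks least often."""
    lowest = min(sim_probs.values())
    return min(box for box, p in sim_probs.items() if p == lowest)

def distribution(weights, n_boxes=10):
    """Expand a sparse {box: probability} dict to all boxes 1..n_boxes."""
    return {box: Fraction(weights.get(box, 0)) for box in range(1, n_boxes + 1)}

def win_chance(own_probs, sim_probs):
    """Probability that the real agent picks the box Omega filled."""
    return own_probs[omega_box(sim_probs)]

favourite_1 = distribution({1: 1})                                 # always box 1
favourite_2 = distribution({2: 1})                                 # always box 2
newborn = distribution({1: Fraction(1, 2), 2: Fraction(1, 2)})     # 50/50 over 1 and 2
uniform = distribution({b: Fraction(1, 10) for b in range(1, 11)})

print(omega_box(favourite_1), omega_box(favourite_2))  # money in box 2, box 1
print(omega_box(newborn))                              # money in box 3
print(win_chance(favourite_1, favourite_2))            # favourites differ: certain win
print(win_chance(favourite_1, favourite_1))            # same favourite: certain loss
print(win_chance(uniform, uniform))                    # identical code, uniform mix: 1/10
```

When the real agent and the simulation share the same distribution, the win chance equals the smallest probability in that distribution, which is why the uniform mix is the best an exact copy can do.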

These questions seem decidedly UNfair to me.

No, they don't depend on the agent's decision-making algorithm; just on another agent's specific decision-making algorithm skewing results against an agent with an identical algorithm and letting all others reap the benefits of an otherwise non-advantageous situation.

So, a couple of things:

While I have not mathematically formulated this, I suspect that absolutely any decision theory can have a similar scenario constructed for it, using another agent / simulation with that specific decision theory as the basis for payoff. Go ahead and prove me wrong by supplying one where that's not the case...

It would be far more interesting to see a TDT-defeating question that doesn't have "TDT" (or taboo versions) as part of its phrasing. In general, questions of how a decision theory fares when agents can scan your algorithm **and** decide to discriminate against that algorithm specifically, are not interesting - because they are losing propositions in any case. When another agent has such profound understanding of how you tick and malice towards that algorithm, you have already lost.

The interaction between this simulated TDT and you is so complicated that I don't think many of the commenters here actually did the math to see how they should expect the simulated TDT agent to react in these situations. I know I didn't. I tried, and failed.

Maybe I'm missing something, but the formalization looks easy enough to me...

```
def tdt_utility():
    if tdt(tdt_utility) == 1:
        box1 = 1000
        box2 = 1000000
    else:
        box1 = 1000
        box2 = 0
    if tdt(tdt_utility) == 1:
        return box2
    else:
        return box1+box2

def your_utility():
    if tdt(tdt_utility) == 1:
        box1 = 1000
        box2 = 1000000
    else:
        box1 = 1000
        box2 = 0
    if you(your_utility) == 1:
        return box2
    else:
        return box1+box2
```

The functions tdt() and you() accept the source code of a function as an argument, and try to maximize its return value. The implementation of tdt() could be any of our formalizations that enumerate proofs successively, which all return 1 if given the source code to tdt_utility. The implementation of you() could be simply "return 2".
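For readers who want something they can run, here is a minimal sketch that replaces the source-code-inspecting agents with constant stubs. This is an assumption made for illustration: `tdt_stub` simply returns 1 (the output the comment says the proof-enumerating formalizations produce on `tdt_utility`), `you_stub` returns 2, and `payoff` collapses both utility functions into one.

```python
def payoff(sim_choice, own_choice):
    """Problem 1 payoff: box contents depend on the simulated TDT agent only.

    1 = take only box 2, 2 = take both boxes.
    """
    box1 = 1000
    box2 = 1_000_000 if sim_choice == 1 else 0
    return box2 if own_choice == 1 else box1 + box2

def tdt_stub():
    """Stand-in for tdt(tdt_utility); assumed to one-box."""
    return 1

def you_stub():
    """Stand-in for you(your_utility): simply 'return 2'."""
    return 2

tdt_payout = payoff(tdt_stub(), tdt_stub())   # TDT agent facing its own simulation
your_payout = payoff(tdt_stub(), you_stub())  # a two-boxer facing the same boxes
counterfactual = payoff(2, 2)                 # if TDT (and hence its sim) two-boxed

print(tdt_payout, your_payout, counterfactual)
```

The stubs hide all the interesting machinery, but they do show the payoff structure: the two-boxer walks away with $1,001,000, the TDT agent with $1,000,000, and a TDT agent that two-boxed would drag its simulation along and get only $1,000.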

Thanks for this. I hadn't seen someone pseudocode this out before. This helps illustrate that interesting problems lie in the scope above (callers to tdt_utility() etc) and below (implementation of tdt() etc).

I wonder if there is a rationality exercise in 'write pseudocode for problem descriptions, explore the callers and implementations'.

I wonder if there is a mathematician in this forum willing to present the issue in a form of a theorem and a proof for it, in a reasonable mathematical framework. So far all I can see is a bunch of ostensibly plausible informal arguments from different points of view.

Either this problem can be formalized, in which case such a theorem is possible to formulate (whether or not it is possible to prove), or it cannot, in which case it is pointless to argue about it.

Either this problem can be formalized, in which case such a theorem is possible to formulate (whether or not it is possible to prove), or it cannot, in which case it is pointless to argue about it.

Or it's hard to formalize.

Or it's hard to formalize.

It's pointless to argue about a decision theory problem until it is formalized, since there is no way to check the validity of any argument.

So, what *ought* one do when interested in a problem (decision theory or otherwise) that one does not yet understand well enough to formalize?

I suspect "go do something else until a proper formalization presents itself" is not the best possible answer for all problems, nor is "work silently on formalizing the problem and don't express or defend a position on it until I've succeeded."

How about "work on formalizing the problem (silently or collaboratively, whatever your style is) and do not defend a position that cannot be successfully defended or refuted"?

Fair enough.

Is there a clear way to distinguish positions worth arguing without formality (e.g., the one you are arguing here) from those that aren't (e.g., the one you are arguing ought not be argued here)?

It's a good question. There ought to be, but I am not sure where the dividing line is.

You check the arguments using mathematical intuition, and you use them to find better definitions. For example, problems involving continuity or real numbers were fruitfully studied for a very long time before rigorous definitions were found.

You check them using mathematical intuition, and you use them to find better definitions.

Indeed, you use them to find better definitions, which is the first step in formalizing the problem. If you argue whose answer is right before doing so (as opposed, say, to which answer ought to be right once a proper formalization is found), you succumb to lost purposes.

For example, "TDT ought to always make the best decision in a certain class of problems" is a valid purpose, while "TDT fails on a Newcomb's problem with a TDT-aware predictor" is not a well-defined statement until every part of it is formalized.

[EDIT: I'm baffled by the silent downvote of my pleas for formalization.]

[EDIT: I'm baffled by the silent downvote of my pleas for formalization.]

If I had to guess, I'd say that the downvoters interpret those pleas, especially in the context of some of your other comments, as an oblique way of advocating for certain topics of discussion to simply not be mentioned at all.

Admittedly, I interpret them that way myself, so I may just be projecting my beliefs onto others.

as an oblique way of advocating for certain topics of discussion to simply not be mentioned at all

Wha...? Thank you for letting me know, though I still have no idea what you might mean, I'd greatly appreciate if you elaborate on that!

I'm not sure I can add much by elaboration.

My general impression of you(1) is that you consider much of the discussion that takes place here, and much of the thinking of the people who do it, to be kind of a silly waste of time, and that you further see your role here in part as the person who points that fact out to those who for whatever reason have failed to notice it.

Within that context, responding to a comment with a request to formalize it is easy to read as a polite way of expressing "what you just said is uselessly vague. If you are capable of saying something useful, do so, otherwise shut up and leave this subject to the grownups."

And since you aren't consistent about wanting *everything* to be expressed as a formalism, I assume this is a function of the topic of discussion, because that's the most charitable assumption I can think of.

That said, I reiterate that I have no special knowledge of why you're being downvoted; please don't take me as definitive.

(1) This might be an unfair impression, as I no longer remember what it was that led me to form it.

Thank you! I always appreciate candid feedback.

My general impression of you(1) is that you consider much of the discussion that takes place here, and much of the thinking of the people who do it, to be kind of a silly waste of time, and that you further see your role here in part as the person who points that fact out to those who for whatever reason have failed to notice it.

It's too easy for this to turn into a general counterargument against anything the person says. It may be of benefit to play the ball and not the man.

*Anything* the person says? In respect to *most* things it would be a total non-sequitur.

Yes, I agree. Perhaps I shouldn't have said anything at all, but, well, he asked.

Which issue/problem? fairness?

The fairness concept:

the reward is a function of the agent's actual choices in the problem (namely which box or boxes get picked) and independent of the method that the agent uses to choose, or of its choices on any other problems.

should be reasonably easy to formalize, because it does not depend on a full [T]DT algorithm. After that, evaluate the performance of [a]DT under a [b]DT-aware Omega Newcomb's problems, as described in the OP, where 'a' and 'b' are particular DTs, e.g. a=b=T.

Why do you assume agents cannot randomize?

**[deleted]**· 2012-12-28T20:38:33.989Z · score: 0 (0 votes) · LW · GW

1) Not to my knowledge.

2) No, you reasoned TDT's decisions correctly.

3) A TDT agent would not self-modify to CDT, because if it did, its simulation would also self-modify to CDT and then two-box, yielding only $1000 for the real TDT agent.

4) TDT does seem to be a single algorithm, albeit a recursive one in the presence of other TDT agents or simulations. TDT doesn't have to look into its own code, nor does it change its mind upon seeing it, for it decides as if deciding what the code outputs.

5) This is a bit of a tricky one. You could say it's fair if you judge by whether each agent did the best it could have done, rather than getting the most, but a CDT agent could say the same when it two-boxes and reasons it would have gotten $0 if it had one-boxed. I guess in a timeless sense, TDT does the best it could have done in these problems, while CDT doesn't do the best it could have done in Newcomb's problem.

6) That's a tough one. If you're asking what Omega's intentions are (or would be in the real world), I have no idea. If you're asking who succeeds at the majority of problems in the problem space of anything Omega can ask, I strongly believe TDT would outperform CDT on it.

Are these really "fair" problems? Is there some intelligible sense in which they are not fair, but Newcomb's problem is fair? It certainly looks like Omega may be "rewarding irrationality" (i.e. giving greater gains to someone who runs an inferior decision theory), but that's exactly the argument that CDT theorists use about Newcomb.

In Newcomb's Problem, Omega determines ahead of time what decision theory you use. In these problems, it selects an arbitrary decision theory ahead of time. As such, for any agent using this preselected decision theory, these problems are variations of Newcomb's problem. For any agent using a different decision theory, the problem is quite different (and simpler). Thus, whatever agent has had its decision theory preselected can only perform as well as in a standard Newcomb's problem, while a luckier agent may perform better. In other words, there are equivalent problems where Omega bases its decision on the results of a CDT or EDT output, in which they actually perform worse than TDT does in these problems.

Generalization of Newcomb's Problem: Omega predicts your behavior with accuracy p.

This one could actually be experimentally tested, at least for certain values of p; so for instance we could run undergrads (with $10 and $100 instead of $1,000 and $1,000,000; don't bankrupt the university) and use their behavior from the pilot experiment to predict their behavior in later experiments.
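To see which values of p would make such an experiment interesting, here is a minimal sketch of the expected payoffs. It assumes that Omega's accuracy p is the same for one-boxers and two-boxers and that the subject simply maximizes expected dollars; the function names and the default stakes are illustrative choices, not part of the original problem statement.

```python
from fractions import Fraction

def expected_one_box(p, small=1_000, large=1_000_000):
    # Box B is full only if Omega correctly predicted one-boxing (probability p).
    return p * large

def expected_two_box(p, small=1_000, large=1_000_000):
    # Box A always pays; Box B is full only if Omega mispredicted (probability 1 - p).
    return small + (1 - p) * large

def break_even_accuracy(small=1_000, large=1_000_000):
    # Solve p * large == small + (1 - p) * large for p.
    return Fraction(small + large, 2 * large)

for p in (Fraction(1, 2), Fraction(3, 5), Fraction(99, 100)):
    print(p, expected_one_box(p), expected_two_box(p))

print(break_even_accuracy())         # standard stakes
print(break_even_accuracy(10, 100))  # the undergrad stakes suggested above
```

Under these assumptions the break-even accuracy is 1001/2000 for the standard stakes and 11/20 for the $10/$100 version, so a predictor only has to beat 55% in the cheaper experiment for one-boxing to come out ahead in expectation.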

Why is the discrimination problem "unfair"? It seems like in any situation where decision theories are actually put into practice, that type of reasoning is likely to be popular. In fact I thought the whole point of advanced decision theories was to deal with that sort of self-referencing reasoning. Am I misunderstanding something?

If you are a TDT agent, you don't know whether you're the simulation or the "outside decision", since they're effectively the same. Or rather, the simulation will have made the same choice that you will make.

If you're not a TDT agent, you gain more information: You're not a TDT agent, and the problem states TDT was simulated.

So the discrimination problem functionally resolves to:

If you are a TDT agent, have some dirt. End of story.

If you are not a TDT agent, I have done some mumbo-jumbo, and now you can either take one box for $1000 or $1m, or both of them for $1,001,000. Have fun! (the mumbo-jumbo has nothing to do with you anyway!)

Is the trick with problem 1 that what you are really doing, by using a simulation, is having an agent use timeless decision theory in a context where they can't use timeless decision theory? The simulated agent doesn't know about the external agent. Or, you could say, it's impossible for it to be timeless; the directionality of time (simulation first, external agent moves second) is enforced in a way that makes it impossible for the simulated agent to reason across that time barrier. Therefore it's not fair to call what it decides "timeless decision theory".

Either problems 1 and 2 are hitting an infinite regress issue, or I don't see why an ordinary TDT agent wouldn't two-box and choose the first box, respectively. There's a difference between the following problems:

- I, Omega, predicted that *you* would do such and such, and acted accordingly.
- I, Omega, simulated *another* agent, and acted accordingly.
- I, Omega, simulated *this very problem*, only if you don't run TDT that's not the same problem, but I promise it's the same nonetheless, and acted accordingly.

Now, in problem 1 and 2, are the simulated problem and the actual problem *actually the same*? If they are, I see an infinite regress at Omega's side, and therefore not a problem one would ever encounter. If they aren't, then what I actually understand them to be is:

Omega presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of *Newcomb's* problem as presented to an agent running TDT. If the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."

Really, you don't have to use something other than TDT to see that the simulated TDT agent one-boxed. *Its* problem isn't *your* problem. Your precommitment to your problem doesn't affect your precommitment to its problem. Of course, the simulated TDT agent made the right choice by one-boxing. But *you* should two-box.

Omega now presents ten boxes, numbered from 1 to 10, and announces the following. "I ran multiple simulations of the following problem, presented to a TDT agent: “You must take exactly one box. I determined which box you are least likely to take, and put $1 million in that box. If there is a tie, I put the money in one of them (the one labelled with the lowest number).” I put the money in the box the simulated TDT agent was least likely to choose. If there was a tie, I put the money in one of them (the one labelled with the lowest number). Now choose your box."

Same here. You know that the TDT agent put equal probability on every box, to maximize its gains. Again, *its* problem isn't *your* problem. Your precommitment to your problem doesn't affect your precommitment to its problem. Of course, the simulated TDT agent made the right choice by choosing at random. But *you* should take box 1.

You don't have to use something else than TDT to see that the simulated TDT agent one boxed. Its problem isn't your problem.

This is CDT reasoning, AKA causal reasoning. Or in other words, how do you not use the same reasoning in the original Newcomb problem?

The reasoning is different because the problem is different.

The simulated agent and yourself were not subjected to the same problem. Therefore you can perfectly well precommit to different decisions. TDT does not automatically make the same decision on problems that merely kinda look the same. They have to *actually* be the same. There may be *specific* reasons why TDT would make the same decision, but I doubt it.

Now on to the examples:

### Newcomb's problem

Omega ran a simulation of Newcomb's problem, complete with a TDT agent in it. The simulated TDT agent obviously one-boxed, and got the million. If you run TDT yourself, you also know it. Now, Omega tells you of this simulation, and tells you to choose your boxes. This is *not* Newcomb's problem. If it were, deciding to two-box would cause box B to be empty!

CDT would crudely assume that 2 boxing gets it $1000 more than 1 boxing. TDT on the other hand knows the simulated box B (and therefore the real one as well) has the million, regardless of its current decision.

### 10 boxes problem

Again, the simulated problem and the real one aren't the same. If they were, choosing box 1 with probability 1 would cause box 2 to have the million. Because it's not the same problem, even TDT should be allowed to precommit to a different decision. The point of TDT is to foresee the consequences of its precommitments. It will therefore know that its precommitment in the real problem doesn't have any influence on its precommitment (and therefore the *outcome*) in the simulated one. This lack of influence allows it to fall back on CDT reasoning.

Makes sense?

The simulated problem and the actual problem don't have to actually be the same - just indistinguishable from the point of view of the agent.

Omega avoids infinite regress because the actual contents of the boxes are irrelevant for the purposes of the simulation, so no sub-simulation is necessary.

Okay. So, what specific mistake does TDT make that would prevent it from distinguishing the two problems? What leads it to think "If I precommit X in problem 1, I have to precommit X in problem 2 as well"?

(If the problems aren't the same, of course Omega can avoid infinite regress. And if there *is* unbounded regress, we may be able to find a non-infinite solution by looping the regress over itself. But then the problems (simulated and real) are definitely the same.)

In the simulated problem the simulated agent is presented with the choice but never gets the reward; for all it matters both boxes can be empty. This means that Omega doesn't have to do another simulation to work out what's in the simulated boxes.

The infinite regress is resolvable anyway - since each TDT agent is facing the exact same problem, their decisions must be identical, hence TDT one-boxes and Omega knows this.

The infinite regress is resolvable anyway - since each TDT agent is facing the exact same problem, their decisions must be identical, hence TDT one-boxes and Omega knows this.

Of course.

Now there's still the question of the perceived difference between the simulated problem and the real one (I assume here that you should 1-box in the simulation, and 2-box in the real problem). There *is* a difference, so how come TDT does not see it? A Rational Decision Theory would; we humans do. Or if it can see it, how come it can't act on it? RDT could. Do you concede that TDT does and can, or do you still have doubts?

Due to how the problem is set up, you can't notice the difference until after you've made your decision. The only reason other decision theories know they're not in the simulation is because the problem explicitly states that a TDT agent is simulated, which means it can't be them.

The only reason other decision theories know they're not in the simulation is because the problem explicitly states that a TDT agent is simulated, which means it can't be them.

That's false. Here is a modified version of the problem:

Omega presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of Newcomb's problem as presented to you. If your simulated twin 2-boxed then I put nothing in Box B. If your simulated twin 1-boxed, I put $1 million in Box B. In any case, I put $1000 in Box A. Now please 1-box or 2-box."

Even if you're not running TDT, the simulated agent is running the same decision algorithm as you are. If *that* was the reason why TDT couldn't tell the difference, well, now no one can. However, you and I *can* tell the difference. The simulated problem is *obviously* different:

Omega presents the usual two boxes A and B and announces the following. "I am subjecting you to Newcomb's problem. Now please 1-box or 2-box".

Really, the subjective difference between the two problems should be obvious to any remotely rational agent.

(Please let me know if you agree up until that point. Below, I assume you do.)

I'm pretty sure the correct answers for the two problems (my modified version as well as the original one) are 1-box in the simulation, 2-box in the real problem. (Do you still agree?)

So. We both agree that RDT (Rational Decision Theory) 1-boxes in the simulation, and 2-boxes in the real problem. CDT would 2-box in both, and TDT would 1-box in the simulation while in the real problem it would…

- 2-box? I think so.
- 1-box? Supposedly because it can't tell simulation from reality. Or rather, it can't tell the difference between Newcomb's problem and the actual problem. Even though RDT **does**. (*riiight?*)

So again, I must ask, why not? I need a more specific answer than "due to how the problem is set up". I need you to tell me what specific kind of irrationality TDT is committing here. I need to know its specific blind spot.

In your problem, TDT does indeed 2-box, but it's quite a different problem from the original one. Here's the main difference:

I ran a simulation of this problem

vs

I ran a simulation of Newcomb's problem

Oh. So this is indeed a case of "If you're running TDT, I screw you, otherwise you walk free". The way I understand this, TDT one boxes because it *should*.

If TDT cannot perceive the difference between the original problem and the simulation, this is because *it is actually the same problem*. For all it knows, it could be in the simulation (the simulation argument would say it is). There is an infinite regress, solved by the fact that all agents at all simulation levels will have made the same decision, because they ran the same decision algorithm. If they all 2-box, they get $1000, while if they all 1-box, they get the million (and no more).

Now, if you replace "an agent running TDT" by "you" (like, a fork of you started from a snapshot taken 3 seconds ago), the correct answer is *always* to 1-box, because then the problem is equivalent to the actual Newcomb's problem.
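The payoff structure argued for above can be sketched in a few lines. This is only an illustration under stated assumptions: Omega fills Box B exactly as announced in Problem 1, a "linked" agent is one whose choice is necessarily mirrored by the simulation, an "unlinked" agent faces a simulation that one-boxed regardless of what it does, and all function names are made up for this sketch.

```python
SMALL = 1_000        # Box A, always present
LARGE = 1_000_000    # Box B, present only if the simulation one-boxed

def box_b_contents(sim_choice):
    # Omega's announced rule: B is filled iff the simulated agent one-boxed.
    return LARGE if sim_choice == "one-box" else 0

def payoff(real_choice, sim_choice):
    b = box_b_contents(sim_choice)
    return b if real_choice == "one-box" else SMALL + b

def linked_payoff(choice):
    # Every simulation level runs the same algorithm, so the choices coincide.
    return payoff(choice, choice)

def unlinked_payoff(choice, sim_choice="one-box"):
    # The simulation's choice is fixed independently of this agent's choice.
    return payoff(choice, sim_choice)

options = ("one-box", "two-box")
best_linked = max(options, key=linked_payoff)
best_unlinked = max(options, key=unlinked_payoff)

print(best_linked, linked_payoff(best_linked))        # one-box 1000000
print(best_unlinked, unlinked_payoff(best_unlinked))  # two-box 1001000
```

The only difference between the two cases is whether the second argument of `payoff` is allowed to vary with the first, which is exactly the point under dispute in this thread.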

Well, in the problem you present here TDT would 2-box, but you've avoided the hard part of the problem from the OP, in which there is no way to tell whether you're in the simulation or not (or at least there is no way for the simulated you to tell), unless you're running some algorithm other than TDT.

I see no such hard part.

To get back to the exact original problem as stated by the OP, I only need to replace "**you**" by "an agent running TDT", and "your simulated twin" by "the simulated agent". Do we agree?

Assuming we do agree, are you telling me the hard part is in that change? Are you telling me that TDT would 1-box in the original problem, even though it 2-boxes on my problem?

WHYYYYY?

in which there is no way to tell whether you're in the simulation or not

Wait a minute, what exactly do you mean by "you"? TDT? or "*any agent whatsoever*"? If it's TDT alone, why? If I read you correctly, you already agree that it's not because Omega said "running TDT" instead of "running WTF-DT". If it's "any agent whatsoever", then are you *really sure* the simulated and real problem aren't *actually the same*? (I'm sure they aren't, but, just checking.)

Wait a minute, what exactly do you mean by "you"? TDT? or "any agent whatsoever"? If it's TDT alone, why? If I read you correctly, you already agree that it's not because Omega said "running TDT" instead of "running WTF-DT". If it's "any agent whatsoever", then are you really sure the simulated and real problem aren't actually the same? (I'm sure they aren't, but, just checking.)

Well, no, this would be my disagreement: it's precisely because Omega told you that the simulated agent is running TDT that only TDT could or could not be the simulation; the simulated and real problem are, for all intents and purposes, identical (Omega doesn't actually need to put a reward in the simulated boxes, because he doesn't need to reward the simulated agent, but both problems appear exactly the same to the simulated and real TDT agents).

This comment from lackofcheese finally made it click. Your comment also makes sense.

I now understand that this "problematic" problem just isn't fair. TDT 1-boxes because it's the only way to get the million.

The simulated agent and yourself were not subjected to the same problem.

Um, yes, they were. That's the whole point.

I'll need to write a full discussion post about that at some point. There is one crucial difference besides "I'm TDT" and "I'm CDT". It's "The simulated agent uses the same decision theory" and "The simulated agent does not use the same decision theory".

That's not exactly the same problem, and I think *that* is the whole point.

Problem 1: Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT. I won't tell you what the agent decided, but I will tell you that if the agent two-boxed then I put nothing in Box B, whereas if the agent one-boxed then I put $1 million in Box B. Regardless of how the simulated agent decided, I put $1000 in Box A. Now please choose your box or boxes."

This is indeed a problem - and one I would describe as the general class "dealing with other agents who are fucking with you." It is not one that can be solved and I believe a "correct" decision theory will, in fact, lose (compared to CDT) in this case.

Note that there seems to be some chance that I am confused in a way analogous to the way that people who believe "Two boxing on Newcomb's is rational" are confused. There could be a deep insight I am missing. This seems comparatively unlikely.

**[deleted]**· 2012-05-25T20:14:16.494Z · score: 0 (0 votes) · LW · GW

For problem 1, in the language of the blackmail posts, because the tactic Omega uses to fill box 2,

```
TDT-sim.box1,box2=(<F,T> <T,T>) -> Omega.box2=(1M, 0)
```

depends on TDT-sim's decision, because Omega has already decided, and because Omega didn't make its decision known, a TDT agent presented with this problem is at an epistemic disadvantage relative to Omega: TDT can't react to Omega's actual decision, because it won't know Omega's actual decision until it knows its own actual decision, at which point TDT can't further react. This epistemic disadvantage doesn't need to be enforced temporally; even if TDT knows Omega's source code, if TDT has limited simulation resources, it might not practically be able to compute Omega's actual decision any way but via Omega's dependence on TDT's decision.

any other agent who is not running TDT ... will be able to re-construct the chain of logic and reason that the simulation one-boxed and so box B contains the $1 million

There aren't other ways for an agent to be at an epistemic disadvantage relative to Omega in this problem than by being TDT? Could you construct an agent which was itself disadvantaged relative to TDT?

Could you construct an agent which was itself disadvantaged relative to TDT?

"Take only the box with $1000."

Which itself is inferior to "Take no box."

**[deleted]**· 2012-05-25T21:40:47.588Z · score: 0 (0 votes) · LW · GW

Oh, neat. Agents in "lowest terms", whose definitions don't refer to other agents, can't react to any agent's decision, so they're all at an epistemic disadvantage relative to each other, and to themselves, and to every other agent across all games.

**[deleted]**· 2012-05-25T21:25:27.982Z · score: 0 (0 votes) · LW · GW

How is agent epistemically inferior to agent ? They're both in "lowest terms" in the sense that their definitions don't make reference to other agents / other facts whose values depend on how environments depend on their values, so they're functionally incapable of reacting to other agents' decisions, and are on equivalent footing.

**[deleted]**· 2012-05-25T21:16:18.250Z · score: 0 (0 votes) · LW · GW

How is agent epistemically inferior to agent ? They're both constant decisions across all games, both functionally incapable of reacting to any other agent's actual decisions. Even if we broaden the definition of "react" so that constant programs are reacting to other constant programs, your two agents still have equivalent conditional power / epistemic vantage / reactive footing.

Any agent who is themselves running TDT will reason as in the standard Newcomb problem.

Will they? Surely it's clear that it's now possible to take $1,001,000, because the circumstances are slightly different.

In the standard Newcomb problem, where Omega predicts your behaviour, it's not possible to trick it or act other than its expectation. Here, it is.

Is there some basic part of decision theory I'm not accounting for here?

Yes. If the TDT agent picked the $1,001,000 here, then the simulated agent would have two-boxed as well, meaning only box A would be filled.

Remember, the simulated agent was presented with the same problem, so the decision TDT makes here is the same one the simulated agent makes.

Right, I understand what you mean. I was thinking of in the context of a person being presented with this situation, not an idealized agent running a specific decision theory.

And Omega's simulated agent would presumably hold all the same information as a person would, and be capable of responding the same way.

Cheers for clarifying that for me.

In both your problems, the seeming paradox comes from failure to recognize that the two agents (one that Omega has simulated and one making the decision) are facing entirely different prior information. Then, nothing requires them to make identical decisions. The second agent can simulate itself having prior information I1 (that the simulated agent has been facing), then infer Omega's actions, and arrive at the new prior information I2 that is relevant for the decision. And I2 now is independent of which decision the agent would make *given I2*.

Are you sure that they are facing different prior information? If the sim is a good one, then the TDT agent won't be able to tell whether it is the sim or not. However, you are right that one solution could be that there are multiple TDT variants who have different information and so can logically separate their decisions.

I mentioned the problems with that in another response here. The biggest problem is that it seriously undermines the attraction and effectiveness of TDT as a decision theory if different instances of TDT are going to find excuses to separate from each other.

I don't understand the special role of box 1 in Problem 2. It seems to me that if Omega just makes different choices for the box in which to put the money, all decision theories will say "pick one at random" and will be equal.

In fact, the only reason I can see why Omega picks box 1 seems to be that the "pick at random" process of your TDT is exactly "pick the first one". Just replace it with something dependent on its internal clock (or any parameter not known at the time when Omega asks its question) and the problem disappears.

Omega's choice of box depends on its assessment of the simulated agent's choosing probabilities. The tie-breaking rule (if there are several boxes with equal lowest choosing probability, then select the one with the lowest label) is to an extent arbitrary, but it is important that there is some deterministic tie-breaking rule.

I also agree this is entirely a maths problem for Omega or for anyone whose decisions aren't entangled with the problem (with a proof that Box 1 will contain the $1 million). The difficulty is that a TDT agent can't treat it as a straight maths problem which is unlinked to its own decisions.
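Omega's box-filling rule with its tie-breaker can be written out as a small function. The sketch below assumes Omega knows the simulated agent's exact choosing probabilities and represents them as a list of ten `Fraction` values; `omega_box` and `win_probability` are hypothetical names used only for illustration.

```python
from fractions import Fraction

def omega_box(probabilities):
    """Return the 1-based label of the box Omega fills."""
    lowest = min(probabilities)
    # list.index returns the first match, i.e. the lowest label among ties.
    return probabilities.index(lowest) + 1

def win_probability(probabilities):
    # Chance that an agent with these choosing probabilities picks the filled box.
    return probabilities[omega_box(probabilities) - 1]

uniform = [Fraction(1, 10)] * 10
always_box_1 = [Fraction(1)] + [Fraction(0)] * 9

print(omega_box(uniform), win_probability(uniform))            # 1 1/10
print(omega_box(always_box_1), win_probability(always_box_1))  # 2 0
```

Since the filled box is by construction one the agent is least likely to take, the simulated agent's chance of winning equals its smallest choosing probability, which cannot exceed 1/10. The uniform distribution attains that bound, and the deterministic tie-breaker then puts the money in box 1.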

Why is it important that there is a deterministic tie-breaking rule? When you would like random numbers, isn't it always better to have a distribution as close to random as possible, even if it is pseudo-random?

That question is perhaps stupid, I have the impression that I am missing something important...

Remember it is Omega implementing the tie-breaker rule, since it defines the problem.

The consequence of the tie-breaker is that the choosing agent knows that Omega's box-choice was a simple deterministic function of a mathematical calculation (or a proof). So the agent's uncertainty about which box contains the money is pure logical uncertainty.

Whoops... I can't believe I missed that. You are obviously right.

Omega (who experience has shown is always truthful) presents the usual two boxes A and B and announces the following. "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT.

There seems to be a contradiction here. If Omega said this to me, I would have to believe that Omega had just presented evidence of being untruthful some of the time.

If Omega simulated the problem at hand then in said simulation Omega must have said: "Before you entered the room, I ran a simulation of this problem as presented to an agent running TDT." In the first simulation the statement is a lie.

Problem 2 has a similar problem.

It is not obvious that the problem can be reformulated to keep Omega consistently truthful and still have CDT or EDT come out ahead of TDT.

Your difficulty seems to be with the parenthesis "(who experience has shown is always truthful)". The relevant experience here is going to be derived from real-world subjects who have been in Omega problems, exactly as is assumed for the standard Newcomb problem. It's not obvious that Omega always tells the truth to its simulations; no-one in the outside world has experience of that.

However you can construe the problem so that Omega doesn't have to lie, even to sims. Omega could always prefix its description of the problem with a little disclaimer "You may be one of my simulations. But if not, then...".

Or Omega could simulate a TDT agent making decisions as if it had just been given the problem description verbally by Omega, without Omega actually doing so. (Whether that's possible or not depends a bit on the simulation).

Omega could truthfully say "the contents of the boxes are exactly as if I'd presented this problem to an agent running TDT".

I do not know if Omega can say that truthfully because I do not know whether the self-referential equation representing the problem has a solution.

The problems set out by the OP assume there is a solution and a particular answer, but without writing out the equation and plugging in his solution to show the solution actually works.

There is a solution because Omega can get an answer by simulating TDT, or am I missing something?

It may or may not be proven that TDT settles on answers to questions involving TDT. If TDT doesn't get an answer, then TDT can't get an answer.

Presumably it is true that TDT settles but if it isn't proven, it may not be true; or it could be that the proof (i.e. a formalization of TDT) will provide insight that is currently lacking (such as cutting off after a certain level of resource use; can Omega emulate how many resources the current TDT agent will use? Can the TDT agent commit to using a random number of resources? Do true random-number generators exist? These problems might all be inextricable. Or they might not. I, for one, don't know.)

It may or may not be proven that TDT settles on answers to questions involving TDT.

We have several formalizations of UDT that would solve this problem correctly.

Having several formalizations is 90% of a proof, not 100% of a proof. Turn the formalization into a computer program AND either prove that it halts or run this simulation on it in finite time.

I believe that it's true that TDT will get an answer and hence Omega will get an answer, but WHY this is true relies on facts about TDT that I don't know (specifically facts about its implementation; maybe facts about differential topology that game-theoretic equilibrium results rely on.)

The linked posts have proofs that the programs halt and return the correct answer. Do you understand the proofs, or could you point out the areas that need more work? Many commenters seemed to understand them...

I do not understand the proofs, primarily because I have not put time in to trying to understand them.

I may have become somewhat defensive in these posts (or withdrawn I guess?) but looking back my original point was really to point out that, naively, asking whether the problem is well-defined is a reasonable question.

The questions in the OP set off alarm bells for me of "this type of question might be a badly-defined type of question" so asking whether these decisions are in the "halting domain" (is there an actual term for that?) of TDT seems like a reasonable question to ask before putting too much thought into other issues.

I believe the answer to be that yes these questions are in the "halting domain" of TDT, but I also believe that understanding what that is and why these questions are legitimate and the proofs that TDT halts will be central to any resolution of these problems.

What I'm really trying to say here is that it makes sense to ask these questions, but I don't understand why, so I think Davorak's question was reasonable, and your answer didn't feel complete to me. Looking back, I don't think I've contributed much to this conversation. Sorry!

There was this Rocko thing a while back (which is not supposed to be discussed), where, if I understood that nonsense correctly, the idea was that the decision theories here would do the equivalent of one-boxing on Newcomb with transparent boxes where you could see there is no million, when there's no million (and where the boxes were made and sealed before you were born). It's not easy to one-box rationally.

Also in practice usually being simulated correctly is awesome for getting scammed (agents tend to face adversaries rather than crazed beneficiaries).