# Is Agent Simulates Predictor a "fair" problem?

post by Chris_Leong · 2019-01-24T13:18:13.745Z · score: 22 (6 votes) · LW · GW · 20 commentsThis is a question post.

## Contents

Answers Wei_Dai's answer cousin_it's answer jessicata's answer G Gordon Worley III's answer

It's a simple question, but I think it might help if I add in context. In the paper introducing Functional Decision Theory, it is noted that it is impossible to design an algorithm that can perform well on all decision problems since some of them can be specified to be blatantly unfair, ie. punish every agent that isn't an alphabetical decision theorist.

The question then arises, how do we define which problems are or are not fair? We start by noting that some people consider Newcomb's-like problems to be unfair since your outcome depends on a predictor's prediction, which is rooted in an analysis of your algorithm. So what makes this case any different from only rewarding the alphabetical decision theorist?

The paper answers that the prediction only depends on the decision you end up making and that any other internal details are ignored. So it only cares about your decision and not how you come to it, the problem seems fair. I'm inclined to agree with this reasoning, but a similar line of reasoning doesn't seem to hold with Agent Simulates Predictor. Here the algorithm you use is relevant as the predictor can only predict the agent if it's algorithm is less than a certain level of complexity, otherwise it may make a mistake.

Please note that this question isn't about whether this problem is worth considering; life is often unfair and we have to deal with it the best that we can. The question is about whether the problem is "fair", where I roughly understand "fair" meaning that this is in a certain class of problems that I can't specify at this moment (I suspect it would require its own seperate post) where we should be able to achieve the optimal result in each problem.

My thinking about this is that a problem is fair if it captures some aspect of some real world problem. I believe Gary Drescher came up with ASP as a distillation of the following problem, which itself tries to capture some essense of bargaining in the real world (similar to how Newcomb's Problem is a distillation of Prisoner's Dilemma, which tries to capture some essense of cooperation in the real world):

Consider a simple two-player game, described by Slepnev (2011), played by a human and an agent which is capable of fully simulating the human and which acts according to the prescriptions of UDT. The game works as follows: each player must write down an integer between 0 and 10. If both numbers sum to 10 or less, then each player is paid according to the number that they wrote down. Otherwise, they are paid nothing. For example, if one player writes down 4 and the other 3, then the former gets paid $4 while the latter gets paid $3. But if both players write down 6, then neither player gets paid. Say the human player reasons as follows:

"I don’t quite know how UDT works, but I remember hearing that it’s a very powerful predictor. So if I decide to write down 9, then it will predict this, and it will decide to write 1. Therefore, I can write down 9 without fear."

The human writes down 9, and UDT, predicting this, prescribes writing down 1. This result is uncomfortable, in that the agent with superior predictive power “loses” to the “dumber” agent. In this scenario, it is almost as if the human’s lack of ability to predict UDT (while using correct abstract reasoning about the UDT algorithm) gives the human an “epistemic high ground” or “first mover advantage.” It seems unsatisfactory that increased predictive power can harm an agent.

(It looks like the citation here is wrong, since I can't find a description of this game in Slepnev (2011). As far as I know, I was the first person to come up with this game as something that UDT seems to handle poorly.)

I've defined three classes of "fair" problems for UDT, which are all basically equivalent: single player extensive form games, programs with a halting oracle, and formulas in provability logic. But none of these are plain old programs without oracles or stuff. I haven't been able to define any class of "fair" problems involving plain old programs. The most I can do is agree with you: ASP doesn't seem "fair" in spirit and doesn't translate into any of the classes I mentioned. This is an open question - maybe you can find a better "fair" class!

There are some formal notions of fairness that include ASP. See Asymptotic Decision Theory.

Here's one way of thinking about this. Imagine a long sequence of instances of ASP. Both the agent and predictor in a later instance know what happened in all the earlier instances (say, because the amount of compute available in later instances is much higher, such that all previous instances can be simulated). The predictor in ASP is a logical inductor predicting what the agent will do this time.

Looking at the problem this way, it looks pretty fair. Since logical inductors can do induction, if an agent takes actions according to a certain policy, then the predictor will eventually learn this, regardless of the agent's source code. So only the policy matters, not the source code.

To my mind what seems unfair about some problems is that they propose predictors that, to the best of our knowledge, are physically impossible, like a Newcomb Omega that never makes a mistake, although these are only unfair in the sense that they depict scenarios we won't ever encounter (perfect predictors), not that they ask us something mathematically unfair.

Other more mundane types of unfairness, like where a predictor simply demands something so specific that no general algorithm could always find a way to satisfy it, seem more fair to me because they are the sorts of things we actually encounter in the real world. If you haven't encountered this sort of thing, just spend some time with a toddler, and you will be quickly disabused of the notion that there could not exist an agent which demands impossible things.

## 20 comments

Comments sorted by top scores.

Where does the 'Agent Simulates Predictor' problem come from?

How can a predictor be unfair to an algorithm that enumerates possible worlds and picks the best one, without any "decision theory" whatsoever? Unless by "unfair" you mean something like "you will get a coin that always lands tails, but the heads win, while everyone else gets a fair coin"

I don't quite understand the question, but unfair refers to the environment requiring the internals to be a particular way. I actually think it is possible to allow some internal requirements to be considered fair and I discuss this in one of my draft posts. Nonetheless, it works as a first approximation.

Say you have certain information about the world and calculate the odds of different outcomes and their utilities. For example, in the twin prisoners dilemma the odds of DC and CD are zero, so the choice is between DD and CC. In the Newcomb's problem the odds of getting $1001000 are zero, so the choice is between $1000000 (one-box) and $1000 (two-box). In the Death in Damascus problem the odds of escaping Death are zero, so the choice is to spend money on travel or not. What would be a concrete example of an unfair problem against this approach?

I think this comment does a better job of explaining the notion of fairness you're trying to point at than other words here.

BTW, I published the draft [LW · GW], although fairness isn't the main topic and only comes up towards the end.

It's impossible to enumerate possible worlds and pick the best one without a decision theory, because your decision process gives the same output in every possible world where you have a given epistemic state. We obviously need counterfactuals to make decisions, and the different decision theories can be seen as different theories about how counterfactuals work.

ASP doesn't seem impossible to solve (in the sense of having a decision theory that handles it well and not at the expense of doing poorly on other problems) so why define a class of "fair" problems that excludes it? I mean defining such classes may be useful for talking about the technical properties of specific decision theories, but the motivation here seems to be finding a way to justify not solving ASP.