EDT with updating double counts

post by paulfchristiano · 2021-10-12T04:40:02.158Z · LW · GW · 12 comments

Contents

  Why EDT bets at 99.99% odds (under some conditions)
  Failure diagnosis
  “Updatelessness” as a feature of preferences

I recently got confused thinking about the following case:

Calculator bet: I am offered the opportunity to bet on a mathematical statement X to which I initially assign 50% probability (perhaps X = 139926 is a quadratic residue modulo 314159). I have access to a calculator that is 99% reliable, i.e. it corrupts the answer 1% of the time at random. The calculator says that X is true. With what probability should I be willing to wager?

I think the answer is clearly “99%.” But a naive application of EDT can recommend betting at 99.99% odds. I think this is a mistake, and understanding the mistake helps clarify what it means to be “updateless” and why it’s essentially obligatory for EDT agents. My takeaway is that for an EDT agent, bayesian updating is a description of the expected utility calculation rather than something the EDT agent should do to form its beliefs before calculating expected utility.

Thanks to Joe Carlsmith and Katja Grace for the conversation that prompted this post. I suspect this point is well-known in the philosophy literature. I’ve seen related issues discussed in the rationalist community, especially in this sequence [? · GW] and this post [AF(p) · GW(p)] but found those a bit confusing—in particular, I think I initially glossed over how “SSA” was being used to refer to a view which rejects bayesian updating on observations (!) in this comment [AF(p) · GW(p)] and the linked paper. In general I’ve absorbed the idea that decision theory and anthropics had a weird interaction, but hadn’t noticed that exactly the same weirdness also applied in cases where the number of observers is constant across possible worlds.

Why EDT bets at 99.99% odds (under some conditions)

I’ll make four assumptions:

Under these assumptions, what happens if someone offers me a bet of $1 at 99.9% odds? If I take the bet, I gain $1 if X is true but lose $1000 if X turns out to be false. Intuitively this is a very bad bet, because I “should” only have 99% confidence. But under these assumptions EDT thinks it’s a great deal.
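
To make the arithmetic concrete, here is a minimal numerical sketch of the copy-counting calculation. The copy count N, the exact stakes, and the Python framing are illustrative assumptions: a large universe contains N copies of me facing this exact bet, 99% of their calculators are right, every copy that saw “X is true” decides the same way I do, and I care about their total winnings.

```python
# Minimal sketch of the copy-counting calculation (illustrative numbers, not from the post).

N = 1_000_000          # assumed number of copies of me facing the calculator
prior_X = 0.5          # prior on the mathematical statement X
reliability = 0.99     # calculator is right 99% of the time
win, lose = 1, 1000    # the offered bet: gain $1 if X, lose $1000 if not-X

# Step 1: ordinary Bayesian update on "my calculator says X is true".
posterior = (prior_X * reliability) / (
    prior_X * reliability + (1 - prior_X) * (1 - reliability))
print(posterior)                                 # 0.99

# A lone agent with 99% credence rejects the bet:
print(posterior * win - (1 - posterior) * lose)  # about -9.01

# Step 2: EDT sums winnings over all correlated copies. If X is true, ~0.99*N
# copies saw "true" and win; if X is false, only ~0.01*N copies saw "true" and lose.
ev_bet = posterior * (reliability * N) * win \
         - (1 - posterior) * ((1 - reliability) * N) * lose
print(ev_bet)                                    # large and positive, so EDT takes the bet

# X-worlds get weighted by 0.99 twice (once in the posterior, once in the copy
# count), so the implied betting odds are 0.99^2 : 0.01^2, i.e. about 99.99%.
print(reliability**2 / (reliability**2 + (1 - reliability)**2))
```

The last line is the “double count”: worlds where X is true are weighted once by the 99% posterior and once more by the 99-fold larger population of copies who saw this output.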

Failure diagnosis

Intuitively 99.99% is the wrong answer to this question. But it’s important to understand what actually went wrong. After all, intuitions could be mistaken and maybe big universes lead to weird conclusions (I endorse a few of those myself). Moreover, if you’re like me and think the “obvious” argument for EDT is compelling, this case might lead you to suspect something has gone wrong in your reasoning.

The intuitive problem is that we are “updating” on the calculator’s verdict twice:

  • First, when forming beliefs: I condition on the calculator’s output and move from 50% to 99% confidence in X.
  • Second, inside the expected utility calculation: worlds where X is true contain 99 times as many copies of me who saw this output and whose correlated decisions I care about, so those worlds get weighted by roughly the same factor again.
The second “update” is pretty much inherent in the nature of EDT—if I care about the aggregate fate of all of the people like me, and if all of their decisions are correlated with mine, then I need to perform a sum over all of them and so I will care twice as much about possible worlds where there are twice as many of them. Rejecting this “update” basically means rejecting EDT.

The first “update” looks solid at first, since Bayesian updating given evidence seems like a really solid epistemic principle. But I claim this is actually where we ran into trouble. In my view there is an excellent simple argument for using EDT to make decisions, but there is no good argument for using beliefs formed by conditioning on your observations as the input into EDT.

This may sound a bit wild, but hear me out. The basic justification for updating is essentially decision-theoretic—either it’s about counting the observers across possible worlds who would have made your observations, or it’s about Dutch book arguments constraining the probabilities with which you should bet. (As an example, see the SEP entry on Bayesian epistemology.) I’ve internalized these arguments enough that updating can feel like a primitive bedrock of epistemology, but really they only constrain how you should bet (or maybe what “you” should expect to see next)—they don’t say much about what you should “expect” in any observer-independent sense that would be relevant to a utility calculation for an impartial actor.

If you are an EDT agent, the right way to understand discussions of “updating” is as a description of the calculation done by EDT. Indeed, it’s common to use the word “belief” to refer to the odds at which you’d bet, in which case beliefs are the output of EDT rather than the input. Other epistemological principles do help constrain the input to EDT (e.g. principles about simplicity or parsimony or whatever), but not updating.

This is similar to the way that an EDT agent sees causal relationships: as helpful descriptions of what happens inside normatively correct decision making. Updating and causality may play a critical role in algorithms that implement normatively correct decision making, but they are not inputs into normatively correct decision making. Intuitions and classical arguments about the relevance of these concepts can be understood as what those algorithms feel like from the inside, as agents who have evolved to implement (rather than reason about) correct decision-making.

“Updatelessness” as a feature of preferences

On this perspective whether to be “updateless” isn’t really a free parameter in EDT—there is only one reasonable theory, which is to use the prior probabilities to evaluate conditional utilities given each possible decision that an agent with your nature and observations could make.
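
Concretely, a minimal sketch of this prescription, with the same illustrative numbers as before: score the whole policy “bet whenever my calculator says X is true” using the prior over X, rather than beliefs updated on my own reading.

```python
# Sketch of the "updateless" evaluation described above: score policies from the
# prior over X. (Same illustrative numbers and assumptions as the earlier sketch.)

N = 1_000_000
prior_X = 0.5
reliability = 0.99
win, lose = 1, 1000      # gain $1 if X, lose $1000 if not-X (a ~99.9%-odds bet)

def policy_value(bet_when_calc_says_true: bool) -> float:
    """Expected total winnings across all N copies of adopting the policy,
    computed from the prior over X."""
    if not bet_when_calc_says_true:
        return 0.0
    value_if_X     = reliability * N * win          # the copies who saw "true" win
    value_if_not_X = -(1 - reliability) * N * lose  # the copies who saw "true" lose
    return prior_X * value_if_X + (1 - prior_X) * value_if_not_X

print(policy_value(True))    # negative: the 99.9%-odds bet is rejected
print(policy_value(False))   # 0.0

# The prior-weighted comparison puts weight 0.5*0.99*N on the winning worlds and
# 0.5*0.01*N on the losing worlds, a 99:1 ratio: bet only at odds up to 99%,
# which is the intuitive answer.
```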

So what are we to make of cases like transparent newcomb that appear to separate EDT from UDT?

I currently think of this as a question of values or identity (though I think this is dicier than the earlier part of the post). Consider the following pair of cases to illustrate:

This framing makes it clear and relatively uninteresting why you should modify yourself to be updateless: any pair of agents could benefit from a bilateral commitment to value each other’s welfare. It’s just that A and B start off being the same, and so they happen to be in an exceptionally good position to make such a commitment, and it’s very clear what the “fair” agreement is.

What if the agents aren’t selfish? Say they both just want to maximize happiness?

12 comments


comment by Lukas Finnveden (Lanrian) · 2021-10-12T11:02:07.332Z · LW(p) · GW(p)

Interesting! Here's one way to look at this:

  • EDT+SSA-with-a-minimal-reference-class behaves like UDT in anthropic dilemmas where updatelessness doesn't matter.
  • I think SSA with a minimal reference class is roughly equivalent to "notice that you exist; exclude all possible worlds where you don't exist; renormalize"
  • In large worlds where your observations have sufficient randomness that observers of all kinds exist in all worlds, the SSA update step cannot exclude any world. You're updateless by default. (This is the case in the 99% example above.)
  • In small or sufficiently deterministic worlds, the SSA update step can exclude some possible worlds.
    • In "normal" situations, the fact that it excludes worlds where you don't exist doesn't have any implications for your decisions — because your actions will normally not have any effects in worlds where you don't exist.
    • But in situations like transparent newcombs, this means that you will now not care about non-existent copies of yourself.

Basically, EDT behaves fine without updating. Excluding worlds where you don't exist is one kind of updating that you can do that doesn't change your behavior in normal situations. Whether you do this or not will determine whether you act updateless in situations like transparent newcomb that happen in small or sufficiently deterministic worlds. (In large and sufficiently random worlds, you'll act updateless regardless.)

Viewed like this, the SSA part of EDT+SSA looks unnecessary and strange. Especially since I think you do want to act updateless in situations like transparent newcomb.

Replies from: paulfchristiano, JBlack
comment by paulfchristiano · 2021-10-12T15:14:03.168Z · LW(p) · GW(p)

I feel like the part where you "exclude worlds where 'you don't exist' " should probably amount to "exclude worlds where your current decision doesn't have any effects"---it's not clear in what sense you "don't exist" if you are perfectly correlated with something in the world.  And of course renormalizing makes no difference, it's just expressing the fact that both sides of the bet get scaled down. So if that's your operationalization, then it's also just a description of something that automatically happens inside of the utility calculation.

(I do think it's unclear whether selfish agents "should" be updateless in transparent newcomb.)

Replies from: Lanrian
comment by Lukas Finnveden (Lanrian) · 2021-10-12T18:08:39.933Z · LW(p) · GW(p)

Yes, with that operationalisation, the update has no impact on actions. (Which makes it even more clear that the parsimonious choice is to skip it.)

(I do think it's unclear whether selfish agents "should" be updateless in transparent newcomb.)

Yeah. It might be clearer to think about this as a 2-by-2 grid, with "Would you help a recent copy of yourself that has had one divergent experience from you?" on one axis and "Would you help a version of yourself that would naively be seen as non-existent?" (e.g. in transparent newcombs) on another.

  • It seems fairly clear that it's reasonable to answer "yes" to both of these.
  • It's possible that a selfish agent could sensibly answer "no" to both of them.

But perhaps we can exclude the other options.

  • Answering "yes" to the former and "no" to the latter would correspond to only caring about copies of yourself that 'exist' in the naive sense. (This is what the version of EDT+SSA that I wrote about in my top-level comment would do.) Perhaps this could be excluded as relying on philosophical confusion about 'existence'.
  • Answering "no" to the former and "yes" to the latter might correspond to something like... only caring about versions of yourself that you have some particular kind of (counterfactual) continuity or connection with. (I'm making stuff up here.) Anyway, maybe this could be excluded as necessarily having to rely on some confusions about personal identity.
comment by JBlack · 2021-10-13T05:52:06.269Z · LW(p) · GW(p)

Doesn't "sufficient randomness in observations" just mean that you split the possible worlds further by conditional probability of observations given actual world-state? You can still eliminate the ones where the observers don't observe what you observed.

For example "I observe that the calculator says NO" doesn't let you eliminate worlds where the correct answer is YES, but it does let you eliminate all worlds where you observe that the calculator says YES. So "notice that you (an observer who sees NO) exist; exclude all possible worlds where you don't exist (because observers in that world see YES); renormalize" still does some work.

comment by JBlack · 2021-10-13T05:20:34.342Z · LW(p) · GW(p)

I'm not sure what this is, but it's not EDT.

The correct decision under EDT is the action A that maximizes Σ_i P(O_i | A) · U(O_i, A), where the O_i are the possible outcomes and U(O_i, A) is the utility of outcome O_i under action A. For this agent, the utility is not a function of O_i and A, so EDT cannot be applied.
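
For reference, a literal rendering of that formula as code (a generic sketch; the function and argument names are hypothetical, not from the comment):

```python
def edt_choice(actions, outcomes, prob, utility):
    """Return the action A maximizing sum_i P(O_i | A) * U(O_i, A),
    where prob(o, a) plays the role of P(O_i | A) and utility(o, a) of U(O_i, A)."""
    return max(actions, key=lambda a: sum(prob(o, a) * utility(o, a) for o in outcomes))
```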

Replies from: paulfchristiano
comment by paulfchristiano · 2021-10-13T21:57:07.923Z · LW(p) · GW(p)

I'm using EDT to mean the agent that calculates expected utility conditioned on each statement of the form "I take action A" and then chooses the action for which the expected utility is highest. I'm not sure what you mean by saying the utility is not a function of O_i, isn't "how much money me and my copies earn" a function of the outcome?

(In your formulation I don't know what P(O_i | A) means, given that A is an action and not an event, but if I interpret it as "Probability given that I take action A" then it looks like it's basically what I'm doing?)

Replies from: JBlack
comment by JBlack · 2021-10-15T12:06:49.811Z · LW(p) · GW(p)

The "me and my copies" that this agent bases its utility on are split across possible worlds with different outcomes. EDT requires a function that maps an action and an outcome to a utility value, and no such function exists for this agent.

Edit: as an example, what is the utility of this agent winning $1000 in a game where they don't know the chance of winning? They don't even know themselves what their own utility is, because their utility doesn't just depend upon the outcome. If you credibly tell them afterward that they were nearly certain to win, they value the same $1000 much more than if you tell them that there was a 1 in a million chance that they would win.

For this sort of agent that values nonexistent and causally-disconnected people, we need some different class of decision theory altogether, and I'm not sure it can even be made rationally consistent.

comment by Ben123 · 2021-10-14T10:23:04.680Z · LW(p) · GW(p)

I think the agent should take the bet, and the double counting is actually justified. Epistemic status: Sleep deprived.

The number of clones that end up betting along with the agent is an additional effect of its decision that EDT-with-update is correctly accounting for. Since "calculator says X" is evidence that "X = true", selecting only clones that saw "calc says X" gives you better odds. What seems like a superfluous second update is really an essential step -- computing the number of clones in each branch.

Consider this modification: All N clones bet iff you do, using their own calculator to decide whether to bet on X or ¬X.

This reformulation is just the basic 0-clones problem repeated, and it recommends no bet.

if X, EVT = −100 = 0.99N winners × $10 − 0.01N losers × $1000
if ¬X, EVT = −100 = 0.99N winners × $10 − 0.01N losers × $1000

Now recall the "double count" calculation for the original problem.

if X, EVT = 9900 = 0.99N winners × $10
if ¬X, EVT = −10,000 = −0.01N losers × $1000

Notice what's missing: The winners when ¬X and, crucially, the losers when X. This is a real improvement in value -- if you're one of the clones when X is true, there's no longer any risk of losing money. 
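
A quick numerical check of both calculations (a sketch; N = 1000 clones and the win-$10 / lose-$1000 stakes are inferred from the figures above rather than stated):

```python
# Numerical check of the two calculations above. N = 1000 and the $10 / $1000
# stakes are inferred from the figures in this comment, not stated explicitly.

N = 1000
correct = round(0.99 * N)   # 990 clones whose calculator gives the right answer
wrong = N - correct         # 10 clones whose calculator errs
win, lose = 10, 1000

# Modified problem: every clone bets on whatever its own calculator says.
# Both branches look the same, so the value doesn't depend on X.
print(correct * win - wrong * lose)    # -100 whether X is true or false

# Original ("double count") problem: only clones whose calculator said
# "X is true" bet at all.
print(correct * win)                   # 9900 if X is true: no losers
print(-wrong * lose)                   # -10000 if X is false: no winners
```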


 

Replies from: nora-belrose
comment by Nora Belrose (nora-belrose) · 2022-05-29T21:56:14.851Z · LW(p) · GW(p)

Yeah, I think this is right. It seems like the whole problem arises from ignoring the copies of you who see "X is false." If your prior on X is 0.5, then really the behavior of the clones that see "X is false" should be exactly analogous to yours, and if you're going to be a clone-altruist you should care about all the clones of you whose behavior and outcomes you can easily predict.

I should also point out that this whole setup assumes that there are 0.99N clones who see one calculator output and 0.01N clones who see the opposite, but that's really going to depend on what exact type of multiverse you're considering (quantum vs. inflationary vs. something else) and what type of randomness is injected into the calculator (classical or quantum). But if you include both the "X is true" and "X is false" copies then I think it ends up not mattering.

comment by Lukas Finnveden (Lanrian) · 2024-04-09T03:03:25.747Z · LW(p) · GW(p)

Maybe interesting: I think a similar double-counting problem would appear naturally if you tried to train an RL agent in a setting where:

  • "Reward" is proportional to an estimate of some impartial measure of goodness.
  • There are multiple identical copies of your RL algorithm (including: they all use the same random seed for exploration).

In a repeated version of the calculator example (importantly: where in each iteration, you randomly decide whether the people who saw "true" get offered a bet or the people who saw "false" get offered a bet — never both), the RL algorithms would learn that, indeed:

  • 99% of the time, they're in the group where the calculator doesn't make an error
  • and on average, when they get offered a bet, they will get more reward afterwards if they take it than if they don't.

The reason that this happens is because, when the RL agents lose money, there's fewer agents that associate negative reinforcement with having taken a bet just-before. Whereas whenever they gain money, there's more agents that associate positive reinforcement with having taken a bet just-before. So the total amount of reinforcement is greater in the latter case, so the RL agents learn to bet. (Despite how this loses them money on average.)
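
A minimal Monte Carlo sketch of this setup (the stakes, the population size, and reading "impartial measure of goodness" as "total money won by all copies this round" are illustrative assumptions, not taken from the comment):

```python
import random

# Monte Carlo sketch of the repeated calculator bet described above. The stakes
# (+$1 / -$1000), the population size, and the choice of "reward = total money
# won by all copies this round" as the impartial measure are assumptions made
# for illustration.

random.seed(0)
N = 1000              # copies of the RL agent
reliability = 0.99    # each copy's calculator is right 99% of the time
win, lose = 1, 1000
rounds = 2000

sum_reward_after_bet = 0.0   # total reinforcement experienced right after betting
n_bet_experiences = 0        # number of (copy, round) pairs in which a copy bet
total_goodness = 0.0         # what actually happens to the impartial measure

for _ in range(rounds):
    x_true = random.random() < 0.5
    # how many copies' calculators show the correct answer this round:
    n_correct = sum(random.random() < reliability for _ in range(N))
    offered_saw_true = random.random() < 0.5          # which group gets the offer
    offered_are_right = (offered_saw_true == x_true)  # is that group's reading correct?
    n_bettors = n_correct if offered_are_right else N - n_correct
    round_reward = n_bettors * win if offered_are_right else -n_bettors * lose
    total_goodness += round_reward
    sum_reward_after_bet += round_reward * n_bettors  # every bettor sees this reward
    n_bet_experiences += n_bettors

print("average reward a copy receives after betting:",
      sum_reward_after_bet / n_bet_experiences)       # comes out positive
print("average change in the impartial measure per round:",
      total_goodness / rounds)                        # comes out negative
```

The first printed number is positive because the good outcome is reinforced in ~99 times as many copies and is itself ~99 times larger, while the second is negative because the bad outcome, though rare per copy, is large: the same double count as in the post.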

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-10-25T22:59:41.460Z · LW(p) · GW(p)

Suppose I conclude from this that I shouldn't update. 

How does this interact with logical updates / learning more math / conceiving of additional hypotheses / etc.? A priori stuff. Can I still learn/evolve my credences over time in those ways, or do I have to freeze them also?

comment by ESRogs · 2021-10-14T19:10:46.054Z · LW(p) · GW(p)

Then it feels weird, seeing button “B,” to press the button knowing that it causes you to lose $1 in the real, actually-existing world.

Was that supposed to be "seeing button 'A'"? (since A was the one who stands to lose a dollar, and B the one who stands to gain a dollar)