EDT with updating double counts

post by paulfchristiano · 2021-10-12T04:40:02.158Z · LW · GW · 9 comments

Contents

  Why EDT bets at 99.99% odds (under some conditions)
  Failure diagnosis
  “Updatelessness” as a feature of preferences
9 comments

I recently got confused thinking about the following case:

Calculator bet: I am offered the opportunity to bet on a mathematical statement X to which I initially assign 50% probability (perhaps X = 139926 is a quadratic residue modulo 314159). I have access to a calculator that is 99% reliable, i.e. it corrupts the answer 1% of the time at random. The calculator says that X is true. With what probability should I be willing to wager?

I think the answer is clearly “99%.” But a naive application of EDT can recommend betting with 99.99% probability. I think this is a mistake, and understanding the mistake helps clarify what it means to be “updateless” and why it’s essentially obligatory for EDT agents. My takeaway is that for an EDT agent, Bayesian updating is a description of the expected utility calculation rather than something that an EDT agent should do to form its beliefs before calculating expected utility.

Thanks to Joe Carlsmith and Katja Grace for the conversation that prompted this post. I suspect this point is well-known in the philosophy literature. I’ve seen related issues discussed in the rationalist community, especially in this sequence and this post, but found those a bit confusing. In particular, I think I initially glossed over how “SSA” was being used to refer to a view which rejects Bayesian updating on observations (!) in this comment and the linked paper. In general I’ve absorbed the idea that decision theory and anthropics had a weird interaction, but hadn’t noticed that exactly the same weirdness also applied in cases where the number of observers is constant across possible worlds.

Why EDT bets at 99.99% odds (under some conditions)

I’ll make four assumptions:

Under these assumptions, what happens if someone offers me a bet of $1 at 99.9% odds? If I take the bet I’ll gain $1 if X is true, but lose $1000 if X turns out to be false. Intuitively this is a very bad bet, because I “should” only have 99% confidence. But under these assumptions EDT thinks it’s a great deal.
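Here is a rough sketch of the arithmetic behind that claim. It assumes the calculator's verdict is counted twice: once as an ordinary Bayesian update to 99%, and once because worlds where X is true contain 99 times as many copies of me who saw "X is true" and whose decisions are correlated with mine. The function name and the normalization are illustrative choices, not something taken from the post.

```python
# Illustrative only: the bet pays +$1 if X is true and -$1000 if X is false.
WIN, LOSS = 1.0, -1000.0
RELIABILITY = 0.99  # the calculator is right 99% of the time


def expected_value(weight_true, weight_false):
    """Expected payoff per unit of total weight placed on the two worlds."""
    total = weight_true + weight_false
    return (weight_true * WIN + weight_false * LOSS) / total


# Single count: the ordinary posterior after seeing "X is true".
single_true, single_false = RELIABILITY, 1 - RELIABILITY

# Double count: the posterior, multiplied again by the fraction of copies
# in each world who also saw "X is true".
double_true = RELIABILITY * RELIABILITY
double_false = (1 - RELIABILITY) * (1 - RELIABILITY)

ev_single = expected_value(single_true, single_false)
ev_double = expected_value(double_true, double_false)
implied_probability = double_true / (double_true + double_false)

print(f"single-count EV: {ev_single:.2f}")
print(f"double-count EV: {ev_double:.4f}")
print(f"implied betting probability: {implied_probability:.4%}")
```

With a single count the bet has negative expected value; with the double count it looks positive, and the implied betting probability is 9801/9802, which is where the 99.99% figure comes from.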

Failure diagnosis

Intuitively 99.99% is the wrong answer to this question. But it’s important to understand what actually went wrong. After all, intuitions could be mistaken and maybe big universes lead to weird conclusions (I endorse a few of those myself). Moreover, if you’re like me and think the “obvious” argument for EDT is compelling, this case might lead you to suspect something has gone wrong in your reasoning.

The intuitive problem is that we are “updating” on the calculator’s verdict twice:

The second “update” is pretty much inherent in the nature of EDT—if I care about the aggregate fate of all of the people like me, and if all of their decisions are correlated with mine, then I need to perform a sum over all of them and so I will care twice as much about possible worlds where there are twice as many of them. Rejecting this “update” basically means rejecting EDT.

The first “update” looks solid at first, since Bayesian updating given evidence seems like a really solid epistemic principle. But I claim this is actually where we ran into trouble. In my view there is an excellent simple argument for using EDT to make decisions, but there is no good argument for using beliefs formed by conditioning on your observations as the input into EDT.

This may sound a bit wild, but hear me out. The basic justification for updating is essentially decision-theoretic: either it’s about counting the observers across possible worlds who would have made your observations, or it’s about Dutch book arguments constraining the probabilities with which you should bet. (As an example, see SEP on Bayesian epistemology.) I’ve internalized these arguments enough that updating can feel like a primitive bedrock of epistemology, but they really only constrain how you should bet (or maybe what “you” should expect to see next). They don’t say much about what you should “expect” in any observer-independent sense that would be relevant to a utility calculation for an impartial actor.

If you are an EDT agent, the right way to understand discussions of “updating” is as a description of the calculation done by EDT. Indeed, it’s common to use the word “belief” to refer to the odds at which you’d bet, in which case beliefs are the output of EDT rather than the input. Other epistemological principles do help constrain the input to EDT (e.g. principles about simplicity or parsimony or whatever), but not updating.

This is similar to the way that an EDT agent sees causal relationships: as helpful descriptions of what happens inside normatively correct decision making. Updating and causality may play a critical role in algorithms that implement normatively correct decision making, but they are not inputs into normatively correct decision making. Intuitions and classical arguments about the relevance of these concepts can be understood as what those algorithms feel like from the inside, as agents who have evolved to implement (rather than reason about) correct decision-making.

“Updatelessness” as a feature of preferences

On this perspective whether to be “updateless” isn’t really a free parameter in EDT—there is only one reasonable theory, which is to use the prior probabilities to evaluate conditional utilities given each possible decision that an agent with your nature and observations could make.
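As a minimal sketch of that recipe applied to the calculator bet: weight each world by its prior probability, then sum the payoff over the copies in that world who share my observation and therefore make the same decision. The population size `N_COPIES` and the function name are hypothetical; only the ratios matter.

```python
PRIOR_TRUE = 0.5       # prior probability that X is true
RELIABILITY = 0.99     # the calculator is right 99% of the time
N_COPIES = 1_000_000   # hypothetical number of copies per world


def edt_value_of_betting(win, loss):
    """Prior-weighted total payoff if every copy who saw "X is true" bets."""
    # World where X is true: copies with a working calculator saw "true" and win.
    value_if_true = PRIOR_TRUE * (RELIABILITY * N_COPIES) * win
    # World where X is false: copies with a corrupted calculator saw "true" and lose.
    value_if_false = (1 - PRIOR_TRUE) * ((1 - RELIABILITY) * N_COPIES) * loss
    return value_if_true + value_if_false


bad_bet = edt_value_of_betting(1.0, -1000.0)   # the 99.9% odds bet
fair_bet = edt_value_of_betting(1.0, -99.0)    # exactly 99% odds
good_bet = edt_value_of_betting(1.0, -50.0)    # better than 99% odds

print(f"99.9% odds: {bad_bet:.0f}")
print(f"99% odds:   {fair_bet:.0f}")
print(f"98% odds:   {good_bet:.0f}")
```

Declining is worth 0 in this sketch, so the agent bets exactly when the offered odds are better than 99%. The count of copies does the work that the Bayesian update usually does, and nothing is counted twice.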

So what are we to make of cases like transparent newcomb that appear to separate EDT from UDT?

I currently think of this as a question of values or identity (though I think this is dicier than the earlier part of the post). Consider the following pair of cases to illustrate:

This framing makes it clear and relatively uninteresting why you should modify yourself to be updateless: any pair of agents could benefit from a bilateral commitment to value each other’s welfare. It’s just that A and B start off being the same, and so they happen to be in an exceptionally good position to make such a commitment, and it’s very clear what the “fair” agreement is.

What if the agents aren’t selfish? Say they both just want to maximize happiness?

9 comments

Comments sorted by top scores.

comment by JBlack · 2021-10-13T05:20:34.342Z · LW(p) · GW(p)

I'm not sure what this is, but it's not EDT.

The correct decision under EDT is the action A that maximizes Sum P(O_i | A) U(O_i, A), where the O_i are the possible outcomes and U is the utility function given outcome O_i for action A. For this agent, the utility is not a function of O_i and A, so EDT cannot be applied.

Replies from: paulfchristiano
comment by paulfchristiano · 2021-10-13T21:57:07.923Z · LW(p) · GW(p)

I'm using EDT to mean the agent that calculates expected utility conditioned on each statement of the form "I take action A" and then chooses the action for which the expected utility is highest. I'm not sure what you mean by saying the utility is not a function of O_i. Isn't "how much money me and my copies earn" a function of the outcome?

(In your formulation I don't know what P(|A) means, given that A is an action and not an event, but if I interpret it as "Probability given that I take action A" then it looks like it's basically what I'm doing?)
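To make the two formulations concrete, here is a small sketch of the rule both comments describe: pick the action A that maximizes the sum over outcomes of P(O_i | A) U(O_i, A), reading the conditional as "probability given that I take action A". The toy actions, probabilities, and utilities are made-up placeholders, not taken from the thread.

```python
def edt_choose(actions, outcomes, prob_given_action, utility):
    """Return the action with the highest conditional expected utility."""
    def expected_utility(action):
        return sum(
            prob_given_action(outcome, action) * utility(outcome, action)
            for outcome in outcomes
        )
    return max(actions, key=expected_utility)


# Placeholder problem: a $1 vs. $1000 bet on X at 99% confidence.
ACTIONS = ["bet", "decline"]
OUTCOMES = ["X true", "X false"]


def prob_given_action(outcome, action):
    # In this toy case the action carries no evidence about X.
    return 0.99 if outcome == "X true" else 0.01


def utility(outcome, action):
    if action == "decline":
        return 0.0
    return 1.0 if outcome == "X true" else -1000.0


choice = edt_choose(ACTIONS, OUTCOMES, prob_given_action, utility)
print(choice)
```

The disagreement in this thread is about what goes into `utility` and `prob_given_action`, not about the maximization step itself.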

Replies from: JBlack
comment by JBlack · 2021-10-15T12:06:49.811Z · LW(p) · GW(p)

The "me and my copies" that this agent bases its utility on are split across possible worlds with different outcomes. EDT requires a function that maps an action and an outcome to a utility value, and no such function exists for this agent.

Edit: as an example, what is the utility of this agent winning $1000 in a game where they don't know the chance of winning? They don't even know themselves what their own utility is, because their utility doesn't just depend upon the outcome. If you credibly tell them afterward that they were nearly certain to win, they value the same $1000 very much greater than if you tell them that there was a 1 in a million chance that they would win.

For this sort of agent that values nonexistent and causally-disconnected people, we need some different class of decision theory altogether, and I'm not sure it can even be made rationally consistent.

comment by Lanrian · 2021-10-12T11:02:07.332Z · LW(p) · GW(p)

Interesting! Here's one way to look at this:

  • EDT+SSA-with-a-minimal-reference-class behaves like UDT in anthropic dilemmas where updatelessness doesn't matter.
  • I think SSA with a minimal reference class is roughly equivalent to "notice that you exist; exclude all possible worlds where you don't exist; renormalize"
  • In large worlds where your observations have sufficient randomness that observers of all kinds exist in all worlds, the SSA update step cannot exclude any world. You're updateless by default. (This is the case in the 99% example above.)
  • In small or sufficiently deterministic worlds, the SSA update step can exclude some possible worlds.
    • In "normal" situations, the fact that it excludes worlds where you don't exist doesn't have any implications for your decisions — because your actions will normally not have any effects in worlds where you don't exist.
    • But in situations like transparent newcomb, this means that you will now not care about non-existent copies of yourself.

Basically, EDT behaves fine without updating. Excluding worlds where you don't exist is one kind of updating that you can do that doesn't change your behavior in normal situations. Whether you do this or not will determine whether you act updateless in situations like transparent newcomb that happen in small or sufficiently deterministic worlds. (In large and sufficiently random worlds, you'll act updateless regardless.)

Viewed like this, the SSA part of EDT+SSA looks unnecessary and strange, especially since I think you do want to act updateless in situations like transparent newcomb.

Replies from: paulfchristiano, JBlack
comment by paulfchristiano · 2021-10-12T15:14:03.168Z · LW(p) · GW(p)

I feel like the part where you "exclude worlds where 'you don't exist' " should probably amount to "exclude worlds where your current decision doesn't have any effects"; it's not clear in what sense you "don't exist" if you are perfectly correlated with something in the world. And of course renormalizing makes no difference, it's just expressing the fact that both sides of the bet get scaled down. So if that's your operationalization, then it's also just a description of something that automatically happens inside of the utility calculation.

(I do think it's unclear whether selfish agents "should" be updateless in transparent newcomb.)

Replies from: Lanrian
comment by Lanrian · 2021-10-12T18:08:39.933Z · LW(p) · GW(p)

Yes, with that operationalisation, the update has no impact on actions. (Which makes it even more clear that the parsimonious choice is to skip it.)

(I do think it's unclear whether selfish agents "should" be updateless in transparent newcomb.)

Yeah. It might be clearer to think about this as a 2-by-2 grid, with "Would you help a recent copy of yourself that has had one divergent experience from you?" on one axis and "Would you help a version of yourself that would naively be seen as non-existent?" (e.g. in transparent newcomb) on another.

  • It seems fairly clear that it's reasonable to answer "yes" to both of these.
  • It's possible that a selfish agent could sensibly answer "no" to both of them.

But perhaps we can exclude the other options.

  • Answering "yes" to the former and "no" to the latter would correspond to only caring about copies of yourself that 'exist' in the naive sense. (This is what the version of EDT+SSA that I wrote about in my top-level comment would do.) Perhaps this could be excluded as relying on philosophical confusion about 'existence'.
  • Answering "no" to the former and "yes" to the latter might correspond to something like... only caring about versions of yourself that you have some particular kind of (counterfactual) continuity or connection with. (I'm making stuff up here.) Anyway, maybe this could be excluded as necessarily having to rely on some confusions about personal identity.
comment by JBlack · 2021-10-13T05:52:06.269Z · LW(p) · GW(p)

Doesn't "sufficient randomness in observations" just mean that you split the possible worlds further by conditional probability of observations given actual world-state? You can still eliminate the ones where the observers don't observe what you observed.

For example "I observe that the calculator says NO" doesn't let you eliminate worlds where the correct answer is YES, but it does let you eliminate all worlds where you observe that the calculator says YES. So "notice that you (an observer who sees NO) exist; exclude all possible worlds where you don't exist (because observers in that world see YES); renormalize" still does some work.

comment by ESRogs · 2021-10-14T19:10:46.054Z · LW(p) · GW(p)

Then it feels weird, seeing button “B,” to press the button knowing that it causes you to lose $1 in the real, actually-existing world.

Was that supposed to be "seeing button 'A'"? (since A was the one who stands to lose a dollar, and B the one who stands to gain a dollar)

comment by Ben123 · 2021-10-14T10:23:04.680Z · LW(p) · GW(p)

I think the agent should take the bet, and the double counting is actually justified. Epistemic status: Sleep deprived.

The number of clones that end up betting along with the agent is an additional effect of its decision that EDT-with-update is correctly accounting for. Since "calculator says X" is evidence that "X = true", selecting only clones that saw "calc says X" gives you better odds. What seems like a superfluous second update is really an essential step: computing the number of clones in each branch.

Consider this modification: All N clones bet iff you do, using their own calculator to decide whether to bet on X or ¬X.

This reformulation is just the basic 0-clones problem repeated, and it recommends no bet.

if X, EVT = -100 =
      0.99 × N winners × $10
    - 0.01 × N losers × $1000
if ¬X, EVT = -100 =
      0.99 × N winners × $10
    - 0.01 × N losers × $1000

Now recall the "double count" calculation for the original problem.

if X, EVT = 9900 = 0.99 × N winners × $10
if ¬X, EVT = -10000 = -0.01 × N losers × $1000

Notice what's missing: The winners when ¬X and, crucially, the losers when X. This is a real improvement in value -- if you're one of the clones when X is true, there's no longer any risk of losing money.
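
A quick check of these figures, assuming N = 1000 clones. The comment does not state N; 1000 is the value implied by the 9900 and 100 totals. The variable names are illustrative.

```python
N = 1000            # assumed clone count, inferred from the totals above
RELIABILITY = 0.99  # the calculator is right 99% of the time
WIN, LOSS = 10, 1000

winners = RELIABILITY * N        # clones whose calculator was right
losers = (1 - RELIABILITY) * N   # clones whose calculator was corrupted

# Modified problem: every clone bets on whatever its own calculator says,
# so each world contains both winners and losers.
modified_ev = winners * WIN - losers * LOSS

# Original problem: only clones who saw "X" bet on X.
original_ev_if_true = winners * WIN     # X true: only winners are betting
original_ev_if_false = -losers * LOSS   # X false: only losers are betting

print(round(modified_ev), round(original_ev_if_true), round(original_ev_if_false))
```

Under this assumption the modified problem gives -100 in either world, while the original problem gives 9900 if X and -10000 if ¬X.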