EDT with updating double counts

post by paulfchristiano · 2021-10-12T04:40:02.158Z · LW · GW · 12 comments

Contents

  Why EDT bets at 99.99% odds (under some conditions)
  Failure diagnosis
  “Updatelessness” as a feature of preferences

I recently got confused thinking about the following case:

Calculator bet: I am offered the opportunity to bet on a mathematical statement X to which I initially assign 50% probability (perhaps X = 139926 is a quadratic residue modulo 314159). I have access to a calculator that is 99% reliable, i.e. it corrupts the answer 1% of the time at random. The calculator says that X is true. With what probability should I be willing to wager?

I think the answer is clearly “99%.” But a naive application of EDT can recommend betting at 99.99% odds. I think this is a mistake, and understanding the mistake helps clarify what it means to be “updateless” and why it’s essentially obligatory for EDT agents. My takeaway is that for an EDT agent, bayesian updating is a description of the expected utility calculation rather than something the EDT agent should do to form its beliefs before calculating expected utility.

Thanks to Joe Carlsmith and Katja Grace for the conversation that prompted this post. I suspect this point is well-known in the philosophy literature. I’ve seen related issues discussed in the rationalist community, especially in this sequence [? · GW] and this post [AF(p) · GW(p)] but found those a bit confusing—in particular, I think I initially glossed over how “SSA” was being used to refer to a view which rejects bayesian updating on observations (!) in this comment [AF(p) · GW(p)] and the linked paper. In general I’ve absorbed the idea that decision theory and anthropics had a weird interaction, but hadn’t noticed that exactly the same weirdness also applied in cases where the number of observers is constant across possible worlds.

Why EDT bets at 99.99% odds (under some conditions)

I’ll make four assumptions:

Under these assumptions, what happens if someone offers me a bet of $1 at 99.9% odds? If I take the bet, I gain $1 if X is true but lose $1000 if X turns out to be false. Intuitively this is a very bad bet, because I “should” only have 99% confidence. But under these assumptions EDT thinks it’s a great deal.
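
To make the arithmetic concrete, here is a minimal numerical sketch of the copy-counting calculation. The copy count N, the exact stakes, and the Python framing are illustrative assumptions: a large universe contains N copies of me facing this exact bet, 99% of their calculators are right, every copy that saw “X is true” decides the same way I do, and I care about their total winnings.

```python
# Minimal sketch of the copy-counting calculation (illustrative numbers, not from the post).

N = 1_000_000          # assumed number of copies of me facing the calculator
prior_X = 0.5          # prior on the mathematical statement X
reliability = 0.99     # calculator is right 99% of the time
win, lose = 1, 1000    # the offered bet: gain $1 if X, lose $1000 if not-X

# Step 1: ordinary Bayesian update on "my calculator says X is true".
posterior = (prior_X * reliability) / (
    prior_X * reliability + (1 - prior_X) * (1 - reliability))
print(posterior)                                 # 0.99

# A lone agent with 99% credence rejects the bet:
print(posterior * win - (1 - posterior) * lose)  # about -9.01

# Step 2: EDT sums winnings over all correlated copies. If X is true, ~0.99*N
# copies saw "true" and win; if X is false, only ~0.01*N copies saw "true" and lose.
ev_bet = posterior * (reliability * N) * win \
         - (1 - posterior) * ((1 - reliability) * N) * lose
print(ev_bet)                                    # large and positive, so EDT takes the bet

# X-worlds get weighted by 0.99 twice (once in the posterior, once in the copy
# count), so the implied betting odds are 0.99^2 : 0.01^2, i.e. about 99.99%.
print(reliability**2 / (reliability**2 + (1 - reliability)**2))
```

The last line is the “double count”: worlds where X is true are weighted once by the 99% posterior and once more by the 99-fold larger population of copies who saw this output.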

Failure diagnosis

Intuitively 99.99% is the wrong answer to this question. But it’s important to understand what actually went wrong. After all, intuitions could be mistaken and maybe big universes lead to weird conclusions (I endorse a few of those myself). Moreover, if you’re like me and think the “obvious” argument for EDT is compelling, this case might lead you to suspect something has gone wrong in your reasoning.

The intuitive problem is that we are “updating” on the calculator’s verdict twice:

  • First, when forming beliefs: I condition on the calculator’s output and move from 50% to 99% confidence in X.
  • Second, inside the expected utility calculation: worlds where X is true contain 99 times as many copies of me who saw this output and whose correlated decisions I care about, so those worlds get weighted by roughly the same factor again.
The second “update” is pretty much inherent in the nature of EDT—if I care about the aggregate fate of all of the people like me, and if all of their decisions are correlated with mine, then I need to perform a sum over all of them and so I will care twice as much about possible worlds where there are twice as many of them. Rejecting this “update” basically means rejecting EDT.

The first “update” looks solid at first, since Bayesian updating given evidence seems like a really solid epistemic principle. But I claim this is actually where we ran into trouble. In my view there is an excellent simple argument for using EDT to make decisions, but there is no good argument for using beliefs formed by conditioning on your observations as the input into EDT.

This may sound a bit wild, but hear me out. The basic justification for updating is essentially decision-theoretic—either it’s about counting the observers across possible worlds who would have made your observations, or it’s about Dutch book arguments constraining the probabilities with which you should bet. (As an example, see the SEP entry on Bayesian epistemology.) I’ve internalized these arguments enough that updating can feel like a primitive bedrock of epistemology, but really they only constrain how you should bet (or maybe what “you” should expect to see next)—they don’t say much about what you should “expect” in any observer-independent sense that would be relevant to a utility calculation for an impartial actor.

If you are an EDT agent, the right way to understand discussions of “updating” is as a description of the calculation done by EDT. Indeed, it’s common to use the word “belief” to refer to the odds at which you’d bet, in which case beliefs are the output of EDT rather than the input. Other epistemological principles do help constrain the input to EDT (e.g. principles about simplicity or parsimony or whatever), but not updating.

This is similar to the way that an EDT agent sees causal relationships: as helpful descriptions of what happens inside normatively correct decision making. Updating and causality may play a critical role in algorithms that implement normatively correct decision making, but they are not inputs into normatively correct decision making. Intuitions and classical arguments about the relevance of these concepts can be understood as what those algorithms feel like from the inside, as agents who have evolved to implement (rather than reason about) correct decision-making.

“Updatelessness” as a feature of preferences

On this perspective whether to be “updateless” isn’t really a free parameter in EDT—there is only one reasonable theory, which is to use the prior probabilities to evaluate conditional utilities given each possible decision that an agent with your nature and observations could make.
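
Concretely, a minimal sketch of this prescription, with the same illustrative numbers as before: score the whole policy “bet whenever my calculator says X is true” using the prior over X, rather than beliefs updated on my own reading.

```python
# Sketch of the "updateless" evaluation described above: score policies from the
# prior over X. (Same illustrative numbers and assumptions as the earlier sketch.)

N = 1_000_000
prior_X = 0.5
reliability = 0.99
win, lose = 1, 1000      # gain $1 if X, lose $1000 if not-X (a ~99.9%-odds bet)

def policy_value(bet_when_calc_says_true: bool) -> float:
    """Expected total winnings across all N copies of adopting the policy,
    computed from the prior over X."""
    if not bet_when_calc_says_true:
        return 0.0
    value_if_X     = reliability * N * win          # the copies who saw "true" win
    value_if_not_X = -(1 - reliability) * N * lose  # the copies who saw "true" lose
    return prior_X * value_if_X + (1 - prior_X) * value_if_not_X

print(policy_value(True))    # negative: the 99.9%-odds bet is rejected
print(policy_value(False))   # 0.0

# The prior-weighted comparison puts weight 0.5*0.99*N on the winning worlds and
# 0.5*0.01*N on the losing worlds, a 99:1 ratio: bet only at odds up to 99%,
# which is the intuitive answer.
```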

So what are we to make of cases like transparent newcomb that appear to separate EDT from UDT?

I currently think of this as a question of values or identity (though I think this is dicier than the earlier part of the post). Consider the following pair of cases to illustrate:

This framing makes it clear and relatively uninteresting why you should modify yourself to be updateless: any pair of agents could benefit from a bilateral commitment to value each other’s welfare. It’s just that A and B start off being the same, and so they happen to be in an exceptionally good position to make such a commitment, and it’s very clear what the “fair” agreement is.

What if the agents aren’t selfish? Say they both just want to maximize happiness?

12 comments


comment by Lukas Finnveden (Lanrian) · 2021-10-12T11:02:07.332Z · LW(p) · GW(p)

Interesting! Here's one way to look at this:

  • EDT+SSA-with-a-minimal-reference-class behaves like UDT in anthropic dilemmas where updatelessness doesn't matter.
  • I think SSA with a minimal reference class is roughly equivalent to "notice that you exist; exclude all possible worlds where you don't exist; renormalize"
  • In large worlds where your observations have sufficient randomness that observers of all kinds exist in all worlds, the SSA update step cannot exclude any world. You're updateless by default. (This is the case in the 99% example above.)
  • In small or sufficiently deterministic worlds, the SSA update step can exclude some possible worlds.
    • In "normal" situations, the fact that it excludes worlds where you don't exist doesn't have any implications for your decisions — because your actions will normally not have any effects in worlds where you don't exist.
    • But in situations like transparent newcombs, this means that you will now not care about non-existent copies of yourself.

Basically, EDT behaves fine without updating. Excluding worlds where you don't exist is one kind of updating that you can do that doesn't change your behavior in normal situations. Whether you do this or not will determine whether you act updateless in situations like transparent newcomb that happen in small or sufficiently deterministic worlds. (In large and sufficiently random worlds, you'll act updateless regardless.)

Viewed like this, the SSA part of EDT+SSA looks unnecessary and strange. Especially since I think you do want to act updateless in situations like transparent newcomb.

Replies from: paulfchristiano, JBlack
comment by paulfchristiano · 2021-10-12T15:14:03.168Z · LW(p) · GW(p)

I feel like the part where you "exclude worlds where 'you don't exist' " should probably amount to "exclude worlds where your current decision doesn't have any effects"---it's not clear in what sense you "don't exist" if you are perfectly correlated with something in the world.  And of course renormalizing makes no difference, it's just expressing the fact that both sides of the bet get scaled down. So if that's your operationalization, then it's also just a description of something that automatically happens inside of the utility calculation.

(I do think it's unclear whether selfish agents "should" be updateless in transparent newcomb.)

Replies from: Lanrian
comment by Lukas Finnveden (Lanrian) · 2021-10-12T18:08:39.933Z · LW(p) · GW(p)

Yes, with that operationalisation, the update has no impact on actions. (Which makes it even more clear that the parsimonious choice is to skip it.)

(I do think it's unclear whether selfish agents "should" be updateless in transparent newcomb.)

Yeah. It might be clearer to think about this as a 2-by-2 grid, with "Would you help a recent copy of yourself that has had one divergent experience from you?" on one axis and "Would you help a version of yourself that would naively be seen as non-existent?" (e.g. in transparent newcombs) on another.

  • It seems fairly clear that it's reasonable to answer "yes" to both of these.
  • It's possible that a selfish agent could sensibly answer "no" to both of them.

But perhaps we can exclude the other options.

  • Answering "yes" to the former and "no" to the latter would correspond to only caring about copies of yourself that 'exist' in the naive sense. (This is what the version of EDT+SSA that I wrote about in my top-level comment would do.) Perhaps this could be excluded as relying on philosophical confusion about 'existence'.
  • Answering "no" to the former and "yes" to the latter might correspond to something like... only caring about versions of yourself that you have some particular kind of (counterfactual) continuity or connection with. (I'm making stuff up here.) Anyway, maybe this could be excluded as necessarily having to rely on some confusions about personal identity.
comment by JBlack · 2021-10-13T05:52:06.269Z · LW(p) · GW(p)

Doesn't "sufficient randomness in observations" just mean that you split the possible worlds further by conditional probability of observations given actual world-state? You can still eliminate the ones where the observers don't observe what you observed.

For example "I observe that the calculator says NO" doesn't let you eliminate worlds where the correct answer is YES, but it does let you eliminate all worlds where you observe that the calculator says YES. So "notice that you (an observer who sees NO) exist; exclude all possible worlds where you don't exist (because observers in that world see YES); renormalize" still does some work.

comment by JBlack · 2021-10-13T05:20:34.342Z · LW(p) · GW(p)

I'm not sure what this is, but it's not EDT.

The correct decision under EDT is the action A that maximizes Σ_i P(O_i | A) · U(O_i, A), where the O_i are the possible outcomes and U(O_i, A) is the utility of outcome O_i under action A. For this agent, the utility is not a function of O_i and A, so EDT cannot be applied.
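
For reference, a literal rendering of that formula as code (a generic sketch; the function and argument names are hypothetical, not from the comment):

```python
def edt_choice(actions, outcomes, prob, utility):
    """Return the action A maximizing sum_i P(O_i | A) * U(O_i, A),
    where prob(o, a) plays the role of P(O_i | A) and utility(o, a) of U(O_i, A)."""
    return max(actions, key=lambda a: sum(prob(o, a) * utility(o, a) for o in outcomes))
```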

Replies from: paulfchristiano
comment by paulfchristiano · 2021-10-13T21:57:07.923Z · LW(p) · GW(p)

I'm using EDT to mean the agent that calculates expected utility conditioned on each statement of the form "I take action A" and then chooses the action for which the expected utility is highest. I'm not sure what you mean by saying the utility is not a function of O_i, isn't "how much money me and my copies earn" a function of the outcome?

(In your formulation I don't know what P(O_i | A) means, given that A is an action and not an event, but if I interpret it as "Probability given that I take action A" then it looks like it's basically what I'm doing?)

Replies from: JBlack
comment by JBlack · 2021-10-15T12:06:49.811Z · LW(p) · GW(p)

The "me and my copies" that this agent bases its utility on are split across possible worlds with different outcomes. EDT requires a function that maps an action and an outcome to a utility value, and no such function exists for this agent.

Edit: as an example, what is the utility of this agent winning $1000 in a game where they don't know the chance of winning? They don't even know themselves what their own utility is, because their utility doesn't just depend upon the outcome. If you credibly tell them afterward that they were nearly certain to win, they value the same $1000 much more than if you tell them that there was a 1 in a million chance that they would win.

For this sort of agent that values nonexistent and causally-disconnected people, we need some different class of decision theory altogether, and I'm not sure it can even be made rationally consistent.

comment by Ben123 · 2021-10-14T10:23:04.680Z · LW(p) · GW(p)

I think the agent should take the bet, and the double counting is actually justified. Epistemic status: Sleep deprived.

The number of clones that end up betting along with the agent is an additional effect of its decision that EDT-with-update is correctly accounting for. Since "calculator says X" is evidence that "X = true", selecting only clones that saw "calc says X" gives you better odds. What seems like a superfluous second update is really an essential step -- computing the number of clones in each branch.

Consider this modification: All N clones bet iff you do, using their own calculator to decide whether to bet on X or ¬X.

This reformulation is just the basic 0-clones problem repeated, and it recommends no bet.

if X, EVT = −100 = 0.99N winners × $10 − 0.01N losers × $1000
if ¬X, EVT = −100 = 0.99N winners × $10 − 0.01N losers × $1000

Now recall the "double count" calculation for the original problem.

if X, EVT = 9900 = 0.99N winners × $10
if ¬X, EVT = −10,000 = −0.01N losers × $1000

Notice what's missing: The winners when ¬X and, crucially, the losers when X. This is a real improvement in value -- if you're one of the clones when X is true, there's no longer any risk of losing money. 
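
A quick numerical check of both calculations (a sketch; N = 1000 clones and the win-$10 / lose-$1000 stakes are inferred from the figures above rather than stated):

```python
# Numerical check of the two calculations above. N = 1000 and the $10 / $1000
# stakes are inferred from the figures in this comment, not stated explicitly.

N = 1000
correct = round(0.99 * N)   # 990 clones whose calculator gives the right answer
wrong = N - correct         # 10 clones whose calculator errs
win, lose = 10, 1000

# Modified problem: every clone bets on whatever its own calculator says.
# Both branches look the same, so the value doesn't depend on X.
print(correct * win - wrong * lose)    # -100 whether X is true or false

# Original ("double count") problem: only clones whose calculator said
# "X is true" bet at all.
print(correct * win)                   # 9900 if X is true: no losers
print(-wrong * lose)                   # -10000 if X is false: no winners
```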


 

Replies from: nora-belrose
comment by Nora Belrose (nora-belrose) · 2022-05-29T21:56:14.851Z · LW(p) · GW(p)

Yeah, I think this is right. It seems like the whole problem arises from ignoring the copies of you who see "X is false." If your prior on X is 0.5, then really the behavior of the clones that see "X is false" should be exactly analogous to yours, and if you're going to be a clone-altruist you should care about all the clones of you whose behavior and outcomes you can easily predict.

I should also point out that this whole setup assumes that there are 0.99N clones who see one calculator output and 0.01N clones who see the opposite, but that's really going to depend on what exact type of multiverse you're considering (quantum vs. inflationary vs. something else) and what type of randomness is injected into the calculator (classical or quantum). But if you include both the "X is true" and "X is false" copies then I think it ends up not mattering.

comment by Lukas Finnveden (Lanrian) · 2024-04-09T03:03:25.747Z · LW(p) · GW(p)

Maybe interesting: I think a similar double-counting problem would appear naturally if you tried to train an RL agent in a setting where:

  • "Reward" is proportional to an estimate of some impartial measure of goodness.
  • There are multiple identical copies of your RL algorithm (including: they all use the same random seed for exploration).

In a repeated version of the calculator example (importantly: where in each iteration, you randomly decide whether the people who saw "true" get offered a bet or the people who saw "false" get offered a bet — never both), the RL algorithms would learn that, indeed:

  • 99% of the time, they're in the group where the calculator doesn't make an error
  • and on average, when they get offered a bet, they will get more reward afterwards if they take it than if they don't.

The reason that this happens is because, when the RL agents lose money, there's fewer agents that associate negative reinforcement with having taken a bet just-before. Whereas whenever they gain money, there's more agents that associate positive reinforcement with having taken a bet just-before. So the total amount of reinforcement is greater in the latter case, so the RL agents learn to bet. (Despite how this loses them money on average.)
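
A minimal Monte Carlo sketch of this setup (the stakes, the population size, and reading "impartial measure of goodness" as "total money won by all copies this round" are illustrative assumptions, not taken from the comment):

```python
import random

# Monte Carlo sketch of the repeated calculator bet described above. The stakes
# (+$1 / -$1000), the population size, and the choice of "reward = total money
# won by all copies this round" as the impartial measure are assumptions made
# for illustration.

random.seed(0)
N = 1000              # copies of the RL agent
reliability = 0.99    # each copy's calculator is right 99% of the time
win, lose = 1, 1000
rounds = 2000

sum_reward_after_bet = 0.0   # total reinforcement experienced right after betting
n_bet_experiences = 0        # number of (copy, round) pairs in which a copy bet
total_goodness = 0.0         # what actually happens to the impartial measure

for _ in range(rounds):
    x_true = random.random() < 0.5
    # how many copies' calculators show the correct answer this round:
    n_correct = sum(random.random() < reliability for _ in range(N))
    offered_saw_true = random.random() < 0.5          # which group gets the offer
    offered_are_right = (offered_saw_true == x_true)  # is that group's reading correct?
    n_bettors = n_correct if offered_are_right else N - n_correct
    round_reward = n_bettors * win if offered_are_right else -n_bettors * lose
    total_goodness += round_reward
    sum_reward_after_bet += round_reward * n_bettors  # every bettor sees this reward
    n_bet_experiences += n_bettors

print("average reward a copy receives after betting:",
      sum_reward_after_bet / n_bet_experiences)       # comes out positive
print("average change in the impartial measure per round:",
      total_goodness / rounds)                        # comes out negative
```

The first printed number is positive because the good outcome is reinforced in ~99 times as many copies and is itself ~99 times larger, while the second is negative because the bad outcome, though rare per copy, is large: the same double count as in the post.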

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-10-25T22:59:41.460Z · LW(p) · GW(p)

Suppose I conclude from this that I shouldn't update. 

How does this interact with logical updates / learning more math / conceiving of additional hypotheses / etc.? A priori stuff. Can I still learn/evolve my credences over time in those ways, or do I have to freeze them also?

comment by ESRogs · 2021-10-14T19:10:46.054Z · LW(p) · GW(p)

Then it feels weird, seeing button “B,” to press the button knowing that it causes you to lose $1 in the real, actually-existing world.

Was that supposed to be "seeing button 'A'"? (since A was the one who stands to lose a dollar, and B the one who stands to gain a dollar)