# Conservation of Expected Evidence

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2007-08-13T15:55:26.000Z · score: 110 (91 votes) · 80 comments

Friedrich Spee von Langenfeld, a priest who heard the confessions of condemned witches, wrote in 1631 the *Cautio Criminalis* (“prudence in criminal cases”), in which he bitingly described the decision tree for condemning accused witches: If the witch had led an evil and improper life, she was guilty; if she had led a good and proper life, this too was a proof, for witches dissemble and try to appear especially virtuous. After the woman was put in prison: if she was afraid, this proved her guilt; if she was not afraid, this proved her guilt, for witches characteristically pretend innocence and wear a bold front. Or on hearing of a denunciation of witchcraft against her, she might seek flight or remain; if she ran, that proved her guilt; if she remained, the devil had detained her so she could not get away.

Spee acted as confessor to many witches; he was thus in a position to observe *every* branch of the accusation tree, and to see that no matter *what* the accused witch said or did, it was held as proof against her. In any individual case, you would only hear one branch of the dilemma. It is for this reason that scientists write down their experimental predictions in advance.

But *you can’t have it both ways*, as a matter of probability theory, not mere fairness. The rule that “absence of evidence *is* evidence of absence” is a special case of a more general law, which I would name Conservation of Expected Evidence: the *expectation* of the posterior probability, after viewing the evidence, must equal the prior probability.
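A minimal numerical check of the law as just stated, for a single yes/no observation E. The prior 3/10 and the likelihoods P(E|H) = 4/5, P(E|~H) = 1/5 are arbitrary illustrative values, and the helper name `expected_posterior` is hypothetical. Exact fractions are used so that the equality can be tested with `==` rather than a floating-point tolerance.

```python
from fractions import Fraction

def expected_posterior(prior, p_e_given_h, p_e_given_not_h):
    """Return (P(E), P(H|E), P(H|~E), expected posterior).

    Assumes 0 < P(E) < 1, so both posteriors are defined.
    """
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h      # total probability
    post_e = prior * p_e_given_h / p_e                             # Bayes, E seen
    post_not_e = prior * (1 - p_e_given_h) / (1 - p_e)             # Bayes, E not seen
    expected = p_e * post_e + (1 - p_e) * post_not_e               # average over outcomes
    return p_e, post_e, post_not_e, expected

prior = Fraction(3, 10)
p_e, post_e, post_not_e, expected = expected_posterior(prior, Fraction(4, 5), Fraction(1, 5))
print(post_e, post_not_e, expected)  # 12/19 3/31 3/10
```

The posterior can land far above or far below 3/10, but its probability-weighted average is exactly the prior.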

*Therefore,* for every expectation of evidence, there is an equal and opposite expectation of counterevidence.

If you expect a strong probability of seeing weak evidence in one direction, it must be balanced by a weak expectation of seeing strong evidence in the other direction. If you’re very confident in your theory, and therefore anticipate seeing an outcome that matches your hypothesis, this can only provide a very small increment to your belief (it is already close to 1); but the unexpected failure of your prediction would (and must) deal your confidence a huge blow. On *average*, you must expect to be *exactly* as confident as when you started out. Equivalently, the mere *expectation* of encountering evidence—before you’ve actually seen it—should not shift your prior beliefs.
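To see this asymmetry in numbers, here is a sketch assuming a theory that predicts outcome E with 99% reliability, while E would be a coin flip if the theory were false (both likelihoods are hypothetical). As the prior rises, a confirming E adds less and less, while a failed prediction costs more and more; weighted by their probabilities, the two shifts always cancel.

```python
from fractions import Fraction

def update_sizes(prior, p_e_given_h, p_e_given_not_h):
    """Return P(E), the shift if E is seen, and the shift if E is not seen."""
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    gain = prior * p_e_given_h / p_e - prior              # P(H|E) - P(H)
    loss = prior * (1 - p_e_given_h) / (1 - p_e) - prior  # P(H|~E) - P(H)
    return p_e, gain, loss

# Hypothetical likelihoods: the theory predicts E with 99% reliability;
# without the theory, E is a coin flip.
P_E_H, P_E_NOT_H = Fraction(99, 100), Fraction(1, 2)

for prior in (Fraction(1, 2), Fraction(9, 10), Fraction(99, 100)):
    p_e, gain, loss = update_sizes(prior, P_E_H, P_E_NOT_H)
    # The probability-weighted shifts cancel exactly at every prior.
    assert p_e * gain + (1 - p_e) * loss == 0
    print(f"prior={float(prior):.2f}  P(E)={float(p_e):.4f}  "
          f"gain={float(gain):+.4f}  loss={float(loss):+.4f}")
```

At the highest prior, the expected confirmation is a small nudge while the unexpected failure is a drop of roughly a third.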

So if you claim that “no sabotage” is evidence *for* the existence of a Japanese-American Fifth Column, you must conversely hold that seeing sabotage would argue *against* a Fifth Column. If you claim that “a good and proper life” is evidence that a woman is a witch, then an evil and improper life must be evidence that she is not a witch. If you argue that God, to test humanity’s faith, refuses to reveal His existence, then the miracles described in the Bible must argue against the existence of God.
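The symmetry can be run in reverse: once you commit to how much "no sabotage" should raise your belief, the required drop on seeing sabotage is fixed. The following sketch solves the decomposition P(H) = P(H|E)P(E) + P(H|~E)P(~E) for P(H|E). The function name and the numbers (a 1/5 prior for the Fifth Column, a 3/10 overall chance of sabotage) are hypothetical. If the claimed shift is so large that no probability in [0, 1] can balance it, the claim is incoherent and the function raises an error.

```python
from fractions import Fraction

def forced_posterior_on_e(prior, p_e, posterior_on_not_e):
    """Solve P(H) = P(H|E) P(E) + P(H|~E) P(~E) for P(H|E)."""
    forced = (prior - posterior_on_not_e * (1 - p_e)) / p_e
    if not 0 <= forced <= 1:
        raise ValueError(f"incoherent: would need P(H|E) = {forced}")
    return forced

# Hypothetical: H = "a Fifth Column exists", E = "sabotage is observed".
prior = Fraction(1, 5)
p_sabotage = Fraction(3, 10)
claimed_after_no_sabotage = Fraction(1, 4)   # "no sabotage" raised P(H) from 1/5 to 1/4
print(forced_posterior_on_e(prior, p_sabotage, claimed_after_no_sabotage))  # 1/12
```

Claiming that the quiet raises suspicion from 1/5 to 1/4 commits you to saying that actual sabotage would lower it to 1/12.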

Doesn’t quite sound right, does it? Pay attention to that feeling of *this seems a little forced*, that quiet strain in the back of your mind. It’s important.

For a true Bayesian, it is impossible to seek evidence that *confirms* a theory. There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on *average*) than before. You can only ever seek evidence to *test* a theory, not to confirm it.
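One way to make "no possible plan" concrete: treat a plan as any experiment, with any number of possible outcomes and any likelihoods P(o|H) and P(o|~H), and compute the expected posterior. This sketch draws 1,000 random designs (the random generation scheme is an arbitrary choice made only for illustration) and confirms, in exact arithmetic, that none of them moves the expected posterior away from the prior. The reason is visible in the loop: each term P(o)·P(H|o) reduces to P(H)·P(o|H), and those terms sum to P(H).

```python
import random
from fractions import Fraction

def random_distribution(n_outcomes, rng):
    """Random strictly positive probabilities over n_outcomes (illustrative only)."""
    weights = [rng.randint(1, 100) for _ in range(n_outcomes)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]

def expected_posterior(prior, lik_h, lik_not_h):
    """Sum over outcomes o of P(o) * P(H|o)."""
    total = Fraction(0)
    for l_h, l_n in zip(lik_h, lik_not_h):
        p_o = prior * l_h + (1 - prior) * l_n   # P(o), by total probability
        total += p_o * (prior * l_h / p_o)      # P(o) * P(H|o), by Bayes
    return total

rng = random.Random(0)
prior = Fraction(2, 5)
for _ in range(1000):
    n = rng.randint(2, 6)  # a "plan" may have any number of outcomes
    lik_h = random_distribution(n, rng)
    lik_not_h = random_distribution(n, rng)
    assert expected_posterior(prior, lik_h, lik_not_h) == prior
print("no experiment design shifted the expected posterior")
```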

This realization can take quite a load off your mind. You need not worry about how to interpret every possible experimental result to confirm your theory. You needn’t bother planning how to make *any* given iota of evidence confirm your theory, because you know that for every expectation of evidence, there is an equal and opposite expectation of counterevidence. If you try to weaken the counterevidence of a possible “abnormal” observation, you can only do it by weakening the support of a “normal” observation, to a precisely equal and opposite degree. It is a zero-sum game. No matter how you connive, no matter how you argue, no matter how you strategize, you can’t possibly expect the resulting game plan to shift your beliefs (on average) in a particular direction.

You might as well sit back and relax while you wait for the evidence to come in.

. . . Human psychology is *so* screwed up.

## 80 comments

Comments sorted by oldest first, as this post is from before comment nesting was available (around 2009-02-27).

One minor correction, Eliezer: the link to your essay uses the text "An Intuitive Expectation of Bayesian Reasoning." I think you titled that essay "An Intuitive EXPLANATION of Bayesian Reasoning." (I am 99.9999% sure of this, and would therefore pay especial attention to any evidence inconsistent with this proposition.)

I guess I was a Bayesian before I knew what it meant....

Perhaps this formulation is nice:

0 = (P(H|E)-P(H))*P(E) + (P(H|~E)-P(H))*P(~E)

The expected change in probability is zero (for if you expected change you would have already changed).

Since P(E) and P(~E) are both positive, to maintain balance if P(H|E)-P(H) < 0 then P(H|~E)-P(H) > 0. If P(E) is large then P(~E) is small, so (P(H|~E)-P(H)) must be large to counteract (P(H|E)-P(H)) and maintain balance.

Hey, sorry if it's mad trivial, but may I ask for a derivation of this? You can start with "P(H) = P(H|E)P(E) + P(H|~E)P(~E)" if that makes it shorter.

(edit):

Never mind, I just did it. I'll post it for you in case anyone else wonders.

1} P(H) = P(H|E)P(E) + P(H|~E)P(~E) [CEE]

2} P(H)P(E) + P(H)P(~E) = P(H|E)P(E) + P(H|~E)P(~E) [because ab + (1-a)b = b]

3} (P(H) - P(H))P(E) + (P(H) - P(H))P(~E) = (P(H|E) - P(H))P(E) + (P(H|~E) - P(H))P(~E) [subtract P(H) from every value to be weighted]

4} (P(H) - P(H))P(E) + (P(H) - P(H))P(~E) = P(H) - P(H) = 0 [because ab + (1-a)b = b]

(conclusion)

5} 0 = (P(H|E) - P(H))P(E) + (P(H|~E) - P(H))P(~E) [by identity syllogism from lines 3 and 4]

P(H) = P(H|E)P(E) + P(H|~E)P(~E)

P(H)*(P(E)+P(~E))=P(H|E)P(E) + P(H|~E)P(~E)

P(H)P(E)+P(H)P(~E)=P(H|E)P(E) + P(H|~E)P(~E)

P(H)P(~E)=(P(H|E)-P(H))*P(E) + P(H|~E)P(~E)

0=(P(H|E)-P(H))*P(E) + (P(H|~E)-P(H))*P(~E)

The trick is that P(E)+P(~E)=1, so you can multiply the left side by that sum without changing its value.

Eliezer,

Of course you are assuming a strong form of Bayesianism here. Why do we have to accept that strong form?

More precisely, I see no reason why there need be no change in the confidence level. As long as the probability is greater than 50% in one direction or the other, I have an expectation of a certain outcome. So, if some evidence slightly moves the expectation in a particular direction, but does not push it across the 50% line from wherever it started, what is the big whoop?

One reason is Cox's theorem, which shows any quantitative measure of plausibility must obey the axioms of probability theory. Then this result, conservation of expected evidence, is a theorem.

What is the "confidence level"? Why is 50% special here?

"Of course you are assuming a strong form of Bayesianism here. Why do we have to accept that strong form?"

Because it's mathematically proven. You might as well ask "Why do we have to accept the strong form of arithmetic?"

"So, if some evidence slightly moves the expectation in a particular direction, but does not push it across the 50% line from wherever it started, what is the big whoop?"

Because (in this case especially!) small probabilities can have large consequences. If we invent a marvelous new cure for acne, with a 1% chance of death to the patient, it's well below 50% and no specific person using the "medication" would *expect* to die, but no sane doctor would ever sanction such a "medication".

"Why is 50% special here?"

People seem to have a little arrow in their heads saying whether they "believe in" or "don't believe in" a proposition. If there are two possibilities, 50% is the point at which the little arrow goes from "not believe" to "believe".

> People seem to have a little arrow in their heads saying whether they "believe in" or "don't believe in" a proposition. If there are two possibilities, 50% is the point at which the little arrow goes from "not believe" to "believe".

And if I am following you, this is irrational. Correct?

> [Bayesianism i]s mathematically proven.

More importantly, it's physically proven. The fact that the math is consistent (and elegant!) would not have been so powerful if it wasn't also true, particularly since Bayesianism implies some very surprising predictions.

Fortunately, it is the happy case that, to the best of my knowledge, no experiments thus far contradict Bayesianism, and not for the lack of trying, which is as much proof as physically possible.

> Fortunately, it is the happy case that, to the best of my knowledge, no experiments thus far contradict Bayesianism, and not for the lack of trying, which is as much proof as physically possible.

Foundational issues like Bayesianism run into the old philosophy of science problems with a vengeance: which part of the total assortment of theory and observation do you choose to throw out? If someone proves a paradox in Bayesianism, do you shrug and start looking at alternatives - or do you 'defy the evidence' and patiently wait for an E.T. Jaynes to come along and explain how the paradox stems from taking an improper limit or failing to take into account prior information etc.?

(I'll adopt the seemingly rationalist trait of never taking questions as rhetorical, though both your questions strongly have that flavor).

A central part of the modern scientific method is due to Popper, who gave an essentially Bayesian answer to your first question. However, Science wouldn't fall apart if it turned out that priors aren't a physical reality. Occam's razor is non-Bayesian, and it alone accounts for a large portion of our scientific intuitions. At the bottom line, the scientific method doesn't have to be itself true in order to be effective in discovering truths and discarding falsehoods.

The concept of "proving a paradox" is unclear to me (almost a paradox in itself...). Paradoxes are mirages. Also, it seems that you have some specific piece of scientific history in mind, but I'm uncertain which.

Luckily, we did have Jaynes and others to promote what I believe to be both a compelling mathematical framework and a physical reality. Before them, well, it would be wishful to think I could hold on to Bayesian ideas in the face of apparent paradoxes. The shoulders of giants etc.

Occam's Razor is non-Bayesian? Correct me if I'm wrong, but I thought it falls naturally out of Bayesian model comparison, from the normalization factors, or "Occam factors." As I remember, the argument is something like: given two models with independent parameters {A} and {A,B}, the P(AB model) \propto P(AB are correct) and P(A model) \propto P(A is correct). Then P(AB model) <= P(A model).

Even if the argument is wrong, I think the result ends up being that more plausible models tend to have fewer independent parameters.
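The "Occam factor" effect can be seen in a standard textbook example, sketched here with hypothetical models rather than the A/AB notation above: a fair-coin model with no free parameter versus a model whose bias p is unknown, with a uniform prior on p. The free model spreads its prior over many possible biases, so on data that a fair coin explains well, its marginal likelihood is lower; on lopsided data it wins. The closed form used for the free model is the Beta integral of p^h (1-p)^t over [0, 1], which equals h! t! / (h+t+1)!. Both functions give the probability of one specific sequence with h heads and t tails.

```python
from fractions import Fraction
from math import factorial

def evidence_fair(heads, tails):
    """P(sequence | fair coin): no free parameter, p fixed at 1/2."""
    return Fraction(1, 2) ** (heads + tails)

def evidence_free(heads, tails):
    """P(sequence | unknown bias p, uniform prior on [0, 1]).

    Integral of p^h (1-p)^t dp = h! t! / (h + t + 1)!  (a Beta integral).
    """
    return Fraction(factorial(heads) * factorial(tails), factorial(heads + tails + 1))

for heads, tails in [(5, 5), (10, 0)]:
    ratio = evidence_fair(heads, tails) / evidence_free(heads, tails)
    print(f"{heads} heads, {tails} tails: Bayes factor fair/free = {float(ratio):.4f}")
```

With 5 heads and 5 tails the simpler model is favored by a factor of about 2.7; with 10 heads and no tails the extra parameter pays for itself.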

You're not really wrong. The thing is that "Occam's razor" is a conceptual principle, not one mathematically defined law. A certain (subjectively very appealing) formulation of it does follow from Bayesianism.

> P(AB model) \propto P(AB are correct) and P(A model) \propto P(A is correct). Then P(AB model) <= P(A model).

Your math is a bit off, but I understand what you mean. If we have two sets of models, with no prior information to discriminate between their members, then the prior gives less probability to each model in the larger set than in the smaller one.

More generally, if deciding that model 1 is true gives you more information than deciding that model 2 is true, that means that the maximum entropy given model 1 is lower than that given model 2, which in turn means (under the maximum entropy principle) that model 1 was a priori less likely.

Anyway, this is all beside the discussion that inspired my previous comment. My point was that even without Popper and Jaynes to enlighten us, science was making progress using other methods of rationality, among which is a myriad of non-Bayesian interpretations of Occam's razor.

How does deciding one model is true give you more information? Did you mean "If a model allows you to make more predictions about future observations, then it is a priori less likely?"

> How does deciding one model is true give you more information?

Let's assume a strong version of Bayesianism, which entails the maximum entropy principle. So our belief is the one that has the maximum entropy, among those consistent with our prior information. If we now add the information that some model is true, this generally invalidates our previous belief, making the new maximum-entropy belief one of lower entropy. The reduction in entropy is the amount of information you gain by learning the model. In a way, this is a cost we pay for "narrowing" our belief.
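A toy version of this entropy bookkeeping, assuming the only prior information is that an outcome lies in a set of 8 possibilities, and that adopting some hypothetical model restricts it to 2. Under a support constraint alone, the maximum-entropy distribution is uniform over the allowed set, so the entropy drops from 3 bits to 1 bit, and the 2-bit difference is the information the model carries.

```python
from math import log2

def entropy_bits(dist):
    """Shannon entropy, in bits, of a discrete distribution given as probabilities."""
    return -sum(p * log2(p) for p in dist if p > 0)

def max_entropy_given_support(n_allowed):
    """With only the constraint 'the outcome is one of n_allowed values',
    the maximum-entropy distribution is uniform over those values."""
    return [1 / n_allowed] * n_allowed

before = entropy_bits(max_entropy_given_support(8))  # prior: any of 8 outcomes
after = entropy_bits(max_entropy_given_support(2))   # hypothetical model allows only 2
print(f"information gained by adopting the model: {before - after:.1f} bits")
```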

The upside of it is that it tells us something useful about the future. Of course, not all information regarding the world is relevant for future observations. The part that doesn't help control our anticipation is failing to pay rent, and should be evacuated. The part that does inform us about the future may be useful enough to be worth the cost we pay in taking in new information.

I'll expand on all of this in my sequence on reinforcement learning.

At what point does the decision "This is true" diverge from the observation "There is very strong evidence for this", other than in cases where the model is accepted as true *despite* a lack of strong evidence?

I'm not discussing the case where a model goes from unknown to known: how does *deciding* to believe a model give you more information than knowing what the model is and the reason for the model? To better model an actual agent, one could replace all of the knowledge about why the model is true with the value of the strength of the supporting knowledge.

How does deciding that things always fall down give you more information than observing things fall down?

I believe the idea was to ask "hypothetically, if I found out that this hypothesis was true, how much new information would that give me?"

You'll have two or more hypotheses, and one of them is the one that would (hypothetically) give you the least amount of new information. The one that would give you the least amount of new information should be considered the "simplest" hypothesis. (assuming a certain definition of "simplest", and a certain definition of "information")

Crystal clear. Sorry to distract from the point.

> Because it's mathematically proven.

It's based on premises that may or may not be accurate. Just because it's mathematically proven, doesn't mean it's true.

Aaron, fixed.

Eliezer, when you're lost in an unfamiliar neighbourhood, do you sit back, relax and wait for evidence of your location to come in? Obviously not, since you're still alive and haven't yet starved to death. Well guess what, none of *my* direct ancestors starved to death before they reproduced either. That's a scientific fact, and it just goes to show that when it comes to the thinking game, nothing succeeds like success.

And success is not the same as accuracy, except in a mystical world of spherical cows of uniform density. In the real world, the value of a perfectly correct decision which takes an infinite amount of time to evaluate is exactly 0. [Note the lack of units - renormalize if you dare.] I have empirical data showing that this world contains distinctly heterogeneous cows. Reprints available upon request.

Tom,

Bayes' Theorem has its limits. The support must be continuous, the dimensionality must be finite. Some of the discussion here has raised issues that could be relevant to these kinds of conditions, such as fuzziness about the truth or falsity of H. This is not as straightforward as you claim it is.

Furthermore, I remind one and all that Bayes' Theorem is asymptotic. Even if the conditions hold, the "true" probability is approached only in the infinite time horizon. This could occur so slowly that it might stay on the "wrong" side of 50% well past the time that any finite viewer might hang around to watch.

There is also the black swan problem. It could move in the wrong direction until the black swan datum finally shows up pushing it in the other direction, which, again, may not occur during the time period someone is observing. This black swan question is exactly the frame of discussion here, as it is Taleb who has gone on and on about this business about evidence and absence thereof.

You cannot predict a black swan. That's why it can screw up your expectation.

However, once you have a black swan you'd be an irrational fool not to include it in your expectation.

That's the point. That's why theories get updated - new data that nobody was aware of before does not match expectations. This new evidence adjusts the probability that the theory was correct, and it gets thrown out if a different theory now has a higher probability in light of the new evidence.

This is not a shortcoming of Bayes Theorem, it's a shortcoming of observation. *That* you should certainly be aware of. I.e. "I might not have all the facts."

> *you can't possibly expect the resulting game plan to shift your beliefs (on average) in a particular direction.*

But you can act to change the probability distribution of your future beliefs (just not its mean). That's the entire point of testing a belief. If you have a 50% belief that a ball is under a certain cup, then by lifting the cup, you can be certain that your future belief will be in the set {0%,100%} (with equal probability for 0 and 100, hence the same mean as now).

Getting the right shape of the probability distribution of future belief is the whole skill in testing a hypothesis.
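The cup example can be extended to compare two tests of the same belief. In this sketch, lifting the cup is a perfectly reliable observation, while the second test is a hypothetical peek that reports correctly 80% of the time. Both leave the mean of your future belief at 1/2; they differ only in how widely that future belief is spread, measured here by its variance.

```python
from fractions import Fraction

def future_beliefs(prior, lik_h, lik_not_h):
    """Map each possible posterior P(H|o) to its probability P(o)."""
    dist = {}
    for l_h, l_n in zip(lik_h, lik_not_h):
        p_o = prior * l_h + (1 - prior) * l_n
        post = prior * l_h / p_o
        dist[post] = dist.get(post, 0) + p_o
    return dist

def mean(dist):
    return sum(post * p for post, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(p * (post - m) ** 2 for post, p in dist.items())

prior = Fraction(1, 2)  # H = "the ball is under this cup"
# Lifting the cup: outcomes "ball seen" / "no ball", perfectly reliable.
lift = future_beliefs(prior, [1, 0], [0, 1])
# Hypothetical noisy peek that reports correctly 80% of the time.
peek = future_beliefs(prior, [Fraction(4, 5), Fraction(1, 5)],
                      [Fraction(1, 5), Fraction(4, 5)])

for name, dist in (("lift", lift), ("peek", peek)):
    print(name, {str(k): str(v) for k, v in dist.items()},
          "mean", mean(dist), "variance", variance(dist))
```

A more decisive test buys a wider spread of future beliefs, not a higher expected belief.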

> *But you can't have it both ways - as a matter of probability theory, not mere fairness.*

You've proved your case - but there's still enough wriggle room that it won't make much practical difference. One example from global warming, which predicts higher temperature on average in Europe - unless it diverts the gulf stream, in which case it predicts lower average temperatures. Consider the two statements: 1) If average temperatures go up in Europe, or down, this is evidence for global warming. 2) If average temperatures go up in Europe, and the gulf stream isn't diverted, or average temperatures go down, while the gulf stream is diverted, this is evidence of global warming.

1) is nonsense, 2) is true. Lots of people say statements that sound like 1), when they mean something like 2). Add an extra detail, and the symmetry is broken.
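Statement 2) can be checked against Conservation of Expected Evidence by treating each (temperature, Gulf Stream) combination as a separate outcome. All the numbers below are invented for illustration and are not climate data. With these likelihoods, the two outcomes named in 2) raise P(H), the outcome "temperatures fall and the Gulf Stream is not diverted" lowers it, and the probability-weighted average of the posteriors is still exactly the prior.

```python
from fractions import Fraction as F

# Hypothetical numbers for illustration only.
prior = F(1, 2)  # P(H), H = global warming
lik_h = {        # P(outcome | H)
    ("up", "not diverted"): F(60, 100),
    ("down", "diverted"): F(20, 100),
    ("up", "diverted"): F(5, 100),
    ("down", "not diverted"): F(15, 100),
}
lik_not_h = {    # P(outcome | ~H)
    ("up", "not diverted"): F(40, 100),
    ("down", "diverted"): F(5, 100),
    ("up", "diverted"): F(5, 100),
    ("down", "not diverted"): F(50, 100),
}

def posteriors(prior, lik_h, lik_not_h):
    """Return {outcome: (P(outcome), P(H | outcome))}."""
    results = {}
    for outcome, l_h in lik_h.items():
        p_o = prior * l_h + (1 - prior) * lik_not_h[outcome]
        results[outcome] = (p_o, prior * l_h / p_o)
    return results

results = posteriors(prior, lik_h, lik_not_h)
for outcome, (p_o, post) in results.items():
    print(outcome, f"P(o)={float(p_o):.3f}", f"P(H|o)={float(post):.3f}")
print("expected posterior:", sum(p_o * post for p_o, post in results.values()))
```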

This weakens the practical power of your point; if an accused witch is afraid, that shows she's guilty; if she's not afraid, in a way which causes the inquisitor to be suspicious, she's also guilty. That argument is flawed, but it isn't a logical flaw (since the similar statement 2) is true).

Then we're back to arguing the legitimacy of these "extra details".

Stuart, if the extra details are observable and *specified in advance*, the legitimacy is clear-cut.

Barkley, I'm an infinite set atheist, all real-world problems are finite; and you seem to be assuming that priors are arbitrary but likelihood ratios are fixed eternal and known, which is a strange position; and in any case what does that have to do with something as simple as Conservation of Expected Evidence? If anyone attempts to make an infinite-set scenario that violates CEE, it disproves their setup by reductio ad absurdum, and reinforces the ancient wisdom of E. T. Jaynes that no infinity may be assumed except as the proven limit of a finite problem.

Eliezer,

I do not necessarily believe that likelihood ratios are fixed for all time. The part of me that is Bayesian tends to the radically subjective form à la Keynes.

Also, I am a fan of nonstandard analysis. So, I have no problem with infinities that are not mere limits.

Eliezer,

I just googled "law of conservation of expected evidence." This blog came up. Nothing else like it. Frankly, I don't think you are selling a law here. You are asserting one that nobody else is aware of.

> *a more general law, which I would name Conservation of Expected Evidence*

I thought it was pretty clear that I was coining the phrase. I'm certainly not the first person to point out the law. E.g. Robin notes that our best estimate of anything should have no predictable trend. In any case, I posted the mathematical derivation and you certainly don't have to take my word about anything.

Eliezer,

Fair enough. You get credit, then, for coining the term. However, the problem remains: why should that equals sign be there? Sure, if you put it there, the logic holds up, my niggles about Bayes' Theorem and time to convergence and all that aside. But it is not clear at all that the equals sign should be there, or is there in any meaningfully regular way. Your defense has been to cite an essentially empirical argument by Robin. But that empirical argument is much contested in many arenas. Sure, Burton Malkiel proposed that financial markets are a random walk, but that argument has undergone a lot of modifications since he first posed it in a best-selling paperback. In that regard, your proof essentially amounts to one of these "proofs" of the existence of God, in which the result comes from another assumption snuck in the back door that is itself as questionable or unprovable. It is much like Joan Robinson's old complaint about the magician making a big deal of pulling the rabbit out of the hat after having put it into the hat in full view of the audience.

Barkley, it looks to me like Eli derived it using the sum and product rules of probability theory.

What Peter said. Barkley, do you question that P(H) = P(H,E) + P(H, ~E) or do you question that P(H,E) = P(H|E)*P(E)?

Eliezer and Peter, I think the problem is statics versus dynamics. Your set of equations are correct only at a specific point in time, which makes them irrelevant to saying anything about what happens later when new information arrives. That would entail subscripting H by time. For any given t, sure. But, that says nothing about what happens when new information arrives. P(H) might change.

The obvious example is indeed the black swan story, which we all know is what is lying behind this discussion. So, at a point in time before black swans are observed, let H be "all swans are white." Perhaps there were a few folks who thought this might not be true, so say P(H) was 95%. Sure, your equations hold at a point in time, but so what? The minute the word comes in about the observation of a black swan (assuming it is accepted), P(H) just went to zero, or not much above zero, perhaps after having gradually drifted over time to 95%. Remember, your story was one about new information coming in and changes over time. But that is not what your equations are about.

This is the fatal flaw in your nice new law, Eliezer.

...

Barkley, you don't realize that Bayes's Theorem is *precisely* what describes the *normative* update in beliefs over time? That this is the *whole point* of Bayes's Theorem?

Before black swans were observed, no one expected to encounter a black swan, and everyone expected to encounter another white swan on occasion. A black swan is huge evidence against, a white swan is tiny additional evidence for. Had they been normative, the two quantities would have balanced exactly.
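The swan case in numbers: a sketch using the 95% prior from Barkley's comment, plus a hypothetical assumption that even if non-white swans exist, 99% of observed swans would still be white. A white swan nudges P(H) up by roughly 0.0005; a black swan, which happens with probability 1/2000 under these assumptions, drops it to zero, and the two expectations balance exactly.

```python
from fractions import Fraction

prior = Fraction(95, 100)              # P(H), H = "all swans are white"
p_white_if_false = Fraction(99, 100)   # hypothetical: black swans are rare even if they exist

p_white = prior * 1 + (1 - prior) * p_white_if_false  # under H, every swan is white
p_black = 1 - p_white
post_white = prior * 1 / p_white                      # P(H | white swan)
post_black = Fraction(0)                              # a black swan refutes H outright

print(f"white swan: {float(post_white - prior):+.6f} with probability {float(p_white):.4f}")
print(f"black swan: {float(post_black - prior):+.6f} with probability {float(p_black):.4f}")
assert p_white * post_white + p_black * post_black == prior
```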

I'm not sure what to say here. Maybe point to *Probability Theory: The Logic of Science* or A Technical Explanation of Technical Explanation? I don't know where this misunderstanding is coming from, but I'm learning a valuable lesson in how much Bayesian algebra someone can know without realizing which material phenomena it describes.

"no one expected to encounter a white swan, and everyone expected to encounter another black swan on occasion. A white swan is huge evidence against, a black swan is tiny additional evidence for." I presume you meant the reverse of this?

Oops, fixed.

per the Black Swan:

The set of potential multicolored variations of Swans is infinite (purple, brown, grey, blue, green, etc). We cannot prove that any one of them does not exist. But every day that passes without our seeing these swans gives us a higher probability that they do not exist. It never equals 1, but it's darn close.

The problem with the Black Swan parable is not that it's untrue, but that it's unimportant. The set of things we have no evidence of is infinite. To then pounce on an unexpected observation (eg, a Black Swan, that Kevin Federline is a relatively good parent, last week's liquidity run on mortgage lenders), and say, "aha! You were all wrong!" merely sets up a straw man: that everything we reasonably don't anticipate and plan for is assumed to have had a probability of zero.

In reality, when you want to pay money for extreme events you overpay, that is, the implied probability is overweighted because sellers can't insure against these events. London bookmakers offer only 250-1 odds against a perpetual motion machine being discovered, 100-1 that aliens won't be proven. In option markets you have a volatility smile so that extreme events get higher and higher implied volatilities as you move away from the mean, meaning their probability is not assumed Gaussian.

The bottom line is that "absence of evidence is not evidence of absence" merely uses hindsight to attack a caricature of beliefs, and seems to suggest something practically important. In practice, people lose money on lottery tickets (or hurricane insurance, or buying a 3-delta put), so exploiting this is a fool's game.

Eliezer,

This is about to scroll off, but, frankly, I do not know what you mean by "normative" in this context. The usual usage of this term implies statements about values or norms. I do not see that anything about this has anything to do with values or norms. Perhaps I do not understand the "whole point of Bayes' Theorem." Then again, I do not see anything in your reply that actually counters the argument I made.

Bottom line: I think your "law" is only true by assumption.

What I mean, Barkley, is that the expression P(H|E), as held at time t=0, *should* - normatively - describe the belief about H you will hold at time t=2 if you see evidence E at time t=1. Thus, statements true in probability theory about the decomposition of P(H) imply the *normative* law of Conservation of Expected Evidence, if you accept that probability theory is normative for real-world problems where no one has ever seen an infinite set.
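The time-indexed reading can be simulated directly. In this sketch (the prior and likelihoods are arbitrary illustrative values), each trial draws a world where H is true or false at t=0, draws evidence at t=1 from that world, and records the belief P(H|E) or P(H|~E) held at t=2. Individual beliefs move a lot, but their average over many simulated worlds stays at the t=0 prior, up to sampling noise.

```python
import random

def simulate(prior, p_e_h, p_e_not_h, trials, seed=0):
    """Average the belief held at t=2 over many simulated worlds."""
    rng = random.Random(seed)
    p_e = prior * p_e_h + (1 - prior) * p_e_not_h
    post_e = prior * p_e_h / p_e                   # belief if E is seen
    post_not_e = prior * (1 - p_e_h) / (1 - p_e)   # belief if E is not seen
    total = 0.0
    for _ in range(trials):
        h_true = rng.random() < prior                              # t=0: the world is fixed
        saw_e = rng.random() < (p_e_h if h_true else p_e_not_h)    # t=1: evidence arrives
        total += post_e if saw_e else post_not_e                   # t=2: updated belief
    return total / trials

print(simulate(0.3, 0.8, 0.2, 100_000))  # close to the prior, 0.3
```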

If you don't think probability theory is valid in the real world, I have some Dutch Book trades I'd like to make with you. But that's a separate topic, and in any case, most readers of this blog will at least *understand what I intend to convey* when I speak from within the view that probability theory is normative.

Eliezer Yudkowsky, The word "normative" has stood in the way of my understanding what you mean, at least the first few times I saw you use it, before I pegged you as getting it from the heuristics and biases people. It greatly confused me many times when I first encountered them. It's jargon, so it shouldn't be surprising that different fields use it to mean rather different things.

The heuristics and biases people use it to mean "correct," because social scientists aren't allowed to use that word. I think there's a valuable lesson about academics, institutions, or taboos in there, but I'm not sure what it is. As far as I can tell, they are the only people that use it this way.

My dictionary defines normative as "of, relating to, or prescribing a norm or standard." It's confusing enough that it carries those two or three meanings, but to make it mean "correct" as well is asking for trouble or in-groups.

I agree - it can be especially ambiguous if you're also used to the economics context of normative, meaning "how subjectively desirable something is".

This post was one of the most helpful for me personally, but I recently realized this isn't true in an absolute sense: "There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before."

Suppose the statement "I perform action A" is more probable given position P than given not-P. Then if I start planning to perform action A, this will be evidence that I will perform A. Therefore it will also be evidence for position P. So there is some plan that I can devise such that I can expect my confidence in P to be higher than before I devised the plan.

In general, of course, unless P is a position relating to my actions or habits, this effect will not be very large.

Um, no, if a study shows that people who chew gum also have a gene GXTP27 or whatever, which also protects against cancer, I cannot plan to increase my subjective probability that I have gene GXTP27 by starting to chew gum.

See also: "evidential decision theory", which nearly all decision theorists do not believe in.

Here's an example which doesn't bear on Conservation of Expected Evidence as math, but does bear on the statement,

"There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before."

taken at face value.

It's called the Cable Guy Paradox; it was created by Alan Hájek, a philosopher at the Australian National University. (I personally think the term Paradox is a little strong for this scenario.)

Here it is: the cable guy is coming tomorrow, but cannot say exactly when. He may arrive any time between 8 am and 4 pm. You and a friend agree that the probability density for his arrival should be uniform over that interval. Your friend challenges you to a bet: even money for the event that the cable guy arrives before noon. You get to pick which side of the bet you want to take -- by expected utility, you should be indifferent. Here's the curious thing: if you pick the morning bet, then almost surely there will be times in the morning when you would prefer to switch to the afternoon bet.

This would seem to be a situation in which "you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before," even though the equation P(H) = P(H|E)*P(E) + P(H|~E)*P(~E) is not violated. I'm not sure, but I think it's due to multiple possible interpretations of the word "before".

> Here's the curious thing: if you pick the morning bet, then almost surely there will be times in the morning when you would prefer to switch to the afternoon bet.

You either have a new interval, or new information suggesting the probability density for the interval has changed.

Conservation of Expected Evidence does not mean Ignorance of Observed Evidence.

This is just a restatement of the black swan problem, and it's a non-issue. If evidence does not exist yet it does not exist yet. It doesn't cast doubt on your methods of reasoning, nor does it allow you to make a baseless guess of what might come in the future.

If you count the amount of "wanting to switch" you expect to have because the cable guy hasn't arrived yet, it should equal exactly the amount of "wishing you hadn't been wrong" you expect to have if you pick the second half because the cable guy arrived before your window started.

I'm not sure how to say this so it's more easily parseable, but this equality is *exactly what conservation of expected evidence describes*.

At 10am tomorrow, I can legitimately expect my confidence in the proposition "the cable guy will arrive after noon" to be different from what it is today.

There are two cases to consider:

- The cable guy arrived before 10am (occurs with 25% probability). In this case, I expect that he has a close to zero probability of arriving after noon.
- The cable guy is known not to have arrived before 10am (occurs with 75% probability). At this point, I calculate that the odds of the cable guy turning up after noon are two in three.
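As a check on the two cases above, here is a minimal Python sketch, assuming (as stated earlier) a uniform arrival density between 8 am and 4 pm, with times written on a 24-hour clock. The names `START`, `END`, `NOON` and `CHECK` are illustrative only. It shows that 25% × 0 + 75% × 2/3 gives back the original 1/2.

```python
from fractions import Fraction

# Assumption from the thread: arrival time is uniform on [8, 16] (8 am to 4 pm).
START, END, NOON, CHECK = 8, 16, 12, 10

def p_after_noon_given_not_arrived_by(t):
    """P(arrival > noon | arrival > t) for a uniform arrival time, with t <= NOON."""
    return Fraction(END - NOON, END - t)

prior = Fraction(END - NOON, END - START)            # P(after noon), judged today
p_arrived_by_10 = Fraction(CHECK - START, END - START)
p_not_arrived_by_10 = 1 - p_arrived_by_10

post_if_arrived = Fraction(0)                        # he already came, so not after noon
post_if_not_arrived = p_after_noon_given_not_arrived_by(CHECK)

# Average the two possible 10 am credences, weighted by how likely each case is.
expected_posterior = (p_arrived_by_10 * post_if_arrived
                      + p_not_arrived_by_10 * post_if_not_arrived)
print(prior, post_if_not_arrived, expected_posterior)  # 1/2 2/3 1/2
```

The credence at 10 am really does differ from today's in either branch, but its expectation taken today equals today's value.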

But none of this takes anything away from the original statement:

"There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before."

This is because I am changing my probability estimate on the basis of *new information received* - it's not a fixed proposition.

Eliezer - what if the presence of the gene was decided by an omnipotent being called Omega? Then you'd break out the Spearmint, right?

I'll modify my advice. If the probability that "I do action A in order to increase my subjective probability of position P" is greater given P than given not P, then doing A in order to increase my subjective probability of position P will be evidence in favor of P.

So in many cases, there will be such a plan that I can devise. Let's see Eliezer find a way out of this one.

Let's say you are organising a polar expedition. It will succeed (A) or fail (~A). There is a postulate that there are no man-eating polar Cthulhu in the area (P). If there are some (~P), the expedition will fail (~A), thus entangling A with P.

You can do your best to prepare the expedition so that it will not fail for non-Cthulhu reasons, strengthening the entanglement - ~A becomes stronger evidence for ~P. You can also do your best to prepare the expedition to survive even the man-eating polar Cthulhu, weakening the entanglement - by introducing a higher probability of A&~P, we're making A weaker evidence for P.
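To make the two kinds of preparation concrete, here is a toy Python model. The parameters `reliability` = P(A|P) and `survival` = P(A|~P), the prior of 9/10, and the three preparation settings are all hypothetical numbers chosen for illustration. Preparation changes how strongly A and ~A bear on P, but the probability-weighted average of the two posteriors stays equal to the prior every time.

```python
from fractions import Fraction

def posteriors(prior_p, reliability, survival):
    """Return (P(A), P(P|A), P(P|~A)) for a toy expedition model.

    prior_p     -- P(P): no man-eating polar Cthulhu in the area
    reliability -- P(A | P): chance of success when only non-Cthulhu risks apply
    survival    -- P(A | ~P): chance of success despite the Cthulhu
    """
    p_a = prior_p * reliability + (1 - prior_p) * survival
    p_p_given_a = prior_p * reliability / p_a
    p_p_given_not_a = prior_p * (1 - reliability) / (1 - p_a)
    return p_a, p_p_given_a, p_p_given_not_a

prior = Fraction(9, 10)  # hypothetical prior that the area is Cthulhu-free
preparations = [
    (Fraction(1, 2), Fraction(0)),        # sloppy: fails often for ordinary reasons
    (Fraction(99, 100), Fraction(0)),     # careful: ~A now points strongly to ~P
    (Fraction(99, 100), Fraction(1, 2)),  # Cthulhu-proofed: A now says less about P
]

results = []
for reliability, survival in preparations:
    p_a, post_a, post_not_a = posteriors(prior, reliability, survival)
    expected = p_a * post_a + (1 - p_a) * post_not_a
    results.append((post_a, post_not_a, expected))
    # E[posterior] comes out as 9/10 in every row.
    print(f"P(P|A)={float(post_a):.3f}  P(P|~A)={float(post_not_a):.3f}  "
          f"E[posterior]={expected}")
```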

Do any of these preparations, in themselves, actually influence the amount of man-eating polar Cthulhu in the area?

Before you have actually done A, since it might fail because of ~P (which is what the thing you said actually means), your confidence is still the same as before you came up with the plan. We're still at t=0. Information about your plan succeeding or not hasn't arrived yet.

Now if over the course of planning you realize that the very ability you have to make the plan shifts your probability estimate of P, then we've already got the new evidence. We're at t=1, and the probability has shifted rightfully without violating the law. The evidence is no longer expected, it's already here!

Before you started planning, you didn't know that you would succeed and get this information. Not for certain. Or if you did, your estimate of probability of P was clearly wrong, but you hadn't noticed it yet, where the "yet" is the time factor that distinguishes between t0 and t1 again...

Can't cheat your way out of this at t=0, I'm afraid.

Actually, the Omega situation is a perfect example. Someone facing the two boxes would like to increase his subjective probability that there is a million in the second box, and he is able to do this by deciding to take only the second box. If he decides to take both, on the other hand, he should decrease his credence in the presence of the million, even before opening the box.

In this case, the decision he's leaning toward is evidence of the presence of $1M, by way of Omega's observed reliability in predicting decisions of agents like him.

Fantastic heuristic! It's like x=y·(z/y)+(1-y)·(x-z)/(1-y) for the rationalist's soul :)

It's worth noting, though, that you **can** rationally expect your credence in a certain belief "to increase", in the following sense: If I roll a die, and I'm about to show you the result, your credence that it didn't land 6 is now 5/6, and you're 5/6 sure that this credence is about to increase to 1.

I think this is what makes people feel like they can have a non-trivial expected value for their new beliefs: you can *expect an increase* or *expect a decrease*, but quantitatively the two possibilities exactly cancel each other out in the *expected value* of your belief.

It's worth noting, though, that you can rationally expect your credence in a certain belief "to increase", in the following sense: If I roll a die, and I'm about to show you the result, your credence that it didn't land 6 is now 5/6, and you're 5/6 sure that this credence is about to increase to 1.

No, you can't, because you also expect with 1/6 probability that your credence will go down to zero: 5/6 + (5/6 *1/6) + (1/6 * -5/6) = 5/6.
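The arithmetic above can be checked exactly with a short Python sketch, assuming a fair six-sided die and taking H = "the die did not land 6". It computes the expected *change* in credence (which is zero) and the expected posterior (which equals the prior of 5/6).

```python
from fractions import Fraction

# H = "the die did not land 6"; a fair die is assumed.
prior = Fraction(5, 6)

# Each possible sight of the die: (probability of that sight, posterior credence in H).
outcomes = {
    "not six": (Fraction(5, 6), Fraction(1)),
    "six": (Fraction(1, 6), Fraction(0)),
}

expected_change = sum(p * (post - prior) for p, post in outcomes.values())
expected_posterior = sum(p * post for p, post in outcomes.values())
print(expected_change, expected_posterior)  # 0 5/6
```

The likely small rise (+1/6 with probability 5/6) and the unlikely large fall (-5/6 with probability 1/6) cancel exactly.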

In order to fully understand this concept, it helped me to think about it this way: any evidence shifting your expected change in confidence will necessarily cause a corresponding shift in your actual confidence. Suppose you hold some belief B with confidence C. Now some new experiment is being performed that will produce more data about B. If you had some prior evidence that the new data is expected to shift your confidence to C', that same evidence would already have shifted C to C', thus maintaining the conservation of expected evidence.

Consider the following example: initially, if someone were to ask you to bet on the veracity of B, you would choose odds C:(1-C). Suppose an oracle reveals to you that there is a 1/3 chance of the new data shifting your confidence up by C+ and a 2/3 chance of it shifting your confidence down by C-, giving C' = C + (1/3)·C+ - (2/3)·C-. What would you then consider to be fair odds on B's correctness?

I have a theory that I will post this comment. By posting the comment, I'm seeking evidence to confirm the theory. If I post the comment, my probability will be higher than before.

Similarly, in Newcomb's problem, I seek evidence that box A has a million dollars, so I refrain from taking box B. There was money in box B, but I didn't take it, because that would give me evidence that box A was empty.

In short, there's one exception to this: when your choice is the evidence.

The simple answer is that your choice is also probabilistic. Let's say that your disposition is one that would make it very likely you will choose to take only box A. Then this fact about yourself becomes evidence for the proposition that A contains a million dollars. Likewise if your disposition was to take both, it would provide evidence that A was empty.

Now let's say that you're pretty damn certain that this Omega guy is who he says he is, and that he was able to predict this disposition of yours; then, noting your decision to take only A stands as strong evidence that the box contains the million dollars. Likewise with the decision to take both.

But what if, you say, I already expected to be the kind of person who would take only box A? That is, that the probability distribution over my expected dispositions was 95% only box A and 5% both boxes? Well then it follows that your prior over the contents of box A will be 95% that it contains the million and 5% that it is empty. And as a result, the likely case of you actually choosing to take only box A need only have a small effect on your expectation of the contents of the box (~.05 change to reach ~1), but in the case that you introspect and find that really, you're the kind of person who would take both, then your expectation that the box has a million dollars will drop by exactly 19(=.95/.05) times as much as it would get raised by the opposite evidence (resulting in ~0 chance that it contains the million). Making the less likely choice will create a much greater change in expectation, while the more common choice will induce a smaller change (since you already expected the result of that choice).

Hope that made sense.
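The 95%/5% numbers in the comment above can be checked with a small Python sketch. It idealises the comment's "~1" and "~0" as exactly 1 and 0, which amounts to assuming Omega is a perfect predictor; that idealisation is an assumption, not part of the original argument.

```python
from fractions import Fraction

# Assumption: P(disposition = one-box) = 0.95 and Omega predicts perfectly,
# so box A is full exactly when you turn out to be a one-boxer.
p_one_box = Fraction(95, 100)
prior_full = p_one_box               # prior that box A holds the million

post_if_one_box = Fraction(1)        # idealised "~1"
post_if_two_box = Fraction(0)        # idealised "~0"

rise = post_if_one_box - prior_full  # small, likely update
drop = prior_full - post_if_two_box  # large, unlikely update
print(drop / rise)                                  # 19
print(p_one_box * rise == (1 - p_one_box) * drop)   # True
```

The probability-weighted rise and fall are equal, so the expected credence stays at 0.95.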

There is more discussion of this post here as part of the Rerunning the Sequences series.

Wouldn't the rule be something more like:

((P(H|E) > P(H)) if and only if (P(H) > P(H|~E))) and ((P(H|E) = P(H)) if and only if (P(H) = P(H|~E)))

So, if some statement is evidence of a hypothesis, its negation must be evidence against. And if some statement's truth value is independent of a hypothesis, then so is that statements negation.

This is implied by the expectation of posterior probabilities version. Since P(E) + P(~E) = 1, that means that P(H|E) and P(H|~E) are either equal, or one is greater than P(H) and one is less than. If they were both less than P(H), then P(H|E)P(E)+P(H|~E)P(~E) would have a lesser value than the largest conditional probability in that formula; suppose P(H|E) is the greater one, then P(H|E)P(E)+P(H|~E)P(~E) < P(H|E) and P(H|E) < P(H), so P(H|E)P(E)+P(H|~E)P(~E) ≠ P(H). If they are both larger than P(H), then P(H|E)P(E)+P(H|~E)P(~E) must be larger than the smallest conditional probability in that formula; suppose that P(H|E) is the smaller one, then we have P(H|E)P(E)+P(H|~E)P(~E) > P(H|E), and P(H|E) > P(H), so P(H) ≠ P(H|E)P(E)+P(H|~E)P(~E). And if both posterior probabilities are equal, then P(H|E)P(E)+P(H|~E)P(~E) = P(H|E), and both posteriors must equal the prior. Q.e.d.
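The same conclusion can be spot-checked numerically. The Python sketch below samples random joint distributions over H and E and confirms that P(H|E) and P(H|~E) never sit on the same side of P(H). It is a sanity check rather than a proof; the one-line algebraic reason is given in the comment inside the function. The trial count and seed are arbitrary choices.

```python
import random

def check_rule(trials=10_000, seed=0):
    """Sample random joint distributions over (H, E) and check that the two
    posteriors never lie strictly on the same side of the prior.

    Algebraically, with a = P(H|E), b = P(H|~E), e = P(E):
        (a - P(H)) * (b - P(H)) = -e * (1 - e) * (a - b)**2 <= 0
    """
    rng = random.Random(seed)
    for _ in range(trials):
        weights = [rng.random() for _ in range(4)]  # cells HE, H~E, ~HE, ~H~E
        total = sum(weights)
        he, hne, nhe, _nhne = (w / total for w in weights)
        p_h = he + hne
        p_e = he + nhe
        p_h_given_e = he / p_e
        p_h_given_not_e = hne / (1 - p_e)
        # Small tolerance for floating-point rounding when the posteriors coincide.
        if (p_h_given_e - p_h) * (p_h_given_not_e - p_h) > 1e-12:
            return False
    return True

print(check_rule())  # True
```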

I think that the formula that expresses the prior as the average of the posterior probability weighted by the probabilities of observing that evidence and not observing that evidence, is a great way to express the point of this article. But it might not be trivial for everyone to get:

((P(H|E) > P(H)) if and only if (P(H) > P(H|~E))) and ((P(H|E) = P(H)) if and only if (P(H) = P(H|~E)))

from

P(H) = P(H|E)P(E) + P(H|~E)P(~E)

That something is evidence in favor if and only if its negation is evidence against, and that some result is independent of some hypothesis if and only if not observing that result is independent of that hypothesis, are the take home messages of this post as far as I can tell. The law that "P(H) = P(H|E)P(E) + P(H|~E)P(~E)" says more than that, it also tells you how to get P(H|~E) from P(H|E), P(H) and P(E). But adding the boolean statement and its proof from the weighted average statement to the post, or at least to a comment on this post, not even necessarily using the boolean symbols or formalisms, might help a lot of students that come across this long after algebra class. I know it would have helped me.

Hi, I'm new here but I've been following the sequences in the suggested order up to this point.

I have no problem with the main idea of this article. I say this only so that everyone knows that I'm nitpicking. If you're not interested in nitpicking then just ignore this post.

I don't think that the example given below is a very good one to demonstrate the concept of Conservation of Expected Evidence:

If you argue that God, to test humanity's faith, refuses to reveal His existence, then the miracles described in the Bible must argue against the existence of God.

Assuming I'm reading this correctly:

Our Prior is P(G) = The probability that God Exists (let's assume this is the Judeo-Christian God since that seems to be the intended target)

P(T) = the probability that God is Testing Humanity by not revealing his existence

P(M) = the probability that the Miracles of the bible are true.

The issue that I find with this is that P(G|T) = 1

If God is testing Humanity by hiding his existence there is a 100% chance that God Exists. I was going to write out the whole Bayesian equation to explain why this is true, but I think it's pretty intuitive. P(T) cannot be evidence for P(G) since it assumes that P(G) is true.

Another issue is that the way this is written you're implying that P(M) = P(~T). But this is not true, since the Miracles of the bible existing is not the direct opposite of God testing humanity by not revealing his existence. Unless you intend to completely twist the argument that most people are making when they assert P(T) as truth. They aren't saying or even implying that God wants there to be no evidence at all of his existence. Most theists would instead argue that the existence of miracles is a part of God's test for humanity. They say that God sent us miraculous signs and prophets instead of just coming down and saying "Hey humanity, I'm God" because he wanted to test our faith. Had they the mathematical language, they would say that P(T|M) > P(T), meaning M serves as evidence of T. Not P(M) = P(~T)

Though this whole concept of God "testing humanity by not revealing himself" does seem more like an example of Belief in belief, where P(T) was devised as a means to justify the existence of an invisible God, I still feel like the example you've given is a bit of a stretch.

I would say, rather, that:

G = God exists

N = The existence of God is not revealed directly to humanity

M = Miracles occur

...and we're talking about P(G|N) and P(G|M) and not talking about P(T) at all.

More generally, T seems to be a red herring here.

That said, I agree that there's a presumption that M implies ~N... that is, that if miracles occurred, that would constitute the direct revelation of God's existence.

And yes, one could argue instead that no, miracles aren't a revelation of God's existence at all, but rather a test of faith. A lot depends here on what counts as a miracle; further discussion along this line would benefit from specificity.

I agree that T in and of itself is problematic.

Your N seems more likely what the author intended, now that you point it out.

Though I still don't think anyone who thought about it for more than 20 seconds would ever assert that N could be used as evidence for G.

But using that as a model would probably serve well to underscore the point of Conservation of Evidence

If the fact that God has not been revealed directly to humanity is evidence for the existence of God, then should God ever reveal himself directly to humanity, it would be evidence against his existence.

That's probably the statement Eliezer intended to make.

(nods)

And I would not be in the least surprised to find theologians arguing that the absence of direct evidence of God's existence is itself proof of the existence of God, and I would be somewhat surprised to find that none ever had, but I don't have examples.

That said, straw theism is not particularly uncommon on LW; when people want a go-to example of invalid reasoning, belief in god comes readily to hand. It derives from a common cultural presumption of atheism, although there are some theists around.

Is this the same as Jaynes' method for construction of a prior using transformation invariance on acquisition of new evidence?

Does conservation of expected evidence always uniquely determine a probability distribution? If so, it should eliminate a bunch of extraneous methods of construction of priors. For example, you would immediately know if an application of MaxEnt was justified.

**[deleted]** · 2013-07-10T16:38:05.661Z · score: 0 (0 votes)

Therefore, for every expectation of evidence, there is an equal and opposite expectation of counter-evidence.

Eliezer, isn't the "equal" part untrue? I like the parallel with Newton's 3rd law, but the two terms P(H|E)*P(E) and P(H|~E)*P(~E) *aren't* numerically equal - we only know that they sum to P(H).

P(H) is the belief where you start, and P(H|E) and P(H|~E) are the possible beliefs where you end. You could go to one with probability P(E) and to the other with probability P(~E), but due to the identity you quote, in expectation you do not move at all.

The *changes* are equal and opposite:

[ P(H|E) - P(H) ]*P(E) + [ P(H|~E) - P(H) ]*P(~E) = 0
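For readers who want the missing step, one way to get from the identity P(H) = P(H|E)P(E) + P(H|~E)P(~E) to the "changes" form above is to also write P(H) itself as a weighted average (using P(E) + P(~E) = 1) and subtract:

```latex
\begin{aligned}
P(H) &= P(H\mid E)\,P(E) + P(H\mid \neg E)\,P(\neg E) \\
P(H) &= P(H)\,P(E) + P(H)\,P(\neg E) \qquad \text{(since } P(E) + P(\neg E) = 1\text{)} \\
0 &= \bigl[P(H\mid E) - P(H)\bigr]\,P(E) + \bigl[P(H\mid \neg E) - P(H)\bigr]\,P(\neg E)
\end{aligned}
```

The third line is the first minus the second: the two terms are not equal to each other, but their probability-weighted changes are equal in size and opposite in sign.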

See Nick Hay's much earlier comment.

For a true Bayesian, it is impossible to seek evidence that confirms a theory. There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before. You can only ever seek evidence to test a theory, not to confirm it.

Old post, but isn't evidence that disconfirms the theory X equal to confirming ~X? Is ~X ineligible to be considered a theory?

Everything in that quote applies just as much to disconfirming a theory as it does to confirming a theory. Conservation of expected evidence means that you cannot legitimately expect your confidence in a theory to go *down* either.

The hyperlink "An Intuitive Explanation of Bayesian Reasoning" is broken. The current location of that essay is here: http://yudkowsky.net/rational/bayes

**[deleted]** · 2015-11-09T11:08:41.102Z · score: 0 (0 votes)

Mantel-Cox log-rank tests compare observations and expectations too...

Can someone tell me if I understand this correctly: he is saying that we must be clear beforehand about what constitutes evidence for, what constitutes evidence against, and what doesn't constitute evidence either way?

Because in his examples it seems that what is being changed is what counts as evidence. It seems that no matter what transpires (in the witch trials for example) it is counted as evidence for. This is not the same as changing the hypothesis to fit the facts. The hypothesis was always 'she's a witch'. Then the evidence is interpreted as supportive of the hypothesis no matter what.

You don't necessarily have to figure it out beforehand (though it's certainly harder to fool yourself if you do). But if X is evidence for Y then not-X has to be evidence for not-Y.

And yes, one thing that's going wrong in those witch trials is that both X and not-X are being treated as evidence for Y, which can't possibly be correct. (And the way in which it's going wrong is that the prosecutor correctly observes that Y *could produce* X or not-X, whichever of the two actually happened to turn up, and fails to distinguish between that and showing that Y is *more likely to produce* that outcome than not-Y, which is what would actually make the evidence go in the claimed direction.)

This is not the same as changing the hypothesis to fit the facts.

Did anyone say it is? I'm not seeing where.

Hi, new here.

I was wondering if I've interpreted this correctly:

'For a true Bayesian, it is impossible to seek evidence that confirms a theory. There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before. You can only ever seek evidence to test a theory, not to confirm it.'

Does this mean that it is impossible to prove the truth of a theory? Because the only evidence that can exist is evidence that falsifies the theory, or supports it?

For example, something people know about gravity and objects under its influence is that on Earth objects will accelerate at something like 9.81 m/s². If we dropped a thousand different objects and observed their acceleration, and found it to be 9.81 m/s², we would have a thousand pieces of evidence supporting the theory, and zero pieces to falsify the theory. We all believe that 9.81 is correct, and we teach that it is the truth, but we can never really know, because new evidence could someday appear that challenges the theory, correct?

Thanks

It is correct that we can never find enough evidence to make our certainty of a theory to be exactly 1 (though we can get it very close to 1). If we were absolutely certain in a theory, then *no* amount of counterevidence, no matter how damning, could ever change our mind.

"For a true Bayesian, it is impossible to seek evidence that confirms a theory"

The important part of the sentence here is *seek*. This isn't about falsificationism, but the fact that no experiment you can do can confirm a theory without having some chance of falsifying it too. So any observation can only provide evidence for a hypothesis if a different outcome could have provided the opposite evidence.

For instance, suppose that you flip a coin. You can seek to *test* the theory that the result was `HEADS` by simply looking at the coin with your eyes. There's a 50% chance that the outcome of this test would be "you see the `HEADS` side", confirming your theory (`P(HEADS | you see HEADS) ~ 1`). But this only works because there's also a 50% chance that the outcome of the test would have shown the result to be `TAILS`, falsifying your theory (`P(HEADS | you see TAILS) ~ 0`). And in fact there's no way to measure the coin so that one outcome would be evidence in favour of `HEADS` (`P(HEADS | measurement) > 0.5`) without the opposite result being evidence against `HEADS` (`P(HEADS | ¬measurement) < 0.5`).
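To see where the "~1" and "~0" come from, here is a small Python sketch of the coin example. It assumes a fair coin and a hypothetical `accuracy` parameter for how often you read the coin correctly; neither number comes from the comment. Whatever the accuracy, the two possible posteriors average back to 1/2.

```python
from fractions import Fraction

def coin_posteriors(accuracy):
    """Posterior P(HEADS | what you see) for a fair coin, assuming you read the
    coin correctly with probability `accuracy` (a hypothetical parameter)."""
    prior = Fraction(1, 2)
    p_see_heads = prior * accuracy + (1 - prior) * (1 - accuracy)
    post_see_heads = prior * accuracy / p_see_heads
    post_see_tails = prior * (1 - accuracy) / (1 - p_see_heads)
    # Weighted average of the two possible posteriors.
    expected = p_see_heads * post_see_heads + (1 - p_see_heads) * post_see_tails
    return post_see_heads, post_see_tails, expected

print(coin_posteriors(Fraction(99, 100)))
# (Fraction(99, 100), Fraction(1, 100), Fraction(1, 2))
```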

Closely related is the law of total expectation: https://en.wikipedia.org/wiki/Law_of_total_expectation

It states that E[E[X|Y]]=E[X].
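A minimal Python illustration of the law of total expectation, using a small made-up joint distribution over (x, y) pairs. If X is taken to be the indicator of a hypothesis H, then E[X] = P(H) and E[X|Y] = P(H|Y), and the same identity becomes conservation of expected evidence.

```python
from fractions import Fraction

# A small, made-up joint distribution: (x, y) -> probability.
joint = {(0, "a"): Fraction(1, 8), (3, "a"): Fraction(1, 8),
         (1, "b"): Fraction(1, 4), (5, "b"): Fraction(1, 2)}

def expectation(dist):
    """E[X] under the joint distribution."""
    return sum(p * x for (x, _), p in dist.items())

def marginal(dist, y):
    """P(Y = y)."""
    return sum(p for (_, yy), p in dist.items() if yy == y)

def conditional_expectation(dist, y):
    """E[X | Y = y]."""
    return sum(p * x for (x, yy), p in dist.items() if yy == y) / marginal(dist, y)

ys = {y for _, y in joint}
# E[E[X|Y]]: average the conditional expectations, weighted by P(Y = y).
tower = sum(marginal(joint, y) * conditional_expectation(joint, y) for y in ys)
print(expectation(joint), tower)  # 25/8 25/8
```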

I do not understand the validity of this statement:

There is no possible plan you can devise, no clever strategy, no cunning device, by which you can legitimately expect your confidence in a fixed proposition to be higher (on average) than before.

Given a temporal proposition A among a set of other mutually exclusive temporal propositions {A, B, C...}, demonstrating B, C, and other candidates do not meet the evidence so far *while* A meets the evidence so far does raise our confidence in the proposition *continuing to hold*. This is standard Bayesian inference applied to temporal statements.

For example, we have higher confidence in the statement "the sun will come up tomorrow" than the statement "the sun will not come up tomorrow", because the sun *has* come up in the past, whereas it has *not* not come up comparably fewer times. We have relied on the prior distribution to make confident statements about the result of an impending experiment, and can constrain our confidence using the number of prior experiments that conform to it - further, every new experiment that confirms "the sun will come up" makes it harder to argue that "the sun will not come up" because the latter statement now has to explain *why* it failed to apply in the prior cases as well as why it will work now.

It would seem quantifying the prior distribution against a set of mutually-exclusive statements thus *is* a valid strategy for raising confidence in a specific statement.

Maybe I'm misinterpreting what "fixed proposition" means here or am missing something more fundamental?
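One way to reconcile the sunrise example with the quoted claim is a small Python sketch. It assumes Laplace's rule of succession (a uniform prior on the sun's daily chance of rising), which is a modelling assumption not made in the comment, and a hypothetical n = 1000 observed sunrises. The fixed proposition is "the sun rises on day n + 2"; we ask how our credence in it may change after watching day n + 1.

```python
from fractions import Fraction

def p_rise(successes, trials):
    """Laplace's rule of succession: P(next rising) = (s + 1) / (n + 2)."""
    return Fraction(successes + 1, trials + 2)

n = 1000                          # hypothetical: the sun rose on all n observed days
prior = p_rise(n, n)              # credence today that the sun rises on day n + 2
p_tomorrow = p_rise(n, n)         # chance it rises on day n + 1
if_rises = p_rise(n + 1, n + 1)   # credence in day n + 2 after a rising on day n + 1
if_not = p_rise(n, n + 1)         # credence in day n + 2 after a failure on day n + 1

# Expected credence in the fixed proposition, computed before watching day n + 1.
expected = p_tomorrow * if_rises + (1 - p_tomorrow) * if_not
print(if_rises > prior, expected == prior)  # True True
```

Under these assumptions each observed sunrise does raise the credence, but only slightly, and that likely small rise is exactly balanced by the unlikely large drop if the sun fails to rise, so the expectation taken beforehand is unchanged.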