Bayesianism for Humans


post by ChrisHallquist · 2013-10-29T23:54:14.890Z · LW · GW · Legacy · 37 comments


Recently, I completed my first systematic read-through of the sequences. One of the biggest effects this had on me was considerably warming my attitude towards Bayesianism. Not long ago, if you'd asked me my opinion of Bayesianism, I'd probably have said something like, "Bayes' theorem is all well and good when you know what numbers to plug in, but all too often you don't."

Now I realize that that objection is based on a misunderstanding of Bayesianism, or at least Bayesianism-as-advocated-by-Eliezer-Yudkowsky. "When (Not) To Use Probabilities" is all about this issue, but a cleaner expression of Eliezer's true view may be this quote from "Beautiful Probability":

No, you can't always do the exact Bayesian calculation for a problem. Sometimes you must seek an approximation; often, indeed. This doesn't mean that probability theory has ceased to apply, any more than your inability to calculate the aerodynamics of a 747 on an atom-by-atom basis implies that the 747 is not made out of atoms. Whatever approximation you use, it works to the extent that it approximates the ideal Bayesian calculation - and fails to the extent that it departs.

The practical upshot of seeing Bayesianism as an ideal to be approximated, I think, is this: you should avoid engaging in any reasoning that's demonstrably nonsensical in Bayesian terms. Furthermore, Bayesian reasoning can be fruitfully mined for heuristics that are useful in the real world. That's an idea that actually has real-world applications for human beings, hence the title of this post, "Bayesianism for Humans."

Here's my attempt to make an initial list of more directly applicable corollaries to Bayesianism. Many of these corollaries are non-obvious, yet eminently sensible once you think about them, which I think makes for a far better argument for Bayesianism than Dutch Book-type arguments with little real-world relevance. Most (but not all) of the links are to posts within the sequences, which hopefully will allow this post to double as a decent introductory guide to the parts of the sequences that explain Bayesianism.

Watch out for base rate neglect. It's why even experts screw up in one of the standard problems used to explain Bayes' Theorem. Even when you don't know what the base rate is, there are times when you ought to expect it to be low, particularly if you're trying to detect something rare, like a new disease or a terrorist.
Absence of Evidence is Evidence of Absence. If observing E would increase the probability of H, observing not-E should decrease the probability of H. E and not-E sure as hell shouldn't both increase the probability of H.
Relatedly, there's Conservation of Expected Evidence: roughly, if you think more evidence would probably increase your confidence in a belief, you should think there's a small chance it would cause a larger change in the opposite direction.
Conservation of expected evidence means that a rational person can't seek to confirm their beliefs, only to test them. If your expectation of how a test will affect your belief violates conservation of expected evidence, you should update your beliefs now based on how you expect the test to turn out.
Also closely related to the above, when it comes to gathering evidence, "If you know your destination, you are already there." Evidence gathered through a biased method designed to turn out one way is worthless.
On the other hand, you can't dismiss a hypothesis due to a lack of a particular piece of evidence that you wouldn't expect to have even if the hypothesis were true.
Even when an argument on one side is overcome by a stronger argument on the other side, you still need to take the first argument into account when assigning confidence to your belief, lest you gradually dismiss each piece of evidence on the other side because no piece is (individually) as strong as the one piece of evidence on your side.
Burdensome Details: Every detail added to a claim makes it less probable.
Reversed Stupidity Is Not Intelligence: If you would expect to see flying saucer cults regardless of whether or not extraterrestrials were visiting us, flying saucer cults are not evidence against extraterrestrials.
Don't get caught up in arguing about definitions when you should be looking at what's actually indicative of what. (I take it that that's the Bayesian take-away from the sequence on words, though Eliezer doesn't quite put it that way.)
Rationality and the rules of Science are not the same thing. The latter are social rules designed to make science work in spite of the irrationality of its practitioners. They're not the same as the rules of rationality an ideal reasoner would follow.
An example of the science vs. rationality issue: to an ideal reasoner, successful retrospective predictions are as valuable as prospective predictions. (To us non-ideal reasoners, prospective predictions can be extra valuable as protection against fooling ourselves, but we still shouldn't discount retrospective predictions entirely.)
This is also a reason to be careful about dismissing evo psych claims as just-so stories.
Another example: contrary to old-fashioned statistical procedure, a researcher's state of mind shouldn't affect the significance of their results.
Last example is from a post of my own: Expert opinion should be discounted when the expert's opinions could be predicted solely from information not relevant to the truth of the claims. But when the state of expert opinion surprises you, beware discounting their opinions just because you can think of some explanation for why they'd be wrong.
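
As a rough illustration of the base rate point at the top of the list above, here is a minimal Python sketch of Bayes' theorem applied to a screening test. The function name and all three numbers are assumptions chosen for illustration; they are not taken from the post or from any particular study.

```python
def posterior_given_positive(base_rate, true_positive_rate, false_positive_rate):
    """Bayes' theorem: P(condition | positive test result)."""
    # Probability mass of people who have the condition and test positive.
    true_positives = base_rate * true_positive_rate
    # Probability mass of people who lack the condition but test positive anyway.
    false_positives = (1.0 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical numbers: a 1% base rate, a test that catches 80% of real cases,
# and a 9.6% false positive rate.
p = posterior_given_positive(
    base_rate=0.01,
    true_positive_rate=0.80,
    false_positive_rate=0.096,
)
print(f"P(condition | positive) = {p:.3f}")
```

Under these assumed numbers the posterior comes out below 8%, even though the test is right most of the time. Ignoring the 1% base rate is what makes the intuitive answer come out far too high.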

37 comments

Comments sorted by top scores.

comment by [deleted] · 2013-10-29T19:20:50.498Z · LW(p) · GW(p)

You should expect that, on average, a test will leave your beliefs unchanged.

Not quite. You do a test because you expect your beliefs to change. A better phrasing is "You should not expect that a test will move your beliefs in any particular direction." Of course this doesn't capture the theorem that "prior = expected posterior", but that is very hard to communicate accurately in English without referring directly to probability theory concepts. At least strive for not having alternate interpretations that are wrong.

I would add: there are two kinds of "no evidence". There's "no evidence for X" because there's no evidence either way because X hasn't been tested, and "no evidence for X" where it's been tested and all the evidence points to not-X. People often use the first kind of "no evidence" as if it had the same force as the second. This is totally obvious under Bayesianism, but not widely understood among the scientifically literate.

Replies from: Stabilizer

↑ comment by Stabilizer · 2013-11-08T05:26:08.983Z · LW(p) · GW(p)

See this for another example of confusion between the two kinds of "no evidence". I summarize:

People think that Cognitive Behavioral Therapy (CBT) is better than psychodynamic/Freudian therapies. This is because CBT has been tested to be better than placebo, but Freudian therapies have not been tested at all; mainly due to historical reasons. Of course, the fact that psychodynamic therapies have not been tested and therefore have no evidence in their favor, isn't evidence against psychodynamic therapies. They simply have no evidence either way.

And when they went out and tested CBT, psychodynamic therapies, and placebo, they found that CBT and psychodynamic therapies were about equally better than placebo.

comment by David_Gerard · 2013-10-29T08:44:50.222Z · LW(p) · GW(p)

Richard Carrier's Proving History has a couple of chapters of worked examples of applying Bayesian considerations to real-life historical arguments.

Replies from: Jayson_Virissimo

↑ comment by Jayson_Virissimo · 2013-10-30T16:44:26.773Z · LW(p) · GW(p)

Are they good examples?

Replies from: David_Gerard

↑ comment by David_Gerard · 2013-10-31T12:01:46.178Z · LW(p) · GW(p)

They worked for me, showed me how I could usefully apply this stuff qualitatively (comparisons) without working out the actual numbers.

The actual examples are the question of the historical Jesus. You could say that's an inherently controversial example, therefore bad. However, the question is then "compared to what?" If there is a better set of worked examples, then please present it.

comment by [deleted] · 2013-10-30T18:26:09.858Z · LW(p) · GW(p)

Bayesianism was recently an important boon for me, and the credit belongs entirely to LW. My newborn son has several warning signs for a genetic disease called NF-1. Almost everybody in my family panicked, but I was able to calm myself and my family members down by pointing out that even if these signs rarely appear in those without the disease, the disease is rare enough that it was still quite unlikely that he was afflicted. This in part helped to prevent my son from getting an expensive and painful genetic test. We've since talked to a doctor who assured us that he is very unlikely to have NF-1.

And I didn't do any fine grained math. Bayesianism just led me to be aware of the question of the incidence rate of the disease as a factor.
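
The reasoning in the comment above can be sketched in the odds form of Bayes' theorem: posterior odds equal prior odds times the likelihood ratio. This is only a sketch. The incidence figure and the likelihood ratio below are assumed round numbers for illustration, not values given in the comment, and the function name is hypothetical.

```python
def posterior_from_odds(prior_probability, likelihood_ratio):
    """Update a prior probability using the odds form of Bayes' theorem."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back into a probability.
    return posterior_odds / (1.0 + posterior_odds)


# Assumption: the disease affects roughly 1 in 3000 newborns.
ASSUMED_INCIDENCE = 1 / 3000
# Assumption: the warning signs are 50 times more likely with the disease than without.
ASSUMED_LIKELIHOOD_RATIO = 50.0

p = posterior_from_odds(ASSUMED_INCIDENCE, ASSUMED_LIKELIHOOD_RATIO)
print(f"P(disease | warning signs) = {p:.4f}")
```

Even with signs assumed to be fifty times more common among affected children, the posterior under these numbers stays below 2%. The rarity of the disease dominates, which is the comment's point.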

Replies from: Lumifer

↑ comment by Lumifer · 2013-10-30T19:06:28.748Z · LW(p) · GW(p)

an expensive and painful genetic test

I thought nowadays genetic testing is completely not painful (requiring a cheek swab at most) and relatively inexpensive. Is that not so?

Replies from: gattsuru

↑ comment by gattsuru · 2013-10-30T20:07:48.070Z · LW(p) · GW(p)

Common NF-1 genetic tests require blood samples or cultured biopsy cells, rather than buccal swabs, and can cost over a thousand US dollars. The large size of the gene, and its fairly unusual expression methods, seem to leave that as the preferred tool in the medical literature, even for future technique discovery.

Replies from: ChristianKl

↑ comment by ChristianKl · 2013-11-01T08:24:45.510Z · LW(p) · GW(p)

Common NF-1 genetic tests require blood samples or cultured biopsy cells, rather than buccal swabs, and can cost over a thousand US dollars. The large size of the gene, and its fairly unusual expression methods, seem to leave that as the preferred tool in the medical literature, even for future technique discovery.

Why can't you just target a SNP? Why does the size of the gene matter?

Replies from: None

↑ comment by [deleted] · 2013-11-01T10:44:12.471Z · LW(p) · GW(p)

A large gene is a large target for spontaneous mutation. Most people with the disease did not inherit it but instead had something inside the large gene go wrong between their parents and them. For the spontaneous mutations, you likely have never seen that particular difference before.

You also have no idea where in the gene the problem could be, and you just need to sequence the thing. With current sequencing technology you basically need to either throw the entire genome into an Illumina sequencer for many thousands of dollars, or do a number of small custom Sanger sequencing reactions, which read you out about 600-800 specific base pairs at a time. These are individually not that expensive or difficult, but they can add up when you need to tile them over a large area. Seeing as the gene is 350 kilobases, in this case it adds up both in terms of cost and in terms of source DNA you need.

SNPs are only useful when there is one or a few ancestral mutant alleles that have spread through the population and in which you can either look for one known causative change, or a nearby unique SNP that gets dragged along for the ride with the disease allele because it is quite close to it.

EDIT to clear up some questions from a few layers up in the chain: These days looking for known, relatively common genetic variants is very easy, as the success of 23andme illustrates. These tests use microarrays to look for SNPs - this process does not involve sequencing though, but instead only tests the sequence similarity (via binding affinity) of a sample to a set of short reference strands. In order to identify a particular allele with this technique though it needs to have been detected in previous work. The only way to confidently figure out rare or unique variants is to outright sequence and that gets expensive for regions larger than a few kilobases. And hilariously enough, due to the multiple forms of sequencing technology that exist if you need to sequence an area larger than a megabase or two it becomes cheaper to just sequence the entire genome.
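
The Sanger tiling estimate in the comment above can be checked with a little arithmetic. The sketch below takes the comment's figures at face value (a 350 kilobase gene, 600 to 800 base pairs per read) and assumes, as a simplification, that reads tile the whole region end to end with no overlap, so it gives a lower bound on the number of reactions. The names are hypothetical.

```python
import math

GENE_LENGTH_BP = 350_000       # "350 kilobases", from the comment
READ_LENGTHS_BP = (600, 800)   # "about 600-800 specific base pairs at a time"


def reads_needed(region_bp, read_bp):
    """Minimum number of non-overlapping reads needed to cover a region."""
    return math.ceil(region_bp / read_bp)


for read_bp in READ_LENGTHS_BP:
    count = reads_needed(GENE_LENGTH_BP, read_bp)
    print(f"{read_bp} bp per read: at least {count} reactions")
```

Under these simplifying assumptions, covering the whole gene takes several hundred separate reactions, which is why individually cheap reads still add up.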

comment by Peter Wildeford (peter_hurford) · 2013-10-29T03:12:29.814Z · LW(p) · GW(p)

Recently, I completed my first systematic read-through of the sequences.

What was your methodology for the read-through? How much time did it take? Was it worth the time investment?

Replies from: ChrisHallquist

↑ comment by ChrisHallquist · 2013-10-29T04:41:57.154Z · LW(p) · GW(p)

I loaded the mobi version on to my Kindle and read it at every spare moment. (I get a lot of reading done by taking out my Kindle at every spare moment.) I didn't have a more sophisticated "methodology" than that. A substantial percentage, maybe half of it, ended up getting read on a long weekend camping trip, when I was without other electronics to distract me. I'd estimate it took ~60 hours of actual time spent reading total, though I don't really know. And yeah I'd say it was worth it.

Replies from: Benito

↑ comment by Ben Pace (Benito) · 2013-10-29T06:31:45.550Z · LW(p) · GW(p)

Just to check, do you mean the sequences, or the complete blog posts? The latter took me flipping ages to get through...

Replies from: ChrisHallquist

↑ comment by ChrisHallquist · 2013-10-29T17:01:37.835Z · LW(p) · GW(p)

Yup, the complete blog posts.

comment by bartimaeus · 2013-10-29T16:33:47.883Z · LW(p) · GW(p)

The post What Bayesianism Taught Me is similar to this one; your post has some elements that that one doesn't have, and that one has a few that you don't have. Combining the two, you end up with quite a nice list.

Replies from: ChrisHallquist

↑ comment by ChrisHallquist · 2013-10-29T16:58:20.711Z · LW(p) · GW(p)

I want to like that post, because the formatting is so much tidier than the formatting on my post, but I actually disagree with the first two points. I'm in favor of just rolling with the fact that "Bayesian evidence" isn't what we ordinarily mean by "evidence," as useful as the former is. Also, Eliezer's "I don't know" post misses the pragmatics of saying, "I don't know"; we say "I don't know" if we don't have any information the other person is going to care about (the other person usually won't care that there are 10-1000 apples in a tree outside).

Replies from: Tyrrell_McAllister, bartimaeus

↑ comment by Tyrrell_McAllister · 2013-10-29T18:26:47.624Z · LW(p) · GW(p)

The problem isn't with "I don't know", but with "I don't know anything about that." I agree that "I don't know" is useful.

↑ comment by bartimaeus · 2013-10-29T18:19:56.822Z · LW(p) · GW(p)

That's true, those points ignore the pragmatics of a social situation in which you use the phrase "I don't know" or "There's no evidence for that". But if you put yourself in the shoes of the boss instead of the employee (in the example given in "I don't know"), where even if you have "no information" you still have to make a decision, then remembering that you probably DO know something that can at least give you an indication of what to do, is useful.

The points are also useful when the discussion is with a rationalist.

comment by b1shop · 2013-10-31T15:27:36.877Z · LW(p) · GW(p)

This was a great post. I'll use it to introduce people to key concepts in the future.

Many of these focus on the posterior's first moment. For continuous distributions, the higher moments matter, too. A test that I expected to lower the variance in my posterior would be considered "confirming" as I use the word. I can't lower the variance before the test is done because it's still possible the mean will change.

comment by homunq · 2013-10-30T08:19:27.239Z · LW(p) · GW(p)

Even for an ideal reasoner, successful retrospective predictions clearly do not play the same role as prospective predictions. The former must inevitably be part of locating the hypothesis; they thus play a weaker role in confirming it. Eliezer's story you link to is about how the "traditional science" dictum about not using retrospective predictions can be just reversed stupidity; but just reversing young Eliezer's stupidity in the story one more time doesn't yield intelligence.

Edit: this comment has been downvoted, and in considering why that may be, I think there are ambiguities in both "ideal reasoner" and "play the same role". Yes, the value of evidence does not change depending on when a hypothesis was first articulated, so some limitless entity that was capable of simultaneously evaluating all possible hypotheses would not care. However, a perfectly rational but finite reasoner could reasonably consider some amount of old evidence to have been "used up" in selecting the hypothesis from an implicit background of alternative hypotheses, without having to enumerate all of those alternatives; and thus habitually avoid recounting a certain amount of retrospective evidence. Any "successful prediction" would presumably be by a hypothesis that had already passed this threshold (otherwise it's just called a "lucky wild-ass guess"). I'm speaking in simple heuristic terms here, but this could be made more rigorous and numeric, up to and including a superhuman level I'd consider "ideal".

comment by V_V · 2013-10-29T11:31:11.556Z · LW(p) · GW(p)

Relatedly, there's Conservation of Expected Evidence. A rational person can't seek to confirm their beliefs, only to test them. You should expect that, on average, a test will leave your beliefs unchanged. If not, you should update your beliefs now based on how you expect the test to turn out.

This appears to be wrong:
Shake a box containing a coin. What is your belief that the coin landed heads? 50% . Will your belief change if you open the box and look inside it? Sure it will.

Replies from: ygert, Richard_Kennaway, IlyaShpitser, ChrisHallquist, shminux

↑ comment by ygert · 2013-10-29T11:55:47.123Z · LW(p) · GW(p)

You should expect that, on average, a test will leave your beliefs unchanged.

Emphasis mine.

When I shake the box, my belief that the coin landed heads is 50%. When I look inside, my belief changes, yes, but to one of two options of equal probability: 0% (I see it came out tails), or 100% (I see it came out heads).

It is trivial to see that my expected posterior belief is 0% × 1/2 + 100% × 1/2 = 50%, or in other words, it's exactly equal to my prior belief.

Replies from: twanvl, V_V

↑ comment by twanvl · 2013-10-30T13:30:16.514Z · LW(p) · GW(p)

The question is whether 'change' signifies only a magnitude or also a direction. The average magnitude of the change in belief when doing an experiment is larger than zero. But the average of change as vector quantity, indicating the difference between belief after and before the test, is zero.

If you drive your car to work and back, then the average velocity of your trip is 0, but the average speed is positive.
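
The velocity-versus-speed distinction can be made concrete for the coin in the box. The following is a minimal sketch with a hypothetical helper. It computes the expected signed change in belief and the expected absolute change, given a prior and a list of (probability of outcome, posterior after outcome) pairs.

```python
def expected_changes(prior, outcomes):
    """Return (expected signed change, expected absolute change) in belief.

    outcomes is a list of (probability_of_outcome, posterior_after_outcome).
    """
    signed = sum(p * (posterior - prior) for p, posterior in outcomes)
    absolute = sum(p * abs(posterior - prior) for p, posterior in outcomes)
    return signed, absolute


# Fair coin in a box: looking inside reveals heads or tails with equal probability.
signed, absolute = expected_changes(
    prior=0.5,
    outcomes=[(0.5, 1.0), (0.5, 0.0)],
)
print("expected signed change:  ", signed)
print("expected absolute change:", absolute)
```

The signed change averages to zero (the "velocity"), while the absolute change averages to 0.5 (the "speed"). The belief always moves, just not in a predictable direction.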

↑ comment by V_V · 2013-10-29T13:22:26.693Z · LW(p) · GW(p)

You should expect that, on average, a test will leave your beliefs unchanged.

Emphasis mine.

The statement is still wrong:
Opening the box always changes your beliefs, therefore, it also changes your beliefs on average.

The correct version of this statement is "your belief over the beliefs that you will have after performing a test must be equivalent to your current belief", which seems to be a trivial claim.

Replies from: army1987, AnthonyC, aspera

↑ comment by A1987dM (army1987) · 2013-10-30T11:22:48.307Z · LW(p) · GW(p)

The correct version of this statement is "your belief over the beliefs that you will have after performing a test must be equivalent to your current belief", which seems to be a trivial claim.

It may seem trivial but then again so does the claim that P(A and B) <= P(A), and still...

In particular, I've sometimes caught myself simultaneously having aliefs like ‘if she flees, then she must be a witch’, ‘if she stays, then she must be a witch’, and ‘she may or may not be a witch, and I can't know until I see whether she flees or stays’, and until I read the post about conservation of expected evidence I never realized there was something wrong with that.

↑ comment by AnthonyC · 2013-10-29T14:04:09.727Z · LW(p) · GW(p)

The statement, "You should expect that, on average, a test will leave your beliefs unchanged," means that you cannot expect an unbiased test to change your beliefs in a particular direction, as is clear from the original post.

Of course you expect to hold different beliefs after the test. If you didn't, the test would not be worth doing. But you are not more likely to end up at (100% heads, 0% tails) than (0% heads, 100% tails).

On the other hand, if you think it is more likely that you will end up at, say, (0% heads, 100% tails), then you cannot rightly claim that you currently believe the coin to be fair (your 50%, 50% estimate does not reflect your true expectations).

Replies from: TheOtherDave, Jiro, V_V

↑ comment by TheOtherDave · 2013-10-29T14:14:37.207Z · LW(p) · GW(p)

That said, it's far from the most easily accessible formulation of that meaning imaginable.

I mean, sure, the future state in which half of my measure has ~1 confidence in "heads" and half my measure has ~0 confidence in "heads" is in some sense not a change from my current state where I have .5 confidence in "heads", but that's not the interpretation most people will adopt of "leave your beliefs unchanged."

It seems more accessible to say that if I expect a test to update my beliefs in a particular direction, I should go ahead and update my beliefs in that direction now (and perform the test as confirmation).

Of course, this advice presumes that I won't anchor on my new belief. Which, given that I'm human, is not a safe assumption.

↑ comment by Jiro · 2013-10-29T14:57:34.710Z · LW(p) · GW(p)

I would suggest that you expect your beliefs to be changed in 100% of cases. Currently, you believe in a 50% probability. After doing the tests, we have a set of universes, in some of which you believe in a 100% probability and in some of which you believe in a 0% probability. Your belief changed in every single one.

X and Y can be averaged out, but belief in number X and belief in number Y don't average out to be "belief in the average of X and Y".

↑ comment by V_V · 2013-11-01T13:15:11.477Z · LW(p) · GW(p)

The statement, "You should expect that, on average, a test will leave your beliefs unchanged," means that you cannot expect an unbiased test to change your beliefs in a particular direction, as is clear from the original post.

Actually you can:
Shake a box with a coin you know to be biased. Before you look into the box, your belief for heads is, say, 80%. You expect that it is more likely that, when you open the box, your belief will change to 100% heads rather than 0%.

I don't think there is a useful way to patch the statement without making explicit reference to the technical definition of Bayesian belief.
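
The biased coin example can be worked through exactly. The sketch below uses Python's `fractions` module and the comment's 80% figure; the variable names are hypothetical. It is offered as one reading of the example, consistent with the post's item on Conservation of Expected Evidence: a likely small shift in one direction is balanced by an unlikely large shift in the other.

```python
from fractions import Fraction

prior = Fraction(4, 5)  # belief that the biased coin landed heads (80%)

# (probability of seeing this outcome, belief in heads after seeing it)
outcomes = [
    (Fraction(4, 5), Fraction(1)),  # see heads: belief rises by 1/5
    (Fraction(1, 5), Fraction(0)),  # see tails: belief falls by 4/5
]

# Probability-weighted average of the possible posteriors.
expected_posterior = sum(p * posterior for p, posterior in outcomes)
# Each outcome's shift in belief, weighted by how likely that outcome is.
weighted_shifts = [p * (posterior - prior) for p, posterior in outcomes]

print("expected posterior:", expected_posterior)
print("weighted shifts:   ", weighted_shifts)
```

The upward move is four times as likely as the downward move, but the downward move is four times as large. The weighted shifts cancel exactly, so the expected posterior still equals the prior.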

↑ comment by aspera · 2013-10-29T15:58:56.631Z · LW(p) · GW(p)

I agree that the statement is not crystal clear. It makes it possible to confuse the (change in the average) with the (average of the change).

Mathematically speaking, we represent our beliefs as a probability distribution on the possible outcomes, and change it upon seeing the result of a test (possibly for every outcome). The statement is that “if we average the possible posterior probability distributions weighted by how likely they are, we will end up with our original probability distribution.”

If that were not the case, it would imply that we were failing to make use of all of the prior information we have in our original distribution.

A misunderstood reading of the statement is that “the average of the absolute change in the probability distribution on measurement is zero.” This is not the case, as you rightly point out. It would imply that we expect the test to yield no information.
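
For reference, the quoted statement can be written out for a single hypothesis. In this sketch the symbols are assumptions introduced here: H is the hypothesis, E is the test, and e ranges over its possible outcomes.

```latex
\begin{align*}
\mathbb{E}\bigl[P(H \mid E)\bigr]
  &= \sum_{e} P(E = e)\, P(H \mid E = e) \\
  &= \sum_{e} P(H,\, E = e) \\
  &= P(H).
\end{align*}
```

The first line weights each possible posterior by how likely its outcome is, the second applies the definition of conditional probability, and the third marginalizes over outcomes. By contrast, the expected absolute change, the sum over e of P(E = e) |P(H | E = e) - P(H)|, is generally positive for an informative test.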

↑ comment by Richard_Kennaway · 2013-11-01T14:25:52.062Z · LW(p) · GW(p)

The thread descending from this comment exemplifies a pit that is easy to fall into when reading an informal moral drawn from a precise mathematical result: mistaking the former for the latter, and arguing about the former instead of going to the latter. The whole nugatory discussion would be avoided had people gone back to the original mathematics, which is not deep, and is given in one of the Sequence posts the OP linked to.

This mathematics, which is simple and straightforward, but not a complete triviality, says precisely what is meant by the informal phrase, "Conservation of Expected Evidence", and provides an immediate answer to questions such as "but making an observation will change your belief, so you can expect your belief to change!", or "but what about a lottery ticket, you expect that to lose, don't you?"

There's no point in basing an argument on secondary sources when the primary source is right there.

Replies from: V_V

↑ comment by V_V · 2013-11-01T15:00:06.904Z · LW(p) · GW(p)

I think the problem is that people tend to derive incorrect, or at least misleading, informal beliefs from the correct math.

↑ comment by IlyaShpitser · 2013-11-01T14:35:38.401Z · LW(p) · GW(p)

http://en.wikipedia.org/wiki/Law_of_total_expectation

Expectation of your belief E(X) is not the same as your belief X.

↑ comment by ChrisHallquist · 2013-10-29T17:00:10.478Z · LW(p) · GW(p)

There's a sense in which what I said is true (see ygert's comment), but I agree it's confusing. Suggested re-word? Or maybe I should just cut that point.

Replies from: Lumifer, Vaniver

↑ comment by Lumifer · 2013-10-29T17:25:18.789Z · LW(p) · GW(p)

I think the problem is in the sentence

You should expect that, on average, a test will leave your beliefs unchanged.

That happens to be not true. A test which outputs useful information WILL change your beliefs. Especially given point 2, one can say "Any informative test will always change your beliefs".

What's tricky here is expectation. You expect your beliefs to change but you don't know in which direction. So your expectation is for zero change even though you know that you'll get some non-zero change.

This looks paradoxical, but is the entirely standard way in which statistics (in particular random variables) operate. Consider a toss of a fair coin. The expectation is half heads half tails, which is guaranteed not to happen. You know you'll get either heads or tails but not which one of those two. The expectation will not match the outcome -- all it can do is be equidistant (appropriately weighted) from all possible outcomes.

↑ comment by Vaniver · 2013-10-29T19:40:07.188Z · LW(p) · GW(p)

I might go with:

Your expectation of the possible beliefs you could have after seeing the test results should match your current belief.

Another option is to try to illustrate both CoEE and Beliefs Pay Rent in Anticipated Experiences at the same time, since I think failing BPRiAE demonstrates an easy way to fail CoEE.

↑ comment by Shmi (shminux) · 2013-10-29T16:20:07.481Z · LW(p) · GW(p)

All the passage says is that if you believe the coin is unbiased, then you expect to see a roughly 50-50 split between heads and tails. If you expect to see 70:30 split of heads:tails, you ought to believe that the coin is so biased before you do the experiment. It looks trivial when applied to coins, but less so in other contexts. This is a statement about priors, not posteriors, hence the term "expectation". In Eliezer's example, if you are p% confident that an accused is a witch, then you should expect a definitive witch test to exonerate the accused (100-p)% of the time. If any outcome "confirms witchiness", then the test in question is not a test of witchiness.
