What's the big deal about Bayes' Theorem?



post by AVoropaev · 2021-01-26T06:08:45.188Z · 1 comment

This is a question post.

  Answers
    16 Vladimir_Nesov
    5 Darren
    4 Carlos Javier Gil Bellosta
    3 AnthonyC
    3 DaveEtCircenses
1 comment

I guess that this kind of question gets asked (and answered) a lot. But I've tried to read a few posts here about Bayes' Theorem and they seem to talk about slightly different things than the question that is bothering me. Maybe I should've read a few more, but since I'm also interested in how people use this theorem in their everyday life, I've decided to ask the question anyway.

Bayes' Theorem is a (not very) special case of this nameless theorem:

If D, E and F are mutually-exclusive events with non-zero probabilities d, e and f respectively, then $\frac{d}{d + e} = \frac{\frac{d}{d + f} \times (d + f)}{d + e}$ .

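This is true simply because that's how real numbers work. To translate this theorem into the more familiar form, replace D with $A \land B$, E with $\neg A \land B$ and F with $A \land \neg B$, and recall the definition of conditional probability: $P(A \mid B) = \frac{P(A \land B)}{P(B)} = \frac{P(A \land B)}{P(A \land B) + P(\neg A \land B)}$. The left-hand side then becomes $P(A \mid B)$, while $\frac{d}{d + f} = P(B \mid A)$, $d + f = P(A)$ and $d + e = P(B)$, which gives $P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$.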

You might notice that this theorem is not exactly hard to prove. It should probably be obvious to anybody who understood the definition of a probability space at university. You don't need to understand probability spaces that well either: you can boil everything down (losing some generality in the process) to this theorem:

If D, E and F are non-intersecting figures with non-zero areas d, e and f, drawn inside a rectangle with area 1, then $\frac{d}{d + e} = \frac{\frac{d}{d + f} \times (d + f)}{d + e}$.

You might think of the rectangle as a target you are shooting at, and of D, E and F as some of the places on the target your bullet can hit.
And you can boil it down even further, losing almost all of the generality, but keeping most of the applicability to real-life scenarios.

If D, E and F are non-empty sets with d, e and f elements respectively, then $\frac{d}{d + e} = \frac{\frac{d}{d + f} \times (d + f)}{d + e}$ .
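A quick way to convince yourself of the counting version, and of the translation into Bayes' Theorem, is to plug in numbers. The sketch below is only an illustration: the counts d, e, f and the population size n are made-up values, with D = A∧B, E = ¬A∧B and F = A∧¬B as in the translation above. It computes $P(A \mid B)$ once straight from the definition and once through Bayes' Theorem, using exact fractions to avoid rounding.

```python
from fractions import Fraction


def both_sides(d: int, e: int, f: int) -> tuple[Fraction, Fraction]:
    """Evaluate both sides of d/(d+e) = (d/(d+f) * (d+f)) / (d+e)."""
    lhs = Fraction(d, d + e)
    rhs = Fraction(d, d + f) * (d + f) / (d + e)
    return lhs, rhs


# Hypothetical counts: d = |A and B|, e = |not A and B|, f = |A and not B|,
# out of a population of n elements (n also counts elements in neither A nor B).
d, e, f, n = 12, 28, 8, 100

p_a_and_b = Fraction(d, n)
p_a = Fraction(d + f, n)          # P(A) = (d + f) / n
p_b = Fraction(d + e, n)          # P(B) = (d + e) / n
p_b_given_a = p_a_and_b / p_a     # P(B | A) = d / (d + f)

# P(A | B) from the definition, and via Bayes' Theorem.
p_a_given_b_direct = p_a_and_b / p_b
p_a_given_b_bayes = p_b_given_a * p_a / p_b

lhs, rhs = both_sides(d, e, f)
print(lhs, rhs, p_a_given_b_direct, p_a_given_b_bayes)
```

All four values agree (3/10 for these counts), which is the whole content of the theorem: the same ratio of counts, written two ways.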

Okay, so I think that Bayes' Theorem is very simple. Do I think that it is useless? Not at all: it is used all the time. Perhaps it is used all the time partly because it is so simple. So we have a mathematical concept that is easy to use, easy to understand (if you know a bit about probability), and there are many cases where it gives counterintuitive (but correct) results. So why am I not happy with declaring it the Best Thing in the World? Well, if you put it that way, maybe I am. But there are still many other mathematical concepts that fit the bill.

For example, Bayes' Theorem tells us something about conditional probabilities. There is a related concept of independent events. Basically, A and B are independent iff the probability of A happening does not change depending on whether B happens or not: $P(A) = P(A \mid B)$. (The definition used in math, $P(A \land B) = P(A) \times P(B)$, is a bit different because of the trade-off between generality and clarity: it also works when $P(B) = 0$. I used a less general but easier to understand version.) For example, this is (probably) true for A = "6 comes up on a d6 roll" and B = "it is raining", and it is not true for A = "6 comes up on a d6 roll" and B = "an even number comes up on the same roll". There are a lot of questions about independence with somewhat counterintuitive answers. For example:

A and B are not independent. B and C are also not independent. Are A and C necessarily not independent?
In 90% of cases where A happens, B does not. In 90% of cases where B happens, A does not. Does it mean that A and B are not independent?
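The dice example from the definition of independence can be checked by enumerating the six equally likely faces. This is a minimal sketch assuming a fair d6; the helper names are hypothetical, and the extra pair ("even" and "at most 2") is added here only to show that two events about the same roll can still be independent. It deliberately does not answer the two questions above.

```python
from fractions import Fraction

FACES = range(1, 7)  # a fair d6: each face has probability 1/6


def prob(event) -> Fraction:
    """P(event), where event is a predicate on the face that comes up."""
    return Fraction(sum(1 for face in FACES if event(face)), len(FACES))


def cond_prob(event, given) -> Fraction:
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda face: event(face) and given(face)) / prob(given)


def six(face: int) -> bool:
    return face == 6


def even(face: int) -> bool:
    return face % 2 == 0


def at_most_two(face: int) -> bool:
    return face <= 2


# "6" and "even" on the same roll: P(A) = 1/6 but P(A | B) = 1/3, so not independent.
print(prob(six), cond_prob(six, even))
# "even" and "at most 2" on the same roll: 1/2 both times, so independent.
print(prob(even), cond_prob(even, at_most_two))
```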

Even more important than probability is logic (at least in my opinion). And people often make mistakes in basic logic. Take implication, for example: $A \Rightarrow B$, that is, if A is true then B is also true. People make all kinds of mistakes with it. I often give this problem to students when I teach a logic mini-course:

My cat sneezes every day when it's going to rain. It sneezed. Does it mean that it is going to rain today?

Some students tried to assure me that yes, the first statement about my cat implies that it is some kind of psychic. A similar logical mistake in a real-life situation could probably convince somebody of the existence of supernatural powers of some kind. Or of some other dumb thing.
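The cat problem can be made mechanical with a two-variable truth table. In this sketch the variable names are illustrative: `rain` stands for "it is going to rain today", `sneeze` for "the cat sneezed", and the first statement is read as the implication rain ⇒ sneeze. We keep every truth assignment consistent with both premises.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication p => q."""
    return (not p) or q


# Premises: rain => sneeze (the cat sneezes whenever it's going to rain), and sneeze.
# Keep every (rain, sneeze) assignment in which both premises hold.
models = [
    (rain, sneeze)
    for rain, sneeze in product([True, False], repeat=2)
    if implies(rain, sneeze) and sneeze
]
print(models)  # [(True, True), (False, True)]
```

The assignment rain = False, sneeze = True survives, so the premises do not entail rain; concluding that they do is the fallacy known as affirming the consequent.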

What I'm trying to say is that, while the perfectly rational human we strive to be should understand Bayes' Theorem really well and use it when appropriate, they should also understand a lot of other things really well and use them when appropriate. So the first half of my question is: "What makes this theorem so special compared to other very useful things?" If you think this was already covered in one of the posts here, please give me a link.

The other half of my question is "How exactly do you use this theorem in your life?", because it seems to me that it is really hard to do. If you are doing some serious research, you can probably obtain a lot of statistical data, and sometimes you can obtain things like $P(A)$, $P(B)$ and $P(B \mid A)$, apply the theorem, and get $P(A \mid B)$. But if you try to quickly use this theorem in your daily life, you likely won't know at least some of $P(A)$, $P(B)$ and $P(B \mid A)$. So you would probably guess them? Even worse, at least some of them are probably going to be very small. And while humans are relatively good at distinguishing a probability of 50% from a probability of 0.5%, they are in many cases awful at distinguishing probabilities like 0.005% and 0.00005%. A wrong choice could leave you with $P(A \mid B) = 70\%$ instead of $P(A \mid B) = 0.7\%$.
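To make the 70% versus 0.7% example concrete, here is a sketch with made-up inputs: $P(B \mid A) = 0.7$ and $P(B) = 0.005\%$ are held fixed, and only the guessed prior $P(A)$ changes from 0.005% to 0.00005%. With $P(B)$ fixed, $P(A \mid B)$ is proportional to $P(A)$, so a factor-of-100 error in the prior becomes a factor-of-100 error in the answer. (In a full model $P(B)$ would itself depend on $P(A)$; that is ignored here for simplicity.)

```python
from fractions import Fraction


def bayes(p_b_given_a: Fraction, p_a: Fraction, p_b: Fraction) -> Fraction:
    """Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical inputs; only the guess for the small prior P(A) changes.
p_b_given_a = Fraction("0.7")
p_b = Fraction("0.00005")  # 0.005%

for p_a in (Fraction("0.00005"), Fraction("0.0000005")):  # 0.005% vs 0.00005%
    posterior = bayes(p_b_given_a, p_a, p_b)
    print(f"P(A) = {float(p_a):.5%} -> P(A|B) = {float(posterior):.1%}")
```

The two printed posteriors are 70.0% and 0.7%: the arithmetic is trivial, and all of the difficulty sits in the inputs.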

So the other half of my question is this: if you use Bayes' Theorem in your daily life, how do you try to avoid making mistakes? If you only use it in at least somewhat serious research, do you really find it that useful compared to all the other statements in probability theory?

Answers

answer by Vladimir_Nesov · 2021-01-27T07:22:49.719Z

In daily life, the basic move is to ask,

What are the straightforward and the alternative explanations (hypotheses) for what I'm seeing?
How much more likely is one compared to the other a priori (when ignoring what I'm seeing)?
What probabilities do they assign to what I'm seeing?

and get the ratio of a posteriori probabilities of the hypotheses (a posteriori odds) from that (by multiplying the a priori odds by the likelihood ratio). Odds measure the relative strengths of hypotheses, so the move is to obtain relative strengths of a pair of hypotheses after observing a piece of data, with the choice of hypotheses inspired by the same piece of data. This is a very easy computation that becomes a habit and routinely fixes intuitive misestimates. Usually it's about explaining the data/claim as correct vs. as produced by a specific sloppy process that doesn't ensure correctness.

That is, for the data/claim $D$ you are observing, and hypotheses $x$ and $y$ chosen as possible explanations for $D$, $\frac{P(x \mid D)}{P(y \mid D)} = \frac{P(x)}{P(y)} \times \frac{P(D \mid x)}{P(D \mid y)}$. This holds for any choice of two hypotheses $x$ and $y$, which don't have to be mutually exclusive or exhaust all possibilities, and there may be many other plausible hypotheses.
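To see that the odds form does not need $x$ and $y$ to be mutually exclusive or exhaustive, here is a toy check on a fair six-sided die. The die and the particular events are hypothetical choices made for illustration, not part of the answer: $x$ and $y$ overlap on {2, 4}, and neither covers the face 5. The sketch computes the posterior odds directly and compares them with prior odds times likelihood ratio.

```python
from fractions import Fraction

FACES = range(1, 7)  # a fair d6 serves as a toy probability space


def prob(event: set[int]) -> Fraction:
    return Fraction(sum(1 for k in FACES if k in event), len(FACES))


def cond(event: set[int], given: set[int]) -> Fraction:
    """P(event | given) = P(event and given) / P(given)."""
    return prob(event & given) / prob(given)


x = {1, 2, 3, 4}  # hypothesis x: "roll is at most 4"
y = {2, 4, 6}     # hypothesis y: "roll is even" (overlaps x; neither covers 5)
D = {1, 2, 3}     # observation D: "roll is at most 3"

posterior_odds = cond(x, D) / cond(y, D)
prior_odds = prob(x) / prob(y)
likelihood_ratio = cond(D, x) / cond(D, y)

assert posterior_odds == prior_odds * likelihood_ratio
print(prior_odds, likelihood_ratio, posterior_odds)  # 4/3 9/4 3
```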

↑ comment by AVoropaev · 2021-01-28T02:06:15.872Z

This formula is not Bayes' Theorem, but it is a similar simple formula from probability theory, so I'm still interested in how you can use it in daily life.

Writing P(x|D) implies that x and D are the same kind of object (data about some physical process?), and there are probably a lot of subtle problems in defining a hypothesis as a "set of things that happen if it is true" (especially if you want to have hypotheses that involve probabilities).

Using this formula allows you to update the probabilities you ascribe to hypotheses, but it is not obvious that the update will make them better. I mean, you obviously don't know the real P(x)/P(y), so you'll input an incorrect value and get an incorrect answer. But sometimes it will be less incorrect. If this algorithm has some nice property like "the sequence of P(x)/P(y) you get by repeating your experiment converges to the real P(x)/P(y), provided x and y are falsifiable by your experiment (or something like that)", then by using this algorithm you will, with high probability, eventually get a good estimate. It would be nice to understand for what kinds of x, y and D you should be at least 90% sure that your P(x)/P(y) will be more correct after a million experiments.

I'm not implying that this algorithm doesn't work. It's more that proving that it works seems to be beyond me, mostly because statistics is one of the more glaring holes in my mathematical education. I hope that somebody has proved that it works, at least in the cases you are likely to encounter in daily life. Maybe it is even a well-known result.
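This does not settle the general question, but a small simulation shows the typical behaviour in one easy case. Everything in it is a hypothetical setup: hypothesis x says a coin lands heads with probability 0.7, hypothesis y says 0.5, the flips are actually generated by x, and the prior wrongly favours y 1000 to 1. After each flip the odds form is applied once, in log space so the numbers stay manageable.

```python
import math
import random


def log_odds_trajectory(true_p: float, p_x: float, p_y: float,
                        prior_log_odds: float, flips: int,
                        seed: int = 0) -> list[float]:
    """Repeatedly apply the odds form of Bayes to simulated coin flips.

    Hypothesis x: heads with probability p_x; hypothesis y: heads with p_y.
    The data are generated with heads probability true_p.
    Returns ln(P(x | data) / P(y | data)) after each flip.
    """
    rng = random.Random(seed)
    log_odds = prior_log_odds
    trajectory = []
    for _ in range(flips):
        heads = rng.random() < true_p
        # Multiply the odds by the likelihood ratio P(flip | x) / P(flip | y).
        if heads:
            log_odds += math.log(p_x / p_y)
        else:
            log_odds += math.log((1 - p_x) / (1 - p_y))
        trajectory.append(log_odds)
    return trajectory


# x is true, but the prior odds for x against y are 1 : 1000.
traj = log_odds_trajectory(true_p=0.7, p_x=0.7, p_y=0.5,
                           prior_log_odds=math.log(1 / 1000), flips=2000)
print([round(traj[n - 1], 1) for n in (10, 100, 1000, 2000)])
```

In runs like this the posterior odds do not settle on a finite "real" ratio: when x generates the data, the log odds drift towards $+\infty$ at an average rate equal to the Kullback-Leibler divergence between the two coin models (about 0.08 per flip here), so even a 1000-to-1 prior against x is eventually outweighed. Whether this carries over to the hypotheses people use in daily life is a separate question.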

Speaking of daily life, can you tell me how people (and you specifically) actually apply this algorithm? How do you decide in which situations it is worth using? How do you choose initial values of P(x) (e.g. it is hard for me to translate "x is probably true" into "I am 73% sure that x is true")? Are there some other important questions I should be asking about it?


↑ comment by Vladimir_Nesov · 2021-01-30T06:13:03.251Z

The above formula is usually called the "odds form of Bayes' formula". We get the standard form by letting $y = D$ in the odds form, and we get the odds form from the standard form by dividing the standard form for one hypothesis by the standard form for another ($P(D)$ cancels out).
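Spelled out, the two directions look like this (assuming every probability we divide by is non-zero). Setting $y = D$ uses $P(D \mid D) = 1$:

$$
\frac{P(x \mid D)}{P(D \mid D)} = \frac{P(x)}{P(D)} \times \frac{P(D \mid x)}{P(D \mid D)}
\quad\Longrightarrow\quad
P(x \mid D) = \frac{P(x)\,P(D \mid x)}{P(D)},
$$

and dividing the standard form for $x$ by the standard form for $y$ cancels $P(D)$:

$$
\frac{P(x \mid D)}{P(y \mid D)} = \frac{P(x)\,P(D \mid x)/P(D)}{P(y)\,P(D \mid y)/P(D)} = \frac{P(x)}{P(y)} \times \frac{P(D \mid x)}{P(D \mid y)}.
$$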

The serious problem with the standard form of Bayes is the $P(D)$ term, which is usually hard to estimate (as we don't get to choose what $D$ is). We can try to get rid of it by expanding $P(D) = P(D \mid x) P(x) + P(D \mid \neg x) P(\neg x)$, but that's also no good, because now we need to know $P(D \mid \neg x)$. One way to state the problem with this is to say that a hypothesis for given observations is a description of a situation that makes it possible to estimate the probability of those observations. That is, $x$ is a hypothesis for $D$ if it's possible to get a good estimate of $P(D \mid x)$. To evaluate an observation, we should look for hypotheses that let us estimate that conditional probability; we do get to choose what to use as hypotheses. So the problem here is that if $x$ is a hypothesis for $D$, it doesn't follow that $\neg x$ is a hypothesis for $D$ or for anything else of interest. The negation of a hypothesis is not necessarily a hypothesis. That is why letting $y = \neg x$, as it's sometimes written, defeats some of the purpose of moving over to the odds form of Bayes.

↑ comment by Vladimir_Nesov · 2021-02-01T08:13:30.210Z

Here's an example of applying the formula (to a puzzle).

answer by Darren · 2021-01-26T09:35:23.472Z

As you've noted, Bayes' Theorem is just a straightforward result of the probability calculus. In that light, it is entirely uncontroversial.

What people really seem to get excited about is Bayesianism, which is something more than just the application of Bayes' Theorem.

To understand people's interest in Bayesianism, I think you then need to distinguish its use in two types of applications: how we use probabilities to deal with uncertainty when drawing inferences from data generated by scientific studies (i.e. statistical inference); and whether humans reason/learn, or should reason/learn, in a Bayesian manner.

The latter would be well outside my own expertise, but I once got a fair number of interesting responses to this question from people who would know better than I do. Regarding its use in statistical inference, Bayesianism is similarly controversial, and the many controversies are the subject of hundreds and thousands of papers and books.

↑ comment by [deleted] · 2021-01-28T08:44:02.899Z

You might find this reference useful: Bayesian Epistemology.

Personal view: if you think you're capable of forming reasonable priors, you're "probably" a Bayesian.

↑ comment by TAG · 2021-01-30T16:59:24.857Z

Yes, Bayesianism is more than one thing. (BEIMTOT)

There's a plausible version of Bayes, which isn't very exciting: the update rule.

And an exciting version, Bayes as a complete system of epistemology, which isn't very plausible. In particular, it isn't able to answer questions like "what is evidence?" and "where do hypotheses come from?", leaving most of the vexing questions you would want a complete system of epistemology to solve, unsolved.

So you have all the ingredients for motte-and-bailey confusions: two things that come in exciting but implausible and plausible but boring versions, and they're called by the same name.

answer by Carlos Javier Gil Bellosta · 2021-01-27T08:15:51.361Z

The link between probability theory and logic is most opportune. There is a whole (albeit not very popular) branch of probability theory and statistics that goes by various names (e.g., logical inference) and that considers probability an extension of logic, in the sense that it helps us come up with adequate solutions to problems whose premises are not stated in terms of true or false but are assigned some credibility. Names associated with this school of thought are Keynes (because of his 1921 book on probability theory), Carnap and, more recently, Ian Hacking.

In this program, of course, Bayes Theorem plays an important role. If you consider logic more important than probability, think again about how often you work with premises that are 100% true or false, and whether it would make sense to extend logic to allow for plausibilities rather than certainties.

↑ comment by Darren · 2021-01-28T10:08:00.727Z

Another key work here is Probability Theory: The Logic of Science by E. T. Jaynes (you can download the entire book here). The early chapters are focused on deriving the probability calculus from logic.

↑ comment by AVoropaev · 2021-01-28T02:44:19.640Z

That's interesting. I've heard about probabilistic modal logics, but didn't know that not only are logicians working towards statisticians, but also vice versa. Is there some book or video course accessible to mathematics undergraduates?

↑ comment by Carlos Javier Gil Bellosta · 2021-01-29T05:22:48.191Z

Actually, I said above that "Bayes Theorem plays an important role" but I did not state which or how. In a sense, Bayes Theorem is the soft version of Modus Tollens.
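One way to make the "soft Modus Tollens" reading concrete, with hypothetical numbers: take a prior $P(H) = 1/2$ and $P(E \mid \neg H) = 1/2$. If H implies E with certainty, observing not-E drives $P(H)$ to 0 (classic Modus Tollens); if H only makes E very likely, observing not-E lowers $P(H)$ sharply but not to 0.

```python
from fractions import Fraction


def posterior_h_given_not_e(p_h: Fraction, p_e_given_h: Fraction,
                            p_e_given_not_h: Fraction) -> Fraction:
    """P(H | not E) by Bayes' Theorem, expanding P(not E) by total probability."""
    p_not_e_given_h = 1 - p_e_given_h
    p_not_e_given_not_h = 1 - p_e_given_not_h
    numerator = p_not_e_given_h * p_h
    return numerator / (numerator + p_not_e_given_not_h * (1 - p_h))


prior = Fraction(1, 2)
half = Fraction(1, 2)
# Strict "H implies E": observing not-E refutes H.
print(posterior_h_given_not_e(prior, Fraction(1), half))        # 0
# "H makes E very likely": observing not-E makes H much less plausible.
print(posterior_h_given_not_e(prior, Fraction(95, 100), half))  # 1/11
```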

answer by AnthonyC · 2021-01-29T17:24:38.760Z

I think you're coming at this question from a very different perspective than I do, but for me, it isn't about a mathematical result, which, as you and others noted, is very simple and entirely uncontroversial. I also expect it to be uncontroversial to say that, for most people, most of the time, it is not a useful idea to try to explicitly calculate conditional probabilities for making predictions and decisions. Instead, for me what matters is that some of the very basic, mathematically obvious, and uncontroversial implications of conditional probability calculations have no bearing at all on how most of the people in our world make decisions, not just personal decisions but ones with enormous consequences for our society and our collective futures.

In my experience most people, when trying to reason logically, think by default in binary categories and have no idea how to deal with uncertainty. I work in a job where my goal is to advise others on making various decisions based on my interpretation of often ambiguous technical, industry, and other data, and to help train others to do the same. I have never once made an explicit calculation using Bayes' theorem, but I consistently observe that a surprisingly (to me) large fraction of people start out extremely resistant to making reasonable assumptions based on prior knowledge and seeing where they lead, and instead insist that if they don't have "certainty" they can't say anything useful at all. Those people much more often fail to make good predictions, miss out on large opportunities, and waste exorbitant amounts of time and effort chasing down details that do not meaningfully affect the final output of their prediction and decision making processes. They're also more likely to miss huge, gaping holes in their models of the world that make all their careful attempts to be certain completely meaningless.

In other words, for me the point is to develop a better intuitive understanding of how to make reasoned decisions that are likely to lead to desired outcomes - aka to become less wrong, and wrong less often. It's partly because Bayes' theorem is so simple that, for a certain type of person, it serves as a useful entry point on the path to that goal.

↑ comment by ejacob · 2021-01-30T07:00:36.906Z

This strikes me as both a good explanation of why people are excited by Bayes' theorem and why they/I can come off as frustrated that more people don't seem to worship it enough. Basically I view the equation as the mathematical way of saying "you should form and update your beliefs based on evidence." Which, as I think any rationalist would agree, is both (a) not as obvious and straightforward as it sounds, and (b) something that would make the world better if more people heeded the call.

answer by DaveEtCircenses · 2021-01-26T20:43:40.940Z

For your first half-question, "A Technical Explanation of Technical Explanation" [Edit: added link] sums up the big deal: Bayes Theorem is part of what actually underpins how to make a map reflect the territory (given infinite compute); it is a necessary component. In comparison with other necessary components required to do this (e.g. logic, math, other basic probability), I would conjecture that Bayes is only special in that it is 'often' the last piece of the puzzle that is assembled in someone's mind, and thus takes on psychological significance.

For the second half-question, I think using Bayes in life is more about understanding just how much priors matter than about actually crunching the numbers.

As an example, commenters on LessWrong will sometimes refer to 'Outside View' vs 'Inside View'.

A quick example to summarise roughly what these mean. If predicting how long 'Project X' will take, an Outside View goes 'well, project A took 3 weeks, and project B took 4, and project C took 3, and this project is similar, so 3-4 weeks', whereas the Inside View goes 'well to do project X, I need to do subprojects Y1, Y2, Y3 and Y4, and these should take me 3, 4, 5 and 4 days respectively, so 16 days = 2 and a bit weeks'. The Inside View is susceptible to the planning fallacy, etc etc.

General perspective: Outside View Good, Inside View Can Go Very Wrong.

You've probably guessed the punchline; this is really about Bayes Theorem. The Outside View goes "I'm going to use previous projects as my prior (technically this forms a prior distribution of estimated project lengths), and then just go with that and try to avoid updating very much, because I have a second prior that says 'all the details of similar projects didn't matter in the past, so I shouldn't pay too much attention to them now'", whereas the Inside View Going Very Wrong is what happens when you throw out your priors; you can end up badly calibrated very quickly.

↑ comment by AVoropaev · 2021-01-28T07:58:23.006Z

I've skimmed over A Technical Explanation of Technical Explanation (you can make links and do other stuff by selecting the text you want to edit, as if you wanted to copy it; if your browser is compatible, a toolbar should appear). I think that's the first time in my life I've found out that I need to know more math to understand a non-mathematical text. The text is not about Bayes' Theorem, but it is about the application of probability theory to reasoning, which is relevant to my question. As far as I understand, Yudkowsky writes about the same algorithm that Vladimir_Nesov describes in his answer to my question. Some nice properties of the algorithm are proved, but not very rigorously. I don't know how to fix that, which is not very surprising, since I know very little about statistics. In fact, I am now half-convinced to take a course or something like that. Thank you for that.

As for the other part of your answer, it actually makes me even more confused. You are saying "using Bayes in life is more about understanding just how much priors matter than about actually crunching the numbers". To me it sounds similar to "using steel in life is more about understanding just how much a whole can be greater than the sum of its parts than about actually making things from some metal". I mean, there is nothing inherently wrong with using a concept as a metaphor and/or inspiration. But it can sometimes cause miscommunication. And I am under the impression that some people here (not only me) talk about Bayes' Theorem in a very literal sense.

↑ comment by TAG · 2021-01-30T17:04:17.310Z

There are lots of ways of making the same point without bringing in Bayes.

1 comment


comment by Viliam · 2021-02-04T00:08:21.074Z

I think you get half of the value just by noticing that in general P(A|B) is not P(B|A).

Like when people tell you "this study has p=0.05, therefore there is a 95% chance this is true", and you know they are wrong, even if you don't know the actual base probability of "a published study being right".
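A worked version of this point, with loudly hypothetical numbers: suppose 10% of the hypotheses tested in some field are true, studies have 80% power, and a result counts as significant when p ≤ 0.05 (a simplification of "has p = 0.05"). Then 0.05 plays the role of P(significant | hypothesis false), not P(hypothesis false | significant), and Bayes' Theorem turns it into the probability people actually care about.

```python
from fractions import Fraction


def prob_true_given_significant(prior: Fraction, power: Fraction,
                                alpha: Fraction) -> Fraction:
    """P(hypothesis true | significant result) by Bayes' Theorem.

    power = P(significant | true), alpha = P(significant | false).
    """
    true_positive = power * prior
    false_positive = alpha * (1 - prior)
    return true_positive / (true_positive + false_positive)


# Hypothetical field: 10% of tested hypotheses are true, 80% power, alpha = 0.05.
print(prob_true_given_significant(Fraction(1, 10), Fraction(8, 10), Fraction(5, 100)))
```

Under these assumptions the answer is 16/25 = 64%, not 95%; a different base rate gives a different answer, which is exactly why the base probability matters.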
