A breakdown of priors and posteriors - an example from medicine

post by masasin (jean-nassar) · 2020-12-24T23:35:15.461Z · LW · GW · 4 comments

This is a link post for https://old.reddit.com/r/UpliftingNews/comments/7v99xo

This post is lifted (with edits) from my comment on a Reddit thread about a blood test that detects amyloid plaque, a potential precursor to Alzheimer's. A top-level commenter felt that because the test couldn't tell for certain whether someone had Alzheimer's, it was useless.

"Not everyone with amyloid in their brains will turn out to have dementia, and not everyone who has dementia will be found to have amyloid in their brains."

So what is the point of this article again? To get a load of clicks I assume, cos there's no real content here.

I attempted to correct that misconception.

(If anyone knows how to do the whole thing with odds ratios rather than probabilities, please let me know! We can't update as normal here.)


tl;dr: No test is ever 100% certain, but it can still constrain your probabilities, making you more certain than you were before. Not everyone who is outside during a thunderstorm gets hit by lightning, and not everyone who gets hit by lightning was outside during a thunderstorm. Going outside in a thunderstorm is still much more dangerous than staying inside.


Notation: $P(\text{dementia} \mid \text{amyloid})$ is read as the probability of having dementia given that you have amyloid. To make things fit in a LessWrong post, I'll abbreviate having dementia to $D$, and having amyloid to $A$. A positive test result is $+$, and a negative test result is $-$. Finally, a negation is $\neg$, so $\neg A$ would be no amyloid.

$O(D:\neg D)$, read as the odds of dementia to no dementia, is the ratio of the probability that $D$ is true to the probability that $D$ is false. $O(D:\neg D) = 3:1$ means that somebody is 3 times as likely to have dementia as not. In general, odds don't say anything about the magnitude of the probabilities, so they could be small, like 3% and 1%, or big, like 60% and 20%. (Here, of course, $P(D) + P(\neg D) = 100\%$ because the two choices represent all possibilities, so the probabilities have to be 75% and 25% respectively.)
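As a minimal sketch of the conversion between odds and probabilities described above (the function name `odds_to_probabilities` is my own, not something from the original comment), assuming the two outcomes are mutually exclusive and cover every possibility:

```python
from fractions import Fraction


def odds_to_probabilities(a, b):
    """Turn odds a:b between two exhaustive, exclusive outcomes into probabilities."""
    total = Fraction(a + b)
    return Fraction(a) / total, Fraction(b) / total


# Odds of 3:1 for dementia vs. no dementia force 75% and 25%.
p_d, p_not_d = odds_to_probabilities(3, 1)
print(float(p_d), float(p_not_d))  # 0.75 0.25

# Without the "covers every possibility" assumption, odds only fix the ratio:
# 60% vs. 20% and 3% vs. 1% are both 3:1.
same_ratio = Fraction(60, 20) == Fraction(3, 1)
print(same_ratio)  # True
```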


Imagine you have ten thousand people in the right demographics. How about these (the numbers are made up):

| Dementia status | Has amyloid | Does not have amyloid | Total |
|---|---|---|---|
| Has dementia | 1000 | 300 | 1300 |
| Does not have dementia | 100 | 8600 | 8700 |
| Total | 1100 | 8900 | 10000 |

So, a person chosen at random from that population has a $P(D) = 1300/10000 = 13\%$ chance of having dementia, and a $P(\neg D) = 8700/10000 = 87\%$ chance of not having dementia. Their odds for dementia are $O(D:\neg D) = 13:87 \approx 1:6.7$. They're almost 7 times as likely to have no dementia as they are to have it.
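The prior can be read straight off the made-up population table. Here is a small sketch of that arithmetic; the dictionary layout and variable names are my own choices:

```python
from fractions import Fraction

# Made-up counts from the table: (has dementia, has amyloid) -> number of people
counts = {
    (True, True): 1000,
    (True, False): 300,
    (False, True): 100,
    (False, False): 8600,
}

total = sum(counts.values())
with_dementia = sum(n for (dementia, _), n in counts.items() if dementia)

p_d = Fraction(with_dementia, total)
p_not_d = 1 - p_d
print(float(p_d), float(p_not_d))  # 0.13 0.87

# Odds of no dementia to dementia: "almost 7 times as likely".
odds_not_d_to_d = p_not_d / p_d
print(f"{float(odds_not_d_to_d):.2f}")  # 6.69
```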

Now, let's say that if you have amyloid, the test will say that you have amyloid 95% of the time, while if you do not have amyloid, the test will say that you have amyloid 10% of the time. In other words, $P(+ \mid A) = 95\%$, and $P(+ \mid \neg A) = 10\%$. The probability of a negative result is just 100% minus the probability of the corresponding positive result. (The actual status is on the top row, and the test response is on the left.)

| Test result | Has amyloid | Does not have amyloid |
|---|---|---|
| Positive test | 95% | 10% |
| Negative test | 5% | 90% |

Now, let's say that you take the test and it says that you do have amyloid. What can we say about your probability of dementia? Can we do better than that 13%?

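If the test is positive, one of four things might be true:

- You have amyloid and you have dementia ($A$ and $D$).
- You have amyloid but you don't have dementia ($A$ and $\neg D$).
- You don't have amyloid but you do have dementia ($\neg A$ and $D$).
- You have neither amyloid nor dementia ($\neg A$ and $\neg D$).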

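Now, we can find the probability that you do have dementia if you get a positive test result. The test only responds to amyloid, so once we know whether you have amyloid, the test result tells us nothing more about dementia:

$$P(D \mid +) = P(D \mid A)\,P(A \mid +) + P(D \mid \neg A)\,P(\neg A \mid +)$$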

$P(D \mid A)$ here is simply the 1000 people with dementia out of the 1100 with amyloid, or $1000/1100 \approx 90.9\%$. Similarly, $P(D \mid \neg A)$ is the 300 people with dementia out of the 8900 people without amyloid, or $300/8900 \approx 3.4\%$.

What about $P(A \mid +)$ and $P(\neg A \mid +)$? Let's calculate the odds for those first.

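The prior odds, from the distribution in the population, are $O(A:\neg A) = 1100:8900 = 11:89$. The Bayes factor for a positive test is 95% for people with amyloid vs 10% for people without, or $95:10 = 19:2$. So, $O(A:\neg A \mid +) = (11 \times 19):(89 \times 2) = 209:178$. Changing that into probabilities gives us $P(A \mid +) = 209/387 \approx 54.0\%$ and $P(\neg A \mid +) = 178/387 \approx 46.0\%$.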
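The odds-form update for amyloid status can be checked with a few lines. This is only a sketch; `update_odds` and `odds_to_probability` are names I made up for illustration:

```python
from fractions import Fraction


def update_odds(prior_odds, bayes_factor):
    """Posterior odds = prior odds times the Bayes factor (likelihood ratio)."""
    return prior_odds * bayes_factor


def odds_to_probability(odds):
    """Probability of the first outcome, given odds of first:second as a fraction."""
    return odds / (1 + odds)


prior_odds_amyloid = Fraction(1100, 8900)   # amyloid : no amyloid in the population
bayes_factor_positive = Fraction(95, 10)    # P(+|A) : P(+|not A)

posterior_odds_amyloid = update_odds(prior_odds_amyloid, bayes_factor_positive)
p_a_given_positive = odds_to_probability(posterior_odds_amyloid)

print(posterior_odds_amyloid)               # 209/178
print(f"{float(p_a_given_positive):.1%}")   # 54.0%
```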

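Bringing it all together, we get:

$$P(D \mid +) = \frac{1000}{1100} \times \frac{209}{387} + \frac{300}{8900} \times \frac{178}{387} = \frac{190}{387} + \frac{6}{387} = \frac{196}{387} \approx 50.6\%$$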

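We can also find the probability that you have dementia if you get a negative test result. The Bayes factor here is $5:90 = 1:18$, and $O(A:\neg A \mid -) = (11 \times 1):(89 \times 18) = 11:1602$. Therefore, $P(A \mid -) = 11/1613 \approx 0.7\%$ and $P(\neg A \mid -) = 1602/1613 \approx 99.3\%$.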

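Substitute the numbers in, and we get:

$$P(D \mid -) = \frac{1000}{1100} \times \frac{11}{1613} + \frac{300}{8900} \times \frac{1602}{1613} = \frac{10}{1613} + \frac{54}{1613} = \frac{64}{1613} \approx 4.0\%$$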

So, we end up with one final table. The probability that someone doesn't have dementia given a test result is just 100% minus the probability of having dementia. (What the test says is on top, and whether you actually have dementia is on the left.)

| Dementia status | Positive test | Negative test |
|---|---|---|
| Has dementia | 50.6% | 4.0% |
| Does not have dementia | 49.4% | 96.0% |

In other words, a random person would have a 13% chance of having dementia, $P(D) = 13\%$. However, if your test is positive for amyloid, we adjust to a 50.6% chance that you have dementia, $P(D \mid +) = 50.6\%$, and a 49.4% chance that you don't, $P(\neg D \mid +) = 49.4\%$. This takes into account the fact that some people have amyloid but no dementia, the fact that some people have dementia but no amyloid, and the fact that the test can just be wrong.

On the other hand, if the test is negative, it would go down from 13% to 4%, but that still isn't zero. It could be that you were just unlucky and the test didn't register your amyloid.

If the test is positive, you're about as certain that you have dementia as that you don't, since $O(D:\neg D \mid +) = 50.6:49.4 \approx 1:1$. In terms of odds, you are now almost 7 times more sure that you have dementia than you were before. ($\frac{50.6/49.4}{13/87} \approx 6.9$)

If the test is negative, you're 24 times more certain that you don't have dementia than that you do, since $O(\neg D:D \mid -) = 96:4 = 24:1$. Compared to your prior belief, you're more than 3.5 times as sure that you don't have dementia as you were before. ($\frac{96/4}{87/13} \approx 3.6$) (Notice how we flipped the fractions since we're testing the negative case.)
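One possible way to do the update on dementia directly in odds form is to first work out how often the test comes back positive for people with and without dementia, and then use that ratio as the Bayes factor. This is a sketch under the same assumption as above (the test responds only to amyloid), and the helper name `p_positive_given` is made up:

```python
from fractions import Fraction

# Made-up counts from the post: (has dementia, has amyloid) -> number of people
counts = {
    (True, True): 1000,
    (True, False): 300,
    (False, True): 100,
    (False, False): 8600,
}
P_POS_GIVEN_A = Fraction(95, 100)
P_POS_GIVEN_NOT_A = Fraction(10, 100)


def p_positive_given(dementia):
    """P(+ | dementia status), averaging over amyloid status within that group."""
    with_a = counts[(dementia, True)]
    without_a = counts[(dementia, False)]
    return (P_POS_GIVEN_A * with_a + P_POS_GIVEN_NOT_A * without_a) / (with_a + without_a)


prior_odds_d = Fraction(1300, 8700)  # dementia : no dementia

# Positive test: Bayes factor in favour of dementia.
bf_positive = p_positive_given(True) / p_positive_given(False)
posterior_odds_d = prior_odds_d * bf_positive

# Negative test: Bayes factor in favour of NO dementia (fractions flipped).
bf_negative = (1 - p_positive_given(False)) / (1 - p_positive_given(True))
posterior_odds_not_d = (1 / prior_odds_d) * bf_negative

print(f"{float(bf_positive):.2f}", posterior_odds_d)      # 6.87 196/191
print(f"{float(bf_negative):.2f}", posterior_odds_not_d)  # 3.62 1549/64
```

The two Bayes factors are the "almost 7 times" and "more than 3.5 times" shifts; the small difference from 3.6 comes from using exact fractions instead of the rounded 96% and 4%.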

I wouldn't consider a negative case that big of a deal, but the positive one certainly warrants further testing. If you use another test, which checks for things other than amyloid, or for amyloid using a different mechanism, you'll be much more sure of your result because your starting probability would now be the 50.6%, not the original 13%. (The order of the tests doesn't matter, but a first screening would probably be with a test that is cheap and/or quick.)
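To illustrate chaining tests, here is a sketch with a purely hypothetical second test. Its 90% and 20% rates are invented for this example and are not from the original thread, and it assumes the two tests are independent once you know whether the person has dementia:

```python
from fractions import Fraction

prior_odds_d = Fraction(13, 87)         # dementia : no dementia before any test
odds_after_first = Fraction(196, 191)   # after a positive amyloid test (about 50.6%)

# Bayes factor of the first test, recovered from the numbers above.
bf_first = odds_after_first / prior_odds_d

# HYPOTHETICAL second test: positive for 90% of people with dementia
# and for 20% of people without it. These rates are made up.
bf_second = Fraction(90, 20)

# Start from the posterior of the first test and update again.
odds_after_both = odds_after_first * bf_second
p_after_both = odds_after_both / (1 + odds_after_both)
print(f"{float(p_after_both):.1%}")  # 82.2%

# Order doesn't matter: multiplying the Bayes factors commutes.
other_order = prior_odds_d * bf_second * bf_first
print(other_order == odds_after_both)  # True
```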


edit 1: Fixed the explanation of odds ratios. Thanks to Joachim Bartosik for pointing it out.

edit 2: Fixed the calculation of the posterior, and added step-by-step calculations.

4 comments


comment by Joachim Bartosik (joachim-bartosik) · 2018-10-03T21:52:25.557Z · LW(p) · GW(p)

I'm pretty sure you got math wrong here:

O(D:¬D), read as the odds of dementia to no dementia, is the odds ratio that D is true compared to the odds ratio that D is false. O(D:¬D)=3:1 means that it's 3 times as likely that somebody has dementia than that they don't. It doesn't say anything about the magnitude of the probability, so it could be small, like 3% and 1%, or big, like 90% and 30%.

P(D or ¬D) = 1 (with P=1 one either has dementia or doesn't have it) and P(D and ¬D) = 0 (probability of having dementia and not having it is 0), so if O(D:¬D)=3:1 then P(D) = 75% and P(¬D) = 25%.

I mean, in your examples: if P(D) = 3% and P(¬D) = 1%, then what happens in the other 96% of cases (when the patient neither has dementia nor doesn't have it)? If P(D) = 90% and P(¬D) = 30%, what is the state of the 20% of patients who both have dementia and don't have it?

Replies from: jean-nassar, joachim-bartosik
comment by masasin (jean-nassar) · 2018-10-04T19:28:33.059Z · LW(p) · GW(p)

You're completely right here. I meant odds of 3:1 in general, as opposed to when they're a complement. (Also, 90 + 30 is more than 100%.) I'll edit it.

It's only 75% and 25% when the sum of probabilities is 100%, but O(red car:green car) can be 3:1 when 60% of cars are red and 20% are green, or when 3% of cars are red and 1% are green. The remainder are different colours.

comment by Joachim Bartosik (joachim-bartosik) · 2018-10-03T22:48:12.225Z · LW(p) · GW(p)

I kept on reading and wanted to check your numbers further (concrete math I could do in my head seems correct but I wanted to check moar) but I got lost in my tiredness and spreadsheets. If you're interested in feedback on the math you're doing: smaller steps are easier to verify. For example, when you give the formula for P(D|+), in order to verify it I have to check the formula, the value of each conditional probability (including figuring out the formula for each of those), and the result, all at the same time.

It would be much easier to verify if you wrote down the intermediate steps (possibly simplifying verification from 30 minutes of spreadsheet munching to a few in-head multiplications).

Replies from: jean-nassar
comment by masasin (jean-nassar) · 2018-10-04T19:29:07.227Z · LW(p) · GW(p)

I'll keep that in mind for next time. Thanks!