How much should you update on a COVID test result?
post by mayleaf · 2021-10-17T19:49:42.001Z · LW · GW · 43 comments
This is a writeup of COVID test accuracies that I put together for my own interest, and shared with friends and housemates to help us reason about COVID risk. Some of these friends suggested that I post this to LessWrong. I am not a statistician or an expert in medical research.
Background
We often hear that some kinds of COVID tests are more accurate than others — PCR tests are more accurate than rapid antigen tests, and rapid antigen tests are more accurate if you have symptoms than if you don't. A test's accuracy is often presented as two separate terms: sensitivity (what proportion of diseased patients the test accurately identifies as diseased) and specificity (what proportion of healthy people the test accurately identifies as healthy). But it's not obvious how to practically interpret those numbers: if you test negative, what does that mean about your odds of having COVID?
This writeup attempts to answer to the question, "how much more (or less) likely am I to have COVID given a positive (or negative) test result?" In particular, this is an attempt to calculate the Bayes factor for different types of COVID test results.
The Bayes factor is a number that tells you how much to update your prior odds of an event (in this case, your initial guess at how likely someone is to have COVID) given some piece of new evidence (in this case, a test result). It's calculated based on the test's sensitivity and specificity. If a test has a Bayes factor of 10x for a positive test result, and you test positive, then you should multiply your initial estimated odds of having COVID by 10x. If the same test has a Bayes factor of 0.3x for a negative test result, and you test negative, then you should update your prior odds of having COVID by 0.3x.
Using Bayes factors
(For an excellent explanation of Bayes factors and the motivation behind using them to interpret medical tests, I highly recommend this 3Blue1Brown video, which inspired this post.)
There's a well-known anecdote where doctors in a statistics seminar were asked how they would interpret a positive cancer test result for a routine mammogram taken by an asymptomatic patient. They were told that the test has a sensitivity of 90% (10% false negative rate), a specificity of 91% (9% false positive rate), and that the base rate of cancer for the patient's age and sex is 1%. Famously, nearly half of doctors incorrectly answered that the patient had a 90% probability of having cancer. [1] The actual probability is only 9%, since the base rate of cancer is low in the patient's population. One important lesson from this anecdote is that test results are an update on your priors of having the disease; the same positive test result implies different probabilities of disease depending on the disease's base rate.
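For concreteness, here's the odds-form calculation behind that 9% (just working through the numbers stated above): the prior odds are 1:99; a positive result has a Bayes factor of 0.90 / 0.09 = 10; so the posterior odds are 10:99, which is a probability of 10/109 ≈ 9%.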
Bayes factors help make it easy to make this update. A test's Bayes factor is a single number that, when multiplied by your prior odds, gives you your posterior odds. For a COVID test, you can start with your initial estimate of how likely you are to have COVID (based on the prevalence in your area combined with your vaccination status, or your current number of microCOVIDs) and update from there.
To calculate the Bayes factor for a negative COVID test, you take the probability that you'd test negative in the world where you do have COVID and divide it by the probability that you'd test negative in the world where you do not have COVID. Expressed mathematically:

Bayes factor (negative result) = P(negative test | COVID) / P(negative test | no COVID) = (1 - sensitivity) / specificity
Similarly, the Bayes factor for a positive COVID test is the probability of a positive result in the world where you do have COVID, divided by the probability of a positive result in the world where you do not have COVID:

Bayes factor (positive result) = P(positive test | COVID) / P(positive test | no COVID) = sensitivity / (1 - specificity)
To interpret the test result, express your prior probability of having COVID as an odds, and then multiply those odds by the Bayes factor. If you initially believed you had a 10% chance of having COVID, and you got a negative test result with a Bayes factor of 0.1x, you could multiply your prior odds (1:9) by 0.1 to get a posterior odds of 0.1:9, or about 1%.
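Here's a minimal sketch of that conversion in code (Python; the function below is purely illustrative, not from this post or any library):

```python
# Minimal sketch: convert a prior probability to odds, apply a Bayes factor,
# then convert back to a probability.
def update_probability(prior_probability, bayes_factor):
    prior_odds = prior_probability / (1 - prior_probability)   # 10% -> odds of about 1:9
    posterior_odds = prior_odds * bayes_factor                  # multiply odds by the Bayes factor
    return posterior_odds / (1 + posterior_odds)                # odds -> probability

# The example above: 10% prior, negative test with a Bayes factor of 0.1
print(update_probability(0.10, 0.1))   # ~0.011, i.e. about 1%
```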
List of COVID tests with Bayes factors
Below are my calculations for the Bayes factors of rapid nucleic acid amplification tests (which include rapid PCR tests) as well as rapid antigen tests (the type available for home use in the US). I used sensitivity and specificity estimates from a Cochrane metastudy on rapid tests [2], initially published in August 2020 and last updated in March 2021.
Rapid Antigen Test
This is a test for fragments of SARS-CoV-2 protein [3]. It's typically administered via nasal swab, is available to purchase in the US as at-home test kits, and can be very quick (15 minutes for some brands). It has lower sensitivity (aka more false negatives) than most nucleic acid tests.
Are you symptomatic?
The Cochrane metastudy reviewed 3 brands of rapid antigen test (Coris Bioconcept COVID-19 Ag, Abbott Panbio COVID-19 Ag, and SD Biosensor Standard Q COVID-19 Ag) and found that the sensitivity of all these tests was notably higher for symptomatic patients than for patients with no symptoms. They also found that these tests were most sensitive within the first week of developing symptoms.
The review's estimates for sensitivity were:
- No symptoms: 58.1% (95% CI 40.2% to 74.1%)
- Symptomatic, symptoms first developed <1 week ago: 78.3% (95% CI 71.1% to 84.1%)
- Symptomatic, symptoms first developed >1 week ago: 51.0% (95% CI 40.8% to 61.0%)
The review found that specificity was similar across all patients regardless of symptom status — about 99.6% (95% CI 99.0% to 99.8%).
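For reference, here's a quick sketch (in Python, purely illustrative) of how the Bayes factors in the next three subsections fall out of these point estimates:

```python
# Quick sketch: deriving the Bayes factors used below from the Cochrane point estimates.
def bayes_factors(sensitivity, specificity):
    negative = (1 - sensitivity) / specificity    # P(negative | COVID) / P(negative | no COVID)
    positive = sensitivity / (1 - specificity)    # P(positive | COVID) / P(positive | no COVID)
    return negative, positive

# Specificity is ~0.996 across all groups; sensitivity varies by symptom status.
for label, sensitivity in [("no symptoms", 0.581),
                           ("symptoms <1 week", 0.783),
                           ("symptoms >1 week", 0.510)]:
    negative, positive = bayes_factors(sensitivity, 0.996)
    print(f"{label}: negative ~{negative:.2f}x, positive ~{positive:.1f}x")
```

Rounded, these come out to the ~0.4x/145x, ~0.2x/196x, and ~0.5x/128x figures quoted below.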
Rapid antigen tests: if you don't have symptoms
- Estimated Bayes factor for a negative result: about 0.4x ((1 - 0.581) / 0.996 ≈ 0.42)
- Estimated Bayes factor for a positive result: about 145x (0.581 / (1 - 0.996) ≈ 145)
So, if you got a negative result, you can lower your estimated odds that you have COVID to 0.4x what they were before. If you got a positive result, you should increase your estimated odds that you have COVID to 145x what they were before.
Rapid antigen tests: if you have symptoms that developed <1 week ago
- Estimated Bayes factor for a negative result: about 0.2x ((1 - 0.783) / 0.996 ≈ 0.22)
- Estimated Bayes factor for a positive result: about 196x (0.783 / (1 - 0.996) ≈ 196)
So, if you got a negative result, you can lower your estimated odds that you have COVID to 0.2x what they were before. If you got a positive result, you should increase your estimated odds that you have COVID to 196x what they were before.
Rapid antigen tests: if you have symptoms that developed >1 week ago
- Estimated Bayes factor for a negative result: about 0.5x ((1 - 0.510) / 0.996 ≈ 0.49)
- Estimated Bayes factor for a positive result: about 128x (0.510 / (1 - 0.996) ≈ 128)
So if you got a negative result, you can lower your estimated odds that you have COVID to 0.5x what they were before. If you got a positive result, you should increase your estimated odds that you have COVID to 128x what they were before.
The Abbott BinaxNOW At-Home Test
Update: @Tornus has posted a detailed writeup of the BinaxNOW test here: Rapid antigen tests for COVID [LW · GW]
Unfortunately, the Cochrane metastudy didn't include data for the Abbott BinaxNOW at-home test, which I was particularly interested in because it's the most common at-home test in the US and is the test my household uses most frequently. I've seen a few sources (e.g. [4]) claiming that the Abbott BinaxNOW test is slightly more sensitive than, and about as specific as, the Abbott Panbio Ag test reviewed by the Cochrane metastudy, so it's possible that this test has slightly higher predictive power than the ones reviewed above.
Nucleic Acid Amplification Test (NAAT)
This test looks for viral RNA from the SARS-CoV-2 virus [3]. It is typically administered via nasal swab. It's also called a "nucleic acid test" or "molecular test". PCR tests are a type of NAAT. The Cochrane metastudy indicated that sensitivity and specificity differed by brand of test.
All Rapid NAATs
If you got a rapid NAAT but don't know what brand of test it was, you could use these numbers, which are from the initial August 2020 revision of the Cochrane metastudy. This version analyzed data from 11 studies on rapid NAATs, and didn't break up the data into subgroups by brand. They calculated the average sensitivity and specificity of these tests to be:
- Sensitivity: 95.2% (95% CI 86.7% to 98.3%)
- Specificity: 98.9% (95% CI 97.3% to 99.5%)
- Estimated Bayes factor for a negative result: about 0.05x ((1 - 0.952) / 0.989 ≈ 0.05)
- Estimated Bayes factor for a positive result: about 87x (0.952 / (1 - 0.989) ≈ 87)
So if you get a negative test result, you can lower your estimated odds of having COVID to 0.05 times what they were before. If you got a positive result, you should increase your estimated odds that you have COVID to 87x what they were before.
Cepheid Xpert Xpress Molecular Test
This is an RT-PCR test [5]. The March 2021 revision of the Cochrane metastudy included a separate analysis for this brand of test.
EDIT: @JBlack points out in the comments [LW(p) · GW(p)] that the metastudy only included 29 positive COVID cases (out of 100 patients total) for this test, which is a low enough sample size that the below calculations may be significantly off.
- Sensitivity: 100% (95% CI 88.1% to 100%)
- Specificity: 97.2% (95% CI 89.4% to 99.3%)
- Estimated Bayes factor for a negative result: very, very low? If we use the Cochrane study's figures for sensitivity and specificity, we get (1 - 1.00) / 0.972 = 0. If the sensitivity is actually 100%, then we get a Bayes factor of 0, which is weird and unhelpful — your odds of having COVID shouldn't go to literally 0. I would interpret this as extremely strong evidence that you don't have COVID, though (EDIT: although with a positive case count of only 29 COVID cases, perhaps these numbers aren't that meaningful). I'd love to hear from people with a stronger statistics background than me if there's a better way to interpret this; one rough adjustment is sketched below.
- Estimated Bayes factor for a positive result: about 36x (1.00 / (1 - 0.972) ≈ 36)
So if you get a positive test result, your estimated odds of having COVID are increased by a factor of 36.
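One rough way to get a usable number for the negative-result case (following the approach suggested by commenters below): treat the 100% figure as 29 detected cases out of 29 positive samples, put a uniform prior on the true sensitivity, and use the posterior mean. That gives an estimated sensitivity of 30/31 ≈ 97%, and a negative-result Bayes factor of roughly (1 - 30/31) / 0.972 ≈ 0.03. This is a sketch under those assumptions, not a definitive number.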
Abbott ID Now Molecular Test
This is an isothermal amplification test [5]. The March 2021 revision of the Cochrane metastudy included a separate analysis for this brand of test.
- Sensitivity: 73.0% (95% CI 66.8% to 78.4%)
- Specificity: 99.7% (95% CI 98.7% to 99.9%)
- Estimated Bayes factor for a negative result: about 0.3x ((1 - 0.730) / 0.997 ≈ 0.27)
- Estimated Bayes factor for a positive result: about 243x (0.730 / (1 - 0.997) ≈ 243)
So if you get a negative test result, you can lower your estimated odds of having COVID to 0.3 times what they were before. If you got a positive result, you should increase your estimated odds that you have COVID to 243x what they were before.
I was surprised to see how different the accuracies of Abbott ID Now and Cepheid Xpert Xpress tests were; I'd previously been thinking of all nucleic acid tests as similarly accurate, but the Cochrane metastudy suggests that the Abbott ID Now test is not meaningfully more predictive than a rapid antigen test. This is surprising enough that I should probably look into the source data more, but I haven't gotten a chance to do that yet. For now, I'm going to start asking what brand of test I'm getting whenever I get a nucleic acid test.
Summary of all tests
Test | Bayes factor for negative result | Bayes factor for positive result |
---|---|---|
Rapid antigen test, no symptoms | 0.4x | 145x |
Rapid antigen test, symptoms developed <1 week ago | 0.2x | 196x |
Rapid antigen test, symptoms developed >1 week ago | 0.5x | 128x |
Rapid NAAT, all brands | 0.05x | 87x |
Rapid NAAT: Cepheid Xpert Xpress | probably very low, see calculation | 36x |
Rapid NAAT: Abbott ID Now | 0.3x | 243x |
Caveats about infectiousness
From what I've read, while NAATs are highly specific to COVID viral RNA, they don't differentiate as well between infectious and non-infectious people. (Non-infectious people might have the virus, but at low levels, or in inactive fragments that have already been neutralized by the immune system.) [6] [7] I haven't yet found sensitivity and specificity numbers for NAATs in detecting infectiousness as opposed to illness, but you should assume that the Bayes factor for infectiousness given a positive NAAT result is lower than the ones for illness listed above.
Relatedly, the sensitivity of rapid antigen tests is typically measured against RT-PCR as the "source of truth". If RT-PCR isn't very specific to infectious illness, then this would result in underreporting the sensitivity of rapid antigen tests in detecting infectiousness. So I'd guess that if your rapid antigen test returns negative, you can be somewhat more confident that you aren't infectious than the Bayes factors listed above would imply.
What if I take multiple tests?
A neat thing about Bayes factors is that you can multiply them together! In theory, if you tested negative twice, with a Bayes factor of 0.1 each time, you can multiply your initial odds of having the disease by 0.1 × 0.1 = 0.01.
I say "in theory" because this is only true if the test results are independent and uncorrelated, and I'm not sure that assumption holds for COVID tests (or medical tests in general). If you get a false negative because you have a low viral load, or because you have an unusual genetic variant of COVID that's less likely to be amplified by PCR*, presumably that will cause correlated failures across multiple tests. My guess is that each additional test gives you a less-significant update than the first one.
*This scenario is just speculation, I'm not actually sure what the main causes of false negatives are for PCR tests.
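For what it's worth, here's what the naive calculation looks like if you do assume independence (a sketch only, given the caveats above; the function name is just illustrative):

```python
# Sketch: combining results by multiplying Bayes factors. This assumes the
# results are independent, which (as noted above) is probably too optimistic.
def combine_results(prior_probability, bayes_factors):
    odds = prior_probability / (1 - prior_probability)
    for bf in bayes_factors:
        odds *= bf                       # each result multiplies the odds
    return odds / (1 + odds)

# Two negative tests, each with a Bayes factor of 0.1, starting from a 10% prior:
print(combine_results(0.10, [0.1, 0.1]))   # ~0.0011, i.e. roughly 0.1%
```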
Use with microCOVID
If you use microCOVID.org to track your risk, then you can use your test results to adjust your number of microCOVIDs. For not-too-high numbers of microCOVIDs, the computation is easy: just multiply your initial microCOVIDs by the Bayes factor for your test. For example, if you started with 1,000 microCOVIDs, and you tested negative on a rapid NAAT with a Bayes factor of 0.05, then after the test you have 1,000 × 0.05 = 50 microCOVIDs.
The above is an approximation. The precise calculation involves converting your microCOVIDs to odds first:
- Express your microCOVIDs as odds: 1,000 microCOVIDs → probability of 1,000 / 1,000,000 → odds of 1,000 : 999,000
- Multiply the odds by the Bayes factor of the test you took. For example, if you tested negative on a rapid nucleic acid test (Bayes factor of 0.05): 1,000 / 999,000 * 0.05 = 50 / 999,000
- Convert the resulting odds back into microCOVIDs: odds of 50 : 999,000 → probability of 50 / 999,050 ≈ 0.00005 ≈ 50 microCOVIDs
But for lower numbers of microCOVIDs (less than about 100,000) the approximation yields almost the same result (as shown in the example above, where we got "about 50 microCOVIDs" either way).
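Here's the same calculation as a short Python sketch (the function name is just illustrative; microCOVID.org doesn't provide this helper):

```python
# Sketch of the exact microCOVID adjustment described above.
def update_microcovids(prior_microcovids, bayes_factor):
    p = prior_microcovids / 1_000_000               # microCOVIDs -> probability
    odds = p / (1 - p)                              # probability -> odds
    posterior_odds = odds * bayes_factor            # apply the test's Bayes factor
    posterior_p = posterior_odds / (1 + posterior_odds)
    return posterior_p * 1_000_000                  # back to microCOVIDs

print(update_microcovids(1_000, 0.05))   # ~50.05, i.e. about 50 microCOVIDs
print(1_000 * 0.05)                      # the simple approximation gives 50
```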
Acknowledgements
Thank you to swimmer963 [LW · GW], gwillen [LW · GW], flowerfeatherfocus [LW · GW], and landfish [LW · GW] for reviewing this post and providing feedback.
References
[1] Doctors don't know Bayes' theorem - Cornell blog
[3] Which test is best for COVID-19? - Harvard Health
[5] EUAs - Molecular Diagnostic Tests for SARS-CoV-2 - US Food & Drug Administration
[6] Nucleic Acid Amplification Testing (e.g. RT-PCR) - Infectious Disease Society of America
[7] Antigen tests as contagiousness tests - rapidtests.org
Comments
comment by Lukas Finnveden (Lanrian) · 2021-10-18T09:42:04.205Z · LW(p) · GW(p)
Anecdata: A person in my house has over the course of 4-5 weeks had ~5 positive antigen tests (not all of the same brand) and equally many negative PCRs (and no symptoms). Also some number of negative antigen tests (maybe a similar number negative as positive).
I also know another person who had 4 positive antigen tests over the course of a couple of days; and 4 negative PCRs over the course of like a week. And then a negative antibody test a long time later.
My interpretation of this is that false positives correlate a lot within specific people, even if the tests are taken at very different times.
↑ comment by gwillen · 2021-10-18T16:50:33.942Z · LW(p) · GW(p)
That first one is ... very odd.
Odd enough that I'd be curious to hear more details from that person, if they'd be willing to talk about it. I don't have a model of how that could happen, and I'd like one!
↑ comment by [deleted] · 2021-10-18T21:01:57.702Z · LW(p) · GW(p)
I am that person. I'll try to summarize my (bizarre) experience here, and I'd be happy to answer any further questions. I'm keen to see what people think could be happening, and would be willing to take some further rapid tests to test out good hypotheses (though this might take some time, as getting positive test results is very inconvenient).
I had my first positive antigen test after I had taken 5 negative antigen tests (daily) during the preceding week, and I had not taken any antigen tests before that. I took a second test after about one hour, which was also positive, and I took a PCR in less than 30 minutes after this second positive result. After receiving the result of the PCR on the following morning, and getting a negative result from another rapid test, I took a second PCR, which was also negative. I didn't take rapid tests for a few days, but when I took my next one, 5 days after the first positive test, it was positive again, and the follow-up PCR test was again negative. I then started taking rapid tests less frequently, waiting for around a week between them, and have had two positives (followed by negative PCRs) and two negatives, in alternating weeks.
Some relevant information: no one around me has had covid during that time, and I haven't had any symptoms. Someone watched me take a rapid test (which later turned out to be positive) and didn't notice anything wrong with what I was doing. I took 13 rapid tests in total, 5 of which were positive, and 4 of these positive tests were from the same brand. 4 out of 5 of the tests of that brand that I have taken were positive. My first and last positive rapid test results are spread 32 days apart. I take my tests in the morning, about 30 minutes after I wake up and before I eat or drink anything, though my second positive test was taken after I had breakfast.
↑ comment by gwillen · 2021-10-19T19:43:45.049Z · LW(p) · GW(p)
Thank you, that is super interesting and informative.
I am wondering if what we're seeing here is cross-reactivity. (I'm not sure if that's the right term for it, but: repeatable false positives from the test reacting to some antigen that is "close enough" to the covid antigen it's looking for.) I recall seeing a table of things that one of the tests was checked for cross-reactivity against -- they generally aren't good at distinguishing SARS from covid, they mostly don't react to other stuff, but the rate is not zero. (It's possible the thing I am thinking of was an isothermal NAAT test, not antigen.)
Looking for info about this kind of thing with antigen tests, I eventually found this:
https://onlinelibrary.wiley.com/doi/10.1111/ped.14582
They have three case studies of antigen test false-positives, in Japan, in children infected with Human Rhinovirus A, and one of them remained positive on a test from a different production batch (although not a different brand/test.) And they mention that other countries have reported cross-reactivity with "other coronaviruses, influenza virus, and Mycoplasma pneumoniae".
If it eventually went away and didn't come back, I think after reading this I'm going to put my money on "cross-reactivity to another asymptomatic viral infection." If it didn't go away, and you still test positive on further tests, that becomes even more interesting and maybe you could get someone to write a case study about it in exchange for figuring out what the heck it is.
↑ comment by [deleted] · 2021-10-20T07:34:35.349Z · LW(p) · GW(p)
Thank you, this is quite interesting! I did consider the possibility of some other virus being the cause of the positive tests, but it strikes me as somewhat odd to have 32 days in between positive tests. In any case, the explanation to this situation is probably something very unlikely, so this could be it. This situation is recent (my last positive antigen test was a few days ago), so I might wait for a few more weeks and do some tests again to see what happens.
↑ comment by gwillen · 2021-10-20T20:32:30.191Z · LW(p) · GW(p)
Would love an update if you do!
↑ comment by [deleted] · 2022-01-09T13:04:05.232Z · LW(p) · GW(p)
[Update]
Over the course of the past month (starting ~7 weeks after my last false positive), I've taken 8 rapid tests, 4 of which were from Acon (the brand that is most correlated with my false positives). All of them were negative, and one of the Acon ones had a very faint test line (which I still interpret as being negative, though I didn't confirm with a PCR).
The only thing I noticed I did differently this time around was that I took most of these tests (6 of them) past 11am, and not right after I woke up, as I used to do before, and the one with a faint test line was among the two I took within 2 hours of waking up. I never eat or drink anything 30 minutes before testing, as the instructions recommend.
My impression is that this is reasonably good evidence that whatever was triggering the false positives is now mostly gone, as per gwillen's comment about cross-reactivity. Besides the observation above I have no reason to think that the time of day when I take the test should make much of a difference to the results; I might test that out in the coming months. As before, I'm happy to read any other possible explanations for this, and might also test them out in the future.
↑ comment by mayleaf · 2021-10-19T03:25:05.369Z · LW(p) · GW(p)
Wow, that is surprising, thanks for sharing. Am I reading correctly that you got no positive NAAT/PCR tests, and only got positives from antigen tests?
I took 13 rapid tests in total, 5 of which were positive, and 4 of these positive tests were from the same brand. 4 out of 5 of the tests of that brand that I have taken were positive.
Would you be up for sharing what brand that was?
I don't yet know enough about what causes false positives and false negatives in either antigen tests or NAATs to speculate much, but I appreciate this datapoint! (Also, glad you're feeling well and didn't develop any symptoms)
↑ comment by [deleted] · 2021-10-19T12:45:54.873Z · LW(p) · GW(p)
You're right, only the antigen tests were positive.
The brand was this one: https://www.aconlabs.com/sars-cov-2-antigen-rapid-test/.
comment by chanamessinger (cmessinger) · 2021-10-17T21:04:37.955Z · LW(p) · GW(p)
This is so helpful, thank you!
comment by Lukas Finnveden (Lanrian) · 2021-10-18T09:55:31.727Z · LW(p) · GW(p)
Antigen tests are more likely to be positive on high viral loads than low viral loads. This means that you should have a stronger update towards not-likely-to-infect-others than you should have towards being not-sick. There's some previous discussion of that here [EA · GW], with the most useful source of data being this paper that looks at contact-tracing data to figure out the likelihood that people who actually went on to infect others would've tested positive. The headline result is "The most and least sensitive LFDs would detect 90.5% (95%CI 90.1-90.8%) and 83.7% (83.2-84.1%) of cases with PCR-positive contacts respectively." I haven't critically examined the methodology.
comment by Stephen Bennett (GWS) · 2021-10-18T04:03:19.729Z · LW(p) · GW(p)
Re: “ If the sensitivity is actually 100%, then we get a Bayes factor of 0, which is weird and unhelpful — your odds of having COVID shouldn't go to literally 0. I would interpret this as extremely strong evidence that you don't have COVID, though. I'd love to hear from people with a stronger statistics background than me if there's a better way to interpret this.”
The test doesn’t actually have 100% sensitivity. That’s an estimate based on some study they ran that had some number of true positives out of some number of tests on true cases. Apparently it got all of those right, and from that they simply took the point estimate to equal the sample rate.
The Bayesian solution to this is to assume a prior distribution (probably a Beta(1,1)), which will update in accordance with incoming evidence from the outcomes of tests. If the study had 30 tests (I haven't read it since I'm on mobile, so feel free to replace that number with whatever the actual data are), that'd correspond to a posterior of a Beta(31,1) (note that in general Betas update by adding successes to the first parameter and failures to the second parameter, so the prior of 1 becomes a posterior of 31 after 30 successes). Taking a point estimate based on the mean of this posterior would give you a sensitivity of (n+1)/(n+2). In my toy example, that'd be 31/32, or ~97%. Again, replace n with the sample size of the actual experiment.
Some notes:
A real Bayes factor would be slightly more complicated to compute since the point estimate given the posterior involves some loss of information, but would give very similar values in practice because a Beta is a pretty nice function
The Beta(1,1) is probably better known as the Uniform distribution. It’s not the only prior you can use, but it’ll probably be from the beta family for this problem.
As a test with a true 100% sensitivity accumulates more data, the point estimate of its sensitivity given this method will approach 100% (since (n+1)/(n+2) approaches 1 as n approaches infinity), which is a nice sanity check.
When the test fails to detect covid, it will increment the second number in the beta distribution. For an intuition of what this distribution looks like for various values, this website is pretty good: https://keisan.casio.com/exec/system/1180573226
↑ comment by mayleaf · 2021-10-19T03:10:24.753Z · LW(p) · GW(p)
I haven't had time to read up about Beta distributions and play with the tool you linked, but I just wanted to say that I really appreciate the thorough explanation! I'm really happy that posting about statistics on LessWrong has the predictable consequence of learning more statistics from the commenters :)
↑ comment by gwillen · 2021-10-18T16:22:21.943Z · LW(p) · GW(p)
Obviously this correction is relatively most important when the point estimate of the sensitivity/specificity is 100%, making the corresponding Bayes factor meaningless. Do you have a sense of how important the correction is for smaller values / how small the value can be before it's fine to just ignore the correction? I assume everything is pulled away from extreme values slightly, but in general not by enough to matter.
↑ comment by Stephen Bennett (GWS) · 2021-10-19T01:15:34.358Z · LW(p) · GW(p)
Simple answer first: If the sensitivity and specificity are estimated with data from studies with large (>1000) sample sizes it mostly won’t matter.
Various details:
Avoiding point estimates altogether will get you broader estimates of the information content of the tests, regardless of whether you arrive at those point estimates from Bayesian or frequentist methods.
Comparing the two methods, the Bayesian one will pull very slightly towards 50% relative to simply taking the sample rate as the true rate. Indeed, it’s equivalent to adding a single success and failure to the sample and just computing the rate of correct identification in the sample.
The parameters of a Beta distribution can be interpreted as the total number of successes and failures, combining the prior and observed data to get you the posterior.
↑ comment by mayleaf · 2021-10-19T03:09:01.439Z · LW(p) · GW(p)
Thanks, I was wondering if the answer would be something like this (basically that I should be using a distribution rather than a point estimate, something that @gwillen also mentioned when he reviewed the draft version of this post).
If the sensitivity and specificity are estimated with data from studies with large (>1000) sample sizes it mostly won’t matter.
That's the case for the antigen test data; the sample sizes are >1000 for each subgroup analyzed (asymptomatic, symptoms developed <1 week ago, symptoms developed >1 week ago).
The sample size for all NAATs was 4351, but the sample sizes for the subgroups of Abbott ID Now and Cepheid Xpert Xpress were only 812 and 100 respectively. Maybe that's a small enough sample size that I should be suspicious of the subgroup analyses? (@JBlack mentioned this concern below and pointed out that for the Cepheid test, there were only 29 positive cases total).
comment by JBlack · 2021-10-18T03:45:58.102Z · LW(p) · GW(p)
The confidence interval in the Cepheid analysis does not inspire confidence.
Usually when a test claims "100% sensitivity", it's based on all members of some sample with the disease testing positive. The lower end of the 95% interval is a lower bound on the true sensitivity such that there would still be at least 5% chance of getting no false negatives.
That's where it starts to look dodgy: Normally it would be 2.5% to cover upper and lower tails of the distribution, but there is no tail below zero false negatives. It looks like they used 2.5% anyway, incorrectly, so it's really a 97.5% confidence interval. The other problem is that the positive sample size must have been only 29 people. That's disturbingly small for a test that may be applied a billion times, and seriously makes me question their validation study that reported it.
There are a number of assumptions you can use to turn this into an effective false negative rate for Bayesian update purposes. You may have priors on the distribution of true sensitivities, priors on study validity, and so on. They don't matter very much, since they mostly yield a distribution with an odds ratio geometric mean around the 15-40 range anyway. If I had to pick a single number based only on seeing their end result, I'd go with 96% sensitivity under their study conditions, whatever those were.
I'd lower my estimate for real life tests, since real life testing isn't usually nearly as carefully controlled as a validation study, but I don't know how much to lower it.
↑ comment by mayleaf · 2021-10-19T02:58:21.644Z · LW(p) · GW(p)
Thanks, I appreciate this explanation!
The other problem is that the positive sample size must have been only 29 people. That's disturbingly small for a test that may be applied a billion times, and seriously makes me question their validation study that reported it.
Thanks for flagging this. The review's results table ("Summary of findings 1") says "100 samples" and "29 SARS-COV-2 cases"; am I correctly interpreting that as 100 patients, of which 29 were found to have COVID? (I think this is what you're saying too, just want to make sure I'm clear on it)
If I had to pick a single number based only on seeing their end result, I'd go with 96% sensitivity under their study conditions, whatever those were.
Can you say more about how you got 96%?
↑ comment by JBlack · 2021-10-19T06:09:11.504Z · LW(p) · GW(p)
I hadn't actually read the review, but yes I meant that the sample must have had 29 people who were known (through other means) to be positive for SARS-cov-2, and all tested positive.
Can you say more about how you got 96%?
Educated guessing, really. I did a few simple models with a spreadsheet for various prior probabilities including some that were at each end of being (subjectively, to me) reasonable. Only the prior for "this study was fabricated from start to finish but got through peer review anyway" made very much difference in the final outcome. (If you have 10% or more weight on that, or various other "their data can't be trusted" priors then you likely want to adjust the figure downward)
So with a rough guess at a prior distribution, I can look at the outcomes from the point of view of "what single value has the same end effect on evidence weight as this distribution". I make it sound fancy, but it's really just "if there was a 30th really positive test subject in these dozen or so possible worlds that I'm treating as roughly equally likely, and I only include possible worlds where the validation detected all of the first 29 cases, how often does that 30th test come up positive?" That come out at close to 96%.
↑ comment by gwillen · 2021-10-19T19:53:17.450Z · LW(p) · GW(p)
I'm having trouble discerning this from your description and I'm curious -- is this approach closely related to the approach GWS describes above, involving the beta distribution, which basically seems to amount to adding one "phantom success" and one "phantom failure" to the total tally?
↑ comment by JBlack · 2021-10-20T00:14:10.926Z · LW(p) · GW(p)
It is related in the sense that if your prior for sensitivity is uniform, then the posterior is that beta distribution.
In my case I did not have a uniform prior on sensitivity, and did have a rough prior distribution over a few other factors I thought relevant, because reality is messy. Certainly don't take it as "this is the correct value", and the approach I took almost certainly has some major holes in it even given the weasel-words I used.
comment by KPier · 2021-10-17T21:04:31.120Z · LW(p) · GW(p)
Are test errors going to be highly correlated? If you take two tests (either of the same type or of different types) and both come back negative, how much of an update is the second test?
↑ comment by mayleaf · 2021-10-17T21:12:58.075Z · LW(p) · GW(p)
I'm not super sure; I wrote about this a little in the section "What if you take multiple tests?":
If you get a false negative because you have a low viral load, or because you have an unusual genetic variant of COVID that's less likely to be amplified by PCR*, presumably that will cause correlated failures across multiple tests. My guess is that each additional test gives you a less-significant update than the first one.
*This scenario is just speculation, I'm not actually sure what the main causes of false negatives are for PCR tests.
but that's just a guess. I'd love to hear from anyone who has a more detailed understanding of what causes failures in NAATs and antigen tests.
Naively, I'd expect that if the test fails due to low viral load, that would probably cause correlated failures across all tests taken on the same day. Waiting a few days between tests is probably a good idea, especially if you were likely to be in the early-infection stage (and so likely low viral load) during your first test. The instructions for the BinaxNOW rapid antigen test say that if you get a negative result, you shouldn't repeat the test until 3 days later.
↑ comment by Tornus · 2021-10-18T15:50:48.919Z · LW(p) · GW(p)
Yes, accuracy in antigen tests seems to correlate very strongly with viral load (and presumably therefore with infectivity). This paper found 100% agreement with PCR for Ct 13-19.9 (massive viral load), all the way down to 8% agreement for Ct 30-35.
Ct (cycle time) measures how many amplification cycles were needed to detect nucleic acid. Lower Ct values indicate exponentially more nucleic acid than higher values, although Ct values are not standardized and can't be directly compared between testing facilities.
↑ comment by KPier · 2021-10-17T21:32:41.235Z · LW(p) · GW(p)
And same question for a positive test: if you get a positive and then retest and get a negative, do you have a sense of how much of an overall update you should make? I've been treating that as 'well, it was probably a false positive then', but multiplying the two updates together would imply it's probably legit?
↑ comment by mayleaf · 2021-10-18T03:00:04.454Z · LW(p) · GW(p)
Yeah, based on the Cochrane paper I'd interpret "one positive result and one negative result" as an overall update towards having COVID. In general, both rapid antigen tests and NAATs are more specific than they are sensitive (more likely to return false negatives than false positives).
Though also see the "Caveats about infectiousness" section, which suggests that NAATs have a much higher false positive rate for detecting infectiousness than they do for detecting illness. I don't have numbers for this, unfortunately, so I'm not sure if 1 positive NAAT + 1 negative NAAT is overall an update in favor of or away from infectiousness.
comment by leggi · 2021-10-18T04:59:20.489Z · LW(p) · GW(p)
A new paper that might be of interest:
Recalibrating SARS-CoV-2 Antigen Rapid Lateral Flow Test Relative Sensitivity from Validation Studies to Absolute Sensitivity for Indicating Individuals Shedding Transmissible Virus
↑ comment by gwillen · 2021-10-18T16:54:17.592Z · LW(p) · GW(p)
I have a sense that this is the formal academic paper version of the webpage that's footnote 7 in the post. Michael Mina (one of the authors on the paper, and one of the advisors on the webpage) is an epidemiologist (who I follow on Twitter), who has been a big cheerleader for the "antigen tests as contagiousness tests" concept. I was really happy to finally see it written up more formally.
(He's not first author on the paper, and I don't want to imply it's all just him -- he's just the loudest voice I've been seeing for this, over a pretty long time.)
comment by Insub · 2021-10-18T01:41:03.240Z · LW(p) · GW(p)
I remember hearing from what I thought was multiple sources that your run-of-the-mill PCR test had something like a 50-80% sensitivity, and therefore a pretty bad Bayes factor for negative tests. But that doesn't seem to square with these results - any idea what I'm thinking of?
↑ comment by gwillen · 2021-10-18T17:03:51.364Z · LW(p) · GW(p)
I remember something like what you're talking about, I think -- googling finds e.g. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0251661 making this case.
I think a lot of these numbers are unfortunately sensitive to various conditions and assumptions, and PCR has been taken as the "gold standard" in many ways, which means that PCR is often being compared against just another PCR.
My impression was that, when properly performed, RT-PCR should be exquisitely sensitive to RNA in the sample, but that doesn't help if the sample doesn't have any RNA in it (e.g. when someone is very newly infected.) I had assumed that's where the discrepancy comes from. But then in googling for the limit of sensitivity, I found this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7302192/ assessing different PCR tests against each other. The best had a "limit of detection" of 100 copies of RNA per mL of sample. But apparently there is a LOT of variation between commercially-available PCR tests. :-(
comment by Adam Zerner (adamzerner) · 2021-10-18T16:39:05.842Z · LW(p) · GW(p)
As an alternative to the 3Blue1Brown video, Arbital has a good post on the odds form of Bayes' theorem.
Also, Microcovid has a good blog post on how to interpret negative test results. In particular, there is some more detailed information on how the sensitivity depends on the number of days it's been since you've been exposed.
I wonder how the delta variant affects all of this.
It seems to me like the most important question is what your prior should start out as. Eg. if you live in this area, develop a 102 degree fever, and have been exposed to roughly 1000 microcovids over the past N days, how likely is it that you have covid? I'd find it really helpful to get some help thinking about those sorts of questions.
↑ comment by gwillen · 2021-10-18T17:10:29.557Z · LW(p) · GW(p)
I think you get into trouble fairly quickly when trying to ask these questions, and even with some of the parameters already covered by microcovid, due to non-independence and non-linearity of various parameters. E.g. microcovid roughly accounts for the fact that hours of exposure to the same person are not independent exposure events, vs adding more people. But it does that with a hard cap on the number of microcovids you can get from a single person in a single event (IIRC), which is a pretty crude approximation. (Not a single hard numeric cap, but a cap based on the nature of the exposure -- I think it's a good approach, it's just definitely an approximation of a smoother nonlinear curve that we don't know how to draw.)
And I don't think anybody (outside of academic papers in epidemiology) is really accounting for things like the very uneven distribution of spread between people. If almost all spread is from a tiny number of superspreaders, your precautions look very different than if it's pretty much even across everyone. I think our rough models tend to assume the latter, but the reality is somewhere in between. We mostly hope the various nonlinearities are small or cancel each other out, but I think that's often not true.
comment by Tornus · 2021-10-18T15:41:42.767Z · LW(p) · GW(p)
Thank you for this! I have a few thoughts about antigen tests.
1: I'd recommend the BinaxNOW as the "standard" home antigen test in the US. Broadly speaking it's better studied, more accurate, cheaper, and more widely available than the others. Regarding data...
2: I think the best current source of general data on home antigen tests is this meta analysis from September. The results from multiple papers over the last year have been pretty consistent, but this adds a little more power to the numbers. They come up with:
Overall: sensitivity 68%, specificity 99-100%. Sensitivity for symptomatic individuals: 72%. Sensitivity for asymptomatic individuals: 52%
Sensitivity for Ct < 25: 94%; Ct > 30: 30%. (I'll be writing more about these results in a bit, but the short version is that this strongly supports the belief that test sensitivity depends strongly on viral load and will be highest during peak infectivity).
3: Two additional excellent papers are this one for subgroup analysis and this one for subgroup analysis and discussion of how user error affects accuracy.
4: Related to the above: accuracy seems strongly correlated with viral load, which strongly suggests multiple tests on the same individual at the same time would be highly correlated.
↑ comment by mayleaf · 2021-10-19T03:16:59.600Z · LW(p) · GW(p)
Thanks for linking the meta-analysis and the other papers; will read (and possibly update the post afterwards)! I especially appreciate that the meta-analysis includes studies of BinaxNOW, something I'd been looking for.
Sensitivity for Ct < 25: 94%; Ct > 30: 30%. (I'll be writing more about these results in a bit, but the short version is that this strongly supports the belief that test sensitivity depends strongly on viral load and will be highest during peak infectivity).
Nice, I'd been hearing/reading about using cycle count to determine how much a test's results track infectiousness, and it's great to see the results so starkly supporting that. Looking forward to your writeup!
comment by DPiepgrass · 2021-10-21T17:44:43.703Z · LW(p) · GW(p)
They were given the information that the test has a sensitivity of 90% (10% false negative rate), a specificity of 91% (9% false positive rate), and that the base rate of cancer for the patient's age and sex is 1%. Famously, nearly half of doctors incorrectly answered that the patient had a 90% probability of having cancer. [1] The actual probability is only 9%
The probability surely isn't 90%, but if the scenario presented to the doctors was anything other than "routine cancer screening that we do for everybody who comes in here", the probability isn't 9% either.
Most people are tested for cancer because they have one or more symptoms consistent with cancer. So the base rate of 1% "for the patient's age and sex" isn't the correct prior, because most of the people in the base rate have no symptoms that would provoke a test. The correct prior would be adjusted for the patient's symptoms. But how do we actually adjust the prior for symptoms? I don't know. It sounds difficult.
But I expect that usually a test has been used in the past in exactly this way: restricted to those with symptoms. So as long as someone is in charge of gathering data on this, we should already have an empirical prior for P(has cancer | positive cancer test result + symptoms), e.g. 30%, that doctors can use directly. This information should be available because, as time passes after a test, it should eventually become clear whether the patient really had cancer or not, and that later information (aggregated over numerous patients) gives us a pretty good estimate of P(has cancer | positive + symptoms) and P(has cancer | negative + symptoms) for new patients. (But I am not a doctor and can't vouch for whether The System is bothering to calculate these things.)
It's good to see, then, that there are separate measures of sensitivity and specificity for symptomatic and asymptomatic patients.
This post doesn't tell me what I want to know, though, which is:
- P(I am infected | positive test & symptoms)
- P(I am infected | positive test & no symptoms)
- P(I am infected | negative test & symptoms)
- P(I am infected | negative test & no symptoms)
So, if you got a negative result, you can lower your estimated odds that you have COVID to 0.4x what they were before. If you got a positive result, you should increase your estimated odds that you have COVID to 145x what they were before.
My prior would just be a guess, and I don't see how multiplying a guess by 145x is helpful. We really need a computation whose result is a probability.
↑ comment by mayleaf · 2021-10-29T02:47:01.319Z · LW(p) · GW(p)
Most people are tested for cancer because they have one or more symptoms consistent with cancer. So the base rate of 1% "for the patient's age and sex" isn't the correct prior, because most of the people in the base rate have no symptoms that would provoke a test.
To clarify, the problem that Gigerenzer posed to doctors began with "A 50-year-old woman, no symptoms, participates in a routine mammography screening". You're right that if there were symptoms or other reasons to suspect having cancer, that should be factored into the prior. (And routine mammograms are in fact recommended to all women of a certain age in the US.)
We really need a computation whose result is a probability.
I agree - it would be ideal to have a way to precisely calculate your prior odds of having COVID. I try and estimate this using microCOVID to sum my risk based on my recent exposure level, the prevalence in my area, and my vaccination status. I don't know a good way to estimate my prior if I do have symptoms.
My prior would just be a guess, and I don't see how multiplying a guess by 145x is helpful.
I don't fully agree with this part, because regardless of whether my prior is a guess or not, I still need to make real-world decisions about when to self-isolate and when to seek medical treatment. If I have a very mild sore throat that might just be allergies, and I stayed home all week, and I test negative on a rapid test, what should I do? What if I test negative on a PCR test three days later? Regardless of whether I'm using Bayes factors, or test sensitivity or just my intuition, I'm still using something to determine at which point it's safe to go out again. Knowing the Bayes factors for the tests I've taken helps that reasoning be slightly more grounded in reality.
Edit: I've updated my post to make it clearer that the Gigerenzer problem specified that the test was a routine test on an asymptomatic patient.
comment by cistrane · 2021-10-20T07:42:44.470Z · LW(p) · GW(p)
Any prior calculation for covid infection must also include vaccination status and history of recovery from previous covid infection(s)
↑ comment by tkpwaeub (gabriel-holmes) · 2021-10-22T17:57:30.954Z · LW(p) · GW(p)
I was wondering about this too, but maybe the author has a good reason for not breaking it down by immunity status?
↑ comment by mayleaf · 2021-10-29T02:13:59.145Z · LW(p) · GW(p)
An earlier draft of this actually mentioned vaccination status, and I only removed it for sentence flow reasons. You're right that vaccination status (or prior history of COVID) is an important part of your prior estimate, along with prevalence in your area, and your activities/level of exposure. The microCOVID calculator I linked factors in all three of these. I've also edited the relevant sentence in the "Using Bayes factors" section to mention vaccination status.