In the early days of the pandemic, there wasn't great data available, and it wasn't easy to do better than trusting the standard epidemiological estimate that around 2% of people who got COVID-19 would die. My back-of-the-envelope estimate at the time was way higher, but no one else I knew seemed to think that number made sense, so I let the matter drop. But now we have enough data to check.
Recently, my sister reached out to me to check her own thinking on the matter. She used the same method I initially tried and abandoned - simply dividing the number of deaths by the number of resolved cases (deaths + recoveries) - to estimate that in the US, COVID-19 kills around 1 in 6 people who get it.
The problem with using only resolved cases, in a country with an ongoing pandemic, is that if people die faster than they're marked recovered, death rates can be inflated - and if they recover faster, deflated. Ideally, you'd want to wait until all cases have been resolved one way or the other. Fortunately, there are now countries where that situation nearly holds.
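The resolved-case calculation can be sketched as follows. This is a minimal illustration, not the spreadsheet used for this post: the function names and the snapshot numbers are hypothetical, chosen only to show how far the resolved-case ratio can sit from deaths divided by all confirmed cases while many cases are still active.

```python
def resolved_case_fatality(deaths: int, recoveries: int) -> float:
    """Deaths as a share of resolved cases (deaths + recoveries)."""
    resolved = deaths + recoveries
    if resolved == 0:
        raise ValueError("no resolved cases yet")
    return deaths / resolved


def confirmed_case_fatality(deaths: int, confirmed: int) -> float:
    """Deaths as a share of all confirmed cases, including active ones."""
    if confirmed == 0:
        raise ValueError("no confirmed cases yet")
    return deaths / confirmed


# Hypothetical mid-outbreak snapshot (made-up numbers, not real data):
# 1000 confirmed cases, of which 30 have died and 170 are marked recovered.
confirmed, deaths, recoveries = 1000, 30, 170
active = confirmed - deaths - recoveries

print(f"active cases: {active}")
print(f"deaths / resolved:  {resolved_case_fatality(deaths, recoveries):.1%}")
print(f"deaths / confirmed: {confirmed_case_fatality(deaths, confirmed):.1%}")
```

Once active cases reach zero, resolved cases equal confirmed cases and the two ratios coincide, which is why countries with almost no active cases are the useful ones to look at.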
I looked at countries with the 25 lowest active case counts, using the 91-DIVOC visualization tool, to see which ones seemed to be mostly done with the pandemic, at least for now:
I copied the current numbers (as of 7 Jun 2020) into a spreadsheet, to see what the effective death rate is in countries where the vast majority of cases are resolved. For the People's Republic of China, I only looked at numbers from Hubei.
In the countries and regions I looked at, active cases are around 1% of all confirmed cases, and around 5.8% of resolved COVID-19 cases end in death. But two thirds of cases came from Hubei. Excluding China, around 4% of confirmed cases are active, and around 3.5% of resolved cases ended in death. If I only look at other countries where fewer than 1% of confirmed cases are active, the average COVID-19 death rate is 2.6%, though in individual countries it ranges from 0.55% for Iceland to 4.67% for Croatia.
Infection fatality rate (IFR) is clearly the superior metric if you're trying to do something like forecast the spread of the virus and total death counts, because it corresponds more directly to a statement about underlying reality than the case fatality rate (CFR) does; the denominator of CFR is determined in part by who gets tested.
But if you're trying to figure out what rough-and-ready multiplier to apply to the daily numbers reported in your area, then to use IFR estimates, you need to remember that reported cases are not the same as actual infections, and adjust accordingly.
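One way to make that adjustment concrete is sketched below. It assumes you have some estimate of the ascertainment fraction, meaning the share of actual infections that show up as reported cases; that quantity is not given anywhere in this post, and the example values are purely hypothetical. It also assumes deaths are fully counted.

```python
def expected_deaths_from_cfr(new_cases: float, cfr: float) -> float:
    """Multiply reported cases directly by a case fatality rate."""
    return new_cases * cfr


def expected_deaths_from_ifr(new_cases: float, ifr: float,
                             ascertainment: float) -> float:
    """Scale reported cases up to estimated infections, then apply the IFR.

    ascertainment is the assumed fraction of infections that get reported
    as cases (0 < ascertainment <= 1). It is an input you must estimate.
    """
    if not 0 < ascertainment <= 1:
        raise ValueError("ascertainment must be in (0, 1]")
    estimated_infections = new_cases / ascertainment
    return estimated_infections * ifr


def implied_cfr(ifr: float, ascertainment: float) -> float:
    """The per-reported-case multiplier implied by an IFR and ascertainment."""
    if not 0 < ascertainment <= 1:
        raise ValueError("ascertainment must be in (0, 1]")
    return ifr / ascertainment


# Hypothetical inputs: 200 new reported cases, IFR of 0.75%,
# and an assumption that 1 in 4 infections is reported.
print(expected_deaths_from_ifr(200, 0.0075, 0.25))
print(f"{implied_cfr(0.0075, 0.25):.2%}")
```

Under these assumptions the IFR and CFR routes give the same answer only when the CFR used equals IFR divided by ascertainment, so an IFR applied directly to reported cases will understate deaths whenever testing misses infections.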
Ben, I think you're failing to account for under-testing. You're computing the case fatality rate when you want the infection fatality rate. Most experts, as well as the well-done meta-analyses, place the IFR in the 0.5%-1% range. I'm a little bit confused why you're relying on this back of the envelope rather than the pretty extensive body of work on this question.
IFR isn't that helpful when trying to use public case data to estimate a hazard rate. I'll add a note clarifying that in the post. Since what's reported are cases, case fatalities are the natural thing to multiply the rate of new cases by.
Some apparently expert-promoted models have been total nonsense, and I prefer a back-of-the-envelope calculation whose flaws are obvious and easy for me to understand, to comparatively opaque sophisticated estimates which I can't interpret.
Can you point me to a clear concise account that shows how to estimate IFR with available data and use it in a decision-relevant way?
I'm not confident in 1% as an upper limit (especially in an overrun healthcare system) but I do think that comment gives good back-of-the-envelope estimates (as requested). Later on in that thread CBG also acknowledges it may be higher than 1% in some places and conditions.
Detail in this case is useful as it shows multiple sources and back-of-the-envelope calculations. I'm not really assessing CBG (except trusting that he isn't picking and choosing his arguments), rather I'm assessing his back-of-the-envelope calculation and where likely errors can creep in - exactly what the great-grandparent mentioned was preferred.
If "Greg Cochran says 1.2%" is the counter-argument then I don't really know what to say except how likely is it that he's wrong this time and by what factor might he be off? What's his confidence interval? If someone can provide his working then at least that's something I can assess. It seems he is looking specifically at places with high infection rates and more stretched healthcare systems.
Anyhow, you repudiated this. When I pushed you on it, you came up with the number 1.4%.
The naive central estimate of a single back-of-the-envelope calculation, where virus prevalence in Lombardy was estimated from one small town a month earlier, isn't something I'd put much weight on. If pushed for an interquartile range based only on this calculation, I would say 0.5% < IFR < 3.5%. The point of that calculation wasn't to get an accurate answer but to show that a 0.2% population fatality rate doesn't imply that the IFR is massive, and that 3,000,000 US coronavirus deaths this year is still highly unlikely.
except trusting that he isn't picking and choosing his arguments
Well, don't do that. I told you this before.
What's his confidence interval?
What's CBG's confidence interval? When he says 0.5-1%, does he mean something? Does he mean a confidence interval, or a distribution of "normal" situations or a distribution of more general situations? Or does he not mean anything?
Later on in that thread CBG also acknowledges it may be higher than 1% in some places and conditions.
It's nice that he says that, but that's exactly the situation that you cited him in the other thread, claiming <=1%. I'm guessing that the pseudo-detail is exactly what caused you to not understand his claims. If you don't know what he claims, how can you assess his work? At least with GC you're not fooling yourself about what you've done.
And I still don't know what he claims. He seems to claim that NYC had IFR <=1%. Was NYC normal or not? In any event he's wrong. If NYC defines the upper range, then this affects his conclusion. If NYC doesn't count, I dunno, but I'm pretty sure that people are equivocating on whether it counts.
The CFR will shift substantially over time and location as testing changes. I'm not sure how you would reliably use this information. IFR should not change much and tells you how bad it is for you personally to get sick.
I wouldn't call the model Zvi links expert-promoted. Every expert I talked to thought it had problems, and the people behind it are economists not epidemiologists or statisticians.
Regarding back-of-the-envelope calculations, I think we have different approaches to evidence/data. I started with back-of-the-envelope calculations 3 months ago. But I would have based things on a variety of BOTECs and not a single one. Now I've found other sources that are taking the BOTEC and doing smarter stuff on top of it, so I mostly defer to those sources, or to experts with a good track record. This is easier for me because I've worked full-time on COVID for the past 3 months; if I weren't in that position I'd probably combine some of my own BOTECs with opinions of people I trusted. In your case, I predict that Zvi, if you asked him, would also say the IFR was in the range I gave.
I clicked through to the tweet you mentioned, which contains a screencap of a chart purporting to show "An Approximate Percentage of the Population That Has COVID-19 Antibodies." No dates or other info about how these numbers might have been generated.
Fortunately, Gottlieb's next tweet in the thread contains another screencap of the URLs of the studies mentioned in the chart. I hand-transcribed the Wuhan study URL, and found that while it was performed at a date that's probably helpful (April 20th) it's a study in a single hospital in Wuhan, and the abstract explicitly says it's not a good population estimate:
Here, we reported the positive rate of COVID‐19 tests based on NAT, chest CT scan and a serological SARS‐CoV‐2 test, from April 3 to 15 in one hospital in Qingshan Destrict, Wuhan. We observed a ~10% SARS‐CoV‐2‐specific IgG positive rate from 1,402 tests. Combination of SARS‐CoV‐2 NAT and a specific serological test might facilitate the detection of COVID‐19 infection, or the asymptomatic SARS‐CoV‐2‐infected subjects. Large‐scale investigation is required to evaluate the herd immunity of the city, for the resuming people and for the re‐opened city.
I'd need to know more about e.g. hospitalization rates in Wuhan to interpret this.
The New York numbers seem to come from a press release, with no clear info about how testing was conducted.
All of these are point estimates, and to get ongoing infection rates, I'd need to fit a time series model with too many degrees of freedom. Not saying no one can do this, but definitely saying it's not clear to me how I can make use of these numbers without working on the problem full time for a few weeks.
You've nonspecifically referred to experts and models a few times; that's not helpful and only serves to intimidate. What would be helpful would be if you could point to specific models by specific experts that make specific claims which you found helpful.
I'm not trying to intimidate; I'm trying to point out that I think you're making errors that could be corrected by more research, which I hoped would be helpful. I've provided one link (which took me some time to dig up). If you don't find this useful that's fine, you're not obligated to believe me and I'm not obligated to turn a LW comment into a lit review.
Given that it apparently took you some time to dig up even as much as a tweet with a screen cap of some numbers that with quite a lot of additional investigation might be helpful, I hope you're now at least less "confused" about why I am "relying on this back of the envelope rather than the pretty extensive body of work on this question."
If you want to see something better, show something better.
Because of false positives, seroprevalence is massively overestimated everywhere that there hasn't been a massive outbreak. In the places that have had one, the IFR is 1-2%. But can we extrapolate to normal outbreaks? If, as widely believed, an overrun medical system has worse mortality, then maybe the normal IFR really is only 0.5-1%. But if your meta-analysis directly measures that, it is not well-done.
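The false-positive point can be illustrated with the standard Rogan-Gladen correction for an imperfect test. This is a sketch only: the comment above does not say which correction, if any, the studies used, and the sensitivity, specificity, and positive rates below are hypothetical values picked to show the size of the effect at low prevalence.

```python
def rogan_gladen(apparent_prevalence: float, sensitivity: float,
                 specificity: float) -> float:
    """Estimate true prevalence from the raw positive rate of an imperfect test.

    true = (apparent + specificity - 1) / (sensitivity + specificity - 1),
    clamped to [0, 1] because sampling noise can push the raw estimate
    outside that range.
    """
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("test is no better than chance")
    estimate = (apparent_prevalence + specificity - 1) / denominator
    return min(1.0, max(0.0, estimate))


# Hypothetical test: 90% sensitivity, 98% specificity.
# A raw 3% positive rate shrinks to roughly 1.1% after correction,
# while a raw 20% positive rate barely moves in relative terms.
for raw in (0.03, 0.20):
    corrected = rogan_gladen(raw, sensitivity=0.90, specificity=0.98)
    print(f"raw {raw:.1%} -> corrected {corrected:.1%}")
```

Because IFR is deaths divided by estimated infections, overstating seroprevalence by a factor of two or three in a low-prevalence area understates the IFR by the same factor.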
The intro paragraph seems to be talking about IFR ("around 2% of people who got COVID-19 would die") and suggesting that "we have enough data to check", i.e. that you're estimating IFR and have good data on it.
From your comment "But if you're trying to figure out what rough-and-ready multiplier to apply to the daily numbers reported in your area, then to use IFR estimates, you need to remember that reported cases are not the same as actual infections, and adjust accordingly." I still do not understand how you can translate CFR into the IFR that you really need. How do you "adjust accordingly"?