Coronavirus: California case growth

post by VipulNaik · 2020-03-29T16:14:19.459Z · LW · GW · 11 comments

Contents

  A simple model from true currently-or-eventually-symptomatic cases to confirmed cases to deaths or recoveries
    The model
    Time lags in the model (1 -> 2 -> 3 -> 4)
  Looking at the California data
    Description of the data
    Extrapolating the number and timeline of confirmed positive cases for people already tested
    Thinking about the transitions till testing (1 -> 2 -> 3)
    Is the data good enough to know whether level 2 is sufficient, or whether we need level 3?
  Answers and lessons
    Answers
    Lessons
11 comments

In this post, I try to understand the case growth rate for coronavirus cases in California, and try to address questions such as:

Skip to answers and lessons for my (incomplete and tentative) answers.

NOTE: My original post was based on data till 2020-03-27 (row 17 in the spreadsheet). On 2020-04-01 (April 1, 2020) I made edits to this post of two kinds:

A simple model from true currently-or-eventually-symptomatic cases to confirmed cases to deaths or recoveries

The model

For simplicity, I will use the shorthand "true currently-or-eventually-symptomatic cases" only for cases where a person is already infected and will eventually become symptomatic (so this will include both currently symptomatic cases and cases that are presymptomatic, i.e., will become symptomatic later). I expect that most asymptomatic cases (i.e., cases that never become symptomatic) won't get diagnosed, and therefore won't count in the number of confirmed cases either, so this seems a reasonable approximation for the model I will present below. However, if incorrect, this could cause estimates to be off by a factor of two or more, depending on the fraction of cases that are asymptomatic.

The simplistic model identifies the following flow:

  1. Get infected
  2. Start showing symptoms
  3. Get a test
  4. Get test results
  5. Recover or die

Technically, 5 can happen before 3 or 4; the logical dependencies are 1 -> 2 -> 5 and 1 -> 2 -> 3 -> 4. It's also possible (and probably more likely) that 5 happens after 3 but before 4.

To keep this post focused, I will not discuss 5 here, though it's obviously very important.

Time lags in the model (1 -> 2 -> 3 -> 4)

The total time lag from 1 to 4 shows up as the lag between any trend change in the number of true currently-or-eventually-symptomatic cases, and the corresponding trend change in the number of confirmed cases. The more accurately we can estimate and measure this total time lag, the more accurately we can relate the timing of social distancing measures and the timing of case growth flatlining. Here is what I know:

Using median estimates for each suggests that there is a lag of 3 weeks between trend changes in true currently-or-eventually-symptomatic cases and trend changes in confirmed cases. If this 3-week figure were precise, then the trend in confirmed cases would be a 3-week time translation of the trend in true cases. In practice, however, because each transition has a variable time range, varying across individuals, the true time range is more like 2 to 6 weeks. And rather than a crisp time translation, we see a fuzzy smear -- even if true currently-or-eventually-symptomatic cases flatline immediately after the escalation from level 2 to level 3 (flexible lockdown), the confirmed case count will show no such sharp trend change, instead showing a leveling off over time.
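The "fuzzy smear" can be illustrated with a toy simulation. This is only a sketch under made-up assumptions: the infection series (100 per day, stopping abruptly on day 30) and the triangular lag distribution (2 to 6 weeks, peaking at 3 weeks) are hypothetical, chosen to match the ranges mentioned above, and are not fitted to any data.

```python
def triangular_weights(lo, mode, hi):
    """Return {lag_in_days: weight} for lags lo..hi, peaking at mode, summing to 1.

    The triangular shape is an illustrative assumption, not an estimate.
    """
    raw = {}
    for lag in range(lo, hi + 1):
        if lag <= mode:
            raw[lag] = (lag - lo + 1) / (mode - lo + 1)
        else:
            raw[lag] = (hi - lag + 1) / (hi - mode + 1)
    total = sum(raw.values())
    return {lag: w / total for lag, w in raw.items()}


def confirmed_per_day(infections, weights):
    """Spread each day's infections over later days according to the lag weights."""
    out = []
    for day in range(len(infections)):
        out.append(sum(infections[day - lag] * w
                       for lag, w in weights.items()
                       if day - lag >= 0))
    return out


# Hypothetical scenario: infections stop abruptly on day 30.
infections = [100] * 30 + [0] * 70
weights = triangular_weights(14, 21, 42)   # 2 to 6 weeks, peak at 3 weeks
confirmed = confirmed_per_day(infections, weights)

# New confirmed cases decline gradually from day 44 to day 72
# instead of dropping to zero on a single day.
for day in (43, 51, 58, 65, 72):
    print(day, round(confirmed[day], 1))
```

In this toy scenario the sharp stop on day 30 only begins to show in confirmed cases two weeks later, and takes about four more weeks to fully appear.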

Looking at the California data

Description of the data

Original version written 2020-03-29, possibly edited for clarity but with no substantive model changes.

The California Department of Public Health publishes daily releases on coronavirus case counts as of the previous date. The reports have always included data on the number of confirmed positive cases and the number of deaths. Starting with the release for March 18 (published March 19), the release includes data on the total number of tests and the total number of test results returned.

I put the data together in a spreadsheet where I added columns for the daily increments to each value, as well as some percentages and comparisons of interest. ETA 2020-04-01: I have been updating the spreadsheet daily since writing this post; please see up to row 17 for 2020-03-27 in the spreadsheet to understand the part of it I had in front of me when writing the post. A few notes:

Extrapolating the number and timeline of confirmed positive cases for people already tested

Original version written 2020-03-29, possibly edited for clarity later but with no substantive model changes.

Let's go back to our simple model:

  1. Get infected
  2. Start showing symptoms
  3. Get a test
  4. Get test results
  5. Recover or die

It is quite hard to measure 1 and 2 from the data we have, but we can shed light on 3 and 4 based on the data collected here.

First, as noted in the previous section, the data seems consistent with a 3 -> 4 lag of 5 days or a little more. Specifically, the number of test results on a given day is around 75% to 90% of the number of tests about five days before that. This is consistent with test results taking five days, but some results getting delayed. See column M.

However, as the number of tests has increased quite a bit recently, the lag might increase a lot in the next few days if processing capacity has not kept pace.
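A minimal sketch of the lag comparison described above: divide the test results on each day by the tests from five days earlier. The function name and the sample numbers are hypothetical (they are not the CDPH figures), and whether the spreadsheet column uses cumulative or daily counts is an assumption here; the function works on either, as long as both series are aligned by day.

```python
def lagged_result_ratio(tests, results, lag_days=5):
    """For each day d >= lag_days, return results[d] / tests[d - lag_days].

    Ratios near 1 suggest results are keeping up with a lag of about
    lag_days; a falling ratio suggests a growing processing backlog.
    """
    if len(tests) != len(results):
        raise ValueError("series must be aligned day by day")
    ratios = {}
    for day in range(lag_days, len(results)):
        earlier_tests = tests[day - lag_days]
        if earlier_tests > 0:
            ratios[day] = results[day] / earlier_tests
    return ratios


# Made-up cumulative counts for illustration only.
tests = [1000, 2000, 3000, 5000, 8000, 12000, 20000, 30000]
results = [200, 400, 600, 700, 800, 850, 1700, 2500]

ratios = lagged_result_ratio(tests, results)
for day, ratio in ratios.items():
    print(day, f"{ratio:.0%}")
```

If the ratio starts falling well below the 75% to 90% band, that would be a sign that the 3 -> 4 lag is lengthening.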

Second, we see that right now, the majority of tests don't yet have results (i.e., there is a lot in the 3 -> 4 transition). Therefore, even assuming that there are no more true currently-or-eventually-symptomatic cases coming through 1 -> 2 -> 3 any more, there's still a lot in 3 -> 4 and much of it may be confirmed positive.

Third, at least so far, the cumulative confirmed positive rate (confirmed positive cases as a percentage of test results; see column L) has been going up, albeit slowly. The incremental confirmed positive rate (incremental confirmed positive cases as a percentage of incremental test results; see column K) is more noisy, but is also generally higher in recent days than it was in the beginning. The increase in confirmed positive rate could be because (a) the selection of who takes the test is getting more precise, as people better understand the right symptoms, flu test screening is instituted, and test criteria are improved, or (b) the false negative rate of tests is reduced as tests become more accurate.
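To make the distinction between the two rates concrete, here is a small sketch that derives both from cumulative counts. The function name and the sample numbers are hypothetical and are not taken from the spreadsheet.

```python
def positive_rates(cum_positives, cum_results):
    """Return a list of (cumulative_rate, incremental_rate) per day.

    cumulative_rate  = confirmed positives so far / test results so far
    incremental_rate = new positives that day / new test results that day
    A rate is None when its denominator is zero (or on the first day,
    where no increment exists).
    """
    rows = []
    for day in range(len(cum_results)):
        cumulative = None
        if cum_results[day] > 0:
            cumulative = cum_positives[day] / cum_results[day]
        incremental = None
        if day > 0:
            new_results = cum_results[day] - cum_results[day - 1]
            new_positives = cum_positives[day] - cum_positives[day - 1]
            if new_results > 0:
                incremental = new_positives / new_results
        rows.append((cumulative, incremental))
    return rows


# Made-up counts for illustration only.
cum_positives = [100, 130, 180]
cum_results = [1000, 1200, 1500]
rates = positive_rates(cum_positives, cum_results)
for day, (cumulative, incremental) in enumerate(rates):
    print(day, cumulative, incremental)
```

The incremental rate moves first and is noisier; the cumulative rate follows it slowly because it averages over all earlier results.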

With all these, we can make the following loose predictions:

Based on these considerations, I estimate that, just from the people who have gotten tested so far, we should expect a total of 10,000 to 40,000 cases in California. This is inclusive of the already-diagnosed 4,643 cases. I also expect that, if testing capacity keeps pace with the number of tests done, we will hit this number (somewhere between 10,000 and 40,000) by around Friday, April 3, along with the number of test results getting to equal or exceed the current total number of tests (~89,000).
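The arithmetic behind an estimate of this kind can be sketched as "already confirmed, plus pending tests times an assumed positive rate among them". In the sketch below, the confirmed count and the total number of tests come from the post; the number of results received so far and the candidate positive rates are placeholders chosen only to show the shape of the calculation, and should be replaced with the spreadsheet values.

```python
def projected_total(confirmed, total_tests, results_so_far, pending_positive_rate):
    """Project total confirmed cases once all pending tests have results.

    Assumes every pending test eventually returns a result and that a
    fixed fraction (pending_positive_rate) of them come back positive.
    """
    pending = total_tests - results_so_far
    return confirmed + pending * pending_positive_rate


CONFIRMED = 4_643         # from the post
TOTAL_TESTS = 89_000      # from the post (approximate)
RESULTS_SO_FAR = 25_000   # PLACEHOLDER, not from the post

# Hypothetical positive rates among the pending tests.
for rate in (0.10, 0.25, 0.50):
    total = projected_total(CONFIRMED, TOTAL_TESTS, RESULTS_SO_FAR, rate)
    print(f"rate {rate:.0%}: about {total:,.0f} confirmed cases")
```

The width of the resulting range is driven almost entirely by the assumed positive rate among pending tests, which is why the estimate spans a factor of four.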

Further, I expect that (again assuming that test processing capacity roughly keeps pace) we will see another sharp increase in the incremental confirmed positive case count in the transition from March 28 to March 29 or March 29 to March 30. This will lag by about 5 days the sharp increase from March 23 to March 24 in the total number of tests. More specifically, I expect that the incremental number of confirmed positive cases will go up from its current daily value of ~800 to a few thousand.

Addendum 2020-04-01: Based on data from a few more days of tests (up to row 21 for 2020-03-31 in the spreadsheet), here are my updated thoughts:

Thinking about the transitions till testing (1 -> 2 -> 3)

Original version written 2020-03-29, possibly edited for clarity later but with no substantive model changes.

The data here doesn't give a clear idea of how the transitions from 1 to 2, or from 2 to 3 are proceeding. Nonetheless, it may offer some clues. So first, let's backtrack and think: let's say California going to level 2 or level 3 did in fact effectively stop coronavirus in its tracks. What should we see?

First, keep in mind that there's a time lag 1 -> 2 and a time lag 2 -> 3. When describing the model, we estimated these time lags as 1 week each, so that's a total of 2 weeks. This means that, about 2 weeks after coronavirus is stopped in its tracks, we should see a corresponding change in the trend of the number of true currently-or-eventually-symptomatic cases that are getting tests.

One complication is that, because there is huge variation between people and between regions in the 1 -> 2 time lag and in the 2 -> 3 time lag, we won't see a sharp trend change after 2 weeks. Rather, we'll see the trend change happening a little more gradually.

Another complication: even if the rate at which true currently-or-eventually-symptomatic cases are getting to the testing stage drops, the number of other cases (e.g., people with a cold, flu, or allergy) getting the test may increase. In that case, we may not see a decrease in the number of tests being done. So, more accurately, we should see at least one of these:

- A drop in the incremental number of tests each day.
- A drop in the confirmed positive rate on tests (but this metric is available at a further lag of 5 to 7 days).

Unfortunately, we aren't seeing the second yet. As for the first, the transition data from March 26 to March 27 suggests that yes, we are seeing a drop in the incremental number of tests (the increment went down from 10,600 to 1,200). But that's just one day of data. If we see a similar drop persist, that might mean that we are finally seeing the lagged effects of escalating to level 2 or level 3. A week after that we should see a drop in the growth rate of confirmed positive cases.

Addendum 2020-04-01: In the above para, I noted a sharp drop in the incremental number of tests a day. The reduced number has been sustained over the days since then, but it's hard to get a clear idea because CDPH is also making adjustments to address double-counting of tests. Nonetheless, tentative evidence is consistent with (but doesn't strongly support) the idea that the growth of true currently-or-eventually-symptomatic cases slowed down a few weeks ago.

Is the data good enough to know whether level 2 is sufficient, or whether we need level 3?

My rough estimate is that California achieved level 2 starting around March 11 to March 13, and escalated to level 3 around March 17 to March 19. The gap is about one week. This is a really small gap, and is dwarfed by the range of variation in the time lag. If case counts level off in the next one or two weeks, we won't have good enough data to say whether level 2 was sufficient, or the escalation to level 3 was necessary.

Of course, while aggregate data may not say much, it is still possible that more detailed analysis of individual cases will answer the question. Specifically, we would need to identify the number of individual cases where we expect that they got the infection in the time period when California was level 2. However, because of the long period between getting exposed and showing symptoms, we may have a large number of cases where we are pretty uncertain.

Answers and lessons

Answers

I summarize the predictions from this post here.

Lessons

11 comments

Comments sorted by top scores.

comment by ChristianKl · 2020-03-30T08:37:27.855Z · LW(p) · GW(p)
For simplicity, I will use the term "true cases"

I don't think this is a time to make up new LW terminology without good reason. It would be worthwhile to look up the established term from the literature before making up terms like this.

Replies from: VipulNaik, Lukas_Gloor, leggi
comment by VipulNaik · 2020-04-02T01:57:28.549Z · LW(p) · GW(p)

Thank you for the feedback. I agree with Lukas Gloor's reply below that the choice of term is confusing as it differs from what people may intuitively think "true cases" means. I also agree with his remark that setting terminology that is consistent with reality isn't bad in and of itself.

I have therefore changed "true cases" to "true currently-or-eventually-symptomatic cases". I think that provides the level of precision needed for our purposes. I haven't found a better term after some searching (though not a lot); however, I'm happy to change to a more concise and medically accepted term if I get to learn of one.

comment by Lukas_Gloor · 2020-03-31T02:29:02.412Z · LW(p) · GW(p)
I don't think this is a time to make up new LW terminology without good reason. It would be worthwhile to look up the established term from the literature before making up terms like this.

Downvoted for tone and the effect I tentatively think this might have on people's motivation to go through the trouble of writing up their interesting ideas.

(If you want LW to become increasingly more similar to a forum for academic discussions, then sure, might be good to give feedback this way. But I don't see why that should be the primary aim.)

comment by leggi · 2020-03-31T14:51:14.776Z · LW(p) · GW(p)

Strongly up-voted because I believe the tone of a comment shouldn't be as an important consideration as the point being made.

Interesting ideas are good, feedback for the further development should also be considered good.

I still think rationality means thinking rationally.

(And I've a couple of big doses of unexplained negative karma on the posts I've created and would have much preferred some comment/feedback whatever the tone it took and Christian was one of the few that provided some.)

Replies from: Lukas_Gloor
comment by Lukas_Gloor · 2020-03-31T20:21:38.055Z · LW(p) · GW(p)
the tone of a comment shouldn't be as an important consideration as the point being made.

Tentatively agree, but in this case the point was about a mostly aesthetic (though common) preference for established terminology, which has nothing to do with anything of substance. It's fair to point out that not everyone cares equally little about written appearances, but it seems uncalled for to frame it in a way as though the author was violating a norm. (If people now want strict academic norms for a community blog that initially was started by Eliezer Yudkowsky of all people, that's another discussion.)

I still think rationality means thinking rationally.

One feature of that is to notice instances when usually sound heuristics are drifting apart from the actual goal. Some people can't help but feel increasingly more averse to making posts on here if they frequently encounter feedback that makes them feel as though they did something wrong for sharing their thoughts in a suboptimal fashion. Maybe you're not high on neuroticism, maybe ChristianKl isn't high on it, and maybe Vipul isn't either. But I wouldn't be surprised if people high on neuroticism are overrepresented among rationalists – maybe just not among the ones who frequently post here (and that's my point). So just because some people wouldn't get discouraged by slightly pedantic criticism worded in a judgmental fashion doesn't mean it's not discouraging for anyone. And it doesn't help that you're implicitly suggesting that people are being less rational if criticism affects them more. If some portion of the population is afraid of spiders, you don't throw spiders at them and say "being rational is about not being affected by negative emotions." Okay, bad analogy: Criticism usually correlates with truth seeking; throwing spiders does not. However, I think many of the people who are unusually discouraged by judgmentally-worded criticism are discouraged precisely because they take criticism in general unusually seriously. That's often a virtue. I think LW culture has drifted toward an equilibrium where some traits that usually correlate with rationality are rewarded too much, and other qualities, which can often be virtuous too (in the right person/combination) are written off as attempts to undermine truth seeking. I think that's an example of a common failure mode for communities, where signalling dynamics combine with selection effects created by the signalling until what's left is a culture that is unhealthily extreme on some dimensions, but few in that culture are aware enough to notice.

(And I've a couple of big doses of unexplained negative karma on the posts I've created and would have much preferred some comment/feedback whatever the tone it took and Christian was one of the few that provided some.)

It's good when people explain why they downvoted something, and I think harsh feedback can be really valuable. I also realize that for some people it's difficult to word their feedback nicely (this applies to me too if it concerns a dimension I strongly care about). Usually I agree with your sentiment that it's better to get the criticism in whatever form, if the alternative is not hearing it at all. But that stops applying if the points are sufficiently minor and the tone sufficiently discouraging. (And continuing to try to give feedback well continues to be important even if we – reluctantly rather than triumphantly – have to agree that lapses are usually to be excused for the greater good of rationality.)

Replies from: ChristianKl
comment by ChristianKl · 2020-04-01T11:56:06.977Z · LW(p) · GW(p)
Tentatively agree, but in this case the point was about a mostly aesthetic (though common) preference for established terminology, which has nothing to do with anything of substance.

The idea that good ontology is not about anything of substance is one with which I strongly disagree. I remember the time when Trump criticized the WHO's case fatality rate numbers as wrong because they weren't the infection fatality rate. You have a media that's not smart enough to tell the difference and report sanely on it to resolve such a conflict by saying "well Trump confused CFR with IFR". This unskilled way of dealing with ontology likely resulted in people thinking the WHO was less informed than warranted and Trump was more and as a result people died.

Getting ontology right is key to thinking as a society in a good way about this crisis. I think there are cases where introducing new concepts is fine but this isn't one of them.

It seems like you downvoted because you think I used a serious tone when the point I wanted to make was minor. I think you made a mistake and assessed the situation wrongly.

Furthermore, Vipul is a person who has paid research assistants (or at least had in the past) and who has been through bigger internet conflicts. I think he's a person for whom it's justified to have higher quality standards than for a random newbie.

Replies from: Lukas_Gloor
comment by Lukas_Gloor · 2020-04-01T13:31:31.963Z · LW(p) · GW(p)
It seems like you downvoted because you think I used a serious tone when the point I wanted to make was minor. I think you made a mistake and assessed the situation wrongly.

Yes, this is what happened. I didn't read closely enough and I thought what Vipul decided to call "true cases" was simply the total number of infections. But he wanted to specifically refer to only the infections that were going to become symptomatic at some point. I agree that this is making a distinction that doesn't carve reality at its joints. On top of that the label seems to have misleading connotations (evidenced by me having misunderstood what he meant:)). I agree that this can be risky in this context especially.

I'm reversing the downvote! I don't see though how outsiders could have immediately inferred from your comment that you object to how Vipul drew categories instead of merely his use of non-standard terminology. I think it's innocuous to use non-standard terminology if one is not the WHO, and if the choice of terminology is intuitive and carves reality at its joints.

And about the WHO example, I totally agree. I criticized the WHO for the same reason here: https://www.metaculus.com/questions/3755/what-will-be-the-ratio-of-fatalities-to-total-estimated-infections-for-covid-19-by-the-end-of-2020/#comment-23097

Replies from: VipulNaik
comment by VipulNaik · 2020-04-02T01:59:31.824Z · LW(p) · GW(p)

Thank you for the feedback (and also for discussing this at length which gave me better understanding of the nuances). I modified to a more clumsy but hopefully a more what-you-see-is-what-I-mean term: https://www.lesswrong.com/posts/mRkWTpH9mb8Wdpcn5/coronavirus-california-case-growth?commentId=GHSEwZwR2TSkyzpdm [LW(p) · GW(p)]

comment by cata · 2020-03-30T00:14:22.633Z · LW(p) · GW(p)
The data here doesn't give a clear idea of how the transitions from 1 to 2, or from 2 to 3 are proceeding. Nonetheless, it may offer some clues. So first, let's backtrack and think: let's say California going to level 2 or level 3 did in fact effectively stop coronavirus in its tracks. What should we see?

Ideally, we should see the number of people with coronavirus getting the test drop a lot. However, that doesn't necessarily mean that the total number of people getting the test drops, because many people who don't have the disease may also start getting tested, causing the total number of people getting tested to increase. So, more accurately, we should see one of these:

- A drop in the incremental number of tests each day.
- A drop in the confirmed positive rate on tests (but this metric is available at a further lag of 5 to 7 days).

I think it could take longer before either of these reflects the change in true cases. Here's an argument. Suppose:

- Current testing policy is declining to test many symptomatic people due to lack of capacity. I believe this is true (high-risk people, essential workers, and known contacts to existing cases are being prioritized.)
- As test availability improves, testing policy will change to test broader categories of symptomatic people, up to the testing capacity.
- The number of true cases is substantially higher than the number of other ailments that basically look the same. As a result, P(positive test | symptomatic) remains high and doesn't change much even if you halve the number of true cases. Probably variance in testing policy and test accuracy will drown out the change.

If this is right, as long as the number of true cases remains above the threshold of testing capacity, we get roughly the same number output on the metrics you mentioned, no matter whether it's 10 times above capacity or 1000 times above capacity. So if we're way above capacity right now, we won't see a decrease in true cases show up in those metrics for a while.

Replies from: VipulNaik
comment by VipulNaik · 2020-03-30T02:17:17.013Z · LW(p) · GW(p)

What I wrote there was assuming that the number of new true cases drops to a fairly low level. Whether that happens now or a week or two or three later is unclear; if the 2 -> 3 backlog is growing, then resolving that backlog will add more delay.

I posited us already being at this point as the "optimistic" scenario.

I'll reword the post to clarify this.

Replies from: VipulNaik
comment by VipulNaik · 2020-04-02T02:01:03.130Z · LW(p) · GW(p)

I did some rewording of the post that made it a little more wordy, but fingers crossed that that part has now become less confusing.