Posts

How to give effectively to US Dems 2024-09-24T14:38:29.678Z
GDP per capita in 2050 2024-05-06T15:14:30.934Z
Should we break up Google DeepMind? 2024-04-22T09:16:31.201Z
Let's Fund: Impact of our $1M crowdfunded grant to the Center for Clean Energy Innovation 2024-04-04T16:28:32.371Z
The Bletchley Declaration on AI Safety 2023-11-01T11:44:42.587Z
M&A in AI 2023-10-31T12:20:18.362Z
The AI Boom Mainly Benefits Big Firms, but long-term, markets will concentrate 2023-10-29T08:38:23.327Z
Overview of how AI might exacerbate long-running catastrophic risks 2023-08-07T11:53:29.171Z
When training AI, we should escalate the frequency of capability tests 2023-08-04T16:07:33.776Z
Hauke Hillebrandt's Shortform 2023-07-07T12:39:06.698Z
UK PM: $125M for AI safety 2023-06-12T12:33:21.372Z
The case for C19 being widespread 2020-03-28T00:07:27.878Z
Preprint says R0=~5 (!) / infection fatality ratio=~0.1%. Thoughts? 2020-03-20T11:37:34.488Z

Comments

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Habryka's Shortform Feed · 2024-12-10T11:05:09.634Z · LW · GW

This lag effect might amplify a lot more when big budget movies about SBF/FTX come out.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on GDP per capita in 2050 · 2024-05-07T07:27:00.273Z · LW · GW

Yes, good catch, this is based on research from the World Values Survey - I've added a citation.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on GDP per capita in 2050 · 2024-05-06T17:44:11.062Z · LW · GW

I checked. It's 0.67.

 

This seems to come from European countries.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on GDP per capita in 2050 · 2024-05-06T16:44:54.123Z · LW · GW

Yeah I actually do cite that piece in the appendix 'GDP as a proxy for welfare' where I list more literature like this. So yeah, it's not a perfect measure but it's the one we have and 'all models are wrong but some are useful' and GDP is quite a powerful predictor of all kinds of outcomes: 

In a 2016 paper, Jones and Klenow used measures of consumption, leisure, inequality, and mortality, to create a consumption-equivalent welfare measure that allows comparisons across time for a given country, as well as across countries.[6] 

This measure of human welfare suggests that the true level of welfare of some countries differs markedly from the level that might be suggested by their GDP per capita. For example, France’s GDP per capita is around 60% of US GDP per capita.[7] However, France has lower inequality, lower mortality, and more leisure time than the US. Thus, on the Jones and Klenow measure of welfare, France’s welfare per person is 92% of US welfare per person.[8]

Although GDP per capita is distinct from this expanded welfare metric, the correlation between GDP per capita and this expanded welfare metric is very strong at 0.96, though there is substantial variation across countries, and welfare is more dispersed (standard deviation of 1.51 in logs) than is income (standard deviation of 1.27 in logs).[9]

 

GDP per capita is also very strongly correlated with the Human Development Index, another expanded welfare metric.[10] If measures such as these are accurate, this shows that income per head explains most of the observed cross-national variation in welfare. It is a distinct question whether economic growth explains most of the observed variation across individuals in welfare. It is, however, clear that it explains a substantial fraction of the variation across individuals.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Matt Goldenberg's Short Form Feed · 2024-04-24T14:38:09.910Z · LW · GW

You can compute where energy is cheap, then send the results (e.g. weights, inference) on to wherever they are needed.

But Amazon just rented half a nuclear power plant (1GW) in Pennsylvania, so maybe it doesn't make sense now.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on AI #59: Model Updates · 2024-04-11T15:04:26.405Z · LW · GW

Gemini 1.5 Pro summary

This document explores recent developments in the AI landscape, focusing on language models and their potential impact on society. It delves into various aspects like capabilities, ethical considerations, and regulatory challenges.

Key Highlights:

Advancements in Language Models:

  • Claude 3 by Anthropic now utilizes tools, including other models, showcasing increased capability and potential risks like jailbreaking and influencing other AI systems.
  • Gemini 1.5 by Google is available to everyone with promises of future integrations, prompting discussions on its system prompt limitations and the need for more user control over responses.
  • GPT-4-Turbo receives substantial upgrades, especially in coding and reasoning, but concerns about transparency and potential performance variations remain.
  • OpenAI's potential development of GPT-5 sparks debates on the reasons for its delay, emphasizing the importance of rigorous safety testing before release.

Ethical and Societal Concerns:

  • The increasing persuasiveness of language models raises questions about manipulation and misinformation.
  • The use of copyrighted material in training data raises legal and ethical concerns, with potential solutions like mandatory licensing regimes being explored.
  • The rise of AI-generated deepfakes poses challenges to information authenticity and necessitates solutions like watermarking and detection software.
  • Job application processes might be disrupted by AI, leading to potential solutions like applicant review systems and matching algorithms.
  • The impact of AI on social media usage remains complex, with contrasting views on whether AI digests will decrease or increase time spent on these platforms.

Regulatory Landscape:

  • Experts propose regulations for AI systems that cannot be safely tested, emphasizing the need for proactive measures to mitigate potential risks.
  • Transparency in AI development, including timelines and safety protocols, is crucial for informed policy decisions.
  • The introduction of the AI Copyright Disclosure Act aims to address copyright infringement concerns and ensure transparency in data usage.
  • Canada's investment in AI infrastructure and safety initiatives highlights the growing focus on responsible AI development and competitiveness.

Additional Points:

  • The document explores the concept of "AI succession" and the ethical implications of potentially superintelligent AI replacing humans.
  • It emphasizes the importance of accurate and nuanced communication in discussions about AI, avoiding mischaracterizations and harmful rhetoric.
  • The author encourages active participation in shaping AI policy and emphasizes the need for diverse perspectives, including those of AI skeptics.

Overall, the document provides a comprehensive overview of the current AI landscape, highlighting both the exciting advancements and the critical challenges that lie ahead. It emphasizes the need for responsible development, ethical considerations, and proactive regulatory measures to ensure a safe and beneficial future with AI.


 

Comment by Hauke Hillebrandt (hauke-hillebrandt) on AI #59: Model Updates · 2024-04-11T15:03:41.158Z · LW · GW

Claude Opus AI summary:

The attached document is an AI-related newsletter or blog post by the author Zvi, covering a wide range of topics related to recent developments and discussions in the field of artificial intelligence. The post is divided into several sections, each focusing on a specific aspect of AI.

The main topics covered in the document include:

  1. Recent updates and improvements to AI models like Claude, GPT-4, and Gemini, as well as the introduction of new models like TimeGPT.
  2. The potential utility and limitations of language models in various domains, such as mental health care, decision-making, and content creation.
  3. The increasing capabilities of AI models in persuasive writing and the implications of these advancements.
  4. The release of the Gemini system prompt and its potential impact on AI development and usage.
  5. The growing concern about deepfakes and the "botpocalypse," as well as potential solutions to combat these issues.
  6. The ongoing debate surrounding copyright and AI, with a focus on the use of copyrighted material for training AI models.
  7. The ability of AI models to engage in algorithmic collusion when faced with existing oligopolies or auction scenarios.
  8. The introduction of new AI-related legislation, such as the AI Copyright Disclosure Act, and the need for informed policymaking in the AI domain.
  9. The importance of safety testing for advanced AI systems and the potential risks associated with developing AI that cannot be adequately tested for safety.
  10. The ongoing debate between AI alignment researchers and AI accelerationists, and the potential for accelerationists to change their stance as AI capabilities advance.
  11. A challenge issued by Victor Taelin to develop an AI prompt capable of solving a specific problem, which was successfully completed within a day, demonstrating the rapid progress and potential of AI.
  12. The controversial views of Richard Sutton on the inevitability of AI succession and the potential for human extinction, as well as the debate surrounding his statements.
  13. The growing public concern about AI posing an existential risk to humanity and the need for informed discussion and action on this topic.

Throughout the document, the author provides commentary, analysis, and personal opinions on the various topics discussed, offering insights into the current state of AI development and its potential future implications. The post also includes various tweets, quotes, and references to other sources to support the points being made and to provide additional context to the discussion.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Reverse Regulatory Capture · 2024-04-11T14:54:36.502Z · LW · GW

cf

“The Bootleggers and Baptists effect describes cases where an industry (e.g. bootleggers) agrees with prosocial actors like regulators (e.g. Baptists) to regulate more (here, to ban alcohol during Prohibition) to maximize profits and deter entry. This seems to be happening in AI where the industry lobbies for stricter regulation. Yet, in the EU, OpenAI lobbied to water down EU AI regulation to not classify GPT as 'high risk' to exempt it from stringent legal requirements.[1] In the US, the FTC recently said that Big Tech intimidates competition regulators.[2] Capture can also manifest by passively accepting industry practices, which is problematic in high-risk scenarios where thorough regulation is key. After all, AI expertise gathers in particular geographic communities. We must avoid cultural capture when social preferences interfere with policy, since regulators interact with workers from regulated firms. Although less of a concern in a rule-based system, a standard-based system would enable more informal influence via considerable regulator discretion. We must reduce these risks, e.g. by appointing independent regulators and requiring public disclosure of regulatory decisions.”

“Big Tech also takes greater legal risks by aggressively (and illegally) collecting data with negative externalities for users and third parties (similarly, Big Tech often violates IP [3] while lobbying against laws to stop patent trolling, claiming they harm real patents, but actually, this makes new patents from startups worth less and more costly to enforce).[4]”

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Problems with Robin Hanson's Quillette Article On AI · 2023-08-07T10:27:55.611Z · LW · GW

Hanson Strawmans the AI-Ruin Argument

 

I don't agree with Hanson generally, but I think there's something there: rationalist AI risk public outreach has overemphasized first-principles thinking, theory, and logical possibilities (e.g. evolution, gradient descent, the human-chimp analogy) over more concrete, tangible empirical findings (e.g. deception emerging in small models, specification gaming, LLMs helping to create WMDs, etc.).

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Hauke Hillebrandt's Shortform · 2023-07-07T12:39:06.781Z · LW · GW

AI labs should escalate the frequency of tests for how capable their model is as they increase compute during training

Comments on the doc welcome.

Inspired by ideas from Lucius Bushnaq, David Manheim, Gavin Leech, but any errors are mine.

— 

AI experts almost unanimously agree that AGI labs should pause the development process if sufficiently dangerous capabilities are detected. Compute, algorithms, and data form the AI triad, the main inputs to producing better AI. AI models work by using compute to run algorithms that learn from data. AI progresses due to more compute, which doubles every 6 months; more data, which doubles every 15 months; and better algorithms, which halve the need for compute every 9 months and for data every 2 years.

And so, better AI algorithms and software are key to AI progress (they also increase the effective compute of all chips, whereas improving chip design only improves new chips.)
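As a rough illustration of how these trends compound, here is a minimal sketch that combines the doubling times quoted above. It assumes (this is not stated in the comment) that hardware compute and algorithmic efficiency grow independently and multiply into "effective compute", so that their growth rates add. The function name `combined_doubling_time` is made up for this example.

```python
def combined_doubling_time(*doubling_times_months: float) -> float:
    """Doubling time of a product of independently growing factors.

    When factors multiply, their growth rates (doublings per month) add,
    so the combined doubling time is the reciprocal of the summed rates.
    """
    rate = sum(1.0 / t for t in doubling_times_months)
    return 1.0 / rate


# Figures quoted in the text: compute doubles every 6 months, and better
# algorithms halve the compute needed every 9 months.
effective = combined_doubling_time(6, 9)
print(f"Effective compute doubles every {effective:.1f} months")
print(f"Growth over one year: {2 ** (12 / effective):.1f}x")
```

Under that multiplicative assumption, the two trends together imply a shorter doubling time than either one alone, which is the sense in which algorithmic progress raises the effective compute of all chips.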

While training AI models like GPT-4 has so far cost only ~$100M, most of the cost comes from running them, as evidenced by OpenAI charging its millions of users $20/month with a cap on usage; running the models costs ~1 cent / 100 words.

And so, AI firms could train models with much more compute now and might develop dangerous capabilities.

We can more precisely measure and predict in advance how much compute we use to train a model in FLOPs. Compute is also more invariant vis-a-vis how much it will improve AI than are algorithms or data. We might be more surprised by how much effective compute we get from better / more data or better algorithms, software, RLHF, fine-tuning, or functionality (cf DeepLearning, transformers, etc.). AI firms increasingly guard their IP and by 2024, we will run out of public high-quality text data to improve AI. And so, AI firms like DeepMind will be at the frontier of developing the most capable AI. 

To avoid discontinuous jumps in AI capabilities, they must never train AI with better algorithms, software, functionality, or data using an amount of compute similar to what was used previously; rather, they should use much less compute first, pause the training, and compare how much better the model got in terms of loss and capabilities compared to the previous frontier model.

Say we train a model on better data using much less compute than we used for the last training run. If, during a pause and evaluation at an early stage, the model is surprisingly better than the previous frontier model (trained with a worse algorithm) was at a comparable stage, it means there will be discontinuous jumps in capabilities ahead, and we must stop the training. Software to do this should be freely available to warn anyone training AI, as well as implemented server-side cryptographically so that researchers don't have to worry about their IP, and policymakers should force everyone to implement it.

There are two kinds of performance/capabilities metrics:

  1. Upstream info-theoretic: Perplexity / cross entropy / bits-per-character. Cheap. 
  2. Downstream noisy measures of actual capabilities: like MMLU, ARC, SuperGLUE, Big Bench. Costly.

AGI labs might already measure upstream capabilities, as they are cheap to measure. But so far, no one is running downstream capability tests mid-training run, and we should subsidize and enforce such tests. Researchers should formalize and algorithmize these tests and show how reliably they can be proxied with upstream measures. They should also develop a bootstrapping protocol analogous to ALBA, which has the current frontier LLM evaluate the downstream capabilities of a new model during training.

Of course, if you look at deep double descent ('Where Bigger Models and More Data Hurt'), inverse scaling laws, etc., capabilities can emerge far later in the training process. Looking at graphs of performance / loss over the training period, one might not know until halfway through (where the training cutoff might itself only be decided during the process) that a model is doing much better than previous approaches - and it could look worse early on. Cross-entropy loss improves even for small models, while downstream metrics remain poor. This suggests that downstream metrics can mask improvements in log-likelihood. This analysis doesn't explain why downstream capabilities emerge or how to predict when they will occur. More research is needed to understand how scale unlocks emergent abilities and to predict them. Moreover, some argue that emergent behavior is independent of how granular a downstream evaluation metric is (e.g. whether it uses an exact string match or another evaluation metric that awards partial credit), but these results were only tested at every order of magnitude of FLOPs.

And so, during training, as we increase the compute used, we must escalate the frequency of automated checks as the model approaches the performance of the previous frontier models (e.g. exponentially shorten the testing intervals after 10^22 FLOPs). We must automatically stop the training well before the model is predicted to reach the capabilities of the previous frontier model, so that we do not far surpass it. Alternatively, one could autostop training when it seems on track to reach the level of ability / accuracy of the previous models, to evaluate what the trajectory at that point looks like.
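The escalating schedule described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a protocol from the comment: the gap between evaluations is measured in log10 FLOPs, it halves after each evaluation once training passes the escalation threshold (10^22 FLOPs in the example), and a hypothetical `should_pause` check stops training once a benchmark score reaches a chosen fraction of the previous frontier model's score. All names and default values are invented for the example.

```python
import math


def eval_checkpoints(start_flops, escalate_from, stop_flops,
                     base_gap=1.0, shrink=0.5, min_gap=0.01):
    """Return the compute levels (in FLOPs) at which to run capability evals.

    Below `escalate_from`, evals are `base_gap` orders of magnitude apart.
    From `escalate_from` on, the gap (in log10 FLOPs) is multiplied by
    `shrink` after every eval, but never drops below `min_gap`.
    """
    log_c = math.log10(start_flops)
    log_stop = math.log10(stop_flops)
    log_escalate = math.log10(escalate_from)
    gap = base_gap
    checkpoints = []
    while log_c <= log_stop:
        checkpoints.append(10 ** log_c)
        if log_c >= log_escalate - 1e-9:  # tolerate float rounding
            gap = max(gap * shrink, min_gap)
        log_c += gap
    return checkpoints


def should_pause(score, frontier_score, margin=0.9):
    """Hypothetical stop rule: pause once the new model reaches `margin`
    times the previous frontier model's score on the same benchmark."""
    return score >= margin * frontier_score


schedule = eval_checkpoints(start_flops=1e20, escalate_from=1e22, stop_flops=1e23)
for flops in schedule[:6]:
    print(f"run evals at {flops:.3g} FLOPs")
```

The `min_gap` floor is a practical choice: without it, halving intervals would bunch up just short of the target compute and never reach it.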

Figure from: 'Adjacent plots for error rate and cross-entropy loss on three emergent generative tasks in BIG-Bench for LaMDA. We show error rate for both greedy decoding (T = 0) as well as random sampling (T = 1). Error rate is (1- exact match score) for modified arithmetic and word unscramble, and (1- BLEU score) for IPA transliterate.'

Figure from: 'Adjacent plots for error rate, cross-entropy loss, and log probabilities of correct and incorrect responses on three classification tasks on BIG-Bench that we consider to demonstrate emergent abilities. Logical arguments only has 32 samples, which may contribute to noise. Error rate is (1- accuracy).'

Comment by Hauke Hillebrandt (hauke-hillebrandt) on UK PM: $125M for AI safety · 2023-06-13T16:40:04.240Z · LW · GW

ARC's GPT-4 evaluation is cited in the FT article, in case that was ambiguous.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on UK PM: $125M for AI safety · 2023-06-13T11:48:15.708Z · LW · GW

Agreed, the initial announcement read like AI safety washing and more political action is needed, hence the call to action to improve this.

But read the taskforce leader’s op-ed:

  1. He signed the pause AI petition.
  2. He cites ARC’s GPT-4 evaluation and Lesswrong in his AI report which has a large section on safety.
  3. “[Anthropic] has invested substantially in alignment, with 42 per cent of its team working on that area in 2021. But ultimately it is locked in the same race. For that reason, I would support significant regulation by governments and a practical plan to transform these companies into a Cern-like organisation. We are not powerless to slow down this race. If you work in government, hold hearings and ask AI leaders, under oath, about their timelines for developing God-like AGI. Ask for a complete record of the security issues they have discovered when testing current models. Ask for evidence that they understand how these systems work and their confidence in achieving alignment. Invite independent experts to the hearings to cross-examine these labs. [...] Until now, humans have remained a necessary part of the learning process that characterises progress in AI. At some point, someone will figure out how to cut us out of the loop, creating a God-like AI capable of infinite self-improvement. By then, it may be too late.”

Also the PM just tweeted about AI safety

Generally, this development seems more robustly good and the path to a big policy win for AI safety seems clearer here than past efforts trying to control US AGI firms optimizing for profit. Timing also seems much better as things look way more ‘on’ now. And again, even if the EV sign of the taskforce flips, $125M is only about 0.6% of the $21B invested in AGI firms this year.

Are you saying that, as a rule, ~EAs should stay clear of policy for fear of tacit endorsement, which has caused harm and made damage control much harder and we suffer from cluelessness/clumsiness? Yes, ~EA involvement has in the past sometimes been bad, accelerated AI, and people got involved to get power for later leverage or damage control (cf. OpenAI), with uncertain outcomes (though not sure it’s all robustly bad - e.g. some say that RLHF was pretty overdetermined). 

I agree though that ~EA policy pushing for mild accelerationism vs. harmful actors is less robust (cf. the CHIPS Act, which I heard a wonk call the most aggressive US foreign policy in 20 years), so would love to hear your more fleshed-out pushback on this - I remember reading somewhere that you’ve also had a major rethink recently vis-a-vis unintended consequences from EA work?

Comment by Hauke Hillebrandt (hauke-hillebrandt) on UK PM: $125M for AI safety · 2023-06-12T18:29:33.345Z · LW · GW

Ian Hogarth, who is leading the task force, is on record saying that AGI could lead to “obsolescence or destruction of the human race” if there’s no regulation on the technology’s progress.

Matt Clifford is also advising the task force - he is on record having said the same thing and knows a lot about AI safety. He had Jess Whittlestone & Jack Clark on his podcast.

If mainstream AI safety is useful and doesn't increase capabilities, then the taskforce and the $125M seem valuable.

If it improves capabilities, then it's a drop in the bucket in terms of overall investment going into AI.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Bing Chat is blatantly, aggressively misaligned · 2023-02-19T19:09:03.874Z · LW · GW

a large part of those 'leaks' are fake

 

Can you give concrete examples?

Comment by Hauke Hillebrandt (hauke-hillebrandt) on April Coronavirus Open Thread · 2020-04-17T10:12:34.838Z · LW · GW

[Years of life lost due to C19]

A recent meta-analysis looks at C-19-related mortality by age groups in Europe and finds the following age distribution:

< 40: 0.1%

40-69: 12.8%

≥ 70: 84.8%

In this spreadsheet model I combine this data with Metaculus predictions to get at the years of life lost (YLLs) due to C19.

I find C19 might cause 6m - 87m YLLs (highly dependent on # of deaths). For comparison, substance abuse causes 13m, diarrhea causes 85m YLLs.

Countries often spend 1-3x GDP per capita to avert a DALY, and so the world might want to spend $2-8trn to avert C19 YLLs (could also be a rough proxy for the cost of C19).

One of the many simplifying assumptions of this model is that it excludes disability caused by C19 - which might be severe.
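For readers who want to see the shape of such a calculation, here is a minimal sketch of a years-of-life-lost estimate. It is not the spreadsheet model itself. The death shares are the ones quoted above, but the remaining-life-expectancy figures and the death counts are hypothetical placeholders chosen only to make the example run; they should be replaced with real life-table values and forecasts.

```python
def years_of_life_lost(total_deaths, death_share_by_age, remaining_years_by_age):
    """Sum over age groups of deaths in the group times the average
    remaining life expectancy in that group."""
    return sum(
        total_deaths * share * remaining_years_by_age[age]
        for age, share in death_share_by_age.items()
    )


# Shares of deaths by age group, as quoted above (they sum to 97.7%, not 100%).
death_share = {"<40": 0.001, "40-69": 0.128, ">=70": 0.848}

# HYPOTHETICAL placeholders, not from the source: average remaining life
# expectancy (in years) of people dying in each age group.
remaining_years = {"<40": 50.0, "40-69": 25.0, ">=70": 10.0}

# HYPOTHETICAL death counts, standing in for forecast scenarios.
for deaths in (1_000_000, 10_000_000):
    yll = years_of_life_lost(deaths, death_share, remaining_years)
    print(f"{deaths:,} deaths -> {yll / 1e6:.1f}m YLLs")
```

Because most deaths fall in the oldest group, the result is driven mainly by the total number of deaths and by the remaining life expectancy assumed for people aged 70 and over.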

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T18:28:20.113Z · LW · GW

Very good analysis.

I also thought your recent blog was excellent and think you should make it a top level post:

https://entersingularity.wordpress.com/2020/03/23/covid-19-vs-influenza/

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T16:15:37.612Z · LW · GW

Cheers - have taken this point out.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T16:09:54.494Z · LW · GW

Cruise ship passengers are a non-random sample with perhaps higher co-morbidities.

The cruise ships analysed are a non-random sample: "at least 25 other cruise ships have confirmed COVID-19 cases"

Being on a cruise ship might increase your risk because of dose response https://twitter.com/robinhanson/status/1242655704663691264

Onboard IFR estimated as 1.2% (0.38-2.7%) https://www.medrxiv.org/content/10.1101/2020.03.05.20031773v2

Ioannidis: “A whole country is not a ship.”

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T15:46:43.107Z · LW · GW

Thanks Pablo for your comment and helping to clarify this point. I'm sorry if I was being unclear.

I understand what you're saying. However:

  • I realize that the Oxford study did not collect any new empirical data that in itself should cause us to update our views.
  • The authors make the assumption that the IFR is low and the virus is widespread and find that it fits the present data just as well as high IFR and low spread. But it does not mean that the model is merely theoretical: the authors do fit the data on the current epidemic.
  • This is not different from what the Imperial study does: the Imperial authors do not know the true IFR but just assume a high one and see whether it fits the present data well.
  • But indeed, on a meta-level the Oxford study (not the modelling itself) is evidence in favor of low IFR. When experts believe something to be plausible, then this too is evidence that the theory is more likely to be true and we should update. An infinite number of models can explain any dataset and the authors only find these two plausible.
  • By coming out and suggesting that this is a plausible theory, especially by going to the media, the authors have gotten a lot of flak for this ("Irresponsible" - see Twitter etc.). So they have indeed put their reputation on the line. This is despite the fact that the authors are prudent and say that high IFR is also plausible and also fits the data.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T14:04:59.344Z · LW · GW

Cheers- corrected.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T10:45:28.298Z · LW · GW
It looks more like you listed all the evidence you could find for the theory and didn't do anything else.

That was precisely my ambition here - as highlighted in the title ("The case for c19 being widespread"). I did not claim that this was an even-handed take. I wanted to consider the evidence for a theory that only very few smart people believe. I think such an exercise can often be useful.

I don't think this is actually how selection effects work.

The professor acknowledges that there are problems with self-selection, but given that there are very specific symptoms (thousands of people with loss of smell), I don't think that selection effects can describe all the data. Then he just argues for the Central Limit Theorem.

That the asymptomatic rate isn't all that high, and in at least one population where everybody could get a test, you don't see a big fraction of the population testing positive.

There's no random population-wide antibody testing as of yet.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T10:30:28.158Z · LW · GW

I do not think that can be used as decisive evidence to falsify widespread infection.

This is a non-random village in Italy, so of course, some villages in Italy will show very high mortality just by chance.

That region of Italy has high smoking rates, very bad air pollution, and the oldest age structure outside of Japan.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T09:44:05.659Z · LW · GW
By the end of its odyssey, a total of 712 of them tested positive, about a fifth.

Perhaps others on the ship had already cleared the virus and were asymptomatic. PCR only works for a week. Also there might have been false negatives. I disagree that the age and comorbidity structure can only lead to skewed results by a factor of two or three, because this assumes that there are few asymptomatic infections (I'm arguing here that the age tables are wrong).

In my post, I've argued why the data out of China might be wrong.

Iceland's data might be wrong because it is based on PCR not serology, which means that many people might have already cleared the infection, and it is also not random.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T02:39:31.136Z · LW · GW

That's true and that's what they were criticized for.

They argued that the current data we observe can also be explained by low IFR and widespread infection. They called for widespread serological testing to see which hypothesis is correct.

If in the next few weeks we see a high percentage of people with antibodies then it's true.

In the meantime, I thought it might be interesting to see what other evidence there is for infection being widespread, which would suggest that IFR is low.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T02:35:36.080Z · LW · GW

No. My ambition here was a bit simpler. I have presented only a rough qualitative argument that infection is already widespread, plus a toy model. There are some issues with this and I haven't done formal modelling. For instance, I think this would be what is called the "crude IFR", but the time-lag-adjusted IFR (~30 days from infection to death) might increase the death toll.
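To make the crude versus lag-adjusted distinction concrete, here is a small sketch. It assumes pure exponential growth in cumulative infections with a fixed doubling time, which is a strong simplification and not something claimed in the comment; the function names and all numbers in the example are hypothetical.

```python
def crude_ifr(cumulative_deaths, cumulative_infections):
    """Deaths so far divided by infections so far."""
    return cumulative_deaths / cumulative_infections


def lag_adjusted_ifr(cumulative_deaths, cumulative_infections,
                     lag_days, doubling_time_days):
    """Divide deaths by the infections that existed `lag_days` ago.

    Assumes cumulative infections double every `doubling_time_days`,
    so the count `lag_days` ago was smaller by 2 ** (lag / doubling time).
    """
    infections_at_lag = cumulative_infections / 2 ** (lag_days / doubling_time_days)
    return cumulative_deaths / infections_at_lag


# HYPOTHETICAL numbers, for illustration only.
deaths, infections = 1_000, 1_000_000
print(f"crude IFR:        {crude_ifr(deaths, infections):.2%}")
print(f"lag-adjusted IFR: {lag_adjusted_ifr(deaths, infections, 30, 10):.2%}")
```

While an epidemic is still growing, today's deaths reflect the smaller pool of people infected roughly a month earlier, so the lag-adjusted figure is always at least as large as the crude one.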

Also, currently every death in Italy where coronavirus is detected is recorded as a C19 death.

FWIW, if the UK death toll surpasses 10,000, then this wouldn't fit very well with this hypothesis.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T02:32:43.350Z · LW · GW
The point remains: given that some people have such a different theory, it's unclear how many supporting pieces of evidence you should expect to see, and it's important to compare the evidence against the theory to the evidence for it.

Yes, that's what I'm trying to do here. I feel this is a neglected take and on the margin more people should think about whether this theory is true, given the stakes.

Presumably some of these people are hypochondriacs or have the flu? Also, I bet people with symptoms are more likely to use the app.
With all due respect it's not that hard to get data that you yourself find convincing, even if you're a professor.

""Our first analysis showed we're picking up roughly that one in 10 have the classical symptoms," he said. "So of the 650,000, we would expect to see 65,000 cases.

"Although you can have problems of self-selection and bias, when you’ve got big data like this you tend to trust it more. What we're seeing is a lot of mild symptoms, so I think having this data should help people relax a bit more and stop seeing it as an all or nothing Black Death situation.

"Other symptoms are cropping up. Thousands of people are coming forward to say they have loss of taste, and we may start to see clusters of symptoms.""

https://www.telegraph.co.uk/news/2020/03/25/monitoring-app-suggests-65-million-people-uk-may-already-have/

They do meet more different populations of people though. So if a small number of cities have relatively widespread infection, people who visit many cities are unusually likely to get infected.

You'd expect to see many severe cases amongst people who travelled for business a lot in January and February.

Not likely. About 1% of Icelanders without symptoms test positive, and all the stats on which tested people are asymptomatic that I've seen (Iceland, Diamond Princess) give about 1/2 asymptomatic at time of testing (presumably many later get sick).

I don't quite understand what you're saying here.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T01:48:52.622Z · LW · GW

I'm not impressed by the comment about this paper here on LW or the twitter link in it.

This paper was written by an international team of highly cited disease modellers who know about the Diamond Princess and have put their reputation on the line to make the case that the hypothesis of a high infection rate and low infection fatality might be true.

I think it is a realistic range that this many people are already infected and are asymptomatic. Above I've tried to summarize and review the relevant evidence that fits with this hypothesis.

But I'm not ruling out the more common theory (that we have maybe only 10x the 500k confirmed cases). I just find it less likely.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T01:38:34.045Z · LW · GW
There were a few dengue cases in Australia and Florida where it is unusual

Dengue "popping up in unusual places" makes me think it's more likely that a high proportion of the massive Dengue outbreaks in Latin America might actually be C19.

One person had persistent negative swab, but tested positive through fecal samples...
“Chinese journalists have uncovered other cases of people testing negative six times before a seventh test confirmed they had the disease.”

This is just to lend credence to the paper that shows there had been 2 million infections in China in January.

I find it very unlikely on the face of it that China, or any country for that matter, managed to completely suppress a disease so contagious that it's now in almost every country on earth.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T01:31:44.105Z · LW · GW
This seems pretty hard to evaluate because with a large number of published pre-prints on the outbreak, it's not very surprising that there would be many suggesting higher-than-expected spread.

No, this is different. I'm not just cherry picking the tail-end of a normal distribution of IFRs etc. The Gupta study in particular and some of the other studies suggest a fundamentally different theory of the pandemic.

Presumably some of these people are hypochondriacs or have the flu? Also, I bet people with symptoms are more likely to use the app.

Yes, but similarly there are many asymptomatic people who do not use the app. The King's Professor seems to find this number convincing.

Couldn't this be explained by those populations travelling more, shaking more hands, meeting more people, etc.?

Tom Hanks, Prince Charles and Boris Johnson don't meet more people every day than your typical Uber driver, cashier, etc. There are millions of people working in retail. We don't see them all having it. My theory is that they're tested often, and not that "there's a lot of C19 in Westminster".

Iceland has 2 deaths and 97 recoveries. I would say that isn't good evidence for an IFR of under 0.3%.

Crucially depends on the asymptomatic rate, which might very well be very high.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on The case for C19 being widespread · 2020-03-28T01:00:52.970Z · LW · GW

If the Gupta study is true, then a rough approximation of the IFR (ignoring lag) would be:

IFR = Number of UK deaths (~750) / 36-68% of the UK population (66 million).

So 0.002% to 0.003%.

In Italy, with almost 10k deaths it would be 0.02%-0.04%
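
The arithmetic above can be reproduced with a minimal sketch. The UK figures are the ones quoted in this comment; the Italian population (~60 million) is not stated above and is an assumption, and "almost 10k" is rounded up to 10,000, so the Italian numbers are only indicative. The function name `ifr_percent` is made up for illustration.

```python
def ifr_percent(deaths: float, population: float, infected_share: float) -> float:
    """Naive IFR in percent: deaths divided by the assumed number of infections.

    Ignores the lag between infection and death, as the comment does.
    """
    infected = population * infected_share
    return 100.0 * deaths / infected


# Figures quoted in the comment
UK_DEATHS = 750
UK_POPULATION = 66e6

# Assumptions for illustration: "almost 10k" rounded up, population not given above
ITALY_DEATHS = 10_000
ITALY_POPULATION = 60e6

# Infected share of the population under the Gupta study's scenarios
INFECTED_SHARES = (0.36, 0.68)

if __name__ == "__main__":
    for share in INFECTED_SHARES:
        uk = ifr_percent(UK_DEATHS, UK_POPULATION, share)
        italy = ifr_percent(ITALY_DEATHS, ITALY_POPULATION, share)
        print(f"{share:.0%} infected: UK IFR {uk:.4f}%, Italy IFR {italy:.3f}%")
```

The result is very sensitive to the assumed infected share and to the death count used, which is why the lag caveat matters.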

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Preprint says R0=~5 (!) / infection fatality ratio=~0.1%. Thoughts? · 2020-03-24T20:21:18.485Z · LW · GW

Another preprint suggesting that half or more of the UK population is already infected:

FT coverage:

https://www.ft.com/content/5ff6469a-6dd8-11ea-89df-41bea055720b

study:

https://www.dropbox.com/s/oxmu2rwsnhi9j9c/Draft-COVID-19-Model%20%2813%29.pdf?dl=0

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Preprint says R0=~5 (!) / infection fatality ratio=~0.1%. Thoughts? · 2020-03-21T16:55:04.805Z · LW · GW

from supplementary materials:

"DISCLAIMER: The following estimates were computed using 2010 US Census data with 2016 population projections and the percentages of clinical cases and mortality events reported in Mainland China by the Chinese Center for Disease Control as of February 11th, 2020. CCDC Weekly / Vol. 2 / No. 8, page 115, Table 1. The following estimates represent a worst-case scenario, which is unlikely to materialize. • Maximum number of symptomatic cases = 34,653,921 • Maximum number of mild cases = 28,035,022 • Maximum number of severe cases = 4,782,241 • Maximum number of critical cases = 1,628,734 • Maximum number of deaths = 3,439,516"

https://drive.google.com/drive/folders/18qaRKnQG1GoXamnzJwkHu2GG9xCe4w8_

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Preprint says R0=~5 (!) / infection fatality ratio=~0.1%. Thoughts? · 2020-03-21T16:45:49.804Z · LW · GW

And yet another preprint estimating the R0 to be 26.5:

Quotes from paper:

"The size of the COVID-19 reproduction number documented in the literature is relatively small. Our estimates indicate that R0= 26.5, in the case that the asymptomatic sub-population is accounted for. In this scenario, the peek of symptomatic infections is reached in 36 days with approximately 9.5% of the entire population showing symptoms, as shown in Figure 3."

I think they estimate about 1 million severe cases in the US alone if left unchecked at the peak.

"It is unlikely that a pathogen that blankets the planet in three months can have a basic reproduction number in the vicinity of 3, as it has been reported in the literature (19–24). SARS-CoV-2 is probably among the most contagious pathogens known. Unlike the SARS-CoV epidemic in 2003 (25), where only symptomatic individuals were capable of transmitting the disease. Asymptomatic carriers of the COVID-19 virus are most likely capable of transmission to the same degree as symptomatic."

"This study shows that the population of individuals with asymptomatic COVID-19 infections are driving the growth of the pandemic. The value of R0 we calculated is nearly one order of magnitude larger than the estimates that have been communicated in the literature up to this point in the development of the pandemic"

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Preprint says R0=~5 (!) / infection fatality ratio=~0.1%. Thoughts? · 2020-03-20T23:48:13.992Z · LW · GW

And another preprint saying there were 700k+ cases in China on 13th of March:

"Since severe cases, which more likely lead to fatal outcomes, are detected at a higher percentage than mild cases, the reported death rates are likely inflated in most countries. Such under-estimation can be attributed to under-sampling of infection cases and results in systematic death rate estimation biases. The method proposed here utilizes a benchmark country (South Korea) and its reported death rates in combination with population demographics to correct the reported COVID-19 case numbers. By applying a correction, we predict that the number of cases is highly under-reported in most countries. In the case of China, it is estimated that more than 700.000 cases of COVID-19 actually occurred instead of the confirmed 80,932 cases as of 3/13/2020."

This also implies a lower fatality rate than previously thought (perhaps less than 0.5%: 3k deaths in China / 700k actual cases ≈ 0.43%).

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Preprint says R0=~5 (!) / infection fatality ratio=~0.1%. Thoughts? · 2020-03-20T23:23:33.548Z · LW · GW

New editorial about the asymptomatic rate in Nature - the authors of the preprint above are featured in this as well. They say the asymptomatic and mild case rate might be up to 50% of all infections and that these people are infectious.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Preprint says R0=~5 (!) / infection fatality ratio=~0.1%. Thoughts? · 2020-03-20T15:29:58.879Z · LW · GW

As mentioned in a comment above, one of the (pretty highly credentialed) authors of this preprint has written two papers on the Diamond Princess, and so, excuse the appeal to authority, but any argument against this paper based on the Diamond Princess doesn't seem likely to invalidate the conclusions of this preprint.

Also, this seemingly squares more with John Ioannidis's take on Corona:

"no countries have reliable data on the prevalence of the virus in a representative random sample of the general population."

And that airborne-ish transmission is highly likely.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Preprint says R0=~5 (!) / infection fatality ratio=~0.1%. Thoughts? · 2020-03-20T15:19:02.104Z · LW · GW

Not sure: the Diamond Princess is mentioned in this preprint and in fact one of the authors of this preprint wrote two papers on the Diamond Princess:

https://scholar.google.com/citations?hl=en&user=OW5PDVgAAAAJ&view_op=list_works&sortby=pubdate

So I think they thought about this.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Should we build thermometer substitutes? · 2020-03-20T12:00:35.820Z · LW · GW

The first paper that I cite has a very illustrative video and is a seminal paper in this field.

Table 8 in the review paper that you refer to shows a trend of estimation techniques getting better over time. In the latest study from 5 years ago the mean error was down to 6.47.

My broader point is:

  • the error rate might be brought down even further by better methods, video quality, and priors
  • this might make it a valid proxy for fever
  • this might be very cost-effective on a population level, given the zero marginal cost of software

However, I do agree that this is not trivial.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Should we build thermometer substitutes? · 2020-03-19T21:25:43.795Z · LW · GW
That's false. The accuracy isn't high. I learned from the last conversation I had with EA who had a startup that did this, that the accuracy isn't high enough to be useful medically.

Interesting data point - there are several papers on this that say it's a reliable way to measure heart rate (error of less than 10 bpm; see "Heart rate estimation using facial video"). Perhaps the error could be brought down much further by throwing more engineering brains, computation and priors at it.

Where do those ≥38°C come from? From what I read the Chinese are using 37.3°C as a cut-off for medical decision making with COVID-19.

I saw this number in some places - for instance:

https://www.who.int/csr/disease/coronavirus_infections/InterimRevisedSurveillanceRecommendations_nCoVinfection_03Dec12.pdf

https://www.nejm.org/doi/full/10.1056/NEJMc2003100

But perhaps your number is better (source: https://www.who.int/csr/disease/coronavirus_infections/InterimRevisedSurveillanceRecommendations_nCoVinfection_03Dec12.pdf ).

I think there might be non-trivial differences due to time of the day and ethnicity as well.

Comment by Hauke Hillebrandt (hauke-hillebrandt) on Should we build thermometer substitutes? · 2020-03-19T15:29:03.827Z · LW · GW

I had the idea below and pitched it to OpenAI - they said "we looked into this and dont think we can do a great job with it :(" - but perhaps people here might be interested to explore it further.

Idea for zero marginal cost, digital thermometer to help contain coronavirus:

  1. Heart rate can be estimated via (webcam or smartphone) video of someone’s face with high accuracy (even with poor video quality).[1],[2]
  2. This heart rate might then be used to detect fever[3] (perhaps even to estimate core temperature).[4] Priors such as demographic data could be used to aid detection. For instance, a mean heart rate of 80+ bpm over an hour in young healthy men seems to be a robust predictor of fever.[3]
  3. Fever (body temperature ≥38°C) is the most typical symptom of C19 - in 88% of confirmed cases.[5] (Though some C19 transmission might be asymptomatic[6] and presymptomatic.[7],[8])
  4. A smartphone or web app (à la donottouchyourface.com) could be a digital fever thermometer. A webcam could continuously monitor people’s temperature and alert them if they have a fever (it might detect anomalous increases in heart rate).
  5. ‘Thermometer Guns’ have drawbacks: they’re more expensive, you need to get close to someone’s head to take their temperature, they are not very accurate, and they don’t provide continuous measurement - yet they are still used for coronavirus containment.[9]

This might be a very cost-effective intervention to diagnose coronavirus.
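
A rough sketch of the alerting step in points 2 and 4, assuming heart-rate estimates (in bpm) are already being produced from video by some other component. The 80 bpm threshold and one-hour window come from the example in point 2; everything else (the function names, the `(timestamp, bpm)` sample format) is hypothetical, and this is an illustration of the idea rather than a validated medical method.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

# Assumption: threshold taken from the "80+ over an hour" example in point 2,
# which was reported for young healthy men only.
FEVER_HR_THRESHOLD_BPM = 80.0
WINDOW_SECONDS = 3600  # "over an hour"

Sample = Tuple[float, float]  # (timestamp in seconds, estimated heart rate in bpm)


def hourly_mean_hr(samples: Sequence[Sample], now: float) -> Optional[float]:
    """Mean heart rate over the last hour, or None if there are no recent samples."""
    recent = [bpm for t, bpm in samples if now - WINDOW_SECONDS <= t <= now]
    return mean(recent) if recent else None


def possible_fever(
    samples: Sequence[Sample],
    now: float,
    threshold: float = FEVER_HR_THRESHOLD_BPM,
) -> bool:
    """Flag a possible fever when the hourly mean heart rate exceeds the threshold."""
    avg = hourly_mean_hr(samples, now)
    return avg is not None and avg > threshold


if __name__ == "__main__":
    # Hypothetical readings: (seconds since start, bpm)
    readings = [(0.0, 70.0), (1800.0, 90.0), (3600.0, 95.0)]
    print(possible_fever(readings, now=3600.0))
```

In practice the threshold would need to be adjusted with priors such as age, sex, and resting heart rate, and exercise or stress would produce false positives.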

Audio could be recorded to detect dry cough.[10], [11]

Can Smart Thermometers Track the Spread of the Coronavirus?

Non-EEG Dataset for Assessment of Neurological Status v1.0.0

[1] "Detecting Pulse from Head Motions in Video - People.csail.mit ...." http://people.csail.mit.edu/balakg/pulsefromheadmotion.html. Accessed 18 Mar. 2020.

[2] "Heart rate estimation using facial video: A review - ScienceDirect." https://www.sciencedirect.com/science/article/abs/pii/S1746809417301362. Accessed 18 Mar. 2020.

[3] "Fever and Cardiac Rhythm | JAMA Internal Medicine | JAMA ...." https://sci-hub.tw/https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/606966. Accessed 18 Mar. 2020.

[4] "Real-time core body temperature estimation from heart ... - NCBI." 13 May. 2015, https://www.ncbi.nlm.nih.gov/pubmed/25967760. Accessed 18 Mar. 2020.

[5] "Report of the WHO-China Joint Mission on Coronavirus ...." https://www.who.int/docs/default-source/coronaviruse/who-china-joint-mission-on-covid-19-final-report.pdf. Accessed 18 Mar. 2020.

[6] "Presumed Asymptomatic Carrier Transmission of COVID-19 ...." 21 Feb. 2020, https://jamanetwork.com/journals/jama/fullarticle/2762028. Accessed 18 Mar. 2020.

[7] "Potential Presymptomatic Transmission of SARS-CoV ... - NCBI." https://www.ncbi.nlm.nih.gov/pubmed/32091386. Accessed 18 Mar. 2020.

[8] "Transmission interval estimates suggest pre-symptomatic ...." 6 Mar. 2020, https://www.medrxiv.org/content/10.1101/2020.03.03.20029983v1. Accessed 18 Mar. 2020.

[9] "'Thermometer Guns' on Coronavirus Front Lines Are ...." 14 Feb. 2020, https://www.nytimes.com/2020/02/14/business/coronavirus-temperature-sensor-guns.html. Accessed 18 Mar. 2020.

[10] "A Cough-Based Algorithm for Automatic Diagnosis of ... - NCBI." 1 Sep. 2016, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5008773/. Accessed 18 Mar. 2020.

[11] "Cough Sounds | SpringerLink." https://link.springer.com/chapter/10.1007/978-3-319-71824-8_15. Accessed 18 Mar. 2020.