Bayesian Reasoning on Maps
post by Sjlver (jonas-wagner) · 2025-01-22T10:45:03.584Z
This is a link post for https://blog.purpureus.net/posts/bayesian-reasoning-on-maps/
Contents
- Reasoning on maps, explained
- Choosing the right zoom level
- A personal takeaway
This is a linkpost for an article I've written for my blog. Readers of LessWrong may want to skip the intro about Bayesian Reasoning, but might find the application to the Peter Miller vs Rootclaim debate quite interesting.
I’ve been a fan of Bayesian Reasoning since I read Harry Potter and the Methods of Rationality. In a nutshell, Bayesian Reasoning is a way to believe true things. It is a method for updating one’s beliefs in light of evidence, so that one ends up with more credence in beliefs that match the evidence.
While Bayesian Reasoning (Wikipedia) is not the only method to find true conclusions, it’s the method with the best mathematical explanation of why it works. However, the method can be difficult to use in practice.
One example that illustrates this well is the Rootclaim Covid Case. Rootclaim, a project by Saar Wilf, uses the Bayesian method to make various claims, including the controversial claim that COVID-19 originated from a lab leak at the Wuhan Institute of Virology. This claim was examined in a long, high-stakes debate between Rootclaim and Peter Miller. Rootclaim lost the debate but continues to argue that the Bayesian method overwhelmingly supports the lab leak theory. What went wrong?
Reasoning on maps, explained
Bayesian Reasoning updates the belief in a hypothesis based on how well the hypothesis predicts the available evidence. Evidence that is more likely if the hypothesis is true than under the alternatives counts in its favor; evidence that is less likely under the hypothesis counts against it.
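The update described above is often written in odds form: posterior odds = prior odds × likelihood ratio, where the likelihood ratio compares how probable the evidence is under the hypothesis with how probable it is under the alternative. Here is a minimal Python sketch of that rule; the function names and example numbers are hypothetical and chosen only for illustration.

```python
def update_odds(prior_odds: float, p_e_given_h: float, p_e_given_alt: float) -> float:
    """Return the posterior odds of H versus the alternative after observing evidence E.

    Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.
    """
    if p_e_given_alt <= 0:
        raise ValueError("the evidence must be possible under the alternative")
    likelihood_ratio = p_e_given_h / p_e_given_alt
    return prior_odds * likelihood_ratio


def odds_to_probability(odds: float) -> float:
    """Convert odds H:alt into P(H), assuming H and the alternative are the only options."""
    return odds / (1.0 + odds)


# Evidence twice as likely under H counts in its favor (odds 1 -> 2);
# evidence half as likely under H counts against it (odds 1 -> 0.5).
print(update_odds(1.0, 0.4, 0.2))
print(update_odds(1.0, 0.1, 0.2))
```

A likelihood ratio above 1 pushes the odds up, and one below 1 pushes them down, no matter how "well" the evidence seems to match in isolation.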
In this case, the hypothesis is “COVID-19 originated in a lab leak”. Rootclaim starts by saying that relevant coronavirus research happens in just two labs worldwide, in North Carolina, USA (80%) and Wuhan, China (20%). They then argue that lab security is particularly bad at the Wuhan Institute of Virology, so they give it a 4x boost. This brings the odds between the labs from 80:20 to 80:80. In other words, if COVID-19 did originate in a lab leak, the leak would come from the Wuhan Institute of Virology in 50% of cases, from North Carolina in 50% of cases, and from nowhere else. See their argument here.
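The arithmetic of the two-lab split can be checked in a few lines. This sketch uses exact fractions so the 50/50 result is not blurred by floating-point rounding; the 80/20 split and the 4x factor are the numbers quoted above, and the variable names are illustrative.

```python
from fractions import Fraction

# Rootclaim's stated split of relevant coronavirus research between the two labs.
lab_share = {"North Carolina": Fraction(80), "Wuhan": Fraction(20)}

# A 4x boost for weaker lab security at the Wuhan Institute of Virology: 80:20 becomes 80:80.
lab_share["Wuhan"] *= 4

# Normalize the odds into probabilities of the leak location, given a lab leak.
total = sum(lab_share.values())
p_origin_given_leak = {lab: share / total for lab, share in lab_share.items()}
print(p_origin_given_leak)  # both labs end up at 1/2
```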
When visualized on a map, this makes the conclusion quite intuitive: If the origin is a lab leak, then we would expect to see it in the US or China. Thus, considering just the country of origin, the fact that initial cases were in China is weak evidence in favor of the hypothesis. In contrast, if the initial cases were observed in, say, Italy, that would be weak evidence against the hypothesis. On the map, I’ve visualized this by drawing the probability density in red. Instead of being uniformly spread over the entire world, it is concentrated on the US and China.
It’s essential to understand that the total probability has to sum to one. In other words, if the map is a darker red somewhere, it has to be lighter elsewhere. The hypothesis defines how evidence would be distributed on the map if the hypothesis were true, and Bayesians judge the hypothesis by how well the observations fit these expectations.
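To make the sum-to-one constraint concrete, here is a minimal Python sketch that represents a hypothesis as a probability distribution over map regions. The region names and weights are hypothetical; the point is only that darkening one region and renormalizing necessarily lightens all the others.

```python
def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Scale non-negative weights so that they sum to one (a probability distribution)."""
    total = sum(weights.values())
    return {region: w / total for region, w in weights.items()}


# Hypothetical coarse map for a lab-leak hypothesis: most mass on two countries.
leak_map = normalize({"USA": 45.0, "China": 45.0, "Rest of world": 10.0})
print(round(sum(leak_map.values()), 10))  # 1.0

# Make China "darker" by doubling its weight, then renormalize:
# every other region must lose probability to compensate.
boosted = normalize({**leak_map, "China": 2 * leak_map["China"]})
for region in leak_map:
    print(region, round(leak_map[region], 3), "->", round(boosted[region], 3))
```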
Choosing the right zoom level
We know more than just the country of origin of COVID-19: the first cases appeared in Wuhan. Thus, a more precise map would contract all of China’s probability mass around the city of Wuhan. In graphical terms, the corresponding surface is now smaller, and a darker red.
Note how this changes the interpretation of evidence. Under this more precise model, most Chinese cities would now count against the hypothesis. For example, if patient zero were found in Beijing, that would reduce our belief in a lab leak. However, patient zero in Wuhan would be stronger evidence for the lab leak.
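Zooming in can be modeled as splitting one region's probability mass among finer cells. The sketch below assumes a hypothetical split of China's mass (almost all of it on Wuhan) and a hypothetical alternative hypothesis with made-up city probabilities; neither set of numbers comes from Rootclaim or from the debate. It shows the pattern described above: Wuhan now yields a likelihood ratio well above 1, while Beijing yields one below 1.

```python
def refine(coarse: dict[str, float], region: str, split: dict[str, float]) -> dict[str, float]:
    """Zoom in: replace `region` by finer cells that share its probability mass.

    `split` gives each finer cell's fraction of the region's mass and must sum to one,
    so the refined map still sums to one overall.
    """
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to one")
    fine = {r: p for r, p in coarse.items() if r != region}
    for cell, fraction in split.items():
        fine[cell] = coarse[region] * fraction
    return fine


# Country-level lab-leak map from the 50/50 two-lab split.
country_map = {"USA": 0.5, "China": 0.5}

# Hypothetical split: nearly all of China's mass contracts onto Wuhan.
city_map = refine(country_map, "China", {"Wuhan": 0.98, "Beijing": 0.005, "Other Chinese cities": 0.015})

# Hypothetical alternative hypothesis that spreads first cases more evenly across cities.
alternative = {"Wuhan": 0.01, "Beijing": 0.02}
for city, p_alt in alternative.items():
    print(city, "likelihood ratio:", round(city_map[city] / p_alt, 3))
```

With these made-up numbers, a first case in Wuhan gives a likelihood ratio of about 49, and one in Beijing about 0.125.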
Rootclaim stops here, as far as maps are concerned. They argue that “Location Wuhan” warrants a 54x boost to their lab leak odds, and conclude that lab leak is the most likely hypothesis. Video here.
However, given our precise knowledge of where the first COVID-19 cases appeared, and given that the disease spreads from person to person through close contact, the above map does not have the right zoom level. We can zoom in further, and we would then expect the probability mass to be distributed more like this:
With this more detailed map, the initial cases become evidence against the lab leak hypothesis. Indeed, if there were a lab leak, we would expect many of the initial cases to have a connection with the lab. But this is not the case, neither geographically nor socially.
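At this zoom level, a map can be approximated by a density over city coordinates. The following sketch is illustrative only: it assumes each hypothesis places a 2-D Gaussian "blob" of probability around its origin (the Institute for a lab leak, the market for the competing hypothesis), and the coordinates, spread, and case locations are all invented, not real geography or real case data. It sums per-case log likelihood ratios, treating cases as independent.

```python
import math


def log_density(point: tuple[float, float], center: tuple[float, float], scale_km: float) -> float:
    """Log of an isotropic 2-D Gaussian density (per km^2) centered on `center`."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    return -(dx * dx + dy * dy) / (2 * scale_km**2) - math.log(2 * math.pi * scale_km**2)


# Hypothetical local grid in km; these are NOT real coordinates of Wuhan landmarks.
INSTITUTE = (0.0, 0.0)  # center of the lab-leak map
MARKET = (12.0, 8.0)    # center of the competing market-origin map
SCALE_KM = 3.0          # assumed spread of early cases around the true origin

# Hypothetical early-case locations, clustered around the market.
cases = [(11.5, 8.2), (12.4, 7.1), (13.0, 9.0), (10.8, 7.6)]

# Sum of per-case log likelihood ratios (lab leak vs. market origin).
log_lr = sum(
    log_density(c, INSTITUTE, SCALE_KM) - log_density(c, MARKET, SCALE_KM)
    for c in cases
)
print(f"log likelihood ratio, lab leak vs. market origin: {log_lr:.1f}")
```

A negative log ratio means the observed locations fit the market-centered map better; working in log space avoids numerical underflow when many cases are combined.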
Proponents of the lab leak hypothesis have made various arguments for why significant probability mass should be on the Huanan Seafood Market: it is connected by public transport to other parts of the city, including the Institute of Virology; it has conditions that would favor the spread of the virus; and so on. But this contradicts the earlier claim that the Institute of Virology is the origin. The laws of probability imply that we can’t have it both ways: the more probability mass we put on the Institute of Virology, the less mass we can put elsewhere.
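The trade-off can be seen with a deliberately coarse, hypothetical two-cell lab-leak map: a fraction w of the mass on the Institute of Virology and the rest on the market. Assuming early cases are independent draws from this map, the probability that all of them appear at the market is (1 − w)^n, which shrinks as the hypothesis commits more firmly to the Institute. The function name and the case count are illustrative assumptions.

```python
def p_cases_at_market(w_institute: float, n_cases: int) -> float:
    """Probability that n independent early cases all appear at the market, under a
    hypothetical two-cell lab-leak map: mass w_institute on the Institute of Virology
    and the remaining 1 - w_institute on the market."""
    if not 0.0 <= w_institute <= 1.0:
        raise ValueError("w_institute must lie in [0, 1]")
    return (1.0 - w_institute) ** n_cases


# The more firmly the hypothesis commits to the Institute, the worse it explains market cases.
for w in (0.9, 0.5, 0.1):
    print(f"w_institute={w}: P(5 cases at market) = {p_cases_at_market(w, 5):.2e}")
```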
We find that the location of the first COVID-19 cases favors the lab leak hypothesis only at one specific zoom level, namely when we look at cities. At this level, there is an apparent connection between the first cases (found in Wuhan) and the Institute of Virology (located in Wuhan). However, at the zoom level that best fits the case at hand, this spurious connection disappears.
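The zoom-level effect can be reproduced with a toy hierarchy of (country, city, district) cells. In this sketch, both probability maps and the district names are hypothetical, with numbers chosen only to illustrate the pattern described above; they are not estimates from Rootclaim or from the debate. For each zoom level, it sums each hypothesis's mass over the cells that match the observed location at that level and compares the two.

```python
Cell = tuple[str, str, str]  # (country, city, district)


def mass_at_zoom(dist: dict[Cell, float], observed: Cell, level: int) -> float:
    """Total probability of the cells that match `observed` in their first `level` components."""
    return sum(p for cell, p in dist.items() if cell[:level] == observed[:level])


# Hypothetical maps; the numbers are illustrative only.
lab_leak = {
    ("China", "Wuhan", "Institute district"): 0.495,
    ("China", "Wuhan", "Market district"): 0.005,
    ("USA", "North Carolina", "Lab district"): 0.5,
}
alternative = {
    ("China", "Wuhan", "Market district"): 0.01,
    ("China", "Wuhan", "Institute district"): 0.0005,
    ("China", "Other cities", "-"): 0.2,
    ("Rest of world", "-", "-"): 0.7895,
}
observed = ("China", "Wuhan", "Market district")

for level, name in [(1, "country"), (2, "city"), (3, "district")]:
    lr = mass_at_zoom(lab_leak, observed, level) / mass_at_zoom(alternative, observed, level)
    print(f"{name:>8}: likelihood ratio {lr:.2f}")
```

With these made-up numbers, the ratio is about 2.4 at the country level and about 48 at the city level, but 0.5 at the district level: the apparent support for a lab leak depends on stopping at the city zoom level.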
A personal takeaway
I admit that I was puzzled, and at times swayed, by Rootclaim’s use of Bayesian Reasoning. Their presentation looked sound initially, and quite different from the typical conspiracy theory. I had formed false beliefs. It is only after watching the debate between Peter Miller and Rootclaim, writing down my own thoughts, and visualizing the probabilities on maps, that I feel confident in two things:
- Bayesian Reasoning can be a good way to form true beliefs.
- COVID-19’s geographic origins do not support a lab leak hypothesis.
I want to express my thanks and admiration to Peter Miller, Eric Stansifer and Will Van Treuren. Peter debated Rootclaim and made a great case for COVID-19’s zoonotic origin. Eric and Will were the impartial judges in the debate. They saw the flaws in Rootclaim’s reasoning earlier and more clearly than I did. Final thanks go to Scott Alexander, who has a wonderful blog and wrote an excellent summary of the debate.