The Math of Suspicious Coincidences
post by Roko · 2024-02-07T13:32:35.513Z · LW · GW · 3 comments
I've been doing a bit of thinking about covid-19 origins and how to formalize the circumstantial evidence that covid-19 leaked from the Wuhan Institute of Virology (WIV). I am beginning to suspect that a careful formalization of this evidence could be overwhelmingly strong, perhaps strong enough to convict without even needing to look at any other evidence.
But one needs to carefully deal with objections like https://slatestarcodex.com/2016/11/05/the-pyramid-and-the-garden/
The Great Pyramid at Giza's physical footprint intersects the line of latitude whose value in degrees is the speed of light in m/s divided by 10⁷ (29.9792458° N). If you examine this carefully it's about a 1 in 10⁵ coincidence, since the Pyramid is 0.2 km wide and there are about 20,000 km of latitude lines.
There is seemingly a reductio ad absurdum of reasoning purely from coincidences here. We know for certain that this really is a coincidence, yet naively one might be 99.999% confident that time travelling aliens helped to place the pyramid.
However, I think one can explain away coincidences like this as a sort of hash collision between sets of prominent numbers.
If there are say, 50 prominent places (7 wonders of the ancient world, 12 wonders of the modern world, birthplaces of top scientists, locations of top universities, locations of prominent seats of government) and 25 prominent physical/mathematical numbers (15 top physical constants, 10 prominent mathematical constants - pi, e, phi, gamma, Feigenbaum constants, ln(2), √2, e^pi, √2 ^ √2 ), what is the chance of getting a 1 in 10⁵ coincidence match?
Naively you might think the chance is small: we have listed only 75 things, so getting even one pair to line up randomly to a precision of 1 in 10⁵ seems unlikely.
But I will show that when you can collide anything in the left hand set with anything in the right hand set, you will rapidly accumulate a huge number of possible matches and eventually luck out and get one.
If there are N things on the left and M on the right, there are NM possible collisions. 50 × 25 = 1250 is not enough to get us to 10⁵ though. But we can first slightly mutate the sets.
For example, each physical or mathematical constant can be mutated to either 2 or 3 possible coordinate values. Phi, the golden ratio = 1.6180339887... can be multiplied by 1 to get 1.6180339887°, by 10 to get 16.180339887° or by 100 to get 161.80339887° for longitude.
And each of those can be either East or West, giving 6 longitude lines. You also get 1.6180339887° South, 16.180339887° South, 1.6180339887° North and 16.180339887° North. So just one number, Phi, has expanded to 10 different lines on the map. Our modest pool of 25 constants therefore expands to, say, 200 once we assign all possible ordinates (we will ignore ordinate values less than 1° as these would look contrived).
The number of possible collisions NM is now 50 places × 200 ordinate lines = 10,000.
What is the probability of getting at least 1 hit to a precision of 1 in 10⁵ in 10,000 tries? It turns out to be 1 − (1 − 10⁻⁵)^10,000 ≈ 0.095 - not statistically significant.
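The expansion of one constant into map lines can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `coordinate_lines` is made up, the constant is assumed to be given as a value between 1 and 10, longitudes are capped at 180° and latitudes at 90°, and values below 1° are dropped.

```python
def coordinate_lines(value):
    """Return (degrees, hemisphere) lines generated from one constant.

    Assumption: `value` is already scaled to lie between 1 and 10.
    """
    lines = []
    for scale in (1, 10, 100):
        degrees = value * scale
        if degrees < 1:
            continue  # ordinates below 1 degree would look contrived
        if degrees <= 180:
            # valid as a longitude, East or West
            lines += [(degrees, "E"), (degrees, "W")]
        if degrees <= 90:
            # valid as a latitude, North or South
            lines += [(degrees, "N"), (degrees, "S")]
    return lines

PHI = 1.6180339887
phi_lines = coordinate_lines(PHI)
print(len(phi_lines))  # 10
```

Under the same rules a constant such as pi gives only 8 lines, because 314.159...° is not a valid longitude, which is why the pool lands nearer 200 than 250.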
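The 0.095 figure can be checked directly. The sketch below assumes every one of the 10,000 place-line pairs is an independent try with the same 1 in 10⁵ chance of matching, which is a simplification; the function name `p_at_least_one` is made up for illustration.

```python
import math

def p_at_least_one(n_tries, p_single):
    """Probability of at least one hit in n independent tries.

    Computes 1 - (1 - p_single)**n_tries via log1p/expm1 for numerical stability.
    """
    return -math.expm1(n_tries * math.log1p(-p_single))

n_pairs = 50 * 200          # places x ordinate lines
p = p_at_least_one(n_pairs, 1e-5)
print(round(p, 3))          # 0.095
```

Since the per-try chance is tiny, this is close to 1 − e^(−0.1), where 0.1 is the expected number of hits.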
You can mutate the sets a bit more, for example you could include the height of every prominent place that has a height. Every constant and its multiples of 10 and 100 can now match with a height too, and the height can be in feet or meters, etc. Then you could interpret some of the physical constants in different units, which might get us to 40 rather than 25 constants. Because of the quadratic growth of the collision set, you really don't have to push this process much until you're all but guaranteed to get a hit.
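To see how far the sets would have to grow, one can invert the same formula and ask how many independent tries are needed for a given chance of at least one hit. This is a rough sketch: `tries_needed` is a hypothetical helper, and independence between tries is assumed.

```python
import math

def tries_needed(target_prob, p_single):
    """Smallest n such that 1 - (1 - p_single)**n >= target_prob."""
    return math.ceil(math.log1p(-target_prob) / math.log1p(-p_single))

n_even_odds = tries_needed(0.5, 1e-5)    # roughly 69,000 pairs
n_near_sure = tries_needed(0.95, 1e-5)   # roughly 300,000 pairs
print(n_even_odds, n_near_sure)
```

Because the number of pairs is a product, growing both sets by a factor of about 2.6 is enough to take the 10,000 pairs past the even-odds mark.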
Apropos of What?
But isn't that 9.5% still Bayesian evidence that time travelling aliens helped to place the pyramid at Giza? That coincidence will still happen only 1 time out of 10 you check it! If you look at all possible worlds that have physical constants and prominent places, our world is in the right tail, which is suspicious!
I think these smaller coincidences are explained by a search not over place-constant collisions, but a sort of brute force search over different categories of coincidence.
I can easily come up with 10 categories of coincidences and look through all of them and find the most extreme one; memetic contagion will do this of its own accord so if you see a coincidence "apropos of nothing" you have to assume that this sort of preselection happened. I think you could come up with 10 different categories, but not a million. Probably not even 1000. Fisher's p=0.05 seems about right for this - anything that's a lot less than 0.05 after properly accounting for researcher degrees of freedom is actually evidence of something suspicious happening.
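The effect of this preselection can be put in numbers. The sketch below assumes 10 independent categories, each with the same 9.5% chance of producing a coincidence at least as striking as the pyramid one; both the independence and the equal chances are simplifying assumptions, and `p_best_of` is a made-up name.

```python
def p_best_of(k_categories, p_single):
    """Chance that at least one of k independent searches turns up
    a coincidence at least as rare as p_single."""
    return 1 - (1 - p_single) ** k_categories

p_giza = 0.095
p_after_preselection = p_best_of(10, p_giza)
print(round(p_after_preselection, 2))  # 0.63
```

So a 9.5% coincidence that reaches you "apropos of nothing" is more likely than not to exist somewhere among 10 categories, and carries little evidential weight.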
Does this explain too much?
Suppose that one of the ordinates of Newton's birthplace matched the modern value of the gravitational constant, big G, to 5 significant figures. Could we explain this away by doing a similar collision-based search? No, I don't think so. Newton is specifically associated with gravity, and big G is one of only a few numbers that matter for gravity. Now, we could search over the set of relevant places (birthplace, death-place and primary place of scholarship) of each person credited with an important constant, and do a 1-1 matching with the value of that constant in various units. But because of the link between the man and the number, this is not a quadratic collision problem where anything on the right can collide with anything on the left. So you're only going to get a small number (a few dozen) of chances, and thus a match to within 5 significant figures has a minuscule probability.
Suppose furthermore that not only was the longitude of Newton's birthplace equal to big G to 5 figures, but his place of death had a latitude equal to little g to 4 figures. What can we conclude?
Well, you cannot stack multiple simultaneous independent coincidences by doing set collisions. So the probability of both the big G and little g numbers matching a relevant place in Newton's life is going to be roughly 10⁻⁵ × 10⁻⁴ = 10⁻⁹, maybe multiplied by a few small degrees of freedom, like switching the roles of birth and death between big G and little g.
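The arithmetic for the Newton examples can be laid out as follows. The numbers 36 and 2 are hypothetical stand-ins for "a few dozen" linked chances and for the birth/death role swap, and treating a match to k significant figures as a chance of about 1 in 10^k is an approximation.

```python
# Rough per-match probabilities implied by the stated precision.
p_big_g = 10 ** -5      # match to 5 significant figures
p_little_g = 10 ** -4   # match to 4 significant figures

linked_chances = 36     # hypothetical: "a few dozen" man-number pairings
role_swaps = 2          # hypothetical: birthplace and death-place can trade roles

# One linked coincidence, allowing for the few dozen places one could have checked.
p_single_linked = 1 - (1 - p_big_g) ** linked_chances

# Both coincidences at once, allowing for the role swap.
p_double = p_big_g * p_little_g * role_swaps

print(p_single_linked, p_double)
```

Even the single linked match comes out at a few in 10,000, far below the 0.095 of the pyramid case, and the double match is orders of magnitude smaller again.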
So explaining a single coincidence between two unrelated things is pretty easy using these quadratic collisions and researcher degrees of freedom, but explaining multiple independent coincidences between related things is not possible - these are genuinely unlikely under the null hypothesis.
3 comments
comment by TeaTieAndHat (Augustin Portier) · 2024-02-08T16:07:23.285Z
Quite cool! Reminded me of a video taken from an old TV show where they had an archeologist against one of those pyramidiots (real word) whose favourite pastime is ‘discovering’ that sort of spurious coincidence and writing books about it. The archeologist made the same argument you did, that if you’re trying to find any two things that match among a set of a million things, you’ll find a lot of matches. Or, as he put it "you can find anything if you’re just looking for anything you fancy". He handled it rather well: before going into the studio, he had taken the measurements of a hot-dog stand or something, and then spent his time on the show going "see, if you add the length of the counter where the hot dogs are, plus twice the width of the roof, multiply by a billion, that’s the distance between the Earth and the Moon! And if you…"
That was glorious. The link’s here (in French).
comment by Shankar Sivarajan (shankar-sivarajan) · 2024-02-07T18:25:42.518Z
"Once is enemy action."
Some things simply aren't happenstance. Anyone trying to use the probability of unrelated events (Pyramids and math/physics constants or whatever) coinciding as part of an explanation is actively trying to deceive you: the obviously correct explanation is that the people who openly said they were trying to do something succeeded.