The Stopped Clock Problem

post by eapache (evan-huus) · 2020-06-04T12:07:37.417Z · score: 29 (14 votes) · LW · GW · 7 comments

When a low-probability, high-impact event occurs, and the world “got it wrong”, it is tempting to look for the people who did successfully predict it in advance in order to discover their secret, or at least see what else they’ve predicted. Unfortunately, as Wei Dai discovered recently [LW(p) · GW(p)], this tends to backfire.

It may feel a bit counterintuitive, but this is actually fairly predictable: the math backs it up under some reasonable assumptions. First, let’s assume that the topic required unusual clarity of thought to avoid being sucked into the prevailing (wrong) consensus: say a mere 0.001% of people accomplished this. These people are worth finding, and listening to.

But we must also note that a good chunk of the population are just pessimists. Let’s say, very conservatively, that 0.01% of people predicted the same disaster just because they always predict the most obvious possible disaster. Suddenly the odds are pretty good that anybody you find who successfully predicted the disaster is a crank. The mere fact that they correctly predicted the disaster becomes evidence only of extreme reasoning, but is insufficient to tell whether that reasoning was extremely good, or extremely bad. And on balance, most of the time, it’s extremely bad.
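To make the arithmetic concrete, here is a minimal Python sketch using the made-up 0.001% and 0.01% figures from above. It assumes (my simplification, not something measured) that the two groups do not overlap and that everyone in both groups made the correct call; the function name is purely illustrative.

```python
from fractions import Fraction


def share_of_clear_thinkers(p_clear: Fraction, p_pessimist: Fraction) -> Fraction:
    """Fraction of correct predictors who are clear thinkers.

    Assumes the two groups are disjoint and that every member of
    both groups predicted the disaster correctly.
    """
    return p_clear / (p_clear + p_pessimist)


# Illustrative, made-up rates from the post.
p_clear = Fraction(1, 100_000)     # 0.001% reasoned their way to the answer
p_pessimist = Fraction(1, 10_000)  # 0.01% always predict disaster

share = share_of_clear_thinkers(p_clear, p_pessimist)
print(f"Share of correct predictors worth listening to: {share} (~{float(share):.1%})")
```

Under these assumptions only one correct predictor in eleven is a clear thinker; the other ten are stopped clocks.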

Unfortunately, the problem here is not just that the good predictors are buried in a mountain of random others; it’s that the good predictors are buried in a mountain of extremely poor predictors. The result is that the mean prediction of that group is going to be noticeably worse than the prevailing consensus on most questions, not better.


Obviously the 0.001% and 0.01% numbers above are made up. I spent some time looking for real statistics and couldn't find anything useful. This article claims roughly 1% of Americans are "preppers", which might be a good indication, except it provides no source and could equally well just be the lizardman constant. Regardless, my point relies mainly on the second group being an order of magnitude or more larger than the first, which seems (to me) fairly intuitively likely to be true. If anybody has real statistics to prove or disprove this, they would be much appreciated.

7 comments

Comments sorted by top scores.

comment by Decius · 2020-06-04T19:22:16.031Z · score: 7 (3 votes) · LW(p) · GW(p)

There's also an element of "past performance is not a guarantee of future results". It's possible that someone confidently and correctly predicted one thing for exactly the right reasons, and then confidently makes an error on the next thing for almost exactly the right reasons.


Likely, even, because the people who are confident about hard questions are more likely to be overconfident than to have superpowers.

comment by riceissa · 2020-06-04T23:42:50.232Z · score: 6 (3 votes) · LW(p) · GW(p)

If randomness/noise is a factor, there is also regression to the mean when the luck disappears on the following rounds.

comment by ChristianKl · 2020-06-06T20:43:11.684Z · score: 4 (2 votes) · LW(p) · GW(p)

Wei Dai didn't choose the people he referred to for being right in hindsight, but because they sounded sensible at the time. Sensible enough to follow them before we had hindsight knowledge of how the Corona situation would evolve.

comment by Viliam · 2020-06-04T19:38:15.292Z · score: 4 (2 votes) · LW(p) · GW(p)

Maybe I misunderstand how this works, but if a correct prediction is made by 1 genius and 100 cranks, making this prediction should still be treated as a smart thing. Because:

  • punishing the right answer just feels wrong;
  • you are not supposed to perfectly distinguish between geniuses and cranks based on one prediction;
  • if you evaluate many different predictions, then the crank will randomly succeed at one and fail at a hundred, resulting in a negative score, while the genius will succeed at many and fail at a few, resulting in a positive score, so now everything works as expected.

It seems like a base-rate fallacy. Assuming that geniuses are generally better at predictions than cranks, the explanation for why the difficult correct prediction was made by 1 genius and 100 cranks is that the population contains maybe 10 geniuses and 100 000 cranks, and on a specific hard question, the genius has a 10% chance of success by thinking hard, and the crank has a 0.1% chance of success by choosing a random thing to believe.

But this means that awarding the "correctness point" to the 1 genius and 100 cranks is okay in the long term, because the genius will keep collecting points, but for the crank it was the only point earned for a long time.
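A rough sketch of that arithmetic in Python, using the illustrative numbers from this comment (10 geniuses, 100 000 cranks, 10% and 0.1% success rates). The one-point-per-correct-answer scoring and the helper names are assumptions made for the example, not an established scoring rule.

```python
from fractions import Fraction

# Illustrative numbers from the comment above.
N_GENIUSES = 10
N_CRANKS = 100_000
P_GENIUS = Fraction(1, 10)    # 10% chance on a hard question
P_CRANK = Fraction(1, 1000)   # 0.1% chance on a hard question


def expected_correct(population: int, p_success: Fraction) -> Fraction:
    """Expected number of people in a group who get one hard question right."""
    return population * p_success


def questions_per_point(p_success: Fraction) -> Fraction:
    """Expected number of hard questions an individual needs to earn one point."""
    return 1 / p_success


genius_hits = expected_correct(N_GENIUSES, P_GENIUS)
crank_hits = expected_correct(N_CRANKS, P_CRANK)
print(f"Correct on one question: {genius_hits} genius(es), {crank_hits} cranks")

print(f"A genius earns a point about every {questions_per_point(P_GENIUS)} questions")
print(f"A crank earns a point about every {questions_per_point(P_CRANK)} questions")
```

On a single question the cranks dominate the winners' list, but any individual genius accumulates points a hundred times faster than any individual crank.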

comment by eapache (evan-huus) · 2020-06-04T20:22:24.031Z · score: 1 (1 votes) · LW(p) · GW(p)

I think your understanding is generally correct. The failure case I see is where people say "this problem was really really really hard, instead of one point, I'm going to award one thousand correctness points to everyone who predicted it", and then end up surprised that most of those people still turn out to be cranks.

comment by AllAmericanBreakfast · 2020-06-04T15:53:59.682Z · score: 3 (3 votes) · LW(p) · GW(p)

You can filter out some of the cranks by checking the forecaster's reasoning, data, credentials, and track record, by looking for a consensus of similarly-qualified people, and by taking the incentives of the forecasters into account. But this comes with its own problems:

To a non-expert, it's hard to tell to what degree an expert's area of specialization overlaps with the question at hand. Is a hospital administrator a trustworthy source of guidance on the risk that a novel coronavirus turns into a pandemic?

To a non-expert, easy questions look hard, and hard questions sometimes look easy. [LW · GW] Can we distinguish between the two?

To a non-expert, it's hard to tell whether an expert consensus is really what it seems, or whether it's coalition-building by a political faction under the cloak of "objectivity."

These are just a few examples.

In the end, you have to decide whether it's easier to check the forecaster's reasoning or their trustworthiness.

comment by Pattern · 2020-06-04T16:40:17.564Z · score: 2 (1 votes) · LW(p) · GW(p)

It seems like a better model to say the strategies are:

1. Does (respectable source) say I should panic?

2. Researching a thing to see how serious it is.


So it's less "pessimism" and more people trying to ring the alarm earlier. Your perceived high quality sources may have good track records - but if they're all correlated (possibly from talking to each other and reaching a consensus), then looking at an independent source gives more information than looking at another one of them.


Are earlier alarms useful? Yes, but they go off more often, so more filtering is needed (since it hasn't been done beforehand to the same standards).