Comment by natkozak on Shallow Review of Consistency in Statement Evaluation · 2019-09-13T06:05:30.083Z · score: 13 (5 votes) · LW · GW

Some relevant quotes from Scott’s review of Superforecasting are below, each followed by my notes:

  1. “First of all, is it just luck? After all, if a thousand chimps throw darts at a list of stocks, one of them will hit the next Google, after which we can declare it a “superchimp”. Is that what’s going on here? No. Superforecasters one year tended to remain superforecasters the next. The year-to-year correlation in who was most accurate was 0.65; about 70% of superforecasters in the first year remained superforecasters in the second. This is definitely a real thing.”
    1. Could imply that accuracy (in prediction-making) correlates with consistency. Would need to look into whether there’s a relationship between how consistently someone keeps their superforecaster status and how consistent they are in their answers to a particular question or group of questions.
    2. Could also imply that our best method for verifying accuracy is to ascertain consistency.
    3. Further steps: look into the Good Judgement Project directly, try out some metrics that point at “how close is this question to another related question”, and see how such a metric varies with accuracy.
  2. “One result is that while poor forecasters tend to give their answers in broad strokes – maybe a 75% chance, or 90%, or so on – superforecasters are more fine-grained. They may say something like “82% chance” – and it’s not just pretentious, Tetlock found that when you rounded them off to the nearest 5 (or 10, or whatever) their accuracy actually decreased significantly. That 2% is actually doing good work.”
    1. This seems to be a different kind of “precision” than the “consistency” that you’re looking for. Maybe it’s worth separating refinement-type precision from reliability-type precision.
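
To make the rounding point in the second quote concrete, here is a minimal sketch of why rounding can cost accuracy. It uses the Brier score (as far as I know, the scoring rule used in the GJP tournaments) and assumes, purely for illustration, that each fine-grained forecast equals the true probability of its event. The forecast values and all function names are made up for this example; none of this is GJP data or Tetlock's actual analysis.

```python
def expected_brier(forecast, true_prob):
    """Expected Brier score of `forecast` when the event occurs with probability `true_prob`."""
    return true_prob * (1 - forecast) ** 2 + (1 - true_prob) * forecast ** 2


def round_to_step(p, step):
    """Round probability `p` to the nearest multiple of `step`, clamped to [0, 1]."""
    return min(1.0, max(0.0, round(p / step) * step))


def mean_expected_brier(forecasts, step=None):
    """Average expected Brier score, optionally after rounding each forecast to `step`.

    Assumption (hypothetical): each unrounded forecast equals the true
    probability of its event, i.e. the forecaster is perfectly calibrated.
    """
    reported = forecasts if step is None else [round_to_step(p, step) for p in forecasts]
    scores = [expected_brier(q, p) for q, p in zip(reported, forecasts)]
    return sum(scores) / len(scores)


# Made-up fine-grained forecasts, for illustration only (not GJP data).
FORECASTS = [0.82, 0.67, 0.13, 0.94, 0.41]

if __name__ == "__main__":
    # Lower is better; compare exact forecasts with rounded versions.
    for label, step in [("exact", None), ("nearest 5%", 0.05), ("nearest 10%", 0.10)]:
        print(f"{label:>12}: {mean_expected_brier(FORECASTS, step):.5f}")
```

Under the calibration assumption, the expected score for a reported forecast q against a true probability p works out to p(1 - p) + (q - p)^2, so any rounding can only add to the score. That assumption is what does the work here: for a noisy or poorly calibrated forecaster, rounding might not hurt at all, which is why the empirical finding that it hurt superforecasters is informative.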

Scott notably reports that IQ, being well-informed, and math ability correlate only somewhat with forecasting ability, and that these traits don’t do as good a job of distinguishing superforecasters.

On the other hand, AI Impacts did a review of data from the Good Judgement Project, the project behind Tetlock’s conclusions, which suggests that some of these traits might actually be important, particularly intelligence. It might be worth looking into the GJP data specifically with this question in mind.