Shallow Review of Consistency in Statement Evaluation
score: 13 (5 votes) ·
Some relevant quotes from Scott’s review of Superforecasting are bolded below:
- “First of all, is it just luck? After all, if a thousand chimps throw darts at a list of stocks, one of them will hit the next Google, after which we can declare it a “superchimp”. Is that what’s going on here? No. Superforecasters one year tended to remain superforecasters the next. The year-to-year correlation in who was most accurate was 0.65; about 70% of superforecasters in the first year remained superforecasters in the second. This is definitely a real thing.”
- Could imply that accuracy (in predictionmaking) correlates with consistency. Would need to look into whether there’s a relationship between the consistency of someone’s status as a superforecaster and their consistency with respect to the answer of a particular question, or group of questions
- Could also imply that our best method for verifying accuracy is by ascertaining consistency.
- Further steps: look into the good judgement project directly, try out some metrics that point at “how close is this question to another related question”, see how this metric varies with accuracy.
- “One result is that while poor forecasters tend to give their answers in broad strokes – maybe a 75% chance, or 90%, or so on – superforecasters are more fine-grained. They may say something like “82% chance” – and it’s not just pretentious, Tetlock found that when you rounded them off to the nearest 5 (or 10, or whatever) their accuracy actually decreased significantly. That 2% is actually doing good work.”
- This seems to be a different kind of “precision” than the “consistency” that you’re looking for. Maybe it’s worth separating refinement-type precision from reliability-type precision.
Scott notably reports that IQ, well-informed-ness, and math ability only correlate somewhat with forecasting ability, and that these traits don’t do as good a job of distinguishing superforecasters.
On the other hand, AI Impacts did a review of data from the Good Judgement Project, the project behind Tetlock’s conclusions, that suggests that some of these traits might actually be important -- particularly intelligence. Might be worth looking into the GJP data specifically with this question in mind.