Thoughts on the SPIES Forecasting Method?
post by Fer32dwt34r3dfsz (rodeo_flagellum) · 2022-03-19T15:22:43.277Z · 3 comments
This is a question post.
Here, I discuss the SPIES forecasting method, and ask for the community's thoughts on it.
Not too long ago, I came across the SPIES (Subjective Probability Interval Estimates) method for judgmental forecasting. The method was developed by Uriel Haran and seems to have first been published in his 2011 dissertation, "Subjective Probability Interval Estimates: A Simple and Effective Way to Reduce Overprecision in Judgment". Haran writes:
Overprecision in judgment is the most robust type of overconfidence, and the one least susceptible to debiasing. It refers to people’s excessive certainty in the accuracy of their estimates, predictions or beliefs. Research on overprecision finds that confidence intervals, estimated ranges that judges are confident will include the correct answer, tend to include the correct answer significantly less often than what their assigned confidence level would suggest. For example, 90% confidence intervals typically include the correct answer about 50% of the time (Klayman, Soll, González-Vallejo, & Barlas, 1999).
He goes on to argue that SPIES reduces the overprecision of confidence-interval forecasts, and supports this claim with the results of several forecasting experiments he conducted. Participants made forecasts using the following methods:
- give a 90% confidence interval, i.e. a range they are 90% sure contains the target value;
- give a lower bound they are 95% sure the target value will not fall below and an upper bound they are 95% sure it will not exceed;
- use SPIES: split a numerical range into several intervals, assign each interval a likelihood from 0 to 100, normalize these likelihoods into probabilities (with each interval's probability spread uniformly over the values in that interval), and then find the shortest subinterval of the range that contains 90% of the cumulative probability.
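The SPIES steps above can be written compactly. This is my own notation and only a sketch, not Haran's formal definition; it assumes $K$ contiguous intervals $[x_{k-1}, x_k)$ with finite endpoints, each rated $r_k \in [0, 100]$:

$$p_k = \frac{r_k}{\sum_{j=1}^{K} r_j}, \qquad f(x) = \frac{p_k}{x_k - x_{k-1}} \quad \text{for } x_{k-1} \le x < x_k,$$

$$[a^*, b^*] = \arg\min_{a < b} \, (b - a) \quad \text{subject to} \quad \int_a^b f(x)\,dx = 0.9.$$

Here $p_k$ is interval $k$'s normalized probability and $f$ spreads it uniformly across the interval. An open-ended top interval needs an assumed finite endpoint before $f$ can be defined on it.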
For example, if we want to forecast NYC's total rainfall in March, we can begin with the following intervals: 40-65mm, 66-90mm, 91-115mm, 116-140mm, and >140mm. (The range from 40mm to >140mm could be split into 5, 10, or any other number of intervals; I chose 5 for this example.) I don't know much about rainfall in NYC, but I might assign these intervals the following likelihoods: 35/100, 55/100, 85/100, 25/100, and 5/100, respectively. My probabilities for these intervals would then be:
- 35 / (35+55+85+25+5) = 0.1707 for 40-65mm
- 55 / (35+55+85+25+5) = 0.2683 for 66-90mm
- 85 / (35+55+85+25+5) = 0.4146 for 91-115mm
- 25 / (35+55+85+25+5) = 0.1220 for 116-140mm
- 5 / (35+55+85+25+5) = 0.0244 for >140mm
The shortest interval within the 40mm to >140mm range that contains 90% of this probability gives the following estimate: with 90% confidence, I believe NYC's rainfall in March will be between 40mm and 125mm. I needed a short program to compute this estimate. The 90% level is also somewhat arbitrary; from the same likelihoods, I'm 75% confident that NYC's rainfall in March will be between 55mm and 115mm.
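A minimal sketch of that computation in Python (my reconstruction, not necessarily the program used above). It assumes each interval's probability is spread uniformly across it, treats the intervals as contiguous ([40, 65), [65, 90), ...), and caps the open-ended ">140mm" interval at a hypothetical 165mm. It scans the starting quantile u and, for each, takes the interval from the u-quantile to the (u + level)-quantile, keeping the shortest one:

```python
def inverse_cdf(edges, weights, q):
    """Return the x where the piecewise-uniform CDF reaches q (0 <= q <= 1).

    `weights` are the raw 0-100 ratings; comparing against q * total is
    equivalent to normalizing them, so they need not sum to 1 or 100.
    """
    total = sum(weights)
    target = q * total
    cumulative = 0.0
    for k, w in enumerate(weights):
        lo, hi = edges[k], edges[k + 1]
        if w > 0 and cumulative + w >= target:
            # Probability is uniform inside interval k, so interpolate linearly.
            return lo + (target - cumulative) / w * (hi - lo)
        cumulative += w
    return edges[-1]


def shortest_interval(edges, weights, level, steps=20000):
    """Approximate the shortest [a, b] holding `level` of the probability.

    Every interval with mass `level` runs from some u-quantile to the
    (u + level)-quantile, so scanning u over [0, 1 - level] finds it
    (up to the grid resolution set by `steps`).
    """
    best = None
    for i in range(steps + 1):
        u = (1 - level) * i / steps
        a = inverse_cdf(edges, weights, u)
        b = inverse_cdf(edges, weights, min(u + level, 1.0))
        if best is None or b - a < best[1] - best[0]:
            best = (a, b)
    return best


# Interval edges in mm; the 165mm cap on ">140mm" is an assumption.
edges = [40, 65, 90, 115, 140, 165]
ratings = [35, 55, 85, 25, 5]

a90, b90 = shortest_interval(edges, ratings, 0.90)
a75, b75 = shortest_interval(edges, ratings, 0.75)
print(f"90%: {a90:.1f}-{b90:.1f}mm")  # 90%: 40.0-124.5mm
print(f"75%: {a75:.1f}-{b75:.1f}mm")  # 75%: 55.2-115.0mm
```

Under these assumptions the results match the 40-125mm and 55-115mm intervals quoted above. The assumed cap on the top interval does not affect them, since that interval's low density keeps it out of both. A grid scan is used for simplicity; scipy's `rv_histogram` could supply the quantile function instead.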
I haven't come across SPIES anywhere on LW; I first found out about it in the Harvard Business Review article "A Simple Tool for Making Better Forecasts", which contains an interactive example of SPIES (temperatures in June).
So, what do you think? Does this method seem at all promising? I'm debating with myself whether I should begin using SPIES on Metaculus or elsewhere. Would anyone be interested in running experiments with me on SPIES across a wider variety of forecasting situations, or in improving SPIES or building better methods for controlling overconfident forecasts?
Answers
answer by SimonM
"So, what do you think? Does this method seem at all promising? I'm debating with myself whether I should begin using SPIES on Metaculus or elsewhere."
I'm not super impressed, tbh. I don't see "give a 90% confidence interval for x" as a question that comes up frequently, at least in the context of eliciting forecasts and estimates from humans (it comes up quite a bit in data analysis).
For example, I don't really understand how you'd use it as a method on Metaculus. Metaculus has two question types: binary and continuous. For binary questions you have to give the probability that an event happens, and I'm not sure how SPIES would help there. For continuous questions you are effectively already doing the first step of SPIES: specifying the full distribution.
If I were to make a positive case for this, it would be that forcing people to give a full distribution results in better forecasts for sub-intervals. This seems an interesting (and plausible) claim, but I don't find anything beyond that insight especially valuable.
3 comments
comment by Charlie Steiner · 2022-03-19T20:21:35.943Z
This seems like a fun thing to try out for yourself and compare to estimates without it.
comment by Garrett Baker (D0TheMath) · 2022-03-19T19:02:23.338Z
You could get far more rapid feedback on the usefulness of this method by using it in calibration training.
↑ comment by Fer32dwt34r3dfsz (rodeo_flagellum) · 2022-03-19T20:26:24.540Z
Nice, I didn't know OpenPhil had calibration training.
It is difficult to use SPIES in the calibration training: I kept running out of time when using my Python implementation. To still compare the methods, I copied some of the questions and gave both a plain confidence interval and a SPIES estimate for each. Here are the results. I've only included 5 questions, but from what I've done so far, SPIES seems to help me narrow my 80% confidence intervals.
1. In which year was the US Open decided for the first time by 'sudden death'?
- CI: 1900-2000
- SPIES: 1938-2000 : 1900-1924 16.54%; 1925-1948 24.63%; 1949-1972 29.41%; 1973-1996 29.41%
- Actual Value: 1990
2. In what year did Emerson Fittipaldi first win the World Championship?
- CI: 1910-2010
- SPIES: 1939-2010 : 1910-1935 18.18%; 1936-1960 11.36%; 1961-1985 36.36%; 1986-2010 34.09%
- Actual Value: 1972
3. In what year was rayon first produced in the United States?
- CI: 1780-2005
- SPIES: 1836-1996 : 1780-1836 16.28%; 1837-1892 27.91%; 1893-1948 27.91%; 1949-2005 27.91%
- Actual Value: 1910
4. When was the first Winter Olympics held?
- CI: 1880-1980
- SPIES: 1914-1980 : 1880-1905 13.04%; 1906-1930 21.74%; 1931-1955 26.09%; 1956-1980 39.13%
- Actual Value: 1924
5. In which year did Frankie Goes to Hollywood form?
- CI: 1910-2000
- SPIES: 1938-2000 : 1910-1932 15.0%; 1933-1954 20.0%; 1955-1976 30.0%; 1977-2000 35.0%
- Actual Value: 1980
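A small Python tally of the five questions listed above, comparing interval widths (in years) and whether each interval contains the actual value. The numbers are copied from the list; the short question labels are my own:

```python
# (question, plain CI, SPIES interval, actual value), copied from the list above.
results = [
    ("US Open sudden death", (1900, 2000), (1938, 2000), 1990),
    ("Fittipaldi first title", (1910, 2010), (1939, 2010), 1972),
    ("US rayon production", (1780, 2005), (1836, 1996), 1910),
    ("First Winter Olympics", (1880, 1980), (1914, 1980), 1924),
    ("Frankie Goes to Hollywood", (1910, 2000), (1938, 2000), 1980),
]


def width(interval):
    """Length of an interval in years."""
    lo, hi = interval
    return hi - lo


def contains(interval, value):
    """True if the interval includes the value (endpoints inclusive)."""
    lo, hi = interval
    return lo <= value <= hi


n = len(results)
ci_hits = sum(contains(ci, actual) for _, ci, _, actual in results)
spies_hits = sum(contains(sp, actual) for _, _, sp, actual in results)
ci_widths = [width(ci) for _, ci, _, _ in results]
spies_widths = [width(sp) for _, _, sp, _ in results]

print(f"CI: {ci_hits}/{n} hits, mean width {sum(ci_widths) / n:.1f} years")
# CI: 5/5 hits, mean width 123.0 years
print(f"SPIES: {spies_hits}/{n} hits, mean width {sum(spies_widths) / n:.1f} years")
# SPIES: 5/5 hits, mean width 84.2 years
```

Both sets of intervals contain all five answers, so on this sample SPIES gives narrower intervals without losing any hits; five questions are too few to say whether either set is well calibrated at the 80% level.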