Ten Commandments for Aspiring Superforecasters

post by Evan_Gaensbauer · 2018-04-25T04:55:37.642Z


Cross-posted to the Effective Altruism Forum

In the last several years, political scientist and forecasting research pioneer Philip Tetlock has made waves with the success of his research program in geopolitical forecasting, published in the form of the popular book Superforecasting. It's been much discussed in the rationality community, and reviewed on Slate Star Codex. It turns out the skills of forecasting, such as taking the outside view into account, making Fermi estimates, and updating on evidence like a Bayesian, will be familiar to aspiring rationalists. Much of the book's value for readers here, then, lies in a summary of its insights, so below I've reproduced the appendix from the book that provides exactly that.


(1) Triage.

Focus on questions where your hard work is likely to pay off. Don’t waste time either on “clocklike” questions (where simple rules of thumb can get you close to the right answer) or on impenetrable “cloudlike” questions (where even fancy statistical models can’t beat the dart-throwing chimp). Concentrate on questions in the Goldilocks zone of difficulty, where effort pays off the most.


For instance, “Who will win the presidential election, twelve years out, in 2028?” is impossible to forecast now. Don’t even try. Could you have predicted in 1940 the winner of the election, twelve years out, in 1952? If you think you could have known it would be a then-unknown colonel in the United States Army, Dwight Eisenhower, you may be afflicted with one of the worst cases of hindsight bias ever documented by psychologists.

Of course, triage judgment calls get harder as we come closer to home. How much justifiable confidence can we place in March 2015 on who will win the 2016 election? The short answer is: not a lot, but still a lot more than we can for the election in 2028. We can at least narrow the 2016 field to a small set of plausible contenders, which is a lot better than the vast set of unknown (Eisenhower-ish) possibilities lurking in 2028.

Certain classes of outcomes have well-deserved reputations for being radically unpredictable (e.g., oil prices, currency markets). But we usually don’t discover how unpredictable outcomes are until we have spun our wheels for a while trying to gain analytical traction. Bear in mind the two basic errors it is possible to make here. We could fail to try to predict the potentially predictable or we could waste our time trying to predict the unpredictable. Which error would be worse in the situation you face?

(2) Break seemingly intractable problems into tractable sub-problems.

Channel the playful but disciplined spirit of Enrico Fermi, who--when he wasn’t designing the world’s first atomic reactor--loved ballparking answers to head-scratchers such as “How many extraterrestrial civilizations exist in the universe?” Decompose the problem into its knowable and unknowable parts. Flush ignorance into the open. Expose and examine your assumptions. Dare to be wrong by making your best guesses. Better to discover errors quickly than to hide them behind vague verbiage.

Superforecasters see Fermi-izing as part of the job. How else could they generate quantitative answers to seemingly impossible-to-quantify questions about Arafat’s autopsy, bird-flu epidemics, oil prices, Boko Haram, the Battle of Aleppo, and bond-yield spreads?

We find this Fermi-izing spirit at work even in the quest for love, the ultimate unquantifiable. Consider Peter Backus, a lonely guy in London, who guesstimated the number of potential female partners in his vicinity by starting with the population of London (approximately six million) and winnowing the number down by the proportion of women in the population (about 50%), by the proportion of singles (about 50%), by the proportion in the right age range (about 20%), by the proportion of university graduates (about 26%), by the proportion he finds attractive (only 5%), by the proportion likely to find him attractive (only 5%), and by the proportion likely to be compatible with him (about 10%). Conclusion: roughly twenty-six women in the pool, a daunting but not impossible search task.
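To make the arithmetic concrete, here is a minimal Python sketch of Backus's estimate using the figures quoted above. Multiplying the listed proportions out gives a pool on the order of twenty; the small gap from the book's "roughly twenty-six" comes down to the rounded proportions, which is fine for a Fermi estimate, where only the order of magnitude matters:

```python
# Fermi-izing Peter Backus's dating pool, with the proportions quoted above.
population_of_london = 6_000_000

filters = {
    "women": 0.50,
    "single": 0.50,
    "right age range": 0.20,
    "university graduates": 0.26,
    "he finds attractive": 0.05,
    "likely to find him attractive": 0.05,
    "likely to be compatible": 0.10,
}

pool = population_of_london
for description, proportion in filters.items():
    pool *= proportion
    print(f"after winnowing by '{description}': {pool:,.0f}")

# Final pool is on the order of twenty -- a daunting but not impossible search.
```

Writing the assumptions out as explicit factors is exactly what "flushes ignorance into the open": each line is a guess you can examine, challenge, and revise.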

There are no objectively correct answers to true-love questions, but we can score the accuracy of the Fermi estimates that superforecasters generate in the IARPA tournaments. The surprise is how often remarkably good probability estimates arise from a remarkably crude series of assumptions and guesstimates.

(3) Strike the right balance between inside and outside views.

Superforecasters know that there is nothing new under the sun. Nothing is 100% “unique”. Language purists be damned: uniqueness is a matter of degree. So superforecasters conduct creative searches for comparison classes even for seemingly unique events, such as the outcome of a hunt for a high-profile terrorist (Joseph Kony) or the standoff between a new socialist government in Athens and Greece’s creditors. Superforecasters are in the habit of posing the outside-view question: How often do things of this sort happen in situations of this sort?


So too apparently is Larry Summers, a Harvard professor and former Treasury secretary. He knows about the planning fallacy: when bosses ask employees how long it will take to finish a project, employees tend to underestimate the time they need, often by factors of two or three. Summers suspects his own employees are no different. One former employee, Greg Mankiw, himself now a famous economist, recalls Summers’s strategy: he doubled the employee’s estimate, then moved to the next higher time unit. “So, if the research assistant says the task will take an hour, it will take two days. If he says two days, it will take four weeks.” It’s a nerd joke: Summers corrected for his employees’ failure to take the outside view in making estimates by taking the outside view toward employees’ estimates, and then inventing a funny correction factor.


Of course Summers would adjust his correction factor if an employee astonished him and delivered on time. He would balance his outside-view expectation of tardiness against the new inside-view evidence that a particular employee is an exception to the rule. Because each of us is, to some degree, unique.
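Summers's correction factor is simple enough to write down. A minimal sketch in Python, where the unit ladder and the function name are my own illustration, not from the book:

```python
# Summers's joke correction for the planning fallacy, as quoted above:
# double the estimate, then move to the next higher time unit.
# (The unit ladder is an illustrative assumption.)
UNITS = ["minutes", "hours", "days", "weeks", "months"]

def summers_correction(amount: float, unit: str) -> str:
    next_unit = UNITS[min(UNITS.index(unit) + 1, len(UNITS) - 1)]
    return f"{amount * 2:g} {next_unit}"

print(summers_correction(1, "hours"))  # -> "2 days"
print(summers_correction(2, "days"))   # -> "4 weeks"
```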

(4) Strike the right balance between under- and overreacting to evidence.

Belief updating is to good forecasting as brushing and flossing are to good dental hygiene. It can be boring, occasionally uncomfortable, but it pays off in the long term. That said, don’t suppose that belief updating is always easy just because it sometimes is. Skillful updating requires teasing subtle signals from noisy news flows--all the while resisting the lure of wishful thinking.

Savvy forecasters learn to ferret out telltale clues before the rest of us. They snoop for nonobvious lead indicators, about what would have to happen before X could occur, where X might be anything from an expansion of Arctic sea ice to a nuclear war on the Korean peninsula. Note the fine line here between picking up subtle clues before everyone else and getting suckered by misleading clues. Does the appearance of an article critical of North Korea in the official Chinese press signal that China is about to squeeze Pyongyang hard--or was it just a quirky error in editorial judgment? The best forecasters tend to be incremental belief updaters, often moving their probabilities from, say, 0.4 to 0.35 or from 0.6 to 0.65, distinctions too subtle to capture with vague verbiage, like “might” or “maybe”, but distinctions that, in the long run, define the difference between good and great forecasters.

Yet superforecasters also know how to jump, or move their probability estimates fast in response to diagnostic signals. Superforecasters are not perfect Bayesian updaters but they are better than most of us. And that is largely because they value this skill and work hard at cultivating it.
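For readers who want the mechanics: both the small increments and the occasional jump fall out of Bayes' rule in odds form, where the posterior odds equal the prior odds times the likelihood ratio of the new evidence. A minimal sketch; the likelihood ratios are illustrative assumptions, not figures from the book:

```python
def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Weakly diagnostic evidence moves the needle only slightly...
print(round(bayes_update(0.40, 0.8), 2))  # 0.35 -- the incremental step described above
# ...while a strongly diagnostic signal justifies a jump.
print(round(bayes_update(0.40, 6.0), 2))  # 0.8
```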

(5) Look for the clashing causal forces at work in each problem.

For every good policy argument, there is typically a counterargument that is at least worth acknowledging. For instance, if you are a devout dove who believes that threatening military action never brings peace, be open to the possibility that you might be wrong about Iran. And the same advice applies if you are a devout hawk who believes that soft “appeasement” policies never pay off. Each side should list, in advance, the signs that would nudge them toward the other side.

Now here comes the really hard part. In classical dialectics, thesis meets antithesis, producing synthesis. In dragonfly eye, one view meets another and another and another--all of which must be synthesized into a single image. There are no paint-by-number rules here. Synthesis is an art that requires reconciling irreducibly subjective judgments. If you do it well, engaging in this process of synthesizing should transform you from a cookie-cutter dove or hawk into an odd hybrid creature, a dove-hawk, with a nuanced view of when tougher or softer policies are likelier to work.

(6) Strive to distinguish as many degrees of doubt as the problem permits but no more.

Few things are either certain or impossible. And “maybe” isn’t all that informative. So your uncertainty dial needs more than three settings. Nuance matters. The more degrees of uncertainty you can distinguish, the better a forecaster you are likely to be. As in poker, you have an advantage if you are better than your competitors at separating 60/40 bets from 40/60--or 55/45 from 45/55. Translating vague-verbiage hunches into numeric probabilities feels unnatural at first but it can be done. It just requires patience and practice. The superforecasters have shown what is possible.

Most of us could learn, quite quickly, to think in more granular ways about uncertainty. Recall the episode in which President Obama was trying to figure out whether Osama bin Laden was the mystery occupant of the walled-in compound in Abbottabad. And recall the probability estimates of his intelligence officers and the president’s reaction to their estimates: “This is fifty-fifty...a flip of the coin.” Now suppose that President Obama had been shooting the breeze with basketball buddies and each one offered probability estimates on the outcome of a college game--and those estimates corresponded exactly to those offered by intelligence officers on the whereabouts of Osama bin Laden. Would the president still have shrugged and said, “This is fifty-fifty”, or would he have said, “Sounds like the odds fall between three to one and four to one”? I bet on the latter. The president is accustomed to granular thinking in the domain of sports. Every year, he enjoys trying to predict the winners of the March Madness basketball tournament, a probability puzzle that draws the attention of serious statisticians. But, like his Democratic and Republican predecessors, he does not apply the same rigor to national security decisions. Why? Because different norms govern different thought processes. Reducing complex hunches to scorable probabilities is de rigueur in sports but not in national security.

So, don’t reserve the rigorous reasoning for trivial pursuits. George Tenet would not have dared utter “slam dunk” about weapons of mass destruction if the Bush 43 White House had enforced standards of evidence and proof that are second nature to seasoned gamblers on sporting events. Slam dunk implies one is willing to offer infinite odds--and to lose everything if one is wrong.
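The odds talk in this commandment converts directly into probabilities: odds of a to b in favor correspond to a probability of a / (a + b). A quick sketch of the conversions used above:

```python
def odds_to_probability(a: float, b: float = 1.0) -> float:
    """Convert odds of a-to-b in favor into a probability."""
    return a / (a + b)

print(odds_to_probability(1, 1))  # 0.5  -- "fifty-fifty"
print(odds_to_probability(3))     # 0.75 -- "three to one"
print(odds_to_probability(4))     # 0.8  -- "four to one"
# A "slam dunk" claims effectively infinite odds: the probability approaches 1,
# and you lose everything if you are wrong.
print(odds_to_probability(1e9))   # ~1.0
```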

(7) Strike the right balance between under- and overconfidence, between prudence and decisiveness.

Superforecasters understand the risks both of rushing to judgment and of dawdling too near “maybe”. They routinely manage the trade-off between the need to take decisive stands (who wants to listen to a waffler?) and the need to qualify their stands (who wants to listen to a blowhard?). They realize that long-term accuracy requires getting good scores on both calibration and resolution--which requires moving beyond blame-game ping-pong. It is not enough to just avoid the most recent mistake. They have to find creative ways to tamp down both types of forecasting errors--misses and false alarms--to the degree a fickle world permits such uncontroversial improvements in accuracy.
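Calibration and resolution are the two graded components of the Brier scores used in the tournaments: calibration asks whether your "70%" events actually happen about 70% of the time, and resolution rewards probabilities that stray decisively from the base rate. Here is a minimal sketch of the standard Murphy decomposition of the Brier score, on made-up data; grouping forecasts by their exact value is my simplification (in practice forecasts are binned):

```python
from collections import defaultdict

def brier_components(forecasts, outcomes):
    """Murphy decomposition: Brier = reliability - resolution + uncertainty.
    Lower reliability term = better calibration; higher resolution = more decisive."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)  # outcomes grouped by forecast value
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)
    reliability = sum(len(os) * (f - sum(os) / len(os)) ** 2
                      for f, os in groups.items()) / n
    resolution = sum(len(os) * (sum(os) / len(os) - base_rate) ** 2
                     for os in groups.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability - resolution + uncertainty, reliability, resolution, uncertainty

# Made-up track record: "80%" events happened 3 times out of 4, "20%" events once.
forecasts = [0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2]
outcomes  = [1, 1, 1, 0, 0, 0, 0, 1]
brier, rel, res, unc = brier_components(forecasts, outcomes)
print(f"Brier {brier:.3f} = reliability {rel:.4f} - resolution {res:.4f} + uncertainty {unc:.2f}")
```

Tamping down both misses and false alarms amounts to shrinking the reliability penalty without sacrificing resolution, which is exactly the trade-off the paragraph above describes.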

(8) Look for the errors behind your mistakes but beware of rearview-mirror hindsight biases.

Don’t try to justify or excuse your failures. Own them! Conduct unflinching postmortems: Where exactly did I go wrong? And remember that although the more common error is to learn too little from failure and to overlook flaws in your basic assumptions, it is also possible to learn too much (you may have been basically on the right track but made a minor technical mistake that had big ramifications). Also don’t forget to do postmortems on your successes. Not all successes imply your reasoning was right. You may have just lucked out by making offsetting errors. And if you keep confidently reasoning along the same lines, you are setting yourself up for a nasty surprise.

(9) Bring out the best in others and let others bring out the best in you.

Master the fine arts of team management, especially perspective taking (understanding the arguments of the other side so well that you can reproduce them to the other’s satisfaction), precision questioning (helping others to clarify their arguments so they aren’t being misunderstood), and constructive confrontation (learning to disagree without being disagreeable). Wise leaders know how fine the line can be between a helpful suggestion and micromanagerial meddling, or between a rigid group and a decisive one, or between a scatterbrained group and an open-minded one. Tommy Lasorda, the former manager of the Los Angeles Dodgers, got it roughly right: “Managing is like holding a dove in your hand. If you hold it too tightly you kill it, but if you hold it too loosely, you lose it.”

(10) Master the error-balancing bicycle.

Implementing each commandment requires balancing opposing errors. Just as you can’t learn to ride a bicycle by reading a physics textbook, you can’t become a superforecaster by reading training manuals. Learning requires doing, with good feedback that leaves no ambiguity about whether you are succeeding--”I’m rolling along smoothly!”--or whether you are failing--”crash!” Also remember that practice is not just going through the motions of making forecasts, or casually reading the news and tossing out probabilities. Like all other known forms of expertise, superforecasting is the product of deep, deliberative practice.

(11) Don’t treat commandments as commandments.

“It is impossible to lay down binding rules,” Helmuth von Moltke warned, “because two cases will never be exactly the same.” As in war, so in all things. Guidelines are the best we can do in a world where nothing is certain or exactly repeatable. Superforecasting requires constant mindfulness, even when--perhaps especially when--you are dutifully trying to follow these commandments.


Citation: Tetlock, Philip E., and Dan Gardner. "Appendix: Ten Commandments for Aspiring Superforecasters." In Superforecasting: The Art and Science of Prediction, 277-85. Penguin Random House, 2015.

Note: The link to these ten commandments from this Slate Star Codex post is dead, so switch your "Ten Commandments of Superforecasting" bookmark to this post's URL.

6 comments


comment by Matt Goldenberg (mr-hire) · 2018-04-25T19:29:12.070Z

This appendix struck me as exceedingly useless when I first encountered it. Most of the suggestions follow the pattern of "Find the right balance between two extremes", but don't give enough context to figure out where that balance is.

It's like he talked to the experts, got the fact that a lot of what they were doing was tacit knowledge that gave them a feel for this sort of thing, but didn't do any of the modeling work to then pull out the models that actually underlie the tacit knowledge.

I'd be curious whether anyone has improved their calibration using these guidelines. Personally, I got much more mileage out of How to Measure Anything's 5 calibration strategies.

comment by Evan_Gaensbauer · 2018-04-27T15:14:39.361Z

Edit: someone reminded me that Tetlock or his publishers might not like me reproducing large parts of his book on LW, but if anyone wants to send me a PM we can figure something out.

This isn't the only section of Superforecasting I intend to reproduce on LW, as there are more important parts. I've taken the time to transcribe the parts I found most important. But Tetlock did the modeling work corresponding to each of these commandments as laid out in the book (each commandment corresponds to a chapter of the book). If you can highlight what parts you'd like to zoom in on, I'm happy to transcribe parts of the book referring to specific commandments if you like.

comment by Matt Goldenberg (mr-hire) · 2018-04-28T19:18:52.424Z

I'd be interested in the parts that you felt most improved your calibration. Personally, most of what I got from the book was about how effective forecasting tournaments were, what their limits were, and how to run them effectively. I got very little in terms of better calibration.

comment by Evan_Gaensbauer · 2018-04-28T19:32:08.311Z

I didn't get much in the way of improved calibration either. I don't think the most valuable parts of the book are about improving individual calibration directly. I see the book as a guide on how to literally become a superforecaster. Unfortunately, that takes so much time it's infeasible for most individuals, so the book wasn't written that way. The reason I'm uploading parts of *Superforecasting* in the first place is that they're the parts most relevant to my creation of a giant forecasting machine within the rationality/EA communities. It could include forecasting tournaments. If what you're after is improved calibration, maybe I won't have to talk about the book, because I intend to get more rationalists to learn by just *doing* it, instead of reading.

comment by Ben Pace (Benito) · 2018-04-25T06:16:04.690Z

Promoted to frontpage.

comment by Ericf · 2019-11-13T15:57:15.314Z

Note: Slam Dunks only succeed ~90% of the time