Overconfident Pessimism

post by lukeprog · 2012-11-24T00:47:43.721Z · LW · GW · Legacy · 38 comments


You can build a machine to draw [deductive] conclusions for you, but I think you can never build a machine that will draw [probabilistic] inferences.

George Polya, 34 years before Pearl (1988) launched the probabilistic revolution in AI

The energy produced by the breaking down of the atom is a very poor kind of thing. Anyone who expects a source of power from the transformation of these atoms is talking moonshine.

Ernest Rutherford in 1933, 18 years before the first nuclear reactor went online

I confess that in 1901 I said to my brother Orville that man would not fly for fifty years. Two years later we ourselves made flights. This demonstration of my impotence as a prophet gave me such a shock that ever since I have distrusted myself...

Wilbur Wright, in a 1908 speech

 

Startling insights are hard to predict.1 Polya and Rutherford couldn't have predicted when computational probabilistic reasoning and nuclear power would arrive. Their training in scientific skepticism probably prevented them from making confident predictions about what would be developed in the next few decades.

What's odd, then, is that their scientific skepticism didn't prevent them from making confident predictions about what wouldn't be developed in the next few decades.

I am blessed to occasionally chat with some of the smartest scientists in the world, especially in computer science. They generally don't make confident predictions that certain specific, difficult, insight-based technologies will be developed soon. And yet, immediately after agreeing with me that "the future is very hard to predict," they will confidently state that a specific, difficult technology is more than 50 years away!

Error. Does not compute.

What's going on, here?

I don't think it's always a case of motivated skepticism. I don't think Wilbur Wright was motivated to think flight was a long way off. I think he was "zoomed in" on the difficulty of the problem, didn't see a way to solve it, and misinterpreted his lack of knowledge about the difficulty of flight as positive information that flight was extremely difficult and far away.

As Eliezer wrote:

When heavier-than-air flight or atomic energy was a hundred years off, it looked fifty years off or impossible; when it was five years off, it still looked fifty years off or impossible. Poor information.

(Of course, we can predict some technological advances better than others: "Five years before the first moon landing, it looked a few years off but certainly not a hundred years off.")

There may also be a psychological double standard for "positive" and "negative" predictions. Skepticism about confident positive predictions — say, that AI will be invented soon — feels like the virtuous doubt of standard scientific training. But oddly enough, making confident negative predictions — say, that AI will not be invented soon — also feels like virtuous doubt, merely because the first prediction was phrased positively and the second was phrased negatively.

There's probably some Near-Far stuff going on, too. Nuclear fusion and AI feel abstract and unknown, and thus they also feel distant. But when you're ignorant about a phenomenon, the correct response is to broaden your confidence intervals in both directions, not push them in one direction like the Near-Far effect wants you to.

The scientists I speak to are right to say that it's very hard to predict the development of specific technologies. But one cannot "simultaneously claim to know little about the future and to be able to set strong lower bounds on technology development times," on pain of contradiction.

Depending on the other predictions these scientists have made, they might be3 manifesting a form of overconfidence I'll call "overconfident pessimism." It's well-known that humans are overconfident, but since overconfident pessimism seems to be less-discussed than overconfident optimism, I think it's worth giving it its own name.

What can we do to combat overconfident pessimism in ourselves?

The most broadly useful debiasing technique is to "consider the opposite" (Larrick 2004):

The strategy consists of nothing more than asking oneself, “What are some reasons that my initial judgment might be wrong?” The strategy is effective because it directly counteracts the basic problem of association-based processes — an overly narrow sample of evidence – by expanding the sample and making it more representative...

Or, consider this variant of "consider the opposite":

Typically, subjective range estimates exhibit high overconfidence. Ranges for which people are 80 percent confident capture the truth 30 percent to 40 percent of the time. Soll and Klayman (2004) showed that having judges generate 10th and 90th percentile estimates in separate stages – which forces them to consider distinct reasons for low and high values – increased hit rates to nearly 60 percent by both widening and centering ranges.2

Another standard method for reducing overconfidence and improving one's accuracy in general is calibration training (Lichtenstein et al. 1982; Hubbard 2007).

The calibration training process is pretty straightforward: Write down your predictions, then check whether they came true. Be sure to also state your confidence in each prediction. If you're perfectly calibrated, then predictions you made with 60% confidence should be correct 60% of the time, while predictions you made with 90% confidence should be correct 90% of the time.
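
Here is a minimal sketch of that bookkeeping (the record format and the bucketing by stated confidence are illustrative assumptions, not a prescribed method):

```python
# A minimal sketch: group recorded predictions by stated confidence and
# compare each group's hit rate to that confidence. The (confidence, came_true)
# record format below is an illustrative assumption.
from collections import defaultdict

predictions = [
    (0.6, True), (0.6, False), (0.6, True),
    (0.9, True), (0.9, True), (0.9, False),
]

buckets = defaultdict(list)
for confidence, came_true in predictions:
    buckets[round(confidence, 1)].append(came_true)

for confidence in sorted(buckets):
    outcomes = buckets[confidence]
    hit_rate = sum(outcomes) / len(outcomes)
    # Perfect calibration means hit_rate is roughly equal to the stated confidence.
    print(f"stated {confidence:.0%}: correct {hit_rate:.0%} ({len(outcomes)} predictions)")
```

If your 90%-confidence bucket comes out at 60%, that is overconfidence; if it comes out at 99%, you are underconfident.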

You will not be perfectly calibrated. But you can become better-calibrated over time with many rounds of feedback. That's why weather forecasters are so much more accurate than most other kinds of experts (Murphy & Winkler 1984): every week, they learn whether their predictions were correct. It's harder to improve your calibration when you have to wait 5 or 30 years to see whether your predictions (say, about technological development) were correct, but calibration training in any domain seems to reduce overconfidence in general, since you get to viscerally experience how often you are wrong — even on phenomena that should be easier to predict than long-term technological development.

Perhaps the best online tool for calibration training is PredictionBook.com. For a story of one person becoming better calibrated using PredictionBook.com, see 1001 PredictionBook Nights. Another tool is the Calibration Game, available for Mac, Windows, iOS, and Android.

To counteract overconfident pessimism in particular, be sure to record lots of negative predictions, not just positive predictions.

Finally, it may help to read lists of failed negative predictions. Here you go: one, two, three, four, five, six.

 

 

Notes

1 Armstrong & Sotala (2012) helpfully distinguish "insight" and "grind":

Project managers and various leaders are often quite good at estimating the length of projects... Publication dates for video games, for instance, though often over-optimistic, are generally not ridiculously erroneous – even though video games involve a lot of creative design, play-testing, art, programing the game “AI”, etc. . . Moore’s law could be taken as an ultimate example of grind: we expect the global efforts of many engineers across many fields to average out to a rather predictable exponential growth.

Predicting insight, on the other hand, seems a much more daunting task. Take the Riemann hypothesis, a well-established mathematical hypothesis from 1859. How would one go about estimating how long it would take to solve? How about the P = NP hypothesis in computing? Mathematicians seldom try and predict when major problems will be solved, because they recognise that insight is very hard to predict.

2 See also Speirs-Bridge et al. (2009).

3 The original version of this post incorrectly accused the scientists I've spoken with of overconfidence, but I can't rightly draw that conclusion without knowing the outcomes of their other predictions.

38 comments

Comments sorted by top scores.

comment by cousin_it · 2012-11-24T03:14:32.793Z · LW(p) · GW(p)

How can you say that people are overconfident if you don't count the pessimistic predictions that turned out to be true?

Replies from: Eliezer_Yudkowsky, lukeprog
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-11-24T23:01:19.022Z · LW(p) · GW(p)

You can't, but you can notice that they're being incoherent if they simultaneously claim to know little about the future and to be able to set strong lower bounds on technology development times.

comment by lukeprog · 2012-11-24T23:19:12.889Z · LW(p) · GW(p)

My mistake; the original post doesn't distinguish between the colloquial term "overconfidence" and "overconfidence" as a term of art in psychology. In the early cases, I meant "overconfidence" as a colloquial term meaning "excessive confidence" — that is, a degree of confidence not justified by one's available information. But later, I used the term to refer to the overconfidence effect, which (as far as I know) is only ever measured in the context of multiple judgments.

I'll try to think of the best way to fix the original post, and I'm open to suggestions!

Edit: I think I've fixed it. I've removed the accusation of overconfidence, while still saying that it might be part of the problem. I've also added a footnote explaining that I've edited the original post to correct the error.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-11-24T22:58:09.688Z · LW(p) · GW(p)

Role-acting that conflates lack of info with skepticism. If you're acting out the role of a dutifully skeptical scientist, you say that things are a long way away unless you have strong info that they're about to happen, for fear of making a mistaken prediction that makes you sound like a mere enthusiast. Failed negative predictions don't count against the role - they don't reduce your prestige to that of a mere enthusiast. Just imagine it as a serious frown vs. a childish smile; they're trying to give a serious frown, and avoid childish smiles. You're dealing with a behavioral rule for when to agree with verbal statements, not a coherent state of uncertainty. Both "we know very little about the future" and "AI is fifty years off" sound like a serious frown, so they agree with both.

I doubt there's very much more to it than that.

comment by Unnamed · 2012-11-24T02:43:00.660Z · LW(p) · GW(p)

A premortem might be a useful technique for counteracting overconfident pessimism. It's another variant on "consider the opposite", discussed in Kahneman's Thinking, Fast and Slow and this LW post (among other places). With standard overconfidence (e.g. "our project will succeed"), you do a premortem by saying "it's two months from now and the project has failed; why did it fail?" Suddenly your brain can generate various pathways to failure which had been hidden.

With overconfident pessimism ("there is no way that we'll develop that technology in the next 50 years"), you'd do a premortem by saying "it's 20 years from now and we have that technology; how did we get it?"

Replies from: chaosmosis
comment by chaosmosis · 2012-11-27T00:06:03.648Z · LW(p) · GW(p)

Does anyone find this useful, personally? I've heard it as advice before, but it never helps me.

Replies from: khafra
comment by khafra · 2012-11-27T14:33:09.053Z · LW(p) · GW(p)

In what way does it not help? Does it leave you frozen, inactive? Perhaps also do a premortem for the failure modes caused by doing nothing?

Replies from: chaosmosis
comment by chaosmosis · 2012-11-27T18:42:05.947Z · LW(p) · GW(p)

It doesn't lead to any new insights. I can't generate any thoughts by pretending that it's now the future and that I'm looking back into the past. I don't know whether or not other people do somehow generate new thoughts this way. It sounds plausible while also sounding ridiculous, so I'm unsure whether or not it's legitimate.

comment by ChrisHallquist · 2012-11-24T05:06:07.755Z · LW(p) · GW(p)

Can we compile better lists of false predictions?

With some of them, it's not clear whether they were even intended as predictions; they may have been intended as statements about the technology at the time. Taken in that sense, they may have been true:

"I think there is a world market for maybe five computers." -- Thomas Watson, chairman of IBM, 1943.

Was there much demand for computers as they existed in 1943?

There's also the problem that some of the claims may not have been intended as claims about what will happen, but about what should happen. For example, the Darwin quote:

"I see no good reasons why the views given in this volume should shock the religious sensibilities of anyone." -- Charles Darwin, The Origin Of Species, 1869.

That doesn't entail people's religious sensibilities won't be shocked for bad reasons.

Also, these lists don't cite their sources very well.

comment by jsteinhardt · 2012-11-24T03:33:22.271Z · LW(p) · GW(p)

How do we know that calibration training will improve our calibration on long-term predictions? It would seem that we necessarily have little evidence about the efficacy of short-term calibration training on calibrating long-term predictions.

Replies from: lukeprog, Qiaochu_Yuan
comment by lukeprog · 2012-11-24T21:33:03.014Z · LW(p) · GW(p)

We can't know this with much confidence, but it seems likely to me. The reason is pretty simple: most people are wildly overconfident, and calibration training reduces people's confidence in their predictions. It's hard to be as underconfident as most people are overconfident, so calibration training should improve one's accuracy in general. Indeed, several studies show calibration transfer between particular domains (i.e. calibration training in one domain improves one's accuracy in another domain), though it's true I'm not aware of a study showing specifically that calibration training with short-term predictions improves one's accuracy with long-term predictions. But if that wasn't the case, then this would be an exception to the general rule, and I don't see a good reason to think it will turn out to be such an exception.

comment by Qiaochu_Yuan · 2012-11-25T08:26:43.456Z · LW(p) · GW(p)

A simple model of calibration training is that it helps you more honestly integrate whatever evidence is floating around in your brain pertaining to a subject. Whether a prediction is short-term or long-term ought to be less important than other aspects of the quality of that evidence. This model predicts that, for example, calibration training on short-term predictions about which one has very little evidence should improve calibration on long-term predictions about which one also has very little evidence.

And people regularly make both short- and long-term predictions on PredictionBook, so in 5 to 10 years...

Replies from: lukeprog
comment by lukeprog · 2012-11-26T02:07:36.929Z · LW(p) · GW(p)

Yes, I've been trying to make both short- and long-term predictions on PredictionBook.

comment by JoshuaZ · 2012-11-24T01:13:09.669Z · LW(p) · GW(p)

I've been uncomfortable for a while with statements like Eliezer's remark that:

When heavier-than-air flight or atomic energy was a hundred years off, it looked fifty years off or impossible; when it was five years off, it still looked fifty years off or impossible.

This really is picking and choosing specific technological examples rather than looking at the overall pattern. In 1964, five years before the first moon landing, it looked a few years off but certainly not a hundred years off.

Perhaps the best online tool for calibration training is PredictionBook.com

I strongly agree with this. I've used it to make a variety of predictions, including tech predictions. One issue it does have is that there's no easy categorization, so one can't use it, for example, to see at a glance whether one's tech predictions are more or less accurate than one's predictions about politics or other subjects.

Mathematicians seldom try and predict when major problems will be solved, because they recognise that insight is very hard to predict.

Noteworthy counterexample: Soon after the Feit-Thompson theorem, people started talking about classifying all finite simple groups, but this was because Gorenstein had a specific blueprint that was thought might be able to get the full result. But even then, the time period was shorter.

In cases like the Riemann hypothesis we have a few ideas of things that might work, but none look that promising, and results one would expect to fall first, like the Lindelof hypothesis, remain apparently unassailable. So one major sign of a problem being genuinely far off is that even to our eyes, much simpler problems look far off. I'm not sure how to apply that to AI. Do modern practical successes like machine learning count plausibly as successes of related minor aspects? It will be a lot easier to tell after there's some form of general AI and we have more of an idea about its structure. Similar issues apply to almost any future tech.

Replies from: lukeprog
comment by lukeprog · 2012-11-24T01:21:00.544Z · LW(p) · GW(p)

This really is picking and choosing specific technological examples rather than looking at the overall pattern. In 1964, five years before the first moon landing, it looked a few years off but certainly not a hundred years off.

I don't think Eliezer meant to say that breakthrough technologies always seem 50 years off or impossible until they are invented. Those who were paying attention to computer chess could predict it passing the human level before the end of the millennium, and we've seen self-driving cars coming for a while now. Anyway, I've added a clarifying note below the Eliezer quote, now.

Replies from: None
comment by [deleted] · 2012-11-24T16:24:58.221Z · LW(p) · GW(p)

I don't think Eliezer meant to say that breakthrough technologies always seem 50 years off or impossible until they are invented.

I don't think JoshuaZ meant to say Eliezer meant to say that. It seems more like he just meant that the list feels cherry-picked; that the examples given seem to be chosen for their suitability to the argument rather than because they form a compelling signal when compared against other relevant data points.

comment by A1987dM (army1987) · 2012-11-24T10:36:32.092Z · LW(p) · GW(p)

I don't think it's just a matter of pessimism -- the phenomenon whereby “I cannot predict X” somehow becomes “X is not going to happen” can also happen when X is a bad thing (e.g. “there will be a major earthquake in L'Aquila in the next few days”).

EDIT: If you've heard the story about the convicted scientists, please read the article linked to, as most news coverage seriously distorted the story.

comment by The_Duck · 2012-11-24T01:20:59.416Z · LW(p) · GW(p)

The examples make the point that it's possible to be too pessimistic, and too confident in that pessimism. However, maybe we can figure out when we should be confidently pessimistic.

For example, we can be very confidently pessimistic about the prospects for squaring the circle or inventing perpetual motion. Here we have mathematical proofs of impossibility. I think we can be almost as confidently pessimistic about the near-term prospects for practical near-light-speed travel. Here we have a good understanding of the scope of the problem and of the capabilities of all practical sources of propulsion, and we can see that those capabilities are nowhere near enough.

Let's not just leave it at "it's possible to be too pessimistic." How can we identify problems about which we can be confidently pessimistic?

Replies from: lukeprog
comment by lukeprog · 2012-11-24T01:40:41.270Z · LW(p) · GW(p)

Yes, an important question, though not one I wanted to tackle in this post!

In general, we seem to do better at predicting things when we use a model with moving parts, and we have opportunity to calibrate our probabilities for many parts of the model. If we built a model that made a negative prediction about the near-term prospects for a specific technology after we had calibrated many parts of the model on lots of available data, that should be a way to increase our confidence about the near-term prospects for that technology.

The most detailed model for predicting AI that I know of is The Uncertain Future (not surprisingly, an SI project), though unfortunately the current Version 1.0 isn't broken down into parts so small that they are easy to calibrate. For an overview of the motivations behind The Uncertain Future, see Changing the Frame of AI Futurism: From Storytelling to Heavy-Tailed, High-Dimensional Probability Distributions.

comment by Sniffnoy · 2012-11-24T06:34:05.573Z · LW(p) · GW(p)

I'm not sure how comparable AI and nuclear fusion are. AI is clearly a case where new insights are needed, but fusion -- to the best of my knowledge, anyway -- might be obtainable via sufficient grinding. So while certainly people are still looking for new insights into nuclear fusion, one could perhaps predict how long it would take to obtain fusion by pure grinding. And if you expect no new insights, you would consider this a good estimate -- although as you point out, expecting no new insights is probably pessimistic.

Replies from: JoshuaZ
comment by JoshuaZ · 2012-11-24T14:37:54.088Z · LW(p) · GW(p)

I suspect that to some extent this may be a problem of drawing the line between "grinding" and "insight". It may be difficult to tell where the line is, and these may be categories that are particularly fuzzy when one doesn't yet have the details.

comment by kilobug · 2012-11-25T10:09:25.546Z · LW(p) · GW(p)

Not directly related to the topic, but since you're speaking of PredictionBook, there is a question I would like to ask: it seems from http://predictionbook.com/predictions that the PredictionBook crowd is mostly calibrated, on average, except at the extrema (100%/0%). How does that match with the "people are broadly overconfident" studies? The two datasets seem contradictory to me. I notice I'm confused.

I could pop explanations like "people on prediction book are not representatives of people in general" or "the kind of predictions made on prediction book isn't the same" but they sound more like rationalizations (popping an explanation with poor data backing it to avoid admitting confusion), so I don't accept them.

Does anyone here have better answers (or data to back up my "guesses") on that data contradiction?

Replies from: gwern, CCC, Jayson_Virissimo
comment by gwern · 2012-11-28T04:54:08.134Z · LW(p) · GW(p)

Calibration is trainable. (I would hardly be engaged in it if the studies had shown overconfidence to be incorrigible.) BTW, much more surprising is that generating random numbers is also trainable if the subjects are given access to statistical tests of the quality of their randomness.

comment by CCC · 2012-11-26T08:12:23.368Z · LW(p) · GW(p)

Hmmm. It seems likely that some people will be overconfident, and some will be underconfident.

I would guess that a new visitor to the site will more likely be overconfident than underconfident; that implies that the old visitors, those who have practiced a bit, may be slightly more likely to be underconfident than overconfident.

comment by Jayson_Virissimo · 2012-11-28T04:36:05.460Z · LW(p) · GW(p)

I thought through precisely those same explanations myself. Currently, I'm leaning towards overconfidence bias being one of those "biases" that is easy to reproduce in the artificial situations created in the laboratory, but that diminishes quickly with feedback (like would usually happen in the "real world").

comment by CronoDAS · 2012-11-24T01:17:55.909Z · LW(p) · GW(p)

We still don't have economical flying cars. :(

(Doing something cheaply and a million times is a lot harder than doing it expensively and once.)

Replies from: fubarobfusco
comment by fubarobfusco · 2012-11-24T04:21:38.774Z · LW(p) · GW(p)

The problem "flying car" was incorrectly described as an aeronautical engineering problem — as if inventing a compact, two- to four-seater aircraft that needed only a short runway — and ideally could use highways — was the hard part. Well, ultralights and roadable aircraft have been invented.

The problem is one of turning the invention into a transportation system ... and that's a different problem, involving safety, automation, and policy too. It turns out that bad pilots can do a lot more damage than bad drivers, for instance ....

comment by devas · 2012-11-24T10:02:28.181Z · LW(p) · GW(p)

Now I'm wondering how this kind of bias operates outside of science, and specifically with what confidence we can expect insane things to be disregarded.

In more detail, I'm wondering how long homeopathy can survive while all experts can attest that it's not useful. The case of Miracle Mineral Supplement, which Eliezer mentioned recently, seems to show that people will stop doing absurd things when it is shown exactly how absurd they are. The question is, how long does it take for this to happen? After all, people still read horoscopes!

Replies from: NancyLebovitz, mfb
comment by NancyLebovitz · 2012-11-24T23:26:00.588Z · LW(p) · GW(p)

What proportion of people read horoscopes for entertainment rather than taking astrology seriously? It might be plausible that there's something unhealthy about reading horoscopes at all if you have a sense of personal application, but that might need some proof.

Replies from: devas
comment by devas · 2012-11-25T13:31:03.626Z · LW(p) · GW(p)

It is true that the vast, vast majority of people don't take horoscopes seriously, but still, they do in fact take up resources which could be freed up and better employed elsewhere; even if it's just some guy working at a newspaper who can now enjoy some more time to edit his other articles, I think it would still be a better state for the world to be in.

I haven't done even the simplest back-of-the-envelope calculations for it, so take that statement as fuzzy and dubious.

Also, I suppose it just bugs me that it is treated even jokingly as something that can work... I actually need to work on being more flexible and less rude/cruel/pedantic, I suppose.

Replies from: Strange7
comment by Strange7 · 2012-11-27T23:10:41.282Z · LW(p) · GW(p)

What time of year you're born can have an impact on personality, due to environmental factors on early development. Also, newspaper horoscopes are a potentially useful source of nonspecific but authoritative-sounding advice to go out and do something. Enough people might need to be told that on a regular basis to justify the societal costs involved.

Replies from: devas, wedrifid
comment by devas · 2012-11-28T11:12:50.681Z · LW(p) · GW(p)

True, but I don't think the people writing horoscopes know or care about the influences your date of birth has or will have on your life. And as for the societal costs... I think they're worse than they appear at first glance, since they foster an attitude of "magic has been proven not to exist, but who cares, let's believe in it anyway!", of which I'm afraid.

comment by wedrifid · 2012-11-28T03:16:33.008Z · LW(p) · GW(p)

What time of year you're born can have an impact on personality, due to environmental factors on early development.

Totally. I was born near Christmas. This means I on average got slightly less presents. I may have been scarred for life. Or perhaps the trivial hardship prepared me for challenges and contributed to me accepting the unfairness of life. Something.

comment by mfb · 2012-11-24T19:45:01.726Z · LW(p) · GW(p)

I don't think this will die soon, similar to many other obscure types of "medicine". Proper medical treatments can fail, and in that case many people are looking for alternatives. Add some "$person was treated with $method and $symptom went away!" confirmations, and you have a market for that.

Replies from: CCC
comment by CCC · 2012-11-26T08:15:41.336Z · LW(p) · GW(p)

If a person (a) is poorly, (b) receives treatment intended to make him better, and (c) gets better, then no power of reasoning known to medical science can convince him that it may not have been the treatment that restored his health.

-- Sir Peter Medawar, "The Art of the Soluble"

comment by Epiphany · 2012-11-30T09:16:28.608Z · LW(p) · GW(p)

A virtuous doubt that applies to us:

Doubt that LessWrong is gifted.

Here, I consider the opposite.

I realize this isn't a prediction that people are overconfident and pessimistic about, but these are such great concepts, I would like to apply them everywhere.

In addition to my experiences where LessWrongers have been surprisingly overconfident and pessimistic about IQ claims, Yvain has also made an observation in the 2012 survey: "some people have been pretty quick to ridicule this survey's intelligence numbers as completely useless and impossible and so on". It's not just me who is encountering this, so I wonder if this might be a common mistake that LessWrongers make.

comment by Bruno_Coelho · 2012-11-28T14:08:25.758Z · LW(p) · GW(p)

Some pessimists expect a linear growth in sophistication, with a few insights along the way, but not sequential insights from one single group. Safe AI designs are harder than unsafe ones. Normally, hard problems are solved after the easy ones, and by different people. If FAI is a hard-rather-than-easy project, the results will appear after the unsafe AGIs. However, this might not be the case if AI research changes.

comment by CCC · 2012-11-26T08:27:34.137Z · LW(p) · GW(p)

It's probably also worth noting that widely-known and accepted predictions will, in turn, have an influence on the outcomes they predict. When a physicist has an idea for a very expensive experiment that may lead to nuclear fusion and has to get funding, he's more likely to get funding from an agency that feels that nuclear fusion is 'just around the corner' than from one that feels that nuclear fusion is 'impossible within the next fifty years'.

And before that, of course, the physicist himself has to decide in which direction he will take his research. The R&D departments of commercial enterprises will usually want to direct their research towards outcomes that are likely to result in something that can be solved, and therefore sold, reasonably soon. Therefore, most commercial enterprises are unlikely to run across any unexpected insights.