# On Overconfidence

post by Scott Alexander (Yvain) · 2015-08-21T02:21:00.000Z · score: 39 (19 votes) · LW · GW · 3 comments

*[Epistemic status: This is basic stuff to anyone who has read the Sequences, but since many readers here haven’t I hope it is not too annoying to regurgitate it. Also, ironically, I’m not actually that sure of my thesis, which I guess means I’m extra-sure of my thesis]*

**I.**

A couple of days ago, the Global Priorities Project came out with a calculator that allowed you to fill in your own numbers to estimate how concerned you should be with AI risk. One question asked how likely you thought it was that there would be dangerous superintelligences within a century, offering a drop down menu with probabilities ranging from 90% to 0.01%. And so people objected: there should be options to put in only a one in a million chance of AI risk! One in a billion! One in a…

For example, a commenter writes that: “the best (worst) part: the probability of AI risk is selected from a drop down list where the lowest probability available is 0.01%!! Are you kidding me??” and then goes on to say his estimate of the probability of human-level (not superintelligent!) AI this century is “very very low, maybe 1 in a million or less”. Several people on Facebook and Tumblr say the same thing – 1/10,000 chance just doesn’t represent how sure they are that there’s no risk from AI, they want one in a million or more.

Last week, I mentioned that Dylan Matthews’ suggestion that maybe there was only 10^-67 chance you could affect AI risk was stupendously overconfident. I mentioned that was thousands of times lower than the chance, *per second*, of getting simultaneously hit by a tornado, meteor, and al-Qaeda bomb, while *also* winning the lottery twice in a row. Unless you’re comfortable with that level of improbability, you should stop using numbers like 10^-67.

But maybe it sounds like “one in a million” is much safer. That’s only 10^-6, after all, way below the tornado-meteor-terrorist-double-lottery range…

So let’s talk about overconfidence.

Nearly everyone is very very very overconfident. We know this from experiments where people answer true/false trivia questions, then are asked to state how confident they are in their answer. If people’s confidence was well-calibrated, someone who said they were 99% confident (ie only 1% chance they’re wrong) would get the question wrong only 1% of the time. In fact, people who say they are 99% confident get the question wrong about 20% of the time.

It gets worse. People who say there’s only a 1 in 100,000 chance they’re wrong? Wrong 15% of the time. One in a million? Wrong 5% of the time. They’re not just overconfident, they are *fifty thousand times* as confident as they should be.

This is not just a methodological issue. Test confidence in some other clever way, and you get the same picture. For example, one experiment asked people how many numbers there were in the Boston phone book. They were instructed to set a range, such that the true number would be in their range 98% of the time (ie they would only be wrong 2% of the time). In fact, they were wrong 40% of the time. Twenty times too confident! What do you want to bet that if they’d been asked for a range so wide there was only a one in a million chance they’d be wrong, at least five percent of them would have bungled it?
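For anyone who wants to check the "fifty thousand times" and "twenty times" figures, here is a minimal sketch of the arithmetic. The function name `overconfidence_factor` is my own label for "observed error rate divided by stated error rate", not a standard term from the calibration literature; the inputs are just the numbers quoted above.

```python
def overconfidence_factor(stated_error: float, observed_error: float) -> float:
    """How many times more often people were wrong than they said they would be."""
    return observed_error / stated_error


# Figures as quoted in the text above.
one_in_a_million = overconfidence_factor(stated_error=1e-6, observed_error=0.05)
phone_book_ranges = overconfidence_factor(stated_error=0.02, observed_error=0.40)

print(f"'One in a million' claims: about {one_in_a_million:,.0f} times too confident")
print(f"98% ranges on the phone book: about {phone_book_ranges:.0f} times too confident")
```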

Yet some people think they can predict the future course of AI with one in a million accuracy!

Imagine if every time you said you were sure of something to the level of 999,999/1 million, and you were right, the Probability Gods gave you a dollar. Every time you said this and you were wrong, you lost $1 million (if you don’t have the cash on hand, the Probability Gods offer a generous payment plan at low interest). You might feel like getting some free cash for the parking meter by uttering statements like “The sun will rise in the east tomorrow” or “I won’t get hit by a meteorite” without much risk. But would you feel comfortable predicting the course of AI over the next century? What if you noticed that most other people only managed to win $20 before they slipped up? Remember, if you say even one false statement under such a deal, all of your true statements you’ve said over years and years of perfect accuracy won’t be worth the hole you’ve dug yourself.
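A rough sketch of why this bet is so unforgiving, assuming each statement is an independent gamble with a fixed error rate (that independence is my simplification, not part of the thought experiment). The helper names are hypothetical.

```python
def expected_winnings(error_rate: float,
                      reward: float = 1.0,
                      penalty: float = 1_000_000.0) -> float:
    """Average dollars gained per statement under the Probability Gods' deal."""
    return (1 - error_rate) * reward - error_rate * penalty


def break_even_error_rate(reward: float = 1.0,
                          penalty: float = 1_000_000.0) -> float:
    """Error rate at which the deal is worth exactly zero on average."""
    return reward / (reward + penalty)


def statements_until_first_error(error_rate: float) -> float:
    """Mean number of statements made up to and including the first wrong one."""
    return 1 / error_rate


observed = 0.05  # the 5% error rate quoted for "one in a million" claims
print(f"Break-even error rate: {break_even_error_rate():.3e}")
print(f"Expected winnings per statement at 5% error: ${expected_winnings(observed):,.2f}")
print(f"Statements until the first slip-up, on average: {statements_until_first_error(observed):.0f}")
```

At the quoted 5% error rate the first mistake arrives after about twenty statements, which is where the "$20 before they slipped up" picture comes from.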

Or – let me give you another intuition pump about how hard this is. Bayesian and frequentist statistics are pretty much the same thing [citation needed] – when I say “50% chance this coin will land heads”, that’s the same as saying “I expect it to land heads about one out of every two times.” By the same token, “There’s only a one in a million chance that I’m wrong about this” is the same as “I expect to be wrong on only one of a million statements like this that I make.”

What do a million statements look like? Suppose I can fit twenty-five statements onto the page of an average-sized book. I start writing my predictions about scientific and technological progress in the next century. “I predict there will not be superintelligent AI.” “I predict there will be no simple geoengineering fix for global warming.” “I predict no one will prove P = NP.” *War and Peace*, one of the longest books ever written, is about 1500 pages. After you write enough of these statements to fill a *War and Peace* sized book, you’ve made 37,500. You would need to write about 27 *War and Peace* sized books – enough to fill up a good-sized bookshelf – to have a million statements.

So, if you want to be confident to the level of one-in-a-million that there won’t be superintelligent AI next century, you need to believe that you can fill up 27 *War and Peace* sized books with similar predictions about the next hundred years of technological progress – and be wrong – at most – once!
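The bookshelf arithmetic, as a quick sketch using the page and statement counts given above (the constant names are mine):

```python
import math

STATEMENTS_PER_PAGE = 25      # as assumed in the text
PAGES_PER_BOOK = 1500         # rough length of War and Peace, per the text
TARGET_STATEMENTS = 1_000_000

statements_per_book = STATEMENTS_PER_PAGE * PAGES_PER_BOOK
books_needed = math.ceil(TARGET_STATEMENTS / statements_per_book)

print(f"Statements per book: {statements_per_book:,}")
print(f"Books needed for a million statements: {books_needed}")
```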

This is especially difficult because claims that a certain form of technological progress will not occur have a very poor track record of success, even when uttered by the most knowledgeable domain experts. Consider how Nobel-Prize winning atomic scientist Ernest Rutherford dismissed the possibility of nuclear power as “the merest moonshine” *less than a day* before Szilard figured out how to produce such power. In 1901, Wilbur Wright told his brother Orville that “man would not fly for fifty years” – two years later, they flew, leading Wilbur to say that “ever since, I have distrusted myself and avoided all predictions”. Astronomer Joseph de Lalande told the French Academy that “it is impossible” to build a hot air balloon and “only a fool would expect such a thing to be realized”; the Montgolfier brothers flew less than a year later. This pattern has been so consistent throughout history that sci-fi titan Arthur C. Clarke (whose own predictions were often eerily accurate) made a heuristic out of it under the name Clarke’s First Law: “When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.”

Also – one good heuristic is to look at what experts in a field think. According to Müller and Bostrom (2014), a sample of the top 100 most-cited authors in AI ascribed a > 70% probability to AI within a century, a 50% chance of superintelligence conditional on human-level AI, and a 10% chance of existential catastrophe conditional on human-level AI. Multiply it out, and you get a couple percent chance of superintelligence-related existential catastrophe in the next century.
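"Multiply it out" here just means chaining the three survey figures together. A minimal sketch, treating the "> 70%" as exactly 70% (so the result is, if anything, on the low side) and using variable names of my own choosing:

```python
# Survey figures as quoted above.
p_human_level_ai = 0.70              # AI within a century (quoted as "> 70%")
p_superintelligence_given_ai = 0.50  # superintelligence, conditional on human-level AI
p_catastrophe_given_ai = 0.10        # existential catastrophe, conditional on human-level AI

# The straightforward product of the three numbers, as the text does it.
p_catastrophe = (p_human_level_ai
                 * p_superintelligence_given_ai
                 * p_catastrophe_given_ai)

print(f"Chained estimate: {p_catastrophe:.1%}")
```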

Note that my commenter wasn’t disagreeing with the roughly 4% chance of catastrophe. He was disagreeing with the possibility that *there would be human-level AI at all*, that is, the 70% chance! That means that he was saying, essentially, that he was confident he could write a million sentences – that is, twenty-seven *War and Peace*’s worth – all of which were trying to predict trends in a notoriously difficult field, all of which contradicted a well-known heuristic about what kind of predictions you should never try to make, all of which contradicted the consensus opinion of the relevant experts – and only have one of the million be wrong!

But if you feel superior to that because you don’t believe there’s only a one-in-a-million chance of human-level AI, you just believe there’s a one-in-a-million chance of existential catastrophe, you are missing the point. Okay, you’re not 300,000 times as confident as the experts, you’re only 40,000 times as confident. Good job, here’s a sticker.

Seriously, when people talk about being able to defy the experts a million times in a notoriously tricky area they don’t know much about and only be wrong once – I don’t know what to think. Some people criticize Eliezer Yudkowsky for being overconfident in his favored interpretation of quantum mechanics, but he doesn’t even attach a number to that. For all I know, maybe he’s only 99% sure he’s right, or only 99.9%, or something. If you are absolutely outraged that he is claiming one-in-a-thousand certainty on something that doesn’t much matter, shouldn’t you be literally a thousand times more outraged when every day people are claiming one-in-a-million level certainty on something that matters very much? It is *almost impossible* for me to comprehend the mindsets of people who make a Federal Case out of the former, but are totally on board with the latter.

*Everyone* is overconfident. When people say one-in-a-million, they are wrong five percent of the time. And yet, people keep saying “There is only a one in a million chance I am wrong” on issues of making really complicated predictions about the future, where many top experts disagree with them, and where the road in front of them is littered with the bones of the people who made similar predictions before. HOW CAN YOU DO THAT?!

**II.**

I am of course glossing over an important issue. The experiments where people offering one-in-a-million chances were wrong 5% of the time were on true-false questions – those with only two possible answers. There are other situations where people can often say “one in a million” and be right. For example, I confidently predict that if you enter the lottery tomorrow, there’s less than a one in a million chance you will win.

On the other hand, I feel like I can justify that. You want me to write twenty-seven *War and Peace* volumes about it? Okay, here goes. “Aaron Aaronson of Alabama will not win the lottery. Absalom Abramowitz of Alaska will not win the lottery. Achitophel Acemoglu of Arkansas will not win the lottery.” And so on through the names of a million lottery ticket holders.

I think this is what statisticians mean when they talk about “having a model”. Within the model where there are a hundred million ticket holders, and we know exactly one will be chosen, our predictions are on very firm ground, and our intuition pumps reflect that.

Another way to think of this is by analogy to dart throws. Suppose you have a target that is half red and half blue; you are aiming for red. You would have to be very very confident in your dart skills to say there is only a one in a million chance you will miss it. But if there is a target that is 999,999 millionths red, and 1 millionth blue, then you do not have to be at *all* good at darts to say confidently that there is only a one in a million chance you will miss the red area.

Suppose a Christian says “Jesus might be God. And he might not be God. 50-50 chance. So you would have to be incredibly overconfident to say you’re sure he isn’t.” The atheist might respond “The target is full of all of these zillions of hypotheses – Jesus is God, Allah is God, Ahura Mazda is God, Vishnu is God, a random guy we’ve never heard of is God. You are taking a tiny tiny submillimeter-sized fraction of a huge blue target, painting it red, and saying that because there are two regions of the target, a blue region and a red region, you have equal chance of hitting either.” Eliezer Yudkowsky calls this “privileging the hypothesis”.

There’s a tougher case. Suppose the Christian says “Okay, I’m not sure about Jesus. But either there is a Hell, or there isn’t. Fifty fifty. Right?”

I think the argument against this is that there are way more ways for there not to be Hell than there are for there to be Hell. If you take a bunch of atoms and shake them up, they usually end up as not-Hell, in much the same way as the creationists’ fabled tornado-going-through-a-junkyard usually ends up as not-a-Boeing-747. For there to be Hell you have to have some kind of mechanism for judging good vs. evil – which is a small part of the space of all mechanisms, let alone the space of all things – some mechanism for diverting the souls of the evil to a specific place, which same, some mechanism for punishing them – again same – et cetera. Most universes won’t have Hell unless you go through a *lot* of work to put one there. Therefore, Hell existing is only a very tiny part of the target. Making this argument correctly would require an in-depth explanation of formalizations of Occam’s Razor, which is outside the scope of this essay but which you can find on the LW Sequences.

But this kind of argumentation is really hard. Suppose I predict “Only one in 150 million chance Hillary Clinton will be elected President next year. After all, there are about 150 million Americans eligible for the Presidency. It could be any one of them. Therefore, Hillary covers only a tiny part of the target.” Obviously this is wrong, but it’s harder to explain how. I would say that your dart-aim is guided by an argument based on a concrete numerical model – something like “She is ahead in the polls by X right now, and candidates who are ahead in the polls by X usually win about 50% of the time, therefore, her real probability is more like 50%.”

Or suppose I predict “Only one in a million chance that Pythagoras’ Theorem will be proven wrong next year.” Can I get away with that? I can’t *quite* appeal to “it’s been proven”, because there might have been a mistake in (all the) proofs. But I could say: suppose there are five thousand great mathematical theorems that have undergone something like the same level of scrutiny as Pythagoras’, and they’ve been known on average for two hundred years each. None of them have ever been disproven. That’s a numerical argument that the rate of theorem-disproving is less than one per million years, and I think it holds.

Another way to do this might be “there are three hundred proofs of Pythagoras’ theorem, so even accepting an absurdly high 10%-per-proof chance of being wrong, the chance is now only 10^-300.” Or “If there’s a 10% chance that each mathematician reading a proof misses something, and one million mathematicians have read the proof of Pythagoras’ Theorem, then the probability that they all missed it is more like 10^-1,000,000.”
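Here is a sketch of both numerical arguments. Note the loud assumption baked into the second one: multiplying the per-proof (or per-reader) error chances only works if the errors are independent, which the quoted arguments take for granted. Numbers like 10^-1,000,000 also underflow ordinary floating point, so the sketch works with base-10 logarithms instead. The function name is mine.

```python
import math

# Argument 1: theorem-years of scrutiny with no disproofs.
theorems = 5_000
average_years_known = 200
theorem_years = theorems * average_years_known
print(f"Theorem-years observed with zero disproofs: {theorem_years:,}")


# Argument 2: chance that every independent check fails at once.
def log10_all_checks_fail(p_single_failure: float, checks: int) -> float:
    """log10 of p_single_failure ** checks, computed without underflow.

    Assumes the checks fail independently of one another.
    """
    return checks * math.log10(p_single_failure)


print(f"300 proofs, 10% each wrong: 10^{log10_all_checks_fail(0.10, 300):.0f}")
print(f"A million readers, 10% each missing it: 10^{log10_all_checks_fail(0.10, 1_000_000):.0f}")
```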

But this can get tricky. Suppose I argued “There’s a good chance Pythagoras’ Theorem will be disproven, because of all Pythagoras’ beliefs – reincarnation, eating beans being super-evil, ability to magically inscribe things on the moon – most have since been disproven. Therefore, the chance of a randomly selected Pythagoras-innovation being wrong is > 50%.”

Or: “In 50 past presidential elections, none have been won by women. But Hillary Clinton is a woman. Therefore, the chance of her winning this election is less than 1/50.”

All of this stuff about adjusting for size of the target or for having good mathematical models is really hard and easy to do wrong. And then you have to add another question: are you sure, to a level of one-in-a-million, that you didn’t mess up your choice of model at all?

Let’s bring this back to AI. Suppose that, given the complexity of the problem, you predict with utter certainty that we will not be able to invent an AI this century. But if the modal genome trick pushed by people like Greg Cochran works out, within a few decades we might be able to genetically engineer humans far smarter than any who have ever lived. Given tens of thousands of such supergeniuses, might we be able to solve an otherwise impossible problem? I don’t know. But if there’s a 1% chance that we can perform such engineering, and a 1% chance that such supergeniuses can invent artificial intelligence within a century, then the probability of AI within the next century isn’t one in a million, it’s one in ten thousand.

Or: consider the theory that all the hard work of brain design has been done by the time you have a rat brain, and after that it’s mostly just a matter of scaling up. You can find my argument for the position in this post – search for “the hard part is evolving so much as a tiny rat brain”. Suppose there’s a 10% chance this theory is true, and a 10% chance that researchers can at least make rat-level AI this century. Then the chance of human-level AI is not one in a million, but one in a hundred.
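The two scenarios above, worked through as a sketch. Each pathway is a chain of steps that all have to come true, and any single pathway already puts a floor under the overall probability. The step probabilities are the illustrative guesses from the text, and `pathway_probability` is a name I made up for the product.

```python
import math


def pathway_probability(*step_probabilities: float) -> float:
    """Probability that every step in one route to AI comes true."""
    return math.prod(step_probabilities)


# Route 1: genetic engineering works (1%), then supergeniuses invent AI (1%).
supergenius_route = pathway_probability(0.01, 0.01)

# Route 2: the scale-up theory is true (10%), and rat-level AI gets built (10%).
scale_up_route = pathway_probability(0.10, 0.10)

for name, p in [("supergenius route", supergenius_route),
                ("scale-up route", scale_up_route)]:
    print(f"{name}: about one in {round(1 / p):,}")

# Either route alone is a lower bound on the total chance of AI.
floor = max(supergenius_route, scale_up_route)
print(f"The floor is about {round(floor / 1e-6):,} times higher than one in a million")
```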

Maybe you disagree with both of these claims. The question is: *did you even think about them before you gave your one in a million estimate*? How many other things are there that you never thought about? Now your estimate has, somewhat bizarrely, committed you to saying there’s a less than one in a million chance we will significantly enhance human intelligence over the next century, *and* a less than one in a million chance that the basic-scale-up model of intelligence is true. You may never have thought directly about these problems, but by saying “one in a million chance of AI in the next hundred years”, you are not only committing yourself to a position on them, but committing yourself to a position with one-in-a-million level certainty even though several domain experts who have studied these fields for their entire lives disagree with you!

A claim like “one in a million chance of X” not only implies that your model is strong enough to spit out those kinds of numbers, but that there’s only a one in a million chance you’re using the wrong model, or missing something, or screwing up the calculations.

A few years ago, a group of investment bankers came up with a model for predicting the market, and used it to design a trading strategy which they said would meet certain parameters. In fact, they said that there was only a one in 10^135 chance it would fail to meet those parameters during a given year. A human just uttered the probability “1 in 10^135”, so you can probably guess what happened. The very next year was the 2007 financial crisis, the model wasn’t prepared to deal with the extraordinary fallout, the strategy didn’t meet its parameters, and the investment bank got clobbered.

This is why I don’t like it when people say we shouldn’t talk about AI risk because it involves “Knightian uncertainty”. In the real world, Knightian uncertainty collapses back down to plain old regular uncertainty. When you are an investment bank, the money you lose because of normal uncertainty and the money you lose because of Knightian uncertainty are denominated in the same dollars. Knightian uncertainty becomes just another reason not to be overconfident.

**III.**

I came back to AI risk there, but this isn’t just about AI risk.

You might have read Scott Aaronson’s recent post about Aumann’s Agreement Theorem, which says that rational agents should be able to agree with one another. This is a nice utopian idea in principle, but in practice, well, nobody seems to be very good at carrying it out.

I’d like to propose a more modest version of Aumann’s agreement theorem, call it Aumann’s Less-Than-Total-Disagreement Theorem, which says that two rational agents shouldn’t both end up with 99.9…% confidence on opposite sides of the same problem.

The “proof” is pretty similar to the original. Suppose you are 99.9% confident about something, and learn your equally educated, intelligent, and clear-thinking friend is 99.9% confident of the opposite. Arguing with each other and comparing your evidence fails to make either of you budge, and neither of you can marshal the weight of a bunch of experts saying you’re right and the other guy is wrong. Shouldn’t the fact that your friend, using a cognitive engine about as powerful as your own, got so heavily different a conclusion make you worry that you’re missing something?

But practically everyone is walking around holding 99.9…% probabilities on the opposite sides of important issues! I checked the Less Wrong Survey, which is as good a source as any for people’s confidence levels on various tough questions. Of the 1400 respondents, about 80 were at least 99.9% certain that there were intelligent aliens elsewhere in our galaxy; about 170 others were at least 99.9% certain that they weren’t. At least 80 people just said they were certain to one part in a thousand and then got the answer wrong! And some of the responses were things like “this box cannot fit as many zeroes as it would take to say how certain I am”. Aside from stock traders who are about to go bankrupt, *who says that sort of thing??!*

And speaking of aliens, imagine if an alien learned about this particular human quirk. I can see them thinking *“Yikes, what kind of a civilization would you get with a species who routinely go around believing opposite things, always with 99.99…% probability?”*

Well, funny you should ask.

I write a lot about free speech, tolerance of dissenting ideas, open-mindedness, et cetera. You know which posts I’m talking about. There are a lot of reasons to support such a policy. But one of the big ones is – who the heck would burn heretics if they thought there was a 5% chance the heretic was right and they were wrong? Who would demand that dissenting opinions be banned, if they were only about 90% sure of their own? Who would start shrieking about “human garbage” on Twitter when they fully expected that in some sizeable percent of cases, they would end up being wrong and the garbage right?

Noah Smith recently asked why it was useful to study history. I think at least one reason is to medicate your own overconfidence. I’m not just talking about things like “would Stalin have really killed all those people if he had considered that he was wrong about communism” – especially since I don’t think Stalin worked that way. I’m talking about Neville Chamberlain predicting “peace in our time”, or the centuries when Thomas Aquinas’ philosophy was the preeminent Official Explanation Of Everything. I’m talking about Joseph “no one will ever build a working hot air balloon” Lalande. And yes, I’m talking about what Muggeridge writes about, millions of intelligent people thinking that Soviet Communism was great, and ending up disastrously wrong. Until you see how often people just like you have been wrong in the past, it’s hard to understand how uncertain you should be that you are right in the present. If I had lived in 1920s Britain, I probably would have been a Communist. What does that imply about how much I should trust my beliefs today?

There’s a saying that “the majority is always wrong”. Taken literally it’s absurd – the majority thinks the sky is blue, the majority don’t believe in the Illuminati, et cetera. But what it *might* mean, is that in a world where everyone is overconfident, the majority will always be wrong about which direction to move the probability distribution in. That is, if an ideal reasoner would ascribe 80% probability to the popular theory and 20% to the unpopular theory, perhaps most real people say 99% popular, 1% unpopular. In that case, if the popular people are urging you to believe the popular theory more, and the unpopular people are urging you to believe the unpopular theory more, the unpopular people are giving you better advice. This would create a strange situation in which good reasoners are usually engaged in disagreeing with the majority, and also usually “arguing for the wrong side” (if you’re not good at thinking probabilistically, and almost no one is), but remain good reasoners and the ones with beliefs most likely to produce good outcomes. Unless you count “why are all of our good reasoners being burned as witches?” as a bad outcome.

I started off by saying this blog was about “the principle of charity”, but I had trouble defining it and in retrospect I’m not that good at it anyway. What can be salvaged from such a concept? I would say “behave the way you would if you were less than insanely overconfident about most of your beliefs.” This is the Way. The rest is just commentary.

**Discussion Questions** (followed by my own answers in ROT13)

1. What is your probability that there is a god? (Svir creprag)

2. What is your probability that psychic powers exist? (Bar va bar gubhfnaq)

3. What is your probability that anthropogenic global warming will increase temperatures by at least 1C by 2050? (Avargl creprag)

4. What is your probability that a pandemic kills at least one billion people in a 5 year period by 2100? (Svsgrra creprag)

5. What is your probability that humans land on Mars by 2050? (Rvtugl creprag)

6. What is your probability that superintelligent AI (=AI better than almost every human at almost every cognitive task) exists by 2115? (Gjragl svir creprag)
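If you want to read the ROT13 answers once you have written down your own, Python's standard library will decode them. A minimal sketch; `decode_rot13` is just a thin wrapper I'm naming for convenience, and the sample string is the answer to question 1 above.

```python
import codecs


def decode_rot13(text: str) -> str:
    """Rotate each letter 13 places; applying it twice returns the original."""
    return codecs.decode(text, "rot13")


# Write your own estimate first, then uncomment the print to compare.
answer_one = decode_rot13("Svir creprag")
# print(answer_one)
```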

## 3 comments


I’d like to propose a more modest version of Aumann’s agreement theorem, call it Aumann’s Less-Than-Total-Disagreement Theorem, which says that two rational agents shouldn’t both end up with 99.9…% confidence on opposite sides of the same problem.

So, it seems that actually not both of these people need to be so confident -

If one says "I give a 99.9% chance that AGI won't happen", the second person doesn't need the same confidence in the opposite direction, just the same confidence that the first person shouldn't be so confident. Thus (if the first disagrees with him), we again end up in a situation where two people are very confident in opposite directions: the first person thinks with 99.9% certainty that he should be that certain about AGI, and the second person thinks with 99.9% certainty that he shouldn't.

Though I'm kinda confused now, because it seems that if BOTH now need to not be so confident, then once the second person lowers his confidence, the other person can pump it back up, and maybe it goes on like that forever. Frankly, I don't understand Bayesian math enough to answer this myself.

(Also, is there a reason there are almost no comments on these posts?)

(Also, is there a reason there are almost no comments on these posts?)

They are reposts from slatestarcodex.com.

1. What is your probability that there is a god? (Svir creprag)

1/100. I could just be really wrong. It doesn't say a god that cares about humans, which has a much lower probability.

2. What is your probability that psychic powers exist? (Bar va bar gubhfnaq)

1/10000 Goes against the spirit of the post but...

3. What is your probability that anthropogenic global warming will increase temperatures by at least 1C by 2050? (Avargl creprag)

80/100

4. What is your probability that a pandemic kills at least one billion people in a 5 year period by 2100? (Svsgrra creprag)

4/100

5. What is your probability that humans land on Mars by 2050? (Rvtugl creprag)

1/50

6. What is your probability that superintelligent AI (=AI better than almost every human at almost every cognitive task) exists by 2115? (Gjragl svir creprag)

7/10