Confidence Confusion
post by alkjash · 2018-02-16T02:00:00.852Z · 15 comments
“Captain, if you steer the Enterprise directly into that black hole, our probability of surviving is only 2.234%” Yet nine times out of ten the Enterprise is not destroyed. What kind of tragic fool gives four significant digits for a figure that is off by two orders of magnitude?
This post poses a basic question about probabilities that I’m confused about. Insight would be appreciated.
It’s inspired by a quick skim I gave to Proofiness, which argues that precise numbers are a powerful persuasion technique.
The Dinosaur Market
One of the CFAR prediction markets was: A randomly selected participant will correctly name seven real species of dinosaurs.
I made a series of suspect Fermi estimates as follows:
Extrapolating from the current bids, half the participants will not bid in this market. If selected, they will just try to name as many dinosaurs as they can. 20% of them will be able to.
The other half of the participants do bid in this market, split evenly between high and low bids. Among those who bid high, 70% will take the time to study dinosaurs and memorize seven. Those who bid low will, intentionally or unintentionally, fail if they're paying attention; I give them a 10% chance of success.
That comes out to a total of 30%.
There's a 5% chance of anomalous situations, such as one person caring enough to teach people or publicly post dinosaur names. In that case the market has a much higher chance of resolving True, say 60%.
I arrived at a probability estimate of .95 * .3 + .05 * .6 = .315.
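The estimate above is a small mixture of conditional probabilities. Here is a minimal sketch of the same arithmetic; the variable names are hypothetical, the numbers are the post's own guesses, and the even split between high and low bidders is read off the phrase "the other half".

```python
# Sketch of the dinosaur-market Fermi estimate.
# Variable names are hypothetical; the numbers are the post's guesses.
share_nonbidders = 0.5          # half the participants never bid
p_success_nonbidder = 0.20      # non-bidders who can name seven species anyway
share_high_among_bidders = 0.5  # bidders split evenly between high and low bids
p_success_high = 0.70           # high bidders who study and memorize seven
p_success_low = 0.10            # low bidders who succeed anyway

# Probability of success for a random bidder, then for a random participant.
p_bidder = (share_high_among_bidders * p_success_high
            + (1 - share_high_among_bidders) * p_success_low)
p_normal = (share_nonbidders * p_success_nonbidder
            + (1 - share_nonbidders) * p_bidder)  # about 0.30

# Mix in the anomalous scenario (someone teaches or posts dinosaur names).
p_anomaly = 0.05
p_success_anomaly = 0.60
p_total = (1 - p_anomaly) * p_normal + p_anomaly * p_success_anomaly  # about 0.315

print(f"normal case: {p_normal:.3f}, overall: {p_total:.3f}")
```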
At this point, I felt obligated to round 31.5% to 30%, and so I bid 30 on the market instead of 31 or 32. Is there a valid reason to do so?
Precision is Confidence
I've been focusing on my aversion to reporting high-precision numbers, even when I believe them to be closer to the truth. When I report 31.5%, it feels like I'm claiming more confidence than I have.
Teasing out what it means for 31.5% to be more confident than 30%: the issue is that any number comes with an implicit confidence interval based on its number of significant figures. 30 really means something like 30 ± 5, whereas 31.5 really means something like 31.5 ± 0.05.
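To make the "implicit confidence interval" concrete, here is a sketch under one common reading of significant figures (an assumption, since conventions vary): a number is uncertain by half a unit in its last significant digit, and trailing zeros in an integer such as the 0 in "30" are not significant. The function name is hypothetical.

```python
from decimal import Decimal

def implied_interval(reported: str) -> tuple[Decimal, Decimal]:
    """Return the (low, high) range implied by the digits of `reported`.

    Assumes uncertainty of half a unit in the last significant digit, and
    that trailing zeros in an integer (e.g. "30") are not significant.
    """
    value = Decimal(reported)
    if "." in reported:
        # The last digit after the decimal point is significant: "31.5" -> -1.
        exponent = value.as_tuple().exponent
    else:
        # Count trailing zeros: "30" -> last significant digit is the tens place.
        digits = reported.lstrip("+-")
        exponent = len(digits) - len(digits.rstrip("0"))
    half_unit = Decimal(5) * Decimal(10) ** (exponent - 1)
    return value - half_unit, value + half_unit

print(implied_interval("30"))
print(implied_interval("31.5"))
```

Under this reading, "30" spans 25 to 35 while "31.5" spans 31.45 to 31.55, an interval a hundred times narrower.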
In the absence of explicitly reporting confidence intervals around every probability estimate, 30 thus feels like a more honest report of my actual beliefs. This holds even though I would simultaneously buy at any price below 31 and sell at any price above 32.
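The buy/sell thresholds follow from expected value. A sketch, assuming a contract that pays 100 points if the claim resolves True (an assumption about how the CFAR market scored bids); the function name is hypothetical.

```python
def expected_profit(belief, price, side, payout=100.0):
    """Expected profit per contract that pays `payout` if the event happens.

    `belief` is the trader's probability of the event; `side` is "buy" or "sell".
    """
    fair_value = belief * payout
    if side == "buy":
        return fair_value - price
    if side == "sell":
        return price - fair_value
    raise ValueError(f"unknown side: {side!r}")

# With a belief of 31.5%, buying at 31 and selling at 32 both have positive
# expected value (about +0.5 each), while buying at 33 does not.
print(expected_profit(0.315, 31, "buy"), expected_profit(0.315, 32, "sell"))
```

In other words, the prices at which someone is willing to trade reveal the 31.5 belief, whatever number they say out loud.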
Explicitly reporting confidence intervals solves the issue (I'd rather say 31.5% with an explicit interval than a rounded 30%), but this strategy seems impractical and carries its own signalling problems.
A Thought Experiment
A large part of my aversion to putting precise numbers on beliefs is a result of the type of error above: in a social setting you cannot just say what you mean, and in particular the number of significant figures also signifies a level of confidence/information. This seems to be the norm: almost every prediction Scott Alexander makes is a multiple of 5 or 10.
Here’s a thought experiment:
Albert and Betty are astronauts sent to study a mysterious coin on Europa and independently transmit short messages back to Earth about the coin’s bias. Albert finds the coin successfully and flips it a thousand times, seeing 531 heads and 469 tails. He concludes the coin is fair and reports 50%.
Betty’s landing capsule collides with a giant teapot in upper orbit and lands several hundred miles away from target. She still has to report her beliefs about the mysterious coin. She has two choices:
- Report 50%, because that’s her prior in the absence of information. After all, probability is in the mind. If she does this, however, Albert’s higher confidence is lost in transmission.
- Report nothing to convey the fact that she has no information. Any other rationalist can compute that her odds are 50-50 anyway, since that's a shared prior.
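One way to see what gets lost in transmission is to model each astronaut's belief as a distribution over the coin's bias. A sketch, assuming both start from a uniform Beta(1, 1) prior (the post does not specify a prior); the function name is hypothetical.

```python
from math import sqrt

def beta_summary(heads, tails, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of a coin's bias under a Beta prior."""
    a, b = prior_a + heads, prior_b + tails
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd

albert = beta_summary(531, 469)  # mean about 0.531, sd about 0.016
betty = beta_summary(0, 0)       # mean 0.5, sd about 0.289
print(albert, betty)
```

Both point estimates sit near 50%, but Betty's standard deviation is roughly 18 times Albert's. A single reported number carries the mean and drops the spread.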
What would you do? What are the correct norms around sharing probabilities in conversation?
If social norms indeed dictate that significant figures transmit confidence, might it be deceptive to report 31.5 instead of 30 in conversation about the dinosaur market?
15 comments
comment by Qiaochu_Yuan · 2018-02-16T02:16:11.670Z
I had exactly this argument with Critch several years ago. He was very strongly on the side of reporting all of the digits you have. I disagreed with him at the time but now I think he's right. As outside view support, I hear that Tetlock's superforecasters do noticeably worse if you round their probabilities off to the nearest multiple of 5 or 10, but I don't remember where I heard this and haven't read Superforecasting myself to corroborate.
As an inside view argument, rounding is extremely sensitive to how you choose to parameterize probabilities. Here are four options: you could choose to think in terms of probabilities, log probabilities, odds, or log odds. In each of these "coordinate systems" rounding has very different results. So mathematically it's not a very principled thing to do to a probability.
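The coordinate-dependence of rounding is easy to check numerically. A sketch, assuming "rounding" means keeping one significant figure in each coordinate and that logs are natural logs (a different base would give different results again); the function names are hypothetical.

```python
import math

def round_sig(x, sig=1):
    """Round x to `sig` significant figures."""
    return float(f"{x:.{sig}g}")

def round_in_each_coordinate(p, sig=1):
    """Round p in four parameterizations, then map each result back to a probability."""
    odds = p / (1 - p)
    rounded_odds = round_sig(odds, sig)
    return {
        "probability": round_sig(p, sig),
        "log probability": math.exp(round_sig(math.log(p), sig)),
        "odds": rounded_odds / (1 + rounded_odds),
        "log odds": 1 / (1 + math.exp(-round_sig(math.log(odds), sig))),
    }

# For the dinosaur estimate of 0.315 this gives roughly
# 0.3, 0.368, 0.333 and 0.310: four different "rounded" answers.
print(round_in_each_coordinate(0.315))
```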
The thing I usually do, when asked to elicit a probability, is report a probability (usually 2 sig figs) and then also a subjective sense of how easy it would be to shift that probability by giving me more evidence / allowing me more time to think. I also sometimes straight up refuse to report a probability. The thing I generally prefer to do is to share my models instead of sharing my probabilities.
I think the thought experiment is dramatically underspecified. Who are Albert and Betty reporting probabilities to, and what will those probabilities be used for?
↑ comment by Unnamed · 2018-02-16T02:28:40.549Z
Scott mentioned that fact about superforecasters in his review; from what I remember the book doesn't add much detail beyond Scott's summary.
> One result is that while poor forecasters tend to give their answers in broad strokes – maybe a 75% chance, or 90%, or so on – superforecasters are more fine-grained. They may say something like “82% chance” – and it’s not just pretentious, Tetlock found that when you rounded them off to the nearest 5 (or 10, or whatever) their accuracy actually decreased significantly. That 2% is actually doing good work.
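A sketch of why rounding can only cost accuracy for a calibrated forecaster, using the Brier score (squared error between the stated probability and the 0/1 outcome; whether this matches Tetlock's exact scoring is an assumption here). If the event truly happens with probability p and you report q, the expected score is p(1 − q)² + (1 − p)q², which simplifies to p(1 − p) + (q − p)². Rounding therefore adds exactly (q − p)² in expectation.

```python
def expected_brier(p_true, q_reported):
    """Expected Brier score (lower is better) for reporting q when the event occurs with prob p."""
    return p_true * (1 - q_reported) ** 2 + (1 - p_true) * q_reported ** 2

# Extra expected penalty from rounding an 82% forecast down to 80%:
extra = expected_brier(0.82, 0.80) - expected_brier(0.82, 0.82)
print(extra)  # equals (0.80 - 0.82) ** 2 up to floating-point error
```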
↑ comment by Robert Miles (robert-miles) · 2018-02-16T11:39:18.857Z
Perhaps the principled way is to try representing your probability to the same number of significant figures as a probability, as a log probability, as odds, and as log odds, and then present whichever option happens to fall closest to your true estimate :p
↑ comment by alkjash · 2018-02-16T03:01:02.624Z
I think I'm most interested in the last question I posed: as a conversational default when I'm not interested in diving into models and computations, should I share all the digits or as many as my confidence allows?
↑ comment by Qiaochu_Yuan · 2018-02-16T04:30:29.156Z
I think you should share 2 digits.
↑ comment by habryka (habryka4) · 2018-02-16T05:05:28.959Z
I think you should share more digits. I sometimes say 33.5%, and experience it as meaningfully different from 34% or 33%.
This is obviously exacerbated around the ends of the probability spectrum. I.e. there is a massive difference between 99% and 99.5%, and it seems very important to feel comfortable distinguishing between them.
↑ comment by Qiaochu_Yuan · 2018-02-16T05:25:28.209Z
That's fair.
When I say 2 digits I mean 2 sig figs, so e.g. 0.05% is one digit. I think if you're reporting a probability near 99% it makes sense to report 1 minus that probability, to 2 (or 3, or more if you have them) sig figs.
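The "report 1 minus the probability" rule can be written as rounding whichever of p and 1 − p is smaller. A sketch; the cutoff at 0.5 and the function names are assumptions, not something specified in the thread.

```python
def round_sig(x, sig=2):
    """Round x to `sig` significant figures."""
    return float(f"{x:.{sig}g}")

def round_probability(p, sig=2):
    """Round to `sig` significant figures, applied to whichever of p or 1 - p is smaller."""
    if p > 0.5:
        return 1 - round_sig(1 - p, sig)
    return round_sig(p, sig)

# Naive 2-sig-fig rounding turns 99.51% into certainty (1.0);
# rounding the complement keeps it at 0.9951.
print(round_sig(0.9951), round_probability(0.9951))
```

This also captures habryka's point: 99% and 99.5% differ by a factor of two in the complement (1% versus 0.5%), which symmetric rounding preserves.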
↑ comment by Richard_Ngo (ricraz) · 2018-02-17T03:52:04.455Z
> The thing I usually do, when asked to elicit a probability, is report a probability (usually 2 sig figs) and then also a subjective sense of how easy it would be to shift that probability by giving me more evidence / allowing me more time to think.
What is the correct technical way to summarise the latter quantity (ease of shifting), in an idealised setting?
↑ comment by Qiaochu_Yuan · 2018-02-17T04:59:56.941Z
Uh, I dunno, something like, I currently have a belief about the probability distribution of kinds of evidence I expect to encounter in the future, and from there I can compute a probability distribution over what my posterior beliefs are after updating on that evidence, then compute some summary statistic of that distribution that measures how spread out it is. An easy setting in which this can be made completely formal is repeatedly flipping a coin of unknown bias.
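A sketch of the coin-flipping version, under assumptions of our own: the belief is a Beta(a, b) distribution over the coin's bias, future flips follow the beta-binomial predictive distribution, and "ease of shifting" is summarized as the standard deviation of the posterior mean after n more flips. This is one possible summary statistic, not a canonical one; the function names are hypothetical.

```python
from math import comb, exp, lgamma, sqrt

def log_beta(x, y):
    """Log of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def beta_binomial_pmf(k, n, a, b):
    """Predictive probability of k heads in n flips under a Beta(a, b) belief."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def posterior_mean_spread(a, b, n):
    """Std dev of the posterior mean after n more flips, averaged over predicted outcomes."""
    means = [(a + k) / (a + b + n) for k in range(n + 1)]
    weights = [beta_binomial_pmf(k, n, a, b) for k in range(n + 1)]
    center = sum(w * m for w, m in zip(weights, means))
    variance = sum(w * (m - center) ** 2 for w, m in zip(weights, means))
    return sqrt(variance)

# Betty (uniform prior, no data) vs. Albert (531 heads, 469 tails), 10 more flips:
print(posterior_mean_spread(1, 1, 10))      # roughly 0.26
print(posterior_mean_spread(532, 470, 10))  # under 0.002
```

The two beliefs have similar means, but ten more flips would move Betty's estimate by about a hundred times as much as Albert's.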
comment by gjm · 2018-02-16T12:53:07.659Z
Despite the first sentence of this post, I don't think it's actually about probabilities. The same questions arise when you have any other sort of number to report.
The answer seems to me to be composed of one kinda-obvious part and one kinda-impossible-to-determine part.
The obvious part is that unless for some reason you're deliberately deceiving, you should do your best to convey the information you have, which includes both your best estimate of the number and how much you think you know about it (really it's something like a probability distribution over the possible values of the number) within whatever constraints you have -- e.g., on the attention span of whoever you're reporting the number to, or your insight into your beliefs.
The kinda-impossible part is figuring out how those considerations actually trade off. Usually you will have limited resources, limited insight into what you actually believe, an audience with limited patience, a social context in which numbers with lots of nonzero digits in them are taken as implicit claims to detailed knowledge, etc., and exactly what that means for how you should report the numbers is going to be (1) different each time, as all those factors vary, and (2) very difficult to determine.
If you're dealing with a fairly technical audience, or one strongly motivated to pay attention to the details of what you say, I think it should be OK to say things like "31.5% ± 2.5%". Otherwise, I suspect there usually is no way to avoid their understanding being seriously deficient, and you get to choose between saying 31.5% and misleading them about your confidence, and saying 30% and misleading them about your best point estimate.
comment by Unnamed · 2018-02-16T02:24:49.040Z
Albert and Betty should share likelihood ratios, not posterior beliefs.
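A sketch of what sharing likelihood ratios could look like for the coin. The alternative hypothesis "bias 0.6" and the even prior odds are hypothetical choices for illustration. Multiplying the ratios assumes the two astronauts' evidence is independent, which is exactly the caveat raised in the reply below.

```python
from math import exp, log

def coin_likelihood_ratio(heads, tails, p1, p2):
    """Likelihood ratio P(data | bias p1) / P(data | bias p2); binomial coefficients cancel."""
    log_lr = heads * log(p1 / p2) + tails * log((1 - p1) / (1 - p2))
    return exp(log_lr)

prior_odds = 1.0  # hypothetical shared prior: even odds, fair (0.5) vs. biased (0.6)
lr_albert = coin_likelihood_ratio(531, 469, 0.5, 0.6)  # roughly 2,500 in favor of fair
lr_betty = coin_likelihood_ratio(0, 0, 0.5, 0.6)       # no flips: exactly 1.0
posterior_odds = prior_odds * lr_albert * lr_betty
print(lr_albert, lr_betty, posterior_odds)
```

Betty's ratio of exactly 1 says "I have no evidence" without her having to restate the shared prior.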
↑ comment by alkjash · 2018-02-16T03:20:23.713Z
This is definitely a major improvement in situations where people have mostly independent data to aggregate, but in real life independence seems to be fairly rare and using likelihood ratios would cause extremely unnatural updating. In the end you can't get around sharing all your models and data to solve the independence issue.
I guess my real question is: in a low-bandwidth/low-effort setting like casual conversation what is the best single number to share? If you had to design discourse norms surrounding sharing probabilities in this setting, is likelihood ratio really the right norm?
comment by Ben Pace (Benito) · 2018-02-16T09:30:22.203Z
Moved to frontpage.
comment by John Faben (john-faben) · 2018-02-22T10:48:56.129Z
>If social norms indeed dictate that significant figures transmit confidence, might it be deceptive to report 31.5 instead of 30 in conversation about the dinosaur market?
Not if it's your bid in a market, yes if someone asks you for your probability estimate. Your best point estimate is obviously your best point estimate, and suffers from rounding.
> Betty’s landing capsule collides with a giant teapot in upper orbit and lands several hundred miles away from target. She still has to report her beliefs about the mysterious coin.
Why? If she has to offer someone a bet as to which side the coin lands on, she should probably offer even odds, but I can't see any situation where she has to report her probability but isn't able to report that this is based on nothing more than her prior.