How to come up with verbal probabilities

post by jimmy · 2009-04-29T08:35:01.709Z · LW · GW · Legacy · 20 comments

Unfortunately, we are kludged together, and we can't just look up our probability estimates in a register somewhere when someone asks us "How sure are you?".

The usual heuristic for putting a number on the strength of a belief is to ask "When you're this sure about something, what fraction of the time do you expect to be right in the long run?". This is surely better than just "making up" numbers with no feel for what they mean, but it still has its faults. The big one is that unless you've done your calibrating, you may not have a good idea of how often you'd expect to be right.

I can think of a few different heuristics to use when coming up with probabilities to assign.

1) Pretend you have to bet on it. Pretend that someone says "I'll give you ____ odds, which side do you want?", and figure out what the odds would have to be to make you indifferent to which side you bet on. Consider the question as though you were actually going to put money on it. If the question is covered on a prediction market, your answer is given to you.
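As a rough illustration (the function name and the odds below are mine, not the author's), converting "the odds at which you'd be indifferent" into a probability is a one-line calculation:

```python
# A minimal sketch: turn the odds at which you'd happily take either side of
# a bet into the probability you're implicitly assigning.

def implied_probability(odds_against: float) -> float:
    """Indifference at odds_against-to-1 against the claim implies a
    probability of 1 / (odds_against + 1) for the claim."""
    return 1.0 / (odds_against + 1.0)

print(implied_probability(4))   # indifferent at 4-to-1 against  -> 0.2
print(implied_probability(99))  # indifferent at 99-to-1 against -> 0.01
```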

2) Ask yourself how much evidence someone would have to give you before you're back to 50%. Since we're trying to update according to Bayes' law, knowing how much evidence it takes to bring you to 50% tells you the probability you're implicitly assigning.

For example, pretend someone said something like "I can guess people's names by their looks". If he guesses the first name right, and it's a common name, you'll probably write it off as a fluke. The second time, you'll probably think he knew the people or is somehow fooling you, but conditional on ruling those out, you'd still probably say he just got lucky. By Bayes' law, this suggests that you put the prior probability of him pulling this stunt at 0.1% < p < 3%, and less than 0.1% prior probability on him having his claimed skill. If it takes 4 correct calls to bring you to equally unsure either way, then that's about 0.03^4 if they're common names, or roughly one in a million1...

There are a couple of neat things about this trick. One is that it lets you get an idea of what your subconscious level of certainty is before you ever consciously put a number on it. You can imagine your immediate reaction to "Why yes, my name is Alex, how did you know?" as well as your carefully deliberated response to the same data (if they're much different, be wary of belief in belief). The other neat thing is that it pulls up the alternate hypotheses that you find more likely, and shows how likely you find them to be (e.g. "you know these people").
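To make the name-guessing arithmetic concrete, here is a minimal sketch of the calculation from footnote 1 (the variable names are mine; it assumes each lucky guess of a common name has probability ~0.03, that the claimed skill predicts the name with probability ~1, and that the prior is small enough that 1 - p(a) is approximately 1):

```python
# A sketch of footnote 1. Setting p(a|b) = 0.5 in Bayes' law, with
# p(b|a) ~= 1 and 1 - p(a) ~= 1, gives p(a) ~= p(b|!a).

def implied_prior(n_correct: int, chance_per_guess: float = 0.03) -> float:
    """Prior probability of the claimed skill that would leave you at 50/50
    after n_correct consecutive correct guesses of common names."""
    return chance_per_guess ** n_correct

print(implied_prior(4))  # ~8.1e-07, i.e. roughly one in a million
```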

3) Map out the typical shape of your probability distributions (i.e. through calibration tests) and then go by how many standard deviations off the mean you are. If you're asked to give the probability that x<C, you can find your one-sigma confidence interval and then pull up your curve to see what it predicts based on how far out C is2.
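A minimal sketch of what that lookup might look like (my illustration, not the author's; the distribution shapes are assumptions, and for the Cauchy the "sigma" is really just a scale parameter, since the Cauchy has no finite standard deviation):

```python
# A sketch of heuristic 3: given your point estimate, your one-sigma spread
# from calibration, and an assumed distribution shape, P(x < C) is a CDF
# lookup at the standardized distance of C from your estimate.

from scipy.stats import cauchy, norm

def prob_less_than(C: float, estimate: float, sigma: float, dist=norm) -> float:
    """P(x < C) under the assumed shape, with sigma as the spread/scale."""
    return float(dist.cdf((C - estimate) / sigma))

# Example: C sits two of your sigmas above your point estimate.
print(prob_less_than(12, estimate=10, sigma=1.0))               # ~0.977 (normal)
print(prob_less_than(12, estimate=10, sigma=1.0, dist=cauchy))  # ~0.852 (heavier tails)
```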

4) Draw out your metaprobability distribution, and take the mean.
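A minimal sketch of heuristic 4, assuming the metaprobability distribution is written down as weights over a handful of candidate probabilities (all the numbers below are made up for illustration):

```python
import numpy as np

def metaprobability_mean(candidate_probs, weights):
    """Mean of a discretized distribution over possible probabilities."""
    p = np.asarray(candidate_probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sum(p * w) / np.sum(w))

# You think the true probability is most likely near 0.2, but it could
# plausibly be anywhere from 0.05 to 0.5.
print(metaprobability_mean([0.05, 0.1, 0.2, 0.3, 0.5], [1, 3, 5, 2, 1]))  # ~0.20
```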

These heuristics may initially give you different answers for the same question, and in the end you have to decide which to trust when actually placing bets.

I personally lean towards 1 for intermediate probabilities, and towards 2 and then 4 for very unlikely things. The betting model breaks down as risk gets high (whether from high stakes or extreme odds), since we bet to maximize a utility function that is not linear in money.

What other techniques do you use, and how do you weight them?


 
Footnotes:

1: A common name covers about 3% of the population, so p(b|!a) = 0.03^4 for 4 consecutive correct guesses, where a is "he has the claimed skill" and b is the observed guesses; p(b|a) ~= 1 for the sake of simplicity. Since p(a) is small, (1-p(a)) is approximated as 1.

p(a|b) = p(b|a)*p(a)/p(b) = p(b|a)*p(a)/(p(b|a)*p(a) + p(b|!a)*(1-p(a))). Setting p(a|b) ~= 0.5 gives 0.5 ~= p(a)/(p(a) + 0.03^4), so p(a) ~= 0.03^4 ~= 1/1,000,000.

2: The idea came from paranoid debating, where Steve Rayhawk assumed a Cauchy distribution. I tried to fit some data I had taken myself, but had insufficient statistics to figure out what the real shape is (if you guys have a bunch more data, I could try again). It's also worth noting that the shape of one's probability distribution can change significantly from question to question, so this would only apply in some cases.

20 comments

comment by matt · 2009-04-29T22:44:52.867Z · LW(p) · GW(p)

I keep track: pbook.trike.com.au

I find that my casual estimates of probability often change dramatically as I pause to type something into my permanent PredictionBook record.

(We're almost ready to go live, but as the front page suggests this is still in beta. We're currently updating the styling to work in Internet Explorer (so until then consider using a good browser instead). We'd love your feedback, so please use the "feedback" tab on the right. When we do go live we'll keep any accounts and data from the beta, so you won't lose anything by signing up now.)

Replies from: gwern, MBlume
comment by gwern · 2009-05-04T17:16:36.695Z · LW(p) · GW(p)

That's an interesting site. May I make a suggestion?

Ever since someone suggested on OB a year or two ago that tracking predictions would be good for your mental health, I've pondered how one could do that with software. The problem is that the things where prediction tracking delivers the day-to-day bacon - such as whether I'll like the vanilla or the chocolate better, or which route to work will be faster - are also so minor that no one would bother writing them down and later inputting them into some website/program. The friction/overhead is just way too high.

But I've been looking at Twitter, and it seems to me that if your cellphone is on and all you have to do is type up the prediction and a percentage and hit enter, maybe that's lightweight enough to make it a habit. What do you guys think?

Replies from: matt
comment by matt · 2009-05-04T22:09:56.116Z · LW(p) · GW(p)

Agreed. Tweet it, or email it, or SMS it, or enter it into the iPhone app, or use the method that we didn't think of, but that some other hacker was able to put together using our API. Alternative input methods are high on our todo list.

comment by MBlume · 2009-04-29T23:06:59.103Z · LW(p) · GW(p)

so until then consider using a good browser instead

I'm curious, has anyone attempted to put a dollar amount to the developer time spent making things work in IE?

ETA: for Bayesian practice, I'll put my 98% bars at...$1,000/yr and $200,000,000/yr

Replies from: ciphergoth, mattnewport, Alicorn
comment by Paul Crowley (ciphergoth) · 2009-05-02T12:36:00.865Z · LW(p) · GW(p)

I would assign much lower odds to the chances of it being less than $1000/yr worldwide, since I have personally done more than a day's work on it in the last 12 months and our daily chargeable rate is higher than that.

comment by mattnewport · 2009-04-29T23:15:39.573Z · LW(p) · GW(p)

Define wasted - I assume people generally put effort into making things work in IE because they expect to increase their audience as a direct result. For a profit-making enterprise, anyone investing time in making things work in IE is presumably doing so in the expectation that it will deliver a positive return. For a non-profit-making enterprise, the expectation is presumably that the increased audience is worth the expenditure, based on whatever measure is used for the value of a larger audience.

Is your claim that people in fact consistently overestimate the return on investment for ensuring compatibility with IE? Or that relative to some hypothetical global optimum money is 'wasted'? If the latter, how would you attempt to evaluate the waste?

Replies from: MBlume
comment by MBlume · 2009-04-29T23:18:44.676Z · LW(p) · GW(p)

I'm sorry, that was unclear.

What I intended by the word was simply that, from an aggregate perspective, the optimal solution would be for everyone, as matt said, to use a good browser instead, which would require minimal effort from users and make the time investment from developers unnecessary.

Still, I've edited to the more neutral spent.

ETA: I suppose if you wanted to put a dollar value on "time wasted", you would have to subtract off the dollar value of the time it would take for each present-day IE user to download Firefox, Safari, Opera, Chrome, etc., import any bookmarks, and become familiar with its workings up to the level at which they were previously proficient in IE. This amount is non-negligible, and I was wrong to overlook it, but I strongly doubt that it is of the same order as the developer time spent.

comment by Alicorn · 2009-04-29T23:12:07.805Z · LW(p) · GW(p)

Does that include a dollar value assigned to volunteered coding time, or no?

Replies from: MBlume
comment by MBlume · 2009-04-29T23:13:09.457Z · LW(p) · GW(p)

Excellent question -- I'd include the dollar value of whatever work the individual could have been doing, yes.

comment by kurige · 2009-04-29T17:28:47.588Z · LW(p) · GW(p)

Thanks for the examples of how to apply OB/LW techniques to everyday life.

More articles in this vein would definitely be appreciated.

Replies from: Yvain
comment by Scott Alexander (Yvain) · 2009-04-29T22:12:40.593Z · LW(p) · GW(p)

Agreed. I especially like technique 1. A related technique is to imagine you're going to be called on it. For example, if I predict there's a 90% chance the economic crisis will be over by 2011, I imagine that the second I say the estimate, Omega will come down from the sky and say whether it is or it isn't. Quite often I find that I'm worrying a bit more than 10% that Omega will announce that I was wrong.

comment by kim0 · 2009-05-02T10:01:55.799Z · LW(p) · GW(p)

Verbal probabilities are typically impossible because the priors are unknown and important.

However, relative probabilities and similar quantities can often be given useful estimates, or at least limits.

For instance: Seeing a cat is more likely than seeing a black cat because black cats are a subset of cats.

Stuff like this is the reason that pure probability calculations are not sufficient for general intelligence.

Probability distributions, however, seem to me to be sufficient. This cat example cuts the distribution in two.

Kim Øyhus

comment by steven0461 · 2009-04-29T17:59:37.497Z · LW(p) · GW(p)

Ask yourself how much evidence someone would have to give you before you're back to 50%.

Unfortunately, this depends on accurate intuitions for the evidential value of different numbers of successful trials. I don't think people have these, though maybe their intuitions are good enough to avoid the more grievous errors.

Another problem is if there are more possible explanations than chance and the thing you're looking for. I would not believe space aliens were manipulating coinflips after any number of heads results in a row.

Replies from: Cyan
comment by Cyan · 2009-04-29T18:37:59.340Z · LW(p) · GW(p)

this depends on accurate intuitions for the evidential value of different numbers of successful trials

This is the part for which you consciously use math, as in the given example.

Replies from: steven0461
comment by steven0461 · 2009-04-29T18:47:47.018Z · LW(p) · GW(p)

It seems to me it's logically possible for me to have horribly off intuitions about the evidential value of ten heads results, so that something to which I rationally and non-consciously assign a 1 in a million probability seems only ten heads results away from 50-50. Maybe you'd say that I wasn't really assigning anything as extreme as a 1 in a million probability, but it seems like there are other ways to make the concept meaningful that don't refer to coinflips.

Replies from: Vladimir_Nesov, Cyan
comment by Vladimir_Nesov · 2009-04-29T19:31:55.832Z · LW(p) · GW(p)

I think it'd be useful to calibrate the semantics of purely intuitive valuation of statements, all else equal. Once you start systematically processing the evidence by means other than gut feeling, you construct other sources of evidence, but knowing what the intuition really tells you is important in its own right.

If you can't tell 1 in 1,000 from 1 in 1,000,000, there should be a concept for that level of belief, with all beliefs in it assigned odds in between, depending on the empirical distribution of statements you ask questions about. If you apply other techniques to update on that estimate, fine, but that's different data. Maybe the concepts of ranges of probability are not very good, and it's more useful to explicitly compensate for affect, complexity of the statement or imagery, etc., but those are not about evidence per se, only about deciphering the gut feeling.

For this purpose, the already existing concepts of "a little", "probably", "unlikely" may be very important, if only their probabilistic semantics can be extracted - probably by estimating which of these concepts fits each of a set of statements specifically constructed around a given one, in order to extract its odds.

comment by Cyan · 2009-04-29T19:14:44.501Z · LW(p) · GW(p)

I think I see what you're saying -- that our intuitions for extreme likelihood functions might be as bad as those for extreme prior probabilities. IIRC, research shows that humans have a good sense for probabilities in the neighborhood of 0.5, so I think you're safe as long as your trials have sampling probabilities around 0.5 and you explicitly and sequentially imagine each counterfactual trial and your resulting feelings of credence.

Replies from: steven0461
comment by steven0461 · 2009-04-29T19:32:03.865Z · LW(p) · GW(p)

Right, that's what I'm saying.

The research result is interesting, but I can still imagine people being maybe an order of magnitude off every 10-20 coinflips. (Maybe by that time an order of magnitude no longer matters much.)

I think I'd imagine them to be very wrong when assessing the evidence in 100 heads results vs. 10 if they didn't explicitly imagine every trial, just because of scope insensitivity. (Maybe these are more extreme cases than are likely to come up in applications.)

comment by UnholySmoke · 2009-04-30T10:35:59.905Z · LW(p) · GW(p)

Consider the question as though you were actually going to put money on it.

My father calls this 'the £100 test'. Very useful heuristic when you want to find out what you actually believe.

comment by Cyan · 2009-04-29T15:44:24.684Z · LW(p) · GW(p)

Suggestion: put in a summary break, perhaps between the third and fourth paragraphs, just for the sake of housekeeping.