Bayesian exercise

post by RolfAndreassen · 2011-09-21T21:34:13.402Z · LW · GW · Legacy · 21 comments


I am confused.

 

Suppose you are in charge of estimating the risk of catastrophic failure of the Space Shuttle. From engineers, component tests, and guesswork, you come to the conclusion that any given launch is about 1% likely to fail. On the strength of this you launch the Shuttle, and it does not blow up. Now, with this new information, what is your new probability estimate? I write down

P(failure next time | we observe one successful launch) = P (we observe one successful launch | failure next time) * P(failure) / P(observe one success)

or

P(FNT|1S) = P(1S|FNT)*P(F)/P(S)

We have P(F) = 1-P(S) = 0.03. Presumably your chances of success this time are not affected by the next one being a failure, so P(1S|FNT) is just P(S) = 0.97. So the two 97% chances cancel, and I'm left with the same estimate I had before, 3% chance of failure. Is this correct, that a successful launch does not give you new information about the chances of failure? This seems counterintuitive.

21 comments

Comments sorted by top scores.

comment by JGWeissman · 2011-09-21T21:56:34.295Z · LW(p) · GW(p)

Your problem is that you are effectively assigning probability 1 to the proposition that 1% of launches will fail. Instead, you should have a probability distribution over the fraction of launches that fail. When you observe a success or a failure, update that probability distribution using Bayes' law; after a success, the result assigns higher probabilities to lower failure frequencies.
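For instance, a minimal numerical sketch of this idea (the uniform prior over a grid of candidate frequencies is my own illustrative assumption, not part of the comment):

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 1001)          # candidate failure frequencies
prior = np.ones_like(grid) / len(grid)      # assumed uniform prior over the grid

def update(dist, success):
    """Bayes' law on the grid: likelihood of a success is (1 - f), of a failure is f."""
    likelihood = (1.0 - grid) if success else grid
    posterior = dist * likelihood
    return posterior / posterior.sum()

posterior = update(prior, success=True)
print("mean failure frequency before:          ", (prior * grid).sum())      # 0.5
print("mean failure frequency after one success:", (posterior * grid).sum()) # ~0.33
```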

Replies from: Normal_Anomaly
comment by Normal_Anomaly · 2011-09-22T01:50:33.318Z · LW(p) · GW(p)

It occurs to me that I don't really know how to mathematically handle a probability distribution. How much calculus, if any, is required for this?

Replies from: jsalvatier, None, Cyan
comment by jsalvatier · 2011-09-22T14:53:32.966Z · LW(p) · GW(p)

You need calculus if you're going to try to estimate any continuous quantities, but you can often avoid this by making the variable discrete. Instead of saying "the proportion is a number in [0,1]" you say "the proportion is either 0, .25, .5, .75, or 1". This approximates the continuous version and can be done without any calculus.
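A toy sketch of that discrete approach, assuming a uniform prior over those five values (the prior is an illustrative choice, not from the comment):

```python
proportions = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = [0.2, 0.2, 0.2, 0.2, 0.2]            # assumed uniform prior over the grid

# Likelihood of observing one success if the true failure proportion is p.
likelihood = [1.0 - p for p in proportions]

unnormalized = [pr * lk for pr, lk in zip(prior, likelihood)]
evidence = sum(unnormalized)                  # P(one success)
posterior = [u / evidence for u in unnormalized]

for p, post in zip(proportions, posterior):
    print(f"P(failure proportion = {p:.2f} | one success) = {post:.3f}")
# Mass shifts toward the lower failure proportions: 0.4, 0.3, 0.2, 0.1, 0.0
```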

comment by [deleted] · 2011-09-22T02:15:44.372Z · LW(p) · GW(p)

To fully interpret a probability distribution you need to use integrals. For example, if I have a probability distribution over the number of heads in 50 coinflips and I want to know the probability that the observed value is going to fall within a certain interval, I have to take the integral of that part of the distribution. You can definitely understand what a probability distribution is without calculus, but you're going to have a hard time actually doing the math.

Edit: It occurs to me that statistical software could do most of the number-crunching for you, which would definitely make things easier.

comment by Cyan · 2011-09-22T02:56:33.577Z · LW(p) · GW(p)

For probability distributions on continuous quantities (such as the proportion of launches that fail), you need to know how to do derivatives and integrals.

comment by Oscar_Cunningham · 2011-09-21T23:59:47.866Z · LW(p) · GW(p)

In brief:

There are two kinds of probabilities: Frequencies and credences.

The frequency of an event is the fraction of times it occurs in the long run. If you're wearing Bayesian goggles, you refuse to call this a probability; you just treat it as a physical quantity, like the mass of a person or the height of a building. A fair coin comes up heads with a frequency of 50%; a die rolls a 2 with a frequency of 1/6.

The credence you have in some fact is the degree to which you believe it. This is what Bayesians call "probability". If you had a biased coin you might believe that the next toss will be heads, and the strength of your belief could be 70%, but this doesn't mean that you think the long run frequency of heads will be 70%.

So when you're assessing the space shuttle, you treat it as if there is some fixed frequency with which shuttles crash. But you don't know what this frequency is, so you have a probability distribution over its possible values. Maybe you have a 20% credence that the frequency is 0.5%, a 15% credence that the frequency is 1%, a 10% credence that the frequency is 1.5%, and so on...

In symbols this looks like this. Let f be the true frequency. Then for each θ between 0 and 1 we have some credence that f = θ. We write this credence as P(f = θ). This expression is a function that depends on θ. It expresses our belief that the frequency is equal to θ.

Now, we want to calculate the probability (our credence) that the shuttle will crash. If we knew that f was 3%, then our credence that the shuttle would crash would also be 3%. That is, if we know the frequency then our degree of belief that the shuttle will crash is equal to that frequency. In symbols, this looks like this: Let A be the event that the shuttle crashes, then P(A | f = θ) = θ: the probability of the shuttle crashing given that f is equal to θ, is exactly θ.

But we don't know what f is. So what we do is we take an average over all possible values of f, weighted by how likely we think it is that f really does take that value.

P(A) = "Sum from θ=0 to θ=1"( P(f = θ) P(A | f = θ) ) = "Sum from θ=0 to θ=1"( P(f = θ) θ )

So we get a value for the probability of A, but this isn't necessarily equal to the true frequency f.

Right. Now we watch a successful shuttle launch. A has not occurred. Call this event ¬A, or not-A. This changes our credences about what f could be equal to: given that one launch has been successful, it is more likely that f is small. The way we determine our new belief, P(f = θ | ¬A), is to use Bayes' theorem:

P(f = θ | ¬A) = P(¬A | f = θ) * P(f = θ) / P(¬A)

But we know that P(A | f = θ) = θ, so we have P(¬A | f = θ) = 1 - θ. We put that in and get:

P(f = θ | ¬A) = (1 - θ) * P(f = θ) / (1 - P(A))

and luckily we see that we know all of the things on the right hand side.

Having updated to our new credences for the value of f, given by P(f = θ | ¬A), we could now calculate the probability of the shuttle crashing on its second launch. Call this event B. That is, we want the probability of B, given that ¬A has occurred, P(B | ¬A). We do exactly what we did before, take a weighted average of P(B | f = θ , ¬A) weighting by the chance that f really is θ.

P(B | ¬A) = "Sum from θ=0 to θ=1"( P(f = θ | ¬A) P(B | f = θ , ¬A) ) = "Sum from θ=0 to θ=1"( P(f = θ | ¬A) θ )
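A small numerical sketch of this whole procedure, with an assumed discrete prior over a few candidate frequencies (the particular numbers are illustrative only, not Oscar's):

```python
import numpy as np

theta = np.array([0.005, 0.01, 0.02, 0.03, 0.05, 0.10])   # candidate values of f
prior = np.array([0.10, 0.20, 0.25, 0.25, 0.15, 0.05])    # assumed credences P(f = theta)

# P(A): credence that the first launch crashes, averaging over f.
p_crash = (prior * theta).sum()

# Observe a successful launch (¬A). Update: P(f = θ | ¬A) = (1 - θ) P(f = θ) / (1 - P(A)).
posterior = (1.0 - theta) * prior / (1.0 - p_crash)

# P(B | ¬A): credence that the *second* launch crashes, under the updated weights.
p_crash_next = (posterior * theta).sum()

print(f"P(A)      = {p_crash:.4f}")
print(f"P(B | ¬A) = {p_crash_next:.4f}")   # slightly lower than P(A)
```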

Replies from: prase
comment by prase · 2011-09-22T12:53:19.610Z · LW(p) · GW(p)

Let A be the event that the shuttle crashes.

Now we watch a successful shuttle launch. A has occurred.

Inconsistent.

Replies from: Oscar_Cunningham
comment by Oscar_Cunningham · 2011-09-22T13:30:09.594Z · LW(p) · GW(p)

Gah!

EDIT: Fixed.

comment by [deleted] · 2011-09-21T22:29:10.134Z · LW(p) · GW(p)

Yeah, see, you can't do that. You're trying to change your model from inside your model; you're basically saying, "If the probability of success is 97%, and this launch was a success, what is the probability the next launch is a success?" The answer has to be 97%, because inside your model, the two launches are independent events.

You have to go meta: instead of asking "What are the chances the next launch will succeed?" you ask, "What are the chances that my model, which predicted this launch would succeed with a probability of 97%, is correct?" To do that, you need a prior probability that the model is right, and you need other possible models. This is more complicated than the mammogram example, because there you either have cancer or you don't, while here the probability of success can be anywhere between 0 and 100%.

comment by RolfAndreassen · 2011-09-22T02:51:16.232Z · LW(p) · GW(p)

Thanks for the replies. Let me rephrase to see if I understood correctly. My problem is that I don't really have a single degree-of-belief, I have a distribution over failure frequencies, and as I've set it up my distribution is a delta function - in effect, I've assigned something a 'probability' of 1, which naturally breaks the formula. Instead I ought to have something like a Gaussian, or whatever, with mean 0.03 and sigma (let's say) 0.01. (Of course it won't be a true Gaussian since it is cut off at 0 and at 1, but that's a detail.) Then, to calculate my new distribution, I do Bayes at each point, thus:

P(failure rate x | one successful launch) = P(one successful launch | failure rate x) * P(x) / P(one successful launch)

where P(x) is my Gaussian prior and P(one successful launch) is the integral from 0 to 1 of P(x)(1-x). We can easily see that in the case of the delta function, this reduces to what I have in my OP. In effect I did the arithmetic correctly, but started with a bad prior - you can't shift yourself away from a prior probability of 1, no matter what evidence you get. We can also see that this procedure will shift the distribution down, towards lower failure probabilities.
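A rough numerical sketch of that calculation, evaluating the truncated Gaussian on a grid (the grid size and the discrete normalization are my own choices):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10001)
prior = np.exp(-0.5 * ((x - 0.03) / 0.01) ** 2)   # Gaussian shape, mean 0.03, sigma 0.01
prior /= prior.sum()                               # normalizing on [0,1] handles the truncation

# P(one successful launch) ≈ integral of P(x)(1 - x) dx
p_success = (prior * (1.0 - x)).sum()
posterior = prior * (1.0 - x) / p_success

print("prior mean failure rate:    ", (prior * x).sum())      # ≈ 0.0300
print("posterior mean failure rate:", (posterior * x).sum())  # shifted slightly below 0.0300
```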

Thanks for clearing up my confusion. :)

Replies from: prase, Oscar_Cunningham
comment by prase · 2011-09-22T11:39:31.156Z · LW(p) · GW(p)

Right. But rather than saying that you started with a bad prior (which you effectively did, but hadn't noticed that you had such a prior), I would say that the confusion stemmed from a bad choice of words. You thought about the frequency of failures but said "probability of failure", which led you to think that this is what has to be updated. The frequency of failures isn't a Bayesian probability; it's an objective property of the system. But once you say "probability of failure", it appears that the tested hypothesis is "the next launch will fail" rather than "the frequency of failures is x". "The next launch will fail" apparently says nothing about this launch, so one intuitively concludes that observing this launch is irrelevant to that hypothesis, more so if one correctly assumes that a failure next time doesn't causally influence the chances of failure this time.

Of course this line of thought is wrong: both launches are instances of the same process, and by observing one launch one can learn something which applies to all other launches. But this is easy to overlook if one speaks about probabilities of single event outcomes rather than about a general model which includes some objective frequencies. So, before you write down Bayes' formula, make sure you know what hypothesis you are testing, and that you don't mix objective frequencies and subjective probabilities, even if they may (under some conditions) be the same.

(I hope I have described the thought processes correctly. I have experienced the same confusion when I was trying to figure out how Bayesian updating works for the first time.)

comment by Oscar_Cunningham · 2011-09-22T10:09:10.601Z · LW(p) · GW(p)

Yes! Exactly right.

By the way, the idea that if the frequency is known, the probability is equal to it, is wittily known as the "Principal Principle".

comment by Emile · 2011-09-22T15:53:14.206Z · LW(p) · GW(p)

Imagine that your "component tests and guesswork" consist of launching shuttle after shuttle and seeing how many blow up. You could get the 1% figure by either

  • launching 100 shuttles and observing that only one blew up, or
  • launching 100,000 shuttles and observing that 1,000 of them blew up.

Even though both could be described as "1% likely to fail", it's clear that you have much more confidence in that figure in the second scenario; observing one extra successful launch will shift your confidence around more in the first scenario than in the second.

As others said, you should have a probability distribution over the frequency of failure (a Beta distribution I believe), that should peak near 1%, but the peak will be much sharper in the second scenario than in the first.
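A rough illustration of this, treating the observed counts directly as the parameters of a Beta distribution over the failure frequency (that prior convention is an assumption on my part, not something stated in the comment):

```python
from scipy.stats import beta

scenarios = {
    "100 launches, 1 failure":          (1, 99),
    "100,000 launches, 1,000 failures": (1000, 99000),
}

for name, (failures, successes) in scenarios.items():
    before = beta(failures, successes)         # distribution over the failure frequency
    after = beta(failures, successes + 1)      # one extra successful launch observed
    print(name)
    print(f"  mean failure rate before: {before.mean():.5f}  (sd {before.std():.5f})")
    print(f"  mean failure rate after:  {after.mean():.5f}")
# The 100-launch case shifts noticeably; the 100,000-launch case barely moves.
```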

comment by Shmi (shminux) · 2011-09-22T00:15:45.114Z · LW(p) · GW(p)

Good for you for actually trying to apply the Bayesian gospel to realistic problems, instead of taking it on faith!

Not sure if this is a useful example, though, as the posterior probability of a successful launch depends on a score of other tidbits of information obtained from the launch, which are bound to overwhelm the fact of the successful launch itself - except in the eyes of the management, who quickly reduce the announced failure rate to one in 100,000 or some other acceptable number after only a dozen successes, regardless of the warning signs.

In fact, the question as stated is meaningless, since the events S and F are mutually exclusive, and to calculate the conditional probability implicitly embedded in the Bayes expression you need compatible events. One would instead define a sample space in which S and F live (e.g. the odds of an O-ring failure and its effect on launch success).

The confidence gained from a single successful launch is no better than the confidence of seeing another head given two heads in two successive tosses of a known fair coin. Until and unless you see a really unlikely event, you should not update your priors based on the outcomes, but only based on your underlying models of the event in question and whatever useful data you can glean from the coin trajectory.

That said, you can definitely apply Bayes to discriminate between competing models of, say, isolation foam debris striking the shuttle, based on the empirical data from a given launch, which will, in turn, affect the estimate of success for the next launch.

Hmm, that ended up being wordier than I expected, but hopefully I haven't told many lies.

comment by Owen · 2011-09-21T22:03:39.653Z · LW(p) · GW(p)

I think the error lies in this sentence:

"Presumably your chances of success this time are not affected by the next one being a failure."

I assume you think this is true because there's no causal relationship where the next shuttle launch can affect this one, but their successes can still be correlated, which your probability estimate isn't taking into account.

If you want to update meaningfully, you need to have an alternative hypothesis in mind. (Remember, evidence can only favor one hypothesis over another (if anything); evidence is never "for" or "against" any one theory at a time.) Perhaps the engineers believe that there is a 4% chance that any given shuttle launch will fail (H1), but you estimate a 25% chance that they're wrong and the shuttles are actually foolproof (H2). Then you estimate the probability that the first shuttle launch will fail (F) as

P(F) = P(F|H1) P(H1) + P(F|H2) P(H2) = (4%)(75%) + (0%)(25%) = 3%.

But the shuttle launch goes off ok, so now you update your opinion of the two hypotheses with Bayes' rule:

P(H1|~F) = P(~F|H1) P(H1) / P(~F) = (100% - 4%) (75%) / (100% - 3%) ≈ 74.2%.

Then your estimate that the next shuttle will fail (F') becomes:

P(F' | ~F) = P(F' | H1, ~F) P(H1 | ~F) + P(F' | H2, ~F) P(H2 | ~F)

= (4%) (74.2%) + (0%) (100% - 74.2%) ≈ 2.97%.

So the one successful shuttle launch does, in this case, lower your expectation of a failure next time. As the shuttles keep succeeding, you become gradually more and more sure that the shuttles are foolproof and the engineers are wrong. But if the launch ever does fail, you will instantly believe the engineers and assign no credence to the claim that the shuttles never fail. (Try the math to see how that works.)
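For reference, a direct transcription of this arithmetic, so the update can be replayed with other numbers:

```python
p_h1 = 0.75          # credence that the engineers are right: 4% failure rate
p_h2 = 0.25          # credence that the shuttles are foolproof: 0% failure rate
p_fail_h1, p_fail_h2 = 0.04, 0.0

# Prior probability that the first launch fails.
p_fail = p_fail_h1 * p_h1 + p_fail_h2 * p_h2                 # 0.03

# Observe a success (~F) and update the hypothesis weights with Bayes' rule.
p_h1_given_success = (1 - p_fail_h1) * p_h1 / (1 - p_fail)   # ≈ 0.742
p_h2_given_success = 1 - p_h1_given_success

# Probability that the *next* launch fails, given that the first succeeded.
p_fail_next = p_fail_h1 * p_h1_given_success + p_fail_h2 * p_h2_given_success

print(p_fail, p_h1_given_success, p_fail_next)   # 0.03, ~0.742, ~0.0297
```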

comment by jsalvatier · 2011-09-21T22:02:31.424Z · LW(p) · GW(p)

You're trying to estimate a proportion. You should read about doing Bayesian inference for proportions (for example here). If this is your first problem you might want to choose a simpler problem where the quantity you're trying to estimate is a discrete variable (that will make the problem simpler). The particular mistake you've made is using a single value for your prior; the value I think you're trying to estimate (the proportion of shuttles that fail) is continuous, so you should have a prior distribution over the interval [0,1].

Replies from: Cyan
comment by Cyan · 2011-09-22T02:58:40.985Z · LW(p) · GW(p)

Or here, even.

comment by DanielLC · 2011-09-22T06:40:59.006Z · LW(p) · GW(p)

You're finding a probability based on a group of independent events, so I'd suggest using beta distribution. The Wikipedia page gives how to find the mean and variance given alpha and beta. Decide what you think they are (you already effectively gave that the mean was 3%, so you just need the variance), and solve for alpha and beta. If you observe a success, increment beta, and recalculate the mean.
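A sketch of that recipe, borrowing sigma = 0.01 from Rolf's comment above as the assumed spread:

```python
mean, var = 0.03, 0.01 ** 2

# For Beta(a, b): mean = a / (a + b), var = a*b / ((a + b)^2 * (a + b + 1)).
k = mean * (1 - mean) / var - 1      # this is a + b
a = mean * k                         # alpha counts failures here
b = (1 - mean) * k                   # beta counts successes
print(f"alpha = {a:.2f}, beta = {b:.2f}")                  # ~8.7, ~281.3

b += 1                               # observed one successful launch
print(f"updated mean failure rate = {a / (a + b):.5f}")    # ~0.02990
```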

Also, you seem to be saying 1% failure at the beginning, and 3% later on. In addition, both of these would be much too high to risk a launch.

comment by Nic_Smith · 2011-09-21T22:24:21.538Z · LW(p) · GW(p)

A few people have said things that are similar while I was typing this up, but hopefully this still helps: I think the problem is that you're implicitly assigning a probability of 0 to anything other than that one rate. In usual Bayesian analysis, you could imagine the launches as being analogous to a "biased coin." Often success/failure scenarios are modeled as binomial, with a beta distribution describing our degrees of belief for what the "coin" will do. But for simplicity's sake, let's suppose we know that either 1% or 10% of the launches will fail, and we have no further information on what will happen, so we assign a probability of 0.5 to both rates of failure. Because either one or the other has to be the case (by the unrealistic setup of this problem), the unconditional probability of a failure is P(F) = 0.5(0.01) + 0.5(0.10) = 0.055

Now, we see a successful launch. Intuitively, this is more likely if the rate of failure is lower, so this should favor, for the rate R, the hypothesis R = 0.01 and decrease the probability that R=0.1:

P(R = 0.01|S) = P(S|R = 0.01) P(R = 0.01) / P(S) = 0.99(0.5)/0.945 = 0.524

And since it has to be one or the other, P(R = 0.1|S) = 1 - 0.524 = 0.476
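The same numbers as code, plus the follow-on step of computing the failure probability for the next launch (that last step is my addition, not in the comment):

```python
p_low, p_high = 0.01, 0.10            # the two possible failure rates
prior_low = prior_high = 0.5

p_fail = prior_low * p_low + prior_high * p_high       # 0.055

# Update on one observed successful launch.
p_success = 1 - p_fail                                  # 0.945
post_low = (1 - p_low) * prior_low / p_success          # ≈ 0.524
post_high = 1 - post_low                                # ≈ 0.476

# Failure probability for the next launch under the updated weights.
p_fail_next = post_low * p_low + post_high * p_high     # ≈ 0.053

print(post_low, post_high, p_fail_next)
```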

comment by Richard_Kennaway · 2011-09-22T08:09:11.110Z · LW(p) · GW(p)

Other people have already said most of what there is to be said, but there is also this:

Presumably your chances of success this time are not affected by the next one being a failure, so P(1S|FNT) is just P(S) = 0.97.

Not causally affected, but possibly correlated. Or anticorrelated. At any rate, the fact that the future does not causally affect the present does not establish the probabilistic independence of 1S and FNT.

comment by Normal_Anomaly · 2011-09-22T01:51:25.068Z · LW(p) · GW(p)

Other people have already answered your question better than I can, but I wanted to let you know that at first I thought this post would be about applying Bayesian methods to working out.