Against improper priors

post by DanielLC · 2011-07-26T23:50:44.020Z · LW · GW · Legacy · 21 comments

An improper prior is essentially a prior probability distribution that's infinitesimal over an infinite range, in order to add to one. For example, the uniform prior over all real numbers is an improper prior, as there would be an infinitesimal probability of getting a result in any finite range. It's common to use improper priors when you have no prior information.

The mark of a good prior is that it gives a high probability to the correct answer. If I bet 1,000,000 to one that a coin will land on heads, and it lands on tails, it could be a coincidence, but I probably had a bad prior. A good prior is one that results in me not being very surprised.

With a proper prior, probability is conserved, and more probability mass in one place means less in another. If I'm less surprised when a coin lands on tails, I'm more surprised when it lands on heads. This isn't true with an improper prior. If I wanted to predict the value of a random real number, and used a normal distribution with a mean of zero and a standard deviation of one, I'd be pretty darn surprised if it doesn't end up being pretty close to zero, but I'd be infinitely surprised if I used a uniform distribution. No matter what the number is, it will be more surprising with the improper prior. Essentially, a proper prior is better in every way. (You could find exceptions for this, such as averaging a proper and improper prior to get an improper prior that still has finite probabilities and they just add up to 1/2, or by using a proper prior that has zero in some places, but you can always make a proper prior that's better in every way to a given improper prior).
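As a rough sketch of the "infinitely surprised" claim, one can treat the improper uniform prior as the limit of proper uniform densities on ever wider intervals and measure surprise as surprisal, -log p(x). Both choices are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def normal_surprisal(x, mean=0.0, sd=1.0):
    """Surprisal -log p(x), in nats, under a normal density."""
    z = (x - mean) / sd
    return 0.5 * z * z + math.log(sd * math.sqrt(2.0 * math.pi))

def uniform_surprisal(x, half_width):
    """Surprisal under a proper uniform density on [-half_width, half_width]."""
    if abs(x) > half_width:
        return math.inf  # zero density outside the interval
    return math.log(2.0 * half_width)

# Assumption: the improper uniform prior is approximated by widening a proper one.
x = 3.0
for half_width in (1e1, 1e3, 1e6, 1e12):
    print(half_width, normal_surprisal(x), uniform_surprisal(x, half_width))
```

For any fixed x, the uniform surprisal grows like the log of the width, so it eventually exceeds the normal surprisal and has no finite limit. For narrow intervals the uniform can still be the less surprising of the two.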

Dutch books also seem to be a popular way of showing what works and what doesn't, so here's a simple Dutch book argument against improper priors: I have two real numbers: x and y. Suppose they have a uniform distribution. I offer you a bet at 1:2 odds that x has a higher magnitude. They're equally likely to be higher, so you take it. I then show you the value of x. I offer you a new bet at 100:1 odds that y has a higher magnitude. You know y almost definitely has a higher magnitude than that, so you take it again. No matter what happens, I win.

You could try to get out of it by using a different prior, but I can just perform a transformation on it to get what I want. For example, if you choose a logarithmic prior for the magnitude, I can just take the magnitude of the log of the magnitude, and have a uniform distribution.

There are certainly uses for an improper prior. You can use it if the evidence is so great compared to the difference between it and the correct value that it isn't worth worrying about. You can also use it if you're not sure what another person's prior is, and you want to give a result that is at least as high as they'd get no matter how much their prior is spread out. That said, an improper prior is never actually correct, even in things that you have literally no evidence for.

21 comments

comment by Cyan · 2011-07-27T01:28:48.905Z · LW(p) · GW(p)

The argument for improper priors is that the resulting posterior distributions work well in various senses. No one uses improper priors for prediction -- the resulting prior predictive densities are improper too, so it's impossible.

Here's the argument by which I justify improper priors to myself when I use them: in cases where I have very little prior information but highly informative data, the proper prior will be essentially proportional to the improper prior in the region of high likelihood. Then using the improper prior as an approximation results in an approximate posterior which gives results that differ only negligibly from the results I would have obtained with the "correct" proper prior.
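A minimal sketch of this approximation, assuming the simplest conjugate case: a normal likelihood with known standard deviation, comparing the flat improper prior on the mean with a very wide proper normal prior. The function names and the numbers are illustrative assumptions, not taken from the thread.

```python
def posterior_flat(xbar, n, sigma):
    """Posterior (mean, variance) for a normal mean under the flat improper prior."""
    return xbar, sigma ** 2 / n

def posterior_normal_prior(xbar, n, sigma, prior_mean, prior_sd):
    """Posterior (mean, variance) under a proper normal prior (conjugate update)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * xbar)
    return post_mean, post_var

# Hypothetical data summary: 50 observations, sample mean 2.3, known sigma 1.
flat = posterior_flat(2.3, 50, 1.0)
wide = posterior_normal_prior(2.3, 50, 1.0, prior_mean=0.0, prior_sd=100.0)
print(flat, wide)
```

With the wide prior the two posteriors agree to several decimal places. Shrinking `prior_sd` toward the scale of the data makes them diverge, which is the regime where the improper prior stops being a safe approximation.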

Replies from: DanielLC

↑ comment by DanielLC · 2011-07-27T03:24:51.276Z · LW(p) · GW(p)

Edited to add that they're sometimes useful, but they don't give the correct answer.

comment by Cyan · 2011-07-27T04:20:50.483Z · LW(p) · GW(p)

Suppose I am sampling from a normal distribution. Is it legitimate to declare that my prior information is such that no matter what values I observe for the first two data points, after observing them my posterior predictive probability for the event "the third data point lies between the first two data points" is 50%?

Replies from: Douglas_Knight

↑ comment by Douglas_Knight · 2011-07-27T05:15:25.846Z · LW(p) · GW(p)

It's coherent to act that way, but it's improper to call it a prior. If you had a proper prior, you'd bet on any question at any time, not just until after seeing two points. But you knew that, right?

Replies from: Cyan

↑ comment by Cyan · 2011-07-27T05:54:29.619Z · LW(p) · GW(p)

...not just...

Well, the condition I describe in words above doesn't directly limit my ability to bet on any question at any time -- it just specifies one possible bet in one particular set of states of information.

However, the condition can be turned into an integral equation in which the prior density is the unknown quantity. The equation can be explicitly solved to give an analytical expression for the unique prior density which satisfies the verbal description above. Since I posted the question in this thread, you can probably guess the punch-line: the prior is improper. In fact, it's the standard "non-informative" prior for the normal distribution with unknown mean and variance.

comment by Cyan · 2011-07-27T04:26:11.256Z · LW(p) · GW(p)

you can always make a proper prior that's better in every way to a given improper prior

[emphasis mine]

"In every way" is too strong. Some improper priors are derived from the optimization of some criterion, so they are the best in a certain specific way. Also, some improper priors give posterior means that, when treated as estimators, have minimax-optimal frequentist risk. I think you mean something more like "better in every way that ought to matter to someone concerned with rationality".

Replies from: DanielLC

↑ comment by DanielLC · 2011-07-27T04:56:43.566Z · LW(p) · GW(p)

Perhaps "better in every case" would work? That is, less surprising no matter what happens?

Replies from: Cyan

↑ comment by Cyan · 2011-07-27T05:58:23.933Z · LW(p) · GW(p)

Since I don't think it makes sense to speak of the "surprise" associated with an improper distribution, I think the claim you want to make doesn't cut to the heart of the matter. There are lots of reasons to object to improper priors, but this isn't well-formed enough to be one of them.

comment by Manfred · 2011-07-27T01:41:32.745Z · LW(p) · GW(p)

Dutch books also seems to be a popular way of showing what works and what doesn't, so here's a simple Dutch argument against improper priors: I have two real numbers: x and y. Suppose they have a uniform distribution. I offer you a bet at 2:1 odds that x has a higher magnitude. They're equally likely to be higher, so you take it. I then show you the value of x. I offer you a new bet at 2:1 odds that y has a higher magnitude. You know y almost definitely has a higher magnitude than that, so you take it again. No matter what happens, I win.

To fix this example, replace "real" with "positive real" and make the bets 2:4 and 100:1.

Still, an example that comes from using improper priors as probability distributions, which they are explicitly not, doesn't seem like a strong argument. Better to show that they can't come up in any interesting situations - this may be impossible, though.

Replies from: DanielLC

↑ comment by DanielLC · 2011-07-27T03:33:39.045Z · LW(p) · GW(p)

To fix this example, replace "real" with "positive real"

I used "real" because with positive reals, you're more likely to use a logarithmic prior.

and make the bets 2:4 and 100:1.

Oops. Why did you say 2:4 instead of 1:2? Do you mean 2:1?

using improper priors as probability distributions, which they are explicitly not,

If they're not probability distributions, what are they?

Replies from: Manfred

↑ comment by Manfred · 2011-07-27T04:06:23.559Z · LW(p) · GW(p)

Why did you say 2:4 instead of 1:2? Do you mean 2:1?

Just to emphasize that the victim should have more money riding on the first bet if they are to consistently lose money.
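The arithmetic can be sketched as follows, reading odds of a:b as "the victim stakes a to win b" and ignoring the tie where the magnitudes are equal. Both the reading and the function name are assumptions made for illustration.

```python
def net_payoff(x_is_larger, bet1=(2, 4), bet2=(100, 1)):
    """Victim's net result after both bets; each bet is (stake, winnings)."""
    stake1, win1 = bet1  # bet 1: victim backs |x| > |y|
    stake2, win2 = bet2  # bet 2: victim backs |y| > |x|
    first = win1 if x_is_larger else -stake1
    second = -stake2 if x_is_larger else win2
    return first + second

for x_is_larger in (True, False):
    print(x_is_larger, net_payoff(x_is_larger), net_payoff(x_is_larger, bet1=(1, 2)))
```

Under this reading the victim loses in both branches with 2:4 and 100:1, because the stake on the first bet exceeds the winnings on the second. With 1:2 and 100:1 the branch where y is larger only breaks even.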

If they're not probability distributions, what are they?

Since they're invalid probability distributions but can be updated into a probability distribution given some evidence, you might think of these as representing states where you have some knowledge, but not enough to assign consistent probabilities. For example, if all you know is that X is a member of some infinite set, you cannot assign consistent probabilities, but you still have some knowledge, which might be represented as a uniform function.

comment by Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2019-12-11T20:38:27.359Z · LW(p) · GW(p)

The "Dutch books" example is not restricted to improper priors. I don't have time to transform this into the language of your problem, but the basically similar two-envelopes problem can arise from the prior distribution:

f(x) = (1/4) * (3/4)^n where x = 2^n (n >= 0), and f(x) = 0 if x cannot be written in this form

Considering this as a prior on the amount of money in an envelope, the expectation of the envelope you didn't choose is always 8/7 of the envelope you did choose.
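The 8/7 figure can be checked with exact fractions. This sketch assumes one particular reading of the setup: f is the prior on the smaller of the two amounts, the other envelope holds double, and you pick either envelope with probability 1/2. The function names are hypothetical.

```python
from fractions import Fraction

def prior(n):
    """P(smaller envelope holds 2**n) = (1/4) * (3/4)**n, as in the comment."""
    return Fraction(1, 4) * Fraction(3, 4) ** n

def expected_other_ratio(n):
    """E[other envelope] / (your envelope), given your envelope holds 2**n."""
    # Case 1: you hold the smaller envelope of the pair (2**n, 2**(n+1)).
    w_small = prior(n) * Fraction(1, 2)
    # Case 2: you hold the larger envelope of (2**(n-1), 2**n); needs n >= 1.
    w_large = prior(n - 1) * Fraction(1, 2) if n >= 1 else Fraction(0)
    return (w_small * 2 + w_large * Fraction(1, 2)) / (w_small + w_large)

for n in range(5):
    print(n, expected_other_ratio(n))
```

Under these assumptions the ratio is 8/7 for every n >= 1. At n = 0 you must hold the smaller envelope, so the ratio is 2. The prior sums to one, but the expected amount is a sum of (1/4) * (3/2)^n terms and diverges, which is where the infinities enter.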

There is no actual mathematical contradiction with this sort of thing -- with proper or improper priors, thanks to the timely appearance of infinities. See here for an explanation:

https://thewindingnumber.blogspot.com/2019/12/two-envelopes-problem-beyond-bayes.html

comment by JoshuaZ · 2011-07-27T00:38:13.528Z · LW(p) · GW(p)

I'm not sure what you mean by proper or improper priors here. At first you seem to be talking about self-consistent but bad priors in your coinflip example, but then when you talk about proper allocation of mass you seem to be talking about self-consistency of priors. These are different issues.

This isn't true with an improper prior. If I wanted to predict the value of a random real number, and used a normal distribution with a mean of zero and a standard deviation of one, I'd be pretty darn surprised if it doesn't end up being pretty close to zero, but I'd be infinitely surprised if I used a uniform distribution.

There is no uniform distribution on the real line.

Replies from: Douglas_Knight

↑ comment by Douglas_Knight · 2011-07-27T02:00:39.708Z · LW(p) · GW(p)

"Improper prior" is a technical term for using an infinite measure as a prior.

Replies from: JoshuaZ

↑ comment by JoshuaZ · 2011-07-27T02:13:25.646Z · LW(p) · GW(p)

Ah, thanks. I was not aware of that term. Maybe linking or explaining that in the post might not be a bad idea.

Replies from: DanielLC

↑ comment by DanielLC · 2011-07-27T03:23:54.111Z · LW(p) · GW(p)

Edited to add this.

Replies from: Douglas_Knight

↑ comment by Douglas_Knight · 2011-07-27T05:09:55.231Z · LW(p) · GW(p)

Your new first paragraph is not the definition. Partly it goes opposite the definition and partly it is orthogonal. It is so confused, I'm surprised that the other material is (or looked) correct. You should separate your consideration of continuous priors from improper priors. An example of an improper prior in a discrete setting is the uniform prior on positive integers. Another example is the prior p(n) = 1/n.
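A quick numerical sketch of why these two discrete examples cannot be normalized: their partial sums keep growing as the cutoff increases. The weight 1/n^2 is added only as a contrast that does converge. The helper name and the cutoffs are arbitrary choices.

```python
def partial_mass(weight, upper):
    """Total unnormalized mass that `weight` assigns to the integers 1..upper."""
    return sum(weight(n) for n in range(1, upper + 1))

weights = {
    "uniform": lambda n: 1.0,       # uniform on the positive integers
    "1/n": lambda n: 1.0 / n,       # harmonic series, diverges like log(upper)
    "1/n^2": lambda n: 1.0 / n**2,  # converges, so it can be normalized
}

for upper in (10, 1000, 100000):
    print(upper, {name: partial_mass(w, upper) for name, w in weights.items()})
```

The first two totals grow without bound, so no constant rescales them to sum to one. The third levels off, which is what a proper prior requires.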

Replies from: jsalvatier

↑ comment by jsalvatier · 2011-07-27T19:31:06.290Z · LW(p) · GW(p)

I am also confused. More specifically, improper priors are priors that integrate to infinity and thus cannot be normalized.

Replies from: Douglas_Knight

↑ comment by Douglas_Knight · 2011-07-28T01:51:05.707Z · LW(p) · GW(p)

That's almost the definition, except that improper priors are not priors.
Is that your confusion?

Replies from: jsalvatier

↑ comment by jsalvatier · 2011-07-28T03:06:26.235Z · LW(p) · GW(p)

No, I mean I share your confusion that the rest of the conversation appeared reasonable given the incorrect definition in the post.

Replies from: Douglas_Knight

↑ comment by Douglas_Knight · 2011-07-28T04:28:33.691Z · LW(p) · GW(p)

Sorry. Probably part of the miscommunication is that I used "confused" to describe DanielLC and "surprised" to describe myself.
