P(X = exact value) = 0: Is it really counterintuitive?

post by lucidfox · 2011-07-29T12:45:38.527Z · score: 8 (11 votes) · 49 comments

I'm probably not going to say anything new here. Someone must have pondered over this already. However, hopefully it will invite discussion and clear things up.

Let X be a random variable with a continuous distribution over the interval [0, 10]. Then, by the definition of probability over continuous domains, P(X = 1) = 0. The same is true for P(X = 10), P(X = sqrt(2)), P(X = π), and in general, the probability that X equals any exact number is always zero, since it is an integral of the probability density over a single point.
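To make "an integral over a single point" concrete, here is a minimal sketch using a hypothetical density f(x) = x/50 on [0, 10]. Any continuous density would do. This one is chosen only because its CDF, F(x) = x²/100, is easy to write exactly, so exact fractions can be used instead of numerical integration. The function names are illustrative.

```python
from fractions import Fraction

# Hypothetical continuous distribution on [0, 10]: density f(x) = x / 50,
# which integrates to 1 over [0, 10]. Its CDF is F(x) = x**2 / 100.
def cdf(x):
    # Clamp to the support [0, 10], keeping everything as exact Fractions.
    x = min(max(Fraction(x), Fraction(0)), Fraction(10))
    return x * x / 100

def prob_between(a, b):
    # P(a <= X <= b) = integral of f from a to b = F(b) - F(a)
    return cdf(b) - cdf(a)

# An "exact value" is the degenerate interval [c, c], so its integral is 0.
print(prob_between(1, 1))  # 0

# Shrinking intervals around 1: P(1 - eps <= X <= 1 + eps) = eps / 25 -> 0
for k in range(1, 6):
    eps = Fraction(1, 10**k)
    print(eps, prob_between(1 - eps, 1 + eps))
```

Every interval around 1 has positive probability, and that probability shrinks linearly with its width. The single point is the limit, where the probability is exactly 0.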

This is sometimes described as counterintuitive: surely, at any measurement, X must be equal to something, and thus its probability cannot be zero since it has clearly happened. It can, of course, be argued that mathematical probability is an abstract function that does not exactly map to our intuitive understanding of probability, but in this case, I would argue that it does.

What if X is the x-coordinate of a physical object? If classical physics is in question - for example, we pointed a needle at a random point on a 10 cm ruler - then the needle cannot be a point object, and must have a nonzero size. Thus, we can compute the probability of the 1 cm mark lying within the space the end of the needle occupies, a probability that is clearly defined and nonzero.

But even if we're talking about a point object, which may well occupy a definite and exact coordinate in classical physics, we'll never know exactly what that coordinate is. For one, our measuring tools are not that precise. But even if they had infinite precision, statements like "X equals exactly 2.(0)" or "X equals exactly π" contain infinite information, since they specify all the decimal digits of the coordinate into infinity. We would need an infinite number of measurements to confirm it. So while X may objectively equal exactly 2 or π - again, under classical physics - measurers would never know it. At any given time, to measurers, X would lie in an interval.
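The point that a finite-precision measurement only pins X to an interval can be made quantitative. The sketch below assumes X is uniform on [0, 10]; the post only says "continuous", so this choice is an assumption. It also assumes a reading reported to d decimal places means X lies within half a unit of the last digit. Under those assumptions, every finite-precision reading has positive probability, and only the infinite-precision limit has probability 0. The function name `prob_of_reading` is hypothetical.

```python
from fractions import Fraction

# Assumption: X is uniform on [0, 10]. A reading reported to d decimal places,
# e.g. "3.142", only tells us X lies in [reading - h, reading + h), h = 10**-d / 2.
def prob_of_reading(reading, d, lo=0, hi=10):
    r = Fraction(reading)               # parse the decimal string exactly
    half = Fraction(1, 2 * 10**d)
    a, b = max(r - half, lo), min(r + half, hi)   # clip to the support
    return max(b - a, 0) / Fraction(hi - lo)

digits = "3.14159265"
for d in range(1, 7):
    # Each extra digit divides the probability by 10: it is 10**-(d + 1) here,
    # positive for every finite d, but tending to 0 as d grows without bound.
    print(digits[: d + 2], prob_of_reading(digits[: d + 2], d))
```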

Then of course there is quantum physics, where it is literally impossible for any physical object, including point objects, to have a definite coordinate with arbitrary precision. In this case, the purely mathematical notion that any exact value is an impossible event turns out (by coincidence?) to match how the universe actually works.


Comments sorted by top scores.

comment by JoshuaZ · 2011-07-29T14:12:42.877Z · score: 10 (10 votes)

This actually gets even worse. Consider, for example, a hypothetical Bayesian version of Isaac Newton trying to estimate what exponent k the radius is raised to in F = GMm/R^k. There's an intuition that mathematically simple numbers, such as 2, should be more likely. A while ago jimrandomh and benelliiot discussed this with me. Ben suggested that in this sort of context you might just have a complicated distribution where part of the distribution arose from something continuous and the other part arose from discrete probabilities for simple numbers. This seems to do a decent job of capturing our intuition, but such a distribution seems very hard to actually use.
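In a toy case, a mixed discrete/continuous prior of this kind can be updated in closed form. The sketch uses made-up ingredients, none of which come from the thread: prior mass 0.5 on exactly k = 2, the remaining 0.5 spread uniformly over [1, 3], and a single measurement k_hat ~ Normal(k, sigma). The point mass contributes a likelihood value times its weight. The continuous part contributes an integral of the likelihood, which for a uniform prior reduces to a difference of normal CDFs. The function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def posterior_exactly_two(k_hat, sigma, p_atom=0.5, lo=1.0, hi=3.0):
    """Posterior P(k == 2 | measured exponent k_hat).

    Hypothetical prior: point mass p_atom at k = 2, plus (1 - p_atom) spread
    uniformly over [lo, hi]. Hypothetical likelihood: k_hat ~ Normal(k, sigma).
    """
    # Evidence contributed by the discrete part (the atom at k = 2).
    atom = p_atom * normal_pdf(k_hat, 2.0, sigma)
    # Evidence from the continuous part: integral over [lo, hi] of
    # Normal(k_hat; k, sigma) / (hi - lo) dk, in closed form via the CDF.
    cont = (1 - p_atom) * (normal_cdf((hi - k_hat) / sigma)
                           - normal_cdf((lo - k_hat) / sigma)) / (hi - lo)
    return atom / (atom + cont)

print(posterior_exactly_two(2.0, 0.01))  # a precise reading near 2 boosts the atom
print(posterior_exactly_two(2.1, 0.01))  # a precise reading away from 2 kills it
```

Unlike a purely continuous prior, this one can end up with substantial posterior probability on the exact value 2. The difficulty JoshuaZ mentions shows up once the set of "simple numbers" gets large and the likelihood has no convenient closed form.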

comment by lucidfox · 2011-07-30T11:17:56.169Z · score: 5 (5 votes)

If Newton tried to derive his law purely from empirical measurements, then yes, he would never be exactly sure (ignoring general relativity for a moment) that the exponent is exactly 2. For all he would know, it could actually be 2.00000145...

But that would be like trying to derive the value of pi or the exponents in the Pythagorean theorem by measuring physical circles and triangles. If the law of gravity is derived from more general axioms, then its form can be computed exactly provided that these axioms are correct.

comment by Pfft · 2011-07-30T14:39:17.167Z · score: 1 (1 votes)

Do you think that the Dirichlet process models that machine learning people use might be relevant here? As I understand it, a DP prior says that the true probability distribution is a discrete probability distribution over some countable set of points, but you don't know which set in advance. So the posterior can consistently assign nonzero probability to a single point -- in fact, if you do the math, the posterior is very simple: it's a mix between a DP and some finite probability mass on the values that you did see.
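The "finite probability mass on the values that you did see" is easiest to see in the DP posterior predictive, often called the Blackwell-MacQueen or Pólya urn rule. Assume a base measure G0 that is continuous and a concentration parameter alpha. After n observations, the next draw equals a previously seen value x exactly with probability (count of x)/(alpha + n). Otherwise, with probability alpha/(alpha + n), it is a fresh draw from G0. The sketch below computes those numbers; the function name and the sample data are hypothetical.

```python
from collections import Counter

def dp_predictive(observations, alpha):
    """Posterior predictive of a Dirichlet process DP(alpha, G0), assuming the
    base measure G0 is continuous (so it puts no mass on any single point).

    Returns (atoms, p_new): atoms maps each observed value to the probability
    that the next draw equals it exactly; p_new is the probability that the
    next draw is a fresh sample from G0.
    """
    n = len(observations)
    counts = Counter(observations)
    atoms = {x: c / (alpha + n) for x, c in counts.items()}
    p_new = alpha / (alpha + n)
    return atoms, p_new

atoms, p_new = dp_predictive([2.0, 2.0, 3.5], alpha=1.0)
print(atoms, p_new)  # {2.0: 0.5, 3.5: 0.25} 0.25
```

So P(next value = 2.0 exactly) is 0.5 even though G0 itself gives every exact value probability 0.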

comment by JoshuaZ · 2011-07-30T18:52:10.329Z · score: 1 (1 votes)

My minimal knowledge base says that sounds potentially relevant. Unfortunately, I don't know nearly enough about this sort of thing to do more than make very vague, non-committal remarks.

comment by Eugine_Nier · 2011-07-30T02:32:42.925Z · score: -2 (2 votes)

In summary, Newton should assign probability 0 to the statement that his theory of relativity is exactly correct. This turns out to be the right thing to do.

comment by JoshuaZ · 2011-07-30T18:51:07.835Z · score: 2 (2 votes)

Huh? No. The probability shouldn't be zero that he's correct. Even now there's some very tiny probability that Newton's laws are exactly correct. This chance is vanishingly small but non-zero. Moreover, your argument implies too much because one could use the exact same logic for general relativity.

comment by Eugine_Nier · 2011-07-31T00:13:53.523Z · score: 1 (1 votes)

Moreover, your argument implies too much because one could use the exact same logic for general relativity.

And it would be equally correct.

comment by JoshuaZ · 2011-07-31T04:13:31.084Z · score: 1 (1 votes)

Ok. But even if you had a theory of quantum gravity that seemed to explain all observed data, your argument would still go through. If your argument is accepted, then any theory of everything would have to be assigned zero probability of being correct no matter how well it predicted things. This seems wrong.

comment by Will_Newsome · 2011-07-31T03:09:53.350Z · score: 0 (2 votes)

"Should"? I would much rather be logically inconsistent, or bet that the axioms of probability are meaningless or irrelevant---which in relevant decision theoretic problems they tend to be---than give odds of infinity to one.

comment by shminux · 2011-07-29T19:50:19.425Z · score: 7 (7 votes)

It may or may not be helpful to realize that infinities (including infinitesimals) are merely a mathematical abstraction. Everything you encounter in the physical world is finite. Thus, it's not overly surprising that something actually happens, even though a given mathematical model of that something assigns it a zero probability.

That said, mathematical descriptions that include continuity are extremely convenient (life would be rather cumbersome if we had to use finite difference calculus instead of derivatives in all applications).

It is a very common tendency to identify a physical phenomenon with a particular mathematical model of it (one of the most abused models is that of virtual particles in particle physics), but one would be rather less wrong by keeping in mind that an abstraction of an object is not the object itself.

A nice (if fantastical) description of objects vs models can be found in the HPMoR chapter on partial transfiguration.

comment by Matt_Simpson · 2011-07-29T16:04:42.435Z · score: 5 (5 votes)

Let X be a random variable over the interval [0, 10]. Then, by the definition of probability over continuous domains, P(X = 1) = 0.

Only if you have a continuous probability distribution over that domain. It's quite possible to have a probability distribution with, for example, a point mass at 5 such that p(X=5)=0.5.
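One way to see the difference between the two cases is through the cumulative distribution function. A point mass shows up as a jump in the CDF, and the jump height is P(X = x). A continuous distribution has no jumps. The sketch below uses a hypothetical 50/50 mixture: with probability 0.5, X = 5 exactly; with probability 0.5, X is uniform on [0, 10]. It approximates the jump by F(x) - F(x - h) for a tiny h.

```python
def mixed_cdf(x):
    """CDF of a hypothetical mixture on [0, 10]: with probability 0.5, X = 5
    exactly; with probability 0.5, X is uniform on [0, 10]."""
    uniform_part = min(max(x, 0.0), 10.0) / 10.0
    point_part = 1.0 if x >= 5.0 else 0.0
    return 0.5 * uniform_part + 0.5 * point_part

def point_prob(x, h=1e-12):
    # P(X = x) is the size of the CDF's jump at x, approximated by F(x) - F(x - h).
    return mixed_cdf(x) - mixed_cdf(x - h)

print(point_prob(5.0))  # about 0.5: the point mass shows up as a jump
print(point_prob(3.0))  # about 0: the CDF is continuous there
```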

This is sometimes described as counterintuitive: surely, at any measurement, X must be equal to something, and thus its probability cannot be zero since it has clearly happened.

Others have answered this below, but there is another aspect to this I'd like to discuss. All data are discrete. When you measure something, your measurement apparatus is only ever going to give you one of a discrete, finite set of values. (I'm pretty sure about finite, but willing to be corrected). Any probability distribution over the possible values that you might measure with your apparatus can easily satisfy p(X=x)>0 for all x.

Concretely, if you're measuring the length of something with a ruler, you probably just round to the nearest 1/16th of an inch. On a 12-inch ruler, this means there are only 12*16 + 1 = 193 possible measurements you can make (counting both the 0 and 12-inch marks), so you can create any number of probability distributions over these points where each point has p(X=x)>0.
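One such distribution can be obtained by discretizing a continuous one. The sketch assumes the true length is uniform on [0, 12] inches and is rounded to the nearest 1/16 inch. Counting both the 0 and 12-inch marks gives 12*16 + 1 = 193 readings. Each reading gets the probability of its rounding interval, so every reading has p > 0 and the total is exactly 1. The two end readings get half-width intervals. The function name is hypothetical.

```python
from fractions import Fraction

def ruler_reading_probs(length=12, ticks_per_unit=16):
    """P(reading = k / ticks_per_unit) when the true length is uniform on
    [0, length] (an assumption) and is rounded to the nearest tick."""
    step = Fraction(1, ticks_per_unit)
    n_ticks = length * ticks_per_unit + 1   # includes both 0 and `length`
    probs = {}
    for k in range(n_ticks):
        centre = k * step
        # Rounding interval for this tick, clipped to the ruler [0, length].
        a = max(centre - step / 2, 0)
        b = min(centre + step / 2, length)
        probs[centre] = (b - a) / length
    return probs

probs = ruler_reading_probs()
print(len(probs))                          # 193 possible readings
print(all(p > 0 for p in probs.values()))  # True
print(sum(probs.values()))                 # 1
```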

comment by lucidfox · 2011-07-29T18:13:51.220Z · score: 3 (9 votes)

I implicitly meant a continuous distribution. Clarified that in the post now.

Concretely, if you're measuring the length of something with a ruler, you probably just round to the nearest 1/16th of an inch.

As someone who lives in the dangerous and uncharted part of the world called "outside the US", I prefer centimeters. ;)

comment by Alicorn · 2011-07-29T18:50:30.017Z · score: 4 (8 votes)

This one isn't even a matter of neglecting to convert; it's a cultural divide - while I expect you knew what Matt meant, it's entirely possible he didn't know how to translate it for you. Presumably you don't round to the nearest 1.5875 millimeters. What do metric users round to when measuring lengths? Millimeters? Those are little - even littler than sixteenths of an inch! Do most metric rulers even mark them, or do they just mark halfway points between centimeter lines? I don't know.

comment by [deleted] · 2011-07-29T19:16:28.267Z · score: 9 (9 votes)

Yes, millimeters are typically marked, with a special mark half-way at 5mm. Once you're beyond 1m in length one might skip them, but even then rulers often have them. Small things are normally measured in millimeters as well, though usually some tolerance is expected. For example, one of my rings has a diameter of 21.7mm and was advertised as such. Of course, if you don't need this precision, you round to whatever decimal place you care about and use the nearest unit (like in any system). I don't think of millimeters as particularly tiny, more like the basic unit of "smallness".

(And I fully agree with lucidfox. Imperial units are insane.)

comment by handoflixue · 2011-07-29T20:56:12.191Z · score: 5 (5 votes)

Huh, really, that's a cultural divide? I was taught how to do metric measurements in every science class I took, and I knew how to use millimetres before then because I've never seen a ruler that didn't have them marked. Is this truly uncommon knowledge in the US? o.o

comment by Alicorn · 2011-07-29T20:58:02.073Z · score: 3 (5 votes)

I've used metric rulers, in science classes mostly, but I don't think I've used one in years. When I have to measure things, I use a tape measure, which only has inches marked.

comment by handoflixue · 2011-07-29T21:04:20.834Z · score: 2 (2 votes)

Huh, fascinating. Even my cheap "gift from a job" tape measure does metric, so this is news to me :)

comment by NancyLebovitz · 2011-07-31T16:25:57.807Z · score: 1 (1 votes)

A lot of American rulers are marked in both inches and centimeters, though I don't know what the proportion is compared to rulers which are just marked in inches.

comment by lucidfox · 2011-07-30T03:56:13.976Z · score: 3 (3 votes)

What do metric users round to when measuring lengths? Millimeters?

Depends. In casual use, typically centimeters. But yes, as muflax said, metric rulers have individual millimeters marked, and typically they mark half-centimeters with slightly longer bars.

comment by komponisto · 2011-07-31T15:58:33.428Z · score: -1 (3 votes)

As someone who lives in the dangerous and uncharted part of the world called "outside the US", I prefer centimeters.

Feel free to use centimeters in your own examples, then. But you're not entitled to demand that US users do so.

comment by wedrifid · 2011-07-31T21:59:39.155Z · score: 3 (5 votes)

Feel free to use centimeters in your own examples, then. But you're not entitled to demand that US users do so.

She didn't. Matt said, in reply to lucidfox, "you probably just round to the nearest 1/16th of an inch". Since she, in fact, would not round to such an absurd metric, she pointed out what she would actually use. It is rather rude to declare or imply that lucidfox is exceeding the bounds of what her status permits by correcting a false claim about herself.

comment by komponisto · 2011-07-31T23:19:32.399Z · score: 3 (3 votes)

You would have a point if lucidfox had not written this post (in which a poster's use of "miles per hour" is cited as one of the offenses), but in that case I wouldn't have written the grandparent either.

Context.

comment by NancyLebovitz · 2011-07-31T16:29:06.323Z · score: 2 (2 votes)

"I prefer" with a smiley and some mild snark isn't exactly a demand.

In any case, people are entitled to demand whatever they want, they just aren't entitled to get compliance.

Would it be worth having a convention at LW that measurements should be given in English and metric units?

comment by wedrifid · 2011-08-01T09:16:20.070Z · score: 3 (3 votes)

Would it be worth having a convention at LW that measurements should be given in English and metric units?

Make the convention the use of the existing International System of Units with other units humored as parochial eccentricities. If folks particularly care they can reply with a conversion to the conventional unit.

comment by komponisto · 2011-07-31T16:53:38.312Z · score: 1 (5 votes)

"I prefer" with a smiley and some mild snark isn't exactly a demand.

The context of other comments and posts by the same user caused me to read it as hostile passive-aggression.

Would it be worth having a convention at LW that measurements should be given in English and metric units?

Seems unnecessary. The general convention should be that people are entitled to employ the conventions and terminology in use where they live, or that they themselves are most familiar with. I wouldn't think of demanding that someone in another country talk about their purchasing habits in terms of US dollars, for example.

comment by lucidfox · 2011-07-31T20:11:28.560Z · score: 0 (4 votes)

Where did I demand anything?

comment by thakil · 2011-07-29T12:53:15.192Z · score: 4 (6 votes)

You are misunderstanding what probability means. A probability of 0 does not mean an event will never happen; it means the event will almost surely not happen. That is, whatever positive probability one can name, the probability of that event occurring is smaller. This does not mean that the event can never occur - as you surmise, otherwise we could never observe any result! Basically, infinity is a bit weird.

comment by [deleted] · 2011-07-29T13:09:14.592Z · score: 0 (0 votes)

Interestingly, the phrase "almost surely" also has a Wikipedia article that covers some of these mathematical concepts, and there are also related articles on "almost all" and "almost everywhere."

http://en.wikipedia.org/wiki/Almost_surely
http://en.wikipedia.org/wiki/Almost_all
http://en.wikipedia.org/wiki/Almost_everywhere

comment by lucidfox · 2011-07-29T13:12:25.942Z · score: 1 (1 votes)

When I read thakil's post, I thought they indeed meant the mathematical definition of "almost surely". The domain of an event with probability zero is indeed "almost nowhere" in the rigorous sense, since it is a measure-zero set.

comment by thakil · 2011-07-29T13:19:15.570Z · score: 1 (1 votes)

Yes, that's the concept I am referring to. The concept comes from measure theory. If you're familiar with it, I'm not sure why you're confused about probability 0 events. Or are you? Perhaps I'm misreading your article.

comment by Matt_Simpson · 2011-07-29T16:12:18.175Z · score: 0 (0 votes)

Yes, that's the concept I am referring to. The concept comes from measure theory. If you're familiar with it, I'm not sure why you're confused about probability 0 events.

I think her confusion comes from the fact that if your prior probability that an event happened is 0, no amount of evidence will convince you that it did happen. Suppose your prior probability that some random variable X equals 1 is P(X=1)=0. Now suppose you find out that, actually, X=1. Then, using Bayes' rule:

P(X=1|X=1) = P(X=1|X=1)*P(X=1) / denominator

I'll leave the denominator out because the numerator is 0 (the denominator won't be 0), so P(X=1|X=1)=0, which makes no sense.

I don't claim the calculation I did above is correct - I realize conditional probabilities are fraught with difficulties, and I probably violated some rule I don't know about or have forgotten from my measure theory class. However, this does give you intuition for why lucidfox or perhaps someone else would be confused despite having knowledge of measure theory (if this is in fact why it was confusing to him/her).
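The rule that was violated can be pinned down. When the evidence is the event X = 1 itself, the denominator P(X = 1) in Bayes' rule is also 0, so the ratio is 0/0 (undefined) rather than 0. A standard workaround is to condition on shrinking intervals around 1 and take the limit. The sketch below assumes X is uniform on [0, 10]. It shows that the prior probability of the interval goes to 0, while the posterior given that interval is exactly 1 for every positive width. The function names are hypothetical.

```python
from fractions import Fraction

def uniform_prob(a, b, lo=0, hi=10):
    """P(a <= X <= b) for X uniform on [lo, hi] (an assumed distribution)."""
    a, b = max(Fraction(a), lo), min(Fraction(b), hi)
    return max(b - a, 0) / Fraction(hi - lo)

def bayes(likelihood, prior, evidence):
    """Bayes' rule P(H|E) = P(E|H) P(H) / P(E); undefined when P(E) = 0."""
    if evidence == 0:
        raise ZeroDivisionError("P(E) = 0: the ratio formula is undefined")
    return likelihood * prior / evidence

# H = E = "X lies within eps of 1": the prior shrinks to 0, yet the posterior
# P(H | E) = 1 * P(H) / P(E) is exactly 1 for every eps > 0.
for k in range(1, 7):
    eps = Fraction(1, 10**k)
    p = uniform_prob(1 - eps, 1 + eps)
    print(eps, p, bayes(1, p, p))

# At eps = 0 (the exact point) the formula is 0/0.
try:
    bayes(1, uniform_prob(1, 1), uniform_prob(1, 1))
except ZeroDivisionError as err:
    print(err)
```

Measure-theoretic probability handles such cases with regular conditional probabilities rather than the ratio formula.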

comment by [deleted] · 2011-07-30T01:10:30.768Z · score: 2 (2 votes)

No finite amount of evidence will convince you. I can be convinced of infinitely unlikely things by an infinite amount of evidence just fine.

And if we're talking about a situation (like real life!) where you can't expect to receive an infinite amount of evidence, then we shouldn't be using probabilities of 0 or 1, either.

comment by lucidfox · 2011-07-29T18:07:59.938Z · score: 1 (1 votes)

Her confusion.

comment by rwallace · 2011-07-29T17:38:54.709Z · score: 3 (3 votes)

My intuition, for what it's worth, works more easily with the binary expansion of the numbers than their interpretation as physical quantities.

From that perspective, "X equals exactly pi" would normally be assigned nonzero probability because pi is a computable number with finite Kolmogorov complexity; there is a nonzero chance that two processes will generate the same infinite but computable bit stream.

But "X equals exactly Y", where Y is a random incomputable number, is indeed infinitely improbable, because it amounts to a statement that infinitely many coin flips will come out a particular way; the probability is the limit of 0.5^n as n goes to infinity, which is zero.
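The finite version of the coin-flip argument can be checked by brute force. For any fixed n-bit target, exactly one of the 2^n equally likely outcomes of n fair flips matches it, so the probability is 0.5^n. That is positive for every finite n and tends to 0. The target string below is an arbitrary placeholder; an incomputable Y cannot be written down, so only its length matters here. The function name is hypothetical.

```python
from itertools import product

def prob_prefix_match(target_bits):
    """Probability that len(target_bits) fair coin flips reproduce target_bits
    exactly, computed by brute-force enumeration of all 2**n outcomes."""
    n = len(target_bits)
    outcomes = list(product("01", repeat=n))
    matches = sum(1 for o in outcomes if "".join(o) == target_bits)
    return matches / len(outcomes)

# Arbitrary fixed bit string standing in for the first digits of Y.
target = "110010010000111111011010"
for n in (1, 4, 8, 16):
    print(n, prob_prefix_match(target[:n]))   # equals 0.5 ** n
```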

comment by Thomas · 2011-07-29T17:11:13.920Z · score: 2 (2 votes)

A uniform distribution over a real interval, e.g. [0,1], is possible. One algorithm is to toss a fair coin for each binary place and write down 1 for heads and 0 for tails. If every toss comes up tails you get 0; if every toss comes up heads, you get 1. This gives a uniform probability distribution with P(X = x) = 0 for every x. Such outcomes are not impossible, only of probability 0.
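The coin-tossing construction can be simulated by truncating it after a fixed number of binary places. The choice of 30 places is an assumption for illustration. The truncation is only a discrete approximation: each 30-bit value has probability 2^-30. Only the infinite process is truly continuous. The sketch checks one consequence of uniformity: X < 0.25 exactly when the first two tosses are tails, so about a quarter of the samples should land there. The function name and seed are arbitrary.

```python
import random

def coin_flip_uniform(n_bits, rng):
    """Approximate a uniform draw from [0, 1] by tossing a fair coin for each of
    the first n_bits binary places (heads -> 1, tails -> 0)."""
    x = 0.0
    for place in range(1, n_bits + 1):
        if rng.random() < 0.5:   # heads: this binary digit is 1
            x += 2.0 ** -place
    return x

rng = random.Random(0)           # fixed seed so the run is reproducible
samples = [coin_flip_uniform(30, rng) for _ in range(100_000)]
# Fraction of samples below 0.25 should be close to 0.25 for a uniform law.
print(sum(s < 0.25 for s in samples) / len(samples))
```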

But there is no uniform probability distribution over only the rational numbers in this interval, or in any other interval. Nor is there a uniform probability distribution over all the naturals, or over any infinite subset of the naturals.
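The reason is countable additivity, one of the standard axioms of probability. A short derivation, writing S for any countably infinite set such as the rationals in [0,1] or the naturals:

```latex
Suppose $P$ is a uniform distribution on a countably infinite set
$S = \{s_1, s_2, \dots\}$, i.e.\ $P(\{s_i\}) = c$ for every $i$.
By countable additivity,
\[
  1 = P(S) = \sum_{i=1}^{\infty} P(\{s_i\}) = \sum_{i=1}^{\infty} c =
  \begin{cases} 0 & \text{if } c = 0,\\ \infty & \text{if } c > 0, \end{cases}
\]
a contradiction either way.
```

The interval [0,1] escapes this argument because it is uncountable. Countable additivity says nothing about adding up P({x}) = 0 over uncountably many points.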

comment by timtyler · 2011-07-30T09:57:03.781Z · score: 0 (2 votes)

Then of course there is quantum physics, where it is literally impossible for any physical object, including point objects, to have a definite coordinate with arbitrary precision.

Conventionally, in the multiverse, everything is precisely somewhere. What is difficult is finding out exactly where things are.

comment by hairyfigment · 2011-07-30T18:01:36.768Z · score: 0 (0 votes)

How so? We can regard each point within a cloud of amplitude as a 'separate world' in one sense, but I understood that points less than a certain 'distance' away from each other will affect each other's futures in a meaningful way. I thought there exists no fact of the matter one second later as to which of those 'worlds' I came from.

comment by handoflixue · 2011-07-29T21:01:31.328Z · score: 0 (0 votes)

Given that the sum (0+0+0...) = 0, wouldn't that imply that P(any value at all) = 0, and that you actually cannot produce a result in this system?

Which, admittedly, strikes me as a perfectly reasonable result, given you can't actually have a continuous distribution in reality, and I'm not aware of any randomization method that could actually meet these requirements.

comment by Manfred · 2011-07-29T22:14:00.251Z · score: 2 (2 votes)

you can't actually have a continuous distribution in reality

The crushing majority of evidence suggests that continuous distributions are what reality is built on.

The problem with integrating 0 to get P(anything) = 0 is that you can't switch the order in which you take limits - the limit that gives you P(X=x) = 0 is outside the integral, and the integral itself behaves like a limit (remember Riemann sums?). So if you switch the order of the limits by integrating 0, you have committed an illegal operation.
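The order-of-limits issue shows up numerically with a Riemann sum. The sketch assumes X is uniform on [0, 10] and splits the interval into n equal bins. Each bin's probability shrinks like 1/n, and that is the limit that yields P(X = x) = 0. Summing the bins first still gives 1 for every n, so the limit of the sums is 1. Taking each bin's limit first and then summing would give 0. The function names are hypothetical.

```python
def density(x, lo=0.0, hi=10.0):
    """Uniform density on [lo, hi] (an assumed example distribution)."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

def riemann_total(n, lo=0.0, hi=10.0):
    """Split [lo, hi] into n equal bins and add up each bin's probability,
    approximated by density(midpoint) * width (exact for a uniform density).
    Returns (largest single-bin probability, total probability)."""
    width = (hi - lo) / n
    bins = [density(lo + (i + 0.5) * width) * width for i in range(n)]
    return max(bins), sum(bins)

for n in (10, 1_000, 100_000):
    largest_bin, total = riemann_total(n)
    # Each bin's probability -> 0 as n grows, but their sum stays 1.
    print(n, largest_bin, total)
```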

comment by handoflixue · 2011-07-29T22:51:00.771Z · score: 1 (1 votes)

Augh, my mistake. This is why I am currently doing a math refresher. Thank you :)

comment by Manfred · 2011-07-29T23:59:24.290Z · score: 1 (1 votes)

Yeah, it makes all sorts of sense to just set things to values, but once you start using limits that breaks things. Stupid limits.

comment by timtyler · 2011-07-30T09:53:03.002Z · score: 0 (0 votes)

The crushing majority of evidence suggests that continuous distributions are what reality is built on.

Not really. Lots of discrete things look continuous - if you stand far enough back.

comment by Manfred · 2011-07-30T10:38:30.878Z · score: 1 (1 votes)

Alright, I'm curious. Are you claiming that the probability distributions that come out of quantum mechanics are discrete?

comment by timtyler · 2011-07-30T11:33:56.334Z · score: 2 (2 votes)

If you are not familiar with the idea, perhaps, see: http://en.wikipedia.org/wiki/Digital_physics

comment by Manfred · 2011-07-30T19:21:28.422Z · score: 0 (0 votes)

I am familiar with the idea. I just don't see where the evidence is. Sure, quantizing space fits well with there being a maximum entropy of space, but this seems like a classical solution to a very non-classical problem, and it eliminates relativity in the process.

comment by timtyler · 2011-07-30T20:02:53.930Z · score: 3 (3 votes)

You are the one claiming that "the crushing majority of evidence" opposes discrete theories.

My position is more that we can barely see anything down that far, and so we have very little experimental evidence about whether the universe is continuous or discrete.

In the absence of evidence, assuming uncomputable physics seems to be counter-intuitive to me. We don't know of anything else that is uncomputable.

comment by Manfred · 2011-07-30T21:09:08.196Z · score: 0 (0 votes)

We don't know of anything else that is uncomputable.

We're talking about the entire universe here, so it would be just as valid to say we don't know of anything else that is (discretely) computable.

And yeah, there is always some level of discreteness that would have no impact on our observations, just like there is some level of teapots in the asteroid belt that would have no impact on our observations. You're right that that sort of thing isn't ruled out by the evidence, so my statement was wrong.

comment by timtyler · 2011-07-31T10:33:04.719Z · score: 2 (2 votes)

Teapots in the asteroid belt are contrary to Occam's razor. The situation with discrete physics is very different. Science has a long history of showing that apparently-continuous phenomena actually turn out to be grainy on a smaller scale.

comment by wedrifid · 2011-07-29T21:59:34.816Z · score: -1 (1 votes)

P(X = exact value) = 0: Is it really counterintuitive?

Not counterintuitive, just annoying. Also used as an excuse to conclude silly things by playing with infinities (as you allude to).