No Universal Probability Space
post by Gordon Seidoh Worley (gworley) · 2009-05-06T02:58:06.165Z · LW · GW · Legacy · 43 comments
This afternoon I heard a news story about a Middle Eastern country where one person said of the defenses for a stockpile of nuclear weapons, "even if there is only a 1% probability of the defenses failing, we should do more to strengthen them given the consequences of their failure". I have nothing against this person's reasoning, but I do have an issue with where that 1% figure came from.
The statement above and others like it share a common problem: they are phrased such that it's unclear over what probability space the measure was taken. In fact, many journalists and other people don't seem especially concerned by this. Even some commenters on Less Wrong give little indication of the probability space over which they give a probability measure of an event, and nobody calls them on it. So what is this probability space they are giving probability measurements over?
If I'm in a generous mood, I might give the person presenting such a statement the benefit of the doubt and suppose they were unintentionally ambiguous. On the defenses of the nuclear weapon stockpile, the person might have meant to say "there is only a 1% probability of the defenses failing over all attacks", as in "in 1 attack out of every 100 we should expect the defenses to fail". But given both my experiences with how people treat probability and my knowledge of naive reasoning about probability, I am dubious of my own generosity. Rather, I suspect that many people act as though there were a universal probability space over which they may measure the probability of any event.
To illustrate the issue, consider the probability that a fair coin comes up heads. We typically say that there is a 1/2 chance of heads, but what we are implicitly saying is that given a probability measure P on the measurable space ({heads, tails}, {{}, {heads}, {tails}, {heads, tails}}), P({heads}) = P({tails}) = 1/2 and P({}) = 0 and P({heads, tails}) = 1. But if we look at the issue of a coin coming up heads from a wider angle, we could interpret it as "what is the probability of some particular coin sitting heads-up over the span of all time", which is another question altogether. What this is asking is "what is the probability of the event that this coin sits heads-up over the universal probability space", i.e. the probability space of all events that could occur at some time during the existence of the universe, and we have no clear way to calculate the probability of such an event other than to say that the universal probability space must contain infinitely many (how infinitely is still up for debate) events of measure zero. So there is a universal probability space; it's just not very useful to us, hence the title of the article, since it practically doesn't exist for us.
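To make the coin example concrete, here is a minimal Python sketch of that measurable space and measure, under the assumption that the space is finite so the axioms can be checked by brute force. The names `omega`, `events`, `P`, `power_set`, and `is_probability_space` are illustrative choices, not anything from the post.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space for one flip of a fair coin.
omega = frozenset({"heads", "tails"})

def power_set(s):
    """Return every subset of s as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(c) for c in subsets}

# The collection of events {{}, {heads}, {tails}, {heads, tails}}.
events = power_set(omega)

def P(event):
    """Uniform measure on omega: each outcome carries weight 1/2."""
    return Fraction(len(event), len(omega))

def is_probability_space(omega, events, P):
    """Check non-negativity, P(omega) = 1, and additivity on disjoint events."""
    if any(P(e) < 0 for e in events):
        return False
    if P(omega) != 1:
        return False
    for a in events:
        for b in events:
            if not (a & b) and P(a | b) != P(a) + P(b):
                return False
    return True

print(P(frozenset({"heads"})))                 # measure of the event {heads}
print(is_probability_space(omega, events, P))  # True if the axioms hold
```

The point of spelling it out is that "1/2" only has a meaning once `omega`, `events`, and `P` are all fixed; changing any of the three changes the question being answered.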
None of this is to say, though, that the people committing these crimes against probability are aware of what probability space they are taking a measure over. Many people act as if there is some number they can assign to any event which tells them how likely it is to occur and questions of "probability spaces" never enter their minds. What does it mean that something happens 1% of the time? I don't know; maybe that it doesn't happen 99% of the time? How is 1% of the time measured? I don't know; maybe one out of every 100 seconds? Their crime is not one of mathematical abuse but of mathematical ignorance.
As aspiring rationalists, if we measure a probability, we ought to know over what probability space we're measuring. Otherwise a probability isn't well defined and is just another number that, at best, is meaningless and, at worst, can be used to help us defeat ourselves. Even if it's not always a good stylistic choice to make the probability space explicit in our speech and writing, we must always know over what probability space we are measuring a probability. Otherwise we are just making up numbers to feel rational.
43 comments
Comments sorted by top scores.
comment by Vladimir_Nesov · 2009-05-06T07:49:21.817Z · LW(p) · GW(p)
Use common sense. This is no different from other matters of imprecise communication. Calling up on the meaninglessness of assertions about coin flips really does sound silly.
comment by Peter_de_Blanc · 2009-05-07T22:38:10.139Z · LW(p) · GW(p)
probability measure P on the measurable space ({heads, tails}, {{}, {heads}, {tails}, {heads, tails}}), P({heads}) = P({tails}) = 1/2 and P({}) = P({heads, tails}) = 0
I think you mean P({heads, tails}) = 1.
Replies from: gworley↑ comment by Gordon Seidoh Worley (gworley) · 2009-05-08T04:51:54.315Z · LW(p) · GW(p)
Wow, that's a pretty bad error to miss. Fixed.
Replies from: gworley↑ comment by Gordon Seidoh Worley (gworley) · 2009-05-08T05:24:01.499Z · LW(p) · GW(p)
The more I think about it, the sadder it is that it took LW a day and a half to catch that error.
comment by badger · 2009-05-06T05:20:32.791Z · LW(p) · GW(p)
We typically say that there is a 1/2 chance of heads, but what we are implicitly saying is that given a probability measure P on the measurable space ({heads, tails}, {{}, {heads}, {tails}, {heads, tails}})
Giving a probability of one event only implies we think that particular event is possible. It doesn't say anything about what other events we are considering, so there is no necessity to describe the entire space of possibilities.
Many people act as if there is some number they can assign to any event which tells them how likely it is to occur and questions of "probability spaces" never enter their minds. What does it mean that something happens 1% of the time? I don't know; maybe that it doesn't happen 99% of the time? How is 1% of the time measured? I don't know; maybe one out of every 100 seconds?
Under a Bayesian interpretation of probability, which is generally used here, probability does not express how frequently something will occur. Instead, it represents your degree of belief that an event will occur or a proposition is true. Then p=0.01 means "possible enough to consider, but very doubtful". I think most people naturally adopt a Bayesian perspective, so I'm not sure what the problem is.
Replies from: gworley↑ comment by Gordon Seidoh Worley (gworley) · 2009-05-08T05:15:31.966Z · LW(p) · GW(p)
Giving a probability of one event only implies we think that particular event is possible. It doesn't say anything about what other events we are considering, so there is no necessity to describe the entire space of possibilities.
Just because you don't care about measuring other probabilities in the space doesn't mean that you can ignore it. If you don't know what the space is, it's like taking a blank piece of paper, putting an "x" on it, and saying that's where the treasure is buried: not only do you not know the territory, but you don't even know enough about the map for that "x" to have any value.
I think most people naturally adopt a Bayesian perspective, so I'm not sure what the problem is.
I think you're giving too much credit here. Go out and slip into casual conversation a remark about the probability of something and see how people treat it. You could be right about the human brain, though, and maybe it's really a First World problem created by "numerical literacy" education in schools to try to help people read the news. Every time they hear a percentage they think of the frequentist interpretation they learned in school.
comment by thomblake · 2009-05-07T21:54:33.227Z · LW(p) · GW(p)
I'm surprised more people don't take this point seriously here. Really, if someone says there's a 1% chance of the defenses failing, what does that actually mean? Sure, in this context it might not matter because the point stands, but probabilities are misused this way all the time.
comment by AdeleneDawner · 2009-05-06T08:44:38.733Z · LW(p) · GW(p)
This sounds just like something I've always wondered about: the percentages they give in weather reports, for likelihood of rain. Does '30% chance of rain' mean that they estimate a 30% chance of getting any rain, or that they think it'll rain for 30% of the day, or what?
Replies from: thomblake, Cosmos, MrHen↑ comment by thomblake · 2009-05-07T21:48:32.930Z · LW(p) · GW(p)
Yes, it means 30% chance of any rain today, and as Cosmos points out it's based primarily on historical data: 30% of historical situations significantly like this have had rain. And since the estimates update based on new data, they're practically tautologous.
I don't have a reference for that though.
↑ comment by Cosmos · 2009-05-06T14:40:12.725Z · LW(p) · GW(p)
I thought this was based off of historical data, although I don't remember the source and could easily be wrong.
If I am not wrong, it should be interpreted as: "It has rained 30% of the time we have had similar weather conditions in the past."
↑ comment by MrHen · 2009-05-06T12:46:50.218Z · LW(p) · GW(p)
My memory is telling me that it should be translated, "We expect 30% of the geographic area to get rain today." I have no good reference for this.
Replies from: gworley↑ comment by Gordon Seidoh Worley (gworley) · 2009-05-08T05:04:15.899Z · LW(p) · GW(p)
It depends on where you live. According to The Straight Dope it is typically used to mean probability it will rain compared to historical data. But in some places it is used differently, especially where the weather conditions permit more definite rain forecasts. I live in Florida and there is rarely a question of whether or not it will rain but rather a question of how much geographical area will receive rain. During the wet months the probability of any rain is very high, so it's only a matter of how much area is going to get hit. During the dry months, the only rain we receive typically comes in the form of fronts moving down from the north, so again chances of any rain tend towards 1 or 0 while the coverage can vary.
comment by CannibalSmith · 2009-05-06T08:39:50.668Z · LW(p) · GW(p)
Please link to the news story you're referring to.
comment by smoofra · 2009-05-06T15:09:24.975Z · LW(p) · GW(p)
I think most LW people take a Bayesian view of probability, where probability is a consistent, numerical measure of how confident we are that a proposition is true, updated according to Bayes' rule, etc. You're advocating the mathematical view of probability, where probability really means "probability measure", i.e. the measure on a measure space of measure 1.
It's not that one of these views is right and the other is wrong. They're actually describing two slightly different things. Measure-theoretic probability is something we can have a completely rigorous mathematical theory of, because its assumptions are technical and precise. But having a mathematical theory doesn't necessarily mean you can apply it to the real world. You need knowledge and judgment to determine what real-world phenomena are modeled by your theory and how well. Measure-theoretic probability models card games very well (measure space = the uniform distribution over the 52! ways that the deck could be shuffled) if the dealer is honest, but it doesn't help you decide whether to trust the dealer.
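As a rough illustration of the card-game model described above, the following Python sketch puts the uniform measure on all orderings of a deck and computes event probabilities by counting. It assumes a hypothetical four-card toy deck so the orderings can be enumerated (the full 52-card space is far too large for that), and the names `toy_deck`, `probability`, `ace_on_top`, and `ace_above_king` are made up for the example.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

# Size of the full sample space: every ordering of a 52-card deck.
full_space_size = factorial(52)

# Hypothetical toy deck, small enough to enumerate all 4! = 24 orderings.
toy_deck = ("AS", "KS", "QS", "JS")

def probability(event, deck):
    """P(event) under the uniform measure over all orderings of deck.

    event is a predicate that takes one ordering (a tuple of cards).
    """
    orderings = list(permutations(deck))
    favorable = sum(1 for ordering in orderings if event(ordering))
    return Fraction(favorable, len(orderings))

def ace_on_top(ordering):
    """Event: the ace is the first card dealt."""
    return ordering[0] == "AS"

def ace_above_king(ordering):
    """Event: the ace appears earlier in the deck than the king."""
    return ordering.index("AS") < ordering.index("KS")

print(probability(ace_on_top, toy_deck))
print(probability(ace_above_king, toy_deck))
```

Note that uniformity is an input to this calculation, not an output: nothing in the code can tell you whether a real dealer actually shuffles uniformly.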
Bayesian probability is a less rigorous, but more directly-applicable-to-real-life theory. It shares a lot of terminology and theorems with measure theoretic probability, but it isn't quite the same thing. In particular, in Bayesian probability you don't have a probability space, so when you see people here talking probability without specifying the space, it's not an error, they're just being Bayesian.
Jaynes's "Probability Theory" is a great book on Bayesian probability theory, but if you've got a mathematical education, which I suspect you might, it's going to piss you off. Just ignore the parts where Jaynes bloviates about things he doesn't understand, and learn from the parts where he teaches the things he does. Most of the book is the latter.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2009-05-06T15:22:23.387Z · LW(p) · GW(p)
Er... So, where does the measure-theoretic definition of probability become incompatible with "Bayesian probability" you talk about? Can you give a reference that supports your position? (I understand there are disputes on foundations, but representationally these all seem to be exactly the same thing.)
Replies from: smoofra↑ comment by smoofra · 2009-05-06T16:09:19.910Z · LW(p) · GW(p)
Who said they were incompatible? I only said they're different. In measure-theoretic probability theory, you start with a space; in Bayesian, you don't. In measure-theory land the propositions that get assigned probabilities must be subsets of the space; in Bayes-land they can be anything that's true or false (or will be in the future) in the real world.
The difference between the two is not a dispute on foundations. They really are two different, but overlapping theories. Measure-theoretic probability theory is a formal mathematical theory, like group theory or point-set topology. It's a set of theorems about mathematical objects (probability measures) that satisfy certain axioms. Those objects may be good models of something in real life, or they may not. Either way the theorems are still true. For a reference on this topic, see, well, any book on measure theory. There are lots of them. Here's one
Bayesian theory is just not that formal. What are the axioms of Bayesian theory? What propositions are allowed? How do you select Priors? Bayesian probability may use a lot of math, but math isn't what it is. It's more like physics than group theory.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2009-05-06T16:25:40.550Z · LW(p) · GW(p)
Bayesian probability may use a lot of math, but math isn't what it is.
Yet it seems that math is what it should be. Bayesian probability, as it's used in probabilistic inference, is usually founded on the same Kolmogorov axioms as standard mathematical probability theory. I don't see any problems with the mathematical part; I dispute your characterization of Bayesian probability as an inherently informal theory (hence it was taken in quotation marks in my comment).
Replies from: Cyan, smoofra↑ comment by Cyan · 2009-05-06T17:13:38.156Z · LW(p) · GW(p)
I think smoofra is talking about the same sorts of things Jaynes is when he writes:
The danger here is particularly great because mathematicians generally regard these limit theorems as the most important and sophisticated fruits of probability theory, and have a tendency to use language which implies that they are proving properties of the real world. Our point is that these theorems are valid properties of the abstract mathematical model that was defined and analyzed [emphasis in original]. The issue is: to what extent does that model resemble the real world? It is probably safe to say that no limit theorem is directly applicable in the real world, simply because no mathematical model captures every circumstance that is relevant in the real world.
- PT:LOS, pp 65-66.
Replies from: smoofra, Vladimir_Nesov↑ comment by smoofra · 2009-05-06T17:29:54.436Z · LW(p) · GW(p)
ADBOC
Jaynes aggressively scorns abstract mathematics. I love abstract mathematics. We both agree that just because you have a model or a theorem, it doesn't necessarily apply to the real world.
edit: (ADBOC directed to jaynes, not to cyan)
Replies from: Cyan↑ comment by Vladimir_Nesov · 2009-05-06T17:19:21.376Z · LW(p) · GW(p)
Yep, you shouldn't wirehead yourself into developing a theory about the mathematical formalism, you should instead develop a theory about the world. But the theory that you develop should be mathematical where possible.
Replies from: smoofra↑ comment by smoofra · 2009-05-06T17:36:17.178Z · LW(p) · GW(p)
There's nothing wrong with doing pure math, if you know that's what you're doing.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2009-05-06T17:42:56.032Z · LW(p) · GW(p)
Arguably there may be, if it can be shown that you normatively should worry only about the real world, even if what you are doing in the real world is thinking math.
Replies from: smoofra↑ comment by smoofra · 2009-05-06T19:00:45.530Z · LW(p) · GW(p)
if it can be shown that you normatively should worry only about the real world,
It can't be. Not in any system of norms I would give a fig about. Art, Fiction, and Math are worthwhile. They don't have to be useful. If you disagree with that, then we simply have different utility functions, and there's no point in arguing further.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2009-05-06T19:50:02.558Z · LW(p) · GW(p)
You are seeing "useful" too narrowly. I only stated that whatever you consider "useful", it's probably a statement exclusively about the real world, and "doing math" is one of the activities in the real world. I don't see how you could place Art in the same cached thought, since it was remarked many times that you shouldn't go Spock.
↑ comment by smoofra · 2009-05-06T17:35:06.406Z · LW(p) · GW(p)
Any theory about the real world is inherently informal.
Do you disagree that Bayesian probability theory is about as informal as physics, or do you disagree with my characterization of physics as informal? If it's the latter, then we don't disagree on anything except the meaning of words.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2009-05-06T17:40:18.391Z · LW(p) · GW(p)
A theory about the real world may be perfectly formal, it just won't have a perfectly formal applicability proof. On the other hand, if you can show that a theory is applicable with probability of 1-2^{-10000}, it's as good as formally proven to apply.
I disagree that it's correct terminology to call a theory informal just because it can't be formally proven to apply to the real world.
Replies from: smoofra↑ comment by smoofra · 2009-05-06T18:58:03.164Z · LW(p) · GW(p)
It's not the lack of a proof that makes it informal, it's that the elements themselves of the theory aren't precisely, formally, mathematically defined. A valid proposition in measure-theoretic probability is a subset of the measure space. Nothing else will do. Propositions in Bayesian probability are written in natural language, about events in the real world.
I'm using the word "formal" in the sense that it is used in mathematics. If you're going to say that propositions written in natural language, about events in the real world are "formal" in that sense, then you're just refusing to communicate.
comment by kim0 · 2009-05-06T08:20:40.423Z · LW(p) · GW(p)
All recursive probability spaces converge to the same probabilities, as the information increases.
Not that the people making up probabilities know anything about that.
If you want a universal probability space, just take some universal computer, run all programs on it, and keep those that output event A. Then you can see how many of those also output event B, and thus you can get p(B|A) whatever A and B are.
This is algorithmic information theory, and should be known by any black belt Bayesian.
Kim Øyhus
Replies from: Vladimir_Nesov, smoofra↑ comment by Vladimir_Nesov · 2009-05-06T08:26:35.047Z · LW(p) · GW(p)
All recursive probability spaces converge to the same probabilities, as the information increases.
Google gives 0 hits on "recursive probability space". Blanket assertions like this need to be technically precise.
I refer interested readers to the Algorithmic probability article on Scholarpedia.
Replies from: kim0↑ comment by kim0 · 2009-05-06T08:42:32.838Z · LW(p) · GW(p)
The technically precise reference was this part:
"This is algorithmic information theory,.."
But if you claim my first line was too obfuscated, I can agree.
Kim Øyhus
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2009-05-06T08:44:46.049Z · LW(p) · GW(p)
Please specify in what sense the first line was correct, or declare it an error. Pronouncing assertions known to be incorrect and then just shrugging that off shouldn't be acceptable on this forum.
Replies from: kim0↑ comment by kim0 · 2009-05-06T19:53:25.022Z · LW(p) · GW(p)
O.K.
One wants a universal probability space where one can find the probability of any event. This is possible:
One way of making such a space is to take all recursive functions of some universal computer, run them, and storing the output, resulting in an universal probability space because every possible set of events will be there, as the results of infinitely many recursive functions, or programs as they are called. The probabilities corresponds to the density of these outputs, these events.
A counterargument is that it is too dependent on the actual universal computer chosen. However, theorems in algorithmic information theory show that this dependence washes out asymptotically as information increases, because the densities of outputs from two different universal computers can differ by at most a factor of 2 to the power of the length of the shortest program that simulates one universal computer on the other.
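For readers who want the bound written out, one standard form of it can be sketched as follows. The notation is an assumption added here, not part of the comment: $U$ and $V$ are universal prefix machines, $|p|$ is the length of program $p$ in bits, $Q_U(x)$ is the total weight of programs that make $U$ output $x$, and $c_{UV}$ is the length of the shortest program by which $U$ simulates $V$.

$$
Q_U(x) = \sum_{p \,:\, U(p) = x} 2^{-|p|},
\qquad
Q_U(x) \;\ge\; 2^{-c_{UV}}\, Q_V(x) \quad \text{for all } x .
$$

The same inequality holds with $U$ and $V$ swapped (with constant $c_{VU}$), so the two measures agree up to a multiplicative constant that does not depend on $x$.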
Kim Øyhus
Replies from: smoofra↑ comment by smoofra · 2009-05-07T13:12:02.576Z · LW(p) · GW(p)
One way of making such a space is to take all recursive functions of some universal computer, run them, and storing the output,
OK....
resulting in an universal probability space because every possible set of events will be there
What!? You haven't yet described a probability space. The aforementioned set is infinite, so the uniform distribution is unavailable. What probability distribution will you have on this set of recursive-function-runs? And in what way is the resulting probability space universal?
Replies from: kim0↑ comment by kim0 · 2009-05-07T14:43:37.826Z · LW(p) · GW(p)
All that will be answered if you study "algorithmic information theory", or "Kolmogorov komplexity" as it is also called. You can find some of it on the net of course, or you can read the book "An Introduction to Kolmogorov Complexity and its Applications" by Ming Li & Paul Vitányi. Thats the book I read, some years after I invented it myself.
Replies from: smoofra↑ comment by smoofra · 2009-05-07T16:12:14.887Z · LW(p) · GW(p)
All that will be answered if you study "algorithmic information theory", or "Kolmogorov komplexity"
I have. I'm not an expert in it, but I'm quite aware of the concept.
You have not specified a probability space, and you have not made any attempt to justify calling the space you didn't specify "universal"
Thats the book I read, some years after I invented it myself.
uh huh.
Replies from: kim0↑ comment by kim0 · 2009-05-08T09:51:52.827Z · LW(p) · GW(p)
You are wrong because I did specify a probability space.
The probability space I specified was one where the sample space was the set of all outputs of all programs for some universal computer, and the measure was one from the book I mentioned. One could for instance choose the Solomonoff measure, from 4.4.3.
From your writings I conclude that is it quite likely that you are neither quite aware of the concept, nor understanding what I write, while believing you do.
Replies from: smoofra↑ comment by smoofra · 2009-05-08T16:49:37.331Z · LW(p) · GW(p)
You are wrong because I did specify a probability space.
No, you specified the points and not the measure.
The probability space I specified was one where the sample space was the set of all outputs of all programs for some universal computer, and the measure was one from the book I mentioned. One could for instance choose the Solomonoff measure, from 4.4.3.
OK! Now we've got a space. Of course if you wanted to talk about the Solomonoff measure, why didn't you just say so 5 comments ago? Pretty much everyone reading Less Wrong would have immediately known what you were talking about. You still haven't justified calling the Solomonoff space "universal".
From your writings I conclude that is it quite likely that you are neither quite aware of the concept, nor understanding what I write, while believing you do.
Now you're just being rude. You don't know me, you certainly don't know what I do or don't know.
Replies from: kim0↑ comment by kim0 · 2009-05-08T20:03:34.988Z · LW(p) · GW(p)
It is universal, because every possible sequence is generated.
It is universal, because it is based on universally recursive functions.
It is universal, because it uses a universal computer.
People knowing algorithmic complexity know that it is about probability measures, spaces, universality, etc. You apparently did not, while nitpicking instead.
Replies from: smoofra↑ comment by smoofra · 2009-05-08T21:01:56.236Z · LW(p) · GW(p)
I'm not nitpicking, you're wrong. "Universal" in this context means, to quote the original poster
i.e. the probability space of all events that could occur at some time during the existence of the universe
What the heck does that have to do with every possible sequence being generated? For that matter what does it have to do with sequences at all? The Solomonoff measure is a measure over sequences in a finite alphabet, or to put it more simply, integers. How do I express an event like "it will rain next Tuesday" as a subset of the integers?
Whatever you are using the word "universal" to mean, it is not anything like what the OP had in mind. The Solomonoff measure is an interesting mathematical object for sure, and it may be quite relevant to the topic of real-world Bayesian reasoning, but it's obviously not universal in that sense.
Also: what the heck does "universally recursive" mean? Did you just make up that term right now? Because I've never heard it before, it only has 10 Google hits, and none of them are relevant to this discussion.
Replies from: kim0↑ comment by kim0 · 2009-05-08T22:15:09.235Z · LW(p) · GW(p)
Events, and the universe itself, are encodable as sequences.
This means that events, and possible universes, are a subset of the sequences generated from the universal computer.
Algorithmic information theory can then be used to find probabilities for events and universes.
This is one of the CENTRAL POINTS of Algorithmic Information Theory.
What I am doing now, is teaching you A.I.T., while you wrongly claim you understand it, and wrongly claim I do not, despite an amount of evidence to the contrary. I therefore conclude that you are not very rational.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2009-05-08T22:33:51.731Z · LW(p) · GW(p)
kim0, you are trolling now. You are not communicating clearly, and then claim that the objections to your unclear communication are invalid, because you can retroactively amend the bad connotations and ambiguities, but in the process of doing so, you introduce further false-sounding and ambiguous statements. You should choose your words more carefully.