Probability interpretations: Examples

post by So8res · 2019-05-11T20:32:14.841Z · LW · GW · 23 comments

Contents

  Betting on one-time events
  A coin with an unknown bias
  Probability that the 98,765th decimal digit of π is 0
23 comments

(Written for Arbital in 2016.)


Betting on one-time events

Consider evaluating, in June of 2016, the question: "What is the probability of Hillary Clinton winning the 2016 US presidential election?"

On the propensity view, Hillary has some fundamental chance of winning the election. To ask about the probability is to ask about this objective chance. If we see a prediction market in which prices move after each new poll — so that it says 60% one day, and 80% a week later — then clearly the prediction market isn't giving us very strong information about this objective chance, since it doesn't seem very likely that Clinton's real chance of winning is swinging so rapidly.

On the frequentist view, we cannot formally or rigorously say anything about the 2016 presidential election, because it only happens once. We can't observe a frequency with which Clinton wins presidential elections. A frequentist might concede that they would cheerfully buy for $1 a ticket that pays $20 if Clinton wins, considering this a favorable bet in an informal sense, while insisting that this sort of reasoning isn't sufficiently rigorous, and therefore isn't suitable for being included in science journals.

On the subjective view, saying that Hillary has an 80% chance of winning the election summarizes our knowledge about the election or our state of uncertainty given what we currently know. It makes sense for the prediction market prices to change in response to new polls, because our current state of knowledge is changing.


A coin with an unknown bias

Suppose we have a coin, weighted so that it lands heads somewhere between 0% and 100% of the time, but we don't know the coin's actual bias.

The coin is then flipped three times where we can see it. It comes up heads twice, and tails once: HHT.

The coin is then flipped again, where nobody can see it yet. An honest and trustworthy experimenter lets you spin a wheel-of-gambling-odds — reducing the worry that the experimenter might know more about the coin than you, and be offering you a deliberately rigged bet — and the wheel lands on (2 : 1). The experimenter asks if you'd enter into a gamble where you win $2 if the unseen coin flip is tails, and pay $1 if the unseen coin flip is heads.

On a propensity view, the coin has some objective probability between 0 and 1 of being heads, but we just don't know what this probability is. Seeing HHT tells us that the coin isn't all-heads or all-tails, but we're still just guessing — we don't really know the answer, and can't say whether the bet is a fair bet.

On a frequentist view, the coin would (if flipped repeatedly) produce some long-run frequency f of heads that is between 0 and 1. If we kept flipping the coin long enough, the actual proportion p of observed heads is guaranteed to approach f arbitrarily closely, eventually. We can't say that the next coin flip is guaranteed to be H or T, but we can make an objectively true statement that p will approach f to within epsilon if we continue to flip the coin long enough.
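The convergence claim can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true frequency of 0.75, the fixed seed, and the function name running_proportion are all chosen here for illustration and are not part of the original argument.

```python
import random

def running_proportion(f, n_flips, seed=0):
    """Flip a simulated coin whose long-run heads frequency is f and
    return the observed proportion of heads after n_flips flips."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(1 for _ in range(n_flips) if rng.random() < f)
    return heads / n_flips

# Hypothetical true frequency; the frequentist observer does not know it.
true_f = 0.75
for n in (10, 1_000, 100_000):
    p = running_proportion(true_f, n)
    print(f"{n:>7} flips: observed proportion {p:.4f}, gap {abs(p - true_f):.4f}")
```

With more flips the gap between the observed proportion and the true frequency tends to shrink, which is the sense in which the frequentist statement is about the long run rather than about any single flip.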

To decide whether or not to take the bet, a frequentist might try to apply an unbiased estimator to the data we have so far. An "unbiased estimator" is a rule for taking an observation and producing an estimate e of the long-run frequency f, such that the expected value of e is f. In other words, a frequentist wants a rule such that, if the hidden bias of the coin was in fact to yield 75% heads, and we repeat many times the operation of flipping the coin a few times and then asking a new frequentist to estimate the coin's bias using this rule, the average value of the estimated bias will be 0.75. This is a property of the estimation rule which is objective. We can't hope for a rule that will always, in any particular case, yield the true f from just a few coin flips; but we can have a rule which will provably have an average estimate of f, if the experiment is repeated many times.

In this case, a simple unbiased estimator is to guess that the coin's bias is equal to the observed proportion of heads, or 2/3. In other words, if we repeat this experiment many many times, and whenever we see k heads in 3 tosses we guess that the coin's bias is k/3, then this rule definitely is an unbiased estimator. This estimator says that a bet of $2 vs. $1 is fair, meaning that it doesn't yield an expected profit, so we have no reason to take the bet.
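The unbiasedness of the "observed proportion" rule can be checked exactly rather than by simulation. The sketch below sums over every possible number of heads in 3 flips, weighted by its binomial probability. The function names and the example frequency of 3/4 are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def expected_estimate(f, n=3):
    """Exact expected value of the estimator k/n, where k is the number
    of heads in n flips of a coin whose true heads frequency is f."""
    return sum(
        comb(n, k) * f**k * (1 - f)**(n - k) * Fraction(k, n)
        for k in range(n + 1)
    )

def bet_value(p_heads, win_on_tails=2, lose_on_heads=1):
    """Expected dollars from the gamble, given a probability of heads."""
    return (1 - p_heads) * win_on_tails - p_heads * lose_on_heads

print(expected_estimate(Fraction(3, 4)))  # prints 3/4, the true frequency
print(bet_value(Fraction(2, 3)))          # prints 0, so the bet is fair at 2/3
```

Note that the guarantee is about the average of the rule over many repetitions; any single estimate from three flips can only be 0, 1/3, 2/3 or 1.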

On a subjectivist view, we start out personally unsure of where the bias b lies within the interval [0, 1]. Unless we have any knowledge or suspicion leading us to think otherwise, the coin is just as likely to have a bias between 33% and 34%, as to have a bias between 66% and 67%; there's no reason to think it's more likely to be in one range or the other.

Each coin flip we see is then evidence about the value of the bias b, since a flip H happens with different probabilities depending on the different values of b, and we update our beliefs about b using Bayes' rule. For example, H is twice as likely if b = 2/3 than if b = 1/3, so by Bayes' rule we should now think b is twice as likely to lie near 2/3 as it is to lie near 1/3.
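The update can be sketched numerically by approximating the uniform prior with an evenly spaced grid of candidate biases and multiplying in the likelihood of each flip. The grid resolution of 300 steps and the function name posterior_on_grid are assumptions for illustration; a grid is only an approximation of the continuous prior.

```python
from fractions import Fraction

def posterior_on_grid(flips, steps=300):
    """Apply Bayes' rule, one flip at a time, to a uniform prior over
    candidate biases 0, 1/steps, 2/steps, ..., 1.
    flips is a string of 'H' and 'T'."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    weights = [Fraction(1)] * len(grid)  # uniform prior, unnormalized
    for flip in flips:
        # Likelihood of the flip is b for heads and 1 - b for tails.
        weights = [w * (b if flip == "H" else 1 - b)
                   for w, b in zip(weights, grid)]
    total = sum(weights)
    return {b: w / total for b, w in zip(grid, weights)}

after_one_head = posterior_on_grid("H")
ratio = after_one_head[Fraction(2, 3)] / after_one_head[Fraction(1, 3)]
print(ratio)  # prints 2: a bias of 2/3 is now twice as credible as 1/3
```

Calling posterior_on_grid("HHT") and averaging the bias under the resulting weights gives a heads probability close to 3/5 for the next flip.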

When we start with a uniform prior, observe multiple flips of a coin with an unknown bias, see M heads and N tails, and then try to estimate the odds of the next flip coming up heads, the result is Laplace's Rule of Succession, which estimates odds of (M + 1) : (N + 1) for heads vs. tails, for a probability of heads of (M + 1) / (M + N + 2).

In this case, after observing HHT, we estimate odds of 2 : 3 for tails vs. heads on the next flip. This makes a gamble that wins $2 on tails and loses $1 on heads a profitable gamble in expectation, so we take the bet.
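The arithmetic behind this conclusion is short enough to write out. The following is a minimal sketch using exact fractions; the function name rule_of_succession is an assumption for illustration.

```python
from fractions import Fraction

def rule_of_succession(heads, tails):
    """Probability of heads on the next flip, starting from a uniform
    prior and having observed the given numbers of heads and tails."""
    return Fraction(heads + 1, heads + tails + 2)

p_heads = rule_of_succession(2, 1)  # after observing HHT
p_tails = 1 - p_heads
expected_gain = p_tails * 2 - p_heads * 1  # win $2 on tails, pay $1 on heads

print(p_heads)        # prints 3/5
print(p_tails)        # prints 2/5
print(expected_gain)  # prints 1/5, i.e. 20 cents per bet on average
```

The same bet that the frequentist's estimate of 2/3 rated as exactly fair comes out slightly favorable here, because the uniform prior pulls the estimate of the heads probability from 2/3 toward 1/2.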

Our choice of a uniform prior over the bias b was a little dubious: it's the obvious way to express total ignorance about the bias of the coin, but obviousness isn't everything. (For example, maybe we actually believe that a fair coin is more likely than a coin biased 50.0000023% towards heads.) However, all the reasoning after the choice of prior was rigorous according to the laws of probability theory, which is the only method of manipulating quantified uncertainty that obeys obvious-seeming rules about how subjective uncertainty should behave.


Probability that the 98,765th decimal digit of π is 0

What is the probability that the 98,765th digit in the decimal expansion of π is 0?

The propensity and frequentist views regard as nonsense the notion that we could talk about the probability of a mathematical fact. Either the 98,765th decimal digit of π is 0 or it's not. If we're running repeated experiments with a random number generator, and looking at different digits of π, then it might make sense to say that the random number generator has a 10% probability of picking numbers whose corresponding decimal digit of π is 0. But if we're just picking a non-random number like 98,765, there's no sense in which we could say that the 98,765th digit of π has a 10% propensity to be 0, or that this digit is 0 with 10% frequency in the long run.

The subjectivist considers probabilities to just refer to their own uncertainty. So if a subjectivist has picked the number 98,765 without yet knowing the corresponding digit of π, and hasn't made any observation that is known to them to be entangled with the 98,765th digit of π, and they're pretty sure their friend hasn't yet looked up the 98,765th digit of π either, and their friend offers a whimsical gamble that costs $1 if the digit is non-zero and pays $20 if the digit is zero, the Bayesian takes the bet.
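To see why the bet looks attractive, here is a minimal sketch of the expected value. It assumes the subjectivist spreads their credence evenly over the ten possible digits, so that the credence in "the digit is 0" is 1/10; that figure and the function names are assumptions for illustration.

```python
from fractions import Fraction

def digit_bet_value(credence_zero, payout=20, cost=1):
    """Expected dollars from a bet that pays `payout` if the digit is 0
    and costs `cost` otherwise, given a credence that the digit is 0."""
    return credence_zero * payout - (1 - credence_zero) * cost

def break_even_credence(payout=20, cost=1):
    """Credence in 'the digit is 0' at which the bet is exactly fair."""
    return Fraction(cost, payout + cost)

print(digit_bet_value(Fraction(1, 10)))  # prints 11/10, about $1.10 per bet
print(break_even_credence())             # prints 1/21
```

On these numbers the bet is worth taking for any credence above 1/21, so the conclusion does not depend on the credence being exactly 10%.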

Note that this demonstrates a difference between the subjectivist interpretation of "probability" and Bayesian probability theory. A perfect Bayesian reasoner that knows the rules of logic and the definition of π must, by the axioms of probability theory, assign probability either 0 or 1 to the claim "the 98,765th digit of π is a 0" (depending on whether or not it is). This is one of the reasons why perfect Bayesian reasoning is intractable. A subjectivist that is not a perfect Bayesian nevertheless claims that they are personally uncertain about the value of the 98,765th digit of π. Formalizing the rules of subjective probabilities about mathematical facts (in the way that probability theory formalized the rules for manipulating subjective probabilities about empirical facts, such as which way a coin came up) is an open problem; this is known as the problem of logical uncertainty.

23 comments

Comments sorted by top scores.

comment by Chris_Leong · 2019-05-12T08:42:33.358Z · LW(p) · GW(p)

"The propensity and frequentist views regard as nonsense the notion that we could talk about the probability of a mathematical fact" - couldn't a frequentist define a reference class using all the digits of Pi? And then assume that the person knows nothing about Pi so that they throw away the place of the digit?

comment by shminux · 2019-05-12T00:30:17.394Z · LW(p) · GW(p)

A perfect Bayesian reasoner that knows the rules of logic and the definition of π must, by the axioms of probability theory, assign probability either 0 or 1 to the claim "the 98,765th digit of π is a 0" (depending on whether or not it is). This is one of the reasons why perfect Bayesian reasoning is intractable. A subjectivist that is not a perfect Bayesian nevertheless claims that they are personally uncertain about the value of the 98,765th digit of π.

The term "perfect Bayesian" sounds misleading; there is nothing perfect about one's inability to make good probability estimates. This is like saying a "perfect two-boxer".

On a related note, what you call the open problem of logical uncertainty is one of the cases where postulating an objective reality (in this case, a mathematical reality), also known on this site as "the territory", runs into limitations. Once you stop insisting that any yet unmeasured value or an unproven theorem is either true or false (or undecidable), but go with the more intuitionist approach, the made-up contradiction between "but there is a 98,765th digit of π out there that has a definite value" and "before calculating the 98,765th digit of π (in effect, making an observation) the best model of π predicts equal probability of all digits" dissolves.

Replies from: SaidAchmiz
comment by Said Achmiz (SaidAchmiz) · 2019-05-12T01:41:02.224Z · LW(p) · GW(p)

I think I understand what your view means with respect to physical uncertainty, but I’m not sure what it means w.r.t. logical uncertainty. Surely, there must be some fact of the matter about what the ratio of a circle’s circumference to its diameter is? Or is there not? And if there is, does that not imply some fact of the matter about any given digit of π, even if I don’t know what said digit is?

Replies from: shminux
comment by shminux · 2019-05-12T07:23:53.497Z · LW(p) · GW(p)

Surely, there must be some fact of the matter about what the ratio of a circle’s circumference to its diameter is?

This is exactly the issue at hand. You believe in external mathematical "facts", ideal platonic objects. The mathematical territory. This is a useful belief at times, but not in this case, as it gets in the way of making otherwise obvious predictions about observations, such as "how likely that a randomly picked digit of π is zero, once it is picked, but not yet calculated?"

Replies from: SaidAchmiz
comment by Said Achmiz (SaidAchmiz) · 2019-05-12T09:01:50.984Z · LW(p) · GW(p)

Well, let me put it another way. Suppose that I calculate the 98,765th digit of π. And my friend Hasan, who lives on the other side of the world, also, separately, calculates the 98,765th digit of π. Can we get different results? (Other than by making some mistake in writing the code that does the calculation, or some such.) Is that a thing that can happen? What is the probability of the 98,765th digit of π being one thing when calculated by one person, but something else when calculated by someone else, elsewhere? (And if nonzero, how far does this go—could the 1,500th digit of π vary from person to person? The 220th? The 30th? The 3rd?!)

If you say that this sort of thing can happen, well, then you’re certainly saying something novel and strange. I guess all I have to say to that is “[citation needed]”. But, if (as seems more likely) you agree that such a thing cannot happen, then my question is: just what exactly is it that makes the 98,765th digit of π be the same thing when calculated by me, or by Hasan, or by anyone else? Whatever that thing is, what is wrong with calling it “a fact of the matter about what the 98,765th digit of π is”?

Replies from: shminux
comment by shminux · 2019-05-12T17:38:20.150Z · LW(p) · GW(p)

You seem to be conflating two different questions:

What is your best estimate of probability of the currently unknown to you 98,765th digit of π coming out zero, once someone calculates it?

and

What is your best estimate of probability of the 98,765th digit of π calculated by two different people being different?

Once enough people reliably do the same calculation (or if there is another reliable way to perform the observation of the 98,765th digit of π), then it can be added to the list of performed observations and, if needed, used to predict future observations.

just what exactly is it that makes the 98,765th digit of π be the same thing when calculated by me, or by Hasan, or by anyone else? Whatever that thing is, what is wrong with calling it “a fact of the matter about what the 98,765th digit of π is”?

This goes back to realism vs anti-realism, not anything I had invented. Anti-realism is a self-consistent epistemology; it pops up in many areas independently. According to Wikipedia, an example of it in science is instrumentalism, and in math it is intuitionism: "there are no non-experienced mathematical truths".

There is no difference between logical uncertainty and environmental uncertainty in anti-realism. OP seems to have reinvented the juxtaposition of realism and anti-realism in the setting of the probability theory, calling it "perfect Bayesianism" and "subjective Bayesianism" respectively. And "perfect Bayesianism" runs into trouble with logical vs environmental uncertainties, because of the extra (and unnecessary, in the anti-realist view) postulate of objective reality.

Replies from: quanticle
comment by quanticle · 2019-05-12T17:58:55.036Z · LW(p) · GW(p)

I still don't think you've answered Said's question. The question is whether two people can observe different values of pi. Or, to put it differently, why is it that, whenever anyone computes a value of pi, it seems to come out to the same value (3.14159...). Doesn't that indicate that there is some kind of objective reality, to which our mathematics corresponds?

One of the questions that Wigner brings up in The Unreasonable Effectiveness of Mathematics in the Natural Sciences is why does our math work so well at predicting the future? I would put the same question to you, but in a more general form. If there is no such thing as non-experienced mathematical truths, then why does everyone's experience of mathematical truths seem to be the same?

Replies from: shminux
comment by shminux · 2019-05-12T19:16:48.460Z · LW(p) · GW(p)

Doesn't that indicate that there is some kind of objective reality, to which our mathematics corresponds?

A reality behind repeatable observations is a good model, as long as it works. My point is that it doesn't always work, like in the confusion about logical uncertainty.

And I disagree with the assumptions behind Wigner's question, "why does our math work so well at predicting the future?", specifically that math's effectiveness is "unreasonable". Human and animal brains do complicated calculations all the time in real time to get through life, like solving what amounts to non-linear partial differential equations to even get a bite of food into your mouth. Just because it is subconscious, it is no less of a math than proving theorems. What most humans mean by math is constructing conscious, not subconscious, meta-models and using them in multiple contexts. But we do subconscious meta-modeling like this all the time in other areas of human experience, so my answer to Wigner's question is "you are committing a mind projection fallacy, the apparently unreasonable effectiveness of mathematics is a statement about the human mind, not about the world".

If there is no such thing as non-experienced mathematical truths, then why does everyone's experience of mathematical truths seem to be the same?

In general, however, your questions about the intuitionist approach to math are best directed to professional mathematicians who are actually intuitionists.

Replies from: quanticle
comment by quanticle · 2019-05-12T22:53:49.447Z · LW(p) · GW(p)

Human and animal brains do complicated calculations all the time in real time to get through life, like solving what amounts to non-linear partial differential equations to even get a bite of food into your mouth. Just because it is subconscious, it is no less of a math than proving theorems.

I agree. So if there is no "objective" reality, apart from that which we experience, then why is it that we all seem to experience the same reality? When I shoot a basketball, or hit a tennis ball, both I and the referee see the same trajectory and are in approximate agreement about where the ball lands. When I lift a piece of food to my mouth and eat it, it would surprise me if someone across the table said that they saw it spill from my fork and stain my shirt.

In the absence of an external reality, why is it that everyone's model of the world appears to be in such concordance with everyone else's?

Replies from: shminux
comment by shminux · 2019-05-12T23:48:07.453Z · LW(p) · GW(p)

So if there is no "objective" reality, apart from that which we experience, then why is it that we all seem to experience the same reality?

I am not saying that there is no objective reality, just that I am agnostic about it. In the example you describe, it is a useful meta-model, though not all the time. You may notice that, despite a video review and slow motion hi-res cameras, fans of different teams still argue about what happened, and the final decision is in the hands of a referee. You and your partner (especially ex partner) may disagree about "what really happened" and there is often no way to tell "who is right". One instead has to accept that what one person experienced is not necessarily what another did, and, at least instrumentally, arguing about whose reality is the "true" is likely to be not useful at all. One may as well accept the model where somewhat different things happened to different actors.

In the absence of an external reality, why is it that everyone's model of the world appears to be in such concordance with everyone else's?

Does it? Who won World War II, Americans, British or Russians? Is Trump a hero or a villain? Did Elon Musk disclose material information or not in his tweets? Do mathematical infinities exist? Are the laws of physics invented or discovered? Was Jesus a son of God? The list of disagreements about "objective reality" is endless. Sure, there is some "concordance" between different people's views of the world, but it is much less strong than one naively assumes.

Replies from: quanticle
comment by quanticle · 2019-05-13T14:19:54.254Z · LW(p) · GW(p)

The examples you use reinforce my point. We argue about extremely fine details. When supporters of opposing teams argue over whether a point was or was not scored, they're disputing whether the ball was here or there by a few millimeters. You won't find very many people arguing that actually, the ball was clear on the other side of the field and in reality, the disputed point is one that would have been scored by the other team.

Similarly, we might argue about whether the British, Americans or Russians were primarily responsible for the United Nations' victory in World War 2, but I don't think you'll find very many people arguing that actually it was the Italians who won World War 2.

The fact that our perceptions of reality match each other 99.999% of the time, to me, indicates that there's something out there that exists regardless of whether I perceive it or not. I call that "reality".

Replies from: shminux
comment by shminux · 2019-05-14T02:42:37.051Z · LW(p) · GW(p)

I can see your point, and it's the one most people implicitly accept. Observations are predictable, therefore there is a shared reality out there generating those observations. It works most of the time. But in the edge cases (or "extremely fine details") this implicit assumption breaks down. Like in the case of "objective mathematical facts waiting to be discovered", such as the 98,765th digit of π before you measure it. So why insist on applying this assumption outside of its realm of applicability? Isn't it sort of like insisting that if you shoot a bullet from a ship moving with nearly the speed of light, it will travel faster than light?

Replies from: quanticle
comment by quanticle · 2019-05-14T03:25:08.877Z · LW(p) · GW(p)

You seem to be saying that "external shared reality" is an approximation in the same way that Newtonian mechanics is an approximation for Einsteinian relativity. That's fine. So what is "external shared reality" an approximation of? Just what exactly is out there generating inputs to my senses, and by what mechanism does it remain in sync with everyone else (approximately)?

Replies from: shminux
comment by shminux · 2019-05-14T04:28:44.108Z · LW(p) · GW(p)

Just what exactly is out there generating inputs to my senses, and by what mechanism does it remain in sync with everyone else (approximately)?

Sometimes the "out there" can be modeled as a shared reality, sure. The key word is "modeled". Sometimes this model is not a good one. If you insist on privileging one model over all others to be the true objective external reality valid everywhere, you pay the price where it fails. Like in the OP's case.

Replies from: TAG, Spire
comment by TAG · 2020-06-20T14:09:28.219Z · LW(p) · GW(p)

Having read through the above discussion, I don't think you have distinguished between the claim that there are mathematical entities, and the claim that there are mathematical facts. The latter can mean nothing more than that different mathematicians will find the same solutions to a given problem, which you accept. Call the second claim epistemological realism, and the first metaphysical realism. To argue that convergence on a set of facts can only be, or be explained by, a form of metaphysical realism is to give too much credence to realism. Metaphysical realism about mathematical entities, Platonism, is much more controversial than realism about physical bodies.

comment by Spire · 2020-06-20T10:19:42.375Z · LW(p) · GW(p)

"Sometimes this model is not a good one."

What do you mean by "good" here? And, given some definition of good, what alternative model is better in that sort of situation?

Replies from: shminux
comment by shminux · 2020-06-20T16:06:46.776Z · LW(p) · GW(p)

By "good" I mean (as always) "fitting the available observations and producing accurate predictions". In the OP's case of the 98,765th digit of π, the model is that "A randomly picked digit is uniformly distributed" and it is a "good" (i.e. accurate) one.

Replies from: TAG
comment by TAG · 2020-06-20T18:18:39.201Z · LW(p) · GW(p)

The 98,765th digit of π

..isn't a random digit, it's the 98,765th digit.

There's a puzzle about how probability theory would apply to something that's basically determinate, but the question of how randomly selected digits of pi are distributed isn't it, because the process of picking a digit randomly brings indeterminacy in.

People pose the problem with a specific digit to make the problem determinate, and focus on the paradoxical aspect.

Replies from: shminux
comment by shminux · 2020-06-20T20:48:19.447Z · LW(p) · GW(p)

The paradox only arises if you ignore the view I've been presenting. The 98,765th digit of π is a random digit in the same way that a 98,765th reading of rand() is. Until you do some work to measure it, it's not determined.

Replies from: TAG
comment by TAG · 2020-06-20T20:55:00.759Z · LW(p) · GW(p)

It is determined in the sense of having only one possible value. The same applies to a call to rand(), so long as it is a deterministic PRNG. We don't know what the answer is, until we have done some work, in either case, but that doesn't mean anything indeterministic is going on. Determinism is defined in terms of inevitability, ie. lack of possible alternatives. We do not regard the future as undetermined just because it has not happened yet.

Replies from: shminux
comment by shminux · 2020-06-20T22:30:45.628Z · LW(p) · GW(p)
Determinism is defined in terms of inevitability, ie. lack of possible alternatives. We do not regard the future as undetermined just because it has not happened yet.

I don't argue with that, in fact, the statement above makes my point: there is no difference between an as-yet-unknown to you (but predetermined) digit of pi and anything else that is not yet known to you, like the way a coin lands when you flip it.

Replies from: TAG
comment by TAG · 2020-06-20T23:11:54.096Z · LW(p) · GW(p)

It doesn't make your point, since I don't agree with it.

Given any degree of realism, you can differentiate between determined but unknown things and undetermined things.

Well, you're an anti realist. But that doesn't give you the right to interpret what other people, if there are any other people, are saying in anti-realist terms.

Replies from: shminux
comment by shminux · 2020-06-20T23:33:27.146Z · LW(p) · GW(p)

Right, never mind, for a moment what your discourse style is. Disengaging.