# Foundations of Probability

post by Manfred · 2014-01-26T19:29:42.378Z · LW · GW · Legacy · 19 comments

**Beginning of:** Logical Uncertainty sequence

Suppose that we are designing a robot. In order for this robot to reason about the outside world, it will need to use probabilities.

Our robot can then use its knowledge to acquire cookies, which we have programmed it to value. For example, we might wager a cookie with the robot on the motion of a certain stock price.

In the coming sequence, I'd like to add a new capability to our robot. It has to do with how the robot handles very hard math problems. If we ask "what's the last digit of the 3^^^3'th prime number?", our robot should at some point *give up*, before the sun explodes and the point becomes moot.

If there are math problems our robot can't solve, what should it do if we offer it a bet about the last digit of the 3^^^3'th prime? It's going to have to approximate - robots need to make lots of approximations, even for simple tasks like finding the strategy that maximizes cookies.

Intuitively, it seems like if we can't find the real answer, the last digit is equally likely to be 1, 3, 7 or 9; our robot should take bets as if it assigned those digits equal probability. But to assign some probability to the wrong answer is logically equivalent to assigning probability to 0=1. When we learn more, it will become clear that this is a problem - we aren't ready to upgrade our robot yet.

Let's begin with a review of the foundations of probability.

What I call foundations of probability are arguments for why our robot should ever want to use probabilities. I will cover four of them, ranging from the worldly ("make bets in the following way or you lose money") to the ethereal ("here's a really elegant set of axioms"). To use the word "probability" to describe the subject of such disparate arguments can seem odd, but keep in mind the naive definition of probability as that number that's 1/6 for a fair die rolling 6 and 30% for clear weather tomorrow.

**Dutch Books**

The most concrete of the foundations is the family of Dutch book arguments. A Dutch book is a collection of bets that is certain to lose you money. If you violate the rules of probability, you'll agree to these certain-loss bets (or decline a certain-win bet).

For example, if you think that each side of the coin has a 55% chance of showing up, then you'll pay $1 for a bet that pays out $0.98 if the coin lands heads and $0.98 if the coin lands tails, because by your numbers that bet is worth 0.55 × $0.98 + 0.55 × $0.98 ≈ $1.08. In reality it pays $0.98 no matter what, so you lose $0.02 for certain. If taking bets where you're guaranteed to lose is bad, then you're not allowed to have probabilities for mutually exclusive things that sum to more than 1.
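The arithmetic behind the coin bet above can be checked in a few lines. This is a minimal sketch; the helper names `expected_value` and `net_range` are made up for this example and assume the agent values a bet at its credence-weighted payout.

```python
def expected_value(credences, payouts):
    """Value of a bet as judged by the agent's own (possibly incoherent) credences."""
    return sum(credences[outcome] * payouts[outcome] for outcome in payouts)


def net_range(price, payouts):
    """Return (worst, best) actual net result across all outcomes."""
    nets = [payout - price for payout in payouts.values()]
    return min(nets), max(nets)


# Incoherent credences: two mutually exclusive outcomes summing to 1.10.
credences = {"heads": 0.55, "tails": 0.55}
payouts = {"heads": 0.98, "tails": 0.98}
price = 1.00

perceived = expected_value(credences, payouts)
worst, best = net_range(price, payouts)

print("agent thinks the bet is worth:", round(perceived, 3))
print("agent accepts at price 1.00:", perceived > price)
print("actual net result, worst and best case:", round(worst, 2), round(best, 2))
```

Because even the best case is negative, the agent has been Dutch booked: it accepted the bet willingly and loses whichever way the coin lands.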

Similar arguments hold for other properties of probability. If your probabilities for exhaustive events add up to less than 1, you'll pass up free money, which is bad. If you disobey the sum rule or the product rule, you'll agree to a guaranteed loss, which is bad, and so on. Thus, say the Dutch book arguments, our probabilities have to behave the way they do because we don't want to take guaranteed losses or pass up free money.

There are many assumptions underlying this whole scenario. Our agent in these arguments already tries to decide using probability-like numbers; all we show is that the numbers have to follow the same rules as probabilities. Why can't our agent follow a totally different method of decision making, like picking randomly or choosing alphabetically?

One can show that e.g. picking randomly will sometimes throw away money. But there is a deeper principle here: an agent that wants to avoid throwing away money or passing up free money has to act *as if* it had numbers that followed probability-rules, and that's a good enough reason for our agent to have probabilities.

Still, some people dislike Dutch book arguments because they focus on an extreme scenario where a malicious bookie is trying to exploit our agent. To avoid this, we'll need a more abstract foundation.

You can learn more about Dutch book arguments here and here.

**Savage's Foundation**

Leonard Savage formulated a basis for decision-making that is sort of a grown-up version of Dutch book arguments. From seven desiderata, none of which mention probability, he derived that an agent that wants to act consistently will act as if it had probabilistic beliefs.

What are the desiderata about, if not probability? They define an agent that has preferences, and is able to take actions, which are defined as things that lead to outcomes, and can lead to different outcomes depending on external possibilities in event-space. They require that the agent's actions be consistent in commonsensical ways. These requirements are sufficient to show that assigning probabilities to the external events is the best way to do things.

Savage's theorem provides one set of conditions for when we should use probabilities. But it doesn't help us choose which probabilities to assign - anything consistent works. The idea that probabilities are degrees of belief, and that they are derived from some starting information, is left to our next foundation.

You can learn more about Savage's foundation here.

**Cox's Theorem**

Cox's theorem is a break from justifying probabilities with gambling. Rather than starting from an agent that wants to achieve good outcomes, and showing that having probabilities is a good idea, Richard Cox started with desired properties of a "degree of plausibility," and showed that probabilities are what a good belief-number should be.

One special facet of Cox's desiderata is that they refer to plausibility of an event, given your information - what will eventually become P(event | information).

There are six or so desiderata, but I think there are three interesting ones:

When you're completely certain, your plausibilities should satisfy the rules of classical logic.

Every rational plausibility has at least one event with that plausibility.

P(A and B|X) can be found as a function of P(A|X) and P(B|A and X).
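For ordinary probabilities, the function in the third desideratum turns out to be multiplication (the product rule). The sketch below checks this on a toy case chosen for this example: drawing two balls without replacement from an urn of three red and two blue. The urn and the helper `prob` are illustrative assumptions, not part of Cox's argument.

```python
from fractions import Fraction
from itertools import permutations

# Background information X: an urn with 3 red and 2 blue balls,
# two balls drawn in order without replacement.
balls = ["red", "red", "red", "blue", "blue"]
draws = list(permutations(range(len(balls)), 2))  # 20 equally likely ordered pairs


def prob(event, given=lambda draw: True):
    """P(event | given) by counting equally likely draws."""
    conditioned = [d for d in draws if given(d)]
    hits = [d for d in conditioned if event(d)]
    return Fraction(len(hits), len(conditioned))


def first_red(draw):
    return balls[draw[0]] == "red"


def second_red(draw):
    return balls[draw[1]] == "red"


def both_red(draw):
    return first_red(draw) and second_red(draw)


p_a = prob(first_red)                            # P(A | X)
p_b_given_a = prob(second_red, given=first_red)  # P(B | A and X)
p_ab = prob(both_red)                            # P(A and B | X)

print(p_a, p_b_given_a, p_ab, p_ab == p_a * p_b_given_a)
```

Counting draws directly and multiplying the two conditional pieces give the same answer, which is what the desideratum asks for in general.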

These desiderata are a motley assortment. The desideratum that there's an infinite variety of events is the strangest, but it is satisfied if our universe contains a continuous random process or if we can flip a coin as many times as we want. If the desiderata obtain, Cox's theorem shows that we can give pretty much any belief a probability. The perspective of Cox's theorem is useful because it lets us keep talking straightforwardly about probabilities even if betting or decision-making has become nontrivial.

You can learn more about Cox's theorem in the first two chapters of Jaynes here (in fact, the next few posts are parallel to the first two chapters of Jaynes), and also here. Jaynes includes an additional desideratum in this foundation, which we will cover in the next post.

**Kolmogorov Axioms**

At the far extreme of abstraction, we have the Kolmogorov axioms for probability. Here they are:

P(E) is a non-negative real number for every event E in the event-space F.

P(some event occurs)=1.

Any countable sequence of disjoint events (E1, E2, ...) satisfies P(E1 or E2 or ...) = P(E1) + P(E2) + ...
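On a finite sample space the three axioms can be checked by brute force. The sketch below does so for a fair six-sided die, which is an assumption made for this example; with finitely many outcomes, countable additivity reduces to additivity over pairs of disjoint events.

```python
from fractions import Fraction
from itertools import combinations

# Sample space: the six faces of a fair die, each with weight 1/6.
outcomes = range(1, 7)
weight = {o: Fraction(1, 6) for o in outcomes}


def P(event):
    """Probability of an event, i.e. a set of outcomes."""
    return sum((weight[o] for o in event), Fraction(0))


def all_events(sample_space):
    """Every subset of the sample space: the event-space F."""
    items = list(sample_space)
    return [
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    ]


events = all_events(outcomes)

non_negative = all(P(e) >= 0 for e in events)
normalized = P(frozenset(outcomes)) == 1
additive = all(
    P(a | b) == P(a) + P(b)
    for a in events
    for b in events
    if not (a & b)  # only disjoint pairs
)

print(non_negative, normalized, additive)
```

Any other non-negative weights summing to 1 would pass the same checks, which fits the later point that the axioms constrain how the numbers behave without saying what they are.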

Though it was not their intended purpose, these can be seen as a Cox-style list of desiderata for degrees of plausibility. Their main virtue is that they're simple and handy to mathematicians who like set theory.

You can learn more about Kolmogorov's axioms here.

Look back at our robot trying to bet on the 3^^^3'th prime number. Our robot has preferences, so it can be Dutch booked. Its reward depends on the math problem and we want it to act consistently, so Savage's theorem applies. Cox's theorem applies if we allow our robot to make combined bets on math and dice. It even seems like the Kolmogorov axioms should hold. Resting upon these foundations, our robot should assign numbers to mathematical statements, and they should behave like probabilities.

But we can't get specific about that, because we have a problem - we don't know how to actually find the numbers yet. Our foundations tell us that the probabilities of the two sides of a coin will add to 1, but they don't care whether P(heads) is 0.5 or 0.99999. If Dutch book arguments can't tell us that a coin lands heads half the time, what can? Tune in next time to find out.

First post in the sequence *Logical Uncertainty*

Next post: Putting in the Numbers

## 19 comments

Comments sorted by top scores.

## comment by Wei_Dai · 2014-01-28T23:05:13.003Z · LW(p) · GW(p)

> They [Savage's axioms] require that the agent's actions be consistent in commonsensical ways.

This seems to be a common "overselling" of Savage's ideas (and other axiomatic approaches to decision theory / probability). In order to decide that the axioms apply, you really need to understand them in detail rather than just accept that they are commonsensical.

It appears for example that they don't apply when indexical uncertainty is involved, and that seems to be why people got nowhere trying to solve problems like Absentminded Driver and Sleeping Beauty while keeping the basic subjective probability framework intact. Ironically, the original paper that spawned off this whole literature actually noted that Savage's axioms don't apply:

> Another resolution would entail the rejection of expected utility maximization given consistent beliefs when the information set includes histories whose probabilities depend on the decision maker’s actions at that information set. Savage’s theory views a state as a description of a scenario which is independent of the act. In contrast, “being at the second intersection” is a state which is not independent from the action taken at the first, and, consequently, at the second intersection.

Note that I'm not saying that logical uncertainty *shouldn't* be handled using probabilities, just that the amount of work shown in this post seems way too low to determine that it should. Also, rather than trying to determine how to handle logical uncertainty using a foundational approach, we can just try various methods and see what works out in the end, and I'm not arguing against that either.

## ↑ comment by Manfred · 2014-01-30T08:15:48.288Z · LW(p) · GW(p)

Okay, I've changed the Savage's theorem entry to specifically call out that actions are defined as the things that lead to outcomes, and can lead to different outcomes depending on external possibilities in event-space. If that stops being true (e.g. if the outcome depends on something not in our external event-space, like which strategy you use), Savage's theorem no longer applies (at least not to those objects; it still might apply to e.g. strategies that lead to different outcomes depending only on external possibilities in event-space).

## ↑ comment by Manfred · 2014-01-29T04:11:41.558Z · LW(p) · GW(p)

> This seems to be a common "overselling" of Savage's ideas (and other axiomatic approaches to decision theory / probability). In order to decide that the axioms apply, you really need to understand them in detail rather than just accept that they are commonsensical.

Ok, I'll work on making that more precise. Also, "consistent in commonsensical ways" is not the same as "commonsensical." We'll see why that's important in two posts.

> Note that I'm not saying that logical uncertainty shouldn't be handled using probabilities, just that the amount of work shown in this post seems way too low to determine that it should.

I'd agree, especially since we are still two posts away from seeing the actual problem of logical uncertainty.

I seem to have promised you unrealistic payoff - probably because I didn't think I could keep people's interest by just talking about the foundations of probability for a while before any promise of payoff. Ditto for summarizing and then putting in links for people who want more rather than quoting all the definitions, desiderata, and proofs of key results.

## comment by Watercressed · 2014-01-27T19:59:44.059Z · LW(p) · GW(p)

How do we assign zero probability to 0=1 when we can't prove our logic consistent?

## ↑ comment by Manfred · 2014-01-27T21:52:25.958Z · LW(p) · GW(p)

For this sequence we'll just stick to statements in first-order logic, thus avoiding the issue. If you want to look deeper into how consistency interacts with logical probability, you should check out MIRI's stuff on assigning probabilities to classically undecidable statements.

## comment by tristanhaze · 2014-01-30T01:24:24.393Z · LW(p) · GW(p)

Pardon a second comment (I hope that's not bad etiquette), but here are a couple of further qualms/criticisms attending to which could improve the post:

Regarding your use of the phrase 'foundations of probability' to refer to arguments for why a certain kind of robot should use probabilities: this seems like a rather odd use for a phrase that already has at least two well established uses. (Roughly (i) basic probability theory, i.e. that which gives a grounding or foundation in learning the subject, and (ii) the philosophical or metaphysical underpinnings of probability discourse: what's it about, what kinds are there, what makes true probability claims true etc.?) Is it really helpful to be different on this point, when there is already considerable ambiguity?

Furthermore, and perhaps more substantively, your bit on Dutch Books doesn't seem to give any foundations *in your sense*: Dutch Book arguments aren't arguments for using probability (i.e. at all, i.e. instead of not using it), but rather for conforming, when already using probability, to the standard probability calculus. So there seems to be a confusion in your post here.

## ↑ comment by Manfred · 2014-01-30T07:06:59.432Z · LW(p) · GW(p)

> Pardon a second comment (I hope that's not bad etiquette)

Well, if they're right after each other you can always use the "edit" button to add to your original comment.

I'm going to stick with this terminology just because I like it - it won't be important later. Also, I blame Sniffoy, for calling his post "A Summary of Savage's Foundations for Probability and Utility." :P

I claim that I do cover why Dutch books do provide a foundation to some extent, but I agree that Savage's theorem is a better way to base probability upon decision-making.

## comment by tristanhaze · 2014-01-30T01:08:15.401Z · LW(p) · GW(p)

'But to assign some probability to the wrong answer is logically equivalent to assigning probability to 0=1.'

Huh? This doesn't make sense to me. First of all, it seems like a basic category-mistake: acts of assigning probabilities don't seem to be the sorts of things that can bear logical relations like equivalence to each other.

Perhaps that's just pedantry and there's a simple rephrasing that says what you really want to say, but I have a feeling I would take issue with the rephrased version too. Does it trade on the idea that all false mathematical propositions are logically equivalent to each other? (If so, I'd say that's a problem, because that idea is very controversial, and hardly intuitive.)

## comment by alex_zag_al · 2014-02-02T05:09:49.253Z · LW(p) · GW(p)

I like your writing style. For something technical, it feels very personal. And you keep it very concise while also easy to read - is there a lot of trimming down that goes on, or do you just write it that way?

## ↑ comment by Manfred · 2014-02-02T05:51:24.547Z · LW(p) · GW(p)

Thanks! The content stayed pretty much the same throughout the editing process, but I sanded down some of the rough writing - removing useless words and rewriting confusing paragraphs. I'm a much worse writer when not given a week in advance.

## comment by Kurros · 2014-01-30T04:19:13.477Z · LW(p) · GW(p)

"But to assign some probability to the wrong answer is logically equivalent to assigning probability to 0=1."

Only if you know it is the wrong answer. You say the robot doesn't know, so what's the problem? We assign probabilities to propositions which are wrong all the time, before we know if they are wrong or not.

## ↑ comment by Manfred · 2014-01-30T07:02:57.540Z · LW(p) · GW(p)

> what's the problem?

I'll tell you on Saturday!

## ↑ comment by Kurros · 2014-02-03T03:19:46.896Z · LW(p) · GW(p)

Was the "Putting in the Numbers" post the one you were referring to? You didn't post that on Saturday, but now it is Monday and there doesn't seem to be a third post. Anyway I did not see this question answered anywhere in "Putting in the Numbers"...

## ↑ comment by Kurros · 2014-01-30T08:10:50.113Z · LW(p) · GW(p)

Ok, but do you really mean that sentence the way it is written? To me it means the same thing as saying that assigning probability to anything is logically equivalent to assigning probability to 0=1 (which I am perfectly happy to do, so if that is the point then fine, but that doesn't seem to be your implication).