Chance is in the Map, not the Territory

post by Daniel Herrmann (Whispermute), ben_levinstein (benlev), Aydin Mohseni (aydin-mohseni) · 2025-01-13T19:17:15.843Z · LW · GW · 4 comments

Contents

  Two Ways to Deal with Chance
  The Key Insight: Symmetries in Our Beliefs
  The Magic of de Finetti
  De Finetti in Practice
    1. Weather Forecasting
    2. Clinical Trials
    3. Machine Learning
  Why This Matters
  Common Objections and Clarifications
  Quick Recap

"There's a 70% chance of rain tomorrow," says the weather app on your phone. "There’s a 30% chance my flight will be delayed," posts a colleague on Slack. Scientific theories also include chances: “There’s a 50% chance of observing an electron with spin up,” or (less fundamental) “This is a fair die — the probability of it landing on 2 is one in six.”

We constantly talk about chances and probabilities, treating them as features of the world that we can discover and disagree about. And it seems you can be objectively wrong about the chances. The probability of a fair die landing on 2 REALLY is one in six, it seems, even if everybody in the world thought otherwise. But what exactly are these things called “chances”?

Readers on LessWrong are very familiar with the idea that many probabilities are best thought of as subjective degrees of belief. This idea comes from a few core people, including Bruno de Finetti. For de Finetti, probability was in the map, not the territory.

But perhaps this doesn’t capture how we talk about chance. For example, our degrees of belief need not equal the chances, if we are uncertain about the chances [LW · GW].  But then what are these chances themselves? If we are uncertain about the bias of a coin, or the true underlying distribution in some environment, then we can use our uncertainty over those chances to generate our subjective probabilities over what we’ll observe.[1] But then we have these other probabilities — chances, distributions, propensities, etc. — to which we are assigning probabilities. What are these things?

Here we’ll show how we can keep everything useful about chance-based reasoning while dropping some problematic metaphysical assumptions. The key insight comes from work by, once again, de Finetti. De Finetti’s approach has been fleshed out in detail by Brian Skyrms. We’ll take a broadly Skyrmsian perspective here, in particular as given in his book Pragmatics and Empiricism. The core upshot is that we don't need to believe in chances as real things "out there" in the world to use chance effectively. Instead, we can understand chance through patterns and symmetries in our beliefs.

Two Ways to Deal with Chance

When philosophers and scientists have tried to make sense of chance, they've typically taken one of two approaches. The first tries to tell us what chance IS – maybe it's just long-run frequency, or maybe it's some kind of physical property like mass or charge. Or maybe it is some kind of lossy compression of information. The second approach, which we'll explore here, asks a different question: what role does chance play in our reasoning, and can we fulfill that role without assuming chances exist?

Let's look (briefly) at why the first approach is problematic. Frequentists say chance is just long-run frequency:[2] The chance of heads is 1/2 because in the long run, about half the flips will be heads. But this has issues. What counts as "long run"? What if we never actually get to infinity? And how do we handle one-off events that can't be repeated?[3]

Others say chance is a physical property – a "propensity" of systems to produce certain outcomes. But this feels suspiciously like adding a mysterious force [LW · GW] to our physics.[4] When we look closely at physical systems (leaving quantum mechanics aside for now), they often seem deterministic: if you could flip a coin exactly the same way twice, it would land the same way both times.

The Key Insight: Symmetries in Our Beliefs

To see how this second approach works in a more controlled setting, imagine an urn containing red and blue marbles. Before drawing any marbles, you have certain beliefs about what you'll observe. You might think the sequence "red, blue, red" is just as likely as "blue, red, red"—the order doesn't matter, but you can learn from the observed frequencies of red and blue draws.

This symmetry in your beliefs—that the order doesn't matter—is called exchangeability. As you observe more draws, updating your beliefs each time, you develop increasingly refined expectations about future draws. The key insight is that you're not discovering some "true chance" hidden in the urn. Instead, de Finetti showed that when your beliefs have this exchangeable structure, you'll naturally reason as if there were underlying chances you were learning about in a Bayesian way—even though we never needed to assume they exist.[5]

This is different from just saying the draws are independent. If they were truly independent, seeing a hundred red marbles in a row wouldn't tell you anything about the next draw. But this isn't how we actually reason! Seeing mostly red marbles leads us to expect more red draws in the future. Exchangeability captures this intuition: we can learn from data while maintaining certain symmetries in our beliefs.
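To make this concrete, here is a minimal sketch in Python (our illustration, not anything from de Finetti himself) using the beta-Bernoulli model, the textbook exchangeable model for draws like these. Note two things: the probability of a sequence depends only on the counts of red and blue, not their order, and the predictive probability of the next draw shifts with the data in a way a fixed independence model never would.

```python
def predictive_prob_red(n_red, n_blue, alpha=1.0, beta=1.0):
    """P(next draw is red | counts so far) under a Beta(alpha, beta) prior.
    With alpha = beta = 1 this is Laplace's rule of succession."""
    return (n_red + alpha) / (n_red + n_blue + alpha + beta)

def sequence_prob(seq, alpha=1.0, beta=1.0):
    """Probability of an ordered sequence of draws under the same model.
    Exchangeability: the result depends only on the counts, not the order."""
    p, n_red, n_blue = 1.0, 0, 0
    for draw in seq:
        p_red = predictive_prob_red(n_red, n_blue, alpha, beta)
        p *= p_red if draw == "red" else 1.0 - p_red
        n_red += draw == "red"
        n_blue += draw != "red"
    return p

print(sequence_prob(["red", "blue", "red"]))  # 1/12 (about 0.0833)
print(sequence_prob(["blue", "red", "red"]))  # 1/12 again: order doesn't matter
print(predictive_prob_red(0, 0))              # 0.5 before any data
print(predictive_prob_red(100, 0))            # about 0.99 after 100 reds in a row
```

A truly independent model with a fixed 50% chance of red would leave that last number at 0.5 no matter how many reds had been seen.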

The Magic of de Finetti

De Finetti showed something remarkable: if your beliefs about a sequence of events are exchangeable, then mathematically, you must act exactly as if you believed there was some unknown chance governing those events. In other words, exchangeable beliefs can always be represented as if you had beliefs about chances – even though we never assumed chances existed!

For Technical Readers: De Finetti's theorem shows that any exchangeable probability distribution over infinite sequences can be represented as a mixture of i.i.d. distributions. Furthermore, as one observes events in the sequence and updates one’s probability over events via Bayes’ rule, this corresponds exactly to updating one’s distribution over chance distributions via Bayes’ rule, and then using that distribution over chances to generate the probability of the next event. This means you can treat these events as if there's an unknown parameter (the "chance")—even though we never assumed such a parameter exists.
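In symbols, for the binary case (this is the standard textbook statement of the theorem; the notation here is ours):

```latex
% If X_1, X_2, ... is an infinite exchangeable sequence of {0,1}-valued
% variables, then there is a unique probability measure mu on [0,1] with
P(X_1 = x_1, \ldots, X_n = x_n)
  = \int_0^1 \theta^{k} (1 - \theta)^{n - k} \, d\mu(\theta),
  \qquad k = \sum_{i=1}^{n} x_i .
```

The measure μ plays the role of a prior over the "unknown chance" θ, but it is constructed out of the exchangeable distribution P itself; nothing beyond coherent exchangeable beliefs went in.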

Let's see how this works in practice. When a doctor says a treatment has a "60% chance of success", traditionally we might think they're describing some real, physical property of the treatment. But in the de Finetti view, they're expressing exchangeable beliefs about patient outcomes—beliefs that happen to be mathematically equivalent to uncertainty about some "true" chance. The difference? We don't need to posit any mysterious chance properties. In this situation, since the doctor says it is 60%, she has probably observed enough outcomes (or reports of outcomes) that her posterior in the chance representation is tightly concentrated near 0.6.

De Finetti in Practice

This perspective transforms how we think about evidence and prediction across many domains:

1. Weather Forecasting

When your weather app says "70% chance of rain," it's not measuring some metaphysical "rain chance" property. It's expressing a pattern of beliefs about similar weather conditions that have proven reliable for prediction. Just like in the urn or medical examples, each new bit of data refines the forecast, and the weather model used by the app updates its probability estimates accordingly. This is true even though we sometimes talk about weather as being chaotic, or unpredictable. That is a statement about us, about our map, not the territory.[6]

2. Clinical Trials

This same pattern of learning applies in medical trials—though the stakes are far higher than drawing marbles. When a doctor says a treatment has a "60% chance of success," they're not measuring some fixed property of the drug. Instead, they're summarizing a learning process that started with exchangeable beliefs about patient outcomes and whose representation as a mixture over chances has come to concentrate around 0.6.

Think of how researchers approach a new treatment. Before any trials, they treat each future patient's potential outcome as exchangeable—so "success, failure, success" is considered no more or less likely than "failure, success, success." As they observe real outcomes, each success or failure refines their model of the treatment's effectiveness, pushing their estimated success rate up or down accordingly. Just like with the urn, they're not discovering a true success rate hidden in the drug; they're building and refining a predictive model.
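To give a rough feel for the numbers (the uniform Beta(1, 1) prior and the trial counts below are hypothetical, chosen purely to illustrate posterior concentration), here is a short Python sketch:

```python
def beta_posterior(successes, failures, alpha=1.0, beta=1.0):
    """Posterior mean and standard deviation over the 'as if' success
    chance, starting from a Beta(alpha, beta) prior."""
    a, b = alpha + successes, beta + failures
    mean = a / (a + b)
    sd = ((a * b) / ((a + b) ** 2 * (a + b + 1))) ** 0.5
    return mean, sd

print(beta_posterior(6, 4))      # 10 patients:   mean ~0.58, sd ~0.14
print(beta_posterior(60, 40))    # 100 patients:  mean ~0.60, sd ~0.05
print(beta_posterior(600, 400))  # 1000 patients: mean ~0.60, sd ~0.015
```

By a thousand patients the posterior is so tight that quoting a bare "60%" is harmless shorthand, even though nothing beyond exchangeable beliefs about outcomes was ever assumed.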

Crucially, this is different from treating outcomes as independent. If patient outcomes were truly independent for the researchers, then seeing the treatment work in a hundred patients wouldn't affect their expectations for the hundred-and-first. But that's not how clinical knowledge works—consistent success makes doctors more confident in recommending the treatment. In other words, they're updating their map of the world, not uncovering a territory fact about the drug.

This exchangeable approach to patient outcomes captures how we actually learn from clinical data while maintaining certain symmetries in our beliefs—giving us all the practical benefits of "chances" without positing them as objective properties in the world.[7]

3. Machine Learning

When we train models on data, we often assume that the data points are “i.i.d.” (independent and identically distributed). From a de Finetti perspective, this i.i.d. assumption can be seen as an expression of exchangeable beliefs rather than a literal statement about the world. If you start with an exchangeable prior—meaning you assign the same probability to any permutation of your data—then de Finetti’s Representation Theorem says you can treat those observations as if they were generated i.i.d. conditional on some unknown parameter. In other words, you don’t need reality to be i.i.d.; you simply need to structure your beliefs in a way that allows an “as if” i.i.d. interpretation.

This means that when an ML practitioner says, “Assume the data is i.i.d.,” they’re effectively saying, “I have symmetrical (exchangeable) beliefs about the data-generating process.” As new data arrives, you update your posterior on an unknown parameter—much like the urn or medical examples—without ever needing to claim there’s a literal, unchanging probability distribution out there in the territory. Instead, you’ve adopted a coherent, Bayesian viewpoint that models the data as i.i.d. from your perspective, which is enough to proceed with standard inference and learning techniques from statistics and machine learning.
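A toy simulation (ours, with an assumed uniform prior) makes the point vivid: data that are i.i.d. conditional on an unknown parameter are exchangeable, but not independent once you average over the parameter.

```python
import random

random.seed(0)

def two_flips():
    """Two coin flips, i.i.d. *given* theta, with theta ~ Uniform(0, 1)."""
    theta = random.random()
    return random.random() < theta, random.random() < theta

pairs = [two_flips() for _ in range(200_000)]
p_first = sum(a for a, _ in pairs) / len(pairs)
p_second_given_first = sum(a and b for a, b in pairs) / sum(a for a, _ in pairs)

print(round(p_first, 3))               # ~0.5
print(round(p_second_given_first, 3))  # ~0.667: the flips are not independent
```

Analytically, P(second heads | first heads) = E[θ²]/E[θ] = (1/3)/(1/2) = 2/3 under the uniform prior: seeing one head raises your probability for the next, which is exactly the learning-from-data behavior that exchangeability licenses.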

Furthermore, the de Finetti perspective might help shed light on what is going on inside transformers. Some initial attempts have been made to do this rigorously, though we haven’t worked carefully through them, so we can’t ourselves yet fully endorse them. In general, the de Finetti approach seems to vindicate the intuition that a system trained to predict observable variables/events might use a latent variable approach to do so, which of course we see empirically in many ways. Furthermore, it might suggest failure modes of AI systems. Just as humans have reified chances in certain ways, so too might AI systems reify certain latents. This is speculative, and we don’t want the scope of this post to bloat too much, but we think it deserves some thought.

We also suspect that there are connections to Wentworth and Lorell’s Natural Latents [LW · GW] and how they hope to apply it to AI, but looking at the connections in a serious way should be a separate post.

Why This Matters

This approach aligns perfectly with the rationalist emphasis on "the map is not the territory [LW · GW]." Like latitude and longitude, chances are helpful coordinates on our mental map, not fundamental properties of reality. When we say there's a 70% chance of rain, we're not making claims about mysterious properties in the world. Instead, we're expressing beliefs that have certain symmetries, beliefs that let us reason effectively about patterns we observe.

This perspective transforms how we think about statistical inference. When a scientist estimates a parameter or tests a hypothesis, they often talk about finding the "true probability" or "real chance." But now we can see this differently: they're working with beliefs that have certain symmetries, using the mathematical machinery of chance without needing to believe in chances as real things.

Common Objections and Clarifications

"But surely," you might think, "when we flip a fair coin, there really IS a 50% chance of heads!" The pragmatic response is subtle: we're not saying chances don't exist (though the three of us do tend to lean that way). Instead, we're saying we don't need them to exist to vindicate our reasoning. It works just as well if we have exchangeable beliefs about coin flips. The "50% chance" emerges from the symmetries in our beliefs, not from some metaphysical property of the coin.

Some might ask about quantum mechanics, which famously involves probabilities at a fundamental level. Even here, the debate about whether wave function collapse probabilities are "real" or just a device in our predictive models is ongoing. The pragmatic perspective can be extended into interpretations of quantum mechanics, but that's a bigger topic for another post.[8]

Quick Recap

Three key takeaways:

  1. We can talk about chance in purely pragmatic terms.
  2. Exchangeability and de Finetti's theorem show we lose nothing in predictive power.
  3. This viewpoint integrates well with Bayesian rationality and the "map vs. territory" framework.

 

  1. ^
  2. ^
  3. ^

Also, the limiting relative frequency doesn’t change if we prepend any finite number of flips to the sequence, which can mess up the inferences we try to make in the short, medium, or even very long run. In general there are other issues like this, but we’ll keep it brief here.

  4. ^

Of course, chances do play a role in inference, so they do constrain expectations. This makes them not the worst kind of mysterious answer. The upshot of de Finetti’s theorem is that it sifts the useful part of chance from the mysterious part. This allows us to use chance talk without reifying chance.

  5. ^

    There are generalizations of exchangeability, such as partial exchangeability and Markov exchangeability. For exposition, and since it is a core case, we focus here on the basic exchangeability property.

  6. ^

    Of course, there are sophisticated ways to try to bridge this gap, by showing that for a certain class of agents, certain dynamics will render an environment only predictable up to a certain degree.

  7. ^

There is also a deep way in which the de Finetti perspective can help us make sense of randomized controlled trials.

  8. ^

Although it is worth noting that many theories of quantum mechanics—in particular, Everettian and Bohmian quantum mechanics—are perfectly deterministic. Here is a summary of why Everett wanted a probability-free theory—the core idea is that most versions of QM that make reference to chances do so via measurement-induced collapses, which leads into the measurement problem. We think the genuinely chancy theory that is most likely to pan out is something like GRW, which doesn’t have measurement as a fundamental term in the theory. Jeff Barrett’s The Conceptual Foundations of Quantum Mechanics has greatly informed our views on QM, and is a great in-depth introduction.

4 comments

Comments sorted by top scores.

comment by AnthonyC · 2025-01-13T22:59:44.391Z · LW(p) · GW(p)

One pet peeve of mine is that actual weather forecasts for the public don't disambiguate interpretations of rain chance. Is it the chance of any rain at some point in that day or hour? Is it the expected proportion of that day or hour during which it will be raining?

comment by transhumanist_atom_understander · 2025-01-13T20:33:00.624Z · LW(p) · GW(p)

Yes, de Finetti's theorem shows that if our beliefs are unchanged by exchanging members of the sequence, that's mathematically equivalent to having some "unknown probability" that we can learn by observing the sequence.

Importantly, this is always against some particular background state of knowledge, in which our beliefs are exchangeable. We ordinarily have exchangeable beliefs about coin flips, for example, but might not if we had less information (such as not knowing they're coin flips) or more (like information about the initial conditions of each flip).

In my post on unknown probabilities [LW · GW], I give more detail on how they are defined, which turns out to involve a specific state of background knowledge, so they only act like a "true" probability relative to that background knowledge. I also describe how they can be interpreted as part of a physical understanding of the situation.

Personally, rather than observing that my beliefs are exchangeable and inferring an unknown probability as a mathematical fiction, I would rather "see" the unknown probability directly in my understanding of the situation, as described in my post.

comment by winstonBosan · 2025-01-13T19:40:10.634Z · LW(p) · GW(p)

Great stuff! I don't have strong fundamentals in math and statistics but I was still able to hobble along and understand the post. It reminds me of what Rissanen said about data/observation - that data is really all we have, and there is no true state of nature. Our job is to squeeze as much alpha out of observation as possible, instead of trying to find a "true" generator function. This post hit the same spot for me :)

comment by Knight Lee (Max Lee) · 2025-01-14T08:02:51.535Z · LW(p) · GW(p)

Another example: an AI risk skeptic might say that there is only a 10% chance ASI will emerge this decade, there is only a 1% chance the ASI will want to take over the world, and there is only a 1% chance it'll be able to take over the world. Therefore, there is only a 0.001% chance of AI risk this decade.

However he can't just multiply these probabilities since there is actually a very high correlation between them. Within the "territory," these outcomes do not correlate with each other that much, but within the "map," his probability estimates are likely to be wrong in the same direction.

Since chance is in the map and not the territory, anything can "correlate" with anything.

PS: I think not all uncertainty is in the map rather than the territory. In indexical uncertainty [LW · GW], one copy of you will discover one outcome and another copy of you will discover another outcome. This actually is a feature of the territory.