What is a probabilistic physical theory?

post by Ege Erdil (ege-erdil) · 2021-12-25T16:30:27.331Z · LW · GW · 16 comments

This is a question post.


This is a question I asked on Physics Stack Exchange a while back, and I thought it would be interesting to hear people's thoughts on it here. You can find the original question here.

What do we mean when we say that we have a probabilistic theory of some phenomenon?

Of course, we know from experience that probabilistic theories "work", in the sense that they can (somehow) be used to make predictions about the world, they can be considered to be refuted under appropriate circumstances, and they generally appear to be subject to the same kinds of principles that govern other kinds of explanations of the world. The Ising model predicts the ferromagnetic phase transition, scattering amplitude computations of quantum field theories predict the rates of transition between different quantum states, and I can make impressively sharp predictions of the ensemble properties of a long sequence of coin tosses by using results such as the central limit theorem. Regardless, there seems to be a foundational problem at the center of the whole enterprise of probabilistic theorizing - the construction of what is sometimes called "an interpretation of the probability calculus" in the philosophical literature - which to me looks insurmountable.

A probabilistic theory comes equipped with an event space and a probability measure attached to it, both of which are fixed by the theory in some manner. However, the probability measure occupies a strictly epiphenomenal position relative to what actually happens. Deterministic theories have the feature that they forbid some class of events from happening - for instance, the second law of thermodynamics forbids the flow of heat from a cold object to a hot object in an isolated system. The probabilistic component in a theory has no such character, even in principle. Even if we observed an event of zero probability, formally this would not be enough to reject the theory, since a set of zero probability measure need not be empty. (This raises the question of, for instance, whether a pure quantum state in some energy eigenstate could ever be measured to be outside of that eigenstate - is this merely an event of probability 0, or is it in fact forbidden?)

The legitimacy of using probabilistic theories then rests on the implicit assumption that events of zero (or sufficiently small) probability are in some sense negligible. However, it's not clear why we should believe this as a prior axiom. There are certainly other types of sets we might consider to be "negligible" - for instance, if we are doing probability theory on a Polish space, the collection of meager sets and the collection of null measure sets are both in some sense "negligible", but these notions are in fact perpendicular to each other: ℝ can be written as the union of a meager set and a set of null measure. This result forces us to make a choice as to which class of sets we will neglect, or otherwise we will end up neglecting the whole space ℝ!
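The decomposition behind that claim is a standard construction, sketched here for concreteness (the notation is mine): enumerate the rationals (qᵢ) and set

    \[
    U_n \;=\; \bigcup_{i \ge 1} \left( q_i - 2^{-i-n},\; q_i + 2^{-i-n} \right),
    \qquad
    N \;=\; \bigcap_{n \ge 1} U_n,
    \qquad
    M \;=\; \mathbb{R} \setminus N .
    \]

Each Uₙ is open, dense, and has measure at most 2^(1−n), so N is a dense G_δ of measure zero; its complement M is therefore meager, and ℝ = N ∪ M is the union of a null set and a meager set.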

Moreover, ergodic theorems (such as the law of large numbers) which link spatial averages to temporal averages don't help us here, even if we use versions of them with explicit estimates of errors (like the central limit theorem), because these estimates only hold with a probability 1 − ε for some small ε > 0, and even in the infinite limit they hold with probability 1, and we're back to the problems I discussed above. So while these theorems can allow one to use some hypothesis test to reject the theory as per the frequentist approach, for the theory to have any predictive power at all this hypothesis test has to be put inside the theory.
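For concreteness, the kind of internalized hypothesis test at issue can be sketched in a few lines (a toy example in Python with numpy; the sample size and rejection threshold are arbitrary illustrative choices, not part of any theory):

    import numpy as np

    rng = np.random.default_rng(0)

    def reject_fair_coin(flips, z_crit=3.0):
        # CLT-based test: reject "p = 0.5" when the sample mean sits more
        # than z_crit standard errors away from 0.5.
        n = len(flips)
        se = 0.5 / np.sqrt(n)            # standard error under the fair-coin hypothesis
        z = abs(flips.mean() - 0.5) / se
        return z > z_crit

    flips = rng.integers(0, 2, size=10_000)   # data drawn from an actually fair coin
    print(reject_fair_coin(flips))
    # False in roughly 99.7% of runs -- but that guarantee is itself only a
    # probability-(1 - epsilon) statement made by the very theory under test.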

The alternative is to adopt a Bayesian approach, in which case the function of a probabilistic theory becomes purely normative - it informs us about how some agent with a given expected utility should act. I certainly don't conceive of the theory of quantum mechanics as fundamentally being a prescription for how humans should act, so this approach seems to simply define the problem out of existence and is wholly unsatisfying. Why should we even accept this view of decision theory when we have given no fundamental justification for the use of probabilities to start with?

Answers

answer by davidad · 2021-12-26T16:53:51.658Z · LW(p) · GW(p)

Taking another shot at what the fundamental question is: a normative theory tells us something about how agents ought to behave, whereas a descriptive theory tells us something about what is; physical theories seem to be descriptive rather than normative, but when they're merely probabilistic, how can probabilities tell us anything about what is?

The idea that a descriptive theory tells us about "what really is" is rooted in the correspondence theory of truth, and deeper in a generally Aristotelian metaphysics and logic which takes as a self-evident first-principle the Law of Excluded Middle (LEM), that "of one subject we must either affirm or deny any one predicate". Even if a probabilistic theory enables us to affirm the open sets of probability 1, and to deny the open sets of probability 0, the question remains: how can a probabilistic theory "tell us" anything more about what really is? What does "a probability of 0.4" correspond to in reality?

If we accept LEM wholesale in both metaphysics (the domain of what is) and logic (for my purposes, the normative characterization of rational speech), then our descriptive theories are absolutely limited to deterministic ones. For any metaphysical proposition P about reality, either P actually is or P actually is not; "P actually is" is a logical proposition Q, and a rational speaker must either affirm Q or deny Q, and he speaks truth iff his answer agrees with what actually is. To accommodate nondeterministic theories, one must give way either in the metaphysics or the logic.

This is so pragmatically crippling that even Aristotle recognized it, and for propositions like "there will be a sea-battle tomorrow", he seems to carve out an exception (although what exactly Aristotle meant in this particular passage is the subject of embarrassingly much philosophical debate). My interpretation is that he makes an exception on the logical side only, i.e. that a rational speaker may not be required to affirm or deny tomorrow's sea-battle, even though metaphysically there is an actual fact of the matter one way or the other. If the rational speaker does choose either to affirm or to deny tomorrow's sea-battle, then the truth of his claim is determined by its correspondence with the actual fact (which presumably will become known soon). My guess is that you'd be sympathetic to this direction, and that you're willing to go further and get on board with probabilistic logic, but then your question is: how could a probabilistic claim like "with probability 0.4, there will be a sea-battle tomorrow" conceivably have any truth-making correspondence with actual facts?

A similar problem would arise for nondeterminism if someone said "it is indeterminate whether there will be a sea-battle tomorrow": how could that claim correspond, or fail to correspond, to an actual fact? However, we can adopt a nondeterministic theory and simply refuse to answer, and then we make no claim to judge true or false, and the crisis is averted. If we adopt a probabilistic theory and try the same trick, refusing to answer about an event when its probability is strictly between 0 and 1, then we can say exactly as much as the mere nondeterminist who knows only our distribution's support—in other words, not very much (especially if we thoroughly observe Cromwell's Rule). We have to be able to speak in indeterminate cases to get more from probabilistic theories than merely nondeterministic theories.

The metaphysical solution (for the easier case of nondeterminism) is Kripke's idea of branching time, where possible worlds are reified as ontologically real, and the claim "it is indeterminate whether there's a sea-battle tomorrow" is true iff there really is a possible future world where there is a sea-battle tomorrow and another possible future world where there isn't. Kripke's possible-world semantics can be naturally extended to the case where there is a probability measure over possible successor worlds, and "with probability 0.4, there will be a sea-battle tomorrow" is made true by the set of {possible future worlds in which a sea battle takes place tomorrow} in fact having measure exactly 2/3 that of the set of {other possible future worlds}. But there are good epistemological reasons to dislike this metaphysical move. First, the supposed truthmakers are, as you point out, epiphenomenal—they are in counterfactual worlds, not observable even in principle, so they fail Einstein's criterion for reality. Second, some people can be better-informed about uncertain events than others, even if both of their forecasts are false in this metaphysical sense—as would almost surely always be the case if, metatheoretically, the "actual" probabilities are continuous quantities. The latter issue can be mitigated by the use of credal sets, a trick I learned from Definability of Truth by Christiano, Yudkowsky, et al.; we can say a credal set is made true by the actual probability lying within it. But still, one credal set can be closer to true than another.

The epistemological solution, which I prefer, is to transcend the paradigm that rational claims such as those about probabilities must be made true or false by their correspondence with some facts about reality. Instead of being made true or false, claims accrue a quantitative score based on how surprised they are by actual facts (as they appear in the actual world, not counterfactual worlds). With the rule −log P(E), if you get the facts exactly right, you score zero points, and if you deny something which turns out to be a fact, you score ∞ points. In place of the normative goal of rational speech to say claims that are true, and the normative goal of rational thought to add more true claims to your knowledge base, the normative goals are to say and believe claims that are less wrong. Bayesian updating, and the principle of invariant measures, and the principle of maximum entropy (which relies on having some kind of prior [LW(p) · GW(p)], by the way), are all strategies for scoring better by these normative lights. This is also compatible with Friston's free energy principle, in that it takes as a postulate that all life seeks to minimize surprise (in the form of −log P). Note, I don't (currently) endorse such sweeping claims as Friston's, but at least within the domain of epistemology, this seems right to me.
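As a minimal numeric illustration of this scoring rule (a sketch in Python; the example probabilities are arbitrary):

    import math

    def surprisal(p_given_to_actual_fact):
        # Score a claim by -log of the probability it assigned to what
        # actually happened: 0 for getting the facts exactly right,
        # infinity for denying something that turned out to be a fact.
        if p_given_to_actual_fact > 0:
            return -math.log(p_given_to_actual_fact)
        return math.inf

    print(surprisal(1.0))   # 0.0
    print(surprisal(0.4))   # ~0.916, e.g. "P(sea-battle) = 0.4" and the battle happened
    print(surprisal(0.0))   # inf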

This doesn't mean that probabilistic theories are normative themselves, on the object-level. For example, the theory that Brownian motion (the physical phenomenon seen in microscopes) can be explained probabilistically by a Wiener process is not a normative theory about how virtuous beings ought to respond when asked questions about Brownian motion. Of course, the Wiener process is instead a descriptive theory about Brownian motion. But, the metatheory that explains how a Wiener process can be a descriptive theory of something, and how to couple your state of belief in it to observations, and how to couple your speech acts to your state of belief—that is a normative metatheory.

It might seem like something is lost here, that in the Aristotelian picture with deterministic theories we didn't need a fiddly normative metatheory. We had what looked like a descriptive metatheory: to believe or say of what is that it is, is truth. But I think actually this is normative. For example, in a heated moment, Aristotle says that someone who refuses to make any determinate claims "is no better off than a vegetable". But really, any theory of truth is normative; to say what counts as true is to say what one ought to believe. I think the intuition behind correspondence theories of truth (that truth must be determined by actual, accessible-in-principle truth-makers) is really a meta-normative intuition, namely that good norms should be adjudicable in principle. And that the intuition behind bivalent theories of truth (that claims must be either True or False) is also a meta-normative intuition, that good norms should draw bright lines leaving no doubt about which side an act is on. The meta-norm about adjudication can be satisfied by scoring rules, but in the case of epistemology (unlike jurisprudence), the bright-line meta-norm just isn't worth the cost, which is that it makes talk of probabilities meaningless unless they are zero or one.

comment by Ege Erdil (ege-erdil) · 2021-12-26T18:25:15.873Z · LW(p) · GW(p)

So I agree with most of what you say here, and as a Metaculus user I have some sympathy for trying to make proper scoring rules the epistemological basis of "probability-speak". There are some problems with it, like different proper scoring rules give different incentives to people when it comes to distributing finite resources across many questions to acquire info about them, but broadly I think the norm of scoring models (or even individual forecasters) by their Brier score or log score and trying to maximize your own score is a good norm.

There are probably other issues, but the immediate problem for me is that this way of bootstrapping probabilistic theories seems to be circular. Given that you accept the whole Bayesian framework already, it's obvious that under this meta-normative theory you're supposed to report your true credence for any event because that's what will maximize your expected log score. This is perfectly consistent, but the proper scoring rule appears to be superfluous if you already are a Bayesian. However, if you don't already accept the Bayesian way of looking at the problem, then "maximize log P(E)" is useless advice: the score is a function from the states of the world to the real numbers, and there's no total order on the space of such functions for you to use for this maximization problem. In practice we would act like Bayesians and this would work, but then we're right back where we started, because we're using probabilities when they don't seem to add any epistemic content.

There are other versions of this which I've mentioned in other comments: for example you can have a norm of "try to make money by betting on stuff" and you can use a Dutch book argument to show that contingent claim prices are going to give you a probability measure. While that justifies the use of some probabilities with a fairly natural sounding norm, it doesn't explain what I'm doing when I price these contingent claims or what the funny numbers I get as a result of this process actually mean. (It also leads to some paradoxes when the contingent claim payoffs are correlated with your marginal utility, but I'm setting that issue aside here.)

My central point of disagreement with your answer is that I don't think "claims must be either True or False" is a meta-normative intuition and I think it can't be necessary to abandon the law of excluded middle in order to justify the use of probabilities. In fact, even the proper scoring rule approach you outline doesn't really throw out the law of excluded middle, because unless there's some point at which the question will resolve as either True or False there's no reason for you to report your "true credence" to maximize your expected score and so the whole edifice falls apart.

Replies from: davidad
comment by davidad · 2022-01-13T16:51:58.535Z · LW(p) · GW(p)

the immediate problem for me is that this way of bootstrapping probabilistic theories seems to be circular.

I think it is not circular, though I can imagine why it seems so. Let me try to elaborate the order of operations as I see it.

  1. Syntax: Accept that a probability-sentence like "P(there will be a sea-battle tomorrow) ≥ 0.4" is at least syntactically parseable, i.e. not gibberish, even if it is semantically disqualified from being true (like "the present King of France is a human").
    • This can be formalized as adding a new term-former P(−), other term-formers such as +, rational constants such as 0.4, and finally a predicate ≥.
  2. Logic: Accept that probability-sentences can be the premises and/or conclusions of valid deductions, such as P(φ ∧ ψ) ≥ 0.4 ⊢ P(φ) ≥ 0.4.
    • Axiomatizing the valid deductions in a sound and complete way is not as easy as it may seem, because of the interaction with various expressive features one might want (native conditional probabilities, higher-order probabilities, polynomial inequalities) and model-theoretic and complexity-theoretic issues (pathological models, undecidable satisfiability). Some contenders:
      • LPWF, which has polynomial inequalities but not higher-order probabilities
      • LCP, which has higher-order conditional probabilities but not inequalities
      • LPP, which has neither, but has decidable satisfiability.
    • Anyway, the basic axioms about probability that we need for such logics are nonnegativity (P(φ) ≥ 0), normalization (P(⊤) = 1), additivity (P(φ) = P(φ ∧ ψ) + P(φ ∧ ¬ψ)), and equivalence (P(φ) = P(ψ) whenever φ and ψ are logically equivalent).
    • Those axioms can, if you wish, be derived from much weaker principles by Cox-style theorems. It's important to admit that Cox's proof of his original theorem (as cited by Jaynes) was mistaken, so there isn't actually a "Cox's theorem", but rather a family of variants that actually work given different assumed principles. My favorite is Van Horn 2017, which uses only the following principles:
      • Equivalence-invariance: If φ ≡ φ′ and ψ ≡ ψ′, then P(φ | ψ) = P(φ′ | ψ′).
      • Definition-invariance: If s is an atomic proposition not appearing in φ, ψ, or E, then P(φ | ψ) = P(φ | ψ ∧ (s ↔ E)).
      • Irrelevance-invariance: If Y is a noncontradictory formula sharing no symbols with either φ or ψ, then P(φ | ψ) = P(φ | ψ ∧ Y).
      • Implication-compatibility: If ψ ⊨ φ → φ′ but not ψ ⊨ φ′ → φ, then P(φ | ψ) < P(φ′ | ψ).
  3. Epistemics: Revise the Aristotelian norms, as follows:
    • Instead of demanding that a rational speaker either assert or deny any classical sentence about relevant propositions, instead demand that
      • a rational speaker assert or deny any probability-sentence about relevant propositions, and that
      • all their assertions be coherent, in the sense that probability-logic cannot deduce ⊥ from any subset of them.
    • Instead of classifying a speaker as either correct or incorrect (depending on whether they assert what is and deny what is not or deny what is and assert what is not), score them on the basis of the greatest rational p for which they asserted P(E) ≥ p (where E is the conjunction of all of "what is", or rather what is observed), and award them log p points (so a perfect score is 0, and denying a fact scores −∞).
      • The log rule in particular can be justified and characterized at this stage just by the property of invariance under observation orderings, i.e. log P(E₁ ∧ E₂) = log P(E₁) + log P(E₂ | E₁) (discussed more below)
  4. Decision theory: Optionally, you can now assume the vNM axioms on top of the probabilistic logic, prove the vNM theorem, formalize a speech-act game internalizing the log rule, and then prove a revelation theorem that says that the optimal policy for obtaining epistemic points is to report one's actual internal beliefs (a crude numeric check of this revelation property is sketched just below).
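A crude numeric check of that revelation property (a sketch in Python with numpy; the credence 0.7 is an arbitrary example, and a one-dimensional grid search stands in for the actual theorem):

    import numpy as np

    p_true = 0.7                            # the speaker's actual credence in E
    reports = np.linspace(0.01, 0.99, 99)   # candidate asserted values of P(E)

    # Expected log score of asserting "P(E) = r" when your credence is p_true:
    expected_score = p_true * np.log(reports) + (1 - p_true) * np.log(1 - reports)

    print(reports[np.argmax(expected_score)])   # 0.70: honest reporting is optimal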

I think the key confusion here is that it may seem like one needs the decision theory set up already in order to justify the scoring rule (to establish that it incentivizes honest revelation), but the decision theory also depends on the scoring rule. I claim that the scoring rule can be justified on other grounds than honest revelation. If you don't buy the argument of invariance under observation orderings, I can probably come up with other justifications, e.g. from coding theory. Closing the decision-theoretic loop also does provide some justificatory force, even if it is circular, since being able to set up a revelation theorem is certainly a nice feature of this log-scoring norm.

But fundamentally, whether in this system or Aristotle's, one doesn't identify the epistemic norms by trying to incentivize honest reporting of beliefs, but rather by trying to validate reports that align with reality. The log rule stands as a way of extending the desire for reports that align with reality to the non-Boolean logic of probability, so that we can talk rationally about sea-battles and other uncertain events, without having to think about in what order we find things out.

different proper scoring rules give different incentives to people when it comes to distributing finite resources across many questions to acquire info about them

I haven't studied this difference, but I want to register my initial intuition that to the extent other proper scoring rules give different value-of-information incentives than the log scoring rule, they are worse and the incentives from the log rule are better. In particular, I expect the incentives of the log rule to be more invariant to different ways of asking multiple questions that basically add up to one composite problem domain, and that being sensitive to that would be a misfeature.

In fact, even the proper scoring rule approach you outline doesn't really throw out the law of excluded middle, because unless there's some point at which the question will resolve as either True or False there's no reason for you to report your "true credence" to maximize your expected score and so the whole edifice falls apart.

Even if a question never resolves fully enough to make all observables either True or False (i.e., if the possibility space is Hausdorff, never resolves to a Dirac delta), but just resolves incrementally to more and more precise observations E₁ ⊇ E₂ ⊇ E₃ ⊇ ⋯, the log scoring rule remains proper, since log P(Eₙ) = log P(E₁) + log P(E₂ | E₁) + ⋯ + log P(Eₙ | Eₙ₋₁).

I don't think the same can be said for the Brier scoring rule; it doesn't even seem to have a well-defined generalization to this case.

There are a couple fiddly assumptions here I should bring out explicitly:

  1. when it comes to epistemic value, we should have a temporal discount factor of exactly 1, very much unlike prudential or ethical values where I argue [LW(p) · GW(p)] the discount factor must be strictly less than 1.
    • If we don't do this, then we get an incentive to smear out our forecasts to the extent we expect high precision to take a long time to obtain.
    • This is one reason to keep epistemic value as a separate normative domain from other kinds of value.
      • The point you mentioned parenthetically about contingencies correlating with marginal utility is another reason to keep utility separate from epistemic value.
  2. When we decide what probabilistic statements to make, we should act as-if either the question will eventually resolve fully, or "there will always be more to discover" and that more is always discovered eventually.
  • Big tangent: There is a resonance here with CEV [? · GW], where we try to imagine an infinite future limit of all ethical knowledge having been learned, and judge our current intentions by that standard, without discounting it for being far in the future, or discounting the whole scenario for being less-than-certain that ethical beings will survive and continue their ethical development indefinitely or until there is nothing more to learn.
    • Here we are sort-of in the domain of ethics, where I'd say temporal discounting is necessary, but methodologically the question of how to determine ethical value is an epistemic one. So we shouldn't discount future ethical-knowledge Bayes-points, but we can still discount object-level ethical value.
answer by Jon Garcia · 2021-12-25T19:59:41.071Z · LW(p) · GW(p)

As I see it, probability is essentially just a measure of our ignorance, or the ignorance of any model that's used to make predictions. An event with a probability of 0.5 implies that in half of all situations where I have information indistinguishable from the information I have now, this event will occur; in the other half of all such indistinguishable situations, it won't happen.

For example, all I know is that I have a coin with two sides of equal weight that I plan to flip carelessly through the air until it lands on a flat surface. I'm not tracking how all the action potentials in the neurons of my motor cortex, cerebellum, and spinal cord will affect the precise twitches of individual muscle fibers as I execute the flip, nor the precise orientation of the coin prior to the flip, nor the position of every bone and muscle in my body, nor the minute air currents that might interact differently with the textures on the heads versus tails side, nor any variations in the texture of the landing surface, nor that sniper across the street who's secretly planning to shoot the coin once it's in the air, nor etc., etc., etc. Under the simplified model, where that's all you know, it really will land heads half the time and tails half the time across all possible instantiations of the situation where you can't tell any difference in the relevant initial conditions. In the reality of a deterministic universe, however, the coin (of any particular Everett branch of the multiverse) will either land heads-up or it won't, with no in-between state that could be called "probability".
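A toy version of this picture (a sketch in Python with numpy; the "microstate" is compressed into a single unobserved number, and the dynamics are a stand-in for all the muscle twitches and air currents):

    import numpy as np

    rng = np.random.default_rng(0)

    def deterministic_flip(microstate):
        # Fully deterministic: the outcome is a fixed function of the launch
        # conditions; no randomness lives inside the coin itself.
        return "heads" if np.sin(1000.0 * microstate) > 0 else "tails"

    # We can't distinguish microstates, so we average over all the
    # indistinguishable initial conditions:
    microstates = rng.uniform(0.0, 1.0, size=100_000)
    outcomes = [deterministic_flip(m) for m in microstates]
    print(outcomes.count("heads") / len(outcomes))   # ~0.5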

Similarly, temperature also measures our ignorance, or rather lack of control, of the trajectories of a large number of particles. There are countless microstates that produce identical macrostates. We don't know which microstate is currently happening, how fast and in what direction each atom is moving. We just know that the molecules in the fluid in the calorimeter are bouncing around fast enough to cause the mercury atoms in the thermometer to bounce against each other hard enough to cause the mercury to expand out to the 300K mark. But there are vigintillions of distinct ways this could be accomplished at the subatomic level, which are nevertheless indistinguishable to us at the macroscopic level. You could shoot cold water through a large pipe at 100 mph and we would still call it cold, even though the average kinetic energy of the water molecules is now equivalent to a significantly higher temperature. This is because we have control over the largest component of their motion, because we can describe it with a simple model.

To a God-level being that actually does track the universal wave function and knows (and has the ability to control) the trajectories of every particle everywhere, there is no such thing as temperature, no such thing as probability. Particles just have whatever positions and momenta they have, and events either happen or they don't (neglecting extra nuances from QM). For those of us bound by thermodynamics, however, these same systems of particles and events are far less predictable. We can't see all the lowest-level details, much less model them with the same precision as reality itself, much less control them with God-level orchestration. Thus, probability, temperature, etc. become necessary tools for predicting and controlling reality at the level of rational agents embedded in the physical universe, with all the ignorance and impotence that comes along with it.

comment by Ege Erdil (ege-erdil) · 2021-12-25T20:25:40.617Z · LW(p) · GW(p)

As I see it, probability is essentially just a measure of our ignorance, or the ignorance of any model that's used to make predictions. An event with a probability of 0.5 implies that in half of all situations where I have information indistinguishable from the information I have now, this event will occur; in the other half of all such indistinguishable situations, it won't happen.

Here I think you're mixing two different approaches. One is the Bayesian approach: it comes down to saying probabilistic theories are normative. The question is how to reconcile that with the fact that these theories make some predictions that don't look normative at all: for example, saying that blackbody radiation flux scales with the fourth power of temperature seems like a concrete prediction that doesn't have much to do with the ignorance of any particular observer. QM is even more troublesome, but you don't need to go there to begin to see some puzzles.

The second is to say that in some circumstances you'll get a unique probability measure on an event space by requiring that the measure is invariant under the action of some symmetry group on the space. I think this is a useful meta-principle for choosing probability measures (for example, unitary symmetry of QM -> Born rule), and it can get you somewhere if you combine it with Dutch book style arguments, but in practice I give probabilities on lots of events which don't seem to have this kind of nice symmetry that die rolls or coin flips have, and I think what I'm doing there is a reasonable thing to do. I just don't know how to explain what I'm doing or how to justify it properly.

Similarly, temperature also measures our ignorance, or rather lack of control, of the trajectories of a large number of particles... To a God-level being that actually does track the universal wave function and knows (and has the ability to control) the trajectories of every particle everywhere, there is no such thing as temperature, no such thing as probability.

The problem here is that there are plenty of physical phenomena which are probably best understood in terms of temperature even if you're God. Phase transitions are one example of that: it's unlikely that the "good understanding" of the superconducting phase transition doesn't involve mentioning temperature/statistical mechanics at all, for example.

Thus, probability, temperature, etc. become necessary tools for predicting and controlling reality at the level of rational agents embedded in the physical universe, with all the ignorance and impotence that comes along with it.

I agree with this in general, but we use probability in many different senses, some of them not really connected to this problem of uncertainty. I've given some examples already in the comment, and you can even produce ones from mathematics: for example, plenty of analytic number theory can be summed up as trying to understand in what sense the Liouville function is random (i.e. you can model it as a "coin flip") and how to prove that it is so.

I think none of this unfortunately answers the question of what the "epistemic status" of a probabilistic theory actually is.

answer by davidad · 2021-12-26T06:27:18.321Z · LW(p) · GW(p)
  1. You spend a few paragraphs puzzling about how a probabilistic theory could be falsified. As you say, observing an event in a null set or a meagre set does not do the trick. But observing an event which is disjoint from the support of the theory's measure does falsify it. Support is a very deep concept; see this category-theoretic treatise that builds up to it.
  2. However, I think the more fundamental question here isn't "how can I discard most of the information in a probabilistic theory so that it fits into Popperian falsificationism?", but rather "why should I accept Bayesian epistemology when it doesn't seem to fit into Popperian falsificationism?" For that, I refer you to Andrew Gelman's nuanced views and Sprenger and Hartmann's Bayesian Philosophy of Science.
comment by Ege Erdil (ege-erdil) · 2021-12-26T08:55:32.050Z · LW(p) · GW(p)

You spend a few paragraphs puzzling about how a probabilistic theory could be falsified. As you say, observing an event in a null set or a meagre set does not do the trick. But observing an event which is disjoint from the support of the theory's measure does falsify it. Support is a very deep concept; see this category-theoretic treatise that builds up to it.

You can add that as an additional axiom to some theory, sure. It's not clear to me why that is the correct notion to have, especially since you're adding some extra information about the topology of your probability space when interpreting the measure theoretic structure which seems "illegitimate" and difficult to generalize to some other situations.

My point with meager sets was that they are orthogonal to sets of null measure, so you really need to give some explanation for why you "break the symmetry" between two classes of small sets by privileging one over the other.

However, I think the more fundamental question here isn't "how can I discard most of the information in a probabilistic theory so that it fits into Popperian falsificationism?", but rather "why should I accept Bayesian epistemology when it doesn't seem to fit into Popperian falsificationism?" For that, I refer you to Andrew Gelman's nuanced views and Sprenger and Hartmann's Bayesian Philosophy of Science.

I don't think the question is about Popperian falsificationism, though Popperians are usually more able to notice the philosophical problem I'm talking about in the question. I simply don't actually know what the relationship of probability to anything "real" is when a theory says "here is a space of outcomes and here is a probability measure on it". The probability measure doesn't seem to "tell you" anything.

Thanks for the references, I'll take a look. I'm not very hopeful since if there is a good answer to my question I think it should fit in the space of an answer - most of what's in these sources seems to be irrelevant to what I'm asking.

Replies from: davidad
comment by davidad · 2021-12-26T11:20:53.655Z · LW(p) · GW(p)

Okay, I now think both of my guesses about what's really being asked were misses. Maybe I will try again with a new answer; meanwhile, I'll respond to your points here.

You're right that I'm sneaking something in when invoking support, because it depends on the sample space having a topological structure, which cannot typically be extracted from just a measurable structure. What I'm sneaking in is that both the σ-algebra structure and the topological structure on a scientifically meaningful space ought to be generated by the (finitely) observable predicates. In my experience, this prescription doesn't contradict standard examples, and situations to which it's "difficult to generalize" feel confused and/or pathological until this is sorted out. So, in a sense I'm saying, you're right that a probability space by itself doesn't connect to reality—because it lacks the information about which events in the σ-algebra are opens.

As to why I privilege null sets over meagre sets: null sets are those to which the probability measure assigns zero value, while meagre sets are independent of the probability measure—the question of which sets are meagre is determined entirely by the topology. If the space is Polish (or more generally, any Baire space), then meagre sets are never inhabited open sets, so they can never conceivably be observations, therefore they can't be used to falsify a theory.

But, given that I endorse sneaking in a topology, I feel obligated to examine meagre sets from the same point of view, i.e. treating the topology as a statement about which predicates are finitely observable, and see what role meagre sets then have in philosophy of science. Meagre sets are not the simplest concept; the best way I've found to do this is via the characterization of meagre sets with the Banach–Mazur game:

  • Suppose Alice is trying to claim a predicate X is true about the world, and Bob is trying to claim it isn't true.
  • They engage in armchair "yes-but" reasoning, taking turns saying "suppose we observe Eᵢ".
  • The rules are the same for Alice and Bob: they can choose any finitely observable event, as long as it is consistent with all the previous suppositions (i.e. has nonempty intersection).
  • In the limit of countably-infinitely many rounds, Alice wins if all the suppositions remain consistent with X, or Bob wins if he can rule out X.
  • Alice gets to move first.

Of course, for any claim X that is finitely observable, the first-move advantage is decisive: Alice can simply say "suppose we observe X," and now Bob is doomed. But there are some sets X for which Bob has a guaranteed winning strategy, and those are the meagre sets.
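To make the game concrete, here is a toy finite prefix of it (a sketch in Python; open intervals stand in for the finitely observable events, and X = "the value is rational" is a claim Bob defeats):

    from fractions import Fraction

    def rationals():
        # Enumerate the rationals in (0, 1); Bob will dodge them one by one.
        seen, q = set(), 2
        while True:
            for p in range(1, q):
                r = Fraction(p, q)
                if r not in seen:
                    seen.add(r)
                    yield r
            q += 1

    def alice_move(lo, hi):
        # Alice may choose any consistent observation; here she keeps the right half.
        return lo + (hi - lo) / 2, hi

    def bob_move(lo, hi, q):
        # Bob picks a nonempty open subinterval that excludes the rational q.
        if lo < q < hi:
            return lo, q
        return lo, lo + (hi - lo) / 2

    lo, hi = Fraction(0), Fraction(1)
    qs = rationals()
    for _ in range(20):                  # 20 rounds of the countably infinite game
        lo, hi = alice_move(lo, hi)
        lo, hi = bob_move(lo, hi, next(qs))

    print(float(lo), float(hi))
    # In the limit, every rational is excluded at some round, so the suppositions
    # rule out "the value is rational": the rationals form a meagre set.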

From a philosophy-of-science perspective, meagre sets are propositions internal to a scientific ontology which, even if the ontology is assumed true, could always be falsified by a stream of experimental outcomes from an adversarial Nature (Bob's moves), even if each such outcome must be consistent with the best possible outcome for the proposition (Alice's moves). That's the sense in which meagre sets are negligible. Very loosely, they are hypotheses that, in a specific way, it doesn't make sense to argue for. For example, the proposition that the fine-structure constant is a Martin-Löf random number has probability 1, but it doesn't make sense to argue that this is "in fact" the case, essentially because the proposition is meagre.

Replies from: davidad, ege-erdil
comment by davidad · 2021-12-26T13:01:32.057Z · LW(p) · GW(p)

A related perspective on meagre sets as propositions (mostly writing down for my own interest):

  • The interior operator int(−) can be thought of as "rounding down to the nearest observable proposition", since int(A) is the upper bound of all opens that imply A.
  • The condition for A to be nowhere dense is equivalent to int(cl(A)) = ∅.
  • If we are working with a logic of observables, where every proposition must be an observable, the closest we can get to a negation operator is a pseudo-negation ∼A := int(X ∖ A).
  • So a nowhere dense set is a predicate whose double-pseudo-negation ∼∼A is false, or equivalently ∼∼∼A is true.
  • Another slogan, derived from this, is "a nowhere dense hypothesis is one we cannot rule out ruling out".
  • The meagre propositions are the σ-ideal generated by nowhere dense propositions.
comment by Ege Erdil (ege-erdil) · 2021-12-26T12:50:26.031Z · LW(p) · GW(p)

What I'm sneaking in is that both the σ-algebra structure and the topological structure on a scientifically meaningful space ought to be generated by the (finitely) observable predicates. In my experience, this prescription doesn't contradict standard examples, and situations to which it's "difficult to generalize" feel confused and/or pathological until this is sorted out.

It's not clear to me how finitely observable predicates would generate a topology. For a sigma algebra it's straightforward to do the generation because they are closed under complements, but for a topology if you allow both a predicate and its negation to be "legitimate" then you'll end up with all your basis elements being clopen. This would then give you something that looks more like a Cantor set than a space like ℝ.

I agree "morally" that the topology should have something to do with finitely observable predicates, but just taking it to be generated by them seems to exclude a lot of connected spaces which you might want to be "scientifically meaningful", starting from ℝ itself.

From a philosophy-of-science perspective, meagre sets are propositions internal to a scientific ontology which, even if the ontology is assumed true, could always be falsified by a stream of experimental outcomes from an adversarial Nature (Bob's moves), even if each such outcome must be consistent with the best possible outcome for the proposition (Alice's moves). That's the sense in which meagre sets are negligible. Very loosely, they are hypotheses that, in a specific way, it doesn't make sense to argue for. For example, the proposition that the fine-structure constant is a Martin-Löf random number has probability 1, but it doesn't make sense to argue that this is "in fact" the case, essentially because the proposition is meagre.

Your point is taken, though as a side remark I think it's ludicrous to claim that something like the fine structure constant has any property like this with probability 1, given that it's most likely so far from being a number chosen randomly from some range.

I think putting meager sets in context using the Banach-Mazur game makes sense, but to me this only makes the issue worse, since the existence of comeager & null events would mean that there are some hypotheses that

  1. it doesn't make sense to argue against and yet
  2. you should give arbitrarily large odds in a bet to anyone who would claim they are correct.

You're saved from a contradiction because in this setup neither the comeager event nor its complement contains any nonempty open, so the event can be proven neither true nor false if we assume the opens are all the finitely observable predicates. In that sense it "doesn't matter" what odds you give on the event being true or false, but it still seems to me like on a probability space that is also a Polish space you have two structures living together which give you contradictory signals about which events are "small", and it's difficult to reconcile these two different ways of looking at things.

I also think that this discussion, though interesting, is somewhat beside the point: even if we deal with some probability space which is also a Polish space, I'm still not sure what is the information added by the probability measure beyond what its support is. Any two measures absolutely continuous wrt each other will have the same support, but obviously we would treat them very differently in practice.

Replies from: davidad
comment by davidad · 2021-12-26T13:32:43.782Z · LW(p) · GW(p)

(I agree with your last paragraph—this thread is interesting but unfortunately beside the point since probabilistic theories are obviously trying to "say more" than just their merely nondeterministic shadows.)

Negations of finitely observable predicates are typically not finitely observable. [0,0.5) is finitely observable as a subset of [0,1], because if the true value is in [0,0.5) then there necessarily exists a finite precision with which we can know that. But its negation, [0.5,1], is not finitely observable, because if the true value is exactly 0.5, no finite-precision measurement can establish with certainty that the value is in [0.5,1], even though it is.

The general case of why observables form a topology is more interesting. Finite intersections of finite observables are finitely observable because I can check each one in series and still need only finite observation in total. Countable unions of finite observables are finitely observable because I can check them in parallel, and if any are true then its check will succeed after only finite observation in total.

Uncountable unions are thornier, but arguably unnecessary (they're redundant with countable unions if the space is hereditarily Lindelöf, for which being Polish is sufficient, or more generally second-countable), and can be accommodated by allowing the observer to hypercompute. This is very much beside the point, but if you are still interested anyway, check out Escardó's monograph on the topic.
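One way to make "check them in parallel" concrete (a sketch in Python; a finitely observable predicate is modeled as an iterator that yields True after finitely many steps exactly when the predicate holds, with the value read at finer and finer precision):

    from itertools import count

    def observe_greater_than(x, threshold):
        # Semidecide "x > threshold" by inspecting x at precision 2^-n;
        # if it is true, some finite precision suffices.
        for n in count(1):
            yield x - 2.0**-n > threshold

    def observe_union(checks):
        # Dovetail countably many semidecision procedures: one step each per
        # round, succeeding as soon as any single check succeeds.
        while True:
            for check in checks:
                if next(check):
                    return True

    # 0.3 lies in the open set (0.25, 1), and the union check detects this
    # after finitely many steps, even though the first check alone never halts:
    print(observe_union([observe_greater_than(0.2, 0.25),
                         observe_greater_than(0.3, 0.25)]))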

Replies from: ege-erdil
comment by Ege Erdil (ege-erdil) · 2021-12-26T13:52:11.429Z · LW(p) · GW(p)

Negations of finitely observable predicates are typically not finitely observable. [0,0.5) is finitely observable as a subset of [0,1], because if the true value is in [0,0.5) then there necessarily exists a finite precision with which we can know that. But its negation, [0.5,1], is not finitely observable, because if the true value is exactly 0.5, no finite-precision measurement can establish with certainty that the value is in [0.5,1], even though it is.

Ah, I didn't realize that's what you mean by "finitely observable" - something like "if the proposition is true then there is a finite precision measurement which will show that it's true". That does correspond to the opens of a metric space if that's how you formalize "precision", but it seems like a concept that's not too useful in practice because you actually can't measure things to arbitrary precision in the real world. [0, 0.5) is not going to actually be observable as long as your apparatus of observation has some small but nonzero lower bound on its precision.

What's the logic behind not making this concept symmetric, though? Why don't we ask also for "if the proposition is false then there is a finite precision measurement which will show that it's false", i.e. why don't we ask for observables to be clopens? I'm guessing it's because this concept is too restrictive, but perhaps there's some kind of intuitionist/constructivist justification for why you'd not want to make it symmetric like this.

Uncountable unions are thornier, but arguably unnecessary, and can be accommodated by allowing the observer to hypercompute. This is very much beside the point, but if you are still interested anyway, check out Escardó's monograph on the topic.

I'll check it out, thanks.

Replies from: davidad
comment by davidad · 2021-12-26T18:28:08.253Z · LW(p) · GW(p)

What's the logic behind not making this concept symmetric, though?

It's nice if the opens of X can be internalized as the continuous functions X → Σ for some space Σ of truth values with a distinguished point ⊤ such that an open U corresponds to f⁻¹(⊤). For this, it is necessary (and sufficient) for the open sets of Σ to be generated by {⊤}. I could instead ask for a distinguished point ⊥ such that X ∖ U = f⁻¹(⊥), and for this it is necessary and sufficient for the open sets of Σ to be generated by Σ ∖ {⊥}. Put them together, and you get that Σ must be the Sierpiński space: a "true" result (⊤) is finitely observable ({⊤} is open), but a "false" result is not ({⊥} is not open).

perhaps there's some kind of intuitionist/constructivist justification

Yes, constructively we do not know a proposition until we find a proof. If we find a proof, it is definitely true. If we do not find a proof, maybe it is false, or maybe we have not searched hard enough—we don't know.

Also related is that the Sierpiński space is the smallest model of intuitionistic propositional logic (with its topological semantics) that rejects LEM, and any classical tautology rejected by Sierpiński space is intuitionistically equivalent to LEM. There's a sense in which the difference between classical logic and intuitionistic logic is precisely the assumption that all open sets of possibility-space are clopen (which, if we further assume the space is T₁, leads to an ontology where possibility-space is necessarily discrete). (Of course it's not literally a theorem of classical logic that all open sets are clopen; this is a metatheoretic claim about semantic models, not about objects internal to either logic.) See A Semantic Hierarchy for Intuitionistic Logic.

answer by dkirmani · 2021-12-26T01:36:12.560Z · LW(p) · GW(p)

What do we mean when we say that we have a probabilistic theory of some phenomenon?

If you have a probabilistic theory of a phenomenon, you have a probability distribution whose domain, or sample space, is the set of all possible observations of that phenomenon.

comment by Ege Erdil (ege-erdil) · 2021-12-26T08:32:12.308Z · LW(p) · GW(p)

The question is about the apparently epiphenomenal status of the probability measure and how to reconcile that with the probability measure actually adding information content to the theory. This answer is obviously "true", but it doesn't actually address my question.

answer by DaemonicSigil · 2021-12-26T07:21:17.262Z · LW(p) · GW(p)

A probabilistic theory can be considered as a function that maps random numbers to outcomes. It tells us to model the universe as a random number generator piped through that function. A deterministic theory is a special case of a probabilistic theory that ignores its random number inputs, and yields the same output every time.

Here's an example: We can use the probabilistic theory of quantum mechanics to predict the outcome of a double slit experiment. If we feed a random number to the theory it will predict a photon to hit in a particular location on the screen. If we feed in another random number, it will predict another hit somewhere else on the screen. Feed in lots of random numbers, and we'll get a probability distribution of photon hits. Believing in the probabilistic theory of quantum mechanics means we expect to see the same distribution of photon hits in real life.
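That picture can be sketched directly (in Python with numpy; the cos²-fringes-under-an-envelope intensity profile is an idealized stand-in for the real quantum-mechanical computation):

    import numpy as np

    rng = np.random.default_rng(0)

    # Idealized two-slit intensity on a screen coordinate x (arbitrary units):
    x = np.linspace(-3.0, 3.0, 1201)
    intensity = np.cos(4.0 * x) ** 2 * np.exp(-x**2)
    dx = x[1] - x[0]
    pdf = intensity / (intensity.sum() * dx)
    cdf = np.cumsum(pdf) * dx

    def theory(u):
        # The probabilistic theory as a function: random number in, predicted
        # photon hit location out (inverse-CDF sampling).
        return np.interp(u, cdf, x)

    hits = theory(rng.uniform(0.0, 1.0, size=100_000))
    counts, _ = np.histogram(hits, bins=100)
    # The histogram of hits reproduces the predicted interference fringes.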

Suppose we have a probabilistic theory, and observe an outcome that is unlikely according to our theory. There are two explanations: The first is that the random generator happened to generate an unlikely number which produced that outcome. The second is that our theory is wrong. We'd expect some number of unlikely events by chance. If we see too many outcomes that our theory predicts should be unlikely, then we should start to suspect that the theory is wrong. And if someone comes along with a deterministic theory that can actually predict the random numbers, then we should start using that theory instead. Yudkowsky's essay "A Technical Explanation of Technical Explanation" covers this pretty well, I'd recommend giving it a read.

The takeaway is that quantum mechanics isn't a decision theory of how humans should act. It's a particular (very difficult to compute) function that maps random numbers to outcomes. We believe with very high probability that quantum mechanics is correct, so if quantum mechanics tells us a certain event has probability 0.5, we should believe it has probability 0.50001 or 0.49999 or something.

Also, in real life, we can never make real-number measurements, so we don't have to worry about the issue of observing events of probability 0 when sampling from a continuous space. All real measurements in physics have error bars. A typical sample from the interval [0,1] would be an irrational, transcendental, uncomputable number. Which means it would have infinitely many digits, and no compressed description of those digits. The only way to properly observe the number would be to read the entire number, digit by digit. Which is a task no finite being could ever complete.

comment by davidad · 2021-12-26T08:16:21.723Z · LW(p) · GW(p)

On the point about real-life measurements: we can observe events of probability 0, such as 77.3±0.1 when the distribution was uniform on [0,1]. What we can't observe are events that are non-open sets. I actually think that "finitely observable event" is a great intuitive semantics for the topological concept of "open set"; see Escardó's Synthetic Topology.

My proposal (that a probabilistic theory can be falsified when an observed event is disjoint from its support) is equivalent to saying that a theory can be falsified by an observation which is a null set, provided we assume that any event we could possibly observe is necessarily an open set (and I think we should indeed set up our topologies so that this is the case).
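Schematically (a sketch in Python, with open intervals standing in for observable events):

    def falsifies(observation, support):
        # An observed open interval falsifies the theory iff it is disjoint
        # from the support of the theory's measure.
        (obs_lo, obs_hi), (sup_lo, sup_hi) = observation, support
        return obs_hi <= sup_lo or obs_lo >= sup_hi

    uniform_01_support = (0.0, 1.0)
    print(falsifies((77.2, 77.4), uniform_01_support))  # True: 77.3 +/- 0.1 refutes it
    print(falsifies((0.45, 0.55), uniform_01_support))  # False: consistent observation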

comment by Ege Erdil (ege-erdil) · 2021-12-26T09:04:38.140Z · LW(p) · GW(p)

Believing in the probabilistic theory of quantum mechanics means we expect to see the same distribution of photon hits in real life.

No it doesn't! That's the whole point of my question. "Believing the probabilistic theory of quantum mechanics" means you expect to see the same distribution of photon hits with a very high probability (say 1 − ε), but if you have not justified what the connection of probabilities to real world outcomes is to begin with, that doesn't help us. Probabilistic claims just form a closed graph of reference in which they only refer to each other but never to claims of the form "X happens" or "X does not happen".

I've already received multiple comments and answers which don't actually understand my question and tells me things I already know, such as "a probabilistic theory is just some outcome space + a probability measure on it" or some analog of that. I know that already, my question is about the epistemic status of the probability measure in such a theory.

Replies from: DaemonicSigil
comment by DaemonicSigil · 2021-12-27T00:16:04.830Z · LW(p) · GW(p)

Okay, thanks for clarifying the question. If I gave you the following answer, would you say that it counts as a connection to real-world outcomes?

The real world outcome is that I run a double slit experiment with a billion photons, and plot the hit locations in a histogram. The heights of the bars of the graph closely match the probability distribution I previously calculated.

What about 1-time events, each corresponding to a totally unique physical situation? Simple. For each 1 time event, I bet a small amount of money on the result, at odds at least as good as the odds my theory gives for that result. The real world outcome is that after betting on many such events, I've ended up making a profit.

It's true that both of these outcomes have a small chance of not-happening. But with enough samples, the outcome can be treated for all intents and purposes as a certainty. I explained above why the "continuous distribution" objection to this doesn't hold.
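The betting version can also be simulated (a sketch in Python with numpy; the 2% edge and the unit bet sizes are arbitrary, and Nature is assumed to actually sample from the theory's probabilities):

    import numpy as np

    rng = np.random.default_rng(0)

    bankroll = 0.0
    for _ in range(10_000):              # many one-time events, each totally unique
        p = rng.uniform(0.05, 0.95)      # the theory's probability for this event
        q = p - 0.02                     # offered odds slightly better than the theory's
        happened = rng.random() < p      # Nature samples from the theory
        # Stake 1 unit at implied probability q: win (1 - q)/q, else lose the stake.
        bankroll += (1.0 - q) / q if happened else -1.0

    print(bankroll)   # positive with very high (but only very high!) probability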

Replies from: ege-erdil
comment by Ege Erdil (ege-erdil) · 2021-12-27T08:29:54.965Z · LW(p) · GW(p)

It's true that both of these outcomes have a small chance of not-happening. But with enough samples, the outcome can be treated for all intents and purposes as a certainty.

I agree with this in practice, but the question is philosophical in nature and this move doesn't really help you get past the "firewall" between probabilistic and non-probabilistic claims at all. If you don't already have a prior reason to care about probabilities, results like the law of large numbers or the central limit theorem can't convince you to care about it because they are also probabilistic in nature.

For example, all LLN can give you is "almost sure convergence", i.e. convergence with probability 1, and if I don't have a prior reason to disregard events of probability 0 there's no reason for me to care about this result.

I think davidad [LW · GW] gave the best answer out of everyone so far, so you can also read his answers along with my conversation with him in the comment threads if you want to better understand where I'm coming from.

16 comments


comment by Zac Hatfield-Dodds (zac-hatfield-dodds) · 2021-12-26T04:10:23.727Z · LW(p) · GW(p)

Deterministic theories have the feature that they forbid some class of events from happening - for instance, the second law of thermodynamics forbids the flow of heat from a cold object to a hot object in an isolated system. The probabilistic component in a theory has no such character, even in principle.

This seems like an odd example to me, since the second law of thermodynamics is itself probabilistic!

Replies from: ege-erdil
comment by Ege Erdil (ege-erdil) · 2021-12-26T08:27:54.871Z · LW(p) · GW(p)

This is not true. You can have a model of thermodynamics that is statistical in nature and so has this property, but thermodynamics itself doesn't tell you what entropy is, and the second law is formulated deterministically.

comment by JBlack · 2021-12-26T02:07:53.876Z · LW(p) · GW(p)

I'm not sure what the problem is, nor why you connect Bayesian approaches with "how some agent with a given expected utility should act". There is a connection between those concepts, but they're certainly not the same thing.

The Bayesian approach is simply that you can update prior credences of hypotheses using evidence to get posterior credences. If the posterior credence is literally zero then that hypothesis is eliminated in the sense that every remaining hypothesis with nonzero credence now outweighs it. There will always be hypotheses that have nonzero credence.

Replies from: ege-erdil
comment by Ege Erdil (ege-erdil) · 2021-12-26T08:37:19.076Z · LW(p) · GW(p)

See my response [LW(p) · GW(p)] to a similar comment below.

comment by 52ceccf20f20130d0f8c2716521d24de · 2022-01-13T15:33:08.318Z · LW(p) · GW(p)

(Why) are you not happy with Velenik's answer, or with "a probabilistic theory tells us that if we look at an event E and perform the same experiment N times, then the fraction of experiments where E happened approaches P(E) in an LLN-like manner"? Is there something special about physical phenomena as opposed to observables?

 

> ℝ can be written as the union of a meager set and a set of null measure. This result forces us to make a choice as to which class of sets we will neglect, or otherwise we will end up neglecting the whole space ℝ!

Either neither of these sets is measurable, or the meagre set has measure 1. Either way, it seems obvious what to neglect.

comment by Signer · 2021-12-30T06:47:39.831Z · LW(p) · GW(p)

It's actually worse: you need bridge laws even for deterministic theories, because you can't observe outcomes directly. You need "if the number on this device looks to me like the one predicted by theory, then the theory is right" just like you need "if I run billion experiments and frequency looks to me like probability predicted by the theory, then the theory is right". The only advantage of deterministic theories is that fundamental math is also deterministic, and so you may want to say things like "but the laws themselves are true" - but that's only an advantage if you think that math is more fundamental than physics; from inside a probabilistic physical theory, all implementations of math are probabilistic. So yes, you either abandon the concept of deterministic truth or use probabilistic theory normatively.

Replies from: ege-erdil
comment by Ege Erdil (ege-erdil) · 2021-12-30T09:25:31.258Z · LW(p) · GW(p)

You need "if the number on this device looks to me like the one predicted by theory, then the theory is right" just like you need "if I run billion experiments and frequency looks to me like probability predicted by the theory, then the theory is right".

You can say that you're trying to solve a "downward modeling problem" when you try to link any kind of theory you have to the real world. The point of the question is that in some cases the solution to this problem is more clear to us than in others, and in the probabilistic case we seem to be using some unspecified model map to get information content out of the probability measure that comes as part of a probabilistic theory. We're obviously able to do that but I don't know how we do it, so that's what the question is about.

Saying that "it's just like a deterministic theory" is not a useful comment because it doesn't answer this question, it just says "there is a similar problem to this which is also difficult to answer, so we should not be optimistic about the prospects of answering this one either". I'm not sure that I buy that argument, however, since the deterministic and probabilistic cases look sufficiently different to me that I can imagine the probabilistic case being resolved while treating the deterministic one as a given.

> So yes, you either abandon the concept of deterministic truth or use probabilistic theory normatively.

You don't actually know you have to do that, so this seems like a premature statement to make. It also seems highly implausible to me that these are your only two options in light of some of the examples I've discussed both in the original question and in the replies to some of the answers people have submitted. Again, I think phase transition models offer a good example.

Replies from: Signer
comment by Signer · 2021-12-30T16:05:16.114Z · LW(p) · GW(p)

> it doesn't answer this question

Hence it's a comment and not an answer^^.

I don't get your examples: for a theory that predicts a phase transition to have information content in the desired sense, you would also need to specify a model map. What's the actual difference from the deterministic case? That the solution is "more clear"? I mean, that's probably just a fact about what happened to be implemented in brain hardware or something, and I didn't have the sense that that was what the question was about.

Or is it about non-realist probabilistic theories not specifying which outcomes are impossible in the realist sense? Then I don't understand what's confusing about treating the probabilistic part normatively - that's just what being non-realist about probability means.

comment by dkirmani · 2021-12-26T01:17:20.757Z · LW(p) · GW(p)

> The alternative is to adopt a Bayesian approach, in which case the function of a probabilistic theory becomes purely normative - it informs us about how some agent with a given expected utility should act.

Not sure I buy this assertion. A Bayesian approach tells you how to update the plausibilities of various competing {propositions/hypotheses/probabilistic theories}. Sure, you could then use those plausibilities to select an action that maximizes the expectation of some utility function. But that isn't what Bayes' rule is about.

Replies from: ege-erdil
comment by Ege Erdil (ege-erdil) · 2021-12-26T08:36:23.845Z · LW(p) · GW(p)

Here I'm using "Bayesian" as an adjective which refers to a particular interpretation of the probability calculus, namely one where agents have credences about an event and they are supposed to set those credences equal to the "physical probabilities" coming from the theory and then make decisions according to that. It's not the mere acceptance of Bayes' rule that makes someone a Bayesian - Bayes' rule is a theorem so no matter how you interpret the probability calculus you're going to believe in it.

With this sense of "Bayesian", the epistemic content added by a probability measure to a theory appears to be normative. It tells you how you should or should not act instead of telling you something about the real world, or so it seems.

Replies from: JBlack
comment by JBlack · 2021-12-27T07:47:12.425Z · LW(p) · GW(p)

The use of the word "Bayesian" here means that you treat credences according to the same mathematical rules as probabilities, including the use of Bayes' rule. That's all.

comment by tivelen · 2021-12-25T20:01:26.215Z · LW(p) · GW(p)

Suppose an answer appeared here, and when you read it, you were completely satisfied by it. It answered your question perfectly. How would this world differ from one in which no answer remotely satisfied you? Would you expect to have more accurate beliefs, or to be better able to achieve your goals?

If not, to the best of your knowledge, why have you decided to ask the question in the first place?

Replies from: ege-erdil
comment by Ege Erdil (ege-erdil) · 2021-12-25T20:10:03.509Z · LW(p) · GW(p)

I don't know what you mean here. One of my goals is to get a better answer to this question than what I'm currently able to give, so by definition getting such an answer would "help me achieve my goals". If you mean something less trivial than that, well, it also doesn't help me to achieve my goals to know if the Riemann hypothesis is true or false, but RH is nevertheless one of the most interesting questions I know of and definitely worth wondering about.

I can't know how an answer I don't know about would impact my beliefs or behavior, but my guess is that the explanation would not lead us to change how we use probability, just like thermodynamics didn't lead us to change how we use steam engines. It was, nevertheless, still worthwhile to develop the theory.

Replies from: tivelen
comment by tivelen · 2021-12-26T03:34:49.218Z · LW(p) · GW(p)

My approach was not helpful at all, which I can clearly see now. I'll take another stab at your question.

You think it is reasonable to assign probabilities, but you also cannot explain how you do so or justify it. You are looking for such an explanation or justification, so that your assessment of reasonableness is backed by actual reason.

Are you unable to justify any probability assessments at all? Or is there some specific subset that you're having trouble with? Or have I failed to understand your question properly?

Replies from: ege-erdil
comment by Ege Erdil (ege-erdil) · 2021-12-26T09:08:51.248Z · LW(p) · GW(p)

I think you can justify probability assessments in some situations using Dutch book style arguments, combined with the situation itself having some kind of symmetry which the measure must be invariant under. But this kind of argument doesn't generalize to the messy real-world situations in which you have to make a forecast about something, and it still doesn't give some "physical interpretation" to the probabilities beyond "if you make bets, then your odds have to form a probability measure, and they had better respect the symmetries of the physical theory you're working with".

If you phrase this in terms of epistemic content, I could say that a probability measure just adds information about the symmetries of some situation when seen from your perspective, but when I say (for example) that there's a 40% chance Russia will invade Ukraine by end of year 2022 this doesn't seem to correspond to any obvious symmetry in the situation.
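
To make the Dutch book point above concrete, a minimal sketch (my own illustration, with made-up numbers): if an agent's betting odds over an exhaustive set of mutually exclusive outcomes don't sum to 1, a bookie can lock in a profit regardless of the outcome.

```python
# The bookie sells the agent a $1-payout bet on each outcome at the agent's
# stated price (credence x $1). Exactly one outcome occurs, so the bookie
# pays out $1 once; the profit below is therefore outcome-independent.
def dutch_book_profit(credences, stake=1.0):
    return sum(c * stake for c in credences) - stake

# Credences of 0.6 on heads and 0.6 on tails sum to 1.2:
print(dutch_book_profit([0.6, 0.6]))  # ~0.2, guaranteed for the bookie
# (If the credences summed to less than 1, the bookie would instead buy
# the same bets from the agent and pocket the shortfall.)
```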

Replies from: tivelen
comment by tivelen · 2021-12-26T14:44:56.101Z · LW(p) · GW(p)

Perhaps such probabilities are based on intuition, and happen to be roughly accurate because the intuition has formed as a causal result of factors influencing the event? In order to be explicitly justified, one would need an explicit justification of intuition, or at least of intuition within the field of knowledge in question.

I would say that such intuitions in many fields are too error-prone to justify any kind of accurate probability assessment. My personal answer then would be to discard probability assessments that cannot be justified, unless you have sufficient trust in your intuition about the statement in question.
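
One hedged way to cash out "sufficient trust in your intuition" (my own sketch, not something proposed in the thread): audit a track record of intuitive forecasts with a proper scoring rule, and keep relying on the intuition only where it scores well.

```python
# Brier score of a forecast track record: mean squared distance between
# stated probabilities and the 0/1 outcomes. Lower is better; always
# answering 0.5 scores 0.25, so doing better is evidence of calibration.
def brier_score(forecasts):
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

track_record = [(0.4, 0), (0.9, 1), (0.7, 1), (0.2, 0)]
print(brier_score(track_record))  # 0.075 - better than chance here
```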

What is your thinking on this prong of the dilemma (retracting your assessment of reasonableness on these probability assessments for which you have no justification)?