Thanks for raising this important point. When modeling these situations carefully, we need to give terms like "today" a precise semantics that's well-defined for the agent. With proper semantics established, we can examine what credences make sense under different ways of handling indexicals. Matthias Hild's paper "Auto-epistemology and updating" demonstrates how to carefully construct time-indexed probability updates. We could then add centered worlds or other approaches for self-locating probabilities.
Some cases might lead to puzzles, particularly where epistemic fixed points don't exist. This might push us toward modeling credences differently or finding other solutions. But once we properly formalize "today" as an event, we can work on satisfying richness conditions. Whether this leads to inconsistent attitudes depends on what constraints we place on those attitudes - something that reasonable people might disagree about, as debates over Sleeping Beauty suggest.
Ah, so not like, A is strongly preferred to B and B is strongly preferred to A, but more of a violation of transitivity. Then I still think that the Broome paper is a place I'd look at, since you get that exact kind of structure in preference aggregation.
The Bradley paper assumes everything is transitive throughout, so I don't think you get the kind of structure you want there. I'm not immediately aware of any work on that kind of inconsistency in JB that isn't in the social choice context, but there might be some. I'll take a look.
There are ways to think about degrees and measures of incoherence, and how that connects up to decision making. I'm thinking mainly of this paper by Schervish, Seidenfeld, and Kadane, Measures of Incoherence: How Not to Gamble if You Must. There might be a JB-style version of that kind of work, and if there isn't, I think it would be good to have one.
But as for your core goal of weakening the preference axioms to more realistic standards, you can definitely do that in JB while still keeping the background objects of preference as propositions in a single algebra. I think this would still preserve much of what I consider the naturalistic advantages of the JB system. For modifying the preference axioms, I would guess that descriptively you might want something like prospect theory, or something else along those broad lines. It also depends on what kinds of agents we want to describe.
The JB framework as standardly formulated assumes complete and consistent preferences. Of course, you can keep the same JB-style objects of preference (the propositions) and modify the preference axioms. For incomplete preferences, there's a nice paper by Richard Bradley, Revising Incomplete Attitudes, that looks at incomplete attitudes in a very Jeffrey-Bolker-style framework (all prospects are propositions). It has a nice discussion of different things that might lead to incompleteness (one of which is "Ignorance", related to the kind of Knightian uncertainty you asked about), and also some results and perspectives on attitude changes for imprecise Bayesian agents.
I'm less sure about inconsistent preferences - it depends what exactly you mean by that. Something related might be work on aggregating preferences, which can involve aggregating preferences that disagree and so look inconsistent. John Broome's paper Bolker-Jeffrey Expected Utility Theory and Axiomatic Utilitarianism is excellent on this - it examines both the technical foundations of JB and its connections to social choice and utilitarianism, proving a version of the Harsanyi Utilitarian Theorem in JB.
On imprecise probabilities: the JB framework actually has a built-in form of imprecision. Without additional constraints, the representation theorem gives non-unique probabilities (this is part of Bolker's uniqueness theorem). You can get uniqueness by adding extra conditions, like unbounded utility or primitive comparative probability judgments, but the basic framework allows for some probability imprecision. I'm not sure about deeper connections to infraprobability/Bayesianism, but given that these approaches often involve sets of probabilities, there may be interesting connections to explore.
Yes, austerity does have an interesting relationship with counterfactuals, which I personally consider a feature, not a bug. A strong version of austerity would rule out certain kinds of counterfactuals, particularly those that require considering events the agent is certain won't happen. This is because austerity requires us to only include events in our model that the agent considers genuinely possible.
However, this doesn't mean we can't in many cases make sense of apparently counterfactual reasoning. Often when we say things like "you should have done B instead of A" or "if I had chosen differently, I would have been richer", we're really making forward-looking claims about similar future situations rather than genuine counterfactuals about what could have happened.
For example, imagine a sequence of similar decision problems (similar as in, you view what you learn in one decision problem as informative about the others, in a straightforward way) where you must choose between rooms A and B (then A' and B', etc.), where one contains $100 and the other $0. After entering a room, you learn what was in both rooms before moving to the next choice. When we say "I made the wrong choice - I should have entered room B!" (for example, after learning that you chose the room with less money), from an austerity perspective we might reconstruct the useful part of this reasoning as not really making a claim about what could have happened. Instead, we're learning about the expected value of similar choices for future decisions, and considering the counterfactual is just an intuitive heuristic for doing that. If what was in room A is indicative of what will be in A', then this apparent counterfactual reasoning is actually forward-looking learning that informs future choices. Of course, not all uses of counterfactuals admit this kind of reconstruction, but many of the ones that seem useful do.
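To make that reconstruction concrete, here is a minimal sketch (my own toy illustration; the payoffs, the 0.3 bias, and the simple running-mean update are all made-up assumptions) of how "I should have entered room B!" cashes out as an update to forward-looking value estimates rather than a claim about what could have happened:

```python
import random

# The agent keeps a running estimate of the expected value of "rooms like A"
# and "rooms like B", updates those estimates after seeing what both rooms
# contained, and uses them for the next, similar choice. The apparent
# counterfactual "I should have entered B!" is cashed out as this update.

def run_rounds(n_rounds=10, seed=0):
    rng = random.Random(seed)
    estimates = {"A": 50.0, "B": 50.0}  # prior guesses at each room type's value
    counts = {"A": 1, "B": 1}           # pseudo-counts for a simple running mean
    for _ in range(n_rounds):
        choice = max(estimates, key=estimates.get)  # enter the better-looking room
        # Toy assumption: rooms of type B hold the $100 more often than type A.
        contents = {"A": 100, "B": 0} if rng.random() < 0.3 else {"A": 0, "B": 100}
        # After entering, the agent learns what was in *both* rooms ...
        for room, value in contents.items():
            counts[room] += 1
            estimates[room] += (value - estimates[room]) / counts[room]
        # ... and the updated estimates inform the next choice (A' vs B').
        print(choice, contents[choice], estimates)

run_rounds()
```

Nothing hinges on the particular numbers; the point is only that the useful residue of the "counterfactual" is an update that improves the next, similar decision.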
It's also worth noting that while austerity constrains counterfactuals, the JB framework can still accommodate causal decision theory approaches (like Joyce's or Bradley's versions) that many find attractive, and so in a sense allows certain kinds of decision-theoretic counterfactuals. Now, I think one could push back on austerity grounds even here, and I do think that some versions of CDT implemented in JB would run afoul of certain strong interpretations of austerity. However, I'd say that even with these additions, JB remains more austere than Savage's framework, which forces agents to rank clearly impossible acts.
The core insight is that we can capture much of the useful work done by counterfactual reasoning without violating austerity by reinterpreting apparently counterfactual claims as forward-looking learning opportunities.
Thanks for this example. I'm not sure if I fully understand why this is supposed to pose a problem, but maybe it helps to say that by "meaningfully consider" we mean something like, is actually part of the agent's theory of the world. In your situation, since the agent is considering which envelope to take, I would guess that to satisfy richness she should have a credence in the proposition.
I think (maybe?) what makes this case tricky or counterintuitive is that the agent seems to lack any basis for forming beliefs about which envelope contains the money - their memory is erased each time and the location depends on their previous (now forgotten) choice.
However, this doesn't mean they can't or don't have credences about the envelope contents. From the agent's subjective perspective upon waking, they might assign 0.5 credence to each envelope containing the money, reasoning that they have no information to favor either envelope. Or they might have some other credence distribution based on their (perhaps incorrect) theory of how the experiment works.
The richness condition simply requires that if the agent does form such credences, they should be included in their algebra. We're not making claims about what propositions an agent should be able to form credences about, nor about whether those credences are well-calibrated. The framework aims to represent the agent's actual beliefs about the world, as in, how things are or might be from the agent's perspective, even in situations where forming accurate beliefs might be difficult or impossible.
This also connects to the austerity condition - if the agent truly believes it's impossible for them to have any credence about the envelope contents, then such propositions wouldn't be in their algebra. But that would be quite an unusual case, since most agents will form some beliefs in such situations, even if those beliefs end up being incorrect or poorly grounded.
This post is partially motivated by a comment I made on AnnaSalamon's post about alternatives to VNM. There I wanted to consider not just decision rule alternatives to VNM, but also decision framework alternatives to VNM. I hope that this post helps demonstrate that there can be value in thinking about the underlying frameworks we use.
I agree with both of you --- QM is one of our most successful physical theories, and we should absolutely take it seriously! We de-emphasized QM in the post so we could focus on the de Finetti perspective, and what it teaches us about chance in many contexts. QM is also very much worth discussing --- it would just be a longer, different, more nuanced post.
It is certainly true that certain theories of QM --- such as the GRW one mentioned in footnote 8 of the post --- do have chance as a fundamental part of the theory. Insofar as we assign positive probability to such theories, we should not rule out chance as being part of the world in a fundamental way.
Indeed, we tried to point out in the post that the de Finetti theorem doesn't rule out chances, it just shows we don't need them in order to apply our standard statistical reasoning. In many contexts --- such as the first two bullet points in the comment to which I am replying --- I think that the de Finetti result gives us strong evidence that we shouldn't reify chance.
I also think --- and we tried to say this in the post --- that it is an open question and a matter of active debate how far this very pragmatic reduction of chance can extend to the QM context. Indeed, it might very well be that the last two bullet points above do involve chance being genuinely in the territory.
So I suspect we pretty much agree on the broad point --- QM definitely gives us some evidence that chances are really out there, but there are also non-chancey candidates. We tried to mention QM and indicate that things get subtle there without letting it distract from the main text.
Some remarks on the other parts of the comments are below, but they are more for fun & completeness, as they get in the weeds a bit.
***
In response to the discussion of whether adding or removing randomness makes something more complex: we didn't make any such claim in the post.
Complexity isn't a super motivating property for me in thinking about fundamental physics. Though I do find the particular project of thinking about randomness in QM really interesting --- here is a paper I enjoy that shows things can get pretty tricky.
I also agree that how different theories of QM interact with the constraints of special relativity (esp. locality) is very important for evaluating the theory.
With respect to the many worlds interpretation, at least Everett himself was clear that he thought his theory didn't involve probability (though of course we don't have to blindly agree with what he says about his version of many worlds --- he could be wrong about his own theory, or we could be considering a slightly different version of many worlds). This paper of his is particularly clear about this point. At the bottom of page 18 he discusses the use of probability theory mathematically in the theory, and writes:
"Now probability theory is equivalent to measure theory mathematically, so that we can make use of it, while keeping in mind that all results should be translated back to measure theoretic language."
Jeff Barrett, whose book I linked to in the QM footnote in the main text, and whose annotations are present in the linked document, describes the upshot of this remark (in a comment):
"The reason that Everett insists that all results be translated back to measure theoretic language is that there are, strictly speaking, no probabilities in pure wave mechanics; rather, the measure derived above provides a standard of typicality for elements in the superposition and hence for relative facts."
In general, Everett thought "typicality" a better way to describe the norm-squared amplitude of a branch in his theory. On Everett's view, it would not be appropriate to conflate a physical quantity (typicality) with probability (the kind of thing that guides our actions in an EU way and drives our epistemology in a Bayesian way), even if they obey the same mathematics.
In general, my understanding is that in many worlds you need to add some kind of rationality principle or constraint to an agent in the theory so that you get out the Born rule probabilities, either via self-locating uncertainty (as the previous comment suggested) or via a kind of decision theoretic argument. For example, here is a paper that uses an Epistemic Separability Principle to yield the required probabilities. Here is another paper that takes the more decision theoretic approach, introducing particular axioms of rationality for the many worlds context. So while I absolutely agree that there are attractive strategies for getting probability out of many worlds, they tend to involve some rationality principles/constraints, which aren't themselves supplied by the theory, and which make it look a bit more like the probability is in the map in those cases. Though, of course, as an aspiring empiricist, I want my map to be very receptive to the territory. If there is some relevant structure in the territory that constrains my credences, in conjunction with some rationality principles, then that seems useful.
But a lot of these remarks are very in the weeds, and I am very open to changing my mind about any of them. It is a very subtle topic.
I wouldn't say that is a clear exception. There are perfectly normal, subjective-probability ways to make sense of mixed strategies in game theory. For example, this paper by Aumann and Brandenburger provides epistemic conditions for Nash equilibrium that don't require players to randomize using objective probabilities. From their paper:
"Mixed strategies are treated not as conscious randomizations, but as conjectures, on the part of other players, as to what a player will do." (p. 1161)
In slightly more detail:
"According to [our] view, players do not randomize; each player chooses some definite action. But other players need not know which one, and the mixture represents their uncertainty, their conjecture about his choice. This is the context of our main results, which provide sufficient conditions for a probability of conjectures to constitute a Nash equilibrium." (p. 1162)
Interestingly, this paper is very motivated by embedded agency type concerns. For example, on page 1174 they write:
"Though entirely apt, use of the term “state of the world” to include the actions of the players has perhaps caused confusion. In Savage (1954), the decision maker cannot affect the state; he can only react to it. While convenient in Savage’s one person context, this is not appropriate in the interactive, many-person world under study here. Since each player must take into account the actions of the others, the actions should be included in the description of the state. Also the plain, everyday meaning of the term “state of the world” includes one’s actions: Our world is shaped by what we do. It has been objected that prescribing what a player must do at a state takes away his freedom. This is nonsensical; the player may do what he wants. It is simply that whatever he does is part of the description of the state. If he wishes to do something else, he is heartily welcome to do it, but he thereby changes the state."
In general, getting back to reflective oracles, indeed I think that is one way that one might try to provide a formalism underlying some application of game theory! And I think it is a very interesting one. But, as the Aumann and Brandenburger paper shows, there are totally normal ways to do this without fundamental chance. They have some references in their paper to other papers with this perspective, and it forms one of many motivations for the approach of epistemic game theory.
And, in general, I would resist the inference from "this kind of reasoning requires the world to be a certain way" to "the world must be a certain way".
Edit: Lightly edited for typos.
Absolutely!
I often think that we use the infinite to approximate the very large but finite, and I think that is a good way to think about the de Finetti theorem for finite sequences. In particular, every finite sequence of exchangeable random variables is equivalent to a mixture over random sampling without replacement. As the length grows, the difference between i.i.d. sampling and sampling without replacement gets smaller (in a precise sense). This paper by Diaconis and Freedman on finite de Finetti looks at the variation distance between the mixture of i.i.d. distributions and sampling without replacement in the context of finite sequences, as the length of the finite sequences grows.
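As a rough numerical illustration of that convergence (my own sketch, using the marginal count of successes as a proxy; the urn sizes and the 0.5 success fraction are arbitrary choices of mine, not anything from the paper):

```python
from scipy.stats import binom, hypergeom

def tv_distance(n_urn, k_draws, frac_success=0.5):
    """Total variation distance between the number of successes in k draws
    without replacement (hypergeometric) and k i.i.d. draws (binomial)."""
    n_success = int(n_urn * frac_success)
    support = range(k_draws + 1)
    without_repl = [hypergeom.pmf(x, n_urn, n_success, k_draws) for x in support]
    iid = [binom.pmf(x, k_draws, n_success / n_urn) for x in support]
    return 0.5 * sum(abs(a - b) for a, b in zip(without_repl, iid))

# As the urn grows (with the fraction of successes held fixed), sampling
# without replacement becomes harder and harder to tell apart from i.i.d.
for n_urn in [20, 100, 1000, 10000]:
    print(n_urn, tv_distance(n_urn, k_draws=10))
```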
I'm also aware of work that tries to recover an exact version of a de Finetti style result in the finite context. This paper by Kerns and Székely extends the notion of mixture and shows that with respect to that notion you get a de Finetti style representation for any finite sequence.
Thanks for the post. I think there are at least two ways that we could try to look for different rational-mind-patterns, in the way this post suggests. The first is to keep the underlying mathematical framework of options the same (in VNM, the set of gambles, outcomes, etc.), and look at different patterns of preference/behaviour/valuing/etc. The second is to change the background mathematical framework in order to have more realistic, or at the very least different, idealizing assumptions. Within a different framework we can then also explore different preference structures, behaviour norms, etc. Here I will focus more on the second approach.
In particular, I want to point folks towards a decision theory framework that I think has a lot of virtues (no doubt many readers on LW will already be familiar with it). The Jeffrey-Bolker framework provides a concrete example of the kind of alternative "mathematically describable mind-patterns" that the post and clarifying comment talk about. Like the VNM framework, it proves that rational preferences can be represented as expected utility maximization (although things are subtle, as we condition on acts rather than treating them as exogenous probability distributions/functions/random objects, so the mathematics is a bit different). But it does so with very different assumptions about agency, and about the background conceptual space in which preference and agency operate.
I have a write-up here of some key differences between Jeffrey-Bolker and Savage (which is ~kind of~ like a subjective probability version of VNM) that I find exciting from an embedded agency point of view. Here are two quick examples. First, VNM requires a very rich domain of preference – typically preference is defined over all probability distributions over consequences. Savage similarly requires that an agent have preferences defined over all functions from states to consequences. This forces agents to rank logically impossible or causally incoherent scenarios. Jeffrey-Bolker instead only requires preferences over propositions closed under logical operations, allowing agents to evaluate only the scenarios they consider possible. Second, Savage-style approaches require act-state independence - agents can't think their actions influence the world's state. Jeffrey-Bolker drops this, letting agents model themselves as part of the world they're reasoning about. Both differences stem from Jeffrey-Bolker's core conceptual/formal innovation: treating acts, states, and consequences as the same type of object in a unified algebra, rather than as fundamentally different things.
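Here is a very rough type-level sketch of that last contrast (my own gloss, with made-up worlds and labels, not anything from the JB literature): in a Savage-style setup the objects of preference are all functions from states to consequences, while in Jeffrey-Bolker everything the agent evaluates is a proposition (a set of worlds) in a single algebra.

```python
from itertools import product

STATES = ["rain", "shine"]
CONSEQUENCES = ["wet", "dry"]

# Savage-style: an act is *any* function from states to consequences, and the
# agent is asked to rank all of them, coherent or not.
savage_acts = list(product(CONSEQUENCES, repeat=len(STATES)))  # 4 acts here

# Jeffrey-Bolker-style: start from worlds; every object of preference is a
# proposition, i.e. a set of worlds, closed under Boolean operations.
WORLDS = [("rain", "umbrella", "dry"), ("rain", "no-umbrella", "wet"),
          ("shine", "umbrella", "dry"), ("shine", "no-umbrella", "dry")]

def proposition(pred):
    return frozenset(w for w in WORLDS if pred(w))

take_umbrella = proposition(lambda w: w[1] == "umbrella")  # an "act"
it_rains = proposition(lambda w: w[0] == "rain")           # a "state"
stay_dry = proposition(lambda w: w[2] == "dry")            # a "consequence"

# All three are the same kind of object, so they combine freely and the agent
# never has to rank anything outside this algebra of possibilities.
print((take_umbrella & it_rains) <= stay_dry)  # True
```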
Considering the Jeffrey-Bolker framework is valuable in two ways: first, as an alternative 'stable attractor' for minds that avoids VNM's peculiarities, and second, as a framework within which we can precisely express different decision theories like EDT and CDT. This highlights how progress on alternative models of agency requires both exploring different background frameworks for modeling rational choice AND exploring different decision rules within those frameworks. Rather than assuming minds must converge to VNM-style agency as they get smarter, we should actively investigate what shape the background decision context takes for real agents.
For more detail in general and on my take in particular, I recommend:
My writeup with Gerard about Jeffrey-Bolker (linked above as well).
The first chapter of my thesis, which gives my approach to embedded agency and my take of why Jeffrey-Bolker in particular is a very attractive decision theory.
This great writeup of different decision theory frameworks by Fishburn gives a sense of how many different alternatives to VNM there are. It ends with a brief description of Jeffrey-Bolker, but with more detail about earlier decision frameworks.
Thank you for this. Yes, the problem is that we think it can sometimes be difficult to specify what the probability distribution would be without the agent. One strategy would be to define some kind of counterfactual distribution that would obtain if there were no agent, but then we need to have some principled way to get this counterfactual (which might be possible). I think this is easier in situations in which the presence of an agent/optimizer is only one possibility, in which case we have a defined probability distribution, conditional on there not being an agent. Perhaps that is all that matters (I am somewhat partial to this), but then I don't think of this as giving us a definition of an optimizing system (since, conditional on there being an optimizing system, there would cease to be an optimizing system---for a similar idea, see Vingean Agency).
I like your suggestions for connecting (1) and (3).
And thanks for the correction!
Thanks for this. We agree it’s natural to think that a stronger optimizer means less information from seeing the end state, but the question shows up again here. The general tension is that one way of thinking about optimization is something like: the optimizer has a high probability of hitting a narrow target. But the narrowness notion is often what is doing the work in making this seem intuitive, and under seemingly relevant notions of narrowness (how likely this set of outcomes is to be realized), the set of outcomes we wanted to call narrow is, in fact, not narrow at all. The lesson we take is that a lot of the ways we want to measure the underlying space rely on choices we make in describing the (size of the) space. If the choices reflect our uncertainty, then we get the puzzle we describe.

I don't see how moving to thinking in terms of entropy would address this. Given that we are working in continuous spaces, I think one way to see that we often make choices like this, even with entropy, is to look at continuous generalizations of entropy. When we move to the continuous case, things become more subtle. Differential entropy (the most natural generalization) lacks some of the important properties that make entropy a useful measure of uncertainty (it can be negative, and it is not invariant under continuous coordinate transformations). You can move to relative entropy to try to fix these problems, but this depends on a choice of an underlying measure m. What we see in both these cases is that the generalizations of entropy --- both differential and relative --- rely on some choice of a way to describe the underlying space (for differential, it is the choice of coordinate system; for relative, the underlying measure m).
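A standard textbook illustration of this coordinate-dependence (my addition, not something from the post): for $X \sim \mathrm{Uniform}(0,a)$ the differential entropy is

$$h(X) = -\int_0^a \frac{1}{a}\log\frac{1}{a}\,dx = \log a,$$

which is negative whenever $a < 1$, and under the rescaling $Y = cX$ (a mere change of coordinates, with $c > 0$) it shifts to

$$h(Y) = h(X) + \log c.$$

So the number we get depends on how we parameterize the space, and relative entropy removes this dependence only by building the choice of reference measure $m$ into the definition.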