My use of 'next' need not be read temporally, though it could be. You might simply want to define a transitive preference relation for the agent over {A,A+,B,B+} in order to predict what it would choose in an arbitrary static decision problem. Only the incomplete one I described works no matter what the decision problem ends up being.
As a general point, you can always look at a decision ex post and back out different ways to rationalise it. The nontrivial task here is prediction, using features of the agent.
If we want an example of sequential choice using decision trees (rather than repeated 'de novo' choice through e.g. unawareness), it'll be a bit more cumbersome but here goes.
Intuitively, suppose the agent first picks from {A,B+} and then, in addition, from {A+,B}. It ends up with two elements from {A,A+,B,B+}. Stated within the framework:
- The set of possible prospects is X = {A,A+,B,B+}×{A,A+,B,B+}, where elements are pairs.
- There's a tree where, at node 1, the agent picks among paths labeled A and B+.
- If A is picked, then at the next node, the agent picks from terminal prospects {(A,A+),(A,B)}. And analogously if path B+ is picked.
- The agent has appropriately separable preferences: (x,y) ≿ (x',y') iff x ⊵ x'' and y ⊵ y'' for some permutation (x'',y'') of (x',y'), where ⊵ is a relation over components.
Then (A+,x) ≻ (A,x), while (A,x) and (B,x) are incomparable, for any prospect component x, and so on for other comparisons. This is how separability makes it easy to say "A+ is preferred to A" even though preferences are defined over pairs in this case. I.e., we can construct ≿ over pairs out of some ⊵ over components.
In this tree, the available prospects from the outset are (A,A+), (A,B), (B+,A+), (B+,B).
Using the same ≿ as before, the (dynamically) maximal ones are (A,A+), (B+,A+), (B+,B).
But what if, instead of positing incomparability between A and B+, we instead said the agent was indifferent? By transitivity, we'd infer A ≻ B and thus A+ ≻ B. But then (B+,B) wouldn't be maximal. We'd incorrectly rule out the possibility that the agent goes for (B+,B).
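To make this concrete, here's a minimal sketch in Python of the example above; the encoding of the component relation and of the permutation rule is my own, not the post's formalism:

```python
from itertools import permutations

def weakly_prefers(x, y, indifferent_A_Bplus=False):
    """Component relation: A+ > A and B+ > B; A-type and B-type components are
    incomparable, unless we stipulate indifference between A and B+."""
    if x == y:
        return True
    if indifferent_A_Bplus:
        # A ~ B+, closed under transitivity with A+ > A and B+ > B.
        rank = {'B': 0, 'A': 1, 'B+': 1, 'A+': 2}
        return rank[x] >= rank[y]
    return (x, y) in {('A+', 'A'), ('B+', 'B')}   # cross-letter pairs: incomparable

def pair_weakly_prefers(p, q, **kw):
    # (x, y) is weakly preferred to (x', y') iff the components weakly dominate
    # under some permutation of (x', y').
    return any(weakly_prefers(p[0], q2[0], **kw) and weakly_prefers(p[1], q2[1], **kw)
               for q2 in permutations(q))

def maximal(options, **kw):
    def strictly(p, q):
        return pair_weakly_prefers(p, q, **kw) and not pair_weakly_prefers(q, p, **kw)
    return [p for p in options if not any(strictly(q, p) for q in options)]

available = [('A', 'A+'), ('A', 'B'), ('B+', 'A+'), ('B+', 'B')]
print(maximal(available))                            # incomparability: three maximal pairs
print(maximal(available, indifferent_A_Bplus=True))  # indifference: (B+, B) drops out
```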
Yes that's right (regardless of whether it's resolute or whether it's using 'strong' maximality).
One sort of decision tree where the agent isn't representable as having complete preferences is the one you provide here. We can even put the dynamic aspect aside to make the point. Suppose that the agent is in fact inclined to pick A+ over A, but doesn't favour or disfavour B relative to either one. Here's my representation: maximal choice with A+ ≻ A, and with B incomparable to both A and A+. As a result, I will correctly predict its behaviour: it'll choose something other than A.
Can I also do this with another representation, using a complete preference relation? Let's try out indifference between A+ and B. I'd indeed make the same prediction in this particular case. But suppose the agent were next to face a choice between A+, B, and B+ (where the latter is a sweetening of B). By transitivity, we know B+ ≻ A+, and so this representation would predict that B+ would be chosen for sure. But this is wrong, since in fact the agent is not inclined to favour B-type prospects over A-type prospects. In contrast, the incomplete representation doesn't make this error.
Summing up: the incomplete representation works for {A+,A,B} and {A+,B,B+} while the only complete one that also works for the former fails for the latter.
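Here's a small check of that claim, with my own encoding of the two representations:

```python
# The incomplete representation predicts correctly for both menus, while the
# completion that sets A+ ~ B fails on the second menu.

def maximal(options, weakly_prefers):
    """Options that nothing else in the menu is strictly preferred to."""
    def strictly(x, y):
        return weakly_prefers(x, y) and not weakly_prefers(y, x)
    return {x for x in options if not any(strictly(y, x) for y in options)}

# Incomplete representation: A+ > A, B+ > B, no comparison across letters.
def incomplete(x, y):
    return x == y or (x, y) in {('A+', 'A'), ('B+', 'B')}

# Complete alternative: indifference between A+ and B, closed under transitivity.
rank = {'A': 0, 'B': 1, 'A+': 1, 'B+': 2}
def complete(x, y):
    return rank[x] >= rank[y]

for menu in [{'A', 'A+', 'B'}, {'A+', 'B', 'B+'}]:
    print(menu, '->', maximal(menu, incomplete), 'vs', maximal(menu, complete))
# Menu {A, A+, B}:  both predict a choice from {A+, B} -- fine.
# Menu {A+, B, B+}: incomplete keeps {A+, B+}; the complete version says B+ only,
#                   wrongly ruling out A+.
```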
Thanks. Let me end with three comments. First, I wrote a few brief notes here that I hope clarify how Independence and IIA differ. Second, I want to stress that the problem with the use of Dutch books in the articles is a substantial one, not just a verbal one, as I explained here and here. Finally, I’m happy to hash out any remaining issues via direct message if you’d like—whether it’s about these points, others I raised in my initial comment, or any related edits.
I don't appreciate the hostility. I aimed to be helpful in spending time documenting and explaining these errors. This is something a healthy epistemic community is appreciative of, not annoyed by. If I had added mistaken passages to Wikipedia, I'd want to be told, and I'd react by reversing them myself. If any points I mentioned weren't added by you, then as I wrote in my first comment:
...let me know that some of the issues I mention were already on Wikipedia beforehand. I’d be happy to try to edit those.
The point of writing about the mistakes here is to make clear why they indeed are mistakes, so that they aren't repeated. That has value. And although I don't think we should encourage a norm that those who observe and report a problem are responsible for fixing it, I will try to find and fix at least the pre-existing errors.
I agree that there exists the dutch book theorem, and that that one importantly relates to probabilism
I'm glad we could converge on this, because that's what I really wanted to convey.[1] I hope it's clearer now why I included these as important errors:
- The statement that the vNM axioms “apart from continuity, are often justified using the Dutch book theorems” is false since these theorems only relate to belief norms like probabilism. Changing this to 'money pump arguments' would fix it.
- There's a claim on the main Dutch book page that the arguments demonstrate that “rationality requires assigning probabilities to events [...] and having preferences that can be modeled using the von Neumann–Morgenstern axioms.” I wouldn't have said it was false if this was about money pumps.[2] I would've said there was a terminological issue if the page equated Dutch books and money pumps. But it didn't.[3] It defined a Dutch book as "a set of bets that ensures a guaranteed loss." And the theorems and arguments relating to that do not support the vNM axioms.
Would you agree?
- ^
The issue of which terms to use isn't that important to me in this case, but let me speculate about something. If you hear domain experts go back and forth between 'Dutch books' and 'money pumps', I think that is likely either because they are thinking of the former as a special case of the latter without saying so explicitly, or because they're listing off various related ideas. If that's not why, then they may just be mistaken. After all, a Dutch book is named that way because a bookie is involved!
- ^
Setting aside that "demonstrates" is too strong even then.
- ^
It looks like OP edited the page just today and added 'or money pump'. But the text that follows still describes a Dutch book, i.e. a set of bets. (Other things were added too that I find problematic but this footnote isn't the place to explain it.)
I think it'll be helpful to look at the object level. One argument says: if your beliefs aren't probabilistic but you bet in a way that resembles expected utility, then you're susceptible to sure loss. This forms an argument for probabilism.[1]
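Here's a toy numeric version of that first argument (the numbers are mine, not from the thread):

```python
# Non-additive credences, priced in an expected-value-like way, let a bookie
# guarantee a profit off the agent.

credence = {'E': 0.6, 'not E': 0.6}   # violates additivity: the two sum to 1.2

# The agent values a ticket paying $1 if X at $credence[X], so it buys each
# ticket at that price.
price_paid = credence['E'] + credence['not E']

for outcome in ('E', 'not E'):
    payout = 1.0                      # exactly one of the two tickets pays $1
    net = payout - price_paid
    print(f'{outcome}: agent nets {net:+.2f}')   # -0.20 either way: a Dutch book
```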
Another argument says: if your preferences don't satisfy certain axioms but satisfy some other conditions, then there's a sequence of choices that will leave you worse off than you started. This forms an argument for norms on preferences.
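And a toy version of the second, using cyclic preferences as the violated norm (again my own example):

```python
# An agent with cyclic strict preferences A > B > C > A, willing to pay a small
# fee for anything it strictly prefers, ends up where it started but poorer.

strictly_prefers = {('A', 'B'), ('B', 'C'), ('C', 'A')}   # a preference cycle
fee = 0.01

holding, money = 'C', 0.0
for offer in ['B', 'A', 'C']:          # each offer is strictly preferred to the current holding
    assert (offer, holding) in strictly_prefers
    holding, money = offer, money - fee

print(holding, round(money, 2))        # back to 'C', but 0.03 poorer: a money pump
```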
These are distinct.
These two different kinds of arguments have things in common. But they are not the same argument applied in different settings. They have different assumptions, and different conclusions. One is typically called a Dutch book argument; the other a money pump argument. The former is sometimes referred to as a special case of the latter.[2] But whatever our naming conventions, it's a special case that doesn't support the vNM axioms.
Here's why this matters. You might read the assumptions of the Dutch book theorem, and find them compelling. Then you read an article telling you that this implies the vNM axioms (or constitutes an argument for them). If you believe it, you've been duped.
- ^
(More generally, Dutch books exist to support other Bayesian norms like conditionalisation.)
- ^
This distinction is standard and blurring the lines leads to confusions. It's unfortunate when dictionaries, references, or people make mistakes. More reliable would be a key book on money pumps (Gustafsson 2022) referring to a key book on Dutch books (Pettigrew 2020):
"There are also money-pump arguments for other requirements of rationality. Notably, there are money-pump arguments that rational credences satisfy the laws of probability. (See Ramsey 1931, p. 182.) These arguments are known as Dutch-book arguments. (See Lehman 1955, p. 251.) For an overview, see Pettigrew 2020." [Footnote 9.]
check the edit history yourself by just clicking on the "View History" button and then pressing the "cur" button
Great, thanks!
I hate to single out OP but those three points were added by someone with the same username (see first and second points here; third here). Those might not be entirely new but I think my original note of caution stands.
Scott Garrabrant rejects the Independence of Irrelevant Alternatives axiom
*Independence, not IIA. Wikipedia is wrong (as of today).
I appreciate the intention here but I think it would need to be done with considerable care, as I fear it may have already led to accidental vandalism of the epistemic commons. Just skimming a few of these Wikipedia pages, I’ve noticed several new errors. These can be easily spotted by domain experts but might not be obvious to casual readers.[1] I can’t know exactly which of these are due to edits from this community, but some very clearly jump out.[2]
I’ll list some examples below, but I want to stress that this list is not exhaustive. I didn’t read most parts of most related pages, and I omitted many small scattered issues. In any case, I’d like to ask whoever made any of these edits to please reverse them, and to triple check any I didn’t mention below.[3] Please feel free to respond to this if any of my points are unclear![4]
False statements
The page on Independence of Irrelevant Alternatives (IIA) claims that IIA is one of the vNM axioms, and that one of the vNM axioms “generalizes IIA to random events.”
Both are false. The similar-sounding Independence axiom of vNM is neither equivalent to, nor does it entail, IIA (and so it can’t be a generalisation). You can satisfy Independence while violating IIA. This is a not a technicality; it’s a conflation of distinct and important concepts. This is repeated in several places.
- The mathematical statement of Independence there is wrong. In the section conflating IIA and Independence, it’s defined as the requirement that pN + (1−p)Bad ≺ pN + (1−p)Good for any p ∈ [0,1] and any outcomes Bad, Good, and N satisfying Bad≺Good. This mistakes weak preference for strict preference. To see this, set p=1 and observe that the line now reads N≺N. (The rest of the explanation in this section is also problematic, but the reasons for this are less easy to briefly spell out.)
- The Dutch book page states that the argument demonstrates that “rationality requires assigning probabilities to events [...] and having preferences that can be modeled using the von Neumann–Morgenstern axioms.” This is false. It is an argument for probabilistic beliefs; it implies nothing at all about preferences. And in fact, the standard proof of the Dutch book theorem assumes something like expected utility (Ramsey’s thesis).
This is a substantial error, making a very strong claim about an important topic. And it's repeated elsewhere, e.g. when stating that the vNM axioms “apart from continuity, are often justified using the Dutch book theorems.”
- The section ‘The theorem’ on the vNM page states the result using strict preference/inequality. This is a corollary of the theorem but does not entail it.
Misleading statements
- The decision theory page states that it’s “a branch of applied probability theory and analytic philosophy concerned with the theory of making decisions based on assigning probabilities to various factors and assigning numerical consequences to the outcome.” This is a poor description. Decision theorists don’t simply assume this, nor do they always conclude it—e.g. see work on ambiguity or lexicographic preferences. And besides this, decision theory is arguably more central in economics than the fields mentioned.
- The IIA article’s first sentence states that IIA is an “axiom of decision theory and economics” whereas it’s classically one of social choice theory, in particular voting. This is at least a strange omission for the context-setting sentence of the article.
- It’s stated that IIA describes “a necessary condition for rational behavior.” Maybe the individual-choice version of IIA is, but the intention here was presumably to refer to Independence. This would be a highly contentious claim though, and definitely not a formal result. It’s misleading to describe Independence as necessary for rationality.
- The vNM article states that obeying the vNM axioms implies that agents “behave as if they are maximizing the expected value of some function defined over the potential outcomes at some specified point in the future.” I’m not sure what ‘specified point in the future’ is doing there; that’s not within the framework.
- The vNM article states that “the theorem assumes nothing about the nature of the possible outcomes of the gambles.” That’s at least misleading. It assumes all possible outcomes are known, that they come with associated probabilities, and that these probabilities are fixed (e.g., ruling out the Newcomb paradox).
Besides these problems, various passages in these articles and others are unclear, lack crucial context, contain minor issues, or just look prone to leave readers with a confused impression of the topic. (This would take a while to unpack, so my many omissions should absolutely not be interpreted as green lights.) As OP wrote: these pages are a mess. But I fear the recent edits have contributed to some of this.
So, as of now, I’d strongly recommend against reading Wikipedia for these sorts of topics—even for a casual glance. A great alternative is the Stanford Encyclopedia of Philosophy, which covers most of these topics.
- ^
I checked this with others in economics and in philosophy.
- ^
E.g., the term ‘coherence theorems’ is unheard of outside of LessWrong, as is the frequency of italicisation present in some of these articles.
- ^
I would do it myself but I don’t know what the original articles said and I’d rather not have to learn the Wikipedia guidelines and re-write the various sections from scratch.
- ^
Or to let me know that some of the issues I mention were already on Wikipedia beforehand. I’d be happy to try to edit those.
Two nitpicks and a reference:
an agent’s goals might not be linearly decomposable over possible worlds due to risk-aversion
Risk aversion doesn't violate additive separability. E.g., expected utility U = Σ_i p_i·u(x_i) is additively separable across possible worlds whether u is linear (risk neutrality) or concave (risk aversion). Though some alternatives to expected utility, like Buchak's REU theory, can allow certain sources of risk aversion to violate separability.
when features have fixed marginal utility, rather than being substitutes
Perfect substitutes have fixed marginal utility. E.g., u(x,y) = x + 2y always has marginal utilities of 1 and 2.
I'll focus on linearly decomposable goals which can be evaluated by adding together evaluations of many separate subcomponents. More decomposable goals are simpler
There's an old literature on separability in consumer theory that's since been tied to bounded rationality. One move that's made is to grant weak separability across groups of objects (features) to rationalise the behaviour of optimising across groups first, and within groups second. Pretnar et al (2021) describe how this can arise from limited cognitive resources.
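As a toy illustration of that two-stage move (my own example, not Pretnar et al's model, and with no cross-group budget constraint): if overall utility is an increasing aggregate of group-level sub-utilities, optimising within each group separately recovers the global optimum.

```python
from itertools import product

snacks = {'apple': 3, 'chips': 5}          # v1: sub-utility over the snack group
drinks = {'water': 2, 'juice': 4}          # v2: sub-utility over the drink group

def F(v1, v2):                             # any strictly increasing aggregator works
    return v1 + 2 * v2

# Global (brute-force) optimisation over full bundles.
global_best = max(product(snacks, drinks), key=lambda b: F(snacks[b[0]], drinks[b[1]]))

# Two-stage optimisation: best item within each group, then combine.
two_stage = (max(snacks, key=snacks.get), max(drinks, key=drinks.get))

print(global_best, two_stage)              # both: ('chips', 'juice')
assert global_best == two_stage
```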
It may be worth thinking about why proponents of a very popular idea in this community don't know of its academic analogues, despite them having existed since the early 90s[1] and appearing on the introductory SEP page for dynamic choice.
Academics may in turn ask: clearly LessWrong has some blind spots, but how big?
I argued that the signal-theoretic[1] analysis of meaning (which is the most common Bayesian analysis of communication) fails to adequately define lying, and fails to offer any distinction between denotation and connotation or literal content vs conversational implicature.
In case you haven't come across this, here are two papers on lying by the founders of the modern economics literature on communication. I've only skimmed your discussion but if this is relevant, here's a great non-technical discussion of lying in that framework. A common thread in these discussions is that the apparent "no-lying" implication of the analysis of language in the Lewis-Skyrms/Crawford-Sobel signalling tradition relies importantly on common knowledge of rationality and, implicitly, on common knowledge of the game being played, i.e. of the available actions and all the players' preferences.
In your example, DSM permits the agent to end up with either A+ or B. Neither is strictly dominated, and neither has become mandatory for the agent to choose over the other. The agent won't have reason to push probability mass from one towards the other.
You can think of me as trying to run an obvious-to-me assertion test on code which I haven't carefully inspected, to see if the result of the test looks sane.
This is reasonable but I think my response to your comment will mainly involve re-stating what I wrote in the post, so maybe it'll be easier to point to the relevant sections: 3.1. for what DSM mandates when the agent has beliefs about its decision tree, 3.2.2 for what DSM mandates when the agent hadn't considered an actualised continuation of its decision tree, and 3.3. for discussion of these results. In particular, the following paragraphs are meant to illustrate what DSM mandates in the least favourable epistemic state that the agent could be in (unawareness with new options appearing):
It seems we can’t guarantee non-trammelling in general and between all prospects. But we don’t need to guarantee this for all prospects to guarantee it for some, even under awareness growth. Indeed, as we’ve now shown, there are always prospects with respect to which the agent never gets trammelled, no matter how many choices it faces. In fact, whenever the tree expansion does not bring about new prospects, trammelling will never occur (Proposition 7). And even when it does, trammelling is bounded above by the number of comparability classes (Proposition 10).
And it’s intuitive why this would be: we’re simply picking out the best prospects in each class. For instance, suppose prospects were representable as pairs ⟨X, Y⟩ that are comparable iff the X-values are the same, and then preferred to the extent that Y is large. Then here’s the process: for each value of X, identify the options that maximise Y. Put all of these in a set. Then choice between any options in that set will always remain arbitrary; never trammelled.
The key question is whether the revealed preferences are immune to trammelling. This was a major point of confusion for me in discussion with Sami - his proposal involves a set of preferences passed into a decision rule, but those “preferences” are (potentially) different from the revealed preferences. (I'm still unsure whether Sami's proposal solves the problem.)
I claim that, yes, the revealed preferences in this sense are immune to trammeling. I'm happy to continue the existing discussion thread but here's a short motivation: what my results about trammelling show is that there will always be multiple (relevant) options between which the agent lacks a preference and the DSM choice rule does not mandate picking one over another. The agent will not try to push probability mass toward one of those options over another.
(I learned from Sami’s post that this is called “trammelling” of incomplete preferences.)
Just for reference: this isn't a standard term of art; I made it up. Though I do think it's fitting.
Great, I think bits of this comment help me understand what you're pointing to.
the desired behavior implies a revealed preference gap
I think this is roughly right, together with all the caveats about the exact statements of Thornley's impossibility theorems. Speaking precisely here will be cumbersome so for the sake of clarity I'll try to restate what you wrote like this:
- Useful agents satisfying completeness and other properties X won't be shutdownable.
- Properties X are necessary for an agent to be useful.
- So, useful agents satisfying completeness won't be shutdownable.
- So, if a useful agent is shutdownable, its preferences are incomplete.
This argument would let us say that observing usefulness and shutdownability reveals a preferential gap.
I think the question I'm interested in is: "do trammelling-style issues imply that DSM agents will not have a revealed preference gap (under reasonable assumptions about their environment and capabilities)?"
A quick distinction: an agent can (i) reveal p, (ii) reveal ¬p, or (iii) neither reveal p nor ¬p. The problem of underdetermination of preference is of the third form.
We can think of some of the properties we've discussed as 'tests' of incomparability, which might or might not reveal preferential gaps. The test in the argument just above is whether the agent is useful and shutdownable. The test I use for my results above (roughly) is 'arbitrary choice'. The reason I use that test is that my results are self-contained; I don't make use of Thornley's various requirements for shutdownability. Of course, arbitrary choice isn't what we want for shutdownability. It's just a test for incomparability that I used for an agent that isn't yet endowed with Thornley's other requirements.
The trammelling results, though, don't give me any reason to think that DSM is problematic for shutdownability. I haven't formally characterised an agent satisfying DSM as well as TND, Stochastic Near-Dominance, and so on, so I can't yet give a definitive or exact answer to how DSM affects the behaviour of a Thornley-style agent. (This is something I'll be working on.) But regarding trammelling, I think my results are reasons for optimism if anything. Even in the least convenient case that I looked at—awareness growth—I wrote this in section 3.3. as an intuition pump:
we’re simply picking out the best prospects in each class. For instance, suppose prospects were representable as pairs ⟨X, Y⟩ that are comparable iff the X-values are the same, and then preferred to the extent that Y is large. Then here’s the process: for each value of X, identify the options that maximise Y. Put all of these in a set. Then choice between any options in that set will always remain arbitrary; never trammelled.
That is, we retain the preferential gap between the options we want a preferential gap between.
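Here's a minimal sketch of that process with my own toy prospects and labels:

```python
from collections import defaultdict

# Toy prospects: pairs (X, Y), comparable iff they share an X-value; within a
# comparability class, higher Y is strictly better. The labels are mine.
prospects = [('press', 1), ('press', 3), ('no-press', 2), ('no-press', 5)]

classes = defaultdict(list)
for x, y in prospects:
    classes[x].append((x, y))

# Collect each class's best options: choice among these is never trammelled.
untrammelled = set()
for opts in classes.values():
    top = max(y for _, y in opts)
    untrammelled |= {(x, y) for x, y in opts if y == top}

print(untrammelled)   # {('press', 3), ('no-press', 5)}: neither is ever mandated over the other
```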
[As an aside, the description in your first paragraph of what we want from a shutdownable agent doesn't quite match Thornley's setup; the relevant part to see this is section 10.1. here.]
On my understanding, the argument isn’t that your DSM agent can be made better off, but that the reason it can’t be made better off is because it is engaging in trammeling/“collusion”, and that the form of “trammeling” you’ve ruled out isn’t the useful kind.
I don't see how this could be right. Consider the bounding results on trammelling under unawareness (e.g. Proposition 10). They show that there will always be a set of options between which DSM does not require choosing one over the other. Suppose these are X and Y. The agent will always be able to choose either one. They might end up always choosing X, always Y, switching back and forth, whatever. This doesn't look like the outcome of two subagents, one preferring X and the other Y, negotiating to get some portion of the picks.
As far as an example goes, consider a sequence of actions which, starting from an unpressed world state, routes through a pressed world state (or series of pressed world states), before eventually returning to an unpressed world state with higher utility than the initial state.
Forgive me; I'm still not seeing it. For coming up with examples, I think for now it's unhelpful to use the shutdown problem, because the actual proposal from Thornley includes several more requirements. I think it's perfectly fine to construct examples about trammelling and subagents using something like this: A is a set of options with typical member a_i. These are all comparable and ranked according to their subscripts. That is, a_1 is preferred to a_2, and so on. Likewise with set B. And all options in A are incomparable to all options in B.
If your proposed DSM agent passes up this action sequence on the grounds that some of the intermediate steps need to bridge between “incomparable” pressed/unpressed trajectories, then it does in fact pass up the certain gain. Conversely, if it doesn’t pass up such a sequence, then its behavior is the same as that of a set of negotiating subagents cooperating in order to form a larger macroagent.
This looks to me like a misunderstanding that I tried to explain in section 3.1. Let me know if not, though, ideally with a worked-out example of the form: "here's the decision tree(s), here's what DSM mandates, here's why it's untrammelled according to the OP definition, and here's why it's problematic."
That makes sense, yeah.
Let me first make some comments about revealed preferences that might clarify how I'm seeing this. Preferences are famously underdetermined by limited choice behaviour. If A and B are available and I pick A, you can't infer that I like A more than B — I might be indifferent or unable to compare them. Worse, under uncertainty, you can't tell why I chose some lottery over another even if you assume I have strict preferences between all options — the lottery I choose depends on my beliefs too. In expected utility theory, beliefs and preferences together induce choice, so if we only observe a choice, we have one equation in two unknowns.[1] Given my choice, you'd need to read my mind's probabilities to be able to infer my preferences (and vice versa).[2]
In that sense, preferences (mostly) aren't actually revealed. Economists often assume various things to apply revealed preference theory, e.g. setting beliefs equal to 'objective chances', or assuming a certain functional form for the utility function.
But why do we care about preferences per se, rather than what's revealed? Because we want to predict future behaviour. If you can't infer my preferences from my choices, you can't predict my future choices. In the example above, if my 'revealed preference' between A and B is that I prefer A, then you might make false predictions about my future behaviour (because I might well choose B next time).
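To make the one-equation-in-two-unknowns point concrete, here's a toy numeric illustration (my own numbers):

```python
# The same observed choice of lottery L over the sure thing M is consistent with
# very different belief-utility pairs, so the choice alone pins down neither.

def expected_utility(p_E, u, lottery):
    return p_E * u[lottery['if_E']] + (1 - p_E) * u[lottery['if_not_E']]

L = {'if_E': 'win', 'if_not_E': 'lose'}
M = {'if_E': 'mid', 'if_not_E': 'mid'}      # a constant (riskless) lottery

agents = [
    {'p_E': 0.9, 'u': {'win': 1.0, 'lose': 0.0, 'mid': 0.5}},   # confident that E
    {'p_E': 0.2, 'u': {'win': 1.0, 'lose': 0.6, 'mid': 0.5}},   # sceptical, but 'lose' isn't so bad
]

for a in agents:
    eu_L = expected_utility(a['p_E'], a['u'], L)
    eu_M = expected_utility(a['p_E'], a['u'], M)
    print(a['p_E'], eu_L > eu_M)    # both True: both agents choose L over M
```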
Let me know if I'm on the right track for clarifying things. If I am, could you say how you see trammelling/shutdown connecting to revealed preferences as described here, and I'll respond to that?
I disagree; see my reply to John above.
if the subagents representing a set of incomplete preferences would trade with each other to emulate more complete preferences, then an agent with the plain set of incomplete preferences would precommit to act in the same way
My results above on invulnerability preclude the possibility that the agent can predictably be made better off by its own lights through an alternative sequence of actions. So I don't think that's possible, though I may be misreading you. Could you give an example of a precommitment that the agent would take? In my mind, an example of this would have to show that the agent (not the negotiating subagents) strictly prefers the commitment to what it otherwise would've done according to DSM etc.
Yeah, I wasn't using Bradley. The full set of coherent completions is overkill, we just need to nail down the partial order.
I agree the full set won't always be needed, at least when we're just after ordinal preferences, though I personally don't have a clear picture of when exactly that holds.
On John's-simplified-model-of-Thornley's-proposal, we have complete preference orderings over trajectories-in-which-the-button-isn't-pressed and trajectories-in-which-the-button-is-pressed, separately, but no preference between any button-pressed and button-not-pressed trajectory pair.
For the purposes of this discussion, this is right. I don't think the differences between this description and the actual proposal matter in this case.
Represented as subagents, those incomplete preferences require two subagents:
- One subagent always prefers button pressed to unpressed, is indifferent between unpressed trajectories, and has the original complete order on pressed trajectories.
- The other subagent always prefers button unpressed to pressed, is indifferent between pressed trajectories, and has the original complete order on unpressed trajectories.
I don't think this representation is quite right, although not for a reason I expect to matter for this discussion. It's a technicality but I'll mention it for completeness. If we're using Bradley's representation theorem from section 2.1., the set of subagents must include every coherent completion of the agent's preferences. E.g., suppose there are three possible trajectories. Let p denote a pressed trajectory and u1, u2 two unpressed trajectories, where u1 gets you strictly more coins than u2. Then there'll be five (ordinal) subagents, described in order of preference: p ≻ u1 ≻ u2; p ∼ u1 ≻ u2; u1 ≻ p ≻ u2; u1 ≻ p ∼ u2; and u1 ≻ u2 ≻ p.
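A quick enumeration confirms the count (my notation: p for the pressed trajectory, u1 and u2 for the unpressed ones):

```python
from itertools import product

items = ['p', 'u1', 'u2']
completions = set()
for levels in product(range(3), repeat=3):       # assign each trajectory a rank level
    rank = dict(zip(items, levels))              # higher level = more preferred
    if rank['u1'] <= rank['u2']:
        continue                                 # a completion must keep u1 strictly above u2
    # Record the induced weak order (ordering plus ties) so equivalent level
    # assignments aren't double-counted.
    order = tuple(sorted(items, key=lambda i: (-rank[i], i)))
    ties = tuple((a, b) for a in items for b in items if a < b and rank[a] == rank[b])
    completions.add((order, ties))

print(len(completions))   # 5
```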
But you and Thornley want to interpret "no preference" as "just choose arbitrarily/randomly" rather than "don't trade either for the other", so... this agent just chooses arbitrarily/randomly between the best-available pressed-option and the best-available unpressed-option? But that would imply that the agent is choosing (albeit arbitrarily/randomly) between button-pressed and button-unpressed, which is not what we want, so presumably you're imagining something else?
Indeed, this wouldn't be good, and isn't what Thornley's proposal does. The agent doesn't choose arbitrarily between the best pressed vs unpressed options. Thornley's proposal adds more requirements on the agent to ensure this. My use of 'arbitrary' in the post is a bit misleading in that context. I'm only using it to identify when the agent has multiple permissible options available, which is what we're after to get TND. If no other requirements are added to the agent, and it's acting under certainty, this could well lead it to actually choose arbitrarily. But it doesn't have to in general, and under uncertainty and together with the rest of Thornley's requirements, it doesn't. (The requirements are described in his proposal.)
With that flagged, on to the trammelling issue. The potential problem is that the two subagents might want to trade, so that the system sometimes tries to make the button be pressed (in cases which get relatively high pressed-utility), and sometimes tries to make the button be unpressed (in cases which get relatively high unpressed-utility). (E.g. the pink arrows in the diagram.) And... locking in a trajectory at the start doesn't seem to help that problem at all? Like, the subagents do that trading in logical time (i.e. time zero), that adds preferences, and then sometimes they lock in a plan which involves manipulating the button.
I'll first flag that the results don't rely on subagents. Creating a group agent out of multiple subagents is possibly an interesting way to create an agent representable as having incomplete preferences, but this isn't the same as creating a single agent whose single preference relation happens not to satisfy completeness.
That said, I will spend some more time thinking about the subagent idea, and I do think collusion between them seems like the major initial hurdle for this approach to creating an agent with preferential gaps.
Why does wanting to maintain indifference to shifting probability mass between (some) trajectories, imply that we care about ex-ante permissibility?
The ex-ante permissible trajectories are the trajectories that the agent lacks any strict preference between. Suppose the permissible trajectories are {A,B,C}. Then, from the agent's perspective, A isn't better than B, B isn't better than A, and so on. The agent considers them all equally choiceworthy. So, the agent doesn't mind picking any one of them over any other, nor therefore switching from one lottery over them with some distribution to another lottery with some other distribution. The agent doesn't care whether it gets A versus B, versus an even chance of A or B, versus a one-third chance of A, B, or C.[1]
Suppose we didn't have multiple permissible options ex-ante. For example, if only A was permissible, then the agent would dislike shifting probability mass away from A and towards B or C—because B and C aren't among the best options.[2] So that's why we want multiple ex-ante permissible trajectories: it's the only way to maintain indifference to shifting probability mass between (those) trajectories.
[I'll respond to the stuff in your second paragraph under your longer comment.]
- ^
The analogous case with complete preferences is clearer: if there are multiple permissible options, the agent must be indifferent between them all (or else the agent would be fine picking a strictly dominated option). So if options X_1, …, X_n are permissible, then X_1 ∼ ⋯ ∼ X_n. Assuming expected utility theory, we'll then of course have Σ_i p(X_i)·u(X_i) = Σ_i q(X_i)·u(X_i) for any probability functions p, q over these options. This means the agent is indifferent to shifting probability mass between the permissible options.
- ^
This is a bit simplified but it should get the point across.
This is a tricky topic to think about because it's not obvious how trammelling could be a worry for Thornley's Incomplete Preference Proposal. I think the most important thing to clarify is why we care about ex-ante permissibility. I'll try to describe that first (this should help with my responses to downstream concerns).
Big picture
Getting terminology out of the way: words like "permissibility" and "mandatory" are shorthand for rankings of prospects. A prospect is permissible iff it's in a choice set, e.g. by satisfying DSM. It's mandatory iff it's the sole element of a choice set.
To see why ex-ante permissibility matters, note that it's essentially a test to see which prospects the agent is either indifferent between or has a preferential gap between (and are not ranked below anything else). When you can improve a permissible prospect along some dimension and yet retain the same set of permissible prospects, for example, you necessarily have a preferential gap between those remaining prospects. In short, ex-ante permissibility tells you which prospects the agent doesn't mind picking between.
The part of the Incomplete Preference Proposal that carries much of the weight is the Timestep Near-Dominance (TND) principle for choice under uncertainty. One thing it does, roughly, is require that the agent does not mind shifting probability mass between trajectories in which the shutdown time differs. And this is where incompleteness comes in. You need preferential gaps between trajectories that differ in shutdown time for this to hold in general. If the agent had complete preferences over trajectories, it would have strict preferences between at least some trajectories that differ in shutdown time, giving it reason to shift probability mass by manipulating the button.
Why TND helps get you shutdownability is described in Thornley's proposal, so I'll refer to his description and take that as a given here. So, roughly, we're using TND to get shutdownability, and we're using incompleteness to get TND. The reason incompleteness helps is that we want to maintain indifference to shifting probability mass between certain trajectories. And that is why we care about ex-ante permissibility. We need the agent, when contemplating manipulating the button, not to want to shift probability mass in that direction. That'll help give us TND. The rest of Thornley's proposal includes further conditions on the agent such that it will in fact, ex-post, not manipulate the button. But the reason for the focus on ex-ante permissibility here is TND.
Miscellany
For purposes of e.g. the shutdown problem, or corrigibility more generally, I don't think I care about the difference between "mandatory" vs "actually chosen"?
The description above should help clear up why we care about multiple options being permissible and none mandatory: to help satisfy TND. What's "actually chosen" in my framework doesn't neatly connect to the Thornley proposal since he adds extra scaffolding to the agent to determine how it should act. But that's a separate issue.
The rough mental model I have of DSM is: at time zero, the agent somehow picks between a bunch of different candidate plans (all of which are "permissible", whatever that means), and from then on it will behave-as-though it has complete preferences consistent with that plan.
...
it sounds like the proposal in the post just frontloads all the trammelling - i.e. it happens immediately at timestep zero.
The notion of trammelling I'm using refers to the set of permissible options shrinking as a result of repeated choice. And I argued that there's no trammelling under certainty or uncertainty, and that trammelling under unawareness is bounded. Here's why I don't think you can see it as the agent behaving as if its preferences were complete.
Consider the case of static choice. It's meaningful to say that an agent has incomplete preferences. (I don't think you disagree with that but just for the sake of completeness, I'll give an example.) Suppose the agent has preferential gaps between all different-letter prospects. From {A,A+,B} the agent will pick either A+ or B. Suppose it picks B. That doesn't imply, say, that the agent can be thought of as having a strict preference for B over A+. After all, if you offered it {A,A+,B} once again, it might just pick A+, a contradiction. And you can set up something similar with transitivity to get a contradiction from inferring indifference between A+ and B.
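A tiny simulation of that static example (my own encoding, with random picking standing in for one way of resolving arbitrariness):

```python
import random

def maximal(options):
    strictly_better = {('A+', 'A')}   # the only strict preference
    return [x for x in options if not any((y, x) in strictly_better for y in options)]

menu = ['A', 'A+', 'B']
picks = [random.choice(maximal(menu)) for _ in range(10)]
print(maximal(menu), picks)           # permissible set stays {A+, B}; picks vary,
                                      # so no strict preference or indifference
                                      # between A+ and B can be read off a single pick
```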
Onto dynamic choice. As you write, it's reasonable to think of various dynamic choice principles as immediately, statically, choosing a trajectory at timestep zero. Suppose we do that. Then by the argument just above, it's still not appropriate to model the agent as having complete preferences at the time of choosing. We're not frontloading any trammelling; the set of ex-ante permissible prospects hasn't changed. And that's what we care about for TND.
I can kinda vaguely pattern-match the setup in this post to the problem: I want to have one "permissible" choice which involves the shutdown button not being pressed, and another "permissible" choice which involves the button being pressed, and I want these two choices to be incomparable to the agent. Now (my mental model of) the DSM rule says: when the agent is turned on, it somehow chooses between (two plans leading to) those two options, and from then on out acts as though it has complete preferences consistent with the choice - i.e. it either (follows a plan which) makes sure the button is pressed, or (follows a plan which) makes sure the button is not pressed, and actively prevents operators from changing it. Which sounds like not-at-all what I wanted for the shutdown problem!
Agreed! The ex-ante permissibility of various options is not sufficient for shutdownability. The rest of Thornley's proposal outlines how the agent has to pick (lotteries over) trajectories, which involves more than TND.
Thanks Sylvester; fixed!
Thanks for saying!
This is an interesting topic. Regarding the discussion you mention, I think my results might help illustrate Elliott Thornley's point. John Wentworth wrote:
That makes me think that the small decision trees implicitly contain a lot of assumptions that various trades have zero probability of happening, which is load-bearing for your counterexamples. In a larger world, with a lot more opportunities to trade between various things, I'd expect that sort of issue to be much less relevant.
My results made no assumptions about the size or complexity of the decision trees, so I don't think this itself is a reason to doubt my conclusion. More generally, if there exists some Bayesian decision tree that faithfully represents an agent's decision problem, and the agent uses the appropriate decision principles with respect to that tree, then my results apply. The existence of such a representation is not hindered by the number of choices, the number of options, or the subjective probability distributions involved.
I think my results under unawareness (section 3) are particularly likely to be applicable to complex real-world decision problems. The agent can be entirely wrong about their actual decision tree—e.g., falsely assigning probability zero to events that will occur—and yet appropriate opportunism remains and trammelling is bounded. This is because any suboptimal decision by an agent in these kinds of cases is a product of its epistemic state; not its preferences. Whether the agent's preferences are complete or not, it will make wrong turns in the same class of situations. The globally-DSM choice function will guarantee that the agent couldn't have done better given its knowledge and values, even if the agent's model of the world is wrong.
Good question. They implicitly assume a dynamic choice principle and a choice function that leaves the agent non-opportunistic.
- Their dynamic choice principle is something like myopia: the agent only looks at their node's immediate successors and, if a successor is yet another choice node, the agent represents it as some 'default' prospect.
- Their choice rule is something like this: the agent assigns some natural 'default' prospect and deviates from it iff it prefers some other prospect. (So if some prospect is incomparable to the default, it's never chosen.)
These aren't the only approaches an agent can employ, and that's where the argument fails. It's wrong to conclude that "non-dominated strategy implies utility maximization" since we know from section 2 that we can achieve non-domination without completeness—by using a different dynamic choice principle and choice function.
I take certainty to be a special case of uncertainty. Regarding proof, the relevant bit is here:
This argument does not apply when the agent is unaware of the structure of its decision tree, so I provide some formal results for these cases which bound the extent to which preferences can de facto be completed. ... These results apply naturally to cases in which agents are unaware of the state space, but readers sceptical of the earlier conceptual argument can re-purpose them to make analogous claims in standard cases of certainty and uncertainty.
No, the codomain of gamma is the set of (distributions over) consequences.
Hammond's notation is inspired by the Savage framework in which states and consequences are distinct. Savage thinks of a consequence as the result of behaviour or action in some state, though this isn't so intuitively applicable in the case of decision trees. I included it for completeness but I don't use the gamma function explicitly anywhere.
It's the set of elementary states.
So an event is a subset of elementary states.
E.g., we could have the set of elementary states be all the possible worlds; an event be the possible worlds in which featherless bipeds evolved; and an elementary state be our actual world.
doesn’t Bostrom’s model of “naive unilateralists” by definition preclude updating on the behavior of other group members?
Yeah, this is right; it's what I tried to clarify in the second paragraph.
isn’t updating on the beliefs of others (as signaled by their behavior) an example of adopting a version of the “principle of conformity” that he endorses as a solution to the curse? If so, it seems like you are framing a proof of Bostrom’s point as a rebuttal to it.
The introduction of the post tries to explain how this post relates to Bostrom et al's paper (e.g., I'm not rebutting Bostrom et al). But I'll say some more here.
You're broadly right on the principle of conformity. The paper suggests a few ways to implement it, one of which is being rational. But they don't go so far as to endorse this because they consider it mostly unrealistic. I tried to point to some reasons it might not be. Bostrom et al are sceptical because (i) identical priors are assumed and (ii) it would be surprising for humans to be this thoughtful anyway. The derivation above should help motivate why identical priors are sufficient but not necessary for the main upshot, and what I included in the conclusion suggests that many humans—or at least some firms—actually do the rational thing by default.
But the main point of the post is to do what I explained in the introduction: correct misconceptions and clarify. My experience of informal discussions of the curse suggests people think of it as a flaw of collective action that applies to agents simpliciter, and I wanted to flesh out this mistake. I think the formal framework I used is better at capturing the relevant intuition than the one used in Bostrom et al.
probabilities should correspond to expected observations and expected observations only
FWIW I think this is wrong. There's a perfectly coherent framework—subjective expected utility theory (Jeffrey, Joyce, etc)—in which probabilities can correspond to many other things. Probabilities as credences can correspond to confidence in propositions unrelated to future observations, e.g., philosophical beliefs or practically-unobservable facts. You can unambiguously assign probabilities to 'cosmopsychism' and 'Everett's many-worlds interpretation' without expecting to ever observe their truth or falsity.
However, there is another source of uncertainty: observational uncertainty. The other person might be uncertain whether they have all the facts that feed into their model, or whether their observations are correct.
This is reasonable. If a deterministic model has three free parameters, two of which you have specified, you could just use your prior over the third parameter to create a distribution of model outcomes. This kind of situation should be pretty easy to clarify though, by saying something like "my model predicts event E iff parameter A is above A*" and "my prior P(A>A*) is 50% which implies P(E)=50%."
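Here's a minimal numeric sketch of that move, with a toy model of my own:

```python
import random

def model(A, B, C):
    return A * B - C          # deterministic toy model; E = "output exceeds 0"

B, C = 2.0, 1.0               # the two specified parameters
A_star = C / B                # model predicts E iff A > A* = 0.5

# Prior over the unspecified parameter A: uniform on [0, 1], so P(A > A*) = 0.5.
samples = [random.uniform(0, 1) for _ in range(100_000)]
p_E = sum(model(a, B, C) > 0 for a in samples) / len(samples)
print(round(p_E, 2))          # ~0.50, matching P(A > A*) under the prior
```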
But generically, the distribution is not coming from a model. It just looks like your all things considered credence that A>A*. I'd be hesitant calling a probability based on it your "inside view/model" probability.
These are great. Though Sleeping Mary can tell that she's colourblind on any account of consciousness. Whether or not she learns a phenomenal fact when going from 'colourblind scientist' to 'scientist who sees colour', she does learn the propositional fact that she isn't colourblind.
So, if she sees no colour, she ought to believe that the outcome of the coin toss is Tails. If she does see colour, both SSA and SIA say P(Heads)=1/2.