Uncertainty in all its flavours
post by Cleo Nardo (strawberry calm) · 2024-01-09T16:21:07.915Z
Acknowledgements:
This research began during the SERI MATS program, under the joint mentorship of John Wentworth, Nicholas Kees, and Janus. Thanks also to Davidad, Jack Sagar, and David Jaz Myers for discussion.
Abstract:
I think that there is a uniform correspondence between flavours of uncertainty and monads taking state-spaces to belief-state-spaces, for different characterisations of belief. In this essay, I describe this correspondence explicitly and list 15 diverse and well-motivated examples. I explore some applications to model-building and agent foundations. Along the way, I characterise infrabayesian uncertainty as the minimal way to encompass possibilistic uncertainty, probabilistic uncertainty, and reward.
No prerequisites are required beyond a high-school familiarity with sets, functions, real numbers, etc. Feedback welcome.
Introduction
Suppose I'm facing the following problem. There's an upcoming election between candidates, and you're uncertain who will win. How can I model both your belief about the election and the election itself in a coherent way? By "belief" here, I mean your epistemic attitude, your internal model, your opinion, judgement, prediction, etc, etc. Think map-territory distinction: the election is the territory, your belief is the map, and I need to model both the map and the territory coherently despite the fact that the map and the territory are (typically speaking) two completely different types of thing.
Well, to model the election itself, I'll use a set $X$ with an element for each electoral candidate. To represent your belief about the election, I must find another set $M(X)$ with an element for each belief that you might have about the election. I'll call $X$ the state space and $M(X)$ the belief-state space. A solution to our problem is given by a mathematical operator $M$ sending each state-space $X$ to the matching belief-state space $M(X)$.
One may feel prompted to ask: does any operator suffice here? Can the belief-state space be anything whatsoever, or must it carry some extra structure, possibly satisfying some additional constraints? Or, stated more philosophically, can any territory serve as a map for any other? I say no. Roughly speaking, the operator must be a so-called monad, which will be the central object of this essay. But more on that later.
The first thing to note is that the appropriate operator will depend on how exactly I wish to characterise a "belief" about the election, and there are multiple options here. For example, I might choose to characterise your belief by the set of candidates that you think have a possibility of winning. In this case, $M(X) = \mathcal{P}^+(X)$, denoting the set of non-empty subsets of $X$. Alternatively, I might choose to characterise your belief by the likelihood that you give each candidate. In this case, $M(X) = \Delta(X)$, denoting the set of finite-support probability distributions over $X$, i.e. functions $p : X \to \mathbb{R}_{\geq 0}$ such that $\{x \in X : p(x) > 0\}$ is finite and $\sum_{x \in X} p(x) = 1$.
In the first option, I'm characterising your belief-state by your possibilistic uncertainty, often encountered in doxastic or epistemic logic. In the second option, I'm characterising your belief-state by your probabilistic uncertainty, which is a finer-grained characterisation of belief because it differentiates between e.g. thinking a coin is fair and thinking a coin is slightly biased.
The second option has its merits. Indeed, many readers will instinctively reach for $\Delta$ as soon as they hear the word "uncertainty", and this instinct would serve them well. There's been a fruitful enterprise (in philosophy, mathematics, computer science, linguistics, etc) of replacing possibilistic uncertainty with probabilistic uncertainty in any model or concept where one finds it. But I want to note that both $\mathcal{P}^+$ and $\Delta$ would count as a solution to the problem. I'll return to these two examples throughout this essay because they are the flavours of uncertainty which will be most familiar to the reader.
Flavour of uncertainty | Monad |
---|---|
Possibilistic | Nonempty-powerset monad |
Probabilistic | Distribution monad |
As we will see, these two operators, $\mathcal{P}^+$ and $\Delta$, are both monads. The central claim of this essay is that there is a uniform correspondence between flavours of uncertainty and monads. By "flavour of uncertainty" I mean a particular way of characterising someone's potentially uncertain belief about something. Possibilistic and probabilistic are paradigm cases, but in this essay we'll meet fifteen examples.
The forward-implication of this claim, that every flavour of uncertainty is a monad, is perhaps uncontroversial in some circles.[1] The backwards-implication, that every monad is a flavour of uncertainty, is worthy of more scepticism.
In this essay —
- I will describe the correspondence explicitly.
- I'll present a step-by-step method for formalising different flavours of uncertainty using monads.
- I'll list fifteen examples of the correspondence, which I hope the reader finds well-motivated.
- Finally, I'll discuss the relevance to agent foundations, with reference to infrabayesianism in particular.
Don't worry if you don't yet know what monads are. By the end of this essay you'll understand them as well as I do, which is enough to nod along when you hear "monad this" and "monad that".
The correspondence explicitly.
What's a flavour of uncertainty?
Recall from the introduction that I'm tasked with representing or modelling both the election itself and your belief about the election. The first step of this task is to settle on a particular flavour of uncertainty to characterise the belief-states — possibilistic, probabilistic, infrabayesian, etc. One might ask, of this flavour of uncertainty, the following four questions —
- Count? What counts as a distinct belief about the election? Concretely, if there are $n$ electoral candidates then how many distinct belief-states are there?
- Certainty? If you're certain that a particular candidate will win the election (and I know which candidate) then how should I determine your belief-state?
- Collapse? Suppose a number of forecasters are speculating on the election. If I'm given the belief of each forecaster about the election, and I'm given your belief about the forecasters' beliefs, then how should I determine your belief about the election itself?
- Combine? Suppose there are two completely unrelated elections happening somewhere. If I'm given your belief about the first election, and your belief about the second election, then how should I determine your belief about the pair of elections?
These four questions — Count? Certainty? Collapse? Combine? — are essentially epistemological questions, and they collectively pin down what I mean by a flavour of uncertainty.[2] As we will see, a monad corresponds to answers to the first three questions and a commutative monad corresponds to answers to all four questions.
Exercise 1: How would you answer these questions for possibilistic uncertainty? Or for probabilistic uncertainty?
Exercise 2: As I mentioned before, an answer to Count? is a set $M(X)$ for each set $X$. What about for Certainty? Collapse? and Combine?
What's a (commutative) monad?
Monads were born of category theory — a field of mathematics which many regard as arcane, mystical, or downright kabbalistic — but monads can (I think) be understood by someone lacking any acquaintance with category theory whatsoever. Indeed, my claim in this essay is that monads correspond exactly to Map-Territory-like relations, and such relations will be familiar to anyone who's both got a brain and pondered this predicament.
I'll first write down the mathematical definition of a monad, and then I'll explain how this definition mirrors the four epistemological questions.
Definition: A monad consists of three operators[3]:
- The construct operator $M$, which assigns a set $M(X)$ to each set $X$.
- The return operator, which assigns a function $\mathrm{return}_X : X \to M(X)$ to each set $X$.
- The bind operator, which assigns a function $\mathrm{bind}_{X,Y} : (X \to M(Y)) \to (M(X) \to M(Y))$ to each pair of sets $X, Y$.
Moreover, a commutative monad is a monad equipped with a fourth operator:
- The product operator, which assigns a function $\mathrm{product}_{X,Y} : M(X) \times M(Y) \to M(X \times Y)$ to each pair of sets $X, Y$.
These operators must also satisfy some basic algebraic laws to qualify as a (commutative) monad. See here for details.
Notation: I'll use variables for elements of , and boldface variables for elements of . I may talk loosely of the monad rather than or of the commutative monad rather than . I may write , , or for clarification. I may write instead of , and instead of .
How do they correspond to each other?
In short, there is an exact correspondence between the operators of a (commutative) monad and the four epistemological questions. Let's go one-by-one.
1. Count?
What counts as a distinct belief about the election? Concretely, if there are $n$ electoral candidates then how many distinct belief-states are there?
An answer to this question is the construct operator, assigning a set $M(X)$ to each set $X$. If $X$ is the set of potential outcomes of an event then $M(X)$ is the set of beliefs about the event.
As we discussed before, for possibilistic uncertainty $M(X) = \mathcal{P}^+(X)$, and for probabilistic uncertainty $M(X) = \Delta(X)$.
2. Certainty?
If you're certain that a particular candidate will win the election (and I know which candidate) then how should I determine your belief?
Here, an answer will be the return operator assigning a function $\mathrm{return}_X : X \to M(X)$ to each set $X$. If you're certain that a state $x \in X$ will occur, then $\mathrm{return}_X(x)$ is your belief-state.
For possibilistic uncertainty, $\mathrm{return}_X(x) = \{x\}$, the singleton set containing $x$. And for probabilistic uncertainty, $\mathrm{return}_X(x) = \delta_x$, the Dirac distribution at $x$ given by $\delta_x(x') = 1$ if $x' = x$ and $\delta_x(x') = 0$ otherwise.
The function $\mathrm{return}_X$ describes how the state-space embeds in the belief-state-space. This is related, I think, to the idea that each territory can serve as its own map. (See Borges' On Exactitude in Science for an exploration of this theme.) Or in the words of Norbert Wiener, “The best model of a cat is another, or preferably the same, cat.”
3. Collapse?
Suppose a number of forecasters are speculating on the election. If I'm given the belief of each forecaster about the election, and I'm given your belief about the forecasters' beliefs, then how should I determine your belief about the election itself?
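Here, an answer will be the bind operator assigning a function $\mathrm{bind}_{X,Y} : (X \to M(Y)) \to (M(X) \to M(Y))$ to each pair of sets $X$ and $Y$. You should think of the bind operator as collapsing your second-order beliefs to your first-order beliefs, i.e. if each forecaster $x \in X$ has a first-order belief $f(x) \in M(Y)$, and $m \in M(X)$ is your second-order belief about which forecaster is correct, then $\mathrm{bind}_{X,Y}(f)(m) \in M(Y)$ should be your first-order belief about the election.
For possibilistic uncertainty, $\mathrm{bind}_{X,Y}(f)(S)$ is the union $\bigcup_{x \in S} f(x)$. And for probabilistic uncertainty, $\mathrm{bind}_{X,Y}(f)(p)$ is the summation/integral $y \mapsto \sum_{x \in X} p(x)\, f(x)(y)$.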
This is related to the idea that a map of a map of a territory is a map of that same territory; a depiction of a depiction of a person is a depiction of that same person; a representation of a representation of an idea is a representation of that same idea; etc.
One might think of a function $f : X \to M(Y)$ as a parameterisation of belief-states about $Y$ using parameters in $X$. Then the bind operator gives us the function for finding your $Y$-belief from your $X$-belief. Explicitly, this function is $\mathrm{bind}_{X,Y}(f) : M(X) \to M(Y)$.
Moreover, the bind operator doesn't just flatten one level of "meta". Often we have an entire hierarchy of state-spaces where beliefs about are parameterised by some "higher" state-space via a function . Here, the state-space is the object-level system, the state-space parametrises your first-order beliefs about , the state-space parameterises your second-order beliefs about , and so on. Then the bind operator says that I can collapse your th-order beliefs all the way to your first-order beliefs via the function .[4]
4. Combine?
Suppose there are two completely unrelated elections happening somewhere. If I'm given your belief about the first election, and your belief about the second election, then how should I determine your belief about the pair of elections?
An answer will be the product operator assigning a function $\mathrm{product}_{X,Y} : M(X) \times M(Y) \to M(X \times Y)$ to each pair of sets $X$ and $Y$. If $m \in M(X)$ is your belief about the first election and $m' \in M(Y)$ is your belief about an unrelated second election, then $\mathrm{product}_{X,Y}(m, m') \in M(X \times Y)$ is your belief about the pair of elections.
For possibilistic uncertainty, $\mathrm{product}_{X,Y}(A, B)$ is the cartesian product $A \times B$. And for probabilistic uncertainty, $\mathrm{product}_{X,Y}(p, q)$ is the joint distribution $(x, y) \mapsto p(x)\, q(y)$.
Thinking of $X \times Y$ as a factorisation of the state-space, the product operator implies that your beliefs about each factor combine to yield your overall belief about the whole. That is, a commutative monad corresponds to a flavour of uncertainty that you can have about parts of the world, whereas a non-commutative monad corresponds to a flavour of uncertainty that you can only have about the world in its entirety.
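To make the four answers concrete, here is a minimal Python sketch of the possibilistic and probabilistic operators described above. The function names, the dictionary encoding of distributions, and the forecaster data are all illustrative assumptions rather than anything fixed by the essay.

```python
from fractions import Fraction
from itertools import product as cartesian

# Possibilistic uncertainty: a belief is a nonempty frozenset of outcomes.
def poss_return(x):
    return frozenset([x])

def poss_bind(f, belief):
    # Union of f(x) over every x you consider possible.
    return frozenset().union(*(f(x) for x in belief))

def poss_product(a, b):
    return frozenset(cartesian(a, b))

# Probabilistic uncertainty: a belief is a dict {outcome: probability}.
def prob_return(x):
    return {x: Fraction(1)}

def prob_bind(f, belief):
    # Weighted mixture of the distributions f(x), weighted by belief[x].
    out = {}
    for x, p in belief.items():
        for y, q in f(x).items():
            out[y] = out.get(y, Fraction(0)) + p * q
    return out

def prob_product(p, q):
    return {(x, y): px * qy for x, px in p.items() for y, qy in q.items()}

# Hypothetical data: two forecasters, each with a belief about the winner.
forecaster_sets = {"ann": frozenset({"red"}), "bob": frozenset({"red", "blue"})}
forecaster_dists = {
    "ann": {"red": Fraction(1)},
    "bob": {"red": Fraction(1, 2), "blue": Fraction(1, 2)},
}

# Collapse your belief about the forecasters into a belief about the election.
collapsed_set = poss_bind(forecaster_sets.get, frozenset({"ann", "bob"}))
collapsed_dist = prob_bind(
    forecaster_dists.get, {"ann": Fraction(1, 2), "bob": Fraction(1, 2)}
)
```

Exact fractions are used only so that the collapsed probabilities can be compared without rounding noise.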
Historical note: The central thesis of this essay is that there is a uniform correspondence between flavours of uncertainty and monads. I call this Myers' correspondence after David Jaz Myers, because I first encountered the idea in his book Categorical Systems Theory, where he devotes a chapter to using commutative monads to model various kinds of nondeterminism in automata. Nonetheless, the idea did not originate with him, he's never claimed it is true, and I don't know if he agrees with it.
Examples of Myers' correspondence
The correspondence between the operators of the (commutative) monad and the epistemological questions also serves as a practical recipe for formalising different flavours of uncertainty using monads. I've personally found it useful. First, think about the particular flavour of uncertainty, then answer the Four C's (Count? Certainty? Collapse? Combine?), convert those answers into mathematical operators, and voilà, you've got yourself a monad.
I'll now zoom through fifteen examples, beginning (without commentary) with the paradigm examples of $\mathcal{P}^+$ and $\Delta$.
1 - nonempty powerset monad
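Flavour of uncertainty | Possibilistic |
---|---|
Monad | Nonempty powerset |
Construct | $\mathcal{P}^+(X)$, the nonempty subsets of $X$ |
Return | $x \mapsto \{x\}$ |
Bind | $(f, S) \mapsto \bigcup_{x \in S} f(x)$ |
Product | $(A, B) \mapsto A \times B$ |
Interpretation | $x \in S$ if you consider the outcome $x$ to be possible. |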
2 - distribution monad
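Flavour of uncertainty | Probabilistic |
---|---|
Monad | Distribution |
Construct | $\Delta(X)$, the finite-support probability distributions over $X$ |
Return | $x \mapsto \delta_x$ |
Bind | $(f, p) \mapsto \big(y \mapsto \sum_{x \in X} p(x)\, f(x)(y)\big)$ |
Product | $(p, q) \mapsto \big((x, y) \mapsto p(x)\, q(y)\big)$ |
Interpretation | $p(x)$ is your subjective credence in the outcome $x$. |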
3 - reader monad from $H$
Okay, now let's deal with a flavour of uncertainty which is sometimes called "indeterminacy". An indeterminate belief is something like "Well, if is true then , but if is true then , but–", i.e. it's a belief which is uncertain because your best guess depends on some unknown variable. More formally, your belief-state is given by a particular function from (the possible values of the unknown variable) to (the state-space).
This is an ordinary usage of the word "uncertain" so, by Myers' correspondence, it must correspond to a monad, and we can discover which monad by answering the four Cs. If $X$ is the state-space then the belief-state-space is given by $X^H$, the set of functions $H \to X$. So our construct operator is $X \mapsto X^H$. If you're certain that the outcome is $x$ then your belief-state is the constant function $h \mapsto x$. The intuitive answers to Collapse? and Combine? give us our bind and product operators.
Overall, we get what's called the reader monad from $H$.
Flavour of uncertainty | $H$-indeterminacy |
---|---|
Monad | Reader monad from $H$ |
Construct | |
Return | |
Bind | |
Product | |
Interpretation | if is your best guess about the outcome conditioned on the information . |
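As a rough illustration of the reader monad, the sketch below represents a belief as a Python function from the unknown variable to an outcome. The operator names and the turnout example are hypothetical, and the bind shown is the standard reader-monad bind, which I take to be the "intuitive answer" to Collapse?.

```python
# A belief is a function from H (values of the unknown variable) to X (outcomes).

def reader_return(x):
    # Certainty: the same guess whatever the unknown variable turns out to be.
    return lambda h: x

def reader_bind(f, belief):
    # f sends each x to a belief about Y; both layers read the same h.
    return lambda h: f(belief(h))(h)

def reader_product(a, b):
    # Beliefs about two questions, both conditioned on the same h.
    return lambda h: (a(h), b(h))

# Hypothetical example: the unknown variable is the turnout level.
winner_given_turnout = lambda h: "red" if h == "high" else "blue"
margin_given_turnout = lambda h: "wide" if h == "high" else "narrow"

joint = reader_product(winner_given_turnout, margin_given_turnout)
```

Calling `joint("high")` then gives your best guess about the pair of questions conditional on high turnout.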
4 - writer monad to $[0,1]$
Often, people will report their uncertain beliefs like "The coin will land heads (98%)" or "AI will disempower humanity (60%)". That is, their belief is a best guess paired with their confidence, which they offer as a lower-bound on the likelihood that their guess is correct. A certain belief-state would be something like "The coin will land heads (100%)".
What monad corresponds to this flavour of uncertainty?
If $X$ is the state-space then $X \times [0,1]$ is the belief-state-space, i.e. there's a distinct belief-state for each pair $(x, c)$ with $x \in X$ and $c \in [0,1]$. If you're certain that the outcome is $x$ then your belief-state is $(x, 1)$. Uncertainty is collapsed by multiplying the confidences. Uncertainty is combined also by multiplying the confidences.
Flavour of uncertainty | Confidence-marked guess |
---|---|
Monad | Writer monad to $[0,1]$ |
Construct | |
Return | |
Bind | where and |
Product | where and . |
Interpretation | if is your confidence in the outcome, i.e. you think that the likelihood of is at least . |
Using the writer monad to $[0,1]$, we've characterised a belief-state as an outcome marked with some additional metadata, namely a confidence $c \in [0,1]$. What properties of the interval $[0,1]$ did we appeal to in this definition? Well, firstly that we can multiply different elements (see bind and product operators). And secondly, that there's a fixed element $1$ such that multiplying with this element does nothing (see return operator).
Hence we can generalise: given any monoid we have a monad called the writer-to- monad.[5] By using different monoids, we can model different flavours of uncertainty, but note that this is only a commutative monad when is a commutative monoid.
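The generalisation to an arbitrary monoid can be sketched in Python as a small factory that takes the monoid's unit and its combining operation. The name `make_writer` and the sample beliefs are assumptions made for illustration only.

```python
def make_writer(unit, combine):
    """Build (return, bind, product) for the writer monad to a monoid.

    `unit` is the monoid's identity element and `combine` its binary operation.
    A belief is a pair (guess, annotation).
    """
    def w_return(x):
        # Certainty: the guess carries the do-nothing annotation.
        return (x, unit)

    def w_bind(f, belief):
        x, m = belief
        y, n = f(x)
        return (y, combine(m, n))

    def w_product(a, b):
        (x, m), (y, n) = a, b
        return ((x, y), combine(m, n))

    return w_return, w_bind, w_product

# Confidence-marked guesses: the monoid is [0, 1] under multiplication.
conf_return, conf_bind, conf_product = make_writer(1.0, lambda m, n: m * n)

coin = ("heads", 0.5)
die = ("six", 0.25)
both = conf_product(coin, die)
```

Passing a different unit and combining operation (for instance the empty set with union) yields the other writer monads without changing the three operators.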
There's another ordinary usage of the word "uncertainty" where an uncertain belief would be something like "AGI arrives before 2040 unless there's a nuclear war" and a certain belief would be something like "AGI will arrive before 2040", at least with regard to the binary question of whether AGI arrives before 2040. That is, an uncertain belief is one with an "unless..." clause.
Formalising this, we have a fixed set of events , and a belief-state is a pair . Your belief-state is when you commit to the state occurring unless the event occurs. This flavour of uncertainty corresponds to the writer monad , where is a monoid when equipped with union and the empty set .
One might use this flavour of uncertainty to model various kinds of defeasible reasoning, where a belief-state is characterised by the precondition under which the belief would be defeated or disavowed.
Flavour of uncertainty | Unless-claused guess |
---|---|
Monad | Writer monad to |
Construct | |
Return | |
Bind | where and |
Product | where and . |
Interpretation | if you think will occur unless event occurs. |
Or maybe an uncertain belief is one full of amendments, clarifications, conditions, disclaimers, excuses, hedges, limitations, qualifications, refinements, reservations, restrictions, stipulations, temperings, etc. By contrast, a certain belief is made "with no ifs or buts", bare and direct.
Formalising this, we have a fixed set of clarifications , and a belief-state is a pair . Here, is the free monoid over the set of clarifications equipped with concatenation and the empty list .
Flavour of uncertainty | Clarified guess |
---|---|
Monad | Writer to monad |
Construct | |
Return | |
Bind | where and |
Product | N/A (See below.) |
Interpretation | if you think will occur and is a list of your clarifications. |
Now, the writer monad to the free monoid of clarifications isn't a commutative monad. Or interpreted philosophically, a clarified guess isn't the kind of uncertainty you can have about parts of the world. Suppose "I think Alice is happy but I don't know her very well" is my belief-state about Alice, and "I think Bob is happy but he's difficult to read" is my belief-state about Bob. What's my belief-state about both Alice and Bob? Is it (1) "Alice and Bob are both happy, but I don't know Alice very well and Bob is difficult to read" or (2) "Alice and Bob are both happy, but Bob is difficult to read and I don't know Alice very well"? That is, in which order should we combine the clarifications?
The instinctive trick is to declare that two belief-states are equal if the lists of clarifications are equal up-to-permutation — this implies that (1) and (2) are the same belief-state, which does seem intuitive to me. If we play this trick, then the resulting flavour of uncertainty is captured by the writer-to- monad, where is the free commutative monoid. This does indeed give a commutative monad!
Flavour of uncertainty | Unordered clarified guess |
---|---|
Monad | Writer monad to |
Construct | |
Return | |
Bind | where and |
Product | where and . |
Interpretation | if you think will occur and is an unordered list of your clarifications. |
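The Alice-and-Bob example can be checked mechanically. In this sketch, ordered clarifications are Python lists (the free monoid under concatenation) and unordered clarifications are `collections.Counter` multisets (standing in for the free commutative monoid); the helper names and the wording of the clarifications are illustrative assumptions.

```python
from collections import Counter

def product_left_first(a, b, combine):
    # Combine the annotations with a's annotation first.
    (x, m), (y, n) = a, b
    return ((x, y), combine(m, n))

def product_right_first(a, b, combine):
    # Same guesses, but b's annotation is written first.
    (x, m), (y, n) = a, b
    return ((x, y), combine(n, m))

concat = lambda m, n: m + n  # works for lists and for Counters

# Ordered clarifications: lists under concatenation.
alice = ("happy", ["don't know her well"])
bob = ("happy", ["hard to read"])
ordered_1 = product_left_first(alice, bob, concat)
ordered_2 = product_right_first(alice, bob, concat)

# Unordered clarifications: multisets under multiset sum.
alice_bag = ("happy", Counter(alice[1]))
bob_bag = ("happy", Counter(bob[1]))
bag_1 = product_left_first(alice_bag, bob_bag, concat)
bag_2 = product_right_first(alice_bag, bob_bag, concat)
```

The two ordered products differ while the two multiset products coincide, which is the sense in which only the second construction admits a well-defined product operator.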
5 — identity monad
If we're anticipating an election between candidates, then the simplest way to characterise your belief about the election is by your best guess, with no additional information about how unsure you are. If $X$ is the state-space then $X$ is also the belief-state-space, i.e. there's a distinct belief-state for each $x \in X$. The set of belief-states is therefore equal (up to bijection) to the set of outcomes itself.
I'll admit that this flavour of uncertainty is somewhat degenerate (e.g. every belief-state is a certainty in some particular state) but it's worth including nonetheless. On some readings of Wittgenstein's Tractatus, this is his model of how language represents the world: our utterances stand in direct isomorphism with the state-of-affairs.
Anyway, answering the four Cs would give the identity monad!
Flavour of uncertainty | Best guess |
---|---|
Monad | identity monad |
Construct | |
Return | |
Bind | |
Product | |
Interpretation | if is your best guess about the outcome |
6 — maybe monad
The last example was a bit silly, so how about this instead?
If we're anticipating an election between candidates, then I'll characterise your belief about the election either by your best guess (with no additional information) or an "I don't know" response. This is a very coarse-grained flavour of uncertainty: the only belief-state about the election (other than certainty in a particular candidate) is the belief-state of utter cluelessness, or shrugging one's shoulders!
Despite the coarse-grained-ness, it's pretty commonly encountered in the wild. For example, it's the typical flavour of uncertainty encountered in surveys/questionnaires, where is read as "no opinion/don't know". It's also encountered in voting, where is read as "abstention".
Formally speaking, if is the state-space then there's a distinct belief-state for each state plus an additional option denoted . The belief-state-space is therefore , denoting the disjoint union of with the singleton set . If you're certain that the outcome is then your belief-state is . This flavour of uncertainty corresponds to the famous maybe monad.
Flavour of uncertainty | guess-or-shrug |
---|---|
Monad | maybe monad |
Construct | |
Return | |
Bind | |
Product | |
Interpretation | if is your best guess for the outcome, and if you offer no best guess. |
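A small Python sketch of guess-or-shrug follows. The tagged-tuple encoding and the names `SHRUG`, `maybe_return`, `maybe_bind`, and `maybe_product` are assumptions of this sketch; the behaviour is that of the usual maybe monad, in which a shrug at any level propagates.

```python
# A belief is either ("guess", x) or the single extra state SHRUG.
SHRUG = ("shrug",)

def maybe_return(x):
    # Certainty in x is just the guess x.
    return ("guess", x)

def maybe_bind(f, belief):
    # If you shrug about the forecasters, you shrug about the election too.
    if belief == SHRUG:
        return SHRUG
    return f(belief[1])

def maybe_product(a, b):
    # A belief about the pair exists only if you have a guess for both.
    if a == SHRUG or b == SHRUG:
        return SHRUG
    return ("guess", (a[1], b[1]))

# Hypothetical forecasters: Ann guesses "red", Bob offers no opinion.
forecasts = {"ann": ("guess", "red"), "bob": SHRUG}
via_ann = maybe_bind(forecasts.get, maybe_return("ann"))
via_bob = maybe_bind(forecasts.get, maybe_return("bob"))
```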
7 - $K$-distribution monad
You might, at this point, feel short-changed. I've discussed so far a range of flavours of uncertainty which are all coarser-grained than probabilistic knowledge, so why not stick to $\Delta$? Let's consider then a more fine-grained characterisation of belief-state, one that tracks infinitesimal differences between probability assignments.
The Levi-Civita field $K$ is an extension of the real numbers which contains infinitesimal values like $\varepsilon$ and infinite values like $\varepsilon^{-1}$. We can replace $\mathbb{R}$ in the definition of $\Delta$ with $K$ to obtain a monad corresponding to this flavour of uncertainty. On this account, a belief-state is something which tracks the potentially infinitesimal likelihood of each outcome $x \in X$. This flavour of uncertainty has applications in infinite ethics and cooperation in large worlds.
For example, in a universe with infinite radius , what's your prior likelihood that you occupy the most central galaxy? Presumably, the likelihood should be , where is the density of galaxies.
Now suppose you were offered a lottery which promises to benefit everyone by if you indeed occupy the most central galaxy but otherwise benefits no one. What's this lottery worth? Presumably, it's worth , because the infinitary stakes are cancelled out by the infinitesimal chance of winning .
Note that because $K$ is totally-ordered, once we assign values to different lotteries, we can perform expected utility maximisation as usual, and get sensible results. I think that infinitesimal probabilities resolve some (but not all) problems in infinite ethics. I'm particularly lured by the hope that, in an infinite cosmos, the infinitary stakes might somehow cancel out with infinitesimal probabilities to yield finite values. See Joe Carlsmith's essay On Infinite Ethics for further discussion.
Flavour of uncertainty | infinitesimal probabilistic |
---|---|
Monad | -distribution monad |
Construct | |
Return | |
Bind | |
Product | |
Interpretation | is your potentially infinitesimal subjective credence in the outcome |
How far can one generalise the kind of entity that a "probability" must be, before our definition breaks? Well, so long as we have some rig , we can define a monad by replacing with . A rig is a set equipped with a zero element , a unit element , an addition function , and a multiplication function , satisfying certain algebraic laws. By choosing different rigs then we obtain different monads corresponding to different flavours of uncertainty.
When we obtain the ordinary probability distributions, and when we obtain the rational probability distributions, etc. Toby Fritz suggests that by using similar tricks we might obtain quantum uncertainty, fuzzy uncertainty, and Dempster–Shafer uncertainty, but I haven't checked whether this is true.
Flavour of uncertainty | -probabilistic |
---|---|
Monad | -distribution monad |
Construct | |
Return | |
Bind | |
Product | |
Interpretation | is your subjective credence in the outcome , where is whatever rig of exotic probabilities |
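The rig-parameterised construction can be sketched as a Python factory that takes the rig's zero, unit, addition, and multiplication. The name `make_distribution_monad` and the sample data are hypothetical. The rational instance matches the rational probability distributions mentioned above; the Boolean instance is an illustration chosen here, not an example from the essay.

```python
from fractions import Fraction

def make_distribution_monad(zero, one, add, mul):
    """Build (return, bind, product) for distributions valued in a rig.

    A belief is a dict {outcome: weight}; normalisation (weights adding up
    to `one`) is assumed of the inputs rather than checked.
    """
    def d_return(x):
        return {x: one}

    def d_bind(f, belief):
        out = {}
        for x, p in belief.items():
            for y, q in f(x).items():
                out[y] = add(out.get(y, zero), mul(p, q))
        return out

    def d_product(p, q):
        return {(x, y): mul(px, qy) for x, px in p.items() for y, qy in q.items()}

    return d_return, d_bind, d_product

# Rational probabilities: the rig of fractions with ordinary + and *.
q_return, q_bind, q_product = make_distribution_monad(
    Fraction(0), Fraction(1), lambda a, b: a + b, lambda a, b: a * b
)
prior = {"fair": Fraction(2, 3), "biased": Fraction(1, 3)}
coin_given = {
    "fair": {"heads": Fraction(1, 2), "tails": Fraction(1, 2)},
    "biased": {"heads": Fraction(3, 4), "tails": Fraction(1, 4)},
}
flip = q_bind(coin_given.get, prior)

# Boolean rig (False, True, or, and): weights only record "possible".
b_return, b_bind, b_product = make_distribution_monad(
    False, True, lambda a, b: a or b, lambda a, b: a and b
)
possible = b_bind(
    {"ann": {"red": True}, "bob": {"red": True, "blue": True}}.get,
    {"ann": True, "bob": True},
)
```

With Boolean weights the bind reduces to a union of supports, so this instance behaves like possibilistic uncertainty over finite sets.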
8 — quantum monad
For sure, quantum mechanics is endowed with its own flavour of uncertainty, hence the term Heisenberg's Uncertainty Principle. It's not impossible to catch a physicist saying "it's uncertain whether the qubit is 0 or 1" or "it's uncertain whether the cat is alive or dead", regardless of whether they consider quantum uncertainty as strictly speaking epistemic. By Myers' correspondence, this flavour of uncertainty must correspond to a monad.
Exercise 3: Which?[6]
9 — smooth state monad
The position of the North Star in the night sky is constant, static, immutable, certain; the position of Mercury, by contrast, is variable, dynamic, mutable, uncertain. Is this not a common sense of the word? Might one not say that my belief-state about Mercury's position will forever be uncertain, no matter how accurate my telescope or exhaustive my calculations, because my belief is always revised? If so, then by Myers' correspondence this flavour of uncertainty corresponds to a monad.
To formalise this, let's fix a differentiable manifold parameterising your internal mental state as you think about a question. Note that because it is a differentiable manifold, it's equipped with a tangent space at every point.
If is the state-space, then is your belief-state-space. In other words, we have a distinct belief-state for each smooth transition function . A belief-state is characterised by a pair for each , where is your current guess and is the tangent vector describing how your mental state is evolving. If you're certain that the winner is then your belief-state is the static transition function where is the zero vector.
This is the smooth state monad: it's a differentiable version of the discrete-time state monad, with the additional benefit that it's a commutative monad.
Flavour of uncertainty | evolving guess |
---|---|
Monad | smooth state monad |
Construct | |
Return | |
Bind | where and |
Product | where and . |
Interpretation | The transition function describes how your internal mental state evolves over time and produces guesses. |
10 — continuation monad
What are belief-states actually for anyway? What purpose do they play in rational decision-making? According to one school of thought, belief-states are simply gadgets for taking expected values, and chiefly for taking expected utility values.
Let's say is the set of candidates running in the election, and is your utility function, i.e. measures how happy you'd be to hear that the candidate has won. Then your ex-ante utility is some measuring how happy you are now in anticipation of the outcome. Given your belief-state, I should be able to determine from , which implies that I can just characterise your belief-state about the election by how is determined from . Neat.
This is formalised by the so-called continuation to monad. If is the state-space then is the belief-state-space, where is the set of functionals . And a belief-state is certain in the outcome if determines your ex-ante utility simply by evaluating your utility function at , i.e. .
The continuation monad encompasses both possibilistic uncertainty and probabilistic uncertainty. If the nonempty subset models your possibilistic uncertainty then the associated functional is given by . If the distribution models your probabilistic uncertainty then the associated functional is given by .
Flavour of uncertainty | ex-ante utility |
---|---|
Monad | Continuation monad |
Construct | |
Return | |
Bind | |
Product | Unfortunately, is not a commutative monad.[7] |
Interpretation | If assigns your ex-post utility to each outcome , then is your ex-ante utility. |
Exercise 4: (Beginner) Prove that the two maps and are injections. (Advanced) Prove these injections are monad transformers.[8]
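The continuation monad and the two embeddings can be sketched in Python by treating a belief as a function that takes a utility function and returns an ex-ante utility. All names and numbers are hypothetical. In particular, scoring a possibilistic belief by its worst case (`min`) is an assumption of this sketch; a best-case reading via `max` would be coded the same way.

```python
# A belief is a functional: it takes a utility function u : X -> R
# and returns your ex-ante utility, a single number.

def cont_return(x):
    # Certainty in x: ex-ante utility is just u evaluated at x.
    return lambda u: u(x)

def cont_bind(f, belief):
    # f sends each x to a functional over Y; score each x by f(x)(u),
    # then let the outer belief aggregate those scores.
    return lambda u: belief(lambda x: f(x)(u))

def from_distribution(p):
    # Probabilistic belief -> expected utility.
    return lambda u: sum(prob * u(x) for x, prob in p.items())

def from_possibilities(s):
    # Possibilistic belief -> worst-case utility (an assumption, see above).
    return lambda u: min(u(x) for x in s)

utility = {"red": 10, "blue": 0}.get

expected = from_distribution({"red": 0.5, "blue": 0.5})(utility)
worst = from_possibilities({"red", "blue"})(utility)

# Collapsing a belief about forecasters, entirely inside the continuation monad.
forecasts = {"ann": {"red": 1.0}, "bob": {"red": 0.5, "blue": 0.5}}
about_forecasters = from_distribution({"ann": 0.5, "bob": 0.5})
collapsed = cont_bind(lambda name: from_distribution(forecasts[name]), about_forecasters)
```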
11 — signature monad
Maybe I should characterise your belief-state about something by the sentence that you'd utter about the outcome. This will result in a more syntactic or linguistic account of belief. You might imagine here a shared language, like English or Python, with which a speaker may report their beliefs to a friend. Or you might imagine a private mental language in which a brain/AI will store their knowledge about the world.
To make this rigorous, I must introduce a language containing all the sentences that you might utter about the outcome. Our language will include an atomic sentence for every outcome , along with certain connectives for combining sentences. For example, suppose we have a language with two symbols, a binary connective called disjunction and a unary connective called negation. If are the candidates in an election, then a belief-state about the electoral outcome is a sentence like or .
The logical connectives can be specified by a signature. A signature is a set equipped with a map sending each connective to its arity. So the aforementioned language has the signature with and .
We denote the resulting set of sentences by . This is a set containing all the sentences freely generated from using the connectives in . Explicitly, is the smallest set such that for every and for every , , and .
With this machinery in place, we can answer the Four C's, and thereby find the corresponding monad.
- If is the state-space then there's a distinct belief-state for each sentence .
- If you're certain that the winner of the election is , then your belief-state is the sentence .
- Let be the function assigning to each forecaster their belief-state about the election. And let be your belief-state about the forecasters. Then your belief about the election itself is given by uniform substitution: loop through the sentence and, every time you come across an atomic letter , replace it with the sentence . This results in a sentence .[9]
- Unfortunately, isn't generally a commutative monad.[10]
Flavour of uncertainty | utterance in a language |
---|---|
Monad | signature monad |
Construct | |
Return | |
Bind | Uniform substitution of every with in the sentence |
Product | N/A |
Interpretation | is the sentence that you would utter about the outcome, in a language which contains an atomic letter for each outcome and a logical connective for each . |
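Here is a minimal sketch of the signature monad, in the same pythonese as the footnote. The tuple encoding of sentences, the connective names, and the forecasters are all my own assumptions for illustration, not notation from above. Bind is the uniform substitution described in the Four C's, and the last lines check the non-commutativity claim for two unary connectives.

```python
# Sentences are nested tuples: ("atom", x) is the atomic letter for x, and
# ("op", name, [args]) applies the connective `name` to its arguments.
# This encoding and all names below are illustrative assumptions.

def atom(x):
    return ("atom", x)

def op(name, *args):
    return ("op", name, list(args))

def unit(x):
    """Return: certainty that the outcome is x."""
    return atom(x)

def bind(sentence, f):
    """Bind: uniformly substitute f(w) for every atomic letter w."""
    if sentence[0] == "atom":
        return f(sentence[1])
    _, name, args = sentence
    return op(name, *(bind(arg, f) for arg in args))

def show(sentence):
    """Render a sentence in prefix notation."""
    if sentence[0] == "atom":
        return str(sentence[1])
    _, name, args = sentence
    return f"{name}({', '.join(show(arg) for arg in args)})"

# Hypothetical forecasters, each holding a sentence about candidates A, B, C.
forecasts = {
    "alice": op("or", atom("A"), atom("B")),
    "bob": op("not", atom("C")),
}
about_forecasters = op("or", atom("alice"), op("not", atom("bob")))
about_election = bind(about_forecasters, forecasts.get)
print(show(about_election))  # or(or(A, B), not(not(C)))

# Two unary connectives are enough to break commutativity: the two ways of
# combining independent beliefs nest the connectives in different orders.
s, t = op("p", atom("x")), op("q", atom("y"))
left_first = bind(s, lambda x: bind(t, lambda y: unit((x, y))))
right_first = bind(t, lambda y: bind(s, lambda x: unit((x, y))))
print(left_first == right_first)  # False
```

Nothing here depends on which connectives the signature contains, which is the sense in which the sentences are "freely generated".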
Many monads are equivalent to the signature monad for some signature $\Sigma$, including many monads we've already encountered.
- When $\Sigma$ is empty, the signature monad is equivalent to the identity monad. This is intuitive. If there are no connectives in the language, then every utterance is a single atomic sentence positing one of the outcomes.
- When $\Sigma$ consists of one constant symbol (i.e. zero-arity connective), the set of sentences contains the atomic sentences plus one additional sentence, namely the constant itself. So the signature monad is equivalent to the maybe monad. We encountered this before as modelling the guess-or-shrug flavour of uncertainty.
- When $\Sigma$ consists of many constant symbols, the set of sentences contains the atomic sentences plus an additional sentence for every constant in $\Sigma$. So the signature monad is equivalent to what's called the exception monad. This is like the guess-or-shrug, except there are multiple ways to shrug one's shoulders.
- When $\Sigma$ consists of one unary connective, every sentence is that connective applied some number of times to an atomic sentence. So the signature monad is equivalent to the writer monad to the monoid $(\mathbb{N}, +, 0)$. If $\Sigma$ consists of many unary connectives, then the signature monad is equivalent to the writer monad to the free monoid $\Sigma^*$ of finite strings of connectives. We encountered this before as modelling the clarified guess.
- When $\Sigma$ consists of one binary connective, every sentence is a nested application of that connective to atomic sentences. So the set of sentences is equivalent to the set of full binary trees whose leaves are labelled by outcomes. As Vanessa Kosoy notes, we can think of such a tree as a way to select an outcome by reading a stream of bits. (See here.)
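To make the bit-stream reading of full binary trees concrete, here is a small sketch. The encoding of a fork as a Python pair, and the convention that 0 means left and 1 means right, are assumptions of mine.

```python
# A full binary tree is either a leaf (any non-tuple outcome) or a fork,
# written as a pair (left, right). This encoding is an illustrative
# assumption, so outcomes themselves must not be tuples.

def select(tree, bits):
    """Follow the bit stream down the tree: 0 goes left, 1 goes right."""
    bits = iter(bits)
    while isinstance(tree, tuple):
        left, right = tree
        tree = right if next(bits) else left
    return tree

# Hypothetical tree over three candidates.
tree = ("A", (("B", "C"), "A"))
print(select(tree, [0]))        # A
print(select(tree, [1, 0, 1]))  # C
print(select(tree, [1, 1, 0]))  # A (the trailing bit is never read)
```

If the stream runs out before a leaf is reached, `next` raises `StopIteration`; a more careful version would decide what a belief-state means in that case.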
Isn't the archetypal symbol of uncertainty... a fork in the road? Imagine a traveller facing two paths, left and right, each forking further ahead, and so on unboundedly, forming a fractal canopy of binary choices.
12 — algebraic theory
There's something a bit perverse about characterising your belief-state with a single utterance about the outcome. Namely, some utterances will be logically equivalent to each other, such as and , and therefore the belief-state in which you're willing to utter is the exact same as the belief-state in which you're willing to utter , assuming that you're both rational and honest. Therefore, our previous characterisation was overcounting the belief-states by distinguishing logically-equivalent sentences. Bizarrely, there would be infinitely-many belief-states about a single coin flip — i.e. , , , and so on.
To fix this, what we need isn't just a signature , but rather a signature paired with a set of equational axioms, which is called an algebraic theory. An equational axiom is a pair of sentences built using the connectives in and some placeholder sentence variables . We use to define an equivalence relation on by taking the deductive closure of the axioms, and then the equivalence classes of the sentences will be our belief-states.
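For example, if our signature consists of a single binary connective $\lor$, which we intend to interpret as disjunction, then the axioms should be the following three:
- Idempotency, $x \lor x = x$
- Commutativity, $x \lor y = y \lor x$
- Associativity, $x \lor (y \lor z) = (x \lor y) \lor z$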
Furnished with the concept of an algebraic theory, we can now improve our answers:
- If is the state-space then there is a distinct belief-state for each equivalence class of sentences . This set is denoted .
- If you're certain that the winner is , then your belief-state is the sentence .
- Suppose each forecaster has a belief-state about the election, and you have a belief-state about the forecasters. Then your belief about the election itself is given by uniform substitution modulo equivalence: pick a representative sentence for your belief-state about the forecasters, pick a representative sentence for each forecaster's belief-state, substitute using the bind operator for the signature monad, and take the equivalence class of the result. This operation is well-defined because the deductive system satisfies referential transparency, i.e. substituting equivalent sentences into equivalent sentences yields equivalent sentences.
- Again, isn't generally a commutative monad.
Flavour of uncertainty | equivalence class of utterances |
---|---|
Monad | utterances-modulo-equivalence |
Construct | |
Return | |
Bind | where and |
Product | N/A |
Interpretation | A belief-state is the set of sentences that you would assert about the outcome, in a language which contains an atomic letter for each outcome, a logical connective for each element of the signature, and a set of equational axioms governing those connectives. |
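As a sanity check on "uniform substitution modulo equivalence", here is a sketch for the disjunction-only theory with idempotency, commutativity, and associativity. The encoding and the forecasters are my own assumptions. Under those three axioms a sentence is determined, up to equivalence, by the set of atomic letters occurring in it, so that set serves as a normal form.

```python
# Sentences over one binary connective: ("atom", x) or ("or", left, right).
# The encoding and names are illustrative assumptions.

def letters(sentence):
    """Normal form modulo idempotency, commutativity and associativity:
    the nonempty finite set of atomic letters that occur in the sentence."""
    if sentence[0] == "atom":
        return frozenset({sentence[1]})
    _, left, right = sentence
    return letters(left) | letters(right)

def substitute(sentence, f):
    """Bind of the signature monad: replace each letter w by the sentence f(w)."""
    if sentence[0] == "atom":
        return f(sentence[1])
    _, left, right = sentence
    return ("or", substitute(left, f), substitute(right, f))

def bind_classes(letter_set, f):
    """Bind on equivalence classes, computed on normal forms (a union of sets)."""
    return frozenset().union(*(f(w) for w in letter_set))

a, b, c = ("atom", "A"), ("atom", "B"), ("atom", "C")
alice, bob = ("atom", "alice"), ("atom", "bob")

# Two equivalent sentences about hypothetical forecasters ...
s1 = ("or", alice, ("or", bob, alice))
s2 = ("or", bob, alice)
# ... and each forecaster's sentence about the election.
forecasts = {"alice": ("or", a, b), "bob": ("or", b, c)}

print(letters(s1) == letters(s2))  # True
# Substituting into either representative lands in the same class.
print(letters(substitute(s1, forecasts.get))
      == letters(substitute(s2, forecasts.get)))  # True
# And that class agrees with a bind computed directly on sets of letters.
print(sorted(bind_classes(letters(s2), lambda w: letters(forecasts[w]))))  # ['A', 'B', 'C']
```

The last line is just a union of sets indexed by a set, i.e. the bind of the nonempty finite powerset monad.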
If a monad is equivalent to for some algebraic theory then we call a presentation of the monad.[11] A presentation of a monad is a rather nice description of a flavour of uncertainty via some operators for defining belief-states in terms of other belief-states and some rules governing those operators.
- When is empty, then is obviously just the signature monad .
- When contains a unary connective for every and contains the axioms , then is equivalent to the writer monad to the monoid . We encountered this before as the confidence-marked guess. In general, we can give a similar presentation for the writer monad to any monoid . So the unless-claused guess has a similar presentation.
- When and is idempotency, commutativity, and associativity (shown above), then there is a distinct class for each non-empty finite subset of . So is equivalent to the nonempty finite powerset monad . This is a finitary version of the monad which we've encountered as modelling possibilistic uncertainty. This algebraic theory is also called the theory of semilattices.
- Let's find a presentation for the distribution monad. The signature will contain a binary connective $+_p$ for every $p \in (0,1)$. Our axioms will be $x +_p x = x$ (skew-idempotency), $x +_p y = y +_{1-p} x$ (skew-commutativity), and $(x +_p y) +_q z = x +_{pq} \left(y +_{\frac{q(1-p)}{1-pq}} z\right)$ for all $p, q \in (0,1)$ (skew-associativity). You should think of $x +_p y$ as $p$ units of $x$ and $1-p$ units of $y$, which explains the ghastly expression for skew-associativity. This algebraic theory is called the theory of convex algebras.
Exercise 5: Find a presentation for for an arbitrary rig .
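If you'd like to convince yourself that finite-support distributions really do satisfy the three skew axioms, here is a quick numerical check. It is a sketch under one assumption: I read a mixing connective with weight p as "p units of the left argument and 1 - p units of the right". The function names are mine.

```python
from fractions import Fraction

def point(x):
    """Certainty in outcome x, as a distribution."""
    return {x: Fraction(1)}

def mix(p, d, e):
    """Mixture with p units of d and (1 - p) units of e (assumed convention)."""
    out = {}
    for x, w in d.items():
        out[x] = out.get(x, 0) + p * w
    for x, w in e.items():
        out[x] = out.get(x, 0) + (1 - p) * w
    return out

p, q = Fraction(1, 3), Fraction(1, 2)
x, y, z = point("A"), point("B"), point("C")

# Skew-idempotency: mixing a belief with itself changes nothing.
print(mix(p, x, x) == x)                 # True
# Skew-commutativity: swapping the arguments swaps the weights.
print(mix(p, x, y) == mix(1 - p, y, x))  # True
# Skew-associativity: re-bracketing requires re-weighting.
r = q * (1 - p) / (1 - p * q)
print(mix(q, mix(p, x, y), z) == mix(p * q, x, mix(r, y, z)))  # True
```

Using `Fraction` rather than floats keeps the equalities exact. Both sides of the last check put weight 1/6 on A, 1/3 on B, and 1/2 on C.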
13 — convex powerset of distributions monad
As we saw before, the continuation monad encompasses both possibilistic and probabilistic uncertainty. Unfortunately lacks any presentation, even if we allow connectives with infinite arity![12] Fortunately, there exists a monad encompassing both possibilistic and probabilistic uncertainty which is presentable.
Recall that the nonempty finite powerset monad, which corresponds to possibilistic uncertainty, is presented by the theory of semilattices. And the distribution monad, which corresponds to probabilistic uncertainty, is presented by the theory of convex algebras. Consider the theory which combines both signatures and both sets of axioms, together with one additional axiom, $(x \lor y) +_p z = (x +_p z) \lor (y +_p z)$, describing how the $+_p$ connectives distribute over the $\lor$ connective.
This new theory is a presentation of the convex powerset of distributions monad. This monad corresponds to a flavour of uncertainty wherein a belief-state is a convex set of distributions, e.g. "The coin lands either heads (20-30%) or tails (70-80%)." (See credal sets.)
Now, we could have defined in an entirely non-syntactic way, i.e. " is the set of nonempty finitely-generated convex-closed sets of finite-support distributions over ." But I think the syntactic definition, in terms of the algebraic theories for and , elucidates why is a well-motivated unification of probabilistic and possibilistic uncertainty. We will employ a similar strategy for motivating infrabayesianism — roughly speaking, infrabayesianism is exactly what you get when you combine probabilistic and possibilistic uncertainty with reward.
Flavour of uncertainty | imprecise probability |
---|---|
Monad | convex powerset of distributions monad |
Signature | |
Axioms | is semilattice, is convex algebra, distributes over , |
Interpretation | is certainty in an outcome . is possibilistic uncertainty between and . is probabilistic uncertainty between (with chance ) and (with chance ). |
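Here is a rough sketch of how a sentence in this combined language can be read as a convex set of distributions. Everything in it is an assumption of mine: the tuple encoding, the names, and the choice to represent a convex set by a finite set of generating distributions rather than by a canonical form. Disjunction takes the union of generators and a mixing connective takes all pairwise mixtures. Two equivalent sentences can yield different generator sets with the same convex hull, so comparing generators is a sufficient test of equivalence but not a necessary one.

```python
from fractions import Fraction

def freeze(weights):
    """Hashable form of a finite-support distribution."""
    return tuple(sorted((x, w) for x, w in weights.items() if w != 0))

def mix(p, d, e):
    """Mixture with p units of d and (1 - p) units of e (assumed convention)."""
    out = {}
    for x, w in d:
        out[x] = out.get(x, 0) + p * w
    for x, w in e:
        out[x] = out.get(x, 0) + (1 - p) * w
    return freeze(out)

def generators(term):
    """A finite set of distributions whose convex hull interprets the term."""
    kind = term[0]
    if kind == "atom":
        return frozenset({freeze({term[1]: Fraction(1)})})
    if kind == "or":   # possibilistic choice: union of the generators
        return generators(term[1]) | generators(term[2])
    if kind == "mix":  # probabilistic choice: all pairwise mixtures
        _, p, s, t = term
        return frozenset(mix(p, d, e) for d in generators(s) for e in generators(t))
    raise ValueError(f"unknown connective: {kind}")

def bounds(term, outcome):
    """Lower and upper probability of an outcome (attained at generators)."""
    probs = [dict(d).get(outcome, Fraction(0)) for d in generators(term)]
    return min(probs), max(probs)

# "The coin lands either heads (20-30%) or tails (70-80%)."
heads, tails = ("atom", "H"), ("atom", "T")
coin = ("or", ("mix", Fraction(1, 5), heads, tails),
              ("mix", Fraction(3, 10), heads, tails))
print(bounds(coin, "H"))  # (Fraction(1, 5), Fraction(3, 10))

# Distributivity of mixing over disjunction holds at the level of generators.
p, edge = Fraction(1, 4), ("atom", "E")
lhs = ("mix", p, ("or", heads, tails), edge)
rhs = ("or", ("mix", p, heads, edge), ("mix", p, tails, edge))
print(generators(lhs) == generators(rhs))  # True
```

The bounds are exact for the whole convex set, because the probability of an outcome is linear in the distribution and so takes its extreme values at the generators.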
14 — free convex lattice monad
There's a common usage of the word "uncertainty", where the uncertainty is modulo strategic choice. For example, you might hear "Black is certain to win" from a chess commentator if Black can force a checkmate, or hear "the winner is still uncertain" from a poker commentator during the flop. By Myers' correspondence, this flavour of uncertainty — call it "ludic uncertainty" — must correspond to some monad, but which?
Consider the theory of convex lattices — with signature and the following axioms:
Then is a monad corresponding, I think, to the aforementioned flavour of uncertainty. It sends a set to the set , the free convex lattices over . An element of should be read as a game-tree whose non-leaf nodes are either a free binary choice by White, a free binary choice by Black, or a biased coin flip. The leaf nodes may be either wins for White, wins for Black, or an element of the set .
We treat game-trees as equivalent if the same outcome would result from and regardless of the players' preferences over the elements of . For example, the lattice axioms and will hold because no player would willingly choose to lose, and the axioms and establish that the players are adversarial, i.e. would never willingly empower one another.
Exercise 7: Consider the game shown below. Which outcome is (ludically) certain?
Note that aren't really games in the usual sense, because leaf nodes might be elements of , and we treat these elements as pairwise incomparable to both players. So you should think of as a set of partially-specified game trees. A fully-specified game tree would be an element of , which is a game tree where each leaf-node returns some -valued utility to Black and disutility to White. You may notice that can itself be equipped with the structure of a convex lattice, which just means there exists a -algebra .[14] This -algebra is exactly the well-known used in combinatorial game theory.
Flavour of uncertainty | ludic |
---|---|
Monad | free convex lattices |
Signature | |
Axioms | is a lattice. is convex algebra. distributes over both and , |
Interpretation | is a game which will certainly result in outcome . is a game where White wins and is a game where Black wins. is a game where White can choose to play or to play . is a game where Black can choose to play or to play . is a game where is played with chance and with chance . |
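For a fully-specified game tree, the evaluation can be sketched as follows. This is a minimal sketch under my own assumptions: leaves carry Black's utility as a number between 0 and 1, Black maximises, White minimises, and coin flips take expectations. The encoding and names are illustrative, and I'm assuming this minimax-style evaluation is the kind of algebra the text has in mind.

```python
from fractions import Fraction

# Leaves carry Black's utility in [0, 1]. The encoding, and the convention
# that Black maximises while White minimises, are assumptions of this sketch.
BLACK_WINS = ("leaf", Fraction(1))
WHITE_WINS = ("leaf", Fraction(0))

def value(game):
    """Black's utility for a fully-specified game tree."""
    kind = game[0]
    if kind == "leaf":
        return game[1]
    if kind == "black":   # free binary choice by Black
        return max(value(game[1]), value(game[2]))
    if kind == "white":   # free binary choice by White
        return min(value(game[1]), value(game[2]))
    if kind == "chance":  # biased coin flip
        _, p, a, b = game
        return p * value(a) + (1 - p) * value(b)
    raise ValueError(f"unknown node: {kind}")

# Hypothetical game: White picks between a fair coin flip and a Black choice.
coin_flip = ("chance", Fraction(1, 2), BLACK_WINS, WHITE_WINS)
blacks_pick = ("black", ("leaf", Fraction(1, 4)), ("leaf", Fraction(3, 4)))
game = ("white", coin_flip, blacks_pick)
print(value(game))  # 1/2

# No player willingly chooses to lose: offering Black a losing option
# changes nothing, and likewise for White.
print(value(("black", game, WHITE_WINS)) == value(game))  # True
print(value(("white", game, BLACK_WINS)) == value(game))  # True
```

The coin flip is worth 1/2 to Black and Black's own pick is worth 3/4, so White steers towards the coin flip.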
15 — infrabayesianism
When agents have beliefs about the same environment that they're embedded in, weird things can happen [LW · GW]. Over the past few years, Vanessa Kosoy and Alex Appel have been exploring a novel flavour of uncertainty, infrabayesian uncertainty [? · GW], which they claim more fruitfully characterises the belief-states of embedded agents. In particular, it characterises belief-states concerning Newcomb-like environments, where the state of the environment is correlated with the agent's choice under consideration. Their flavour of uncertainty corresponds to the infrabayesian monad .
Roughly speaking, is the same as above except without the connective. Consider the theory of convex semilattices with top and bottom, which is a presentation of the composite monad .[15] From what I understand, this monad is Kosoy's infrabayesian monad .[16] This justifies the claim that infrabayesianism is the flavour of uncertainty that minimally encompasses both possibilistic uncertainty (via the monad), probabilistic uncertainty (via the monad), and reward (via the monad). I think that this motivates infrabayesianism as a characterisation of an agent's belief-state about their environment.
Flavour of uncertainty | infrabayesian |
---|---|
Monad | infrabayesian monad |
Signature | |
Axioms | is a semilattice with and . is convex algebra. distributes over , |
Interpretation | is an environment which certainly results in outcome . is an impossible/contradictory environment where the agent achieves no disutility, called Nirvana. is an environment where the agent suffers maximal disutility. is an environment which is either like or like , and our agent should be pessimistic here. is an environment which is like with chance and with chance . |
Unfortunately, isn't a commutative monad, which means it's not a flavour of uncertainty that you can have about parts of the world, but only about the world in its entirety. Put starkly, there's no way to combine my infrabayesian belief-states about two coin tosses to yield a single infrabayesian belief-state about the pair of coin tosses, even when the coin tosses are completely unrelated.[17] This, I think, limits both the theoretical appeal of infrabayesianism and its tractability.
Theoretically speaking, the fact that isn't a commutative monad weakens the analogy between infrabayesian uncertainty and possibilistic or probabilistic uncertainty. Many concepts are built upon possibilistic or probabilistic uncertainty which appeal, in an essential way, to the product operators or . And infrabayesianism, lacking such an operator, is not guaranteed the analogous concept.
Practically speaking, the lack of an infrabayesian product operator is an obstacle to parallelising algorithms which assume infrabayesian belief-states. There is no way to decompose the environment into separate components, discover an infrabayesian belief-state for each component, and then combine those belief-states into a single belief-state about the environment as a whole.
Implications for AI safety
Does this essay have any practical significance, or is it all just abstract nonsense? How does this help us solve the Big Problem? To be perfectly frank, I have no idea. Timelines are probably too short for agent foundations, and this essay is maybe agent foundations foundations or something like that. But I feel compelled to offer some practical implications for AI safety to validate my decision to write this essay and your decision to read it.
- One lesson is that uncertainty comes in many flavours, and formalising different flavours of uncertainty isn't mathematically challenging. Just ask yourself the Four C's (Count? Certainty? Collapse? Combine?) and you've got yourself a monad.
- Often, you can replace one monad in a formalism with another and everything will still type-check. For example, the stochastic Markov decision processes are transition functions . One can generalise this to for any monad we've met so far.
- If you're conducting active research into agent foundations, then instead of assuming a fixed flavour of uncertainty (e.g. possibilistic, probabilistic, infrabayesian, etc), perhaps see if you can generalise the theory to an arbitrary monad, or at least an arbitrary commutative monad. I call such theories "parametric in the monad". If you're gonna do foundational work, it often pays to make it highly parametric, even if you only care about a specific case.
- The theory will be robust to errors about the appropriate flavour of uncertainty.
- If you want to account for another flavour of uncertainty, you'll have saved yourself time, effort, and ink.
- You've got more data points to sanity-check the theory — do you get sensible answers when you plug in different monads, e.g. etc?
- If your solution to AI safety involves, at some step, building a formal model of the environment (cf. Davidad's Open Agency Architecture [AF · GW]) or of a human (cf. imitative amplification [LW · GW]), then this model should carry all the flavours of uncertainty that actually characterise your belief-state about the system. And you shouldn't feel compelled to shoe-horn all your uncertainties into a probability distribution. For example, unless-claused uncertainty seems pretty fundamental (we commit to our stochastic models of the environment and/or a human only within a narrow range of situations), and this flavour of uncertainty seems irreducible to probabilistic uncertainty.
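To illustrate what "parametric in the monad" might look like in practice, here is a sketch of a decision-process rollout that only ever touches the monad through its return and bind. The class names, the toy environment, and the policy are hypothetical. The same `rollout` function runs unchanged for possibilistic and probabilistic transition functions.

```python
from fractions import Fraction

class Powerset:
    """Possibilistic uncertainty: beliefs are nonempty frozensets of states."""
    @staticmethod
    def unit(x):
        return frozenset({x})

    @staticmethod
    def bind(belief, f):
        return frozenset(y for x in belief for y in f(x))

class Dist:
    """Probabilistic uncertainty: beliefs are dicts from states to weights."""
    @staticmethod
    def unit(x):
        return {x: Fraction(1)}

    @staticmethod
    def bind(belief, f):
        out = {}
        for x, p in belief.items():
            for y, q in f(x).items():
                out[y] = out.get(y, 0) + p * q
        return out

def rollout(monad, step, policy, state, horizon):
    """Belief about the state after `horizon` steps, for any monad."""
    belief = monad.unit(state)
    for _ in range(horizon):
        belief = monad.bind(belief, lambda s: step(s, policy(s)))
    return belief

# Hypothetical walk on the integers: the agent always tries to move right.
def policy(state):
    return "right"

def may_slip(state, action):          # possibilistic transition function
    return frozenset({state, state + 1})

def slips_sometimes(state, action):   # stochastic transition function
    return {state + 1: Fraction(3, 4), state: Fraction(1, 4)}

print(sorted(rollout(Powerset, may_slip, policy, 0, 2)))
# [0, 1, 2]
print(sorted(rollout(Dist, slips_sometimes, policy, 0, 2).items()))
# [(0, Fraction(1, 16)), (1, Fraction(3, 8)), (2, Fraction(9, 16))]
```

Plugging in a third monad means writing one more class with `unit` and `bind`, and `rollout` is untouched.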
Further questions
In so far as "flavours of uncertainty" is an informal term, there's little we can do to test the correspondence other than enumerating well-known flavours of uncertainty and checking that they do in fact correspond to monads, and vice-versa, enumerating the well-known monads and giving them natural doxastic interpretations. I think my own attempt has been positive, but this result is open to revision.
Secondly, the biggest asterisk of my essay: my treatment of belief-states has been silent on their most important property, namely that they are learned. For example, a probability distribution can be conditioned on new evidence, and possibilistic uncertainty also carries an analogous notion of conditioning. Perhaps any characterisation of belief should answer additional questions about how those belief-states are revised in light of new evidence/observations/considerations, etc. Perhaps we should append to Count? Certainty? Collapse? Combine? a fifth question, Condition? I'm sympathetic to this worry.
And if indeed learning is a phenomenon which must be modelled by any characterisation of belief, then monads do not themselves carry enough structure to characterise beliefs. Rather, we would need to equip the monad with some additional structure, perhaps a family of maps for some space of observations , possibly satisfying some additional constraints such as and . I'm just improvising here.
This is best left to future work, if the need arises.
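For the two most familiar flavours, a conditioning map could be sketched like this. I'm assuming here that an observation is a predicate on outcomes, which may or may not match the shape of the extra structure gestured at above. The two sanity checks at the end (conditioning twice on the same observation changes nothing, and the order of two observations doesn't matter) are candidate constraints of my own choosing.

```python
from fractions import Fraction

def condition_set(belief, observed):
    """Possibilistic conditioning: discard outcomes the observation rules out.
    Returns None if nothing survives, since a belief must be nonempty."""
    kept = frozenset(x for x in belief if observed(x))
    return kept if kept else None

def condition_dist(belief, observed):
    """Probabilistic conditioning: discard and renormalise.
    Returns None if the observation had probability zero."""
    mass = sum(p for x, p in belief.items() if observed(x))
    if mass == 0:
        return None
    return {x: p / mass for x, p in belief.items() if observed(x)}

# Hypothetical example: a six-sided die and two observations about it.
die_set = frozenset(range(1, 7))
die_dist = {face: Fraction(1, 6) for face in range(1, 7)}

def is_even(face):
    return face % 2 == 0

def is_big(face):
    return face >= 4

print(sorted(condition_set(die_set, is_even)))  # [2, 4, 6]
print(condition_dist(die_dist, is_even)[2])     # 1/3

# Candidate constraints: idempotence and order-independence.
once = condition_dist(die_dist, is_even)
print(condition_dist(once, is_even) == once)  # True
print(condition_dist(once, is_big)
      == condition_dist(condition_dist(die_dist, is_big), is_even))  # True
```

Both functions have to return something outside the belief-state space when the observation contradicts the belief, which suggests that a general treatment would need to say what conditioning on the impossible means.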
Flavour of uncertainty | Monad |
---|---|
possibilistic | nonempty powerset monad |
probabilistic | distribution monad |
indeterminate | reader from H monad |
confidence-marked | writer to [0,1] monad |
unless-claused | writer to monad |
ordered clarifications | writer to monad |
unordered clarifications | writer to monad |
best guess | identity monad |
guess-or-shrug | maybe monad |
infinitesimal probabilistic | -distribution monad |
generalised probabilistic | K-distribution monad |
quantum | quantum monad |
evolving | smooth state monad |
ex-ante utility | continuation to monad |
utterance in a language | signature monad |
guess-or-different-shrugs | exception monad |
path through forking road | full binary trees monad |
utterance modulo equivalence | algebraic theory |
imprecise probability | convex powerset of distributions monad |
ludic | free convex lattice monad |
infrabayesian | infrabayesian monad |
- ^
In particular, I'm thinking of the applied category theory community.
- ^
Traditionally, the field of analytic epistemology has been concerned with defining epistemological concepts — i.e. constructing definitions for the concepts of knowledge, belief, evidence, learning, testimony, justification, etc. However, in recent years analytic epistemology has reorientated itself, chiefly under the influence of Timothy Williamson, towards modelling epistemological phenomena — i.e. constructing mathematical models for phenomena relating knowledge, belief, evidence, learning, testimony, justification, etc. This reorientation in epistemology, from concept-defining to model-building, was inspired by the natural sciences.
- ^
An operator assigns, to every set , another set/function .
For example, is the powerset operator, which assigns to every set another set . You can informally think of an operator as a function — but strictly speaking, an operator can't be a function because its domain would be the "set of all sets" (which doesn't exist).
Formally, the domain of an operator is something called a category. Categories can be larger than sets; in particular, there is a category containing all the sets and the functions between them. For pedagogical purposes, I've framed everything in this article in terms of sets and functions, but most of the content of this article can be applied to any category with enough structure.
- ^
And I suppose, by "generalising backwards", that my zeroth-order belief about the coin toss is the actual result of the coin toss..?
- ^
is a monoid if and .
A monoid is like a group except the elements might not have inverses, e.g. is a group but is only a monoid.
is a commutative monoid if also .
The writer monad for is given by the data ,, and where and .
- ^
Solution: I think is the -dimensional Hilbert space, but this isn't my expertise.
- ^
Suppose has two distinct elements and . Let and . Then there are two ways to combine and into a single belief in , i.e. and . But these differ so is not a commutative monad for .
- ^
In fact, encompasses every other monad such that is a -algebra. This explains why encompasses both possibilistic and probabilistic uncertainty — specifically, it's because is a -algebra and is a -algebra.
Moreover, is the smallest monad with this property, because there's a bijection between -algebras and monad morphisms . See here for details.
That being said, isn't the smallest monad encompassing both and in particular. If you only need to encompass and then Vanessa Kosoy's infrabayesian monad will suffice, but is strictly contained within .
- ^
For example, suppose and satisfies and . Then we find via uniform substitution.
In pythonese, S_string = ''.join(t if t in Sigma else f(t) for t in W_string)
Equivalently, we can define the bind operator recursively on the depth of . For atomic sentences, , and for compound sentences, .
- ^
In particular, suppose contains two unary connectives. Suppose is my belief-state about and is my belief-state about . Then there are two ways to combine these two beliefs into a single belief in , i.e. and . But these differ so is not a commutative monad.
- ^
Note that a monad might have many distinct presentations, and this non-uniqueness is rather distasteful. The more elegant treatment of monads is with Lawvere theories, where both atomic connectives and compound connectives are treated on par.
- ^
For any cardinality , we say that a monad has rank if it has a presentation with operations of arity at most . The continuation monad has no rank (not even an infinitary one) which is a somewhat perverse property for a monad. A rankless monad isn't generated by any algebraic theory, even if we allow infinitary operators.
We can see that is a rankless monad because it contains as a submonad for every , but is a monad without rank.
- ^
The lattice axioms for the signature consist of the semilattice axioms for , the semilattice axioms for , the boundary axioms and , and the absorption laws and .
- ^
The position evaluation function is defined inductively:
- ^
That is, the signature consists of the connectives , and contains the axioms: , , , , , .
Strictly speaking, it's improper to speak of composing monads and unless you provide a distributive law of over , i.e. . But yields a monad given by the convex powerset of distributions monad, and the exception monad distributes over any monad, so no worries here.
- ^
A technical caveat:
Kosoy's infrabayesian monad is actually given by rather than (that is, contains sets of distributions with arbitrary cardinality). At least, this is my reading from Diffractor's Infra-Miscellanea Section 2 [LW · GW].
Unfortunately, is a rankless monad, i.e. it isn't generated by any algebraic theory even if we allow infinitary operators.
Fortunately, we may approximate with a monad of rank for any cardinality . Let's define , where is the set of non-empty subsets of of cardinality no greater than . Algebraically, we obtain by adding the -ary disjunction connective to the signature for .
This leaves the open question, for which cardinality is an adequate and tractable approximation, if indeed any? I suspect suffices for all theoretical purposes, and that suffices for all practical purposes.
- ^
This also applies to imprecise probability and to strategic uncertainty .
For example, given a series of two-player games , there's no natural way to combine them into a single two-player game because isn't a commutative monad.
More generally, there's no commutative monad which contains both a operator and a operator without conflating them. See here for details.
6 comments
comment by DragonGod · 2024-01-20T22:46:49.706Z
i.e. if each forecaster has an first-order belief , and is your second-order belief about which forecaster is correct, then should be your first-order belief about the election.
I think there might be a typo here. Did you instead mean to write: "" for the second order beliefs about the forecasters?
comment by davidad · 2024-01-03T04:09:03.188Z
Kosoy's infrabayesian monad is given by
There are a few different varieties of infrabayesian belief-state, but I currently favour the one which is called "homogeneous ultracontributions", which is "non-empty topologically-closed ⊥–closed convex sets of subdistributions", thus almost exactly the same as Mio-Sarkis-Vignudelli's "non-empty finitely-generated ⊥–closed convex sets of subdistributions monad" (Definition 36 of this paper), with the difference being essentially that it's presentable, but it's much more like than .
I am not at all convinced by the interpretation of here as terminating a game with a reward for the adversary or the agent. My interpretation of the distinguished element in is not that it represents a special state in which the game is over, but rather a special state in which there is a contradiction between some of one's assumptions/observations. This is very useful for modelling Bayesian updates (Evidential Decision Theory via Partial Markov Categories, sections 3.5-3.6), in which some variable is observed to satisfy a certain predicate : this can be modelled by applying the predicate in the form where means the predicate is false, and means it is true. But I don't think there is a dual to logical inconsistency, other than the full set of all possible subdistributions on the state space. It is certainly not the same type of "failure" as losing a game.
reply by Cleo Nardo (strawberry calm) · 2024-01-09T18:09:49.053Z
For the sake of potential readers, a (full) distribution over is some with finite support and , whereas a subdistribution over is some with finite support and . Note that a subdistribution over is equivalent to a full distribution over , where is the disjoint union of with some additional element, so the subdistribution monad can be written .
I am not at all convinced by the interpretation of here as terminating a game with a reward for the adversary or the agent. My interpretation of the distinguished element in is not that it represents a special state in which the game is over, but rather a special state in which there is a contradiction between some of one's assumptions/observations.
Doesn't the Nirvana Trick basically say that these two interpretations are equivalent?
Let be and let be . We can interpret as possibility, as a hypothesis consistent with no observations, and as a hypothesis consistent with all observations.
Alternatively, we can interpret as the free choice made by an adversary, as "the game terminates and our agent receives minimal disutility", and as "the game terminates and our agent receives maximal disutility". These two interpretations are algebraically equivalent, i.e. is a topped and bottomed semilattice.
Unless I'm mistaken, both and demand that the agent may have the hypothesis "I am certain that I will receive minimal disutility", which is necessary for the Nirvana Trick. But also demands that the agent may have the hypothesis "I am certain that I will receive maximal disutility". The first gives bounded infrabayesian monad and the second gives unbounded infrabayesian monad. Note that Diffractor uses in Infra-Miscellanea Section 2.
reply by davidad · 2024-01-12T18:16:19.077Z
I agree that each of and has two algebraically equivalent interpretations, as you say, where one is about inconsistency and the other is about inferiority for the adversary. (I hadn’t noticed that).
The variant still seems somewhat irregular to me; even though Diffractor does use it in Infra-Miscellanea Section 2, I wouldn’t select it as “the” infrabayesian monad. I’m also confused about which one you’re calling unbounded. It seems to me like the variant is bounded (on both sides) whereas the variant is bounded on one side, and neither is really unbounded. (Being bounded on at least one side is of course necessary for being consistent with infinite ethics [LW · GW].)
comment by davidad · 2024-01-03T03:28:29.722Z
Does this article have any practical significance, or is it all just abstract nonsense? How does this help us solve the Big Problem? To be perfectly frank, I have no idea. Timelines are probably too short agent foundations, and this article is maybe agent foundations foundations...
I do think this is highly practically relevant, not least of which because using an infrabayesian monad instead of the distribution monad can provide the necessary kind of epistemic conservatism for practical safety verification in complex cyber-physical systems like the biosphere being protected and the cybersphere being monitored. It also helps remove instrumentally convergent perverse incentives to control everything [AF · GW].