Aggregative Principles of Social Justice
post by Cleo Nardo (strawberry calm) · 2024-06-05T13:44:47.499Z · LW · GW
1. Introduction
1.1. Three aggregative principles
This article examines aggregative principles of social justice. These principles state that a social planner should make decisions as if they will face the aggregated personal outcomes of every individual in the population. Different conceptions of aggregation generate different aggregative principles.
Aggregative principles avoid many theoretical pitfalls of utilitarian principles. Unlike utilitarianism, aggregative principles do not require specifying a social welfare function, which is notoriously intractable. Moreover, they seem less prone to counterintuitive conclusions such as the repugnant conclusion or the violation of moral side constraints.[1]
There are three well-known aggregative principles:
- Live every life once (LELO)
- Harsanyi's Lottery (HL)
- Rawls' Original Position (ROI)
By the end of this article, we will see that these three aggregative principles are instances of a vast family of similar principles.
1.2. Living Every Life Once
The idea, as articulated below by William MacAskill, is that a social planner should make decisions as if they will live out every individual's life (past, present, and future) in sequence. We will call this principle of social justice "Live Every Life Once" (LELO).[2]
Imagine living the life of every human being who has ever existed — in order of birth. Your first life begins about 300,000 years ago in Africa. After living that life and dying, you travel back in time to be reincarnated as the second-ever person, born slightly later than the first, then the third-ever person, and so on.
[...] If you knew you were going to live all these future lives, what would you hope we do in the present? How much carbon dioxide would you want us to emit into the atmosphere? How careful would you want us to be with new technologies that could destroy, or permanently derail, your future? How much attention would you want us to give to the impact of today’s actions on the long term?
- William MacAskill (2022), "The Case for Longtermism"
MacAskill's hope is that the social planner, following LELO, would choose policies benefiting each individual because they anticipate living each individual's life, and they would avoid policies harming any individual for the same reason. For example, the social planner wouldn't choose to emit dangerous pollution that will harm the health of future generations, because the social planner anticipates suffering the consequences themselves, although delayed by many millennia.
MacAskill's thought experiment bears a striking similarity to two other thought experiments in social ethics, namely Harsanyi's Lottery and Rawls' Original Position.
1.3. Harsanyi's Lottery
The economist John C. Harsanyi offers a different principle of social justice: a social planner should make decisions as if they faced a hypothetical lottery over the personal outcomes of each individual in society. This lottery would assign a likelihood to each individual, so the social planner wouldn't be sure which individual's life they will face. For example, they may face a 20% chance of being individual A, a 35% chance of being B, and so on. The ignorance is meant to force an impartial perspective for making decisions, a feature of social justice. We will call this principle of social justice "Harsanyi's Lottery" (HL).[3]
Harsanyi's hope is that the social planner, following HL, would choose policies benefiting each individual because there is some nonzero probability that they face that individual's life. They would also avoid policies harming any individual for the same reason. For instance, they would not choose to impoverish the majority of society for a small gain to a minority, because the expected value of the corresponding lottery of outcomes is negative.
Typically, the hypothetical lottery is taken to be uniform over all individuals in society. This uniformity assumption is crucial for ensuring impartiality: the social planner would not rationally prioritize any one individual over another if they have an equal probability of being each person.
1.4. Rawls' Original Position
The philosopher John Rawls offers a third principle of social justice, similar to Harsanyi's Lottery.[4] His principle states that a social planner should make decisions as if they were ignorant about which individual in society they will be. We will call this principle of social justice "Rawls' Original Position" (ROI).[5]
Rawls' hope is that the social planner, following ROI, would choose policies benefiting each individual because they must consider the possibility that they could be any individual. They would also avoid policies harming any individual for the same reason. For example, the social planner wouldn't choose to torture someone, even to greatly benefit the rest of society, because the social planner must consider the possibility that they will end up being that person.
HL and ROI share obvious similarities: both principles ask the social planner to imagine themselves in a state of ignorance about which individual's personal outcome they will face. However, they understand this ignorance in different ways. Under HL, the ignorance is probabilistic, with likelihoods attached to the alternatives. By contrast, under ROI, the ignorance is possibilistic, meaning the planner considers it possible that they could be any individual, without assigning probabilities to those possibilities. This situation (i.e. having no basis for assigning probabilities to the possible alternatives) is sometimes called Knightian uncertainty. Moreover, note that HL proposes a physical mechanism by which the individual is selected, namely, a random lottery. By contrast, ROI merely states that each individual might be selected, without specifying any physical mechanism.
1.5. Structural similarities
The similarity between HL and ROI was apparent to Harsanyi and Rawls.[6] On the other hand, HL and ROI seem, at first glance, quite distinct from LELO. Firstly, Harsanyi's and Rawls' principles both begin with a planner in a state of ignorance about which individual's personal outcome they will face, whereas LELO posits no such uncertainty: so long as the planner knows the personal outcomes of each individual and the ordering of the individuals' births, their hypothetical fate is certain. Moreover, LELO asks the social planner to contemplate an abnormal prospect, i.e. a lifetime spanning millennia, whereas HL and ROI involve prospects that actual individuals in society will face.
However, these three principles are structurally similar: LELO, HL and ROI each involve aggregating the prospects faced by the individuals into a single hypothetical prospect faced by the social planner. They differ in the mode of aggregation they employ: LELO aggregates via a concatenation, HL via a lottery, and ROI via a disjunction. This common aggregative structure appears to be underexplored in the existing literature.
I will call principles of this general form — defining social justice in terms of an aggregation of individual prospects — aggregative principles of social justice. LELO, HL, and ROI are three examples, but they do not exhaust the space of aggregative principles. In fact, for any well-defined mode of aggregation, we can generate a corresponding aggregative principle. The space of aggregative principles is large and underexplored.
The rest of the article is organized as follows. Section 2 formalises LELO, HL, and ROI in parallel, highlighting the structural similarity. Section 3 formalises the informal notion of a "mode of aggregation" with the mathematical concept of monads, and presents a full characterization of the space of aggregative principles. This is the key contribution of the article. Section 4 explores examples of the algebraic structures on personal outcomes that are necessary for the aggregative principles to be well-defined.
2. Formalising LELO, HL, and ROI
2.1. Personal and social outcomes
Each of LELO, HL, and ROI attempts to extend the planner's self-interested attitudes towards personal outcomes to moral attitudes towards social outcomes. They achieve this by assigning to each social outcome $s$ a hypothetical personal outcome $\zeta(s)$, and then stating that the social planner should treat $s$ as they would treat $\zeta(s)$. In other words, a social outcome is deemed socially desirable if the corresponding personal outcome is personally desirable.
Let $P$ be the space of personal outcomes and $S$ be the space of social outcomes. By "personal outcome", I mean a full description of the state-of-affairs for a single individual, and by "social outcome", I mean a full description of the state-of-affairs for society as a whole. Each aforementioned principle of social justice proposes a function $\zeta : S \to P$ assigning to each social outcome $s \in S$ a hypothetical personal outcome $\zeta(s) \in P$. However, they differ on the function they propose:
LELO uses the function $\zeta_{\mathrm{LELO}} : S \to P$ where $\zeta_{\mathrm{LELO}}(s)$ is the personal outcome of facing a concatenation of the lives of the individuals facing social outcome $s$. For instance, if $s$ consists of three individuals facing personal outcomes $p_1$, $p_2$, and $p_3$ then $\zeta_{\mathrm{LELO}}(s)$ is the personal outcome of first facing $p_1$, then $p_2$, then $p_3$ in sequence. We'll denote this outcome by $p_1 \triangleright p_2 \triangleright p_3$.
HL uses the function $\zeta_{\mathrm{HL}} : S \to P$ where $\zeta_{\mathrm{HL}}(s)$ is the personal outcome of facing a lottery among the personal outcomes of the individuals in social outcome $s$. For instance, if $s$ consists of three individuals facing personal outcomes $p_1$, $p_2$, and $p_3$ then $\zeta_{\mathrm{HL}}(s)$ is the personal outcome of facing each outcome with equal likelihood $\tfrac{1}{3}$. We'll denote this outcome by $\tfrac{1}{3} \cdot p_1 + \tfrac{1}{3} \cdot p_2 + \tfrac{1}{3} \cdot p_3$.
ROI uses the function $\zeta_{\mathrm{ROI}} : S \to P$ where $\zeta_{\mathrm{ROI}}(s)$ is the personal outcome of possibly facing the personal outcome of any individual in social outcome $s$. For instance, if $s$ consists of three individuals facing personal outcomes $p_1$, $p_2$, and $p_3$ respectively, then $\zeta_{\mathrm{ROI}}(s)$ is the personal outcome of facing either $p_1$, $p_2$, or $p_3$, but without any probabilities attached to these possibilities. We'll denote this outcome by $p_1 \vee p_2 \vee p_3$.
It remains to define these three functions, $\zeta_{\mathrm{LELO}}$, $\zeta_{\mathrm{HL}}$, and $\zeta_{\mathrm{ROI}}$. For simplicity, let's assume that all social outcomes share a fixed, finite population of individuals, represented by the set $I$. This assumption could be relaxed in future work, to handle populations that vary across social outcomes. Moreover, let's assume that the personal outcome for each individual is fully determined by the social outcome. Formally, there exists a function $\gamma : S \times I \to P$ such that, if the social outcome $s \in S$ obtains, then each individual $i \in I$ faces the personal outcome $\gamma(s, i) \in P$. We will treat this as a global assumption, not localised to any particular principle of social justice.
For example, $S$ might be the set of all possible physical configurations of the universe across time, while $P$ might be the set of an individual's possible health and economic outcomes. However, the precise definitions of $S$ and $P$ are not crucial for the present analysis. Conceptually, $P$ represents the domain of personal, self-interested preferences, while $S$ represents the domain over which we seek to define social or ethical preferences. In Section 4, we will give concrete examples of these two spaces.
If $P$ is already well-understood, there is a simple way to define $S$ and $\gamma$. We could define $S$ to be the space $I \to P$ of functions from individuals to personal outcomes, and let $\gamma$ be the standard evaluation function, mapping the pair $(s, i)$ to $s(i)$. Intuitively, a social outcome $s$ is just a vector specifying each individual's personal outcome, and $\gamma$ simply looks up individual $i$'s outcome in this vector. However, I have opted for a more general presentation in which social outcomes are not entirely characterized by the personal outcomes of individuals. This allows for the possibility that $s$ contains information beyond just the vector of personal outcomes. Consequently, there may exist distinct social outcomes $s \neq s'$ such that $\gamma(s, i) = \gamma(s', i)$ for all $i \in I$.
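The more general presentation can be made concrete with a toy model. The sketch below is only an illustration: the class `SocialOutcome`, its `extra` field, and the individuals and outcomes are hypothetical names introduced here, not part of the formalism. It shows two social outcomes that differ even though every individual faces the same personal outcome in both.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SocialOutcome:
    """Hypothetical social outcome: personal outcomes plus extra information."""
    personal: tuple  # pairs (individual, personal outcome)
    extra: str       # information beyond the vector of personal outcomes

def gamma(s, i):
    """Return the personal outcome that individual i faces under social outcome s."""
    return dict(s.personal)[i]

individuals = ["alice", "bob"]
s1 = SocialOutcome((("alice", "rich"), ("bob", "poor")), extra="forests intact")
s2 = SocialOutcome((("alice", "rich"), ("bob", "poor")), extra="forests cleared")

# Distinct social outcomes that agree on every individual's personal outcome.
same_personal = all(gamma(s1, i) == gamma(s2, i) for i in individuals)
print(s1 != s2, same_personal)
```

Any aggregation built only from `gamma` cannot tell `s1` and `s2` apart, which is exactly the sense in which the principles below attend only to personal outcomes.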
If we are provided with the function $\gamma : S \times I \to P$, which gives the personal outcome of each individual under each social outcome, how might we construct the target function $\zeta : S \to P$? As we will see, the key to doing this is to assume some additional algebraic structure on the space $P$ of personal outcomes, beyond it just being an abstract set. I will explain how this construction occurs in each aggregative principle, starting with Live Every Life Once.
2.1. Formalising LELO
Informally, this whole procedure can be summarized as follows: (1) The population is represented by a list of individuals. (2) Each social outcome provides a function from individuals to their personal outcomes. This function can be lifted to a function from lists of individuals to the corresponding lists of personal outcomes, and then applied to the list representing the population. Hence each social outcome provides a list of personal outcomes. (3) Any list of personal outcomes can be concatenated into a single personal outcome. (4) Therefore, each social outcome provides a single personal outcome.
I'll now spell out the details.
(1) For any set $X$, a list over $X$ is a finite sequence $[x_1, \dots, x_n]$ where all entries are elements of $X$. The set of all lists over $X$ is denoted by $\mathrm{List}(X)$. This includes the empty list, denoted by $[]$, and lists with repeated entries. Note that a list is more than just a set: it also imposes an ordering on the individuals.[7]
LELO must assume that the population is represented by a distinguished list of individuals $l \in \mathrm{List}(I)$, where $I$ is the set of individuals. Typically, this list consists of all humans ordered by their birth, although alternative orderings could be considered.
(2) As discussed before, there exists a function $\gamma : S \times I \to P$ such that, if the social outcome $s \in S$ obtains, then each individual $i \in I$ faces the personal outcome $\gamma(s, i) \in P$. It follows that each social outcome $s$ provides a function $\gamma_s : I \to P$ from individuals to their personal outcomes, where $\gamma_s$ denotes the function $i \mapsto \gamma(s, i)$.
Now, any function $f : I \to P$ from individuals to their personal outcomes can be lifted to a function $\mathrm{List}(f) : \mathrm{List}(I) \to \mathrm{List}(P)$ from lists of individuals to the corresponding lists of personal outcomes. Concretely, $\mathrm{List}(f)$ sends a list $[i_1, \dots, i_n]$ to the list $[f(i_1), \dots, f(i_n)]$, by applying $f$ componentwise. This lifting operation is a general feature of lists.
Hence, each social outcome $s$ provides a list of personal outcomes $\mathrm{List}(\gamma_s)(l) \in \mathrm{List}(P)$, obtained by lifting $\gamma_s$ to a function $\mathrm{List}(\gamma_s)$ and then applying it to the distinguished list of individuals $l$.
(3) LELO assumes that any list of personal outcomes can be concatenated into a single personal outcome. Formally, there exists a function $\mathrm{concat} : \mathrm{List}(P) \to P$ which reduces any list of personal outcomes $[p_1, \dots, p_n]$ into a single personal outcome $\mathrm{concat}[p_1, \dots, p_n]$.
To align with MacAskill's intended interpretation, we should view $\mathrm{concat}[p_1, \dots, p_n]$ as the personal outcome of facing each $p_k$ in order, starting with $p_1$ and ending with $p_n$. Perhaps after each life ends, one is instantaneously transported to the beginning of the next life with one's memories of the preceding life wiped. The process of living a life, dying, memory wiping, and moving to the next life is repeated until the full list of outcomes is exhausted.
It is worth noting that the concatenation operator $\mathrm{concat}$ can equivalently be presented by a binary operator $\triangleright : P \times P \to P$ and a constant element $e \in P$, provided $\triangleright$ and $e$ satisfy the monoid axioms of associativity and identity.[8] Specifically, given $\mathrm{concat}$, define $p \triangleright q$ as $\mathrm{concat}[p, q]$ and define $e$ as $\mathrm{concat}[]$. Conversely, given $\triangleright$ and $e$, define $\mathrm{concat}[p_1, \dots, p_n]$ as $e \triangleright p_1 \triangleright \dots \triangleright p_n$, evaluating the products left-to-right.
The monoid axioms are:
- Associativity: $(p \triangleright q) \triangleright r = p \triangleright (q \triangleright r)$
- Identity: $e \triangleright p = p$ and $p \triangleright e = p$
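The passage between the two presentations of concatenation can be checked on a toy model. In this minimal sketch, personal outcomes are modelled as strings, which is an assumption made purely for illustration; the names `concat`, `op`, `e` and `concat_from_op` are hypothetical.

```python
from functools import reduce

def concat(outcomes):
    """Concatenation operator List(P) -> P, with P modelled as strings."""
    return "".join(outcomes)

def op(p, q):
    """Binary operator recovered from concat."""
    return concat([p, q])

e = concat([])  # identity element recovered from concat

def concat_from_op(outcomes):
    """Concatenation recovered from op and e, folding left to right."""
    return reduce(op, outcomes, e)

samples = ["ab", "c", ""]
associative = all(op(op(p, q), r) == op(p, op(q, r))
                  for p in samples for q in samples for r in samples)
identity = all(op(e, p) == p == op(p, e) for p in samples)
print(associative, identity, concat_from_op(["ab", "c", "d"]))
```

The check runs over a handful of samples only, so it illustrates the axioms rather than proving them.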
(4) Putting it all together, each social outcome $s$ provides a single personal outcome, namely $\mathrm{concat}(\mathrm{List}(\gamma_s)(l))$, obtained by applying the concatenation operator $\mathrm{concat} : \mathrm{List}(P) \to P$ to the list of personal outcomes generated by $s$. This defines the LELO aggregation function $\zeta_{\mathrm{LELO}} : S \to P$, which assigns to each social outcome the concatenated personal outcome.
To illustrate, suppose the population is represented by the list $l = [i_1, \dots, i_n]$, and suppose the social outcome $s$ assigns personal outcome $p_k$ to individual $i_k$ for each $k$, i.e. $\gamma(s, i_k) = p_k$. Then the concatenated personal outcome is $\zeta_{\mathrm{LELO}}(s) = p_1 \triangleright \dots \triangleright p_n$. As a sanity check, consider the trivial case where $I$ consists of a single individual. Here the population list is just $[i_1]$, and the concatenated outcome is simply $p_1$, i.e. the personal outcome assigned to the sole individual $i_1$.
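The four steps can be sketched in code. This is a toy model under stated assumptions: personal outcomes are modelled as tuples of life episodes, a social outcome is a plain dictionary, and the names `lift_list`, `concat`, `zeta_lelo` and `gamma` are hypothetical helpers introduced for this example.

```python
def lift_list(f):
    """Lift f : X -> Y to a function on lists, applied componentwise."""
    return lambda xs: [f(x) for x in xs]

def concat(outcomes):
    """Concatenate personal outcomes, each modelled as a tuple of life episodes."""
    return tuple(episode for p in outcomes for episode in p)

def zeta_lelo(gamma, population, s):
    """LELO aggregation: lift gamma_s, apply it to the population list, concatenate."""
    gamma_s = lambda i: gamma(s, i)
    return concat(lift_list(gamma_s)(population))

gamma = lambda s, i: s[i]                 # toy evaluation function
population = ["i1", "i2", "i3"]           # distinguished list of individuals
s = {"i1": ("farmer",), "i2": ("sailor",), "i3": ("weaver",)}

print(zeta_lelo(gamma, population, s))    # ('farmer', 'sailor', 'weaver')
print(zeta_lelo(gamma, ["i1"], s))        # ('farmer',)
```

Reversing `population` yields a different tuple, which reflects the dependence on ordering.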
The ordering of the distinguished list $l$ affects the structure of the concatenated outcome, due to the non-commutativity of the binary concatenation operator $\triangleright$. In general, $p \triangleright q$ and $q \triangleright p$ yield different personal outcomes. The choice of ordering has substantive implications for the resulting principle of social justice. For example, suppose the social planner has a positive rate of time preference, i.e. they discount the value of future experiences. This is a realistic assumption about human preferences. A LELO principle using a chronological ordering of individuals (from earliest-born to latest-born) will prioritize the interests of earlier generations compared to a principle using the reverse-chronological ordering, all else being equal. More formally, suppose the social planner has a utility function $u : P \to \mathbb{R}$ over personal outcomes and a discount factor $\delta \in (0, 1)$. Then the utility of a concatenated outcome is given by $u(p_1 \triangleright p_2) = u(p_1) + \delta^{d(p_1)} \cdot u(p_2)$ where $d : P \to \mathbb{R}_{\geq 0}$ maps each personal outcome to its duration. This discounting formula places more weight on the first outcome $p_1$ than the second outcome $p_2$, and the difference grows exponentially with the duration of $p_1$. Thus, the social planner's time preferences, combined with the ordering of the list, can lead to a "tyranny of the earlier" in the resulting principle of social justice.
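A small numerical sketch of the discounting effect follows. It assumes, for illustration only, that a personal outcome can be summarised as a pair (undiscounted utility, duration), that the discount factor is 1/2, and that concatenation discounts the second outcome by the duration of the first; `concat2`, `good` and `bad` are hypothetical names.

```python
from fractions import Fraction

DELTA = Fraction(1, 2)  # assumed discount factor

def concat2(p, q):
    """Concatenate two outcomes, each a pair (utility, duration).

    The second outcome's utility is discounted by DELTA ** (duration of the first).
    """
    (u1, d1), (u2, d2) = p, q
    return (u1 + DELTA ** d1 * u2, d1 + d2)

good = (Fraction(10), 1)  # a good life lasting one period
bad = (Fraction(0), 1)    # a bad life lasting one period

print(concat2(good, bad)[0])  # 10
print(concat2(bad, good)[0])  # 5
```

The same two lives are worth twice as much to this planner when the good life comes first, so whichever individual sits earlier in the list carries more weight.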
Next, I will turn to Harsanyi's Lottery, the earliest of the three aggregative principles of social justice.
2.2. Formalising HL
The procedure is similar to LELO: (1) The population is represented by a distribution of individuals. (2) Each social outcome provides a function from individuals to their personal outcomes. This function can be lifted to a function from distributions of individuals to the corresponding distributions of personal outcomes, and then applied to the distribution representing the population. Hence each social outcome provides a distribution of personal outcomes. (3) Any distribution of personal outcomes can be interpolated into a single personal outcome. (4) Therefore, each social outcome provides a single personal outcome.
I'll now spell out the details.
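(1) For any set $X$, a distribution over $X$ is a function $\pi : X \to [0, 1]$ such that the support set $\{x \in X : \pi(x) > 0\}$ is finite and $\sum_{x \in X} \pi(x) = 1$. We will sometimes use the notation $r_1 \cdot x_1 + \dots + r_n \cdot x_n$ to denote a distribution satisfying $\pi(x_k) = r_k$ for each $k$, where the $x_k$ are distinct. For example, $\tfrac{1}{3} \cdot x + \tfrac{2}{3} \cdot y$ and $\tfrac{2}{3} \cdot y + \tfrac{1}{3} \cdot x$ denote the same distribution satisfying $\pi(x) = \tfrac{1}{3}$ and $\pi(y) = \tfrac{2}{3}$.
The set of all distributions over $X$ is denoted by $\Delta(X)$. This includes the point-mass distributions, denoted by $1 \cdot x$ for each $x \in X$, and uniform distributions $\tfrac{1}{n} \cdot x_1 + \dots + \tfrac{1}{n} \cdot x_n$. Note that a distribution is more than just a set: it also imposes a weighting on the individuals.
HL must assume that the population is represented by a distinguished distribution over individuals $\lambda \in \Delta(I)$, where $I$ is the set of individuals. Typically, the distinguished distribution is taken to be uniform over the entire population. That is, if there are $n$ individuals in total, then $\lambda = \tfrac{1}{n} \cdot i_1 + \dots + \tfrac{1}{n} \cdot i_n$. However, alternative weightings could be considered.
(2) As with LELO, there exists a function $\gamma : S \times I \to P$ such that, if the social outcome $s \in S$ obtains, then each individual $i \in I$ faces the personal outcome $\gamma(s, i) \in P$. It follows that each social outcome $s$ provides a function $\gamma_s : I \to P$ from individuals to their personal outcomes, where $\gamma_s$ denotes the function $i \mapsto \gamma(s, i)$.
Now, any function $f : I \to P$ from individuals to their personal outcomes can be lifted to a function $\Delta(f) : \Delta(I) \to \Delta(P)$ from distributions of individuals to the corresponding distributions of personal outcomes. Concretely, $\Delta(f)$ sends a distribution $r_1 \cdot i_1 + \dots + r_n \cdot i_n$ to the distribution $r_1 \cdot f(i_1) + \dots + r_n \cdot f(i_n)$, adding together the weights of individuals who face the same outcome. This lifting operation is a general feature of distributions.
Hence, each social outcome $s$ provides a distribution of personal outcomes $\Delta(\gamma_s)(\lambda) \in \Delta(P)$, obtained by lifting $\gamma_s$ to a function $\Delta(\gamma_s)$ and then applying it to the distinguished distribution of individuals $\lambda$.
(3) HL assumes that any distribution of personal outcomes can be interpolated into a single personal outcome. Formally, there exists a function $\mathrm{interp} : \Delta(P) \to P$ which reduces any distribution of personal outcomes $\pi$ into a single personal outcome $\mathrm{interp}(\pi)$.
To align with Harsanyi's intended interpretation, we should view $\mathrm{interp}(\pi)$ as the personal outcome of facing each $p \in P$ with probability $\pi(p)$. Perhaps a random outcome is sampled according to the distribution $\pi$ and the individual then faces that outcome. In contrast with LELO, the individual ultimately faces only a single human lifetime.
It is worth noting that the interpolation operator $\mathrm{interp}$ can equivalently be presented by a family of binary operators $+_r : P \times P \to P$, one for each $r \in (0, 1)$, provided $+_r$ satisfies the convex space axioms of idempotence, skew-commutativity, and skew-associativity.[9] Specifically, given $\mathrm{interp}$, define $p +_r q$ as $\mathrm{interp}(r \cdot p + (1 - r) \cdot q)$. Conversely, given the family of operators $+_r$, we can rather clumsily define $\mathrm{interp}$ by induction on the size $n$ of the support:
$$\mathrm{interp}(1 \cdot p_1) = p_1, \qquad \mathrm{interp}\Big(\sum_{k=1}^{n} r_k \cdot p_k\Big) = p_1 +_{r_1} \mathrm{interp}\Big(\sum_{k=2}^{n} \tfrac{r_k}{1 - r_1} \cdot p_k\Big) \quad \text{for } n \geq 2.$$
The convex space axioms are:
- Idempotence: $p +_r p = p$
- Skew-commutativity: $p +_r q = q +_{1-r} p$
- Skew-associativity: $(p_1 +_r p_2) +_s p_3 = p_1 +_{rs} (p_2 +_u p_3)$ whenever $u \cdot (1 - rs) = s \cdot (1 - r)$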
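The convex space axioms can be checked numerically in the simplest model, where personal outcomes are rational numbers and the binary operator with weight `r` is the weighted average. This is an assumption made for illustration; the names `mix` and `interp` are hypothetical, and the inductive definition of `interp` peels off the first outcome and renormalises the remaining weights.

```python
from fractions import Fraction as F

def mix(r, p, q):
    """Binary operator with weight r, modelled as the average r*p + (1-r)*q."""
    return r * p + (1 - r) * q

def interp(dist):
    """Interpolate a list of (weight, outcome) pairs by induction on its length."""
    (r, p), rest = dist[0], dist[1:]
    if not rest:
        return p
    renormalised = [(w / (1 - r), q) for w, q in rest]
    return mix(r, p, interp(renormalised))

x, y, z = F(0), F(6), F(12)
r, s = F(1, 2), F(1, 3)
u = s * (1 - r) / (1 - r * s)  # weight required by skew-associativity

idempotent = mix(r, x, x) == x
skew_commutative = mix(r, x, y) == mix(1 - r, y, x)
skew_associative = mix(s, mix(r, x, y), z) == mix(r * s, x, mix(u, y, z))
value = interp([(F(1, 2), x), (F(1, 3), y), (F(1, 6), z)])
print(idempotent, skew_commutative, skew_associative, value)
```

In this model `interp` agrees with the ordinary expected value, here 4.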
(4) Putting it all together, each social outcome $s$ provides a single personal outcome, namely $\mathrm{interp}(\Delta(\gamma_s)(\lambda))$, obtained by applying the interpolation operator $\mathrm{interp} : \Delta(P) \to P$ to the distribution of personal outcomes generated by $s$. This defines the HL aggregation function $\zeta_{\mathrm{HL}} : S \to P$, which assigns to each social outcome the interpolated personal outcome.
To illustrate, suppose the population is represented by the uniform distribution $\lambda = \tfrac{1}{n} \cdot i_1 + \dots + \tfrac{1}{n} \cdot i_n$. Suppose further that the social outcome $s$ assigns personal outcome $p_k$ to individual $i_k$ for each $k$, i.e. $\gamma(s, i_k) = p_k$. Then the interpolated personal outcome is $\zeta_{\mathrm{HL}}(s) = \mathrm{interp}(\tfrac{1}{n} \cdot p_1 + \dots + \tfrac{1}{n} \cdot p_n)$. As a sanity check, consider the trivial case where $I$ consists of a single individual. Here the initial distribution is the point mass $1 \cdot i_1$ and the interpolated outcome is simply $p_1$, i.e. the personal outcome assigned to the sole individual $i_1$.
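The same four steps can be sketched for HL. The model here is an assumption for illustration: distributions are dictionaries from elements to exact rational weights, personal outcomes are numbers, and interpolation is taken to be the probability-weighted average (one possible choice, not the only one). The names `lift_dist`, `interp`, `zeta_hl` and `gamma` are hypothetical.

```python
from fractions import Fraction as F

def lift_dist(f):
    """Lift f : X -> Y to the pushforward on distributions."""
    def lifted(dist):
        out = {}
        for x, w in dist.items():
            out[f(x)] = out.get(f(x), 0) + w  # coinciding outcomes pool their weight
        return out
    return lifted

def interp(dist):
    """Assumed interpolation: the probability-weighted average of numeric outcomes."""
    return sum(w * p for p, w in dist.items())

def zeta_hl(gamma, population, s):
    """HL aggregation: push the population distribution forward, then interpolate."""
    gamma_s = lambda i: gamma(s, i)
    return interp(lift_dist(gamma_s)(population))

gamma = lambda s, i: s[i]  # toy evaluation function
population = {"i1": F(1, 3), "i2": F(1, 3), "i3": F(1, 3)}
s = {"i1": 0, "i2": 6, "i3": 6}

print(zeta_hl(gamma, population, s))        # 4
print(zeta_hl(gamma, {"i1": F(1)}, s))      # 0
```

Because two of the three individuals face outcome 6, the pushed-forward distribution gives that outcome weight 2/3, which is how the number of individuals facing an outcome enters HL.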
The weighting of the distinguished distribution $\lambda$ affects the structure of the interpolated outcome, because $p +_r q$ and $p +_{r'} q$ typically yield different personal outcomes when $r \neq r'$. The choice of weighting has substantive implications for the resulting principle of social justice. If the HL principle uses a non-uniform distribution then the social planner will prioritize the individuals who are assigned a greater weighting, and this favoritism towards the higher-weighted group grows as the weighting distribution becomes more uneven.
Suppose there is a fixed amount $R$ of resources to be distributed among the population. Furthermore, suppose the resource yields diminishing marginal returns, i.e. the social planner's utility function $u$ over resources is increasing and strictly concave. This is a realistic assumption about human preferences. Following HL, the social planner will allocate resources to maximise the expected value of the corresponding lottery. Formally, the social planner chooses an allocation $(x_i)_{i \in I}$ to maximize $\sum_{i \in I} \lambda(i) \cdot u(x_i)$ subject to the constraint $\sum_{i \in I} x_i = R$. In the optimal allocation, $x^*$, the marginal utility of resources is inversely proportional to an individual's weight: $u'(x^*_i) = c / \lambda(i)$ for some constant $c > 0$. Therefore individuals with a larger weight will receive more resources: if $\lambda(i) > \lambda(j)$ then $1/\lambda(i) < 1/\lambda(j)$ so the optimality condition implies $u'(x^*_i) < u'(x^*_j)$ and the strict concavity implies $x^*_i > x^*_j$. In the special case where $u$ is logarithmic, the optimality condition gives $x^*_i = \lambda(i)/c$, so the resources allocated to an individual will be directly proportional to their weight. Thus, the social planner's preferences, combined with the weights of the distribution, can lead to a "tyranny of the majority" in the resulting principle of social justice.
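The logarithmic special case can be checked numerically. The weights and the total amount of resources below are hypothetical values chosen for illustration; the sketch confirms that the weight-proportional allocation equalises weighted marginal utility and beats every nearby feasible allocation obtained by moving one unit between two individuals.

```python
import math

weights = [0.5, 0.3, 0.2]   # hypothetical lottery weights, summing to 1
total = 100.0               # hypothetical fixed amount of resources

def objective(xs):
    """Expected log-utility of an allocation under the lottery weights."""
    return sum(w * math.log(x) for w, x in zip(weights, xs))

# Candidate optimum: resources proportional to weight.
proportional = [w * total for w in weights]

def shifted(xs, i, j, eps=1.0):
    """Move eps units of resource from individual i to individual j."""
    ys = list(xs)
    ys[i] -= eps
    ys[j] += eps
    return ys

beats_neighbours = all(
    objective(proportional) > objective(shifted(proportional, i, j))
    for i in range(3) for j in range(3) if i != j
)
# Weighted marginal utilities w / x should all equal the same constant.
marginals = [w / x for w, x in zip(weights, proportional)]
print(beats_neighbours, marginals)
```

This is a spot check against a few perturbations rather than a proof of optimality, which follows from the strict concavity of the objective.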
Finally, I will turn to Rawls' Original Position, the most famous aggregative principle of social justice.
2.3. Formalising ROI
The procedure is similar to LELO and HL: (1) The population is represented by a nonempty finite subset of individuals. (2) Each social outcome provides a function from individuals to their personal outcomes. This function can be lifted to a function from nonempty finite subsets of individuals to the corresponding nonempty finite subsets of personal outcomes, and then applied to the subset representing the population. Hence each social outcome provides a nonempty finite subset of personal outcomes. (3) Any nonempty finite subset of personal outcomes can be fused into a single personal outcome. (4) Therefore, each social outcome provides a single personal outcome.
I'll now spell out the details.
(1) ROI must assume that the population is represented by a nonempty finite subset of individuals $A \subseteq I$, where $I$ is the set of individuals. For any set $X$, let $\mathcal{P}^+_f(X)$ denote the set of nonempty finite subsets of $X$, so that $A \in \mathcal{P}^+_f(I)$. This is a standard notation, where $\mathcal{P}$ stands for powerset, the superscript $+$ stands for nonempty and the subscript $f$ stands for finite.
Note that $A$ carries no additional structure beyond being a set: unlike the list used in LELO, it carries no ordering, and unlike the distribution used in HL, it carries no weightings. Typically, $A$ is assumed to be the universal set $I$ itself, representing all individuals. However, Rawls suggests that alternative subsets could be considered, such as the set of "Heads of Families" or "presently existing people".
(2) As with LELO and HL, there exists a function $\gamma : S \times I \to P$ such that, if the social outcome $s \in S$ obtains, then each individual $i \in I$ faces the personal outcome $\gamma(s, i) \in P$. It follows that each social outcome $s$ provides a function $\gamma_s : I \to P$ from individuals to their personal outcomes, where $\gamma_s$ denotes the function $i \mapsto \gamma(s, i)$.
Now, any function $f : I \to P$ from individuals to their personal outcomes can be lifted to a function $\mathcal{P}^+_f(f) : \mathcal{P}^+_f(I) \to \mathcal{P}^+_f(P)$ from nonempty finite subsets of individuals to the corresponding nonempty finite subsets of personal outcomes. Concretely, $\mathcal{P}^+_f(f)$ sends a subset $\{i_1, \dots, i_n\}$ to the subset $\{f(i_1), \dots, f(i_n)\}$, by applying $f$ elementwise. This lifting operation is a general feature of nonempty finite subsets.
Hence, each social outcome $s$ provides a nonempty finite subset of personal outcomes $\mathcal{P}^+_f(\gamma_s)(A) \in \mathcal{P}^+_f(P)$, obtained by lifting $\gamma_s$ to a function $\mathcal{P}^+_f(\gamma_s)$ and then applying it to the distinguished subset of individuals $A$.
(3) ROI assumes that any nonempty finite subset of personal outcomes can be fused into a single personal outcome. Formally, there exists a function $\mathrm{fuse} : \mathcal{P}^+_f(P) \to P$ which reduces any nonempty finite subset of personal outcomes $\{p_1, \dots, p_n\}$ into a single personal outcome $\mathrm{fuse}\{p_1, \dots, p_n\}$.
To obtain Rawls' principle of social justice, we should interpret $\mathrm{fuse}\{p_1, \dots, p_n\}$ as the personal outcome where one might face any of the outcomes $p_1, \dots, p_n$, but without any information about which outcome is more likely. That is, the fusion operator acts like a disjunction between the personal outcomes. For example, if $p$ is the outcome of eating vanilla ice cream and $q$ is the outcome of eating chocolate ice cream, then $\mathrm{fuse}\{p, q\}$ is the outcome of eating either vanilla or chocolate ice cream, with no probabilities attached. One could imagine that the exact prospect is selected by a third party, maybe an adversary who selects the worst option or a benefactor who selects the best option.
It is worth noting that the fusion operator $\mathrm{fuse}$ can equivalently be presented by a binary operator $\vee : P \times P \to P$, provided $\vee$ satisfies the axioms of a semilattice.[10] Specifically, given $\mathrm{fuse}$, define $p \vee q$ as $\mathrm{fuse}\{p, q\}$. Conversely, given $\vee$, define $\mathrm{fuse}\{p_1, \dots, p_n\}$ as $p_1 \vee \dots \vee p_n$.
The semilattice axioms are:
- Idempotence: $p \vee p = p$
- Commutativity: $p \vee q = q \vee p$
- Associativity: $(p \vee q) \vee r = p \vee (q \vee r)$
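The semilattice axioms and the passage between the two presentations of fusion can be checked on a toy model. Here a personal outcome is modelled as a set of possibilities and fusion as union, which is an assumption made for illustration; `fuse`, `join` and `fuse_from_join` are hypothetical names.

```python
from functools import reduce

def fuse(outcomes):
    """Fuse a nonempty collection of outcomes, each a frozenset of possibilities."""
    return frozenset().union(*outcomes)

def join(p, q):
    """Binary operator recovered from fuse."""
    return fuse([p, q])

def fuse_from_join(outcomes):
    """Fusion recovered from the binary operator (outcomes must be nonempty)."""
    return reduce(join, outcomes)

vanilla = frozenset({"vanilla"})
chocolate = frozenset({"chocolate"})
mint = frozenset({"mint"})
samples = [vanilla, chocolate, mint]

idempotent = all(join(p, p) == p for p in samples)
commutative = all(join(p, q) == join(q, p) for p in samples for q in samples)
associative = all(join(join(p, q), r) == join(p, join(q, r))
                  for p in samples for q in samples for r in samples)
print(idempotent, commutative, associative)
print(sorted(join(vanilla, chocolate)))  # ['chocolate', 'vanilla']
```

Because the three axioms hold, the order and multiplicity in which outcomes are fused make no difference, so `fuse_from_join` is well defined on sets.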
(4) Putting it all together, each social outcome $s$ provides a single fused personal outcome, namely $\mathrm{fuse}(\mathcal{P}^+_f(\gamma_s)(A))$, obtained by applying the fusion operator $\mathrm{fuse} : \mathcal{P}^+_f(P) \to P$ to the nonempty finite subset of personal outcomes generated by $s$. This defines the ROI aggregation function $\zeta_{\mathrm{ROI}} : S \to P$, which assigns to each social outcome the fused personal outcome.
To illustrate, suppose the population is represented by the universal subset $A = I = \{i_1, \dots, i_n\}$. Suppose further that the social outcome $s$ assigns personal outcome $p_k$ to individual $i_k$ for each $k$, i.e. $\gamma(s, i_k) = p_k$. Then the fused personal outcome is $\zeta_{\mathrm{ROI}}(s) = p_1 \vee \dots \vee p_n$. As a sanity check, consider the trivial case where $I$ consists of a single individual. Here the population subset is the singleton $\{i_1\}$, and the fused outcome is simply $p_1$, i.e. the personal outcome assigned to the sole individual $i_1$.
The choice of the distinguished subset $A$ affects the structure of the fused outcome. For example, suppose the social planner is pessimistic, evaluating the fused outcome $p \vee q$ as no better than the worst of the individual outcomes $p$ and $q$. Formally, if the planner's preferences are represented by a utility function $u : P \to \mathbb{R}$, this means assuming $u(p \vee q) \leq \min(u(p), u(q))$. This is a realistic assumption about decision-making under Knightian uncertainty, where the planner considers the worst-case scenario.[11] Under this assumption, the resulting ROI principle will be sensitive to the worst-off individuals in the population subset $A$, leading to a 'tyranny of the unfortunate'.
Unlike HL, ROI is scope-insensitive due to the idempotence of the fusion operation $\vee$, meaning $p \vee p = p$. This implies that the fused outcome is insensitive to the number of individuals facing each personal outcome, and depends only on which personal outcomes are faced at all. For a stark illustration, suppose that a social outcome contains 100 individuals facing great wealth ($p_{\mathrm{rich}}$) and 1 facing abject poverty ($p_{\mathrm{poor}}$). ROI yields the same fused outcome as a social outcome with 1 individual facing wealth and 100 facing poverty. The drastically different proportions of individuals are irrelevant; only the presence or absence of each outcome matters.
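The wealth-and-poverty illustration can be run as a toy computation. The sketch assumes numeric wealth levels as stand-ins for personal outcomes and a pessimistic fusion that takes the worst outcome in the set; both are hypothetical modelling choices, as are the names `lift_set`, `fuse`, `zeta_roi` and `zeta_hl_uniform`.

```python
from fractions import Fraction as F

RICH, POOR = 100, 1  # hypothetical wealth levels standing in for personal outcomes

def lift_set(f):
    """Lift f : X -> Y to nonempty finite subsets, applied elementwise."""
    return lambda xs: frozenset(f(x) for x in xs)

def fuse(outcomes):
    """Pessimistic fusion: assume the worst outcome in the set."""
    return min(outcomes)

def zeta_roi(gamma, population, s):
    """ROI aggregation: take the set of outcomes faced, then fuse it."""
    return fuse(lift_set(lambda i: gamma(s, i))(population))

def zeta_hl_uniform(gamma, population, s):
    """Uniform-lottery expected wealth, for comparison with HL."""
    return sum(F(gamma(s, i), len(population)) for i in population)

gamma = lambda s, i: s[i]  # toy evaluation function
population = frozenset(range(101))
mostly_rich = {i: (POOR if i == 0 else RICH) for i in population}
mostly_poor = {i: (RICH if i == 0 else POOR) for i in population}

print(zeta_roi(gamma, population, mostly_rich),
      zeta_roi(gamma, population, mostly_poor))          # 1 1
print(zeta_hl_uniform(gamma, population, mostly_rich),
      zeta_hl_uniform(gamma, population, mostly_poor))   # 10001/101 200/101
```

The two social outcomes produce the same set of personal outcomes before any fusion takes place, so the insensitivity to proportions holds for every choice of fusion operator, not only the pessimistic one used here.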
2.4. Analysis
LELO, HL, and ROI share a common structure, differing only in the specific mathematical objects used. They represent populations using some type of collection: lists for LELO, distributions for HL, and subsets for ROI. And they aggregate personal outcomes using some mode of aggregation: concatenation for LELO, interpolation for HL, and fusion for ROI. This suggests that LELO, HL, and ROI are instances of a general family of aggregative principles, obtained by varying the type of collection and the mode of aggregation. In the next section, I will show that this is true.
3. Monads and aggregative principles
The key difference between LELO, HL, and ROI lies in their mode of aggregation. In this section, we will formalise the informal notion of a "mode of aggregation", and thereby find the general family of aggregative principles.
3.1. Monads formalise collections
The concept of a monad originates in category theory, and has found extensive applications in functional programming languages like Haskell. While category theory lies beyond the scope of this article, monads can be understood concretely as formalising the notion of a "collection". The core idea is that monads allow operations on elements to be lifted to operations on collections, in a way that preserves certain intuitive properties.
Formally, a monad consists of four components:
- $M$ assigns to each set $X$ another set, denoted $M(X)$, which we interpret as the collections over $X$. This is called the construct operator.
- $M$ also assigns to each function $f : X \to Y$ between sets another function, denoted $M(f) : M(X) \to M(Y)$, between their corresponding collections. This lifting operation formalizes the idea that we can apply a function to each element within a collection independently. For example, if $f$ maps each child to their birthday, then $M(f)$ maps each collection of children to the corresponding collection of birthdays. This is called the lift operator.
- $\eta$ assigns to each set $X$ a function $\eta_X : X \to M(X)$. This encodes the idea that each element $x \in X$ can be viewed as a "trivial" or "singleton" collection $\eta_X(x) \in M(X)$. This is called the unit operator.
- $\mu$ assigns to each set $X$ a function $\mu_X : M(M(X)) \to M(X)$. Intuitively, $\mu_X$ takes a 'collection of collections' and 'flattens' it into a single collection in $M(X)$. This is called the multiplication operator.
These components must satisfy certain coherence conditions, known as the monad laws:
- Associativity: $\mu_X \circ M(\mu_X) = \mu_X \circ \mu_{M(X)}$ for all sets $X$. This ensures that flattening a collection of collections of collections is the same, regardless of the order in which we do the flattening.
- Left unit: $\mu_X \circ \eta_{M(X)} = \mathrm{id}_{M(X)}$ for all sets $X$. This ensures that wrapping a collection in a singleton and then flattening is the same as doing nothing.
- Right unit: $\mu_X \circ M(\eta_X) = \mathrm{id}_{M(X)}$ for all sets $X$. This ensures that mapping each element of a collection to a singleton and then flattening is the same as doing nothing.
For the full technical details, see Mac Lane (1971) "Categories for the Working Mathematician".
The three types of collections we've encountered so far — lists, distributions, and nonempty subsets — are formalised by monads.
For example, the list monad has these four components:[12]
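- Construct operator $\mathrm{List}(X)$, the set of finite lists with elements in $X$.
- Lift operator $\mathrm{List}(f): \mathrm{List}(X) \to \mathrm{List}(Y)$, which applies $f$ to each component: $[x_1, \dots, x_n] \mapsto [f(x_1), \dots, f(x_n)]$.
- Unit operator $\eta_X: X \to \mathrm{List}(X)$, which creates singleton lists: $x \mapsto [x]$.
- Multiplication operator $\mu_X: \mathrm{List}(\mathrm{List}(X)) \to \mathrm{List}(X)$, which concatenates a list of lists into a single list.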
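To make the list monad concrete, here is a minimal Python sketch with lists modelled as Python lists. The names `lift`, `unit`, and `flatten` are illustrative choices rather than notation from this article, and the checks at the end only spot-check the three monad laws on small examples; they prove nothing.

```python
# Sketch of the list monad; `lift`, `unit`, `flatten` are illustrative names.

def lift(f):
    """Lift f: X -> Y to a function on lists, applying f to each component."""
    return lambda xs: [f(x) for x in xs]

def unit(x):
    """View an element as a singleton list."""
    return [x]

def flatten(xss):
    """Concatenate a list of lists into a single list."""
    return [x for xs in xss for x in xs]

# Spot-check the monad laws on small examples.
xs = [1, 2, 3]
xsss = [[[1], [2, 3]], [[4]], []]

left_unit_ok = flatten(unit(xs)) == xs          # wrap, then flatten
right_unit_ok = flatten(lift(unit)(xs)) == xs   # wrap each element, then flatten
assoc_ok = flatten(lift(flatten)(xsss)) == flatten(flatten(xsss))
```

Here `lift(len)(["ab", "c"])` plays the role of the birthday example: it maps a collection of strings to the corresponding collection of lengths.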
And the distribution monad has these four components:[13]
- Construct operator $\Delta(X)$, the set of probability distributions over $X$, i.e. functions $p: X \to [0,1]$ such that $\sum_{x \in X} p(x) = 1$.
- Lift operator $\Delta(f): \Delta(X) \to \Delta(Y)$, which takes a distribution $p$ on $X$ and returns the pushed-forward distribution on $Y$, i.e. the distribution of $f(x)$ when $x$ is sampled from $p$. Intuitively, it marginalizes out the randomness in $x$.
- Unit $\eta_X: X \to \Delta(X)$, which creates a point-mass distribution at $x$, i.e. the distribution that returns $x$ with probability 1.
- Multiplication $\mu_X: \Delta(\Delta(X)) \to \Delta(X)$, which takes a distribution $\pi$ over distributions on $X$, and returns the average of those distributions weighted by $\pi$. Intuitively, it collapses a two-stage sampling process (first sample a distribution $p$ from $\pi$, then sample an element $x$ from $p$) into a single-stage sampling process.
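The same components can be sketched for finitely supported distributions. This is an illustration under simplifying assumptions: a distribution is a Python dict from outcomes to exact `Fraction` weights, and a distribution over distributions is passed as a list of `(weight, distribution)` pairs, because dicts cannot be dictionary keys. The function names are again illustrative.

```python
from collections import defaultdict
from fractions import Fraction

def lift(f):
    """Push a distribution forward along f, merging outcomes with equal image."""
    def pushed(p):
        q = defaultdict(Fraction)
        for x, w in p.items():
            q[f(x)] += w
        return dict(q)
    return pushed

def unit(x):
    """Point-mass distribution at x."""
    return {x: Fraction(1)}

def flatten(weighted):
    """Collapse a two-stage lottery, given as (weight, distribution) pairs."""
    q = defaultdict(Fraction)
    for w, p in weighted:
        for x, px in p.items():
            q[x] += w * px
    return dict(q)

half = Fraction(1, 2)
coin = {"heads": half, "tails": half}

# First pick either the fair coin or the point mass on "heads", then sample.
collapsed = flatten([(half, coin), (half, unit("heads"))])

# Push the coin forward along a predicate.
is_heads = lift(lambda side: side == "heads")(coin)
```

In this example `collapsed` assigns probability 3/4 to heads and 1/4 to tails, which is the average of the two inner distributions.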
And finally, the nonempty powerset monad has these four components:
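- Construct operator $\mathcal{P}^+(X)$, the set of finite nonempty subsets of $X$.
- Lift operator $\mathcal{P}^+(f): \mathcal{P}^+(X) \to \mathcal{P}^+(Y)$, which applies $f$ to each element of the set: $A \mapsto \{f(x) : x \in A\}$.
- Unit $\eta_X: X \to \mathcal{P}^+(X)$, which creates a singleton set $\{x\}$ containing $x$.
- Multiplication $\mu_X: \mathcal{P}^+(\mathcal{P}^+(X)) \to \mathcal{P}^+(X)$, which takes a set of sets and returns their union.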
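A matching sketch for the nonempty powerset monad, using `frozenset` so that sets can themselves be elements of sets. The code does not enforce nonemptiness; that is left as an assumption on the inputs, and the names are illustrative.

```python
def lift(f):
    """Apply f to each element of a set (equal images are merged)."""
    return lambda s: frozenset(f(x) for x in s)

def unit(x):
    """Singleton set containing x."""
    return frozenset([x])

def flatten(sets):
    """Union of a set of sets."""
    return frozenset().union(*sets)

s = frozenset({1, 2, 3})
nested = frozenset({frozenset({1, 2}), frozenset({2, 3})})

union = flatten(nested)              # the set {1, 2, 3}
parities = lift(lambda n: n % 2)(s)  # the set {0, 1}

# Spot-check the two unit laws on one example.
left_unit_ok = flatten(unit(s)) == s
right_unit_ok = flatten(lift(unit)(s)) == s
```

Unlike lists, sets forget both order and multiplicity, which is why `parities` has only two elements even though `s` has three.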
Whenever you encounter an informal concept of a collection, it will typically be formalizable as a monad. Let's take the finite multiset: intuitively, a multiset is a collection that allows multiple instances of each element, but where the order doesn't matter. Formally, for any set , a finite multiset on is a function , where represents the number of occurrences of element . The multiset is finite if there are finitely many with . The set of all finite multisets on is denoted . Now, elements of are intuitively collections over . Sure enough, the assignment is a monad, which we call the finite multiset monad . The definitions of , , and are similar to those for .[14]
3.2. Algebras formalise aggregations.
Algebraic structures are ubiquitous in mathematics: monoids, groups, rings, vector spaces, lattices, and so on. Informally, an algebraic structure is a set equipped with some operations (like addition, multiplication, etc.) that satisfy certain axioms (like associativity, commutativity, etc.). A core insight from category theory is that each type of algebraic structure corresponds to a monad.
As discussed earlier, each monad $M$ captures a general notion of a "collection" of elements. An algebra of $M$ is a way to aggregate any collection of those elements into a single element. Formally, given a monad $M$ with unit $\eta$ and multiplication $\mu$, an $M$-algebra is a set $A$ equipped with a function $\alpha: M(A) \to A$ satisfying two laws:
- $\alpha \circ \eta_A = \mathrm{id}_A$ (unit law)
- $\alpha \circ \mu_A = \alpha \circ M(\alpha)$ (associativity law)
Intuitively, the unit law says that aggregating a singleton collection should just return the element itself. The associativity law says that aggregating a collection of collections can be done in two equivalent ways: first flatten the nested collections using $\mu_A$ and then aggregate the resulting collection using $\alpha$; or first aggregate each inner collection using $\alpha$ (this is what $M(\alpha)$ does), and then aggregate the resulting outer collection using $\alpha$ again.
- For example, an algebra for the list monad is a set $A$ equipped with an operator $\mathrm{List}(A) \to A$ specifying how to aggregate any list of elements into a single element. An algebra for the list monad is called a monoid.
- Similarly, an algebra for the distribution monad is a set $A$ equipped with an operator $\Delta(A) \to A$ specifying how to aggregate any distribution of elements into a single element. An algebra for the distribution monad is called a convex space.
- Finally, an algebra for the nonempty powerset monad is a set $A$ equipped with an operator $\mathcal{P}^+(A) \to A$ specifying how to aggregate any nonempty finite subset of elements into a single element. An algebra for the nonempty powerset monad is called a semilattice.
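The three kinds of algebra can be illustrated with familiar aggregation functions. The sketch below uses strings under concatenation as the monoid, real numbers under expectation as the convex space, and numbers under `max` as the semilattice; these particular carriers are assumptions chosen for illustration. It then spot-checks the unit and associativity laws for the list case.

```python
from functools import reduce

def fold(op, identity):
    """Algebra for lists induced by a monoid (op, identity): fold the list."""
    return lambda xs: reduce(op, xs, identity)

# Monoid: strings under concatenation, with the empty string as identity.
concat_all = fold(lambda a, b: a + b, "")

def expectation(p):
    """Algebra for distributions on the reals: p maps values to probabilities."""
    return sum(w * x for x, w in p.items())

def join(s):
    """Algebra for nonempty finite sets of numbers: take the maximum."""
    return max(s)

# Unit law: aggregating a singleton returns the element itself.
unit_law_ok = concat_all(["a"]) == "a"

# Associativity law: flatten then aggregate == aggregate inner lists, then outer.
xss = [["a", "b"], [], ["c"]]
flat = [x for xs in xss for x in xs]
assoc_law_ok = concat_all(flat) == concat_all([concat_all(xs) for xs in xss])
```

Note that the empty inner list is aggregated to the identity element, which is why a list algebra determines a constant as well as a binary operation.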
Each algebraic structure corresponds to a monad. For example, consider the most important algebraic structure: the vector space. The relevant monad assigns to each set the set of functions with for only finitely many . For example, if is the set then a typical element of might look like . An algebra for the monad is precisely a vector space: a set equipped with a function satisfying the appropriate unit and associativity laws. This definition captures the essence of a vector space — the ability to aggregate arbitrary linear combinations — with a single operation .
3.3. A general aggregative principle
As promised, we can now formulate a general family of aggregative principles using the language of monads and algebras. Each principle has the following form: a social planner should make decisions as if they will face the aggregate of the personal outcomes across all individuals in the population.
Informally, this whole procedure can be summarized as follows: (1) The population is represented by a distinguished collection of individuals. (2) Each social outcome provides a function from individuals to their personal outcomes. This function can be lifted to a function from collections of individuals to the corresponding collections of personal outcomes, and then applied to the collection representing the population. Hence each social outcome provides a collection of personal outcomes. (3) Any collection of personal outcomes can be aggregated into a single personal outcome. (4) Therefore, each social outcome provides a single personal outcome.
(1) Let be any monad, assigning to every set another set of collections over . We must assume that the population is represented by a distinguished collection . Typically, is chosen to represent the entire population impartially, although non-impartial collections could also be considered.
(2) As discussed before, there exists a function such that, if the social outcome obtains, then each individual faces the personal outcome . It follows that each social outcome provides a function from individuals to their personal outcomes, where denotes the function .
Now, any function from individuals to their personal outcomes can be lifted to a function from collections of individuals to the corresponding collections of personal outcomes. This lifting operation is a general feature of monads.
Hence, each social outcome provides a collection of personal outcomes , obtained by lifting to a function and then applying to the distinguished collection of individuals .
(3) We assume that any collection of personal outcomes can be aggregated into a single personal outcome. Formally, there exists a function which reduces any collection of personal outcomes into a single personal outcome .
A key requirement for obtaining a normatively compelling principle of social justice is that the aggregation function is "monotonic". That is, aggregating more desirable personal outcomes should yield a more desirable result than aggregating less desirable personal outcomes, as judged by the self-interested social planner. This feature incentivizes the social planner to choose policies that benefit individuals in society, and to avoid policies that harm individuals, all else being equal.
(4) Putting it all together, each social outcome provides a single aggregated personal outcome, namely , obtained by applying the aggregation operator to the collection of personal outcomes generated by . This defines the general aggregation function , which assigns to each social outcome the aggregated personal outcome.
As a sanity check, consider the trivial case where consists of a single individual. Here the population collection is the singleton , and the aggregated outcome is simply , i.e. the personal outcome assigned to the sole individual .
By varying the monad , the distinguished collection , and the aggregation function , one can capture a wide range of principles, including LELO, HL, and ROI as special cases.
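Steps (1) to (4) amount to a single composition: lift the map from individuals to personal outcomes, apply it to the population collection, and aggregate. The sketch below writes this as a generic Python function and instantiates it twice. All names and the toy data are hypothetical, and the HL-style instance simplifies personal outcomes to numbers so that expectation can serve as the aggregation.

```python
from fractions import Fraction

def aggregated_outcome(lift, aggregate, population, outcome_of):
    """Lift the individual-to-outcome map, apply it to the population
    collection, then aggregate the resulting collection of outcomes."""
    return aggregate(lift(outcome_of)(population))

# LELO-style instance: a list of individuals, outcomes as strings of moments,
# aggregation by concatenation.
def list_lift(f):
    return lambda xs: [f(x) for x in xs]

def concatenate(outcomes):
    return "".join(outcomes)

# HL-style instance: a lottery over individuals, outcomes simplified to
# numbers, aggregation by expectation.
def dist_lift(f):
    def pushed(p):
        q = {}
        for x, w in p.items():
            q[f(x)] = q.get(f(x), 0) + w
        return q
    return pushed

def expectation(p):
    return sum(w * x for x, w in p.items())

# Hypothetical toy data for one fixed social outcome.
lives = {"alice": "AAB", "bob": "C"}
wealth = {"alice": 10, "bob": 20}
half = Fraction(1, 2)

lelo = aggregated_outcome(list_lift, concatenate, ["alice", "bob"], lives.get)
hl = aggregated_outcome(dist_lift, expectation, {"alice": half, "bob": half}, wealth.get)

# Sanity check: a one-person population just returns that person's outcome.
solo = aggregated_outcome(list_lift, concatenate, ["bob"], lives.get)
```

Only the `lift`, `aggregate`, and `population` arguments change between the two instances, which mirrors the way the principles differ only in the choice of monad, distinguished collection, and aggregation function.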
4. Algebraic structures on personal outcomes
As we can see, the algebraic structures that exist on the personal outcomes constrain which aggregative principles are well-defined. In particular, the monad and aggregation function must be compatible, in the sense that defines an -algebra on the set of personal outcomes. In this section, we will explore some concrete examples of algebraic structures on personal outcomes — including monoids, convex spaces, and semilattices, which are required for LELO, HL, and ROI respectively. Some of these examples will be exotic, thereby generating novel aggregative principles of social justice.
This section is not intended to be exhaustive. Indeed, there are countless possible algebraic structures one could consider, and the choice of algebraic structure will depend on the phenomena under investigation.
4.1. Personal outcomes as monoid
How might we model personal outcomes such that they form a monoid, as required by LELO? Recall that LELO requires a concatenation operator . Equivalently, we seek a binary operator and a constant element satisfying the axioms of a monoid, as discussed in section 3.1.
Example 1
The simplest way to model personal outcomes as a monoid is for each personal outcome to be a list over a fixed alphabet , i.e. . We can think of elements of as the discrete moments which constitute a human life. For example, might be the set of minute-long experiences — then a human life of 80 years would be modelled as a list of 42 million elements from .
Indeed has a monoid structure. In fact, this is the free monoid over , meaning it is the 'most general' or 'least constrained' monoid containing . The monoid operation is given by concatenation of lists: if and are two lists, then . The identity element is the empty list . This is the simplest type of monoid, and thus the natural starting point for modeling personal outcomes in the context of LELO.
Example 2
Alternatively, we can model personal outcomes in a more continuous way. Suppose each personal outcome is a pair where:
- is a duration.
- is a trajectory, assigning to each moment in time an instantaneous experience . Here denotes the left-open, right-closed real interval of length , and is some fixed set of possible instantaneous experiences. We might use this model if we want to track variables that change continuously over time, such as an individual's location.
We can define a monoid operation on by concatenating durations and 'switching' between trajectories. Formally, for and , we define to be the pair where is given by . The identity element is the pair where is the empty function to .[15]
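The concatenation of continuous trajectories can be sketched in Python as follows. A personal outcome is a pair of a duration and a function defined on the left-open, right-closed interval up to that duration. The switching rule used here (follow the first trajectory up to and including its duration, then follow the second with time shifted back) is our assumption about the intended definition, and the names are illustrative.

```python
def concat(p, q):
    """Concatenate two (duration, trajectory) pairs."""
    d1, f1 = p
    d2, f2 = q

    def f(t):
        if not 0 < t <= d1 + d2:
            raise ValueError("t lies outside (0, d1 + d2]")
        # Assumed switching rule: first trajectory on (0, d1], second afterwards.
        return f1(t) if t <= d1 else f2(t - d1)

    return (d1 + d2, f)

def empty_trajectory(t):
    """The unique function on the empty interval: it is never defined."""
    raise ValueError("the empty trajectory has no moments")

identity = (0.0, empty_trajectory)

# Hypothetical example: 18 years of school followed by 62 years of work.
childhood = (18.0, lambda t: "school")
adulthood = (62.0, lambda t: "work")
life = concat(childhood, adulthood)
```

Concatenating with `identity` on either side never calls `empty_trajectory` at a valid time, so the result behaves exactly like the original outcome.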
We could also restrict the trajectories to be piecewise smooth, piecewise continuous, piecewise constant, or to satisfy any other reasonable piecewise condition. remains a monoid under these restrictions, because the concatenation of piecewise smooth (resp. continuous, constant) functions is again piecewise smooth (resp. continuous, constant).
Example 3
In the previous two examples, we've modelled personal outcomes as predetermined trajectories through some space of experiences, either discrete or continuous. However, these models assume that an individual's life trajectory is fixed in advance, which is often unrealistic. In reality, individuals make choices that shape the course of their lives over time. To capture this agency, we can model personal outcomes as environments that are actively guided by the individual's actions.
Suppose we model a personal outcome as an interactive environment consisting of:
- A set of actions the individual can produce.
- A set of observations the individual can receive.
- A function assigning to each action a probability distribution over observations.
We can define a monoid operation on the set by running the two environments and in parallel, where the individual simultaneously chooses actions and receives observations in both environments. Concretely, given and , we define their product to be the environment where:
- The set of actions is the Cartesian product
- The set of observations is the Cartesian product
- If and are transition functions, considered as functions and , then is their product, i.e. is defined by . Intuitively, receives a pair of actions and produces a pair of observations by independently sampling from and from .
The identity element is the trivial environment with a single action and a single observation, i.e. and is the point distribution on .
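The parallel product of two environments can be sketched as below. An environment is modelled as a triple of an action set, an observation set, and a transition function returning a finitely supported distribution as a dict; this encoding and the example environments are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def parallel(env1, env2):
    """Run two environments side by side, sampling observations independently."""
    actions1, observations1, step1 = env1
    actions2, observations2, step2 = env2
    actions = set(product(actions1, actions2))
    observations = set(product(observations1, observations2))

    def step(action):
        a1, a2 = action
        # Independent sampling: joint probability is the product of the marginals.
        return {(o1, o2): p1 * p2
                for o1, p1 in step1(a1).items()
                for o2, p2 in step2(a2).items()}

    return actions, observations, step

# Trivial environment: one action, one observation.
trivial = ({"*"}, {"*"}, lambda a: {"*": Fraction(1)})

# Hypothetical example environments.
half = Fraction(1, 2)
coin = ({"flip"}, {"H", "T"}, lambda a: {"H": half, "T": half})
weather = ({"look"}, {"sun", "rain"},
           lambda a: {"sun": Fraction(3, 4), "rain": Fraction(1, 4)})

both = parallel(coin, weather)
joint = both[2](("flip", "look"))
```

One caveat is visible in this encoding: `parallel(coin, trivial)` has actions like `("flip", "*")` rather than `"flip"`, so the identity and associativity laws hold only up to relabelling of pairs.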
Example 4
We can further extend the previous example by incorporating rewards. Suppose a personal outcome is modelled by:
- A set of actions
- A set of observations
- A function assigning to each action a joint distribution over observations and real-valued rewards.
Now represents the distribution on (observation, reward) pairs resulting from taking action . The goal is to choose actions over time so as to maximize the expected total reward.
As before, we can define a monoid operation on the set by running the two 'reward-augmented' environments and in parallel, where the individual simultaneously chooses actions and receives observations in both environments, except that now each environment also produces a reward. The rewards are summed and received by the individual. The identity element is the trivial environment with a single action and a single observation, i.e. and is the point distribution on .
4.2. Personal outcomes as convex space
We've seen how personal outcomes form a monoid, as required by LELO. Next let's turn to convex spaces, as required by HL. Recall that HL requires an interpolation operator . Equivalently, we seek a family of binary operators satisfying the axioms of a convex space, as discussed in section 3.2.
Example 5
The simplest way to model personal outcomes as a convex space is to take each outcome to be a probability distribution over some fixed set of alternatives , i.e. . For example, might be the set of possible life histories, where a life history specifies all the relevant details of a person's life from birth to death, such as their physical and mental states, relationships, major life events, achievements, etc. A personal outcome is then a probability distribution over these possible life histories.
Indeed has a convex structure. In fact, this is the free convex space over , i.e. the 'least constrained' convex space containing . The interpolation operators are given by the standard notion of interpolation of distributions. That is, if and are two distributions, then their -interpolation is the distribution defined by . This is the simplest type of convex space, and thus the natural starting point for modeling personal outcomes in the context of HL.
Example 6
Again, this is a model of personal outcomes which lacks any notion of individual agency. Personal outcomes are simply probability distributions over a fixed set of alternatives, with no room for individuals to make choices that affect their outcomes. To incorporate individual agency, we will again model a personal outcome as an interactive environment consisting of an action set , an observation set , and a function assigning to each action a probability distribution over observations.
To define an interpolation operation on personal outcomes in this setting, we use the idea of stochastic case handling. Given two personal outcomes and , define their -interpolation as follows:
- The action space is the Cartesian product , representing a choice of action from each of the original action spaces.
- The observation space is the disjoint union , representing either an observation from or from .
- The transition function interpolates between the original transition functions and in a way that respects the observation space structure. Intuitively, the interpolated outcome allows the individual to choose an action for each environment, and then randomly selects whether to run the environment or with likelihoods and respectively.[16]
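Stochastic case handling can be sketched in the same style. Environments are again triples of an action set, an observation set, and a transition function returning a dict; the disjoint union is encoded by tagging observations with 0 or 1. We assume the weight `lam` is the probability of running the first environment. The encoding and example environments are illustrative.

```python
from fractions import Fraction

def interpolate(lam, env1, env2):
    """Mix two environments: run env1 with probability lam, else env2."""
    actions1, observations1, step1 = env1
    actions2, observations2, step2 = env2
    actions = {(a1, a2) for a1 in actions1 for a2 in actions2}
    # Tagged (disjoint) union of the two observation sets.
    observations = ({(0, o) for o in observations1}
                    | {(1, o) for o in observations2})

    def step(action):
        a1, a2 = action
        dist = {(0, o): lam * p for o, p in step1(a1).items()}
        dist.update({(1, o): (1 - lam) * p for o, p in step2(a2).items()})
        return dist

    return actions, observations, step

# Hypothetical example environments.
half = Fraction(1, 2)
coin = ({"flip"}, {"H", "T"}, lambda a: {"H": half, "T": half})
weather = ({"look"}, {"sun", "rain"},
           lambda a: {"sun": Fraction(3, 4), "rain": Fraction(1, 4)})

mixed = interpolate(Fraction(1, 4), coin, weather)
dist = mixed[2](("flip", "look"))
```

The individual must commit to an action for both environments before learning which one is run, which is what distinguishes this construction from simply mixing two fixed distributions.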
Example 7
We could imagine interpolating between personal outcomes in a more direct way. For example, if is the personal outcome of winning , and is the personal outcome of winning , then is the personal outcome of winning . However, it's unclear how to extend this interpolation to personal outcomes lacking an inherently probabilistic or quantitative structure. For instance, suppose is the outcome of being happily married with two children and an unfulfilling career, while is the outcome of being single and childless but having a fulfilling career. It's unclear how to meaningfully define an outcome "50% between them".
One approach is to represent personal outcomes as vectors in a high-dimensional real vector space such as . Here is some large number, potentially hundreds or thousands. The benefit of a vector representation is that the space of personal outcomes inherits the natural convex structure of . Concretely, for any two outcome vectors and any weight , we define the -interpolation as the weighted average .
Intuitively, if the dimensions of correspond to relevant features of the outcome, then the interpolated outcome has intermediate feature values between those of and . The relative influence of and is controlled by the weight . For the vector representation to be useful, it must encode all the important information about the outcome in a structured format (e.g. ensuring that similar outcomes map to similar vectors). This is a nontrivial challenge. Many important features, such as happiness, fulfilment, and relationships, are difficult to measure numerically.
One trick to obtain vector representations of personal outcomes could be to leverage the semantic knowledge embedded in a large pretrained language model like GPT-3. In particular, the activation space of a pretrained model can represent general semantic concepts, including personal outcomes, and comes equipped with a convex structure. Using this convex structure, we obtain the following aggregative principle: a social planner should make decisions as if they will face the average personal outcome across all individuals, where the averaging is performed in the activation space of the language model.[17]
Whether this aggregative principle is appropriate will depend on how personal outcomes are represented within the activations of GPT-3. In particular, we desire the monotonicity property. That is, if the interpolation is less desirable than the interpolation then there exists some less desirable than . Monotonicity would ensure that a social planner following this aggregative principle will, all else being equal, tend to choose policies that benefit individuals and avoid policies that harm individuals.
The dimensionality of the latent space controls the level of detail captured about personal outcomes. The extreme cases are problematic:
- If , then the latent space collapses to a single dimension. If the vector representation maps each personal outcome to its cardinal utility , then comparing the averages of these one-dimensional vectors recovers classical utilitarianism. However, this faces the problem of interpersonal comparisons — there's no tractable method for determining the cardinal utility for each personal outcome. Moreover, the representation of the resulting personal outcome is a single number, which is difficult for humans to reason about concretely. Most humans would struggle to imagine what an outcome with "0.5 utility" would be like, as it lacks any information about the qualitative features of the outcome.
- If , where is the set of all possible life histories, then the latent space has one dimension corresponding to each possible life history. Suppose that the vector representation maps each personal outcome , represented as a lottery over possible life histories, to the vector of probabilities . Then we recover Harsanyi's Lottery. However, for large populations this lottery is intractable to reason about, because there are astronomically many possible outcomes.
- An intermediate value, such as , strikes the best balance between expressiveness and tractability, as the dimensionality is:
- Large enough to capture the features of outcomes that humans care about, such as a person's happiness, relationships, accomplishments, etc. The resulting vector representation is cognitively meaningful.
- Small enough that the vectors can be tractably compared, even for large populations, as we need only compare the summary vectors rather than the full lotteries. This scalability allows the framework to be applied to real-world policy decisions involving numerous stakeholders.
4.3. Personal outcomes as semilattice
We've seen how personal outcomes form a monoid or convex space, as required by LELO and HL respectively. Next let's turn to semilattices, as required by ROI. Recall that ROI requires a fusion operator . Equivalently, we seek a binary operator satisfying the axioms of a semilattice, as discussed in section 3.3.
Example 8
The simplest way to model personal outcomes as a semilattice is to take each outcome to be a nonempty finite subset of a fixed set of alternatives , i.e. . might be the set of possible life histories, specifying all the relevant details of a person's life from birth to death. A personal outcome is a state where any of the alternatives are possible, without specifying their likelihoods or the mechanism that will select among them.
Indeed has a semilattice structure. In fact, this is the free semilattice over , i.e. the 'least constrained' semilattice containing . The fusion operators are given by the standard union between sets. That is, if and are two subsets, then their fusion is the subset defined by . This recovers the disjunctive reading of the fusion operator. For example, if represents the outcome of having either vanilla or chocolate ice-cream, and represents the outcome of having either chocolate or strawberry ice-cream, then their fusion represents the outcome of having either vanilla, chocolate or strawberry ice-cream. This is the simplest type of semilattice, and thus the natural starting point for modeling personal outcomes in the context of ROI.
Example 9
Alternatively, we could interpret fusion as conjunction rather than disjunction: if is the outcome of playing tennis and is the outcome of listening to Bach, then is the outcome of simultaneously playing tennis and listening to Bach. In the conjunctive interpretation, we take the elements of to be specifications or properties about personal outcomes. A personal outcome is represented by a subset of , where contains exactly those specifications that the outcome satisfies. Fusion is still defined as set union, i.e. . The fused outcome will satisfy a specification if and only if at least one of or satisfies it.
For the fusion operation to always yield a coherent personal outcome, we require any finite subset of specifications in to be mutually consistent. This is a very strong assumption that rules out the vast majority of possible sets of specifications. For example, "has a PhD" and "has no higher education" cannot both be specifications in . Moreover, even if we could represent personal outcomes with a space of mutually consistent specifications, the resulting aggregative principle of social justice would likely fail to match our moral judgments. The problem is that the hypothetical prospect of "living every life simultaneously" is so alien that the social planner's preferences about it are unlikely to track anything normatively relevant.
Example 10
As discussed previously, we can represent personal outcomes as vectors in a high-dimensional real vector space such as . The benefit of a vector representation is that the space of personal outcomes inherits a natural semilattice structure of . Concretely, for any two outcome vectors we can define their fusion by taking the coordinatewise maximum: for .
Intuitively, if the dimensions of correspond to degrees or intensities of different attributes, then the fused outcome has each attribute at the higher of the two degrees from and . For example, consider feature vectors with dimensions for wealth, sickness, and number of children. Fusing two such vectors would yield an outcome with the wealth of the wealthier individual, the sickness of the sicker individual, and the greater number of children. This example illustrates that the choice of vector representation substantively changes the resulting aggregative principle of social justice.
As discussed in the previous section, one approach to obtaining semantically meaningful vector representations of personal outcomes is to leverage the internal activations of a large language model like GPT-3. However, unlike the convex combination approach discussed earlier, defining the fusion operator via the coordinatewise maximum has a limitation when applied to language model embeddings. Namely, this fusion operator is not rotation-invariant, meaning the aggregative principle would depend on the basis in the model's activation space. To amend this issue, we might learn change-of-basis transformations from the model's activation space to a new embedding space where coordinatewise maximum yields an appropriate principle of social justice.
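The basis-dependence of the coordinatewise maximum can be seen in a two-dimensional toy case. The sketch below compares averaging and fusion under a 90-degree rotation: averaging commutes with the rotation, while fusion does not. The vectors are arbitrary illustrative values, not model activations.

```python
def rotate90(v):
    """Rotate a 2-D vector by 90 degrees counterclockwise."""
    x, y = v
    return (-y, x)

def mean(u, v):
    """Coordinatewise average, i.e. the 1/2-interpolation."""
    return tuple((a + b) / 2 for a, b in zip(u, v))

def fuse(u, v):
    """Coordinatewise maximum."""
    return tuple(max(a, b) for a, b in zip(u, v))

u, v = (1, 0), (0, 1)

# Averaging gives the same answer whether we rotate before or after.
mean_commutes = rotate90(mean(u, v)) == mean(rotate90(u), rotate90(v))

# Fusion does not: rotating first changes which coordinates win the maximum.
fuse_commutes = rotate90(fuse(u, v)) == fuse(rotate90(u), rotate90(v))

# The wealth / sickness / children example, with made-up numbers.
fused_person = fuse((100, 2, 0), (40, 5, 3))
```

Here `mean_commutes` is true and `fuse_commutes` is false, and `fused_person` combines the wealth of the first vector with the sickness and number of children of the second.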
Conclusion
In this article, we examined aggregative principles of social justice, i.e. principles stating that a social planner should make decisions as if they will face the aggregated personal outcomes of every individual in the population. We saw three well-known examples — Live Every Life Once (LELO), Harsanyi's Lottery (HL), and Rawls' Original Position (ROI). After introducing the mathematical concept of a monad, we constructed a general family of aggregative principles.
Finally, we explored several concrete examples of algebraic structures on personal outcomes, with natural interpretations as monoids, convex spaces, and semilattices. The generality of the framework allowed for the development of novel principles, beyond those already discussed in the literature. For instance, we considered modeling personal outcomes as:
- Trajectories through a space of experiences, either discrete or continuous
- Interactive environments that are actively guided by an individual's actions
- High-dimensional vectors, with the dimensions corresponding to relevant features of the outcome.
In conclusion, aggregative principles offer a fruitful strategy for specifying principles of social justice. In my next article [? · GW], I prove that, under natural conditions of human rationality, aggregative principles will approximate utilitarian principles. Therefore, even though aggregativism avoids the theoretical pitfalls of utilitarianism, we should nonetheless expect aggregativism to generate roughly-utilitarian recommendations in practical social contexts, and thereby retain the most appealing insights from utilitarianism.
- ^
See Appraising aggregativism and utilitarianism [LW · GW] for a thorough defence.
- ^
The term LELO originates in Loren Fryxell (2024), "XU", which is where I first encountered the concept. I think Fryxell offers the first formal treatment of the LELO principle.
MacAskill (2022), "What We Owe the Future", says this thought experiment comes from Georgia Ray (2018), “The Funnel of Human Experience”, and that the short story Andy Weir (2009), "The Egg", shares a similar premise.
But (as Elliott Thornley notes), Roger Crisp attributes LELO to C.I. Lewis. This would predate both Ray and Weir, but I haven't traced the reference.
- ^
John C. Harsanyi "Cardinal Utility in Welfare Economics and in the Theory of Risk-Taking" (1953) and "Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons of Utility" (1955)
- ^
John Rawls (1971), "A Theory of Justice"
- ^
- ^
John Harsanyi (1975) "Can the Maximin Principle Serve as a Basis for Morality? A Critique of John Rawls's Theory"
- ^
- ^
- ^
- ^
- ^
This is called the min-max principle in decision theory, and Murphy's law colloquially.
- ^
- ^
- ^
- ^
Note that the half-open interval is the empty set , because there are no real numbers , and for any set there is exactly one function which we call the empty function.
- ^
If and are transition functions, considered as functions and , then is defined by
- ^
Concretely, to assess a social outcome , the social planner should follow the following steps:
(1) Describe the personal outcome of each individual , e.g. "Alice lives a happy life as a successful doctor with a loving family."
(2) Run a forward pass of the language model on each prompt, without generating any new tokens, and extract the model internal activations. The choice of which specific activation to extract would be a hyperparameter to tune, but one natural choice is a hidden state of the model's residual stream. Overall, this gives some function where is the space of prompts and is the trained parameters of the model.
(3) For each individual , obtain a vector representation of their personal outcome by applying the function to their prompt. Compute the social outcome vector as a weighted average of the individual outcome vectors: .
(4) Interpret the social outcome vector by finding a natural language prompt such that is close to . This is a nontrivial inverse problem and may require heuristics. One approach is to perform gradient descent over the space of prompts to minimize a loss function . Here is the -norm, is the probability of under the language model, and is a hyperparameter controlling the relative importance of the two terms. Intuitively, this finds a prompt that has a vector representation close to and is likely under the language model.
When assessing the social outcome , the social planner should make decisions as if they will face the outcome described in , obtained in the procedure above.
10 comments
comment by EJT (ElliottThornley) · 2024-06-06T09:32:21.064Z · LW(p) · GW(p)
Looking forward to reading this properly. For now I'll just note that Roger Crisp attributes LELO to C.I. Lewis.
Replies from: strawberry calm, strawberry calm, Bentery↑ comment by Cleo Nardo (strawberry calm) · 2024-06-22T00:13:19.240Z · LW(p) · GW(p)
Three articles, but the last is most relevant to you:
- Aggregative Principles of Social Justice [LW · GW] (44 min)
- Aggregative principles approximate utilitarian principles [LW · GW] (27 min)
- Appraising aggregativism and utilitarianism [LW · GW] (23 min)
↑ comment by Cleo Nardo (strawberry calm) · 2024-06-06T15:19:37.150Z · LW(p) · GW(p)
would be keen to hear your thoughts & thanks for the pointer to Lewis :)
↑ comment by Bentery · 2024-06-06T21:01:35.796Z · LW(p) · GW(p)
Another related, much older reference is from Ramsey's Truth and Probability (1926) in which he relates risk attitudes to preferences over repeated experiences (it's in the single person case however):
"We can put this in a different way. Suppose his degree of belief in is ; then his action is such as he would choose it to be if he had to repeat it exactly times, in of which was true, and in the others false. [Here it may be necessary to suppose that in each of the times he had no memory of the previous ones.]"
comment by cubefox · 2024-06-05T21:27:23.212Z · LW(p) · GW(p)
That's a daunting amount of formalization. I hope all this effort helps with aggregation paradoxes involving the creation of agents, i.e. with variants of the repugnant conclusion, to which you allude in the beginning. I guess we will see in your next post.
May I also suggest to share or crosspost this on the EA Forum, where problems in population ethics are discussed more frequently?
comment by Charlie Steiner · 2024-06-07T11:56:28.802Z · LW(p) · GW(p)
I'm not sure we should worry about generalizing with high-powered machinery. Pick a collection, and you can represent it with a monad. But pick a monad, and it's probably not a collection (I think?).
E.g. consider a "whoops I lost some of them monad" - for each set you choose some subset, plus an extra element (could call it {*}, as in the maybe monad). So if my original set is (1,2,3,4,5,6,7), there will be some whoops monad that maps this to (1,2,*). Functions work as normal except when they would involve * or lost elements, in which case they get mapped to *. Seems like a perfectly good monad, but it's the diametric opposite of a collection.
Replies from: strawberry calm↑ comment by Cleo Nardo (strawberry calm) · 2024-06-07T14:25:46.738Z · LW(p) · GW(p)
sorry i’m not getting this whoops monad. can you spell out the details, or pick a more standard example to illustrate your point?
i think “every monad formalises a different notion of collection” is a bit strong. for example, the free vector space monad (see section 3.2) — is a collection of the elements, for some notion of collection?
is every element of a free algebraic structure a “collection” of the generators? would you hear someone say that a quantum state is a collection of eigenstates? at a stretch maybe.
Replies from: Charlie Steiner↑ comment by Charlie Steiner · 2024-06-07T14:38:10.524Z · LW(p) · GW(p)
The identity monad probably works about as well as an illustration, but has less of the flavor of "not only did you not make this more like a collection, you made it worse" :P But advantage is you didn't need the axiom of choice to specify it.
Replies from: strawberry calm↑ comment by Cleo Nardo (strawberry calm) · 2024-06-07T15:02:17.412Z · LW(p) · GW(p)
note that there are only two exceptions to the claim “the unit of a monad is componentwise injective”. this means (except these two weird exceptions), that the singleton collections and are always distinct for . hence, , the set of collections over , always “contains” the underlying set . by “contains” i mean there is a canonical injection , i.e. in the same way the real numbers contains the rational .
in particular, i think this should settle the worry that “there should be more collections than singleton elements”. is that your worry?
Replies from: Charlie Steiner↑ comment by Charlie Steiner · 2024-06-07T15:07:04.348Z · LW(p) · GW(p)
I wouldn't say it's my worry exactly, but it does deal with the most forceful reasons for worrying, yeah.