# Complete Class: Consequentialist Foundations

post by abramdemski · 2018-07-11T01:57:14.054Z · score: 43 (16 votes) · LW · GW · 34 comments

## Contents

- Background
  - My Motives
  - Other Foundations
- Four Complete Class Theorems
  - Basic CCT
  - Removing Likelihoods (and other unfortunate assumptions)
  - Utilitarianism
  - Futarchy
  - Conclusion

The fundamentals of Bayesian thinking have been justified in many ways over the years. Most people here have heard of the VNM axioms and Dutch Book arguments. Far fewer, I think, have heard of the Complete Class Theorems (CCT).

Here, I explain why I think of CCT as a more purely consequentialist foundation for decision theory. I also show how complete-class style arguments play a role in social choice theory, justifying utilitarianism and a version of futarchy. This means CCT acts as a bridging analogy between single-agent decisions and collective decisions, shedding some light on how a pile of agent-like pieces can come together and act like one agent. To me, this suggests a potentially rich vein of intellectual ore.

I have some ideas about modifying CCT to be more interesting for MIRI-style decision theory, but I'll only do a little of that here, mostly gesturing at the problems with CCT which could motivate such modifications.

# Background

## My Motives

This post is a continuation of what I started in Generalizing Foundations of Decision Theory and Generalizing Foundations of Decision Theory II. The core motivation is to understand the justification for existing decision theory very well, see which assumptions are weakest, and see what happens when we remove them.

There is also a secondary motivation in human (ir)rationality: to the extent foundational arguments are *real reasons* why rational behavior is better than irrational behavior, one might expect these arguments to be helpful in teaching or training rationality. This is related to my criterion of consequentialism: the argument in favor of Bayesian decision theory should directly point to *why it matters*.

With respect to this second quest, CCT is interesting because Dutch Book and money-pump arguments point out irrationality in agents by *exploiting* the irrational agent. CCT is more amenable to a model in which you point out irrationality by *helping* the irrational agent. I am working on a more thorough expansion of that view with some co-authors.

## Other Foundations

(Skip this section if you just want to know about CCT, and not why I claim it is better than alternatives.)

I give an overview of many proposed foundational arguments for Bayesianism in the first post in this series. I called out Dutch Book and money-pump arguments as the most promising, in terms of motivating decision theory only from "winning". The second post in the series attempted to motivate all of decision theory from only those two arguments (extending work of Stuart Armstrong along those lines), and succeeded. However, the resulting argument was in itself not very satisfying. If you look at the structure of the argument, it justifies constraints on decisions via problems which would occur in hypothetical games involving money. Many philosophers have argued that the Dutch Book argument is in fact a way of illustrating inconsistency in belief, rather than truly an argument that you must be consistent or else. I think this is right. I now think this is a serious flaw behind both Dutch Book and money-pump arguments. There is no pure consequentialist reason to constrain decisions based on consistency relationships with thought experiments.

The position I'm defending in the current post has much in common with the paper Actualist Rationality by C. Manski. My disagreement with him lies in his dismissal of CCT as yet another bad argument. In my view, CCT seems to address his concerns almost precisely!

Caveat --

Dutch Book arguments are *fairly* practical. Betting with people, or asking them to consider hypothetical bets, is a useful tool. It may even be what convinces someone to use probabilities to represent degrees of belief. However, the argument falls apart if you examine it too closely, or at least requires extra assumptions which you have to argue in a different way. Simply put, belief is not literally the same thing as willingness to bet. Consequentialist decision theories are in the business of relating beliefs to actions, not relating beliefs to betting behavior.

Similarly, money-pump arguments can sometimes be extremely practical. The resource you're pumped of doesn't need to be money -- it can simply be the cost of thinking longer. If you spin forever between different options because you prefer strawberry ice cream to chocolate and chocolate to vanilla and vanilla to strawberry, you will not get any ice cream. However, the set-up to money pump *assumes* that you will not notice this happening; whatever the extra cost of indecision is, it is placed outside of the considerations which can influence your decision.

So, Dutch Book "defines" belief as willingness-to-bet, and money-pump "defines" preference as willingness-to-pay; in doing so, both arguments put the justification of decision theory into hypothetical exploitation scenarios which are not quite the same as the actual decisions we face. If these were the best justifications for consequentialism we could muster, I would be somewhat dissatisfied, but would likely leave it alone. Fortunately, a better alternative exists: complete class theorems.

# Four Complete Class Theorems

For a thorough introduction to complete class theorems, I recommend Peter Hoff's course notes. I'm going to walk through four complete class theorems dealing with what I think are particularly interesting cases. Here's a map:

In words: first we'll look at the standard setup, which assumes likelihood functions. Then we will remove the assumption of likelihood functions, since we want to argue for probability theory from scratch. Then, we will switch from talking about decision theory to social choice theory, and use CCT to derive a variant of Harsanyi's utilitarian theorem, AKA Harsanyi's social aggregation theorem, which tells us about cooperation between agents with common beliefs (but different utility functions). Finally, we'll add likelihoods back in. This gets us a version of Critch's multi-objective learning framework, which tells us about cooperation between agents with different beliefs *and* different utility functions.

I think of *Harsanyi's utilitarianism theorem* as the best justification for utilitarianism, in much the same way that I think of CCT as the best justification for Bayesian decision theory. It is not an argument that *your personal values* are necessarily utilitarian-altruism. However, it *is* a strong argument for utilitarian altruism as the most coherent way to care about others; and furthermore, to the extent that groups can make rational decisions, I think it is an extremely strong argument that the group decision should be utilitarian. AlexMennen discusses the theorem and its implications for CEV in a LessWrong post.

I somewhat jokingly think of Critch's variation as "Critch's Futarchy theorem" -- in the same way that Harsanyi shows that utilitarianism is the unique way to make rational collective decisions when everyone agrees about the facts on the ground, Critch shows that rational collective decisions when there is disagreement must involve a betting market. However, Critch's conclusion is not quite Futarchy. It is more extreme: in Critch's framework, agents bet their voting stake rather than money! The more bets you win, the more control you have over the system; the more bets you lose, the less your preferences will be taken into account. This is, perhaps, rather harsh in comparison to governance systems we would want to implement. However, rational agents of the classical Bayesian variety are happy to make this trade.

Without further ado, let's dive into the theorems.

## Basic CCT

We set up decision problems like this:

- $\Theta$ is the set of possible states of the external world.
- $\mathcal{O}$ is the set of possible observations.
- $\mathcal{A}$ is the set of actions which the agent can take.
- $P(o \mid \theta)$ is a likelihood function, giving the probability of an observation $o \in \mathcal{O}$ under a particular world-state $\theta \in \Theta$.
- $\Delta$ is a set of decision rules. For $\delta \in \Delta$, $\delta(o)$ outputs an action. Stochastic decision rules are allowed, though, in which case we should really think of it as outputting an action probability.
- $L(\theta, a)$, the loss function, takes a world $\theta \in \Theta$ and an action $a \in \mathcal{A}$ and returns a real-valued "loss". $L$ encodes preferences: the lower the loss, the better. One way of thinking about this is that the agent knows how its actions play out in each possible world; the agent is only uncertain about consequences because it doesn't know which possible world is the case.

In this post, I'm only going to deal with cases where $\Theta$ and $\mathcal{A}$ are finite. This is not a minor theoretical convenience -- things get significantly more complicated with unbounded sets, and the justification for Bayesianism in particular is weaker. So, it's potentially quite interesting. However, there's only so much I want to deal with in one post.

Some more definitions:

The *risk* of a policy in a particular true world-state: $R(\theta, \delta) = \mathbb{E}_{o \sim P(\cdot \mid \theta)}\left[L(\theta, \delta(o))\right]$.
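To make the definition concrete, here is a minimal sketch of the risk computation for a toy problem. The states, observations, actions, and every number below are made up for illustration, and `risk`, `risk_vector`, and `all_deterministic_rules` are hypothetical helper names. Only deterministic rules are handled.

```python
from itertools import product

# Hypothetical toy problem; all names and numbers are illustrative assumptions.
STATES = ["cold", "warm"]
OBSERVATIONS = ["reads_cold", "reads_warm"]
ACTIONS = ["coat", "t_shirt"]

# LIKELIHOOD[state][obs]: probability of seeing `obs` when the world is `state`.
LIKELIHOOD = {
    "cold": {"reads_cold": 0.8, "reads_warm": 0.2},
    "warm": {"reads_cold": 0.3, "reads_warm": 0.7},
}

# LOSS[state][action]: lower is better.
LOSS = {
    "cold": {"coat": 0.0, "t_shirt": 10.0},
    "warm": {"coat": 4.0, "t_shirt": 0.0},
}


def risk(state, rule):
    """Expected loss of a deterministic rule (dict: observation -> action) in one state."""
    return sum(LIKELIHOOD[state][obs] * LOSS[state][rule[obs]] for obs in OBSERVATIONS)


def risk_vector(rule):
    """Risk of the rule in every state, in the order given by STATES."""
    return tuple(risk(state, rule) for state in STATES)


def all_deterministic_rules():
    """Every mapping from observations to actions."""
    return [
        dict(zip(OBSERVATIONS, choice))
        for choice in product(ACTIONS, repeat=len(OBSERVATIONS))
    ]


trust_thermometer = {"reads_cold": "coat", "reads_warm": "t_shirt"}
print(risk_vector(trust_thermometer))
```

In the cold world this rule loses 10 with probability 0.2, for a risk of 2; in the warm world it loses 4 with probability 0.3, for a risk of 1.2.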

A decision rule $\delta^*$ is a *pareto improvement* over another rule $\delta$ if and only if $R(\theta, \delta) \geq R(\theta, \delta^*)$ for all $\theta$, and strictly > for at least one. This is typically called **dominance** in treatments of CCT, but it's exactly parallel to the idea of pareto-improvement from economics and game theory: everyone is at least as well off, and at least one person is better off. An improvement which harms no one. The only difference here is that it's with respect to possible states, rather than people.

A decision rule is *admissible* if and only if there is no pareto improvement over it. The idea is that there should be no reason not to take pareto improvements, since you're only doing better no matter what state the world turns out to be in. (We could also call this *pareto-optimal*.)
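As a rough illustration of these two definitions, the sketch below checks pareto improvement between risk vectors and filters out the rules that some other listed rule improves on. The four risk vectors are hypothetical, and the check is only against the listed rules; true admissibility also requires that no stochastic rule is a pareto improvement, which this does not test.

```python
def is_pareto_improvement(candidate, baseline):
    """True if `candidate` has risk no higher than `baseline` in every world
    and strictly lower risk in at least one world."""
    pairs = list(zip(candidate, baseline))
    return all(c <= b for c, b in pairs) and any(c < b for c, b in pairs)


def undominated(risk_vectors):
    """Names of rules that no *other listed* rule pareto-improves on."""
    return [
        name
        for name, vec in risk_vectors.items()
        if not any(
            is_pareto_improvement(other, vec)
            for other_name, other in risk_vectors.items()
            if other_name != name
        )
    ]


# Hypothetical risk vectors: (risk in world 1, risk in world 2).
RISKS = {
    "always_coat": (0.0, 4.0),
    "always_t_shirt": (10.0, 0.0),
    "trust_thermometer": (2.0, 1.2),
    "distrust_thermometer": (8.0, 2.8),
}

print(undominated(RISKS))
```

Here `distrust_thermometer` is worse than `trust_thermometer` in both worlds, so it is the only rule filtered out.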

A class of decision rules $C$ is a **complete class** if and only if for any rule $\delta$ *not* in $C$, there exists a rule $\delta^*$ in $C$ which is a pareto improvement. Note, not every rule in a complete class will be admissible itself. In particular, the set of all decision rules is a complete class. So, the complete class is a device for proving a weaker result than admissibility. This will actually be a bit silly for the finite case, because we can characterize the set of admissible decision rules. However, it is the namesake of complete class theorems in general; so, I figured that it would be confusing not to include it here.

Given a probability distribution $\pi$ on world-states, the *Bayes risk* is the expected risk over worlds, IE: $\mathbb{E}_{\theta \sim \pi}\left[R(\theta, \delta)\right]$.

A probability distribution $\pi$ is *non-dogmatic* when $\pi(\theta) > 0$ for all $\theta$.

A decision rule is *bayes-optimal* with respect to a distribution $\pi$ if it minimizes Bayes risk with respect to $\pi$. (This is usually called a *Bayes rule* with respect to $\pi$, but that seems fairly confusing, since it sounds like "Bayes' rule" aka Bayes' theorem.)
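A minimal sketch of these three definitions, using hypothetical risk vectors for four rules in a two-world problem. The function names are illustrative. The search runs only over the listed rules; since the Bayes risk of a mixture is the average of the Bayes risks of its components, mixing among the listed rules can never beat the best of them.

```python
def bayes_risk(prior, risk_vector):
    """Expected risk over worlds, weighting each world by its prior probability."""
    return sum(p * r for p, r in zip(prior, risk_vector))


def is_non_dogmatic(prior):
    """A prior is non-dogmatic when every world gets strictly positive probability."""
    return all(p > 0 for p in prior)


def bayes_optimal(prior, risk_vectors):
    """Name of a listed rule with minimal Bayes risk under `prior`."""
    return min(risk_vectors, key=lambda name: bayes_risk(prior, risk_vectors[name]))


# Hypothetical risk vectors: (risk in world 1, risk in world 2).
RISKS = {
    "always_coat": (0.0, 4.0),
    "always_t_shirt": (10.0, 0.0),
    "trust_thermometer": (2.0, 1.2),
    "distrust_thermometer": (8.0, 2.8),
}

for prior in [(0.5, 0.5), (0.9, 0.1), (0.1, 0.9)]:
    print(prior, is_non_dogmatic(prior), bayes_optimal(prior, RISKS))
```

Different non-dogmatic priors pick out different rules, but none of them picks `distrust_thermometer`, which is pareto-dominated.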

**THEOREM:** *When $\Theta$ and $\mathcal{A}$ are finite, decision rules which are bayes-optimal with respect to a non-dogmatic $\pi$ are admissible.*

**PROOF:** If $\delta$ is Bayes-optimal with respect to a non-dogmatic $\pi$, it minimizes the expectation $\mathbb{E}_{\theta \sim \pi}\left[R(\theta, \delta)\right]$. Since $\pi(\theta) > 0$ for each world, any pareto improvement (which must be strictly better in some world, and not worse in any) would strictly decrease this expectation. But $\delta$ already minimizes the expectation, so no pareto improvement over it exists; that is, $\delta$ is admissible.

**THEOREM:** (basic CCT) *When $\Theta$ and $\mathcal{A}$ are finite, a decision rule $\delta$ is admissible if and only if it is Bayes-optimal with respect to some prior $\pi$.*

**PROOF:** If $\delta$ is admissible, we wish to show that it is Bayes-optimal with respect to some $\pi$.

A decision rule has a risk in each world; think of this as a vector in $\mathbb{R}^{|\Theta|}$. The set $S$ of achievable risk vectors in $\mathbb{R}^{|\Theta|}$ (given by all $\delta \in \Delta$) is convex, since we can make mixed strategies between any two decision rules. It is also closed, since $\Theta$ and $\mathcal{A}$ are finite. Consider a risk vector $x$ as a point in this space (not necessarily achievable by any $\delta$). Define the *lower quadrant* $Q(x)$ to be the set of points which would be pareto improvements if they were achievable by a decision rule. Note that for an admissible decision rule with risk vector $x$, $Q(x)$ and $S$ are disjoint. By the hyperplane separation theorem, there is a separating hyperplane between $S$ and $Q(x)$. We can define $\pi$ by taking a vector normal to the hyperplane and normalizing it to sum to one. This is a prior for which $\delta$ is Bayes-optimal, establishing the desired result.
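The last step of the proof can be spelled out a little more. The following is a sketch under the proof's assumptions. Write $S$ for the set of achievable risk vectors, $x$ for the risk vector of the admissible rule, and $Q(x)$ for its lower quadrant; the symbols $w$ and $c$ (the hyperplane's normal vector and offset) and $e_\theta$ (the unit vector for world $\theta$) are introduced here only for illustration.

```latex
\begin{align*}
&\text{Separation: there exist } w \neq 0 \text{ and } c \text{ with }
  w \cdot y \le c \ \ \forall y \in Q(x), \qquad w \cdot s \ge c \ \ \forall s \in S. \\
&\text{Since } x - t\,e_\theta \in Q(x) \text{ for every } t > 0: \quad
  w \cdot x - t\,w_\theta \le c \ \text{ for all } t > 0
  \ \Longrightarrow\ w_\theta \ge 0. \\
&\text{Letting } t \to 0 \text{ gives } w \cdot x \le c, \text{ and } x \in S
  \text{ gives } w \cdot x \ge c, \text{ so } w \cdot x = c. \\
&\text{Set } \pi(\theta) = \frac{w_\theta}{\sum_{\theta'} w_{\theta'}}.
  \text{ Then for every } s \in S: \quad
  \sum_\theta \pi(\theta)\, s_\theta \ \ge\ \sum_\theta \pi(\theta)\, x_\theta .
\end{align*}
```

The final inequality says that the rule with risk vector $x$ minimizes Bayes risk under $\pi$. Note that this argument by itself only guarantees $\pi(\theta) \geq 0$; it does not rule out zero weight on some worlds.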

If this is confusing, I again suggest Peter Hoff's course notes. However, here is a simplified illustration of the idea for two worlds, four pure actions, and no observations:

(I plotted utility rather than loss because I am more comfortable with thinking of "good" as "up", IE, thinking in terms of utility rather than loss.)

The black "corners" coming from $a_1$ and $a_2$ show the beginning of the $Q(\cdot)$ set for those two points. (You can imagine the other two, for $a_3$ and $a_4$.) Nothing is pareto-dominated except for $a_4$, which is dominated by everything. In economics terminology, the first three actions are on the pareto frontier. In particular, $a_3$ is not pareto-dominated. Putting some numbers to it, $a_3$ could be worth (2,2), that is, worth two in each world. $a_1$ could be worth (1,10), and $a_2$ could be worth (10,1). There is no prior over the two worlds in which a Bayesian would want to take action $a_3$. So, how do we rule it out through our admissibility requirement? We add mixed strategies:

Now, there's a new pareto frontier: the line stretching between $a_1$ and $a_2$, consisting of strategies which have some probability of taking those two actions. Everything else is pareto-dominated. An agent who starts out considering $a_3$ can see that mixing between $a_1$ and $a_2$ is just a better idea, no matter what world they're in. This is the essence of the CCT argument.
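The numbers in this example are easy to check directly. The sketch below uses the utilities from the text, (2,2), (1,10), and (10,1); the variable and function names are illustrative. It scans a grid of priors to confirm that the (2,2) action is never a Bayesian's best choice, and computes the 50/50 mixture of the two extreme actions. The grid scan is a numerical check rather than a proof; analytically, the better of the two extreme actions is always worth at least 5.5 in expectation.

```python
# Utilities (world 1, world 2) taken from the text; higher is better here.
MIDDLE = (2.0, 2.0)
LEFT = (1.0, 10.0)
RIGHT = (10.0, 1.0)


def expected_utility(p_world1, payoff):
    """Expected utility of an action when world 1 has probability p_world1."""
    return p_world1 * payoff[0] + (1.0 - p_world1) * payoff[1]


def mix(weight_on_first, first, second):
    """Per-world expected utility of randomizing between two actions."""
    return tuple(
        weight_on_first * a + (1.0 - weight_on_first) * b
        for a, b in zip(first, second)
    )


def middle_is_ever_bayes_optimal(steps=1000):
    """Scan priors on a grid; True if the (2,2) action ever ties or wins."""
    for i in range(steps + 1):
        p = i / steps
        best_extreme = max(expected_utility(p, LEFT), expected_utility(p, RIGHT))
        if expected_utility(p, MIDDLE) >= best_extreme:
            return True
    return False


fifty_fifty = mix(0.5, LEFT, RIGHT)
print(fifty_fifty, middle_is_ever_bayes_optimal())
```

The 50/50 mixture is worth 5.5 in each world, which beats 2 in each world, so the mixture is a pareto improvement over the (2,2) action.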

Once we move to the pareto frontier of the set of mixed strategies, we can draw the separating hyperplanes mentioned in the proof:

(There may be a unique line, or several separating lines.) The separating hyperplane allows us to derive a (non-dogmatic) prior which the chosen decision rule is consistent with.

## Removing Likelihoods (and other unfortunate assumptions)

Assuming the existence of a likelihood function is rather strange, if our goal is to argue that agents should use probability and expected utility to make decisions. A purported decision-theoretic foundation should not assume that an agent has any probabilistic beliefs to start out.

Fortunately, this is an extremely easy modification of the argument: restricting $P(o \mid \theta)$ to either be zero or one is just a special case of the existing theorem. This does not limit our expressive power. Previously, a world in which the true temperature is zero degrees would have some probability of emitting the observation "the temperature is one degree", due to observation error. Now, we consider the error a part of the world: there is a world where the true temperature is zero and the measurement is one, as well as one where the true temperature is zero and the measurement is zero, and so on.
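Here is a rough sketch of that construction, using the temperature example. The state and observation labels, the noise probabilities, and the helper name `expand_worlds` are all hypothetical. Each world-state is split into one sub-world per possible measurement, and the likelihood in each sub-world is deterministic.

```python
# Hypothetical noisy likelihood: LIKELIHOOD[state][obs] = P(obs | state).
LIKELIHOOD = {
    "temp_0": {"reads_0": 0.9, "reads_1": 0.1},
    "temp_1": {"reads_0": 0.2, "reads_1": 0.8},
}


def expand_worlds(likelihood):
    """Split each state into (state, observation) sub-worlds.

    Returns a new likelihood table whose entries are all 0.0 or 1.0:
    in sub-world (state, obs), the observation `obs` occurs with certainty.
    """
    expanded = {}
    for state, obs_probs in likelihood.items():
        all_obs = list(obs_probs)
        for obs in all_obs:
            expanded[(state, obs)] = {
                other: 1.0 if other == obs else 0.0 for other in all_obs
            }
    return expanded


EXPANDED = expand_worlds(LIKELIHOOD)
for world, row in EXPANDED.items():
    print(world, row)
```

The noise probabilities (0.9, 0.1, and so on) no longer appear in the expanded table. In this sketch, how plausible each sub-world is becomes a question for whatever prior the argument eventually delivers, rather than an input to the setup.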

Another related concern is the assumption that we have mixed strategies, which are described via probabilities. Unfortunately, this is much more central to the argument, so we have to do a lot more work to re-state things in a way which doesn't assume probabilities directly. Bear with me -- it'll be a few paragraphs before we've done enough work to eliminate the assumption that mixed strategies are described by probabilities.

It will be easier to first get rid of the assumption that we have cardinal-valued loss $L$. Instead, assume that we have an ordinal preference for each world, $\preceq_\theta$. We then apply the VNM theorem within each $\theta$, to get a cardinal-valued utility within each world. The CCT argument can then proceed as usual.

Applying VNM is a little unsatisfying, since we need to assume the VNM axioms about our preferences. Happily, it is easy to weaken the VNM axioms, instead letting the assumptions from the CCT setting do more work. A detailed write-up of the following is being worked on, but to briefly sketch:

First, we can get rid of the independence axiom. A mixed strategy is really a strategy which involves observing coin-flips. We can put the coin-flips inside the world (breaking each $\theta$ into more sub-worlds in which coin-flips come out differently). When we do this, the independence axiom is a consequence of admissibility; any violation of independence can be undone by a pareto improvement.

Second, having made coin-flips explicit, we can get rid of the axiom of continuity. We apply the VNM-like theorem from the paper Additive representation of separable preferences over infinite products, by Marcus Pivato. This gives us cardinal-valued utility functions, but without the continuity axiom, our utility may sometimes be represented by infinities. (Specifically, we can consider surreal-numbered utility as the most general case.) You can assume this never happens if it bothers you.

More importantly, at this point we don't need to assume that mixed strategies are represented via pre-existing probabilities anymore. Instead, they're represented by the coins.

I'm fairly happy with this result, and apologize for the brief treatment. However, let's move on for now to the comparison to social choice theory I promised.

## Utilitarianism

I said that $\theta \in \Theta$ are "possible world states" and that there is an "agent" who is "uncertain about which world-state is the case" -- however, notice that I didn't really *use* any of that in the theorem. What matters is that for each $\theta$, there is a preference relation on actions. CCT is actually about compromising between different preference relations.

If we drop the observations, we can interpret the $\theta \in \Theta$ as *people*, and the $a \in \mathcal{A}$ as potential collective actions. The decision rules $\delta$ are potential social choices, which are admissible when they are pareto-efficient with respect to individuals' preferences.

Making the hyperplane argument as before, we get a $\pi$ which places positive weight on each individual. This is interpreted as each individual's weight in the coalition. The collective decision must be the result of a (positive) linear combination of each individual's cardinal utilities -- and those cardinal utilities can in turn be constructed via an application of VNM to individual ordinal preferences. This result is very similar to Harsanyi's utilitarianism theorem.
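A minimal sketch of what such a weighted collective decision looks like, assuming each individual's cardinal utilities are already given. The people, actions, utilities, and weights are all made up for illustration, and only pure collective actions are considered.

```python
# Hypothetical cardinal utilities: person -> utility of each collective action.
UTILITIES = {
    "alice": {"park": 3.0, "library": 1.0, "parking_lot": 0.0},
    "bob": {"park": 1.0, "library": 3.0, "parking_lot": 0.5},
}
ACTIONS = ["park", "library", "parking_lot"]


def social_utility(weights, action):
    """Positive linear combination of individual utilities for one action."""
    return sum(weights[person] * UTILITIES[person][action] for person in UTILITIES)


def collective_choice(weights):
    """Collective action maximizing the weighted sum of utilities."""
    return max(ACTIONS, key=lambda action: social_utility(weights, action))


print(collective_choice({"alice": 0.6, "bob": 0.4}))
print(collective_choice({"alice": 0.3, "bob": 0.7}))
```

Changing the weights moves the choice between `park` and `library`, but no positive weights select `parking_lot`, since both people prefer `park` to it.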

This is not only a nice argument for utilitarianism, it is also an amusing mathematical pun, since it puts utilitarian "social utility" and decision-theoretic "expected utility" into the same mathematical framework. Just because both can be derived via pareto-optimality arguments doesn't mean they're necessarily the same thing, though.

Harsanyi's theorem is not the most-cited justification for utilitarianism. One reason for this may be that it is "overly pragmatic": utilitarianism is about *values;* Harsanyi's theorem is about *coherent governance*. Harsanyi's theorem relies on imagining a collective decision which has to compromise between everyone's values, and specifies what it must be like. Utilitarians don't imagine such a global decision can really be made, but rather, are trying to specify their own altruistic values. Nonetheless, a similar argument applies: altruistic values are enough of a "global decision" that, hypothetically, you'd want to run the Harsanyi argument if you had descriptions of everyone's utility functions and if you accepted pareto improvements. So there's an argument to be made that that's still what you want to approximate.

Another reason, mentioned by Jessicata in the comments, is that utilitarians typically value egalitarianism. Harsanyi's theorem only says that you must put *some* weight on each individual, not that you have to be *fair*. I don't think this is much of a problem -- just as CCT argues for "some" prior, but realistic agents have further considerations which make them skew towards maximally spread out priors, CCT in social choice theory can tell us that we need *some* weights, and there can be extra considerations which push us toward egalitarian weights. Harsanyi's theorem is still a strong argument for a big chunk of the utilitarian position.

## Futarchy

Now, as promised, Critch's 'futarchy' theorem.

If we add observations back in to the multi-agent interpretation, $P(o \mid \theta)$ associates each agent with a probability distribution on observations. This can be interpreted as each agent's beliefs. In the paper Toward Negotiable Reinforcement Learning, Critch examined pareto-optimal sequential decision rules in this setting. Not only is there a function $\pi$ which gives a weight for each agent in the coalition, but *this $\pi$ is updated via Bayes' Rule as observations come in.* The interpretation of this is that the agents in the coalition want to bet on their differing beliefs, so that agents who make more correct bets gain more influence over the decisions of the coalition.
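The weight update described here can be sketched in a few lines. This is a simplified illustration of a Bayes-style reweighting, not Critch's full sequential setting: each agent's beliefs are a fixed distribution over a single observation, and the agents, beliefs, starting weights, and the function name `update_weights` are all hypothetical.

```python
def update_weights(weights, beliefs, observation):
    """Multiply each agent's weight by the probability that agent assigned
    to what was observed, then renormalize so the weights sum to one."""
    unnormalized = {
        agent: weights[agent] * beliefs[agent][observation] for agent in weights
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("every agent assigned probability zero to the observation")
    return {agent: value / total for agent, value in unnormalized.items()}


# Hypothetical coalition: equal starting weights, different beliefs.
WEIGHTS = {"alice": 0.5, "bob": 0.5}
BELIEFS = {
    "alice": {"rain": 0.8, "sun": 0.2},
    "bob": {"rain": 0.4, "sun": 0.6},
}

after_rain = update_weights(WEIGHTS, BELIEFS, "rain")
print(after_rain)
```

Alice assigned twice as much probability to rain as Bob did, so after rain is observed her weight is twice his: 2/3 against 1/3.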

This differs from Robin Hanson's futarchy, whose motto *"vote on values, but bet beliefs"* suggests that everyone gets an equal vote -- you lose *money* when you bet, which loses you influence on *implementation* of public policy, but you still get an equal share of *value*. However, Critch's analysis shows that Robin's version can be strictly improved upon, resulting in Critch's version. (Also, Critch is not proposing his solution as a system of governance, only as a notion of multi-objective learning.) Nonetheless, the spirit still seems similar to Futarchy, in that the control of the system is distributed based on bets.

If Critch's system seems harsh, it is because we wouldn't really want to bet away all our share of the collective value, nor do we want to punish those who would bet away all their value too severely. This suggests that we (a) just *wouldn't* bet everything away, and so wouldn't end up too badly off; and (b) would want to still take care of those who bet their own value away, so that the consequences for those people would not actually be so harsh. Nonetheless, we can also try to take the problem more seriously and think about alternative formulations which seem less strikingly harsh.

## Conclusion

One potential research program which may arise from this is: take the analogy between social choice theory and decision theory very seriously. Look closely at more complicated models of social choice theory, including voting theory and perhaps mechanism design. Understand the structure of rational collective choice in detail. Then, try to port the lessons from this back to the individual-agent case, to create decision theories more sophisticated than simple Bayes. Mirroring this on the four-quadrant diagram from early on:

And, if you squint at this diagram, you can see the letters "CCT".

(Closing visual pun by Caspar Österheld.)

## 34 comments


> Making the hyperplane argument as before, we get a π which places positive weight on each individual. This is interpreted as each individual's weight in the coalition. The collective decision must be the result of a (positive) linear combination of each individual's cardinal utilities -- and those cardinal utilities can in turn be constructed via an application of VNM to individual ordinal preferences.

What this says is that any Pareto-optimal outcome can be *rationalized* as maximizing a positive linear combination of individual utilities, not that it can be *generated* in this way. For example, Nash bargaining results in Pareto optimal outcomes, yet it can't be specified as the unique maximization of some positive linear combination of individual utilities. After running the algorithm, the result is optimal according to some linear combination of individual utilities, but this is a rationalization rather than the actual generation procedure. (This also works as a criticism of Bayesianism)

I basically agree with this criticism, and would like to understand what the alternative to Bayesian decision theory which comes out of the analogy would be.

I think when several AIs with bounded utility functions decide to merge, they can reach any point on the Pareto frontier like this:

1) Allow linear combinations of utility functions. This lets you reach all "pointy" points.

2) Allow making a tuple of functions of type (1) whose values should be compared lexicographically (e.g. "maximize U+V, break ties by maximizing U"). This lets you reach some points on the edges of flat parts.

3) Allow the merging process to choose randomly which function of type (2) to give to the merged AI. This lets you reach the rest of the points on flat parts.

That's a bit complicated, but I don't think there's a simpler way.

I don't see why 2 is necessary given that any point on the Pareto frontier is a mixture of pointy points (intuition for this: any point on the face of a polyhedron is a mixture of that face's corners). In any case, I agree with the basic mathematical point that you can get any Pareto optimal mixture of outcomes by mixing between non-negative linear combinations of utility functions.

Well, I was imagining a Pareto frontier that changes smoothly from flat to curved. Then we can't quite get a pointy point exactly on the edge of the flat part. That's what 2 is for, it gives us some of these points (though not all). But I guess that doesn't matter if things are finite enough.

Ok, that seems right.

Yeah, I explored this direction pretty thoroughly a few years ago. The simplest way is to assume that agents don't have probabilities, only utility functions over combined outcomes, where a "combined outcome" is a combination of outcomes in all possible worlds. (That also takes care of updating on observations, we just follow UDT instead.) Then if we have two agents with utility functions U and V over combined outcomes, any Pareto-optimal way of merging them must behave like an agent with utility function aU+bV for some a and b. The theory sheds no light on choosing a and b, so that's as far as it goes. Do you think there's more stuff to be found?

It sounds like you considered a more general setting than I am at the moment. I want to eventually move to that kind of "combined outcome" setting, but first, I want to understand more classical preference structures and break things one at a time.

Do you think your version sheds any light on value learning in UDT? I had a discussion with Alex Appel about this, in which it seemed like you have a "nosy neighbors" problem, where a potential set of values may care about what happens even in worlds where different values hold; but, this problem seemed to be bounded by such other-world preferences acting like beliefs. For example, you could imagine a UDT agent with world-models in which either vegetarianism or carnivorism is right (which somehow make different predictions). Each set of preferences can either be "nosy" (cares what happens regardless of which facts end up true) or "non-nosy" (each preference set only cares about what happens in its own world -- vegetarianism cares about the amount of meat eaten in veg-world, and carnivorism cares about the amount of meat eaten in carn-world).

The claim which seemed plausible was that nosiness has some kind of balancing behavior which acts like probability: putting some of your caring measure on other worlds reduces your caring measure on your own.

Anything structurally similar in your framework?