The justification for modelling real-world systems as “agents” - i.e. choosing actions to maximize some utility function - usually rests on various coherence theorems. They say things like “either the system’s behavior maximizes some utility function, or it is throwing away resources” or “either the system’s behavior maximizes some utility function, or it can be exploited”. Different theorems use slightly different assumptions and prove slightly different things, e.g. deterministic vs probabilistic utility function, unique vs non-unique utility function, whether the agent can ignore a possible action, etc.
One theme in these theorems is how they handle “incomplete preferences”: situations where an agent does not prefer one world-state over another. For instance, imagine an agent which prefers pepperoni over mushroom pizza when it has pepperoni, but mushroom over pepperoni when it has mushroom; it’s simply never willing to trade in either direction. There’s nothing inherently “wrong” with this; the agent is not necessarily executing a dominated strategy, cannot necessarily be exploited, or any of the other bad things we associate with inconsistent preferences. But the preferences can’t be described by a utility function over pizza toppings.
In this post, we’ll see that these kinds of preferences are very naturally described using subagents. In particular, when preferences are allowed to be path-dependent, subagents are important for representing consistent preferences. This gives a theoretical grounding for multi-agent models of human cognition.
Preference Representation and Weak Utility
Let’s expand our pizza example. We’ll consider an agent who:
Prefers pepperoni, mushroom, or both over plain cheese pizza
Prefers both over pepperoni or mushroom alone
Does not have a stable preference between mushroom and pepperoni - they prefer whichever they currently have
We can represent this using a directed graph:
The arrows show preference: our agent prefers B to A if (and only if) there is a directed path from A to B along the arrows, so arrows point toward the preferred option. There is no path from pepperoni to mushroom or from mushroom to pepperoni, so the agent has no preference between them. In this case, we’re interpreting “no preference” as “agent prefers to keep whatever they have already”. Note that this is NOT the same as “the agent is indifferent”, in which case the agent is willing to switch back and forth between the two options as long as the switch doesn’t cost anything.
Key point: there is no cycle in this graph. If the agent’s preferences are cyclic, that’s when they provably throw away resources, paying to go in circles. As long as the preferences are acyclic, we call them “consistent”.
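As a minimal sketch of that consistency check, assume the preference graph is stored as a dict mapping each state to the states its arrows point to. The names `PIZZA` and `is_consistent` are illustrative, not from the post. The check leans on Python's standard `graphlib` module, which raises `CycleError` when a graph has no topological order.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical encoding: each state maps to the states its arrows point to.
PIZZA = {
    "cheese": {"pepperoni", "mushroom"},
    "pepperoni": {"both"},
    "mushroom": {"both"},
    "both": set(),
}

def is_consistent(arrows):
    """Return True if the preference graph contains no directed cycle."""
    try:
        # Exhausting static_order() raises CycleError if any cycle exists.
        # Edge direction does not matter for cycle detection.
        list(TopologicalSorter(arrows).static_order())
        return True
    except CycleError:
        return False

# A two-state loop: the agent would pay to go round and round.
LOOP = {"pepperoni": {"mushroom"}, "mushroom": {"pepperoni"}}
```

Here `is_consistent(PIZZA)` holds while `is_consistent(LOOP)` does not, which is the distinction the "consistent" label is drawing.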
Now, at this point we can still define a “weak” utility function by ignoring the “missing” preference between pepperoni and mushroom. Here’s the idea: a normal utility function says “the agent always prefers the option with higher utility”. A weak utility function says: “if the agent has a preference, then they always prefer the option with higher utility”. The missing preference means we can’t build a normal utility function, but we can still build a weak utility function. Here’s how: since our graph has no cycles, we can always order the nodes so that the arrows only go forward along the sorted nodes - a technique called topological sorting. Each node’s position in the topological sort order is its utility. A small tweak to this method also handles indifference.
(Note: I’m using the term “weak utility” here because it seems natural; I don’t know of any standard term for this in the literature. Most people don’t distinguish between these two interpretations of utility.)
When preferences are incomplete, there are multiple possible weak utility functions. For instance, in our example, the topological sort order shown above gives pepperoni utility 1 and mushroom utility 2. But we could just as easily swap them!
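The construction can be sketched as follows, assuming arrows point from a state to the states the agent would trade it for, so that utility rises along every arrow. The function names and the `tie_break` parameter are assumptions made for illustration; the tie-break is only there to expose the freedom in choosing a topological order.

```python
from graphlib import TopologicalSorter

# Assumed encoding: ARROWS[a] is the set of states that a's arrows point to.
ARROWS = {
    "cheese": {"pepperoni", "mushroom"},
    "pepperoni": {"both"},
    "mushroom": {"both"},
    "both": set(),
}

def weak_utility(arrows, tie_break=sorted):
    """Assign each state its position in one topological order of the graph."""
    # graphlib expects node -> predecessors, so invert the arrow map first.
    preds = {node: set() for node in arrows}
    for src, dsts in arrows.items():
        for dst in dsts:
            preds[dst].add(src)
    sorter = TopologicalSorter(preds)
    sorter.prepare()
    order = []
    while sorter.is_active():
        # All states whose predecessors are already placed; their relative
        # order is unconstrained, so the tie-break decides it.
        ready = tie_break(sorter.get_ready())
        order.extend(ready)
        sorter.done(*ready)
    return {state: position for position, state in enumerate(order)}

def respects(arrows, utility):
    """Weak-utility property: utility increases along every arrow."""
    return all(utility[a] < utility[b] for a in arrows for b in arrows[a])

u1 = weak_utility(ARROWS)  # alphabetical tie-break
u2 = weak_utility(ARROWS, tie_break=lambda ready: sorted(ready, reverse=True))
```

Both `u1` and `u2` respect every arrow, yet they rank pepperoni and mushroom in opposite orders, which is the non-uniqueness described above.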
Preference By Committee
The problem with the weak utility approach is that it treats the preference between pepperoni and mushroom as unknown - depending on which possible utility we pick, it could go either way. It’s pretending that there’s some hidden preference there which we simply don’t know. But there are real systems where the preference is not merely unknown, but a real preference to stay in the current state.
For example, maybe our pizza-agent is actually a committee which must unanimously agree to any proposed change. One member prefers pepperoni to no pepperoni, regardless of mushrooms; the other prefers mushrooms to no mushrooms, regardless of pepperoni. This committee is not exploitable and does not throw away resources, nor does it have any hidden preference between pepperoni and mushrooms. Viewed as a black box, its “true” preference between pepperoni and mushrooms is to keep whichever it currently has.
In fact, it turns out that we can represent any consistent preferences by a committee requiring unanimous agreement.
The key idea here is called order dimension. We want to take our directed acyclic graph of preferences, and stick it into a multidimensional space so that there is a directed path from A to B if-and-only-if B is higher along all dimensions. Each dimension represents the utility of one subagent on the committee; that subagent approves a change only if the change does not decrease the subagent’s utility. In order for the whole committee to approve a change, the trade must increase (or leave unchanged) the utilities of all subagents. The minimum number of agents required to make this work - the minimum number of dimensions required - is the order dimension of the graph.
For instance, our pizza example has order dimension 2. We can draw it in a 2-dimensional space like this:
Note that, if there are infinitely many possibilities, then the order dimension can be infinite - we may need infinitely many agents to represent some preferences. But as long as the possibilities are finite, the order dimension will be as well.
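The two-member pizza committee can be sketched directly. The utilities below are one assumed encoding of "prefers pepperoni regardless of mushrooms" and "prefers mushrooms regardless of pepperoni"; each pizza then sits at a point (pepperoni member's utility, mushroom member's utility) in a 2-dimensional space.

```python
from itertools import permutations

STATES = ["cheese", "pepperoni", "mushroom", "both"]

# Hypothetical utilities, one dict per committee member (one per dimension).
PEPPERONI_MEMBER = {"cheese": 0, "mushroom": 0, "pepperoni": 1, "both": 1}
MUSHROOM_MEMBER = {"cheese": 0, "pepperoni": 0, "mushroom": 1, "both": 1}
COMMITTEE = [PEPPERONI_MEMBER, MUSHROOM_MEMBER]

def approves(committee, current, proposed):
    """Unanimity rule: the trade goes through only if no member loses utility."""
    return all(u[proposed] >= u[current] for u in committee)

# Every trade between distinct pizzas that the committee would accept.
approved = {
    (a, b) for a, b in permutations(STATES, 2) if approves(COMMITTEE, a, b)
}
```

The approved trades are exactly the upgrades from cheese and the upgrades to "both". Neither pepperoni-for-mushroom nor mushroom-for-pepperoni is approved, since each swap costs one member a unit of utility.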
So far, we’ve interpreted “missing” preferences as “agent prefers to stay in current state”. One important reason for that interpretation is that it’s exactly what we need in order to handle path-dependent preferences.
In practice, path-dependent preferences mostly matter for systems with “hidden state”: internal variables which can change in response to the system’s choices. A great example of this is financial markets: they’re the ur-example of efficiency and inexploitability, yet it turns out that a market does not have a utility function in general (economists call this “nonexistence of a representative agent”). The reason is that the distribution of wealth across the market’s agents functions as an internal hidden variable. Depending on what path the market follows, different internal agents end up with different amounts of wealth, and the market as a whole will hold different portfolios as a result - even if the externally-visible variables, i.e. prices, end up the same.
Most path-dependence results from some hidden state directly, but even if we don’t know the hidden state, we can always add hidden state in order to model path-dependence. Whenever future preferences differ based on how the system reached the current state, we just split the state into two states - one for each possibility. Then we repeat, until we have a full set of states with path-independent preferences between them. These new states are “full” states of the system; from outside, some of them look the same.
An example: suppose I prefer New York to Boston if I just came from DC, but Boston to New York if I just came from Philadelphia.
We can represent that with hidden state:
We now have two separate hidden internal nodes, which both correspond to the same externally-visible state “New York”.
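A rough sketch of the state-splitting step, under the assumption that observations arrive as (less preferred city, more preferred city, city just left) triples. For simplicity it tags every visible state with its origin, which splits Boston as well as New York; that is a coarser choice than strictly needed, and all names are illustrative.

```python
from graphlib import TopologicalSorter, CycleError

# Assumed observations: (less preferred, more preferred, city just left).
OBSERVED = [
    ("Boston", "New York", "DC"),
    ("New York", "Boston", "Philadelphia"),
]

def has_cycle(arrows):
    """True if the directed graph (node -> successors) contains a cycle."""
    try:
        list(TopologicalSorter(arrows).static_order())
        return False
    except CycleError:
        return True

def visible_graph(observed):
    """Preference arrows over externally-visible cities only."""
    arrows = {}
    for worse, better, _origin in observed:
        arrows.setdefault(worse, set()).add(better)
        arrows.setdefault(better, set())
    return arrows

def full_state_graph(observed):
    """Split each visible city by the hidden 'came from' tag."""
    arrows = {}
    for worse, better, origin in observed:
        arrows.setdefault((worse, origin), set()).add((better, origin))
        arrows.setdefault((better, origin), set())
    return arrows

visible = visible_graph(OBSERVED)
full = full_state_graph(OBSERVED)
```

On visible states alone the preferences look cyclic (New York over Boston and Boston over New York), while the graph over full states is acyclic and has no arrow between the two New York nodes.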
Now the key piece: there is no way to get from the “New York (from Philly)” node directly to the “New York (from DC)” node. The agent does not, and cannot, have a preference between these two nodes. Analogously, a market cannot have a preference between two different wealth distributions - the subagents who comprise a market will never spontaneously decide to redistribute their wealth amongst themselves. They always “prefer” (or “decide”) to stay in whatever state they’re currently in.
This is why we need to understand incomplete preferences in order to handle path-dependent preferences: hidden state creates situations where the agent “prefers” to stay in whatever state they’re in.
Now we can easily model the system using subagents exactly as we did for incomplete preferences. We have a directed preference graph between full states (including hidden state); it needs to be acyclic to avoid throwing away resources, so we can find a set of subagents to represent the preferences. In the case of a market, this is just the subagents which comprise the market: they’ll take a trade if it does not decrease the utility of any subagent. (Note, however, that the same externally-visible trade can correspond to multiple possible internal state changes; the subagents will take the trade if any of the possible internal state changes are non-utility-decreasing for all of them. For a market, this means they can trade amongst themselves in response to the external trade in order to make everyone happy.)
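The parenthetical rule, "take the trade if any internal implementation leaves every subagent no worse off", can be sketched with a toy two-agent market. The linear utilities, the apple valuations and the state layout are all assumptions chosen to keep the example small.

```python
def linear_utility(agent, apple_value):
    """Hypothetical utility: cash plus a fixed per-apple valuation."""
    def utility(state):
        cash, apples = state[agent]
        return cash + apple_value * apples
    return utility

def take_trade(utilities, current, implementations):
    """Return the first internal state change that no subagent objects to.

    `implementations` lists the full internal states that all realise the
    same externally-visible trade. Returns None if every one is vetoed.
    """
    for candidate in implementations:
        if all(u(candidate) >= u(current) for u in utilities):
            return candidate
    return None

# Full state: agent -> (cash, apples). Each agent starts with one apple.
current = {"A": (0, 1), "B": (0, 1)}
# External offer: the market hands over 1 apple and receives 3 dollars.
implementations = [
    {"A": (0, 1), "B": (3, 0)},  # B supplies the apple and keeps the cash
    {"A": (3, 0), "B": (0, 1)},  # A supplies the apple and keeps the cash
]
utilities = [linear_utility("A", 2), linear_utility("B", 5)]

chosen = take_trade(utilities, current, implementations)
```

Agent B values an apple above the 3 dollars on offer and vetoes the first implementation; the second leaves A better off and B unchanged, so the market as a whole accepts the external trade.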
Applications & Speculations
We’ve just argued that a system with consistent preferences can be modelled as a committee of utility-maximizing agents. How does this change our interpretation and predictions of the world?
First and foremost: the subagents argument is a generalization of the standard acyclic preferences argument. Anytime we might want to use the acyclic preferences argument, but there’s no reason for the system to be path-independent, we can apply the subagents argument instead. In practice, we usually expect systems to be efficient/inexploitable because of some selection pressure (evolution, market competition, etc) - and that selection pressure usually doesn’t care about path dependence in and of itself.
Main takeaway: pretty much anywhere we’d use an agent with a utility function to model something, we can apply the subagents argument and use a committee of agents with utility functions instead. In particular, this is a good replacement for "weak" utility functions.
Humans are a particularly interesting example. We’d normally use the acyclic preferences argument (among other arguments) to argue that humans approximate utility-maximizers in most situations. But there’s no particular reason to assume path-independence; indeed, human behavior looks highly path-dependent. So, apply the subagents argument. Hypothesis: human behavior approximates the choices of a committee of utility-maximizing agents in most situations.
Sound familiar? The subagents argument offers a theoretical basis for the idea that humans have lots of internal subagents, with competing wants and needs, all constantly negotiating with each other to decide on externally-visible behavior.
In principle, we could test this hypothesis more rigorously. Lots of people think of AI “learning what humans want” by asking questions or offering choices or running simulations. Personally, I picture an AI taking in a scan of a full human connectome, then directly calculating the embedded preferences. Someday, this will be possible. When the AI solves those equations, do we expect it to find a single generic optimizer embedded in the system, approximately optimizing some “utility”? Or do we expect to find a bunch of separate generic optimizers, approximately optimizing several different “utilities”, and negotiating with each other? Probably neither picture is complete yet, but I’d bet the second is much closer to reality.
The acyclic preferences argument is the easiest entry point for efficiency/inexploitability-implies-utility-maximization theorems, but it doesn’t handle lots of important things, including path dependence.
Markets, for example, are efficient/inexploitable but can’t be represented by a utility function. They have hidden internal state - the distribution of wealth over agents - which makes their preferences path-dependent.
The subagents argument says that any system with deterministic, efficient/inexploitable preferences can be represented by a committee of utility-maximizing agents - even if the system has path-dependent or incomplete preferences.
That means we can substitute committees in many places where we currently use utilities. For instance, it offers a theoretical foundation for the idea that human behavior is described by many negotiating subagents.
One big piece which we haven’t touched at all is uncertainty. An obvious generalization of the subagents argument is that, once we add uncertainty (and a notion of efficiency/inexploitability which accounts for it), an efficient/inexploitable path-dependent system can be represented by a committee of Bayesian utility maximizers. I haven’t even started to tackle that conjecture yet; it’s a wide-open problem.
When I initially read this post, I got the impression that "subagents = path-dependent/incomplete DAG". After working through more examples, it seems like all the work is being done by "committee requiring unanimous agreement" rather than by the "subagents" part.
Here are the examples I thought about:
Same as the mushroom/pepperoni situation, with the same two agents, but now each side can retaliate/hijack the rest of the mind if it doesn't get what it wants. For example, if it starts at pepperoni, the mushroom-preferring agent will hijack the rest of the mind to remove the pepperoni, ending up at cheese. But if the agent starts at the "both" node, it will stay there (because both agents are satisfied). The preference relation can be represented as pepperoni→cheese←mushroom with an extra arrow from cheese→both. This is still a DAG, and it's still incomplete (in the sense that we can't compare pepperoni vs mushroom) but it's no longer path-dependent, because no matter where we start, we end up at cheese or "both" (I am assuming that toppings-removal can always be done, whereas acquiring new toppings can't).
Same as the previous example, except now only the mushroom-preferring agent can retaliate/hijack (because the pepperoni-preferring agent is weak or nice). Now the preferences are pepperoni→cheese→mushroom→both. This is still a DAG, but now the preferences are total, so we can also view it as a (somewhat weird) single agent. A realistic example of this is given by Andrew Critch, where pepperoni=work, cheese=burnout (i.e. neither work nor friendship), mushroom=friendship, and both=friendship-and-work.
A modified version of the Zyzzx Prime planet by Scott Alexander. Now whenever we start out at pepperoni, the pepperoni-preferring agent becomes stupid/weak, and loses dominance, so now there are edges from pepperoni to mushroom and "both". (And similarly, mushroom points to both pepperoni and "both".) Now we no longer have a DAG because of the cycle between pepperoni and mushroom.
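The three graphs above can be checked mechanically. This is a sketch that encodes only the arrows stated in each example (the third omits cheese because the description does not mention it); the example labels and helper names are made up for illustration.

```python
from graphlib import TopologicalSorter, CycleError
from itertools import combinations

# Each graph maps a state to the states its arrows point to.
EXAMPLES = {
    "mutual retaliation": {
        "pepperoni": {"cheese"}, "mushroom": {"cheese"},
        "cheese": {"both"}, "both": set(),
    },
    "one-sided retaliation": {
        "pepperoni": {"cheese"}, "cheese": {"mushroom"},
        "mushroom": {"both"}, "both": set(),
    },
    "Zyzzx Prime variant": {
        "pepperoni": {"mushroom", "both"},
        "mushroom": {"pepperoni", "both"}, "both": set(),
    },
}

def is_dag(arrows):
    try:
        list(TopologicalSorter(arrows).static_order())
        return True
    except CycleError:
        return False

def reachable(arrows, start):
    """All states reachable from `start` by following one or more arrows."""
    seen, stack = set(), [start]
    while stack:
        for nxt in arrows.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_total(arrows):
    """True if the graph is acyclic and every pair of states is comparable."""
    return is_dag(arrows) and all(
        b in reachable(arrows, a) or a in reachable(arrows, b)
        for a, b in combinations(arrows, 2)
    )

summary = {name: (is_dag(g), is_total(g)) for name, g in EXAMPLES.items()}
```

Under this encoding the first graph is an incomplete DAG, the second is a total order, and the third is not a DAG at all.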
It seems like when people talk about the human mind being composed of subagents, the deliberation process is not necessarily "committee requiring unanimous agreement", so the resulting preference relations cannot necessarily be represented using path-dependent DAGs.
It also seems like the general framework of viewing systems as subagents (i.e. not restricting to "committee requiring unanimous agreement") is broad enough that it can basically represent any kind of directed graph. On one hand, this is suspicious (if everything can be viewed as a bunch of subagents, then maybe the subagents framework isn't adding anything after all). On the other hand, this suggests that claims of subagents are not really about the resulting behavior/preference ordering of the system, but rather about the internal dynamics of the system.
I definitely agree that most of the work is being done by the structure in which the subagents interact (i.e. committee requiring unanimous agreement) rather than the subagents themselves. That said, I wouldn't get too hung up on "committee requiring unanimous agreement" specifically - there are structures which behave like unanimous committees but don't look like a unanimous committee on the surface, e.g. markets. In a market, everyone has a veto, but each agent only cares about their own basket of goods - they don't care if somebody else's basket changes.
In the context of humans, one way to interpret this post is that it predicts that subagents in a human usually have veto power over decisions directly touching on the thing they care about. This sounds like a pretty good model of, for example, humans asked about trade-offs between sacred values.
Suppose you offer to pay a penny to swap mushroom for pepperoni, and then another penny to swap back. This agent will refuse, failing to money pump you.
Suppose you offer the agent a choice between pepperoni or mushroom, when it currently has neither. Which does it choose? If it chooses pepperoni, but refuses to swap mushroom for pepperoni then its decisions depend on how the situation is framed. How close does it have to get to the mushroom before they "have" mushroom and refuse to swap? Partial preferences only make sense when you don't have to choose between unordered options.
We could consider the agent to have a utility function with a term for time consistency: it wants the pizza in front of it at times 0 and 1 to be the same.
If it chooses pepperoni, but refuses to swap mushroom for pepperoni then its decisions depend on how the situation is framed. How close does it have to get to the mushroom before they "have" mushroom and refuse to swap?
I previously suggested that revealing the two options as equivalent will bring the two subagents into a standstill, requiring some third factor to help decide. That seems close to what I find when I introspect on being offered a choice between two foods that I think are equally good - I just decide at random or go by e.g. some force of habit that provides a slight starting point bias.
Good question. Let's talk about analogous choices for a market, since that's a more realistic system, and then we can bring it back to pizza.
In a market, partial preferences result from hidden state. There is never a missing preference between externally-visible states (i.e. the market's aggregate portfolio). However, the market could have a choice between two hidden states: given one aggregate trade, the market could implement it two different ways, resulting in different wealth distributions. For instance, if I offer the market $5000 for 5 shares of AAPL, then those 5 shares can come from any combination of the internal agents holding AAPL, and the $5000 can be distributed among them in many different ways. This means the market's behavior is underspecified: there are multiple possible solutions for its behavior. Economists call the set of possible solutions the "contract curve". Usually, additional mechanics are added to narrow down the possible behavior - most notably the Law of One Price, the strongest form of which gives locally-unique solutions for the market's behavior. For real markets, Law of One Price is an approximation, and the exact outcome will depend on market microstructure: market making, day trading, and so forth.
Now let's translate this back to the original question about pizza.
Short answer: the preferences don't specify which choice the system takes when offered mushroom or pepperoni. It depends on internal structure of the system, which the preferences abstract away. And that's fine - as the market example shows, there are real-world examples where that abstraction is still useful. Additionally, for real-world cases of interest, the underspecified choices will usually be between hidden states, so the underspecified behavior will itself be "hidden" - it will only be externally-visible via path-dependence of later preferences.
I'm a bit concerned about this sort of thing: "The subagents argument offers a theoretical basis for the idea that humans have lots of internal subagents, with competing wants and needs, all constantly negotiating with each other to decide on externally-visible behavior."
A worry I have about the standard representation theorems is that they prove too much; if everything can be represented as having a utility function, then maybe it's not so useful to talk about utility functions. Similarly now I worry: I thought when people talked about subagent theories of mind, they meant something substantial by this--not merely that the mind has incomplete (though still acyclic) preferences!
Not sure if you've ever taken a class on electricity & magnetism, but one of the central notions is the conservative vector field - electric fields being the standard example. You take an electron, and drag it around the electric field. Sometimes you'll have to push it against the field, sometimes the field will push it along for you. You add up all the energy spent pushing (or energy extracted when the field pushes it for you), and find an interesting result: the energy spent moving the electron from point A to point B is completely independent of the path taken. Any two paths from A to B will require exactly the same energy expenditure.
That's a pretty serious constraint on the field - the vast majority of possible vector fields are not conservative.
It's also exactly the same constraint as a utility function: a vector field is conservative if-and-only-if it is acyclic, in the sense of having zero circulation around any closed curve. Indeed, this means that conservative vector fields can be viewed as utility functions: the field itself is the gradient of a "utility function" (called the potential field), and it accepts any local "trade" which increases utility - i.e. moving an electron up the gradient of the utility function. Conversely, if we have preferences represented by local preferences in a (finite-dimensional) vector space, then we can summarize those preferences with a utility function if-and-only-if the field is conservative.
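A small numerical illustration of the zero-circulation condition, using two hand-picked example fields that are not from the comment: the gradient of U(x, y) = x·y, which is conservative, and a pure rotation, which is not. The integration path (the unit circle) and the step count are arbitrary choices.

```python
import math

def circulation(field, steps=10_000):
    """Approximate the line integral of `field` once around the unit circle."""
    total = 0.0
    for k in range(steps):
        t0 = 2 * math.pi * k / steps
        t1 = 2 * math.pi * (k + 1) / steps
        tm = (t0 + t1) / 2
        # Evaluate the field at the midpoint angle of each small arc.
        fx, fy = field(math.cos(tm), math.sin(tm))
        dx = math.cos(t1) - math.cos(t0)
        dy = math.sin(t1) - math.sin(t0)
        total += fx * dx + fy * dy
    return total

def gradient_field(x, y):
    """Gradient of the potential U(x, y) = x * y, i.e. a 'utility function'."""
    return (y, x)

def rotation_field(x, y):
    """A field that always pushes counterclockwise; it has no potential."""
    return (-y, x)

conservative = circulation(gradient_field)
cyclic = circulation(rotation_field)
```

The gradient field gives a circulation of essentially zero, while the rotation field gives roughly 2π: going around the loop extracts net energy, the continuous analogue of a preference cycle.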
My point is: acyclicity is a major constraint on a system's behavior. It is definitely not the case that "everything can be represented as having a utility function".
Now, there is a separate piece to your concern: when people talk about subagent theories of mind, they think that the brain is actually implemented using subagents, not merely behaving in a manner equivalent to having subagents. It's a variant of the behavior vs architecture question. In this case, we can partially answer the question: subagent architectures have a relative advantage over most non-subagent architectures in that the subagent architectures won't throw away resources via cyclic preferences, whereas most of the non-subagent architectures will. The only non-subagent architectures which don't throw away resources are those whose behavior just so happens to be equivalent to subagents.
If a system with a subagent architecture is evolving, then it will mostly be exploring different configurations of subagents - so any configuration it explores will at least not throw away resources. On the other hand, with a non-subagent architecture, we'd expect that there's some surface in configuration space which happens to implement agent-like behavior, and any changes which move off that surface will throw away at least some resources - and any single-nucleotide change is likely to move off the surface. In other words, a subagent architecture is more likely to have a nice evolutionary path from wherever it starts to the maximum-fitness design, whereas a non-subagent architecture may not have such a smooth path. As an evolutionary analogue to the behavior vs architecture question, I'd conjecture: subagent-like behavior generally won't evolve without subagent-like architecture, because it's so much easier to explore efficient designs within a subagent architecture.
One thing I don't understand about cycles is that they seem fine as long as you have a generalized cycle detector, and a single instance of a cycle getting generated is fine because the losses from one (or a few) rounds are small. I guess people normally think of utility functions as fixed, but this sort of rolls fixed point/convergence intuitions into the problem formulation.
One frame is that utility functions as a formalism are just an extension of the great rationality debate.
If we have a cycle detector which prevents cycling, then we don't have true cycles. Indeed, that would be an example of a system with internal state: the externally-visible state looks like it cycles, but the full state never does - the state of the cycle detector changes.
So this post, as applied to cycle-detectors, says: any system which detects cycles and prevents further cycling can be represented by a committee of utility-maximizing agents.
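To make the hidden-state reading concrete, here is a toy cycle detector. The raw cyclic preferences, the visit limit and the class name are all assumptions for illustration; the detector's memory plays the role of the internal state.

```python
from collections import Counter

# Raw cyclic preferences over visible states: from each state, the agent
# likes moving to exactly one other state.
CYCLIC = {"A": "B", "B": "C", "C": "A"}

class CycleGuardedAgent:
    def __init__(self, start, max_visits=2):
        self.visible = start
        self.visits = Counter({start: 1})  # hidden state: detector memory
        self.max_visits = max_visits

    def offer(self, proposed):
        """Accept a liked trade unless it would exceed the visit limit."""
        liked = CYCLIC[self.visible] == proposed
        if not liked or self.visits[proposed] >= self.max_visits:
            return False
        self.visible = proposed
        self.visits[proposed] += 1
        return True

    def full_state(self):
        return (self.visible, tuple(sorted(self.visits.items())))

agent = CycleGuardedAgent("A")
history = [agent.full_state()]
accepted = []
for proposed in ["B", "C", "A", "B", "C", "A", "B"]:
    ok = agent.offer(proposed)
    accepted.append(ok)
    if ok:
        history.append(agent.full_state())
```

The visible state passes through A, B and C twice, but no full state ever repeats, and the sixth offer (a third visit to A) is refused. Over full states the behavior is acyclic, so the committee representation applies.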
Imagine a second agent which has the same preferences but an anti-status-quo preference between mushroom and pepperoni.
This would be exploitable by a third agent who is able to compare mushroom and pepperoni but assigns equal utilities to both. However the original agent described in the OP would not be able to exploit agent 2 (if agent 1's status-quo bias is larger than agent 2's anti-status-quo bias), so agent 3 dominates agent 1 in terms of performance.
Over multiple dimensions agent 3 becomes much more complex than agent 1. Having a status quo bias makes sense as a way to avoid being exploited whilst also being less computationally expensive than tracking or calculating every preference ordering.
Assuming agent 2 is rare, the loss incurred by not being able to exploit others is small.
With regards to path dependence and partial preferences, a certain amount of this feels like the model simply failing to fully capture the preference on the first go. That is, preferences are conditional, i.e. they are conditioned on the environment in which they are embedded, and the sense in which there is partiality and path dependence issues seems to me to arise entirely from partial specification, not the preference being partial itself. Thus I have to wonder, why pursue models that deal with partial preferences and their issues rather than trying to build better models of preferences that better capture the full complexity of preferences?
To a certain extent it feels to me like with partial preferences we're trying to hang on to some things that were convenient about older models while dealing with the complexities of reality they failed to adequately model, rather than giving up our hope to patch the old models and look for something better suited to what we are trying to model (yes, I'm revealing my own preference here for new models based on what we learned from old models instead of incrementally improving old models).
I recommend thinking about the market example. The difficulty for markets is not that the preferences are "conditioned on the environment"; exactly the opposite. The problem is that the preferences are conditional on internal state; they can't be captured only by looking at the external environment.
For examples like pepperoni vs mushroom pizza, where we're just thinking about partial preferences directly, it's reasonable to say that the problem is partial specification. Presumably the system does something when it has to choose between pepperoni and mushroom - see Donald Hobson's comment for more on that. But path dependence is a different beast. Once we start thinking about internal state and path dependence, partial preferences are no longer just due to partial specification - they're due to the system having internal variables which it doesn't "want" to change.
The problem is that the preferences are conditional on internal state; they can't be captured only by looking at the external environment.
I think I wasn't clear enough about what I meant. I mean to question specifically why excluding such so-called "internal" state is the right choice. Yes, it's difficult and inconvenient to work with that which we cannot externally observe, but I think much of the problem is that our models leave this part of the world out because it can't be easily observed with sufficient fidelity (yet). The division between internal and external is somewhat arbitrary in that it exists at the limit of our observation powers, not generally as a natural limit of the system independent of our knowledge of it, so I question whether it makes sense to then allow that limit to determine the model we use, rather than stepping back and finding a way to make the model larger such that it can include the epistemological limits that create partial preferences as a consequence rather than being ontologically basic to the model.
Ah, that makes more sense. There are several answers; the main answer is that the internal/external division is not arbitrary.
First: at least for coherence-type theorems, they need to work for any choice of system which satisfies the basic type signature (i.e. the environment offers choices, the system "decides" between them, for some notion of decision). The theorem has to hold regardless of where we draw the box. On the other hand, you could argue that some theorems are more useful than others and therefore we should draw our boxes to use those theorems - even if it means fewer "systems" qualify. But then we can get into trouble if there's a specific system we want to talk about which doesn't qualify - e.g. a human.
Second: in this context, when we talk about "internal" variables, that's not an arbitrary modelling choice - the "external" vs "internal" terminology is hiding a functionally-important difference. Specifically, the "external" variables are anything which the system chooses between, anything we could offer in a trade. It's not about the limits of observation, it's about the limits of trade or tradeoffs or choices. The distribution of wealth within a market is "internal" not because we can't observe it (to a large extent we can), but because it's not something that the market itself is capable of choosing, even in principle.
Now, it may be that there are other things in the external world which the market can't make a choice about as a practical matter, like the radius of the moon. But if it somehow became possible to change the radius of the moon, then there's no inherent reason why the market can't make a choice on that - as opposed to the internal wealth distribution, where any choice would completely break the market mechanism itself.
That leads into a third answer: think of the "internal" variables as gears, pieces causally involved in making the decision. The system as a whole can have preferences over the entire state of the external world, but if it has preferences about the gears which are used to make decisions... well, then we're going to end up in a self-referential mess. Which is not to say that it wouldn't be useful to think about such self-referential messes; it would be an interesting embedded agency problem.
Consider a pizza-eating agent with the following "grass is always greener on the other side of the fence" preference: it has no "initial" preference between toppings but as soon as it has one it realises it doesn't like it and then prefers all other not-yet-tried toppings to the one it's got (and to others it's tried).
There aren't any preference cycles here -- if you give it mushroom it then prefers pepperoni, but having switched to pepperoni it then doesn't want to switch back to mushroom. If our agent has no opinion about comparisons between all toppings it's tried, and between all toppings it hasn't tried, then there are no outright inconsistencies either.
Can you model this situation in terms of committees of subagents? Can you do it without requiring an unreasonably large number of subagents?
Those are consistent path-dependent preferences, so they can be modeled by a committee of subagents by the method outlined in the post. It would require something like n·2^(n−1) states, I think, one for each current topping times each possible set of toppings tried already. Off the top of my head, I'm not sure how many dimensions it would require, but you can probably figure it out by trying a few small examples.
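The state count can be checked by brute force. This sketch enumerates one full state per (current topping, set of other toppings already tried) and compares the total with n times 2 to the power n−1. It ignores the initial state in which no topping has been chosen, and the function name is illustrative.

```python
from itertools import combinations

def full_states(toppings):
    """Enumerate (current topping, frozenset of other toppings tried)."""
    states = []
    for current in toppings:
        others = [t for t in toppings if t != current]
        for size in range(len(others) + 1):
            for tried in combinations(others, size):
                states.append((current, frozenset(tried)))
    return states

counts = {
    n: len(full_states([f"topping{i}" for i in range(n)]))
    for n in range(1, 6)
}
```

For small n this gives 1, 4, 12, 32 and 80 states, so the state space grows quickly even before asking how many committee members the embedding needs.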
That said, the right way to model those particular preferences is to introduce uncertainty and Bayesian reasoning. The "hidden state" in this case is clearly information the agent has learned about each topping.
This raises another interesting question: can we just model all path-dependent preferences by introducing uncertainty? What subset can be modeled this way? Nonexistence of a representative agent for markets suggests that we can't always just use uncertainty, at least without changing our interpretations of "system" or "preference" or "state" somewhat. On the other hand, in some specific cases it is possible to interpret the wealth distribution in a market as a probability distribution in a mixture model - log utilities let us do this, for instance. So I'd guess that there's some clever criterion that would let us tell whether a committee/market with given utilities can be interpreted as a single Bayesian utility maximizer.