«Boundaries», Part 1: a key missing concept from utility theory

post by Andrew_Critch · 2022-07-26T23:03:55.941Z · LW · GW · 31 comments


  1. Boundaries (of living systems) 
  2. Canonical disagreement points as missing from utility theory and game theory
  3. Boundaries as a way to select disagreement points in bargaining
  4. Some really important boundaries
  5. Summary

This post has been recorded as part of the LessWrong Curated Podcast, and can be listened to on Spotify, Apple Podcasts, and Libsyn.

This is Part 1 of my «Boundaries» Sequence [? · GW] on LessWrong.

Summary: «Boundaries» are a missing concept from the axioms of game theory and bargaining theory, which might help pin down certain features of multi-agent rationality (this post), and have broader implications for effective altruism discourse and x-risk (future posts).

1. Boundaries (of living systems) 

Epistemic status: me describing what I mean.

With the exception of some relatively recent and isolated pockets of research on embedded agency (e.g., Orseau & Ring, 2012; Garrabrant & Demsky, 2018), most attempts at formal descriptions of living rational agents — especially utility-theoretic descriptions — are missing the idea that living systems require and maintain boundaries.

When I say boundary, I don't just mean an arbitrary constraint or social norm.  I mean something that could also be called a membrane in a generalized sense, i.e., a layer of stuff-of-some-kind that physically or cognitively separates a living system from its environment, that 'carves reality at the joints' in a way that isn't an entirely subjective judgement of the living system itself.  Here are some examples that I hope will convey my meaning:

Figure 1: Cell membranes, skin, fences, firewalls, group divisions, and state borders as living system boundaries.

Comparison to Cartesian Boundaries.

For those who'd like a comparison to 'Cartesian boundaries', as in Scott Garrabrant's Cartesian Frames [AF · GW] work, I think what I mean here is almost exactly the same concept.  The main differences are these:

  1. (life-focus) I want to focus on boundaries of things that might naturally be called "living systems" but that might not broadly be considered "agents", such as a human being that isn't behaving very agentically, or a country whose government is in a state of internal disagreement. (I thought of entitling this sequence "membranes" instead, but stuck with 'boundaries' because of the social norm connotation.)
  2. (flexibility-focus) Also, the theory of Cartesian Frames assumes a fixed cartesian boundary for the agent, rather than modeling the boundary as potentially flexible, pliable, or permeable over time (although it could be extended to model that).

Comparison to social norms.

Certain social norms exist to maintain separations between living systems.  For instance:

2. Canonical disagreement points as missing from utility theory and game theory

Epistemic status: uncontroversial overview and explanation of well-established research.

Game theory usually represents players as having utility functions (payoff functions), and often tries to view the outcome of the game as arising as a consequence of the players' utilities.  However, for any given concept of "equilibrium" attempting to predict how players will behave, there are often many possible equilibria.  In fact, there are a number of theorems in game theory called "folk theorems" (reference: Wikipedia) showing that very large spaces of possible equilibria result when games have certain features approximating real-world interaction, such as

  1. the potential for players to talk to each other and make commitments (Kalai et al, 2010)
  2. the potential for players to interact repeatedly and thus establish "reputations" with each other (source: Wikipedia).

Here's a nice illustration of a folk theorem from a Chegg.com homework set:

Figure 2: A "folk theorem" showing a large space (blue) of subgame perfect Nash equilibria (SPNE) payoffs attainable in an infinitely repeated game, plotted on the space of payoffs for a single iteration of the game.  (image source: Chegg.com homework set).  It's not crucial to understand this figure for the post, but it's definitely worth learning about; see Wikipedia for an explanation.

The zillions of possible equilibria arising from repeated interactions leave us with not much of a prediction about what will actually happen in a real-world game, and not much of a normative prescription of what should happen, either.
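To make the individual-rationality condition in folk theorems concrete, here's a minimal sketch (my own illustration, not from the post): for a one-shot Prisoner's Dilemma with made-up payoff numbers, we compute each player's minmax value and check which average payoff profiles a repeated game could, roughly speaking, sustain as equilibria (feasible payoffs at or above the minmax point).

```python
# Hypothetical illustration of the individual-rationality condition behind
# folk theorems.  Payoff numbers are invented for the example; actions are
# 0 = Cooperate, 1 = Defect.

# Row player's and column player's payoff matrices.
P1 = [[3, 0],
      [5, 1]]
P2 = [[3, 5],
      [0, 1]]

def minmax_value(payoff, minimizing_opponent_axis):
    """The opponent picks an action to minimize this player's best-response payoff."""
    if minimizing_opponent_axis == 'col':
        # Opponent chooses a column; the player best-responds with a row.
        return min(max(payoff[r][c] for r in range(2)) for c in range(2))
    else:
        # Opponent chooses a row; the player best-responds with a column.
        return min(max(payoff[r][c] for c in range(2)) for r in range(2))

v1 = minmax_value(P1, 'col')  # row player's minmax value
v2 = minmax_value(P2, 'row')  # column player's minmax value

def individually_rational(u1, u2):
    """Folk theorems sustain (roughly) feasible payoff profiles above the minmax point."""
    return u1 >= v1 and u2 >= v2

print(v1, v2)                       # the minmax ("punishment") point
print(individually_rational(3, 3))  # mutual cooperation: sustainable
print(individually_rational(0, 5))  # full exploitation of player 1: not sustainable
```

The large blue region in Figure 2 corresponds to exactly this kind of set: every feasible payoff profile weakly above the minmax point, which is why repeated play yields "zillions" of equilibria rather than a unique prediction.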

Bargaining theory attempts to predict and/or prescribe how agents end up "choosing an equilibrium", usually by writing down some axioms that pick out a special point on the Pareto frontier of possible payoffs, such as the Nash bargaining solution and the Kalai-Smorodinsky bargaining solution (reference: Wikipedia).   It's not crucial to understand these figures for the remainder of the post, but if you don't, I do think it's worth learning about them sometime, starting with the Wikipedia article:

Figure 3: Nash bargaining solution
(image source: Karmperis et al, 2013; to learn more, see Wikipedia)
Figure 4: Kalai-Smorodinsky bargaining solution
(image source: Borgstrom et al, 2007; to learn more, start with Wikipedia)

The main thing to note about the above bargaining solutions is that they both depend on the existence of a constant point d, called a "disagreement point", representing a pair of constant utility levels that each player will fall back on attaining if the process of negotiation breaks down.  
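To see how the disagreement point drives the solution, here is a minimal numerical sketch (my own, not from the post): we maximize the Nash product (u1 − d1)·(u2 − d2) over a discretized Pareto frontier. The linear frontier u1 + u2 = 10 is an arbitrary choice for illustration.

```python
# Hypothetical sketch: the Nash bargaining solution as a function of the
# disagreement point d.  The feasible frontier u1 + u2 = 10 is made up.

def nash_bargaining(frontier, d):
    """Pick the feasible point (weakly above d) maximizing the Nash product."""
    candidates = [(u1, u2) for (u1, u2) in frontier
                  if u1 >= d[0] and u2 >= d[1]]
    return max(candidates, key=lambda p: (p[0] - d[0]) * (p[1] - d[1]))

# Linear Pareto frontier u1 + u2 = 10, discretized into 1001 points.
frontier = [(i / 100, 10 - i / 100) for i in range(1001)]

print(nash_bargaining(frontier, (0, 0)))  # symmetric fallback -> (5.0, 5.0)
print(nash_bargaining(frontier, (4, 0)))  # player 1 has a better BATNA -> (7.0, 3.0)
```

The point of the sketch: shift one player's fallback utility and the "fair" solution shifts in their favor, which is why a well-defined, stable disagreement point matters so much in what follows.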

(See also this concurrently written recent LessWrong post [LW · GW]about Kalai & Kalai's cooperative/competitive 'coco' bargaining solution.  The coco solution doesn't assume a constant disagreement point, but it does assume transferrable utility, which has its own problems, due to difficulties with defining interpersonal comparisons of utility [source: lots].)

The utility achieved by a player at the disagreement point is sometimes called their best alternative to negotiated agreement (BATNA):

Figure 5: Illustration of BATNAs delimiting a zone of potential agreement.
(source: PoweredTemplate.com ... not very academic, but a good illustration!)

Within the game, the disagreement point, i.e., the pair of BATNAs, may be viewed as defining what "zero" (marginal) utility means for each player.  

(Why does zero need a definition, you might ask?  Recall that the most broadly accepted axioms for the utility-theoretic foundations of game theory — namely, the von Neumann–Morgenstern rationality axioms [reference: Wikipedia] — only determine a player's utility function modulo a positive affine transformation (u ↦ a·u + b, with a > 0).  So, in the wild, there's no canonical way to look at an agent and say what is or isn't a zero-utility outcome for that agent.)
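Here's a small sketch of that invariance (my own illustrative numbers): rescaling a vNM utility function by any positive affine map leaves every preference between lotteries unchanged, so the rescaling can move the "zero point" anywhere we like.

```python
# Sketch of why "zero utility" needs a convention: vNM preferences over
# lotteries are invariant under u -> a*u + b with a > 0, so no outcome is
# canonically "zero".  Outcomes and utilities below are invented.

def expected_utility(lottery, u):
    """lottery: list of (probability, outcome) pairs; u: utility function."""
    return sum(p * u(x) for p, x in lottery)

outcomes_u = {'home': 2.0, 'deal': 5.0, 'fight': -3.0}
u = lambda x: outcomes_u[x]

lottery_A = [(1.0, 'deal')]                    # a sure negotiated deal
lottery_B = [(0.5, 'home'), (0.5, 'fight')]    # a risky breakdown

def prefers_A(utility_fn):
    return expected_utility(lottery_A, utility_fn) > expected_utility(lottery_B, utility_fn)

# A positive affine rescaling that moves the "zero point" somewhere else
# entirely -- the preference ordering is unchanged.
rescaled = lambda x: 100 * u(x) - 7

print(prefers_A(u), prefers_A(rescaled))  # True True
```

Because any such rescaling is equally valid, something *outside* the axioms — like a protected disagreement point — has to supply the zero.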

While it's appealing to think in terms of BATNAs, in physical reality, payoffs outside of negotiations can depend very much on the players' behavior inside the negotiations, and thus are not constant.  Nash himself wrote about this limitation (Nash, 1953) just three years after originally proposing the Nash bargaining solution.  For instance, if someone makes an unacceptable threat against you during a business negotiation, you might go to the police and have them arrested, versus just going home and minding your business if the negotiations had failed in a more normal/acceptable way.  In other words, you have the ability to control their payoff outside the negotiation, based on what you observe during the negotiation.  It's not a constant; you can affect it.

So, the disagreement point or BATNA concept isn't really applicable on its own, unless something is protecting the BATNA from what happens in the negotiation, making it effectively constant.  Basically, the two players need a safe/protected/stable place to walk away to in order for a constant "walk away price" to be meaningful.  For many people in many situations, that place is their home:

Figure 6: People disagreeing and going home.
(source: owned)

Thus, to the extent that we maintain social norms like "mind your own business" and "don't threaten to attack people" and "people can do whatever they want in the privacy of their own homes", we also simplify bargaining dynamics outside the home, by maintaining a well-defined fallback option for each person (a disagreement point), of the form "go home and do your own thing".

3. Boundaries as a way to select disagreement points in bargaining

Epistemic status: research ideas, both for pinning down technical bargaining solutions, and for fixing game theory to be more applicable to real-life geopolitics and human interactions.

Since BATNAs need protection in order to be meaningful in negotiations, to identify BATNAs, we must ask: what protections already exist, going into the negotiation?  

For instance, 

4. Some really important boundaries

In real-world high-stakes negotiations between states — wars — almost the whole interaction is characterized by

Figure 7: The Eastern Front in WWII.  
Source: Britannica for kids ... again, not very academic, but nicely evocative of states changing their boundaries.

Finally, the issue of whether AI technology will cause human extinction is very much an issue of whether certain boundaries can be respected and maintained, such as the boundaries of the human body and mind that protect individuals, as well as boundaries around physical territories and cyberspace that (should) protect human civilization.  

That, however, will be a topic of a future post.  For now, the main take-aways I'd like to re-iterate are that boundaries of living systems are important, and that they have a technical role to play in the theory and practice of how agents interact, including in formal descriptions of how one or more agents will or should reach agreements in cases of conflict.

In the next post, I'll talk more about how that concept of boundaries could be better integrated into discourse on effective altruism.

5. Summary

In this post, I laid out what I mean by boundaries (of living systems), described how a canonical choice of a "zero point" or "disagreement point" is missing from utility theory and bargaining theory, proposed that living system boundaries have a role to play in defining those disagreement points, and briefly alluded to the importance of boundaries in navigating existential risk.

This was Part 1 of my «Boundaries» Sequence [? · GW].


Comments sorted by top scores.

comment by Jan_Kulveit · 2022-07-27T14:51:43.727Z · LW(p) · GW(p)

With the exception of some relatively recent and isolated pockets of research on embedded agency (e.g., Orseau & Ring, 2012; Garrabrant & Demsky, 2018), most attempts at formal descriptions of living rational agents — especially utility-theoretic descriptions — are missing the idea that living systems require and maintain boundaries.

While I generally like the post, I somewhat disagree with this summary of state of understanding, which seems to ignore quite a lot of academic research. In particular

- Friston et al certainly understand this (cf. the dozens to hundreds of papers claiming and explaining the importance of boundaries for living systems)
- the whole autopoiesis field
- various biology-inspired papers (eg this)

I do agree this way of thinking is less common among people stuck too much in the VNM basin, such as most of econ or most of game theory.


Replies from: Andrew_Critch
comment by Andrew_Critch · 2022-09-04T22:43:50.451Z · LW(p) · GW(p)

Jan, I agree with your references, especially Friston et al.  I think those kinds of understanding, as you say, have not adequately made their way into utility-theoretic fields like econ and game theory, so I think the post is valid as a statement about the state of understanding in those utility-oriented fields.  (Note that the post is about "a missing concept from the axioms of game theory and bargaining theory" and "a key missing concept from utility theory", and not "concepts missing from the mind of all of humanity".)

comment by Andrew_Critch · 2022-08-27T21:08:15.913Z · LW(p) · GW(p)

In Part 3 of this series, I plan to write a shallow survey of 8 problems relating to AI alignment, and the relationship of the «boundary» concept to formalizing them.  To save time, I'd like to do a deep dive into just one of the eight problems, based on what commenters here would find most interesting.  If you have a moment, please use the "agree" button (and where desired, "disagree") to vote for which of the eight topics I should go into depth about.  Each topic is given as a subcomment below (not looking for karma, just agree/disagree votes).  Thanks!

Replies from: Andrew_Critch, Andrew_Critch, Andrew_Critch, Andrew_Critch, Andrew_Critch, Andrew_Critch, Andrew_Critch, Andrew_Critch
comment by Andrew_Critch · 2022-08-27T21:12:25.794Z · LW(p) · GW(p)

7. Preference plasticity — the possibility of changes to human preferences over time, and the challenge of defining alignment in light of time-varying preferences (Russell, 2019, p.263).

comment by Andrew_Critch · 2022-08-27T21:11:31.127Z · LW(p) · GW(p)

5. Counterfactuals in decision theory — the problem of defining what would have happened if an AI system had made a different choice, such as in the Twin Prisoner's Dilemma (Yudkowsky & Soares, 2017).

comment by Andrew_Critch · 2022-08-27T21:10:40.299Z · LW(p) · GW(p)

3. Mild optimization — the problem of designing AI systems and objective functions that, in an intuitive sense, don’t optimize more than they have to (Taylor et al, 2016).

comment by Andrew_Critch · 2022-08-27T21:12:04.664Z · LW(p) · GW(p)

6. Mesa-optimizers — instances of learned models that are themselves optimizers, which give rise to the so-called inner alignment problem (Hubinger et al, 2019).

comment by Andrew_Critch · 2022-08-27T21:11:05.172Z · LW(p) · GW(p)

4. Impact regularization — the problem of formalizing "change to the environment" in a way that can be effectively used as a regularizer penalizing negative side effects from AI systems (Amodei et al, 2016).

comment by Andrew_Critch · 2022-08-27T21:09:45.140Z · LW(p) · GW(p)

2. Corrigibility — the problem of constructing a mind that will cooperate with what its creators regard as a corrective intervention (Soares et al, 2015).

comment by Andrew_Critch · 2022-08-27T21:13:01.557Z · LW(p) · GW(p)

8. (Unscoped) Consequentialism — the problem that an AI system engaging in consequentialist reasoning, for many objectives, is at odds with corrigibility and containment (Yudkowsky, 2022 [LW · GW], no. 23).

comment by Andrew_Critch · 2022-08-27T21:08:57.386Z · LW(p) · GW(p)

1. AI boxing / containment — the method and challenge of confining an AI system to a "box", i.e., preventing the system from interacting with the external world except through specific restricted output channels (Bostrom, 2014, p.129).

comment by Ruby · 2022-08-04T05:12:43.733Z · LW(p) · GW(p)

Curated. It's not everyday that someone attempts to add concepts to the axioms of game theory/bargaining theory/utility theory and I'm pretty excited for where this is headed, especially if the implications are real for EA and x-risk.

comment by romeostevensit · 2022-07-28T02:27:59.087Z · LW(p) · GW(p)

Rambling/riffing: Boundaries typically need holes in order to be useful. Depending on the level of abstraction, different things can be thought of as holes. One way to think of a boundary is a place where a rule is enforced consistently, and this probably involves pushing what would be a continuous condition into a condition with a few semi discrete modes (in the simplest case enforcing a bimodal distribution of outcomes). In practice, living systems seem to have settled on stacking a bunch of one dimensional gate keepers together as presumably the modularity of such a thing was easier to discover in the search space than things with higher path dependencies due to entangled condition measurement. This highlights the similarity between boolean circuit analysis and a biological boundary. In a boolean circuit, the configurations of 'cheap' energy flows/gradients can be optimized for benefit, while the walls to the vast alternative space of other configurations can be artificially steepened/shored up (see: mitigation efforts to prevent electron tunneling in semiconductors).

comment by johnswentworth · 2022-07-27T21:37:41.572Z · LW(p) · GW(p)

Great post! One relatively-minor nitpick:

The coco solution doesn't assume a constant disagreement point, but it does assume transferrable utility, which has its own problems, due to difficulties with defining interpersonal comparisons of utility [source: lots].

Interpersonal comparisons of utility in general make no sense at all, because each agent's utility can be scaled/shifted independently. But I don't think that's a problem for transferrable utility, which is what we need for coco. Transferrable utility just requires money (or some analogous resource), and it requires that the amounts of money-equivalent involved in the game are small enough that utility is roughly linear in money. We don't need interpersonal comparability of utility for that.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2022-07-28T15:50:39.335Z · LW(p) · GW(p)

For the games that matter most, the amounts of money-equivalent involved are large enough that utility is not roughly linear in it. (Example: Superintelligences deciding what to do with the cosmic endowment.) Or so it seems to me, I'd love to be wrong about this.

Replies from: johnswentworth
comment by johnswentworth · 2022-07-28T17:38:01.727Z · LW(p) · GW(p)

Seems true, though I would guess that the coco idea could probably be extended to weaker conditions, e.g. expected utility a smooth function of money. I haven't looked into this, but my guess would be that it only needs linearity on the margin, based on how things-like-this typically work in economics.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2022-07-28T20:04:49.470Z · LW(p) · GW(p)

Interesting. I hope you are wrong.

Replies from: johnswentworth
comment by johnswentworth · 2022-07-28T20:25:38.052Z · LW(p) · GW(p)

Heh. Beware lest you wish yourself from the devil you know to the devil you don't.

comment by Vladimir_Nesov · 2022-08-13T12:56:59.710Z · LW(p) · GW(p)

In other words, you have the ability to control their payoff outside the negotiation, based on what you observe during the negotiation.

This suggests some sort of (possibly acausal) bargaining within the BATNAs, so points to a hierarchy of bargains. Each bargain must occur without violating boundaries of agents, but if it would, then the encounter undergoes escalation, away from trade and towards conflict. After a step of escalation, another bargain may be considered, that runs off tighter less comfortable boundaries. If it also falls through, there is a next level of escalation, and so on.

Possibly the sequence of escalation goes on until the goodhart boundary [LW(p) · GW(p)] where agents lose ability to assess value of outcomes. It's unclear what happens when that breaks down as well and one of the agents moves the environment into the other's crash space [LW(p) · GW(p)].

Note that this is not destruction of the other agent, which is unexpected for the last stage of escalation of conflict. Destruction of the other agent is merely how the game aborts before reaching its conclusion, while breaking into the crash space of the other agent is the least acceptable outcome in terms of agent boundaries (though it's not the worst outcome, it could even have high utility; these directions of badness are orthogonal, goodharting vs. low utility). This is a likely outcome of failed AI alignment (all boundaries of humanity are ignored, leading to something normatively worthless), as well as of some theoretical successes of AI alignment that are almost certainly impossible in practice (all boundaries of humanity are ignored, the world is optimized towards what is the normatively best outcome for humanity).

comment by mako yass (MakoYass) · 2023-03-05T00:38:11.617Z · LW(p) · GW(p)

You speak a lot about respecting boundaries; what I need to see before I'll be convinced that you're onto something here is that you also know when to disregard a spurious boundary. A lot of boundaries that exist in the world have been drawn arbitrarily and incorrectly and need to be violated. Examples include: almost all software patents; national borders that don't correspond to the demographic preference clusters over spatialized law; or, some optional ones: racial definitions of cultural groups, or situations where an immense transition in power has occurred such that there was never a reason for the new powers to ask consent from the old powers. For instance, if RadicalXChange or Network States gave rise to a new political system that was obviously both hundreds of times more wealthy and democratically legitimate than the old system, would you expect it to recognize the US's state borders?

How would your paradigm approach that sort of thing?

Replies from: Chipmonk
comment by Chipmonk · 2023-07-18T19:51:47.854Z · LW(p) · GW(p)

Underrated comment! I completely agree [LW · GW].

For example, I think that many people say "I'm setting a boundary here […]" as an attempt to manipulate others into respecting a spurious boundary.

Replies from: MakoYass
comment by mako yass (MakoYass) · 2023-07-18T20:20:31.952Z · LW(p) · GW(p)

I'm setting a boundary here *claims the entire territory of Israel*

Replies from: Chipmonk
comment by Chipmonk · 2023-07-18T20:26:33.261Z · LW(p) · GW(p)

Yeah, I think this actually happens

comment by CarlJ · 2022-08-07T21:57:23.814Z · LW(p) · GW(p)

Some parts of this sounds similar to Friedman's "A Positive Account of Property Rights":

»The laws and customs of civil society are an elaborate network of Schelling points. If my neighbor annoys me by growing ugly flowers, I do nothing. If he dumps his garbage on my lawn, I retaliate—possibly in kind. If he threatens to dump garbage on my lawn, or play a trumpet fanfare at 3 A.M. every morning, unless I pay him a modest tribute I refuse—even if I am convinced that the available legal defenses cost more than the tribute he is demanding. 


If my analysis is correct, civil order is an elaborate Schelling point, maintained by the same forces that maintain simpler Schelling points in a state of nature. Property ownership is alterable by contract because Schelling points are altered by the making of contracts. Legal rules are in large part a superstructure erected upon an underlying structure of self-enforcing rights.«


comment by Lionel Levine · 2022-08-04T22:14:10.049Z · LW(p) · GW(p)

So, boundaries enable cooperation, by protecting BATNA.

Would you say there is a boundary between cell and mitochondria?

In the limit of perfect cooperation, the BATNA becomes minus infinity and the boundary dissolves.

comment by MSRayne · 2022-07-27T00:30:02.979Z · LW(p) · GW(p)

Wonderful! As usual, people smarter than me manage to actually put into words what has been floating around my mind as vague ideas for years: I've long suspected that boundaries are the fundamental thing in defining human values, but I have never been able to figure out how exactly to explicate that intuition, other than figuring out how a wide range of emotions relate to intuitive interpretations of movement of boundaries. (Which I'm thinking of trying to write a post about.) I'm looking forward to seeing what you have to say on this subject.

comment by Cullen (Cullen_OKeefe) · 2022-08-12T21:13:46.792Z · LW(p) · GW(p)

ELI5-level question: Is this conceptually related to one of the key insights/corollaries of the Coase theory, which is that efficient allocations of property requires clearly defined property rights? And, the behavioral econ observation that irrational attachment to the status quo (e.g., endowment effect) can prevent efficient transactions?

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2022-08-12T21:32:25.663Z · LW(p) · GW(p)

It's Coase's Theorem, and yeah now I do see a relationship, at least distantly. It's an interesting connection, in some sense.

comment by Максим Богомолов (maksim-bogomolov) · 2022-08-09T10:40:59.401Z · LW(p) · GW(p)

Hi, recently (just three months ago) I suggested a similar and slightly broader conception of boundaries. Taking anything as a concept in the mind of rational agents (since rational agents need to form concepts or models of objects in order to interact with them), anything has boundaries: abstract ideas as well as living beings. Such boundaries include the time period of a being's existence, domains of functions, terms of existence, or any other conditions under which something sustains its existence and sense. For example, the idea of an agent's freedom becomes nonsense if we try to implement that freedom outside the terms of existence of the agent. Now that I have come across your article, I think such a broad conception of boundaries could also be formalized in a similar way. But maybe a taxonomy of boundaries is needed first.

Glad to see that people from other parts of the world think in similar ways and are going even further.

Here's my article (in Russian): https://vk.com/@317703-ogranichennaya-racionalnost

comment by Flaglandbase · 2022-07-28T08:33:42.651Z · LW(p) · GW(p)

For a human, the most important boundary is whatever contains the information in their brain. This is not just the brain itself, but the way the brain is divided by internal boundaries. This information could only be satisfactorily copied to an external device if these boundaries could be fully measured. 

comment by Tobias Brown · 2023-07-18T13:19:29.694Z · LW(p) · GW(p)

"Boundaries," as discussed in "Boundaries", Part 1, is a crucial but often overlooked concept in utility theory. It refers to the limitations individuals set to protect their well-being, happiness, and resources. Understanding boundaries helps in making rational decisions, maintaining balance, and avoiding undue stress or exploitation in various situations.