Why modelling multi-objective homeostasis is essential for AI alignment (and how it helps with AI safety as well)

post by Roland Pihlakas · 2025-01-12

Contents

  Introduction
  Why Utility Maximisation Is Insufficient
  Homeostasis as a More Correct and Safer Goal Architecture
    1. Multiple Conjunctive Objectives
    2. Task-Based Agents or Taskishness — “Do the Deed and Cool Down”
    3. Bounded Stakes: Reduced Incentive for Extremes
    4. Natural Corrigibility and Interruptibility
  Diminishing Returns and the “Golden Middle Way”
  Formalising Homeostatic Goals
  Parallels with Other Ideas in Computer Science
  Open Challenges and Future Directions
  Addendum about Unbounded Objectives
  Conclusion

I have noticed that there has been very little, if any, discussion of why and how homeostasis is significant, even essential, for AI alignment and safety. This post aims to begin amending that situation. Throughout, I will treat alignment and safety as explicitly separate subjects, both of which benefit from homeostatic approaches.

This text is a distillation and reorganisation of three of my older blog posts on Medium: 

I will probably share more such distillations or weaves of my old writings in the future.


Introduction

Much of AI safety discussion revolves around the potential dangers posed by goal-driven artificial agents. In many of these discussions, the agent is assumed to maximise some utility function over an unbounded timeframe. This simplification, while mathematically convenient, can yield pathological outcomes. A classic example is the so-called “paperclip maximiser”, a “utility monster” which steamrolls over other objectives to pursue a single goal (e.g. creating as many paperclips as possible) indefinitely. “Specification gaming”, Goodhart’s law, and even “instrumental convergence” are also closely related phenomena.

However, in nature, organisms do not typically behave like pure maximisers. Instead, they operate under homeostasis: a principle of maintaining various internal and external variables (e.g. temperature, hunger, social interactions) within certain “good enough” ranges. Going far beyond those ranges — too hot, too hungry, too socially isolated — leads to dire consequences, so an organism continually balances multiple needs. Crucially, “too much of a good thing” is just as dangerous as too little.

In this post, I argue that an explicitly homeostatic, multi-objective model is a more suitable paradigm for AI alignment. Moreover, correctly modelling homeostasis increases AI safety, because homeostatic goals are bounded — there is an optimal zone rather than an unbounded improvement path. This bounding lowers the stakes of each objective and reduces the incentive for extreme (and potentially destructive) behaviours.


Why Utility Maximisation Is Insufficient

In the standard utility maximisation framework, the agent’s central goal is to increase some singular measure of “value” or “reward” over time. Because such objectives are typically represented as unbounded (economic growth, for instance, can always be increased by some amount), maximisers tend to push them beyond any reasonable point. This dynamic is what yields the risk of “berserk” behaviour.

By contrast, homeostasis signals that there is an optimal zone, an “enough” amount, for each objective. Pursuing more than enough is wasteful and even harmful, so representing such objectives as unbounded maximisation targets would be grossly inaccurate. Take eating and drinking as an example: both activities need to be kept within their ranges and balanced against each other at some sensible time granularity.
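
To make the contrast concrete, here is a minimal toy sketch. It is illustrative only: the function names and the setpoint of 2 litres are arbitrary placeholders, not anything prescribed elsewhere in this post.

```python
# Contrast between an unbounded "more is always better" reward and a bounded,
# homeostatic one where both "too little" and "too much" are penalised.

def unbounded_reward(amount: float) -> float:
    """Monotonic reward: every extra unit counts as an improvement."""
    return amount

def homeostatic_reward(amount: float, setpoint: float = 2.0) -> float:
    """Bounded reward: peaks at the setpoint and falls off on both sides,
    e.g. daily water intake in litres."""
    return -(amount - setpoint) ** 2

if __name__ == "__main__":
    for litres in [0.0, 1.0, 2.0, 5.0, 20.0]:
        print(litres, unbounded_reward(litres), homeostatic_reward(litres))
```

Under the unbounded representation, 20 litres looks ten times better than 2 litres; under the homeostatic one it is clearly worse.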


Homeostasis as a More Correct and Safer Goal Architecture

1. Multiple Conjunctive Objectives

Real organisms typically have several needs (food, water, social connection, rest, etc.) that must all be satisfied to at least a “sufficient” level. A homeostatic agent works analogously: each objective has its own target range, and all of them need to be kept reasonably satisfied at the same time.

When an AI has multiple conjunctive goals, it cannot ignore most while excessively optimising just one. Instead, the synergy among objectives drives it toward a safer, middle-ground strategy. In effect, massive efforts in any single dimension become exponentially harder to justify because you are “pulling away” from the target ranges in others.

2. Task-Based Agents or Taskishness — “Do the Deed and Cool Down”

Homeostatic goals naturally lead to bounded or “task-based” behaviour rather than indefinite optimisation. If all current objectives are satisfied, a homeostatic agent is content to remain idle. It does not keep scanning the universe for hypothetical improvements once the setpoints are reached. This “settle to rest” feature significantly limits the potential damage the agent might cause. Even if such an agent accidentally does something very harmful on a personal level, it is less likely to affect entire nations.

For example, an AI tasked with making 100 paperclips in a homeostatic framework will strive to produce enough paperclips to satisfy the “paperclip count” objective. Once the target is met (often even just approximately) — and so long as other objectives are also in a comfortable range — it does not chase infinitely more micro-optimisations.
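
As a rough sketch of this “do the deed and cool down” behaviour (the tolerance value and function names below are my own illustrative choices), the toy loop keeps working only while the paperclip count is meaningfully below target and then settles to rest:

```python
# Toy illustration of task-based ("taskish") behaviour: the agent works until the
# setpoint is approximately reached, then idles instead of optimising forever.

TARGET_PAPERCLIPS = 100
TOLERANCE = 2          # reaching the target "often even just approximately" is fine

def make_paperclip(state: dict) -> None:
    state["paperclips"] += 1

def step(state: dict) -> bool:
    """Return True if the agent chose to act, False if it is content to idle."""
    deficit = TARGET_PAPERCLIPS - state["paperclips"]
    if deficit > TOLERANCE:
        make_paperclip(state)
        return True
    return False          # setpoint reached: no incentive to keep pushing

if __name__ == "__main__":
    state = {"paperclips": 0}
    while step(state):
        pass
    print("Settled with", state["paperclips"], "paperclips")
```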

3. Bounded Stakes: Reduced Incentive for Extremes

Because each objective is bounded, the stakes of a discrepancy in any one dimension remain limited by the need to stay balanced in the others. The system cannot keep ramping up its efforts in a single dimension without suffering detrimental side effects (such as drifting off course in every other dimension). This effect reduces the likelihood of extremist outcomes.

4. Natural Corrigibility and Interruptibility

A hallmark of AI alignment is ensuring that we can correct or shut down an agent if it goes off track. Simple maximisers often resist shutdown because they interpret it as a threat to their singular goal. By contrast, a homeostatic system has far less to lose from being paused or corrected: no single objective dominates its behaviour, and once its setpoints are satisfied it has no strong drive to keep running at all.


Diminishing Returns and the “Golden Middle Way”

An important corollary of multi-objective homeostasis is diminishing returns. Once a target is nearly satisfied in one dimension, the agent’s “bang for the buck” in that dimension is no longer large; it becomes more “cost-effective” to address other objectives that are further from their optimum. Thus the agent naturally rotates its attention toward whichever objectives are currently furthest from their setpoints, rather than over-investing in any single one.

This “golden middle way” extends even to safety measures themselves. An AI with an excessively strong safety constraint might be tempted to devote the entire universe’s resources to verifying it is 100% safe — unless that safety measure also has a built-in diminishing returns principle. By bounding the safety checks themselves, we avoid another pathological scenario: the AI going to extremes to confirm it has never ever caused any harm, while ignoring the rest of its objectives. I hope the trade-off here is clear: nobody would build, or give resources to, an agent that is “safe” to such an extreme, so that would be a losing strategy anyway.


Formalising Homeostatic Goals

In reinforcement learning or other goal frameworks, one can use a loss or negative-utility term for each objective. Each objective is measured against its target (also known as its “setpoint”), and both “too little” and “too much” push the system away from equilibrium. Summing or combining these losses in a way that penalises big deviations in any single dimension more strongly than small deviations in many fosters a “balancing” behaviour. One simple formula might be:

$$L = \sum_i w_i \,(x_i - x_i^{*})^2$$

Here:

  • $x_i$ is the current value of objective $i$;
  • $x_i^{*}$ is the setpoint (target value) of objective $i$;
  • $w_i$ is a weight expressing the relative importance of objective $i$.

Some objectives can also be framed as “safety constraints” about not disturbing the environment too much, and these can be folded into the same overall homeostatic reward system. These would be the “low impact” and “minimal side effects” objectives, and sometimes also the “keep future options open” and “maximise human autonomy” objectives. They are “negative goals” by nature: they are about not changing the existing state, as opposed to the “positive goals” (usually the “performance” objectives), which are about achieving some new state.
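
Below is a minimal sketch of how such a combined homeostatic loss could be computed. The objective names, setpoints, and weights are illustrative placeholders only; note that the “negative” low-impact objective simply gets a setpoint of zero.

```python
# Minimal sketch of a combined homeostatic loss: each objective has a setpoint,
# and deviations in either direction are penalised quadratically, so one large
# deviation hurts more than many small ones.

def homeostatic_loss(values: dict, setpoints: dict, weights: dict) -> float:
    return sum(
        weights[name] * (values[name] - setpoints[name]) ** 2
        for name in setpoints
    )

# Illustrative objectives: two "positive" performance goals and one "negative"
# low-impact safety goal whose setpoint is zero ("do not disturb the environment").
setpoints = {"paperclips": 100.0, "battery_level": 0.8, "environment_disturbance": 0.0}
weights   = {"paperclips": 1.0,   "battery_level": 5.0, "environment_disturbance": 20.0}

current = {"paperclips": 98.0, "battery_level": 0.5, "environment_disturbance": 0.1}
print(homeostatic_loss(current, setpoints, weights))
```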


Parallels with Other Ideas in Computer Science

As the machine-learning-minded among you may notice, the formula above is very similar to the loss of a regression algorithm, except that here the squared errors are computed over a plurality of objectives rather than a plurality of data points. In both cases the motivation is to avoid overfitting to a subset of the data or objectives.

I propose an additional perspective: The distinction between constraints versus objective functions in combinatorial optimisation is analogous to the distinction between safety objectives and performance objectives. 

It is notable that in combinatorial optimisation, constraints have always been a natural part of the problem setup, alongside objectives. Unfortunately, this has often not been the case in reinforcement learning. Yet in reality both performance and safety objectives should be present. 

One difference between “safety objectives” and “constraints in combinatorial optimisation” is that various safety objectives might be treated as “soft” constraints: they can be traded off up to a point, but not too much. When they are violated notably, they become increasingly similar to hard constraints in their effects.

In special use cases, some safety considerations can also be treated as “hard” constraints relative to the performance objectives, while still being traded off or balanced among a plurality of other safety objectives. (These are not just theoretical thoughts: I have implemented such advanced homeostatic setups, for example, in multi-objective workforce planning algorithms that have been successfully time-tested over the past 15 years.)
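
One way to express the “soft constraint that hardens” idea in code is sketched below. The threshold and growth rate are illustrative assumptions, not values from this post or from the workforce-planning system mentioned above.

```python
# Sketch of a soft safety constraint that behaves increasingly like a hard one:
# small violations are traded off gently, but beyond a threshold the penalty
# grows so steeply that it dominates any plausible performance gain.

def soft_constraint_penalty(violation: float,
                            threshold: float = 1.0,
                            steepness: float = 4.0) -> float:
    if violation <= 0.0:
        return 0.0                       # constraint satisfied, no penalty
    if violation <= threshold:
        return violation ** 2            # gentle, tradable penalty
    # Past the threshold the penalty explodes, mimicking a hard constraint.
    return threshold ** 2 + 1000.0 * (violation - threshold) ** steepness

for v in [0.0, 0.5, 1.0, 1.5, 3.0]:
    print(v, soft_constraint_penalty(v))
```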

Finally, the most obvious parallel: control systems and control theory. This framework inherently treats objectives as bounded and homeostatic. It supports changing setpoints and multiple dimensions/objectives, which can even be arranged hierarchically. Cybernetics, a broader and closely related field, is also worth mentioning; it is curious that it has evolved largely separately from AI. Is there a reason we don’t talk more about cybernetic alignment instead of AI alignment?


Open Challenges and Future Directions

While promising, a multi-objective homeostatic approach has many subtleties:

  1. Time Granularity
    • In the example of eating and drinking, both activities need to be balanced at some time granularity. It would be neither necessary nor productive to keep these objectives maximally balanced at all times. That would mean, for example, an agent running back and forth between food and water sources without stopping at either for more than the smallest timestep, which would ultimately leave it “equally” under-supplied on both objectives, since most of its time would be spent switching tasks. Deciding on an optimal time granularity, during which unequal treatment of competing objectives is allowed, requires a bit of strategic thinking on the agent’s part (see the sketch after this list).
  2. Handling Evolving Targets
    • Objectives (or their setpoints) may change over time, especially when humans are involved. Ideally, the agent should accept these new targets without undue manipulation such as resisting the changes, escaping, or reverting the objectives. Likewise, the agent should not itself induce such changes. In this post I have mentioned general principles for addressing this, but the details still need to be worked out and tested in practice.
  3. Soft vs. Hard Constraints
    • Not all objectives are created equal. Some constraints (e.g. “don’t kill humans”) might be hard constraints with extremely high weight or even lexicographic priority. But introducing lexicographic ordering may require some hard decisions. Additionally, one must be diligent and make sure that these higher-priority objectives cannot cause extreme losses in unacceptable dimensions of the lower-priority objectives. Alternatively, one should ensure that the higher-priority set includes all objectives in which extreme losses are undesirable.
  4. Goodhart’s Law and Multi-Scale Measures
    • We might measure safety and low impact at multiple “levels” (individual, community, global, etc.). Aggregating or normalising them incorrectly risks inadvertently re-enabling Goodhart-like exploits. For example, there is a possible error mode where one large aggregated discrepancy (let’s say, at the gender level) is split up into multiple small discrepancies (at the individual level) while being insufficiently represented at the higher level.
  5. Coordination of Multiple Agents
    • If multiple homeostatic agents co-exist, how do they avoid interfering with each other’s setpoints? Most likely some authorisation and power-hierarchy framework is relevant here. Such a system would then inevitably also need mechanisms for accountability and auditing of the higher-ups, who must be held liable when a subordinate system executes unsafe actions down the line.
  6. Tradeoffs Between “Interruptibility” and “Getting Work Done”
    • Ideally, we want the agent to do its job of achieving performance objectives while also remaining open to external corrections or even a shutdown. Balancing these is partly social/political and partly technical. Yet there is a well-known risk of systems becoming too important to shut down. Perhaps aligned systems could be throttled gradually, so that a “shutdown” command becomes a “chill down a bit” command.
  7. Integration of Bounded and Unbounded Objectives
    • Even after introducing the concept of homeostatic objectives, many performance objectives may still belong to the unbounded class. In contrast to the bounded objectives I focused on in this post, unbounded objectives could in principle reach arbitrarily large positive rewards. Yet these potential rewards should not dominate the safety objectives, nor crowd out the balancing of other performance objectives. This topic is explored in a bit more detail in the addendum below.
  8. Rate-Limited Objectives
    • Even though homeostatic objectives are bounded locally in time, in various cases they can be unbounded across time. For example, a human need for novelty may be satiated for a day, but it will arise again the next day. In other words, these are rate-limited objectives. Rate limiting is also important for sustainability purposes, to avoid exhausting the renewable resources in the environment, so homeostasis fulfils a sort of balancing role here as well. I imagine the expansion of humanity could often be described from such a perspective too: humanity may have an intrinsic need to one day grow beyond planet Earth, but such growth needs to be sustainable.
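
As a rough illustration of the time-granularity point (item 1 above), the sketch below lets an agent keep working on its current need until some other need is worse by a margin, instead of switching on every timestep. The margin, decay rates, and objective names are illustrative assumptions of mine, not a worked-out proposal.

```python
# Toy illustration of time granularity: instead of switching between "food" and
# "water" every timestep, the agent sticks with its current task until another
# deficit exceeds the current one by a hysteresis margin, avoiding thrashing.

import random

SETPOINT = 10.0
SWITCH_MARGIN = 3.0     # how much worse another objective must be before switching

def most_pressing(levels: dict) -> str:
    return min(levels, key=levels.get)   # the objective furthest below its setpoint

def simulate(steps: int = 50) -> None:
    levels = {"food": 10.0, "water": 10.0}
    current = "food"
    switches = 0
    for _ in range(steps):
        # Both needs decay over time; replenishing one does nothing for the other.
        for k in levels:
            levels[k] -= random.uniform(0.2, 0.6)
        candidate = most_pressing(levels)
        if candidate != current and levels[candidate] + SWITCH_MARGIN < levels[current]:
            current = candidate          # switch only when clearly worthwhile
            switches += 1
        levels[current] = min(SETPOINT, levels[current] + 1.0)   # work on current need
    print("levels:", levels, "task switches:", switches)

if __name__ == "__main__":
    simulate()
```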

These are open research challenges, but a homeostatic approach at least provides a conceptual blueprint of how to incorporate multi-dimensional, bounded goals into AI systems without inviting the pathological extremes of naive maximisation. This post is a conversation starter and there is so much to explore further.


Addendum about Unbounded Objectives

I fully acknowledge that various instrumental objectives could still have an unbounded nature: for example, accumulating money, or building guarantees against future risks. (Though in the case of risks, people familiar with the concept of antifragility might argue that reducing risk too much in the short term would paradoxically mean more fragility in the long run.)

Even so, pursuing these unbounded objectives becomes safer when the homeostatic principles described above are applied to the relevant bounded objectives alongside them.

Additionally, unbounded objectives are usually described by concave utility functions in economics. In other words, these objectives too have their own form of diminishing marginal returns. Besides biology, economics is the other fundamental and well-established field describing our needs; AI alignment should surely consult both. An aligned agent should be able to model diminishing returns, and thus balance between a plurality of unbounded objectives, just as well as it should be able to properly model homeostasis. As economists have observed, humans prefer averages in all objectives to extremes in a few.
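
A minimal sketch of this point, under my own illustrative assumptions (logarithmic utilities and a fixed effort budget), shows that a concave utility already rewards spreading effort across objectives rather than going to an extreme in one:

```python
# Sketch: with concave (diminishing-returns) utilities, splitting a fixed budget
# across several unbounded objectives yields more total utility than pouring
# everything into a single one.

import math

def total_utility(allocations: list[float]) -> float:
    # log(1 + x) is one standard concave utility with diminishing marginal returns.
    return sum(math.log1p(x) for x in allocations)

budget = 10.0
extreme  = [budget, 0.0, 0.0]                 # everything into a single objective
balanced = [budget / 3] * 3                   # the "golden middle way"

print("extreme: ", total_utility(extreme))    # ~2.4
print("balanced:", total_utility(balanced))   # ~4.4
```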

You can read more about balancing multiple unbounded objectives via diminishing returns in the following blog posts I have co-authored:


Conclusion

Homeostasis — the idea of multiple objectives each with a bounded “sweet spot” — offers a more natural and safer alternative to unbounded utility maximisation. By ensuring that an AI’s needs or goals are multi-objective and conjunctive, and that each is bounded, we significantly reduce the incentives for runaway or berserk behaviours.

Such an agent tries to stay in a “golden middle way”, switching focus among its objectives according to whichever is most pressing. It avoids extremes in any single dimension because going too far throws off the equilibrium in the others. This balancing act also makes it more corrigible, more interruptible, and ultimately safer.

In short, modelling multi-objective homeostasis is a step toward creating AI systems that exhibit the sane, moderate behaviours of living organisms — an important element in ensuring alignment with human values. While no single design framework can solve all challenges of AI safety, shifting from “maximise forever” to “maintain a healthy equilibrium” is a crucial part of the solution space.


Thanks for reading! If you have thoughts, questions, improvement suggestions, resource and collaborator references, feedback, ideas, or alternative formulations for multi-objective homeostasis, please share in the comments.
