Reaction to "Empowerment is (almost) All We Need": an open-ended alternative
post by Ryo (Flewrint Ophiuni) · 2023-11-25T15:35:41.950Z · LW · GW · 3 comments
The framing below is analogous to a procedure asking:
"Is [this option] really giving [more options] to the highest number of beings?"
Takeaway: In the worst case, which many argue is the default, humans will create a maximizer [? · GW]. I argue that the least harmful thing for a maximizer (AI or not) to optimize is the [diversity of states in the universe], in a non-myopic procedure with ecosystemic synergy.
(Long-term co-affordance of options)
If the following analysis is right, it inherently turns maximization into valuable outputs.
The reasoning behind this process lays out the basis of a tractable universal ethics.
This discussion is a reaction to jacob_cannell [LW · GW]'s post [LW · GW],
which framed AI alignment through (altruistic) "empowerment of others".
TLDR: From what I read in the comments, the issue condenses to the difficulty of defining "power" (in this context "other-empowerment", as opposed to self-empowerment): it may lead an AI to "paralyze" humans/agents in order to "expand" their action potential (which is very myopic), or conversely to forbid humans/agents from being sub-optimal (i.e. you can't go exile yourself in the mountains).
To me the crucial node/difficulty is the threat (mentioned here [LW(p) · GW(p)]) that AI will "erase all practical differences between you and the maximally longtermist version of you", which is why I propose an alternative based on optionality and diversity.
────────────────────────
Universal optionality:
There is a unique opportunity in our space-time "zone" to bloom unique properties that do not exist in other places of the global space-time (that is, compared to other past, present and future regions of space-time).
+ The end-state ought to contain the most diversity possible.
Open-ended evolution : “refers to the unbounded increase in complexity that seems to characterize evolution on multiple scales” (Corominas-Murtra, Seoane & Solé, 2018).
Irreducibility : "While many computations admit shortcuts that allow them to be performed more rapidly, others cannot be sped up. Computations that cannot be sped up by means of any shortcut are called computationally irreducible." Computational Irreducibility – Wolfram MathWorld
Open-ended ecosystems have irreducible states/features/options that can only appear at the right, unique ("endemic") rate.
This also creates an inherent uncertainty that gives humans/agents more degrees of freedom (you can be somewhat sub-optimal and not strictly empower yourself).
────────────────────────
In the worst case, which many argue is the default, humans will create a maximizer [? · GW].
I argue that the least harmful thing for a maximizer to optimize is the [diversity of states in the universe], in a non-myopic procedure with ecosystemic synergy.
By "diversity of states in the universe" I mean :
To have the least myopic form of diversity possible, which takes into account futur and past states of the universe. This leads to the necessity of ecosystems, and caring about it; because you can't open the universe towards more states without ecosystems. Every action has to be carefully weighted, because it may destroy more states than it creates.
The challenge of alignement is to find a problem which resolution leads AI to not simply do what we, humans, think we want (with our delusions); but what we actually need [? · GW].
More than the "less harmful" outcome, I think that even in the best case what we actually need is a high number of qualitatively different states;
The more different and synergistic (co-increasing options) they are, the better; it's a procedure to increase the number of 'diversity dimensions'.
For example: 1010101010 has fewer diversity dimensions than 0123456789.
Because true diversity is not simply about alternating two states linearly, we want a procedure that cares about creating more options and "complexity"/"sophistication".
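As a toy illustration (my own sketch, not a definitive metric): simply counting distinct symbols, or taking the Shannon entropy of the symbol distribution, already separates the two strings above. But it only measures how *many* states there are, not how *different* they are, which is what points 1) and 2) below try to capture.

```python
from collections import Counter
from math import log2

def distinct_states(sequence: str) -> int:
    """Number of distinct symbols appearing in the sequence."""
    return len(set(sequence))

def shannon_entropy(sequence: str) -> float:
    """Entropy (in bits) of the symbol distribution, a crude 'diversity' proxy."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum((c / total) * log2(c / total) for c in counts.values())

for s in ("1010101010", "0123456789"):
    print(s, distinct_states(s), round(shannon_entropy(s), 2))
# "1010101010" -> 2 distinct states, ~1.0 bit
# "0123456789" -> 10 distinct states, ~3.32 bits
```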
But people often argue against using such vague words.
Let's clarify these words, then:
1) Qualitatively diverse structures/patterns/stimuli computed/circulating in a system
-> In the continuous space of all the possible different states (a sort of semantic space),
the more distance there is between two states, the more qualitative the difference is;
It's like colors in a chromatic circle. It's a latent space. "Black" is more dissimilar to "white" than it is to "grey". A "cat" is closer to a "dog" than to a "chair".
(Individual objects are a category, i.e. [my blue dog], but very similar to other objects of the same meta-category, i.e. [blue dogs], with which they share a low diversity "index".)
So what I mean by complexity/sophistication is closely related to this qualitative diversity. You simply combine 1) with:
2) The causal power of a system
-> Number of qualitatively different external changes caused by the system/sub-system
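A minimal sketch of how 1) and 2) could be combined, assuming (my assumption, not something specified above) that some embedding function already maps states and external changes into the 'semantic space' discussed earlier:

```python
import numpy as np

def qualitative_diversity(embeddings: np.ndarray) -> float:
    """Mean pairwise distance between state embeddings:
    the farther apart the states sit in the latent space,
    the more 'qualitatively different' they are."""
    n = len(embeddings)
    if n < 2:
        return 0.0
    dists = [np.linalg.norm(embeddings[i] - embeddings[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

def causal_power(change_embeddings: np.ndarray, threshold: float = 1.0) -> int:
    """Count external changes that are qualitatively distinct from each other,
    i.e. farther apart than `threshold` in the latent space (greedy cover)."""
    kept: list[np.ndarray] = []
    for e in change_embeddings:
        if all(np.linalg.norm(e - k) > threshold for k in kept):
            kept.append(e)
    return len(kept)

def diversity_score(state_embs: np.ndarray, change_embs: np.ndarray) -> float:
    """Toy combination of 1) qualitative diversity and 2) causal power."""
    return qualitative_diversity(state_embs) * causal_power(change_embs)

# Usage with toy 2-D "embeddings":
states = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
changes = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
print(diversity_score(states, changes))  # mean pairwise distance * 2 distinct changes
```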
Such a [diversity of states in the universe] procedure leads to the maximization of open-endedness, which demands caring about ecosystems; without them, things collapse and qualitative diversity disappears.
-> This is the first layer, before anything else, allowing a general, primal alignment.
The goal is maximal synergy,
In maximal diversity,
With maximal optionality.
The inverse of alterity (otherness) is ipseity (selfness).
I am talking about the in-between: "synergity".
You can add more and more explicit layers to calibrate the triangulation of our preferences through things such as the layers listed below.
But the more fundamental a layer is, the more it supersedes others, taking priority in decision-making (it is also more procedural and implicit, avoiding reward tunnels/squiggles).
For example it could be (increasingly explicit, from more to less fundamental):
Open-ended diversity
Universal optionality
Ecosystemic synergy
Other-empowerment
There is a high and continuous overlap between these layers.
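As a rough sketch (my own illustration, with made-up layer scores, not a specification), "more fundamental layers supersede others" can be read as a lexicographic preference: a win on a more fundamental layer cannot be outweighed by any advantage on less fundamental ones, and ties are broken further down the list.

```python
# Layers ordered from most to least fundamental, as in the list above.
LAYERS = ["open_ended_diversity", "universal_optionality",
          "ecosystemic_synergy", "other_empowerment"]

def layer_scores(action) -> tuple[float, ...]:
    """Hypothetical evaluators; each entry is the action's score on one layer."""
    return tuple(action[layer] for layer in LAYERS)

def prefer(a, b):
    """Lexicographic preference: tuples compare on the first differing layer,
    so a more fundamental layer always takes priority over less fundamental ones."""
    return a if layer_scores(a) >= layer_scores(b) else b

# Usage: an action better on open-ended diversity wins even if it is worse
# on every less fundamental layer.
a1 = {"open_ended_diversity": 0.9, "universal_optionality": 0.5,
      "ecosystemic_synergy": 0.6, "other_empowerment": 0.2}
a2 = {"open_ended_diversity": 0.7, "universal_optionality": 0.9,
      "ecosystemic_synergy": 0.9, "other_empowerment": 0.9}
assert prefer(a1, a2) is a1
```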
Upon such a basis, upper layers/tools can bring more precise/local well-being:
.What The Longest-Running Study on Happiness Reveals
.Moral foundations theory
.Seven moral rules found all around the world
.Modular Morals: The Genetic Architecture of Morality as Cooperation
.Morality and Evolutionary Biology
+https://plato.stanford.edu/entries/morality-biology/
.Profiles of an Ideal Society: The Utopian Visions of Ordinary People
────────────────────────
The aim is to provide affordance, access to adjacent possibles, expanding the action space and autonomy of a maximum of phenomena. This optionality approach could potentially solve lots of long-standing thorny problems in consequentialism [? · GW], like wireheading [? · GW] or the fiendish difficulty of defining happiness/utility, and the way even the tiniest mistake in that definition can be precisely catastrophic.
For instance: local/myopic/short-term happiness is regularly detrimental (egocentric interests, addictions, etc.), while local suffering can be beneficial (training, a family dinner).
Open-endedness and optionality at the scale of [the whole universe's past and future timeline] imply that keeping the process at the perfect (critical) sweet-spot allows, at the end (widest time scope), the most diversity of states/emergence/open-endedness.
It gives ecosystems time to grow and interact, so that they can bloom/mutate the unique/irreducible causal chains that cannot happen at a higher (or slower) rate.
And it can be non-linear, i.e. you sometimes need to restrict the number of options so that one is not frozen by possibilities. Increasing options implies increasing the capacity to choose. The limit is that those choices (ideally) must not destroy the meaningful choices of other agents/humans/wildlife... past which point choosers can opt for themselves however they see fit, always increasing this optionality and preempting lock-ins.
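One crude way to make this optionality concrete (a sketch under my own assumptions, not a formal proposal): count how many distinct states an agent can still reach within some horizon, which is the discrete intuition behind empowerment-style measures; a "lock-in" is then anything that permanently shrinks that reachable set.

```python
def reachable_states(start, step_fn, horizon: int) -> set:
    """States reachable from `start` in at most `horizon` steps.
    `step_fn(state)` returns the set of states reachable in one step."""
    frontier, seen = {start}, {start}
    for _ in range(horizon):
        frontier = {nxt for s in frontier for nxt in step_fn(s)} - seen
        seen |= frontier
    return seen

def optionality(start, step_fn, horizon: int) -> int:
    """Crude optionality proxy: number of distinct reachable states."""
    return len(reachable_states(start, step_fn, horizon))

# Toy 1-D world: from position x you can stay, move left, or move right,
# unless a "lock-in" wall at x = 3 removes the right-move option.
def step(x):
    return {x, x - 1} | ({x + 1} if x < 3 else set())

print(optionality(0, step, horizon=5))  # larger horizon -> more options, until the wall bites
```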
While the more an AI knows you, the more it can assist you, as we've seen, the primal layer would not exactly be *empowerment of others* but a more universal care.
This might lead to a certain increase in uncertainty, but that is more a feature than a bug;
-> The AI's respect for your [(irreducible) preferences] is proportional to its concern for a wider ecosystem full of things.
AIs are here to arbitrate conflicting actions that go against universal open-endedness; local sub-optimality is so complex to define as such that it's hard to be sure that when you (a human) decide to exile yourself in the mountains it will not in fact be good both for you and for everybody in the long run.
-> This uncertainty leads to an implicit degree of freedom which we can also explicitly increase further, to relax utilitarian [? · GW] mistakes through more tolerance for sub-optimality.
────────────────────────
Some of my important references:
.Paretotopia [? · GW]
.Learning Altruistic Behaviours in Reinforcement Learning without External Rewards
.The Hippocratic principle - Vanessa Kosoy [LW(p) · GW(p)]
.OMNI: Open-endedness via Models of human Notions of Interestingness
.4. SKIM THE MANUAL | Intelligent Voluntary Cooperation & Paretotropism
.Report on modeling evidential cooperation in large worlds [LW · GW]
.The Capability Approach to Human Welfare [EA · GW]
.Resurrecting Gaia: harnessing the Free Energy Principle to preserve life as we know it
.From AI for people to AI for the world and the universe
.Autonomy in Moral and Political Philosophy
3 comments
comment by jacob_cannell · 2023-11-25T21:48:37.508Z · LW(p) · GW(p)
For a single human/agent, assume they have some utility function u over future world trajectories: u(τ) - which really just says they have a preference ranking over futures. A reasonable finite utility function will decompose into a sum of discounted utility over time: u(τ) = Σ_t γ^t u_t(s_t), and then there are some nice theorems indicating any such utility function converges to - and thus is well approximated by - empowerment (future optionality - a formal measure of power over future world states). However the approximation accuracy converges only with increasing time, so it only becomes a perfect approximation in the limit of the discount factor γ approaching 1.
Another way of saying that is: all agents with a discount factor of 1 are in some sense indistinguishable, because their optimal instrumental plans are all the same: take control of the universe.
So there are three objections/issues:
- Humans are at least partially altruistic - so even when focusing on a single human it would not be correct to optimize for something like selfish empowerment of their brain's action channel
- Humans do not have a discount factor of 1 and so the approximation error for the short term component of our utility could cause issues
- Even if we assume good solutions to 1 and 2 (which I'm optimistic about), it's not immediately very clear how to correctly use this for more realistic alignment to many external agents (ie humanity, sapients in general, etc) - ie there is still perhaps a utility combination issue
Of these issues #2 seems like the least concern, as I fully expect that the short term component of utility is the easiest to learn via obvious methods. So the fact that empowerment is a useful approximation only for the very hard long term component of utility is a strength not a weakness - as it directly addresses the hard challenge of long term value alignment.
The solutions to 1 and 3 are intertwined. You could model the utility function of a fully altruistic agent as a weighted combination of other agents' utility functions. Applying that to partially altruistic agents you get something like a pagerank graph recurrence which could be modeled more directly, but it also may just naturally fall out of broad multi-agent alignment (the solution to 3).
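A minimal sketch of this pagerank-like recurrence, with made-up agents and altruism weights, just to make the fixed-point structure explicit: each agent's effective utility mixes its own "selfish" utility with the effective utilities of the agents it cares about, iterated to convergence.

```python
import numpy as np

def effective_utilities(selfish_u: np.ndarray, care: np.ndarray,
                        altruism: np.ndarray, iters: int = 100) -> np.ndarray:
    """selfish_u[i]: agent i's own utility for some outcome.
    care[i, j]:   how much agent i weights agent j (rows sum to 1).
    altruism[i]:  fraction of i's utility that comes from others (0..1)."""
    u = selfish_u.copy()
    for _ in range(iters):
        u = (1 - altruism) * selfish_u + altruism * (care @ u)
    return u

# Toy example: agent 0 is half-altruistic toward agent 1, agent 1 is selfish.
selfish_u = np.array([1.0, 0.0])
care = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
altruism = np.array([0.5, 0.0])
print(effective_utilities(selfish_u, care, altruism))  # agent 0's utility settles at 0.5
```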
One approach which seems interesting/promising is to just broadly seek to empower any/all external agency in the world, weighted roughly by observational evidence for that agency. I believe that human altruism amounts to something like that - so children sometimes feel genuine empathy even for inanimate objects, but only because they anthropomorphize them - that is they model them as agents.
↑ comment by Ryo (Flewrint Ophiuni) · 2023-11-26T01:18:33.119Z · LW(p) · GW(p)
All right! Thank you for the clarification.
Indeed the altruistic part seems to be interestingly close to a broad 'world empowerment', but I have some doubts about a few elements surrounding this: "the short term component of utility is the easiest to learn via obvious methods".
It could be true, but there are worries that it might be hard, so I'm trying to find a way to resolve this.
If the rule/policy for choosing the utility function is a preference based on a model of humans/agents, then there might be ways to circumvent/miss what we would truly prefer (the traction of maximization would exceed the limited sharpness/completeness of the models), because the model underfits reality (which would drift into more and more divergence as the model updates along the transformations performed by the AI).
In practice this would allow a sort of intrusion of AI into agents to force mutations.
So,
Intrusion could be instrumental
Which is why I want to escape the 'trap of modelling' even further by indirectly targeting our preferences through a primal goal of non-myopic optionality (even more externally focused) before guessing utility.
If your #2 is the least concern then indeed those worries aren't as meaningful.
↑ comment by Ryo (Flewrint Ophiuni) · 2023-11-26T10:47:44.022Z · LW(p) · GW(p)
I'm also trying to avoid us becoming grabby aliens, but if
-> Altruism is naturally derived from a broad world empowerment
Then it could be functional, because the features of the combination of worldwide utilities (empower all agencies) *are* altruism, sufficiently so to generalize in the 'latent space of altruism', which implies being careful about what you do to other planets.
The maximizer worry would also be tamed by design
And in fact my focus on optionality would essentially be the same as a worldwide agency concern (but I'm thinking of a universal agency to completely erase the maximizer issue).