Optimizing for Agency?

post by Michael Soareverix (michael-soareverix) · 2024-02-14T08:31:16.157Z · LW · GW · 1 comment

This is a question post.


It's commonly accepted that pretty much every optimization target, pushed hard enough, results in death. If you optimize for paperclips, humans die. If you optimize for curing cancer, humans die.

What about optimizing for agency?

The way I visualize this: after a superintelligence takeover, with the superintelligence optimizing for intelligent agency, every intelligent being (including animals) has a sort of 'domain' of agency around it. This domain extends as far as possible but ends where it comes into contact with another agent's domain.

For example, let's say you're hungry. You have maximum agency within your own domain, so you request a burrito and the AI summons a drone, which speeds over and drops off a Chipotle burrito.

The superintelligence is constantly balancing agency between people.

One person might want to build a house. They're walking across a field and decide, "There should be a house here." The AI then dispatches construction robots and rapidly builds the house. The person's agency is maximized.

What if someone else wants that field to remain a field?

Then the agency domains clash. The AI attempts to preserve the agency of both individuals (without manipulating them, since manipulation would reduce agency). This would result in some kind of negotiation between the two individuals, with both parties coming out of it with their agency still maximized. Maybe another area is used to build the house. Maybe the house is built, but some land is given to the other person.

The core idea is that this is an AI that will help you do whatever you want but will prevent you from reducing the agency of others.

 

You could train an AI like this by having it maximize the agency of humanlike LLMs in a simulation. At each point, the simulated human (which is an LLM) has a set of options open to it. The AI tries to maximize those options without reducing the options of other agents. An AI that became superintelligent at this task should hopefully let us cure cancer or create beautiful new worlds without also wiping everyone out.
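Here is a minimal sketch of one way such a training signal could be scored, under simplifying assumptions invented purely for illustration (this is not an existing library or the post's actual proposal): agents live on a 1D line, an agent's 'options' are the positions it could reach within a few steps, and the helper AI's actions add or remove walls.

```python
# Toy sketch of an "agency-preserving" reward for a helper AI in a 1D world.
# Illustrative assumptions: an agent's "options" are the positions it could
# reach within `horizon` steps, and the helper's actions add or remove walls.

def reachable_positions(pos, walls, horizon, size=10):
    """All positions an agent at `pos` could occupy within `horizon` steps."""
    frontier = {pos}
    for _ in range(horizon):
        nxt = set()
        for p in frontier:
            for q in (p - 1, p, p + 1):
                if 0 <= q < size and q not in walls:
                    nxt.add(q)
        frontier = nxt
    return frontier

def agency_reward(agent_positions, walls_before, walls_after, horizon=3):
    """Reward the helper only if no agent's option set shrinks."""
    gains = []
    for pos in agent_positions:
        before = len(reachable_positions(pos, walls_before, horizon))
        after = len(reachable_positions(pos, walls_after, horizon))
        if after < before:      # someone's agency was reduced: hard penalty
            return -1.0
        gains.append(after - before)
    return min(gains)           # reward the smallest gain, so no one is favored

# Example: removing the wall at position 3 expands the first agent's options
# by 3 and leaves the second agent's unchanged, so the reward is min(3, 0) = 0.
print(agency_reward(agent_positions=[2, 7], walls_before={3}, walls_after=set()))
```

Taking the minimum gain across agents is one blunt way to encode "help without favoring anyone"; a real version would need a much better-grounded notion of what counts as an option, which is exactly the hard part the answers below point at.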

 

The outcome of this type of AI might be weird, but it hopefully won't be as bad as extinction.

-Essentially, no one would ever be able to kill or harm someone else without that person's full consent, since being killed or harmed reduces agency.

-Conscious beings would be essentially immortal and invincible, even creatures like fish.

-Everyone might end up in a simulation, as this would give the AI perfect fine-grained control, letting everyone live in their own agency-maxed world. Ideally, people in the simulation would retain the agency to exit it and act in reality if they desired, and they could always go back into the simulation if they wanted. This is hopefully what maximizing agency would look like.

-People should also be free from addictive stimuli, since addiction reduces agency. They should always have the agency to 'brainwash' themselves and remove an addiction, and the AI should be able to intervene to restore maximum agency if it judges that someone has lost agency to an addiction.

 

Problems:

-People might not be able to die, or even sleep, as these technically reduce their agency. Hopefully, people could voluntarily reduce their agency for a while in order to sleep, or permanently by passing on. Not being allowed to sleep is a reduction of agency, just as not being allowed to die is.

-This is roughly similar to the 'optimize for all human utility functions' approach discussed in this post: https://www.lesswrong.com/posts/wnkGXcAq4DCgY8HqA/a-case-for-ai-alignment-being-difficult#It_is_hard_to_specify_optimization_of_a_different_agent_s_utility_function

-I'm not sure how agency should scale with intelligence, especially if people want to become superintelligent.

-People in this AI-optimized world might become extremely self-centered and lose some of the meaning that we find in humanity.

-Defining agency is tough in its own right.

 

I think the shape of a solution to AI alignment lies somewhere around this problem. Something about balancing many optimizations into an equilibrium, while retaining a world where consciousness can flourish, seems key. Our world is currently balanced because everyone has some amount of agency, so we use systems of cooperation to reach for better ends.

 

Part of a thought experiment I've been doing where I consider a superintelligence optimizing for different targets:

-Agency

-Normality

-Beauty

-Narrative

Answers

answer by the gears to ascension (Lauren (often wrong)) · 2024-02-14T10:48:47.055Z · LW(p) · GW(p)

This is an interesting idea that is being explored, but how do you nail it down precisely, so that the superintelligence is actually interested in optimizing for it, and so that the beings whose agency is being optimized for are actually the ones you're interested in preserving? Identifying the agents in a chunk of matter is not a solved problem. E.g., here's a rough sketch of the challenge I see, posed as a question to a hypothetical future LLM [LW(p) · GW(p)] (I know of no LLM capable of helping significantly with this; GPT-4 and Gemini Advanced have both been insufficient. I'm hopeful the causal incentives group hits another home run like Discovering Agents and nails it down.)

Meanwhile, the folks who have been discussing boundaries seem to maybe be onto something about defining a zone of agency. I'm not totally sure they have anything to add on top of Discovering Agents.

Cannell has also talked about "empowerment of other". Empowerment is the term of art for what you're proposing here.
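(For reference, the usual information-theoretic formalization, roughly following Klyubin, Polani, and Nehaniv, defines an agent's $n$-step empowerment at a state $s_t$ as the channel capacity from its next $n$ actions to the state that results: $\mathfrak{E}_n(s_t) = \max_{p(a_t^n)} I(A_t^n; S_{t+n} \mid s_t)$. "Empowerment of other" then means the AI maximizes this quantity for the agents around it rather than for itself.)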

It always comes down to the difficulty of making sure the superintelligence is actually seeking agency for others, rather than a facsimile of agency for others that turns out to just be pictures of agency.

comment by Gunnar_Zarncke · 2024-12-06T16:46:34.797Z · LW(p) · GW(p)

Cannell has also talked about "empowerment of other"

Do you mean this? Empowerment is (almost) All We Need [LW · GW]

folks who have been discussing boundaries ... zone of agency

and this: Agent membranes/boundaries and formalizing “safety” [LW · GW]

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2024-12-06T21:04:12.492Z · LW(p) · GW(p)

Yes to both. I don't think Cannell is correct that an implementation of what he said would be a good idea, even if it were a certified implementation, and I also don't think his idea is close to ready to implement. Agent membranes still seem at least somewhat interesting; right now, as far as I know, the most interesting work is coming from the Levin lab (Tufts University, Michael Levin), but I'm not happy with any of it for nailing down what we mean by aligning an arbitrarily powerful mind to care about the actual beings in its environment in a strongly durable way.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2024-12-07T14:18:28.392Z · LW(p) · GW(p)

I'm not clear on which research by Michael Levin you mean. I found him mentioned here: «Boundaries», Part 3b: Alignment problems in terms of boundaries [AF · GW], but his research seems to be about cellular computation, not related to alignment.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2024-12-08T12:20:36.451Z · LW(p) · GW(p)

https://www.drmichaellevin.org/research/

https://www.drmichaellevin.org/publications/

It's not directly on alignment, but it's relevant to understanding agent membranes. His work seems useful as a strong exemplar of what one needs to describe with a formal theory of agents and such. Particularly interesting is https://pubmed.ncbi.nlm.nih.gov/31920779/

It's not the result we're looking for, but it's inspiring in useful ways.

comment by Michael Soareverix (michael-soareverix) · 2024-02-15T06:40:30.249Z · LW(p) · GW(p)

Super interesting!

There's a lot of information here that will be super helpful for me to delve into. I've been bookmarking your links.

I think optimizing for the empowerment of other agents is a better target than giving the AI all the agency and hoping that it creates agency for people as a side effect of maximizing something else. I'm glad to see there's lots of research happening on this, and I'll be checking out 'empowerment' as an agency term.

Agency doesn't equal 'goodness', but it seems like an easier target to hit. I'm trying to break down the alignment problem into slices to figure it out and agency seems like a key slice.

Replies from: lahwran, tristan-tran
comment by the gears to ascension (lahwran) · 2024-02-15T21:23:37.321Z · LW(p) · GW(p)

The problem is that there are going to be self-agency-maximizing AIs at some point, and the question is how to make AIs that can defend the agency of humans against those.

comment by Tristan Tran (tristan-tran) · 2024-05-30T23:50:05.153Z · LW(p) · GW(p)

With optimization, I'm always concerned with the interactions of multiple agents: are there any ways in this system that two or more agents could form cartels and increase each other's agency? I see this happen with some reinforcement learning models, where if some edge cases aren't covered, they will just mine each other for easy points thanks to how we set up the reward function.

1 comment


comment by Mitchell_Porter · 2024-02-14T16:31:05.272Z · LW(p) · GW(p)

This would be an e/acc-consistent philosophy if you added that the universe is already optimizing for maximum agency thanks to non-equilibrium thermodynamics and game theory.