What are some claims or opinions about multi-multi delegation you've seen in the memeplex that you think deserve scrutiny?

post by Quinn (quinn-dougherty) · 2021-06-27T17:44:52.389Z · LW · GW · 2 comments

This is a question post.

I think multi-multi is really hard to think about. One of the first steps I'm taking to get less confused about it is to scrutinize claims or opinions that I've encountered in the wild.

Context

Critch & Krueger 2020 primarily discuss delegation, which is described as "when some humans want something done, those humans can delegate responsibility for the task to one or more AI systems" (p. 19). Delegation is in fact the composition of three subproblems: comprehension ("the human ability to understand how an AI system works and what it will do"), instruction ("the human ability to convey instructions to an AI system regarding what it should do"), and control ("the human ability to retain or regain control of a situation involving an AI system, especially in cases where the human is unable to successfully comprehend or instruct the AI system via the normal means intended by the system’s designers"). The four flavors of delegation are single-(human principal)/single-AI-(system), single-(human principal)/multi-AI-(systems), multi-(human principals)/single-AI-(system), and multi-(human principals)/multi-AI-(systems).

Naturally, the traditional "alignment problem" is roughly single-single delegation, especially single-single control. Aspects of single-multi can be made sense of in light of Dafoe et al. 2020, and aspects of multi-single can be made sense of in light of Baum 2020, but it's difficult to find even minimal footholds in multi-multi.

Here's some notes from Critch about what makes multi-multi problematic:

I almost never refer to a multi-multi alignment, because I don’t know what it means and it’s not clear what the values of multiple different stakeholders even is. What are you referring to when you say the values are this function? ... So, I don’t say multi-multi alignment a lot, but I do sometimes say single-single alignment to emphasize that I’m talking about the single stakeholder version. I think the multi-multi alignment concept almost doesn’t make sense. ... And the idea that there’s this thing called human values, that we’re all in agreement about. And there’s this other thing called AI that just has to do what the human values say. And we have to align the AI with human values. It’s an extremely simplified story. It’s got two agents and it’s just like one big agent called the humans. And then there’s this one big agent called AIs. And we’re just trying to align them. I think that is not the actual structure of the delegation relationship that humans and AI systems are going to have with respect to each other in the future. And I think alignment is helpful for addressing some delegation relationships, but probably not the vast majority.

Claims, opinions

I have two; I'd like your help in expanding this list.

If you have an opinion of your own that you don't think qualifies as "in the memeplex", I hope you share it anyway! I'm also really happy if you pontificate in the answers about intuitions you have or bottlenecks you see. In general, fitting your answer into my project is my problem, not yours. This is also an invitation to DM me what confuses you about multi-multi, why you think it might be hard or easy, etc.

Answers

answer by paulfchristiano · 2021-06-27T18:49:04.252Z · LW(p) · GW(p)

Today we have a number of approaches to coordination---we sign contracts, create firms with multiple shareholders, vote in democracies, and so forth. I think the starting point for multiple AIs interacting with multiple humans is:

  • AI systems use similar mechanisms to coordinate amongst each other--e.g. my AI and your AI may sign a contract, or voters may have AI delegates who vote for them or advise them on how to vote.
  • Some AI systems are deployed by (and hopefully aligned with) a collective decision-making process--e.g. a firm may decide to deploy an AI CEO which is overseen by the board, or a government agency may deploy an AI to enforce regulations which is overseen by a bureaucratic process.
  • We may interleave those two basic approaches in complex ways, e.g. a firm with AI shareholders may itself deploy an AI, which may sign contracts with other firms' AIs, which are in turn enforced by other AIs who are overseen by a process defined by that contract...

(And regardless of what happens on the object level, AIs and humans will continue improving our methods for cooperation/governance/oversight.)

When I think about this topic I'm mostly interested in ways that this "default" falls short (or may be unrealistic/impractical).

AI may also facilitate new forms of cooperation; those might be needed to cope with new challenges or the new pace introduced by AI, or may result in an improvement over the status quo. Some versions of this:

  • We can deploy AI systems with novel forms of oversight, rather than merely representing some existing collective decision-making process. For example, we could imagine "merging utility functions" (as in Servant of Many Masters) or some kind of collective idealization (as in CEV).
  • Because some kinds of AI labor are very cheap, we could use coordination mechanisms that are currently impractical---e.g. our AI systems could write many more contracts and negotiate them with low latency; or we could vote directly on a much larger number of decisions made by the state; or we could reduce agency costs by engaging in more careful oversight more sparingly.

Whether or not (or for however long) the "default" is adequate to avoid existential catastrophe, it seems useful to use AI as an opportunity to improve our coordination. In some sense "most" of that work will presumably be done by AI systems, but doing the work ourselves may unlock those benefits much earlier. That may be critically important if the transition to AI creates a ton of chaos before we have AI systems who are much better than us at designing new cooperative arrangements. (This is fairly similar to the situation with alignment, where we could also wait to delegate the problem to AI but that may be too late to avoid trouble.)

comment by Quinn (quinn-dougherty) · 2021-06-27T19:40:36.913Z · LW(p) · GW(p)

I think both sets of bullets (multi-multi (eco?)systems either replicating cooperation-etc-as-we-know-it or creating new forms of cooperation etc.) are important; going forward I'll call them prosaic cooperation and nonprosaic cooperation, respectively. When I say "cooperation etc." I mean cooperation, coordination, competition, negotiation, compromise.

You've provided crisp scenarios, so thanks for that!

In some sense "most" of that work will presumably be done by AI systems, but doing the work ourselves may unlock those benefits much earlier.

But if the AI does that work there will be an interpretability problem, an inferential distance to cross. I'm imagining people asking a somewhat single-single aligned AI for solutions to multi-multi problems, and the black box returning something inscrutable. Putting ourselves in a position where we can grok its recommendations seems aligned with researching the problem for ourselves so we won't have to ask the black box in the first place, though this probably only applies to prosaic cooperation.

answer by Connor Leahy (NPCollapse) · 2021-06-27T18:57:51.243Z · LW(p) · GW(p)

I haven't read Critch in depth, so I can't guarantee I'm pointing towards the same concept he is. Consider this a bit of an impromptu intuition dump; it might be trivial. No claims on the originality of any of these thoughts, and epistemic status "¯\_(ツ)_/¯".

The way I currently think about it is that multi-multi is the "full hard problem", and single-single is a particularly "easy" (still not easy) special case. 

In a way, we're making some simplifying assumptions in the single-single case: that we have one (pseudo-Cartesian) "agent" with some kind of definite (or at least boundedly complicated) values that can be expressed. This means we kind of have "just" the usual problems of a) expressing/extracting/understanding the values, insofar as that is possible (outer alignment), and b) making sure the agent actually fulfills those values (inner alignment).

Multiple principals then relax this assumption: we no longer have a "single" function but several. This introduces another "necessary ingredient", some kind of social-choice-theoretic "synthesis function" that can take in all the individual functions and spit out a "super utility function" representing some morally acceptable amalgamation of the others (whatever that means). The single-principal case is simpler in that the synthesis function is just the identity function, but that no longer works once you have multiple inputs.
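
As a toy illustration of the shape of that ingredient, here's a minimal sketch (just my shorthand, with an arbitrary weighted-sum rule standing in for whatever the morally acceptable amalgamation actually is):

```python
from typing import Callable, List, Optional

# "Outcome" is a stand-in for whatever the AI system chooses between.
Outcome = str
Utility = Callable[[Outcome], float]

def synthesize(utilities: List[Utility],
               weights: Optional[List[float]] = None) -> Utility:
    """Amalgamate several principals' utility functions into one
    "super utility function"; here, purely for illustration, a weighted sum."""
    weights = weights if weights is not None else [1.0] * len(utilities)

    def super_utility(outcome: Outcome) -> float:
        return sum(w * u(outcome) for w, u in zip(weights, utilities))

    return super_utility

# Single-principal special case: synthesize([u]) behaves like u itself,
# which is why the synthesis problem disappears when there is one principal.
u = lambda outcome: 1.0 if outcome == "tea" else 0.0
assert synthesize([u])("tea") == u("tea")
```

All the hard social-choice content is hidden in the choice of rule (and weights); the sketch only pins down the type of the missing ingredient.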

In a very simplistic sense, multi is "harder" because we are introducing an additional "degree of freedom". So you might argue we have outer alignment, inner alignment and "even-more-outerer alignment" or "multi-outer alignment" (which would be the synthesis problem), and you probably have to make hard (potentially irreconcilable) moral choices for at least the latter (probably for all).

In multi-multi, if the agents serve (or have different levels of alignment towards) different subsets of principals, this would then add the additional difficulty of game theory between the different agents and how they should coordinate. We can call that the "multi-inner alignment problem" or something: the question of how to get the amalgamation of competing agents to be "inner aligned" and not blow everything up or get stuck in defect-defect spirals or whatever. (This reminds me a lot of what CLR works on.)

I tbh am not sure whether single-multi would be harder than, or different from, single-single just "applied multiple times". Maybe if the agents have different ideas of what the principal wants they could compete, but that seems like a failure of outer alignment; or maybe it would be better cast as a kind of failure of "multi-inner alignment".

So in summary, I think solutions to the multi-multi problem (insofar as such a thing even exists in an objective fashion, which it may or may not) are a superset of solutions to multi-single, single-multi, and single-single. Vaguely: outer alignment = normativity/value learning, inner alignment = principal-agent problem, multi-outer alignment = social choice, multi-inner alignment = game theory, and you need to solve all four to solve multi-multi. If you make certain simplifying assumptions which correspond to introducing "singles", you can ignore one or more of these (i.e. a single agent doesn't need game theory, a single principal doesn't need social choice).
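
Spelled out as a table (again just my shorthand for "each 'single' lets you drop one subproblem"; the mapping, not the solutions, is the claim):

```python
# Which subproblems each delegation flavor leaves you with, under the story
# above: a single agent lets you drop game theory, a single principal lets
# you drop social choice.
SUBPROBLEMS = {
    ("single principal", "single agent"): {"outer alignment", "inner alignment"},
    ("single principal", "multi agent"):  {"outer alignment", "inner alignment",
                                           "multi-inner alignment (game theory)"},
    ("multi principal", "single agent"):  {"outer alignment", "inner alignment",
                                           "multi-outer alignment (social choice)"},
    ("multi principal", "multi agent"):   {"outer alignment", "inner alignment",
                                           "multi-outer alignment (social choice)",
                                           "multi-inner alignment (game theory)"},
}

# Multi-multi subsumes the other three flavors.
assert all(v <= SUBPROBLEMS[("multi principal", "multi agent")]
           for v in SUBPROBLEMS.values())
```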

Or something. Maybe the metaphor is too much of a stretch and I'm seeing spurious patterns.

comment by Quinn (quinn-dougherty) · 2021-06-27T19:18:27.595Z · LW(p) · GW(p)

I wrote out the 2x2 grid you suggested in MS Paint: Connor's Geometry

I'm not sure I'm catching how multi-inner is game theory. Except that I think "GT is the mesa- of SCT" is an interesting, reasonable (to me) claim that is sort of blowing my mind as I contemplate it, so far.

2 comments

comment by Morgan_Rogers · 2021-06-27T18:19:47.965Z · LW(p) · GW(p)

Critch's comments support an opinion I've held since I started thinking seriously about alignment: that the language we use to describe it is too simple, and ignores the fact that "human" interests (the target of alignment) are not the monolith they're usually presented as.

For your specific question about multi-multi, I only have limited access to the memeplex, so I'll just share my thoughts. Multi-multi delegation involves:
1. Compromise / resolution of conflicts of interest between delegators.
2. Mutual trust in delegators regarding communication of interests to delegatees.
3. Equitable control between delegators. This could be lumped in with conflicts of interest, but deserves special attention.
4. Capacity for communication and cooperation between delegatees.

... and some other aspects I haven't thought of. As far as I can see, though, the most important issues here would be addressed by consideration of single-multi and multi-single; multi-multi-specific problems will only be relevant when there are obstacles to communication between either delegators or delegatees (a conceivable future problem, but not a problem as long as the complexity of systems actually being constructed is limited).

comment by Quinn (quinn-dougherty) · 2021-06-27T18:52:48.025Z · LW(p) · GW(p)

Thanks! Trust, compromise, and communication are all items in Dafoe et al. 2020, if you're interested in exploring. I agree that primitive forms of these issues are present in multi-single and single-multi; it's not clear to me whether we should think of solving these primitive forms and then extending the solutions to multi-multi, or of attacking problems unique to multi-multi directly. It's just not clear to me which of those better reflects the nature of what's going on.