Strategic implications of AIs' ability to coordinate at low cost, for example by merging

post by Wei_Dai · 2019-04-25T05:08:21.736Z · score: 49 (19 votes) · LW · GW · 35 comments

It seems likely to me that AIs will be able to coordinate with each other much more easily (i.e., at lower cost and greater scale) than humans currently can, for example by merging into coherent unified agents by combining their utility functions. This has been discussed at least since 2009 [LW · GW], but I'm not sure its implications have been widely recognized. In this post I talk about two such implications that occurred to me relatively recently.

I was recently reminded of this quote from Robin Hanson's Prefer Law To Values:

The later era when robots are vastly more capable than people should be much like the case of choosing a nation in which to retire. In this case we don’t expect to have much in the way of skills to offer, so we mostly care that they are law-abiding enough to respect our property rights. If they use the same law to keep the peace among themselves as they use to keep the peace with us, we could have a long and prosperous future in whatever weird world they conjure. In such a vast rich universe our “retirement income” should buy a comfortable if not central place for humans to watch it all in wonder.

Robin argued that this implies we should work to make it more likely that our current institutions like laws will survive into the AI era. But (aside from the problem that we're most likely still incurring astronomical waste even if many humans survive "in retirement"), assuming that AIs will have the ability to coordinate amongst themselves by doing something like merging their utility functions, there will be no reason to use laws (much less "the same laws") to keep peace among themselves. So the first implication is that to the extent that AIs are likely to have this ability, working in the direction Robin suggested would likely be futile.

The second implication is that AI safety/alignment approaches that aim to preserve an AI's competitiveness must also preserve its ability to coordinate with other AIs, since that is likely an important part of its competitiveness. For example, making an AI corrigible in the sense of allowing a human to shut it (and its successors/subagents) down or change how it functions would seemingly make it impossible for this AI to merge with another AI that is not corrigible, or not corrigible in the same way. (I've mentioned this a number of times in previous comments, as a reason why I'm pessimistic about specific approaches, but I'm not sure if others have picked up on it, or agree with it, as a general concern, which partly motivates this post.)

Questions: Do you agree AIs are likely to have the ability to coordinate with each other at low cost? What other implications does this have, especially for our strategies for reducing x-risk?



comment by cousin_it · 2019-04-26T09:25:04.921Z · score: 8 (4 votes) · LW · GW

A big obstacle to human cooperation is bargaining: deciding how to split the benefit from cooperation. If it didn't exist, I think humans would cooperate more. But the same obstacle also applies to AIs. Sure, there's a general argument that AIs will outperform humans at all tasks and bargaining too, but I'd like to understand it more specifically. Do you have any mechanisms in mind that would make bargaining easier for AIs?

comment by Wei_Dai · 2019-04-26T11:00:42.304Z · score: 4 (2 votes) · LW · GW

A big obstacle to human cooperation is bargaining: deciding how to split the benefit from cooperation. If it didn’t exist, I think humans would cooperate more.

Can you give some examples of where human cooperation is mainly being stopped by difficulty with bargaining? It seems to me like enforcing deals is usually the bigger part of the problem. For example in large companies there are a lot of inefficiencies like shirking, monitoring costs to reduce shirking, political infighting, empire building, CYA, red tape, etc., which get worse as companies get bigger. It sure seems like enforcement (i.e., there's no way to enforce a deal where everyone agrees to stop doing these things) rather than bargaining is the main problem there.

Or consider the inefficiencies in academia, where people often focus more on getting papers published than on working on the most important problems. I think that's mainly because an agreement to reward people for publishing papers is easily enforceable, while an agreement to reward people for working on the most important problems isn't. I don't see how improved bargaining would solve this problem.

Do you have any mechanisms in mind that would make bargaining easier for AIs?

I haven't thought about this much, but perhaps if AIs had introspective access to their utility functions, that would make it easier for them to make use of formal bargaining solutions that take utility functions as inputs? Generally it seems likely that AIs will be better at bargaining than humans, for the same kind of reason as here [LW · GW], but AFAICT just making enforcement easier would probably suffice to greatly reduce coordination costs.
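As one concrete example of a formal bargaining solution that takes utility functions as inputs, here is the Nash bargaining solution over a finite set of outcomes. This is a minimal sketch that assumes both utility functions and the disagreement point (what each side gets if talks fail) are known to both parties; the function name and the split-the-surplus numbers are hypothetical.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")

def nash_bargaining(outcomes: Iterable[T], u_a: Callable[[T], float],
                    u_b: Callable[[T], float], d_a: float, d_b: float) -> Optional[T]:
    """Outcome maximizing the Nash product (u_a - d_a) * (u_b - d_b), restricted to
    outcomes both parties weakly prefer to the disagreement point (d_a, d_b)."""
    acceptable = [o for o in outcomes if u_a(o) >= d_a and u_b(o) >= d_b]
    if not acceptable:
        return None  # no deal beats walking away for both sides
    return max(acceptable, key=lambda o: (u_a(o) - d_a) * (u_b(o) - d_b))

# Hypothetical example: split 100 units of surplus; x is A's share.
splits = range(101)
print(nash_bargaining(splits, lambda x: x, lambda x: 100 - x, d_a=0, d_b=0))   # 50
print(nash_bargaining(splits, lambda x: x, lambda x: 100 - x, d_a=20, d_b=0))  # 60
```

Giving A a better outside option (d_a = 20) moves the agreed split from 50 to 60. The hard part for humans is supplying honest inputs; an agent with introspective access to its utility function could, in principle, supply them directly.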

comment by cousin_it · 2019-04-26T11:41:59.388Z · score: 4 (2 votes) · LW · GW

Can you give some examples of where human cooperation is mainly being stopped by difficulty with bargaining?

Two kids fighting over a toy; a married couple arguing about who should do the dishes; war.

But now I think I can answer my own question. War only happens if two agents don't have common knowledge about who would win (otherwise they'd agree to skip the costs of war). So if AIs are better than humans at establishing that kind of common knowledge, that makes bargaining failure less likely.
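A minimal sketch of the argument, assuming a prize normalized to 1, war costs c_a and c_b expressed as fractions of that prize, and each side's own estimate of the probability that A wins (all parameter names are hypothetical). When both sides share the same estimate, there is always a range of peaceful splits that both prefer to fighting; when their estimates diverge enough, the range can vanish.

```python
from typing import Optional, Tuple

def bargaining_range(p_a_belief: float, p_b_belief: float,
                     c_a: float, c_b: float) -> Optional[Tuple[float, float]]:
    """Range of shares x (of a prize worth 1) going to A that both sides
    weakly prefer to fighting, or None if no such deal exists.

    p_a_belief / p_b_belief: each side's estimate that A wins a war.
    c_a / c_b: each side's cost of fighting, in units of the prize.
    """
    # A's expected value from war, by A's own estimate: p_a_belief - c_a.
    low = max(0.0, p_a_belief - c_a)
    # B's expected value from war, by B's own estimate: (1 - p_b_belief) - c_b.
    # B accepts 1 - x only if it is at least that, i.e. x <= p_b_belief + c_b.
    high = min(1.0, p_b_belief + c_b)
    return (low, high) if low <= high else None

# Common knowledge (both estimate 0.9): a deal exists, roughly x in [0.8, 1.0].
print(bargaining_range(0.9, 0.9, 0.1, 0.1))
# A is more optimistic than B thinks it should be: no deal both prefer to war.
print(bargaining_range(0.9, 0.6, 0.1, 0.1))  # None
```

With equal estimates and nonnegative costs, low never exceeds high, which is the "they'd agree to skip the costs of war" step; the sketch says nothing about whether the resulting deal can be enforced.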

comment by Wei_Dai · 2019-04-26T12:16:51.027Z · score: 8 (3 votes) · LW · GW

War only happens if two agents don’t have common knowledge about who would win (otherwise they’d agree to skip the costs of war).

But that assumes a strong ability to enforce agreements (which humans typically don't have). For example, suppose it's common knowledge that if countries A and B went to war, A would conquer B with probability .9 and it would cost each side $1 trillion. If they could enforce agreements, then they could agree to roll a 10-sided die in place of the war (with A winning on nine of the ten faces) and save $1 trillion each, but if they couldn't, then A would go to war with B anyway if it lost the roll, so now B has a .99 probability of being taken over. Alternatively, maybe B agrees to be taken over by A with certainty but gets some compensation to cover the .1 chance that it doesn't lose the war. But after taking over B, A could just expropriate all of B's property, including the compensation that it paid.
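The die-roll arithmetic can be checked with a short sketch using the numbers above (A wins with probability 0.9, war costs $1 trillion per side); the function name is hypothetical and the model covers only the single "A fights after losing the roll" defection described here.

```python
from typing import Tuple

def lottery_instead_of_war(p_a_wins: float, war_cost: float,
                           enforceable: bool) -> Tuple[float, float]:
    """Replace a war with a lottery that A wins with probability p_a_wins.

    Returns (probability B ends up conquered, expected war cost per side).
    """
    if enforceable:
        # The roll is binding: B falls only if A wins the roll, and nobody fights.
        return p_a_wins, 0.0
    # Unenforceable: A accepts a winning roll, but after a losing roll it goes
    # to war anyway and still wins with probability p_a_wins.
    p_fight = 1 - p_a_wins
    return p_a_wins + p_fight * p_a_wins, p_fight * war_cost

print(lottery_instead_of_war(0.9, 1e12, enforceable=True))   # (0.9, 0.0)
print(lottery_instead_of_war(0.9, 1e12, enforceable=False))  # about (0.99, 1e11)
```

Without enforcement B faces a 0.99 chance of conquest, worse than the 0.9 it faces by simply fighting, so B has no reason to accept the lottery in the first place.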

comment by Larks · 2019-05-02T20:44:47.325Z · score: 2 (1 votes) · LW · GW

War only happens if two agents don’t have common knowledge about who would win (otherwise they’d agree to skip the costs of war).

They might also have poorly aligned incentives, like a war between two countries that allows both governments to gain power and prestige, at the cost of destruction that is borne by the ordinary people of both countries. But this sort of principal-agent problem also seems like something AIs should be better at dealing with.

comment by Bunthut · 2019-04-26T12:47:36.360Z · score: 1 (1 votes) · LW · GW

Not only about who would win, but also about the costs the war would impose. I think the difficulty in establishing common knowledge about this is partly due to people trying to deceive each other. It's not clear that the ability to see through deception improves faster than the ability to deceive as intelligence increases.

comment by ryan_b · 2019-04-25T17:38:34.266Z · score: 8 (3 votes) · LW · GW

If there's a lot of coordination among AI, even if only through transactions, I feel like this implies we would need to add "resources which might be valuable to other AIs" to the list of things we can expect any given AI to instrumentally pursue.

comment by ricraz · 2019-05-22T15:25:37.118Z · score: 7 (4 votes) · LW · GW

I'd like to push back on the assumption that AIs will have explicit utility functions. Even if you think that sufficiently advanced AIs will behave in a utility-maximising way, their utility functions may be encoded in a way that's difficult to formalise (e.g. somewhere within a neural network).

It may also be the case that coordination is much harder for AIs than for humans. For example, humans are constrained by having bodies, which makes it easier to punish defection - hiding from the government is tricky! Our bodies also make anonymity much harder. By contrast, if you're a piece of code which can copy itself anywhere in the world, reneging on agreements may become relatively easy. Another reason why AI cooperation might be harder is simply that AIs will be capable of a much wider range of goals and cognitive processes than humans, and so they may be less predictable to each other and/or have less common ground with each other.

comment by Wei_Dai · 2019-05-23T04:42:57.871Z · score: 7 (3 votes) · LW · GW

I’d like to push back on the assumption that AIs will have explicit utility functions.

Yeah I was expecting this, and don't want to rely too heavily on such an assumption, which is why I used "for example" everywhere. :)

Even if you think that sufficiently advanced AIs will behave in a utility-maximising way, their utility functions may be encoded in a way that’s difficult to formalise (e.g. somewhere within a neural network).

I think they don't necessarily need to formalize their utility functions, just isolate the parts of their neural networks that encode those functions. And then you could probably take two such neural networks and optimize for a weighted average of the outputs of the two utility functions. (Although for bargaining purposes, to determine the weights, they probably do need to look inside the neural networks somehow and can't just treat them as black boxes.)
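A toy sketch of the merge described above, assuming each AI's utility function has already been isolated into a callable that scores outcomes and that the bargaining weight `w` has already been agreed (setting it is the hard part, per the parenthetical). All names and outcome scores are hypothetical.

```python
from typing import Callable, Dict, Sequence

Utility = Callable[[str], float]

def from_table(table: Dict[str, float]) -> Utility:
    """Wrap a lookup table of outcome scores as a utility function."""
    return lambda outcome: table[outcome]

def merge_utilities(u_a: Utility, u_b: Utility, w: float) -> Utility:
    """Weighted average w * u_a + (1 - w) * u_b; w is the bargained weight."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return lambda outcome: w * u_a(outcome) + (1 - w) * u_b(outcome)

def best_outcome(u: Utility, outcomes: Sequence[str]) -> str:
    """The outcome an agent maximizing u would steer toward."""
    return max(outcomes, key=u)

outcomes = ["a_wins", "b_wins", "compromise", "war"]
u_a = from_table({"a_wins": 10, "b_wins": 0, "compromise": 7, "war": 2})
u_b = from_table({"a_wins": 0, "b_wins": 10, "compromise": 7, "war": 2})
merged = merge_utilities(u_a, u_b, w=0.5)

print(best_outcome(u_a, outcomes), best_outcome(u_b, outcomes),
      best_outcome(merged, outcomes))  # a_wins b_wins compromise
```

The merged agent treats the two utility functions as black boxes once `w` is fixed; only choosing `w` requires looking inside them.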

Or are you thinking that the parts encoding the utility function are so intertwined with the rest of the AI that they can't be separated out, and the difficulty of doing that increases with the intelligence of the AI so that the AI remains unable to isolate its own utility function as it gets smarter? If so, it's not clear to me why there wouldn't be AIs with cleanly separable utility functions that are nearly as intelligent, which would outcompete AIs with non-separable utility functions because they can merge with each other and obtain the benefits of better coordination.

It may also be the case that coordination is much harder for AIs than for humans.

My reply to Robin Hanson [LW · GW] seems to apply here. Did you see that?

comment by ricraz · 2019-05-22T15:19:14.045Z · score: 6 (3 votes) · LW · GW

This paper by Critch is relevant: it argues that agents with different beliefs will bet their future share of a merged utility function, such that it skews towards whoever's predictions are more correct.

comment by Wei_Dai · 2019-05-22T20:40:18.361Z · score: 5 (2 votes) · LW · GW

I had read that paper recently, but I think we can abstract away from the issue by saying that (if merging is a thing for them) AIs will use decision procedures that are "closed under" merging, just as we currently focus attention on decision procedures that are "closed under" self-modification. (I suspect that, modulo logical uncertainty, which Critch's paper also ignores, UDT might already be such a decision procedure; in other words, Critch's argument doesn't apply to UDT. But I haven't spent much time thinking about it.)

comment by RobinHanson · 2019-04-25T15:36:58.397Z · score: 5 (3 votes) · LW · GW

The claim that AI is vastly better at coordination seems to me implausible on its face. I'm open to argument, but will remain skeptical until I hear good arguments.

comment by Wei_Dai · 2019-04-26T03:00:20.921Z · score: 11 (5 votes) · LW · GW

Have you considered the specific mechanism that I proposed, and if so what do you find implausible about it? (If not, see this longer post [LW · GW] or this shorter comment [LW · GW].)

I did manage to find a quote from you that perhaps explains most of our disagreement on this specific mechanism:

There are many other factors that influence coordination, after all; even perfect value matching is consistent with quite poor coordination.

Can you elaborate on what these other factors are? It seems to me that most coordination costs in the real world come from value differences, so it's puzzling to see you write this.

Abstracting away from the specific mechanism, as a more general argument, AI designers or evolution will (sooner or later) be able to explore a much larger region of mind design space than biological evolution could. Within this region there are bound to be minds much better at coordination than humans, and we should certainly expect coordination ability to be one objective that AI designers or evolution will optimize for since it offers a significant competitive advantage.

This doesn't guarantee that the designs that end up "winning" will have much better coordination ability than humans, because maybe the designers/evolution will be forced to trade off coordination ability for something else they value, to the extent that the "winners" don't coordinate much better than humans. But that doesn't seem like something we should expect to happen by default, without some specific reason to, and it becomes less and less likely as more and more of mind design space is explored.

comment by ryan_b · 2019-04-25T17:30:59.214Z · score: 5 (3 votes) · LW · GW

It seems to me that computers don't suffer from most of the constraints humans do. For example, an AI can expose its source code and its error-free memory. Humans have no such option, and our very best approximations are made of stories and error-prone memory.

They can provide guarantees which humans cannot, simulate one another within precise boundaries in a way humans cannot, calculate risk and confidence levels in a way humans cannot, communicate their preferences precisely in a way humans cannot. All of this seems to point in the direction of increased clarity and accuracy of trust.
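One standard way to make "expose its source code" concrete is the program-equilibrium setup for a one-shot prisoner's dilemma, where each player is a program that reads the other's source before moving. A minimal sketch follows; the `Bot` class and the bots are hypothetical, and it assumes the displayed source is the code that actually runs. A "clique bot" cooperates only with exact copies of itself, so two copies cooperate while a defector cannot exploit it.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

COOPERATE, DEFECT = "C", "D"

@dataclass(frozen=True)
class Bot:
    source: str                        # code this bot shows (assumed to be what runs)
    policy: Callable[[str, str], str]  # (own source, opponent source) -> move

    def act(self, opponent: "Bot") -> str:
        return self.policy(self.source, opponent.source)

def clique_policy(own: str, other: str) -> str:
    # Cooperate only with an exact copy of myself; defect against anything else.
    return COOPERATE if other == own else DEFECT

def defect_policy(own: str, other: str) -> str:
    return DEFECT

clique = Bot("cooperate iff opponent.source == my.source", clique_policy)
clique_copy = Bot("cooperate iff opponent.source == my.source", clique_policy)
defector = Bot("always defect", defect_policy)

def play(x: Bot, y: Bot) -> Tuple[str, str]:
    return x.act(y), y.act(x)

print(play(clique, clique_copy))  # ('C', 'C')
print(play(clique, defector))     # ('D', 'D')
```

The exact-string match is deliberately crude: it refuses to cooperate with functionally identical programs written differently, and the whole scheme rests on the shown source matching the running one, which is the assumption the reply to this comment pushes on.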

On the other hand, I see no reason to believe AI will have the strong bias in favor of coordination or trust that we have, so it is possible that clear and accurate trust levels will make coordination a rare event. That seems off to me though, because it feels like saying they would be better off working alone in a world filled with potential competitors. That statement flatly disagrees with my reading of history.

comment by Rana Dexsin · 2019-04-25T17:49:15.447Z · score: 5 (3 votes) · LW · GW

Extending this: trust problems could impede the flow of information in the first place in such a way that the introspective access stops being an amplifier across a system boundary. An AI can expose some code, but an AI that trusts other AIs to be exposing their code in a trustworthy fashion rather than choosing what code to show based on what will make the conversation partner do something they want seems like it'd be exploitable, and an AI that always exposes its code in a trustworthy fashion may also be exploitable.

Human societies do “creating enclaves of higher trust within a surrounding environment of lower trust” a lot, and it does improve coordination when it works right. I don't know which way this would swing for super-coordination among AIs.

comment by habryka (habryka4) · 2019-04-25T17:24:05.270Z · score: 4 (2 votes) · LW · GW

What evidence would convince you otherwise? Would superhuman performance in games that require difficult coordination be compelling?

DeepMind has outlined Hanabi as one of the next games to tackle.

comment by gwern · 2019-07-08T17:06:16.576Z · score: 7 (3 votes) · LW · GW

Avalon is another example, with better current performance.

comment by Dagon · 2019-04-25T17:47:29.050Z · score: 2 (1 votes) · LW · GW

As a subset of the claim that AI is vastly better at everything, being vastly better at coordination is plausible. The specific arguments that AI somehow has (unlike any intelligence or optimization process we know of today) introspection into its "utility function" or can provide non-behavioral evidence of its intent to similarly-powerful AIs seem pretty weak.

I haven't seen anyone attempting to model shifting equilibria and negotiation/conflict among AIs (and coalitions of AIs and of AIs + humans) with differing goals and levels of computational power, so it seems pretty unfounded to speculate on how "coordination" as a general topic will play out.

comment by MakoYass · 2019-04-26T05:24:39.709Z · score: 1 (1 votes) · LW · GW

I'd expect a designed thing to have much cleaner, much more comprehensible internals. If you gave a human a compromise utility function and told them that it was a perfect average of their desires (or their tribe's desires) and their opponents' desires, they would not be able to verify this. They wouldn't recognise their own utility function; they might not even individually possess it (again, human values seem to be a bit distributed); and they would be inclined to reject a fair deal, since humans tend to see the other side only in extreme shades, as more foreign than they really are.

Do you not believe that an AGI is likely to be self-comprehending? I wonder, sir, do you still not anticipate foom? Is it connected to that disagreement?

comment by shminux · 2019-04-25T06:01:14.884Z · score: 4 (4 votes) · LW · GW

As usual, a converse problem is likely to give useful insights. I would approach this issue from the other direction: what prevents humans from coordinating with each other at low cost? Do we expect AIs to have similar traits/issues?

comment by Wei_Dai · 2019-04-25T07:21:27.778Z · score: 15 (7 votes) · LW · GW

One possible way for AIs to coordinate with each other is for two or more AIs to modify their individual utility functions into some compromise utility function, in a mutually verifiable way, or equivalently to jointly construct a successor AI with the same compromise utility function and then hand over control of resources to the successor AI. This simply isn't something that humans can do.

comment by totallybogus · 2019-04-25T23:33:34.916Z · score: 8 (4 votes) · LW · GW

modify their individual utility functions into some compromise utility function, in a mutually verifiable way, or equivalently to jointly construct a successor AI with the same compromise utility function and then hand over control of resources to the successor AI

This is precisely equivalent to Coasean efficiency, FWIW - indeed, correspondence with some "compromise" welfare function is what it means for an outcome to be efficient in this sense. It's definitely the case that humans, and agents more generally, can face obstacles to achieving this, so that they're limited to some constrained-efficient outcome - something that does maximize some welfare function, but only after taking some inevitable constraints into account!

(For instance, if the pricing of some commodity, service or whatever is bounded due to an information problem, so that "cheap" versions of it predominate, then the marginal rates of transformation won't necessarily be equalized across agents. Agent A might put her endowment towards goal X, while agent B will use her own resources to pursue some goal Y. But that's a constraint that could in principle be well-defined - a transaction cost. Put them all together, and you'll understand how these constraints determine what you lose to inefficiency - the "price of anarchy", so to speak.)
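The "price of anarchy" is usually defined as the best achievable total welfare divided by the total welfare of the worst equilibrium. A small sketch for a two-player, two-action game, considering pure-strategy Nash equilibria only; the prisoner's dilemma payoffs are standard illustrative numbers, not taken from the comment, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Profile = Tuple[str, str]
Game = Dict[Profile, Tuple[float, float]]

# Illustrative prisoner's dilemma: payoffs[(row, col)] = (row payoff, col payoff).
PD: Game = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def pure_nash_equilibria(game: Game, actions: Sequence[str] = ("C", "D")) -> List[Profile]:
    """Profiles where neither player gains by deviating unilaterally."""
    equilibria = []
    for r, c in product(actions, actions):
        row_ok = all(game[(r, c)][0] >= game[(r2, c)][0] for r2 in actions)
        col_ok = all(game[(r, c)][1] >= game[(r, c2)][1] for c2 in actions)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

def price_of_anarchy(game: Game, actions: Sequence[str] = ("C", "D")) -> float:
    """Best total welfare divided by total welfare of the worst pure equilibrium.

    Assumes at least one pure equilibrium and positive equilibrium welfare.
    """
    welfare = {profile: sum(payoffs) for profile, payoffs in game.items()}
    worst_eq = min(welfare[e] for e in pure_nash_equilibria(game, actions))
    return max(welfare.values()) / worst_eq

print(pure_nash_equilibria(PD))  # [('D', 'D')]
print(price_of_anarchy(PD))      # 3.0
```

Here the loss comes entirely from the inability to commit to (C, C); an enforceable agreement, or a merged agent maximizing total welfare, would recover the factor of 3.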

comment by MakoYass · 2019-04-26T23:50:22.218Z · score: 1 (1 votes) · LW · GW

Strong upvote, very good to know

Agent A might put her endowment towards goal X, while agent B will use her own resources to pursue some goal Y

I internalised the meaning of these variables only to find you didn't refer to them again. What was the point of this sentence?

comment by Rana Dexsin · 2019-04-25T17:41:09.673Z · score: 7 (3 votes) · LW · GW

But jointly constructing a successor with compromise values and then giving them the reins is something humans can sort of do via parenting, there's just more fuzziness and randomness and drift involved, no? That is, assuming human children take a bunch of the structure of their mindsets from what their parents teach them, which certainly seems to be the case on the face of it.

comment by Wei_Dai · 2019-04-26T04:06:44.016Z · score: 8 (4 votes) · LW · GW

Yes, but humans generally hand off resources to their children as late as possible (whereas the AIs in my scheme would do so as soon as possible) which suggests that coordination is not the primary purpose for humans to have children.

comment by MakoYass · 2019-04-25T22:50:05.928Z · score: 5 (4 votes) · LW · GW

I'm pretty sure nobility frequently arranged marriages to do exactly this, for this purpose, to avoid costly conflicts.

comment by Kaj_Sotala · 2019-04-25T06:31:34.918Z · score: 14 (5 votes) · LW · GW

Humans tend to highly value their own personal experiences - getting to do things that feel fun, acquiring personal status, putting intrinsic value on their own survival, etc. This limits the extent to which they'll co-operate with a group, since their own interests and the interests of the group are only partially the same. AIs with less personal interests would be better incentivized to coordinate - if you only care about the amount of paperclips in the universe, you will be able to better further that goal with others than if each AI was instead optimizing for the amount of paperclips that they personally got to produce.

Some academics argue that religion etc. evolved for the purpose of suppressing personal interests and giving them a common impersonal goal, partially getting around this problem. I discussed this and its connection to these matters a bit in my old post Intelligence Explosion vs. Co-operative Explosion [LW · GW].

comment by michaelcohen (cocoa) · 2019-04-30T14:47:38.455Z · score: 1 (1 votes) · LW · GW

Have you thought at all about what merged utility function two AIs would agree on? I doubt it would be of the form $\lambda U_1 + (1-\lambda) U_2$.

comment by Larks · 2019-05-02T20:51:24.945Z · score: 10 (2 votes) · LW · GW

Critch wrote a related paper:

Existing multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine’s policy will prioritize each player’s interests over time. Assuming the players have reached common knowledge of their situation, this paper derives a recursion that any Pareto optimal policy must satisfy. Two qualitative observations can be made from the recursion: the machine must (1) use each player’s own beliefs in evaluating how well an action will serve that player’s utility function, and (2) shift the relative priority it assigns to each player’s expected utilities over time, by a factor proportional to how well that player’s beliefs predict the machine’s inputs. Observation (2) represents a substantial divergence from naive linear utility aggregation (as in Harsanyi’s utilitarian theorem, and existing MORL algorithms), which is shown here to be inadequate for Pareto optimal sequential decision-making on behalf of players with different beliefs.

Toward negotiable reinforcement learning: shifting priorities in Pareto optimal sequential decision-making
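Observation (2) can be illustrated with a few lines: each player's priority is multiplied by the probability that player's beliefs assigned to what the machine actually observed, then renormalized. This is only a sketch of that qualitative shape, not the paper's full recursion (which also involves each player's expected utilities and the machine's policy); the function name and the coin-flip beliefs are hypothetical.

```python
from typing import List, Sequence

def update_priorities(weights: Sequence[float], likelihoods: Sequence[float]) -> List[float]:
    """Reweight each player's priority by the probability that player's beliefs
    assigned to the observation that actually arrived, then renormalize."""
    unnormalized = [w * p for w, p in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical: player 1 expects "heads" with probability 0.8, player 2 with 0.2,
# and the machine observes heads three times in a row.
weights = [0.5, 0.5]
for _ in range(3):
    weights = update_priorities(weights, [0.8, 0.2])
print([round(w, 3) for w in weights])  # [0.985, 0.015]
```

This is the sense in which the merged objective "skews towards whoever's predictions are more correct": priority flows to the better predictor much as posterior mass does in Bayesian updating.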

comment by Wei_Dai · 2019-04-30T16:04:20.017Z · score: 5 (2 votes) · LW · GW

Stuart Armstrong wrote a post [LW · GW] that argued for merged utility functions of this form (plus a tie-breaker), but there are definitely things, like different priors and logical uncertainty, which the argument doesn't take into account, that make it unclear what the actual form of the utility function would be (or if the merged AI would even be doing expected utility maximization). I'm curious what your own reason for doubting it is.

comment by michaelcohen (cocoa) · 2019-05-01T00:12:04.433Z · score: 5 (3 votes) · LW · GW

One utility function might turn out much easier to optimize than the other, in which case the harder-to-optimize one will be ignored completely. Random events might influence which utility function is harder to optimize, so one can't necessarily tune in advance to try to take this into account.
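A toy illustration of this point, assuming each AI's utility is linear in the resources devoted to its own goal; the `rate` parameters are hypothetical stand-ins for how easy each goal is to optimize, and `lam` is a fixed weight on the first utility. Under a fixed weighted sum the optimum is a corner: every resource goes to whichever goal has the higher weighted rate, and a chance change in the rates flips the whole allocation.

```python
def best_allocation(budget: float, lam: float, rate_1: float, rate_2: float,
                    steps: int = 100) -> float:
    """Brute-force the split of `budget` that maximizes the merged objective
    lam * U1 + (1 - lam) * U2, where U1 = rate_1 * x and U2 = rate_2 * (budget - x).
    Returns x, the resources devoted to goal 1."""
    best_x, best_value = 0.0, float("-inf")
    for i in range(steps + 1):
        x = budget * i / steps
        value = lam * rate_1 * x + (1 - lam) * rate_2 * (budget - x)
        if value > best_value:
            best_x, best_value = x, value
    return best_x

print(best_allocation(100, 0.5, rate_1=3.0, rate_2=1.0))  # 100.0: goal 2 gets nothing
print(best_allocation(100, 0.5, rate_1=1.0, rate_2=3.0))  # 0.0: goal 1 gets nothing
```

With diminishing returns (concave utilities) the optimum would usually be interior instead; the corner behavior comes from the linearity assumed here.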

One of the reasons was the problem of positive affine scaling preserving behavior, but I see Stuart addresses that.

And actually, some of the reasons for thinking there would be more complicated mixing are going away as I think about it more.

EDIT: yeah, if they had the same priors and did unbounded reasoning, I wouldn't be surprised anymore if there exists a $\lambda$ that they would agree to.

comment by avturchin · 2019-04-26T09:39:42.874Z · score: 1 (3 votes) · LW · GW

This may fall into the following type of reasoning: "Superintelligent AI will be super in any human capability X. Humans can cooperate. Thus SAI will have a superhuman capability to cooperate."

The problem with such a conjecture is that if we take an opposite human quality not-X, SAI will also have a superhuman capability in it. For example, if not-X = cheating, then superintelligent AI will have a superhuman capability in cheating.

However, SAI can't simultaneously be a super-cooperator and a super-cheater.

comment by Wei_Dai · 2019-04-26T11:14:58.771Z · score: 3 (1 votes) · LW · GW

I think superintelligent AI will probably have superhuman capability at cheating in an absolute sense, i.e., they'll be much better than humans at cheating humans. But I don't see a reason to think they'll be better at cheating other superintelligent AI than humans are at cheating other humans, since SAI will also be superhuman at detecting and preventing cheating.

comment by avturchin · 2019-04-26T12:06:04.719Z · score: 2 (1 votes) · LW · GW

But can a 10,000 IQ AI cheat a 1,000 IQ AI? If so, only equally powerful AIs will cooperate.

comment by avturchin · 2019-04-25T09:34:04.945Z · score: 0 (2 votes) · LW · GW

AI could have a superhuman capability to find win-win solutions and sell them as a service to humans in the form of market arbitrage, courts, or partner matching (e.g., Tinder).

Using this capability to find win-win solutions, AI will not have to "take over the world": it could negotiate its way to global power, and everyone will win because of it (at least initially).