Anything you can do with n AIs, you can do with two (with directly opposed objectives)

post by jessicata (jessica.liu.taylor) · 2016-05-04T23:14:31.000Z · LW · GW · 2 comments


Summary: For any normal-form game, it's possible to cast the problem of finding a correlated equilibrium in this game as a 2-player zero-sum game. This seems useful because zero-sum games are easy to analyze and more resistant to collusion.


Consider the following class of games (equivalent to the class of normal-form games):

There are $n$ players. Player $i$'s action set is $\mathcal{A}_i$. After each player chooses their action $a_i \in \mathcal{A}_i$, the state $s$ results from these actions (perhaps stochastically), and each player receives utility $U_i(s)$.

Often, we are interested in finding a Nash equilibrium of a game like this. One strategy for this is to instantiate the players as agents. However, this could cause collusion; see here and here for previous writing on collusion. Zero-sum games seem more resistant to collusion (although maybe not 100% resistant). Additionally, 2-player zero-sum games are typically easier to reason about than $n$-player games.

So we might be interested in finding a Nash equilibrium using a zero-sum game. I don't actually know how to find a mixed Nash equilibrium, so instead I'll present a strategy for finding a correlated equilibrium (correlated equilibria are a superset of Nash equilibria and are computationally easier to find). Here's how it works:

  1. The actor chooses actions $a_1, \ldots, a_n$.
  2. The critic chooses a player index $i$, observes the action $a_i$, and suggests an alternative action $a_i'$.
  3. Flip a fair coin. If it comes up heads, observe the state $s$ that results from actions $a_1, \ldots, a_n$, and give the actor utility $U_i(s)$.
  4. If it comes up tails, observe the state $s$ that results from actions $a_1, \ldots, a_{i-1}, a_i', a_{i+1}, \ldots, a_n$, and give the actor utility $-U_i(s)$.
  5. Either way, the critic's utility is the negation of the actor's utility.
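The five steps above can be sketched as a single sampled round. This is only an illustration under stated assumptions: heads scores the actor's original profile with $+U_i(s)$ and tails scores the critic-modified profile with $-U_i(s)$, and every name here (`play_round`, `actor`, `choose_index`, `suggest`, `outcome`, `utility`) is hypothetical rather than anything from the post.

```python
import random

def play_round(actor, choose_index, suggest, outcome, utility, rng):
    """Sample one round of the actor/critic game.

    All arguments are hypothetical callables supplied by the caller:
      actor(rng)             -> tuple of actions (a_1, ..., a_n)
      choose_index(rng)      -> player index i picked by the critic
      suggest(i, a_i, rng)   -> alternative action a_i' (critic sees only a_i)
      outcome(actions, rng)  -> state s (possibly stochastic)
      utility(i, s)          -> U_i(s)
    Returns (actor_utility, critic_utility).
    """
    actions = tuple(actor(rng))                 # step 1
    i = choose_index(rng)                       # step 2: critic picks a player
    alt = suggest(i, actions[i], rng)           # step 2: critic proposes a_i'
    if rng.random() < 0.5:                      # step 3: heads, original profile
        state = outcome(actions, rng)
        actor_u = utility(i, state)
    else:                                       # step 4: tails, modified profile
        modified = actions[:i] + (alt,) + actions[i + 1:]
        state = outcome(modified, rng)
        actor_u = -utility(i, state)
    return actor_u, -actor_u                    # step 5: zero-sum payoffs
```

Averaging the first return value over many rounds gives a Monte Carlo estimate of the actor's expected utility for a fixed pair of policies.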

It will be useful to use the concept of an $\epsilon$-correlated equilibrium. While a correlated equilibrium is where no player can gain any expected utility by strategy modification, an $\epsilon$-correlated equilibrium is where no player can gain more than $\epsilon$ expected utility by strategy modification.
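Written out, one standard way to state this condition is the following. The notation is an assumption added for illustration: $\pi$ is the joint distribution over action profiles, $f$ is a strategy modification for player $i$, and $a_{-i}$ denotes the other players' actions. The distribution $\pi$ is an $\epsilon$-correlated equilibrium if, for every player $i$ and every $f : \mathcal{A}_i \to \mathcal{A}_i$,

$$\mathbb{E}_{a \sim \pi}\left[ U_i(s) \mid \text{profile } (f(a_i), a_{-i}) \text{ is played} \right] - \mathbb{E}_{a \sim \pi}\left[ U_i(s) \mid \text{profile } (a_i, a_{-i}) \text{ is played} \right] \le \epsilon .$$

Setting $\epsilon = 0$ recovers the usual definition of a correlated equilibrium.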

Note that the critic's policies correspond to mixtures of strategy modifications; the critic can be seen as jointly picking a player and a strategy modification for the player. Furthermore, the critic's expected utility is half the expected utility gained by the corresponding player for the average strategy modification in this mixture:

$$\mathbb{E}[U_{\text{critic}}] = \frac{1}{2}\left( \mathbb{E}[U_i(s) \mid \text{tails}] - \mathbb{E}[U_i(s) \mid \text{heads}] \right)$$

because the critic's expected utility is half the difference between player $i$'s expected utility given strategy modification ($\mathbb{E}[U_i(s) \mid \text{tails}]$) and player $i$'s expected utility given no strategy modification ($\mathbb{E}[U_i(s) \mid \text{heads}]$). Some facts result:

  1. Suppose the actor chooses from some joint distribution that is an $\epsilon$-correlated equilibrium of the original game. Then the actor's expected utility is at least $-\epsilon/2$ regardless of the critic's policy.
  2. Suppose the actor chooses from some joint distribution that is not an $\epsilon$-correlated equilibrium of the original game. Then the critic's best response results in a utility of no more than $-\epsilon/2$ for the actor.
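To make these two facts concrete, here is a minimal sketch that computes the actor's expected utility against a best-responding critic in a small game. Everything specific is an assumption for illustration: the game is a 2-player Chicken with made-up payoffs, the state is taken to be the action profile itself (no randomness), and the names `PAYOFF`, `ACTIONS`, `best_modification_gain`, and `actor_value` are hypothetical.

```python
from fractions import Fraction

# Hypothetical Chicken payoffs: PAYOFF[(a_1, a_2)] = (U_1, U_2).
PAYOFF = {
    ("C", "C"): (6, 6),
    ("C", "D"): (2, 7),
    ("D", "C"): (7, 2),
    ("D", "D"): (0, 0),
}
ACTIONS = [("C", "D"), ("C", "D")]  # action set of each player

def best_modification_gain(dist, i):
    """Largest expected gain player i can get from any strategy modification.

    dist maps action profiles to probabilities. A strategy modification maps
    each observed action a_i to a replacement, so the best replacement can be
    chosen separately for each observed action.
    """
    total = Fraction(0)
    for observed in ACTIONS[i]:
        best = Fraction(0)  # keeping the observed action gains nothing
        for alt in ACTIONS[i]:
            gain = Fraction(0)
            for profile, prob in dist.items():
                if profile[i] != observed:
                    continue
                modified = profile[:i] + (alt,) + profile[i + 1:]
                gain += prob * (PAYOFF[modified][i] - PAYOFF[profile][i])
            best = max(best, gain)
        total += best
    return total

def actor_value(dist):
    """Actor's expected utility when the critic best-responds to dist."""
    eps = max(best_modification_gain(dist, i) for i in range(len(ACTIONS)))
    return -eps / 2
```

Under these assumed payoffs, the distribution putting probability 1/3 on each of (C, C), (C, D), and (D, C) is a correlated equilibrium, so `actor_value` returns 0. Putting all mass on (C, C) lets either player gain 1 by switching to D, so the actor's value drops to -1/2.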

Correlated equilibria always exist, so at a Nash equilibrium in the zero-sum game, the actor always outputs a correlated equilibrium and gets expected utility 0.

Perhaps in real life, it is inconvenient to observe the state resulting from actions $a_1, \ldots, a_{i-1}, a_i', a_{i+1}, \ldots, a_n$, because we can only observe the state by outputting actions, and maybe we always want to output actions from a correlated equilibrium. In this case we could use counterfactual oversight to usually output $a_1, \ldots, a_n$, but run the procedure above occasionally to gather training data. It's not clear when it's acceptable to occasionally output strategy-modified action profiles (instead of action profiles from a correlated equilibrium).

2 comments


comment by Stuart_Armstrong · 2016-05-10T10:10:15.000Z · LW(p) · GW(p)

We have to be careful that the game really is zero-sum. Some setups with reward signals seem zero-sum, but if the AIs hack them, they could become positive-sum.

Replies from: jessica.liu.taylor
comment by jessicata (jessica.liu.taylor) · 2016-05-10T19:30:09.000Z · LW(p) · GW(p)

This is true.