A case for fairness-enforcing irrational behavior

post by cousin_it · 2024-05-16T09:41:30.660Z · LW · GW · 3 comments

There's a long-standing and possibly unsolvable puzzle about how AIs should behave in game-theoretic situations with each other. The simplest example is the Ultimatum Game, where player A proposes how a dollar should be split between A and B, and B either accepts or rejects. In case of rejection both A and B get nothing. There are many Nash equilibria, one for each possible split, making the game indeterminate.
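
For concreteness, here is a minimal sketch of the payoff structure in code (the function name and the choice to normalize the pot to 1 are mine, just for illustration):

```python
def ultimatum_payoffs(offer_to_b, b_accepts, pot=1.0):
    """Payoffs (A, B) in the one-shot Ultimatum Game.

    A offers `offer_to_b` out of `pot` to B and keeps the rest.
    If B rejects, both players get nothing.
    """
    if b_accepts:
        return pot - offer_to_b, offer_to_b
    return 0.0, 0.0
```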

You can put all kinds of complexities on top of the game, like making A and B computer programs that can draw conclusions about each other, but the essential indeterminacy remains: the players have to pick a point on the Pareto frontier, and since their interests there are directly opposed, it becomes a tug-of-war. The game is so simple that any complicated analysis seems almost hopeless.

However, when people play this game in reality, they seem to bring in other considerations rather than just choosing what's best for them. A person offered 20% of the pot will often reject the offer. The reason for such behavior seems to be a notion of fairness.

This points the way to how AIs could solve the puzzle as well. Imagine you're an AI forced to play some complicated ultimatum-type game with another AI. You could then ignore the strategic picture of the game entirely and focus only on what outcome seems "fair", in the sense that you and the other player get roughly equal amounts of whuffies (however those are measured). And if the other player offers you an unfair deal, you could "flip the table" and make them get nothing, even at a cost to yourself. As long as the "flip the table" option is available to you, this seems a viable approach.

Maybe this is a very simple idea, but it flips my understanding of game-theoretic situations on its head. Until today I thought that the game matrix, the set of actions available to the players, was the important part, and that things like each player's utility scaling were merely afterthoughts. But under the "fairness" view, the figure and the ground invert. Now we care only about comparing the players' utilities, making sure everyone gets roughly equal amounts of whuffies. The particular strategic details of each game matter less: as long as each player has access to a "flip the table" strategy, and is willing to use that strategy irrationally when the outcome seems unfair, that's enough.

Of course this can fail if the two players have incompatible views on fairness. For example, if player A thinks "taller people should get more food" and player B thinks "heavier people should get more food", and A is taller but B is heavier, the result is a food fight. So the focus shifts even deeper: we no longer think about rational behavior in games, or even about fairness as each player sees it, but about the processes that give rise to notions of fairness in players, and how to make those processes give compatible results.
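
As a toy illustration of that failure mode (the numbers are made up): each criterion, applied by the player it happens to favor, demands more than half the food, so no split looks fair to both sides.

```python
def share_by_height(me, them):      # A's notion: "taller people should get more food"
    return me["height"] / (me["height"] + them["height"])

def share_by_weight(me, them):      # B's notion: "heavier people should get more food"
    return me["weight"] / (me["weight"] + them["weight"])

a = {"height": 190, "weight": 70}   # A is taller
b = {"height": 160, "weight": 100}  # B is heavier

a_demand = share_by_height(a, b)    # ~0.54 of the food, by A's lights
b_demand = share_by_weight(b, a)    # ~0.59 of the food, by B's lights

# The demands sum to more than the whole pot, so each side sees the
# other's proposal as unfair, both flip the table, and nobody eats.
print(a_demand + b_demand)          # ~1.13
```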

What would that mean for negotiation between AIs in practice? Let's say a human-built AI travelling between the stars meets an alien AI, and they end up in an ultimatum-type situation regarding the fate of the entire universe. And further, imagine that the alien AI has the upper hand. But the human AI can still be coded to act like this:

  1. Does the situation contain another agent getting whuffies from it?

  2. Is the other agent acting so as to give itself unfairly high whuffies and give me unfairly low whuffies?

  3. Do I have access to a "flip the table" action, denying whuffies to the other agent even at cost to myself?

  4. If yes, take it!
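
In code, that checklist might look roughly like the sketch below. Everything in it is a stand-in: the whuffie estimates, the fairness threshold, and the Situation fields are all hypothetical placeholders for whatever the AI actually computes.

```python
from dataclasses import dataclass

FAIRNESS_TOLERANCE = 0.2   # how far below an equal split we tolerate (assumed value)

@dataclass
class Situation:
    """Hypothetical stand-in for the AI's model of the encounter."""
    my_whuffies: float       # whuffies I expect from the deal on the table
    their_whuffies: float    # whuffies the other agent expects to get
    can_flip_table: bool     # is there an action that denies them whuffies, at a cost to me?

def fairness_enforcing_action(s: Situation) -> str:
    """The checklist above, step by step."""
    if s.their_whuffies <= 0:                     # 1. is another agent getting whuffies from this?
        return "go along with the deal"
    my_share = s.my_whuffies / (s.my_whuffies + s.their_whuffies)
    unfair = my_share < 0.5 - FAIRNESS_TOLERANCE  # 2. is the split unfairly tilted against me?
    if unfair and s.can_flip_table:               # 3. do I have a "flip the table" action?
        return "flip the table"                   # 4. if yes, take it!
    return "go along with the deal"

# Example: the alien AI has the upper hand and grabs 90% of the whuffies.
print(fairness_enforcing_action(Situation(my_whuffies=1, their_whuffies=9, can_flip_table=True)))
# -> "flip the table"
```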

Note that this is technically irrational. If the alien AI came with a precommitment of its own saying "demand all whuffies no matter what", the rational thing would be for us to accept, yet we still reject. However, I think this approach has a nice quality to it: it cuts the arms race short. We could have everyone in the universe spending effort making their AIs better at ultimatum tug-of-wars; or we could build an "irrational" AI that simply goes for fairness no matter what, in which case there's no incentive for others to build better strategies, and the outcome ends up alright for everyone.

3 comments


comment by Measure · 2024-05-16T16:00:30.816Z · LW(p) · GW(p)

The typical algorithm I've seen for enforcing fairness is to reject unfair offers randomly with some probability such that the counterparty's EV decreases with increasing unfairness of the offer. This incentivizes fair offers without completely burning the possibility of partial cooperation between agents with slightly differing notions of fairness.
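
For instance, a minimal sketch of one such rule (the linear acceptance probability and the fair_share parameter are just illustrative choices):

```python
import random

def respond_to_offer(offer_share, fair_share=0.5):
    """Accept an offer of `offer_share` of the pot with probability
    offer_share / fair_share, capped at 1.  With fair_share = 0.5, the
    proposer's expected take, (1 - offer_share) * p_accept, is maximized
    by the fair 50/50 offer and shrinks as the offer gets less fair."""
    p_accept = min(1.0, offer_share / fair_share)
    return random.random() < p_accept
```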

comment by cousin_it · 2024-05-16T16:24:17.381Z · LW(p) · GW(p)

Yeah, this works.

I'm a bit torn on where fairness should be properly placed. For example, if Alice is the only one deciding and Bob has no power to punish her at all, it seems like fairness should still come into Alice's consideration. So maybe it should be encoded into the utility function, not into the strategic behavior running on top of it. But that would mean we need to take actions that benefit random aliens while going about our daily lives, and I'm not sure we want that.

comment by Measure · 2024-05-16T16:29:12.921Z · LW(p) · GW(p)

You might precommit to fairness if you don't know which side of the game you'll be playing or if you anticipate being punished by onlookers, but I don't know if I want my AI to be "fair" to an alien paperclipper that can't retaliate.