If you don't know the name of the game, just tell me what I mean to you

stuart_armstrong

If you don't know the name of the game, just tell me what I mean to you

post by Stuart_Armstrong · 2010-10-26T13:43:57.762Z · LW · GW · Legacy · 26 comments

  Multiple games
  Invariance
  Kalai-Smorodinsky and Nash's revenge
  Edit: the mystery of µ (mathy, technical, and not needed to understand the rest of the post)
None
26 comments

Following: Let's split the Cake

tl;dr: Both the Nash Bargaining solution (NBS), and the Kalai-Smorodinsky Bargaining Solution (KSBS), though acceptable for one-off games that are fully known in advance, are strictly inferior for independent repeated games, or when there exists uncertainty as to which game will be played.

Let play a bargaining game, you and I. We can end up with you getting €1 and me getting €3, both of us getting €2, or you getting €3 and me getting €1. If we fail to agree, neither of us gets anything.

Oh, and did I forget to mention that another option was for you to get an aircraft carrier and me to get nothing?

Think of that shiny new aircraft carrier, loaded full with jets, pilots, weapons and sailors; think of all the things you could do with it, all the fun you could have. Places to bomb or city harbours to cruise majestically into, with the locals gaping in awe at the sleek powerful lines of your very own ship.

Then forget all about it, because Kalai-Smorodinsky says you can't have it. The Kalai-Smorodinsky bargaining solution to this game is 1/2 of a chance of getting that ship for you, and 1/2 of a chance of getting €3 for me (the Nash Bargaining Solution is better, but still not the best, as we'll see later). This might be fair; after all, unless you have some way of remunerating me for letting you have it, why should I take a dive for you?

But now imagine we are about to start the game, and we don't know the full rules yet. We know about the €'s involved, that's all fine, we know there will be an offer of an aircraft carrier; but we don't know who is going to get the offer. If we wanted to decide on our bargaining theory in advance, what would we do?

Obviously not use the KSBS; it gives 1/4 of an aircraft carrier to each player. One excellent solution is simple: whoever has the option of the aircraft carrier... gets it. This gives us both an expected gain of 1/2 aircraft carrier before the game begins, superior to the other approaches.

Let formalise this uncertainty over possible games. A great game (GG) is a situation where we are about to play one of games g_j, each with probability p_j. My utility function is U₁, yours is U₂. Once we choose a bargaining equilibrium which results in outcomes o_j with utilities (a_j,b_j) for each game, then our expected utility gain is (Σp_j a_j, Σp_j b_j)=Σp_j(a_j, b_j) Since the outcome utilities for each games are individually convex, the set S of possible outcome (expected) utilities for the GG are also convex.

Now, we both want a bargaining equilibrium that will be Pareto optimal for GG. What should we do? Well, first, define:

A μ-sum maximising bargaining solution (μSMBS) is one that involves maximising the expectation of (U₁+µU₂) in each game, for a positive μ.

We'll extend the definition to μ=∞ by setting that to be the bargaining solution that involves maximising U₂. Now this definition isn't complete; there are situations where maximising (U₁+µU₂) doesn't give a single solution (such as those where µ=1, we have to split €10 between us, and we both have utilities where 1 utiliton=1€). These situations are rare, but we'll assume here that any μSMBS comes complete with a tie-breaker method for selecting unique solutions in these cases.

The first (major) result is:

For any positive µ, μSMBS is Pareto-optimal for any GG.

To prove this, let o_j be the outcomes for game g_j, using μSMB, with expected utilities (a_j,b_j). The GG has expected utilities (a,b) =∑p_j (a_j,b_j). Let fµ be the function that maps (x,y) to x+µy. The μSMBS is equivalent with maximising the value of fµ for each game.

So now let q_j be another possible outcome set, and expected utility (c,d), assume to be strictly superior, for both players, to (a,b). Now, because µ is positive, (c,d) > (a,b) implies c>a and µd>µb, so implies fµ(c,d) > fµ(a,b). However, by the definition of μSMBS, we must have fµ(a_j,b_j) ≥ fµ(c_j,d_j). Since fµ is linear, fµ(a,b)=∑p_j fµ(a_j,b_j) ≥ ∑p_j fµ(c_j,d_j) = fµ(c,d). This contradicts the assumption that (c,d) > (a,b), and hence proves that (a,b) is Pareto-optimal.

This strong result has a converse, namely:

For any bargaining solution that is Pareto-optimal for a given GG, there is a μSMBS that produces the same outcome, in each game g_j.

Let o_j be the outcomes for a given Pareto-optimal bargaining solution, with expected utilities (a_j,b_j), and GG having expected utilities (a,b) =∑p_j (a_j,b_j). The set S of possible expected utilities for GG is convex, and since (a,b) is Pareto-optimal, it must lie on the boundary. Hence there exists a line L through (a,b) such that S lies entirely to the left of this line. Let -μ be the slope of L. Thus, there does not exist any (c,d) in the expected utilities outcomes with fµ(c,d) > fµ(a,b).

Now, if there were to exist an outcome q_k for the game g_k with expected utilities (c_k,d_k) such that fµ(c_k,d_k) > fµ(a_k,b_k), then the expected utility for the outcomes o_j with o_k replaced with p_k, would be (c,d) = (a,b) + q_k ((c_k,d_k) - (a_k,b_k)). This has fµ(c,d) > fµ(a,b), contradicting the previous result. Thus (a_j,b_j) always maximise fµ in g_j, and hence this bargaining solution produces the same results as μSMBS (with a given tie-breaking procedure, if needed).

So the best solution, if you are uncertain what the games are you could be playing, is to fix a common relative measure of value, and then maximise that, ignoring considerations of fairness or any other. To a crude approximation, capitalism is like this: every game is supposed to maximise money as much as it can.

Multiple games

For the moment, we've been considering a single game, with uncertainty as to which game is going to be played. The same result goes through, however, if you are expecting to play multiple games in a row. One caveats is needed: the games must be independent of each other.

The result holds, for instance, in two games with stakes €10, as long as our utilities are linear in these (small) amounts of money. It does not hold if the first game's stake is a left shoe and the second's is a right shoe, for there the utilities of the outcomes are correlated: if I win the left shoe, I'm much more likely to value the right shoe in the subsequent game.

Invariance

Maximising summed utility is invariant under translations (which just add a single constant to each utility). It is not, of course, invariant under scalings, and it would be foolish indeed to first decide on μ and then allow players to rescale their utilities. In general the μ is not a real number, but a linear isomorphism between the two utilities, invariantly defined by some process.

Kalai-Smorodinsky and Nash's revenge

So, it seems established. μSMBS is the way to go. KSBS and NBS are loser solutions, and should be discarded. As you'd imagine, it's not quite so simple...

The problem is, which µ? Fixing that µ is going to determine your expected utility for any GG, so it's an important value. And each player is going to have a different priority it, trying to minimise the contribution of the other agent's utility. So there will have to be some... bargaining. And that bargaining will be a one-shot bargaining deal, not a repeated one, so there is no superior way of going about it. Use KSBS or NBS or anything like that if you want; or set µ to reflect your joint valuing of a common currency ($, € or negentropy); but you can't get around the fact that you're going to have to fix that µ somehow. And if you use μ'SMBS to do so, you've just shifted the battle to the value of μ'...

Edit: the mystery of µ (mathy, technical, and not needed to understand the rest of the post)

There has been some speculation on the list as to the technical meaning of U₁+µU₂ and µ. To put this on an acceptable rigorous footing, let u₁ and u₂ be any two representatives of U₁ and U₂ (in that u₁ and u₂ are what we would normally term as "utility functions", and U₁ and U₂ are the set of utility functions that are related to these by affine transformations). Then µ is a function from the possible pairs (u1, u2) to the non-negative reals, with the property that it is equivariant under linear transformations of u₁ and inverse-equivariant under linear transformations of u₂ (in human speak: when u₁ gets scaled bigger, so does µ, and when u₂ gets scaled bigger, µ gets scaled smaller), and invariant under translations. Then we can define U₁+µU₂ as the set of utility functions for which u₁+µu₂ is a representative (the properties of µ make this well defined, independently of our choices of u₁ and u₂). Whenever µ≠0, there is a well defined µ^-1, with the property that µ^-1U₁+U₂ = U₁+µU₂. Then the case μ=∞ is defined to be μ^-1=0.

26 comments

Comments sorted by top scores.

comment by Vladimir_Nesov · 2010-10-27T18:30:55.396Z · LW(p) · GW(p)

Your post seems to point out that one can consider mixed coordinated strategies on the global game (where in first round you are told which game you play, and in the second round you play it), with the set of payoffs thus obtained as the convex closure of pure strategy payoffs, in particular payoffs on Pareto frontier of the global game being representable as linear (convex) combination of payoffs on Pareto frontiers of individual games, and in an even more special case, this point applies to any notion of "fair" solution.

The philosophical point seems to be the same as in Counterfactual Mugging: you might want to always follow a strategy you'd (want to) choose before obtaining the knowledge you now possess (with that strategy itself being conditional, and to be used by passing the knowledge you now possess as parameter), in this case applied to knowledge about which game is being played. In other words, try respecting reflective consistency even if "it's already too late".

P.S.

In general the μ is not a real number, but a linear isomorphism between the two utilities, invariantly defined by some process.

"Isomorphism" (and "between") seems like a very wrong word to use here. Linear combination of two utilities, perhaps.

Replies from: Perplexed, Stuart_Armstrong, GuySrinivasan

↑ comment by Perplexed · 2010-10-28T00:59:43.923Z · LW(p) · GW(p)

In general the μ is not a real number, but a linear isomorphism between the two utilities, invariantly defined by some process.

"Isomorphism" (and "between") seems like a very wrong word to use here. Linear combination of two utilities, perhaps.

I suspect you misunderstand. The two isomorphic utilities (i.e. utility functions) are U2 and μU2. You seem to be referring to the linear combination of U1 and U2.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2010-10-28T09:11:49.142Z · LW(p) · GW(p)

I've added an addendum to the post, laying out what μ actually is.

Though the whole addendum could be summarised as: yes, μ is pretty much what you'd expect. :-)

↑ comment by Stuart_Armstrong · 2010-10-28T09:01:54.560Z · LW(p) · GW(p)

Yes, that's what Perplexed noticed. What seems interesting is that getting a Pareto optimal result in the GG forces both players to follow Counterfactual Mugging style reasoning.

I've added an addendum to the post, laying out what μ actually is.

↑ comment by SarahNibs (GuySrinivasan) · 2010-10-28T00:47:44.976Z · LW(p) · GW(p)

"Isomorphism" (and "between") seems like a very wrong word to use here. Linear combination of two utilities, perhaps.

I feel like mu isn't the really important part... it's more like mu = A x B where A encodes the translation from "one util" in U1 to "one util" in U2 and B encodes the relative amounts the agents matter in the deal. It seems like A is the bit that remains fairly constant across deals and short periods of time, while B can be differently bargained each time relative to the specific deal in question.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2010-10-28T09:14:12.593Z · LW(p) · GW(p)

If B differes over time, you'll have outcomes that are not Pareto optimal in total. Idealised utility maximising agents should establish μ once and for all at the beggining; each change in μ is paid for in decreased utility.

I'm not claiming that human agents do, or should, behave this way.

comment by PhilGoetz · 2011-01-08T22:13:15.446Z · LW(p) · GW(p)

So the best solution, if you are uncertain what the games are you could be playing, is to fix a common relative measure of value, and then maximise that, ignoring considerations of fairness or any other.

I didn't see how this follows from anything that comes before it. Best in what way? Best for yourself?

comment by Relsqui · 2010-10-27T09:00:57.371Z · LW(p) · GW(p)

It wouldn't help, but I keep imagining Solomon. "We shall cut the aircraft carrier in half ..."

comment by Armok_GoB · 2010-11-01T19:05:32.126Z · LW(p) · GW(p)

In some circumstances, like natural humans sharing an actual pie or romantic dinner etc. might dopamine amounts be a good µ currency?

comment by Arenamontanus · 2010-10-27T13:20:49.307Z · LW(p) · GW(p)

It seems that the bargaining for mu will be dependent on your priors about what games will be played. That might help fix the initial mu-bargaining.

comment by Perplexed · 2010-10-26T19:15:56.548Z · LW(p) · GW(p)

another option was for you to get an aircraft carrier and me to get nothing ... The Kalai-Smorodinsky bargaining solution to this game is 1/2 of a chance of getting that ship for me

Huh? This is a typo, right? Who has the chance at that carrier - you or me?

And then it goes downhill from there. ... With the final sentence admitting that the whole post was almost pointless.

Maybe it is just me, but I really dislike being dragged along on a journey when my "guide" has not clearly stated where we are going and how long it will take to get there.

If you have a plan here, tell us what it is. If you are providing a tutorial, please provide links to your source authority. If you are providing original research, please provide references to relevant existing research and indicate what you are doing that has not been done before.

Downvoted.

Replies from: Stuart_Armstrong, Mass_Driver, SilasBarta

↑ comment by Stuart_Armstrong · 2010-10-27T07:55:32.250Z · LW(p) · GW(p)

Maybe I'm too close to the subject, but I feel there are two major new results in here, neither of which I knew before, and neither of which I've found in the literature . They are:

1) NBS and KSBS are not pareto optimal when used in situations where the game is uncertain, or iterated independently.

2) Solutions that are Pareto-Optimal are all of a particular format, that involve linear isomorphisms between the two utilities (technically, classes of affine isomorphisms with the same linear part).

The last sentence doesn't admit the whole post was pointless; it admits the approach doesn't completely solve the bargaining problem. But neither do NBS and KSBS - if they did, there wouldn't be two of them, and they would be Pareto optimal.

If you are providing original research, please provide references to relevant existing research and indicate what you are doing that has not been done before.

I am providing original research, I haven't found any trace of this in the litterature (though I didn't do anything like a full litterature review), so can't provide references, and the whole of the results are my idea, so, as far as I'm aware, nothing of what I am doing has been done before.

Replies from: Perplexed

↑ comment by Perplexed · 2010-10-27T16:31:21.716Z · LW(p) · GW(p)

I feel there are two major new results in here, neither of which I knew before, and neither of which I've found in the literature . They are:

1) NBS and KSBS are not pareto optimal when used in situations where the game is uncertain, or iterated independently.

But you don't show this, you simply claim it without proof or explanation when you write:

If we wanted to decide on our bargaining theory in advance, what would we do? ... Obviously not use the KSBS; it gives 1/4 of an aircraft carrier to each player.

No explanation, no calculations. Maybe if you had written "a quarter carrier plus 3/4 euros" a reader could reconstruct your thinking. (And you don't provide even an example of the "iterated independently" part of the claim. Presumably, you are "iterating" with changes to the game payoffs between each iteration.) It also is extremely puzzling that in this posting you are saying that NBS and KSBS are not Pareto optimal when in the last posting, it seemed that they were Pareto by definition. What has changed?

If you had analyzed this, your posting here might have been more clear and more interesting. You would have pointed out that if our protagonists wait to bargain until the coin has been flipped and it is by-that-time known which bargainer has the chance at a carrier, then you get a post-flip Pareto-optimal KSBS or NBS bargain which is not pre-flip Pareto optimal. That is, the numerical meaning of Pareto-optimality changes when the coin is flipped. Or, more precisely, the meaning changes when the bargainers learn the results of the coin flip - when their best model of the world changes.

Or, in the local jargon, to use the phrase Pareto-optimal as if it were an objective property of a given bargain is to commit the "mind projection fallacy".

To summarize my disagreement with this one of your claims - you have not shown anything wrong with KSBS or NBS - you have merely shown that the "optimal" bargain depends upon what the bargainers know, and that sometimes what you know hurts you.

2) Solutions that are Pareto-Optimal are all of a particular format, that involve linear isomorphisms between the two utilities (technically, classes of affine isomorphisms with the same linear part).

Perhaps because my undergrad degree was in economics, this item struck me as so trivial that it didn't seem even worth mentioning. But maybe you are correct that it is worth spelling out in detail. However, even here there are interesting points that you could have made, but didn't.

The first point is that your μ factor, as well as U1 and U2, are not pure real numbers, they are what a scientist or engineer would call dimensioned quantities. Say that U1 is denominated in apples, and U2 is denominated in oranges. So it is mathematical nonsense to even try to add U1 to U2 (as naive utilitarianism requires) unless a conversion factor μ is provided (denominated in apples/orange).

The second is to point out more clearly that every bargaining solution (including KSBS and NBS) is equivalent to a choice of a μ such that U1 +μU2 gets maximized. Your real claim here is the insistence upon dynamic consistency. Choose μ before the coin gets flipped, you advise, and then stick with that choice even after you know the true state of the world.

And then, if you are familiar with the work of John Rawls, point out that this advice is roughly equivalent to Rawls's "Veil of Ignorance".

Now that might have been interesting.

Another direction you might have gone with this series is to continue the standard textbook development of bargaining theory - first covering Nash's 1953 paper in which he shows how to select a disagreement point taking into account the credible threats which each bargainer can make against the other, and then continuing to the Harsanyi/Seldon theory for games with incomplete information, and then continuing on through the modern theory of mechanism design. Smart people have been working hard on these kinds of problems for more than 50 years, so there is little a smart amateur can add unless he first becomes familiar with existing results. My main complaint about your attempt is that you quite clearly are not familiar. This stuff is not rocket science. Papers and tutorials are available online. Go get them.

Replies from: Perplexed, Stuart_Armstrong

↑ comment by Perplexed · 2010-10-28T01:11:42.532Z · LW(p) · GW(p)

... sometimes what you know hurts you.

Actually, this is misstated. Misstated as badly as the equivalent claim that in some games it helps to be irrational.

What helps is not being irrational, the thing that helps is being thought irrational (even if the only way to be thought irrational is to actually be irrational).

And in this case, similarly, it is not what you know that hurts you, it is what the other guy knows.

Replies from: timtyler

↑ comment by timtyler · 2010-10-29T17:06:33.320Z · LW(p) · GW(p)

Hmm. Did you see:

"Information Hazards: A Typology of Potential Harms from Knowledge"

http://www.nickbostrom.com/information-hazards.pdf

...?

↑ comment by Stuart_Armstrong · 2010-10-27T17:52:18.608Z · LW(p) · GW(p)

It also is extremely puzzling that in this posting you are saying that NBS and KSBS are not Pareto optimal when in the last posting, it seemed that they were Pareto by definition. What has changed?

You explained that yourself: they are Pareto Optimal in a single game, but are not when used as sub-solutions to a game of many parts.

Well, you seem to have understood everything pretty well, without the need for extra information. And yes, I know about comparing utility functions, and yes I know about Rawl's Veil of Ignorance, and its relevance to this example; I just didn't want to clutter up a post that was long enough already.

The insistence on dynamic consistency is to tie it in with UDT agents, who are dynamically consistent. And the standard solutions to games of incomplete information do not seem to be dynamically consistent, so I did not feel they are relevant here.

The first point is that your μ factor, as well as U1 and U2, are not pure real numbers, they are what a scientist or engineer would call dimensioned quantities.

U1 and U2 are equivalent classes of functions from a measurable set of possible worlds to the reals, where the equivalence classes are defined by affine transformations on the image set. μ is a functional that takes an element of u1 of U1 and an element u2 of U2 and maps them to a utility function equivalence class U3. It has certain properties under affine transformations of u1 and u2, and certain other properties if U1 and U2 and replaced with U1' and U2', and these properties are enough to uniquely characterise μ, up to a choice of elements in the real projective line with some restrictions.

But U1+μU2 is an intuitive summary of what's going on.

Replies from: Perplexed, Stuart_Armstrong

↑ comment by Perplexed · 2010-10-27T17:59:46.020Z · LW(p) · GW(p)

Well, you seem to have understood everything pretty well, without the need for extra information.

Actually, I didn't understand at all on first reading. I only came up with the "dynamic consistency" interpretation on a third draft of the grandparent, as I struggled to explain my earlier complaint more fully.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2010-10-28T08:43:03.413Z · LW(p) · GW(p)

I didn't actually put in dynamic consistency by hand - it just seems that anything that is Pareto optimal in expected utility for GG requires dynamical consistency.

Which is a cool result.

Replies from: Perplexed

↑ comment by Perplexed · 2010-10-28T15:47:35.892Z · LW(p) · GW(p)

anything that is Pareto optimal in expected utility for GG requires dynamical consistency.

Which is a cool result.

In my opinion, it is an intuitively obvious, but philosophically suspect, result.

Obvious because of course you travel farthest if you continue in a straight line, refusing to change course in mid stream.

Suspect because you have received new information in midstream suggesting that your original course is no longer the direction you want to go. So isn't an insistence on consistency a kind of foolishness, a "hobgoblin of little minds"?

But then, arguing for consistency, it could be pointed out that we are allowed to take new information into account in adjusting our tactics so as to achieve optimal results - maximizing the acquisition of joint utility in accordance with our original goals. The only thing we are not allowed to do is to use the new information to adjust our notion of fairness.

But then, arguing against consistency, we must ask "Why not adjust our notion of fairness?" After all, fairness is not some standard bestowed upon us from heaven - it is something we constructed ourselves for an entirely practical purpose - fairness exists so as to bind together agents with divergent purposes so they can cooperate.

So, if the arrival of new information suggests that a new bargain should be struck, why not strike a new bargain? Otherwise we can get into a situation in which one or the other of the agents no longer has any reason to cooperate except for a commitment made back in his younger and more ignorant days.

So you can call the result "cool" if you wish. I'm going to call it puzzling and provocative. Yes, I know that the "rules" of cooperative game theory include free enforcement of commitments. What puzzles me, though, is why the agents are willing to commit themselves to a course of action which may seem foolish later.

In your example, they are, in a sense, trading commitments - the commitments are a kind of IOU (or, since they are conditional, a kind of lottery ticket). In effect, someone who makes a commitment is printing money, which can then be used in trade. An interesting viewpoint on bargaining - a viewpoint worth exploring, I think.

↑ comment by Stuart_Armstrong · 2010-10-27T17:54:53.713Z · LW(p) · GW(p)

Sorry, that last part of the response was unworthy; but if you're being condescending to me, I feel the immense urge to be condescending back.

Replies from: Perplexed

↑ comment by Perplexed · 2010-10-27T18:14:39.434Z · LW(p) · GW(p)

The last part of your response was unworthy? Don't apologize, I had it coming.

The last part of my ("not rocket science") response was unworthy? Well, I'll apologize, if you insist, but I really think that you did a good job with the first (tutorial) post, but a rather confused and confusing job with the second post, when you thought you were sharing original research.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2010-10-28T08:41:13.832Z · LW(p) · GW(p)

Well, the posts were actually written for the purpose of the second post, and the new results therein. The first one was tacked on as an afterthought, when I realised it would be nice to explain the background to people.

Once again, my ability to predict which post people on less wrong will like fails spectacularly.

↑ comment by Mass_Driver · 2010-10-27T04:51:45.764Z · LW(p) · GW(p)

I dunno, Perplexed; this isn't Wikipedia. The post is fairly well-signposted, good paragraph breaks, and most of it is just an informal mathematical proof of a simple but important concept in game theory. I read and digested the parts of it that interested me in 60 seconds, and enjoyed it.

Upvoted.

↑ comment by SilasBarta · 2010-10-26T21:08:38.164Z · LW(p) · GW(p)

Agreed.

comment by JGWeissman · 2010-10-26T16:57:22.905Z · LW(p) · GW(p)

Two UDT agents playing the one-shot deterministic bargaining game should be able to agree to the aircraft carrier/nothing result even without the possibility of compensation, as the UDT-measure of the world where one was offered the aircraft carrier would be the same as the world where the other got the offer, even though they find themselves in one of these worlds. So they can see this situation in the same way as standard game theory agents facing the uncertain game.

Replies from: Stuart_Armstrong

↑ comment by Stuart_Armstrong · 2010-10-26T17:29:18.182Z · LW(p) · GW(p)

They can. But they'll still be using a μSMBS in every individual game, which is a useful result in its own right.

If you don't know the name of the game, just tell me what I mean to you

Contents

26 comments