# Taking into account another's preferences

post by GuySrinivasan · 2012-12-10T05:06:00.819Z · score: 10 (11 votes) · LW · GW · Legacy · 19 comments

Question regarding: If you don't know the name of the game, just tell me what I mean to you

I've been thinking about what it means to prefer that someone else achieve their preferences. In particular, what happens if you and I both prefer to adopt and chase after each other's preferences to some extent. This has clear good points, like cooperating and making more resources more fungible and thus probably being more efficient and achieving more preferences overall, and clear failure modes, like "What do you want to do? -- I don't know, I want to do whatever *you* want to do. Repeat."

My first thought: okay, simple, I'll just define my utility function U' to be U + *a*V where U was my previous utility function and V is your utility function and a is an appropriate scaling factor as per Stuart's post and then I can follow U'!^{1}

This has a couple problems. First, if you're also trying to change your actions based on what I want, there's a circular reference issue. Second, U already contains part of V by definition, or something^{2}.

My second thought: Fine, first we'll both factor our preferences into U = U_{1} + *a*V where U_{1} is my preference without regard to what you want. (Yours is V = V_{1} + *b*U.) Basically what I want to say is "What do you want to do, 'cause ignoring you I want burgers a bit more than Italian, which I want significantly more than sandwiches from home" and then you could say "well ignoring you I want sandwiches more than Italian more than burgers but it's not a big thing, so since you mean *b* to me, let's do Italian". It's that "ignoring you" bit that I don't know how to correctly intuit. And by intuit I mean put into math.

Assuming it means something coherent to factor U into U_{1} + *a*V, there's still a problem. Watch what happens when we remove the self-reference. First scale U and V to something you and I can agree is approximately fungible. Maybe marginal hours, maybe marginal dollars, whatever. Now U = U_{1} + *a*(V_{1} + *b*U), so U - *ab*U = U_{1} + *a*V_{1} and as long as *ab*<1, you can maximize U by maximizing U_{1} + *a*V_{1}. Which sounds great, except that my intuition screams that maximizing U should depend on *b*. So what's up there? My guess is that somewhere I snuck a dependence on *b* into *a*...
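To make the algebra concrete, here is a minimal Python sketch that solves the pair U = U1 + aV, V = V1 + bU in closed form and checks that ranking options by U gives the same order as ranking by U1 + aV1. The function name `solve_mutual`, the menu utilities, and the coefficients a = 0.5, b = 0.4 are all made-up illustrations, not values from the post.

```python
def solve_mutual(u1, v1, a, b):
    """Solve U = u1 + a*V and V = v1 + b*U for (U, V); assumes a*b < 1."""
    if a * b >= 1:
        raise ValueError("a*b must be below 1 for a finite, order-preserving solution")
    scale = 1.0 / (1.0 - a * b)
    return (u1 + a * v1) * scale, (v1 + b * u1) * scale

# Hypothetical selfish utilities (u1, v1) per option; the numbers are assumptions.
menu = {"burgers": (3.0, 1.0), "italian": (2.5, 2.4), "sandwiches": (1.0, 2.6)}
a, b = 0.5, 0.4

full_u = {k: solve_mutual(u1, v1, a, b)[0] for k, (u1, v1) in menu.items()}
simple_u = {k: u1 + a * v1 for k, (u1, v1) in menu.items()}

# Sort options from most to least preferred under each scoring.
rank_full = sorted(menu, key=full_u.get, reverse=True)
rank_simple = sorted(menu, key=simple_u.get, reverse=True)
print(rank_full, rank_simple)
```

Both rankings agree because the two scores differ only by the positive factor 1/(1 - ab). The value of b changes how large U is, not which option maximizes it.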

(I like that the *ab*<1 constraint appears... intuitively I think it should mean that if we both try to care too much about what the other person wants, neither of us will get anywhere making a decision. "I don't know, what do *you* want to do?" In general if no one ever lets *a*>=1 then things should converge.)

I feel like the obvious next step is to list some simple outcomes and play pretend with two people trying to care about each other and fake-elicit their preferences and translate that into utility functions and just *check* to see how those functions factor. But I've felt like that for a week and haven't done it yet, so here's what I've got.

^{1}Of course I know humans don't work like this, I just want the math.

^{2}"Or something" means I have an idea that sounds maybe right but it's pretty hand-wavy and maybe completely wrong and I certainly can't or don't want to formalize it.

## 19 comments

Comments sorted by top scores.

This was going to be a comment on RichardKennaway's comment, but I thought it deserved to be top-level since it addressed more than just Richard's point.

Your setup is U = U1 + aV and V = V1 + bU. Richard's is U = U1 + aV1 and V = V1 + bU1.

In words, U1 is U's selfish preference and V1 is V's selfish preference. aV is U's altruistic preference in your model and aV1 is U's altruistic preference in Richard's model. The case is analogous for V's preferences. The difference is that in your model, agents have altruistic preferences over the other agent's full preferences while in Richard's model, agents have altruistic preferences over the other agent's selfish preferences. You might think your model more interesting or general in some sense, since agents have preferences over other agents' full preferences. Actually, Richard's model is more general and it's fairly easy to see. The key is that as long as a,b<1 (we assume that they're both >0 in both models so that both agents prefer to help each other), then the two models are equivalent.

To see this, let's focus on your model, with a,b<1. CCC's comment below shows that:

- U = (U1 + aV1) / (1 - ab)
- V = (V1 + bU1) / (1 - ab)

You can see this another way: U = U1 + aV and V = V1 + bU. By recursively substituting, we have:

- U = U1 + aV
- U = U1 + aV1 + abU
- U = U1 + aV1 + abU1 + a^2bV
- U = U1 + aV1 + abU1 + a^2bV1 + a^2b^2U
- ...
- U = (U1 + aV1) * sum_(k=0 to infinity) (ab)^k

Now the series at the end is geometric, so as long as ab < 1, it converges (otherwise it diverges) so we get

- U = (U1 + aV1) / (1 - ab)

and by symmetry

- V = (V1 + bU1) / (1 - ab)
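The recursive substitution can also be checked numerically. The sketch below (a hypothetical helper `substitute_n_times`, with assumed values U1 = 1, V1 = 2, a = 0.5, b = 0.4) starts from U = 0 and repeatedly applies U ← U1 + a(V1 + bU), which reproduces the partial sums of the geometric series, then compares the result with the closed form.

```python
def substitute_n_times(u1, v1, a, b, n):
    """Apply U <- u1 + a*(v1 + b*U) n times, starting from U = 0.

    After n steps this equals (u1 + a*v1) * sum((a*b)**k for k in range(n)).
    """
    u = 0.0
    for _ in range(n):
        u = u1 + a * (v1 + b * u)
    return u

u1, v1, a, b = 1.0, 2.0, 0.5, 0.4          # assumed values, a*b = 0.2
closed_form = (u1 + a * v1) / (1 - a * b)

for n in (1, 2, 5, 20):
    print(n, substitute_n_times(u1, v1, a, b, n), closed_form)
```

With ab well below 1 the iterates settle on the closed form after a handful of substitutions.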

Now you might think that this is different from Richard's model because of the scaling factor, 1/(1-ab), but utility functions are only uniquely defined up to positive affine transformations, so we can drop this scaling factor from both utility functions and still represent exactly the same preferences. That is, any positive constant as the scaling factor yields exactly the same preference ranking over the choices and over lotteries over the choices. Thus in your case and Richard's, the utility functions are the same.
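As a rough illustration of the scaling point, this Python sketch compares two lotteries under an unscaled utility and under the same utility multiplied by 1/(1 - ab). The outcome values, the probabilities, and ab = 0.2 are assumptions chosen only for the example.

```python
def expected_utility(lottery, utility):
    """lottery maps outcome -> probability; utility maps outcome -> number."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())

# Assumed values of U1 + a*V1 for three outcomes, and an assumed a*b = 0.2.
unscaled = {"x": 1.0, "y": 0.4, "z": -0.5}
scale = 1.0 / (1.0 - 0.2)
scaled = {k: scale * v for k, v in unscaled.items()}

lottery_p = {"x": 0.5, "z": 0.5}   # coin flip between x and z
lottery_q = {"y": 1.0}             # y for certain

prefers_q_unscaled = expected_utility(lottery_q, unscaled) > expected_utility(lottery_p, unscaled)
prefers_q_scaled = expected_utility(lottery_q, scaled) > expected_utility(lottery_p, scaled)
print(prefers_q_unscaled, prefers_q_scaled)
```

Multiplying every utility by the same positive constant multiplies every expected utility by that constant, so the comparison between lotteries cannot flip.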

Now to see that Richard's is more general than yours, just note that in Richard's case, we can set a or b as large as we want without paradox (representing a case where an agent cares more about the other agent than itself), while your framework won't allow for that without everyone always having infinite (in absolute value) utility.

Despite the fact that Richard's is more flexible, it seems to me that GuySrinivasan's is more accurate. Maia and I have toyed with this idea for a while: I can be made happy because Maia is happy because I am happy, not just when Maia's happy for herself. You could argue that you should just factor this in as a multiplier on my internal utility (as opposed to that I get from Maia), but it only happens when she's around, so...

I suspect a less elegant but more accurate solution is to bound the utility you get from external sources, or to bound the utility you get that's reflected more than once, because I agree that ab<1 is a tricky constraint.

> Despite the fact that Richard's is more flexible, it seems to me that GuySrinivasan's is more accurate.

In what sense is GuySrinivasan's more accurate? If ab < 1, the two models yield exactly the same preference relations. Guy may start by explicitly modeling the behavior that you want to capture, but since the two models are equivalent, that behavior is implicit in Richard's model.

That's true unless you compare cases where the two people are together to when they are apart (a=b=0).

I don't follow. When a=b=0, U=U1 and V=V1 for both models.

Right, which is as it should be. However, say V1 is 0. Then in the model I favor, U>U1 if a,b>0, but U=U1 if ab=0, while in the model you favor, U=U1 in both cases. I believe the former corresponds better to reality, because, essentially, happiness is better when shared: you get to enjoy the other person being happy because you're happy.

> Right, which is as it should be. However, say V1 is 0. Then in the model I favor, U>U1 if a,b>0, but U=U1 if ab=0, while in the model you favor, U=U1 in both cases. I believe the former corresponds better to reality, because, essentially, happiness is better when shared: you get to enjoy the other person being happy because you're happy.

Be careful. U1 means something different in our two models. In the model you favor, U1 represents how much Jane cares about her own selfish desires before taking into account the fact that she cares about all of Bob's desires and that Bob also cares about her selfish desires. In the model I favor, U1 represents how much Jane cares about her own selfish desires after taking everything into account. That the two models say something different about the relationship between U1 and U is no surprise because they define U1 differently.

My proof showed that the two models are equivalent - they represent exactly the same set of preferences. Anything that your model does, my model does, modulo the definition of terms.

> However, say V1 is 0. Then in the model I favor, U>U1 if a,b>0,

Also, this is false. Your model says:

- U = U1 + aV
- V = V1 + bU

Since V1 = 0, we have V = bU. Thus U = U1 + abU, so assuming 0 < ab < 1, this gives U = U1 / (1 - ab). Now U1 can be negative, in which case U < U1.
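A quick numeric check of this case, as a sketch with assumed coefficients a = b = 0.5 (so ab = 0.25) and an assumed helper name `u_when_v1_is_zero`:

```python
def u_when_v1_is_zero(u1, a, b):
    """With V1 = 0, U = U1 + a*b*U, so U = U1 / (1 - a*b) when 0 < a*b < 1."""
    if not 0 < a * b < 1:
        raise ValueError("this sketch assumes 0 < a*b < 1")
    return u1 / (1 - a * b)

a, b = 0.5, 0.5                      # assumed coefficients, a*b = 0.25
for u1 in (3.0, -3.0):
    u = u_when_v1_is_zero(u1, a, b)
    print(u1, u, "U > U1" if u > u1 else "U < U1")
```

The factor 1/(1 - ab) is larger than 1, so it pushes U away from zero in whichever direction U1 already points: a positive U1 is amplified upward and a negative U1 is amplified downward.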

You're right about U1 being negative: I meant to say |U|>|U1|, unless they're both 0.

If you only compare situations with the same a and b values to each other, then yes, the models do yield the same results, but it seems that comparing situations with varying a and b is relevant.

I agree that U1 means something different in each model, and you can of course choose values of U1 such that you force the predictions of one model to agree with the other. I prefer to define U1 as just your selfish desires because that way, only the empathy coefficients change when the people you're associated with change: you don't have to change your utilities on every single action.

> If you only compare situations with the same a and b values to each other, then yes, the models do yield the same results, but it seems that comparing situations with varying a and b is relevant.

So you want to compare my model with one set of values for (a,b) to your model with another set of values, then say they're different?

There's something a bit odd about the formulation U = U1 + aV = U1 + aV1 + abU1 + ... The term abU1 amplifies your "autologous" utility U1 by adding the value you place on the value the other gets from knowing that you are getting U1. And there will be additional terms ababU1, abababU1, etc. like a series of reflections in a pair of mirrors. If ab is close to 1 then both of your autologous utilities get hugely amplified. (BTW, this is where dependence on b shows up: the larger b is, the greater the utility you get over U1 + aV1, by a factor of 1/(1-ab).)

Would U = U1 + aV1, V = V1 + bU1 be more realistic? You're still trying to maximise U1+aV1, but without the echo chamber of multiple orders of vicarious utility.

Or you could carry it on one term further, allowing two orders of vicarious utility: U = U1 + aV1 + abU1 = (1+ab)U1 + aV1, and V = (1+ab)V1 + bU1.
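To see how the three candidates can differ, here is a small Python sketch that scores two options under each of them. The option utilities and the coefficients a = 0.5, b = 0.8 are assumptions picked to make the difference visible, and the function names are labels for this example only.

```python
def full_recursion(u1, v1, a, b):
    """Fully recursive form, U = U1 + a*V with V = V1 + b*U (needs a*b < 1)."""
    return (u1 + a * v1) / (1 - a * b)

def first_order(u1, v1, a, b):
    """U = U1 + a*V1: care only about the other's selfish utility."""
    return u1 + a * v1

def second_order(u1, v1, a, b):
    """U = (1 + a*b)*U1 + a*V1: stop after two orders of vicarious utility."""
    return (1 + a * b) * u1 + a * v1

# Assumed selfish utilities (u1, v1) for two options, and assumed coefficients.
options = {"x": (1.0, 0.0), "y": (0.0, 2.4)}
a, b = 0.5, 0.8

for model in (full_recursion, first_order, second_order):
    best = max(options, key=lambda k: model(*options[k], a, b))
    print(model.__name__, best)
```

Under these assumed numbers the fully recursive and first-order forms pick the same option, since they differ only by a positive factor, while the two-order truncation weights U1 more heavily and can pick a different one.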

I am not sure there is a principled way to decide among these.

**[deleted]**· 2012-12-15T12:00:36.398Z · score: 3 (3 votes) · LW · GW

> intuitively I think it should mean that if we both try to care too much about what the other person wants, neither of us will get anywhere making a decision.

If we are mistaken about each other's preferences, even worse stuff can happen.

Are you talking about humans or AIs?

If you're talking about AIs that care about each other's utility functions as such, then a "true" explosion here seems perfectly possible. So you might be talking about a reason not to design an AI (or multiple AIs) that way, for fairly concrete programming reasons.

If you're talking about empathy between humans, I suspect we don't and maybe even can't care about each other's utility functions as such. We care about each other's happiness and well-being, but that seems to be a different thing.

> Which sounds great, except that my intuition screams that maximizing U should depend on b. So what's up there?

Your condition ab<1 is incomplete; you appear to be implicitly assuming a constant b.

Consider the equation:

U - abU = U1 + aV1

Therefore, U(1 - ab) = U1 + aV1

Now consider holding U1, V1 and a constant and changing b (but keeping to ab<1). Since U1, V1 and a are constant, the product U(1-ab) is constant. Thus, U and (1-ab) are inversely proportional; a decrease in the value of (1-ab) results in an increase in the value of U. A decrease in (1-ab) is caused by an increase in b.

Thus, provided U1 + aV1 is positive, an increase in b results in an increase in U. U grows without bound as ab approaches 1; that is, as b approaches 1/a.

As a general strategy, picking random values for all-but-one variable and using some graphing software, like gnuplot, to plot the effects of the last variable will generally help to visualise this sort of thing.
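In the same spirit, a few lines of Python can tabulate the effect before plotting anything. This is only a sketch: U1 = 1, V1 = 2 and a = 0.5 are assumed constants (so the blow-up point is b = 1/a = 2), and `u_of_b` is a name invented for the example.

```python
def u_of_b(u1, v1, a, b):
    """U = (U1 + a*V1) / (1 - a*b), valid while a*b < 1."""
    return (u1 + a * v1) / (1 - a * b)

u1, v1, a = 1.0, 2.0, 0.5            # assumed constants; 1/a = 2 is the blow-up point
for b in (0.0, 0.5, 1.0, 1.5, 1.9, 1.99):
    # Each row is a (b, U) pair that could be fed to a plotting tool.
    print(f"b={b:<5} U={u_of_b(u1, v1, a, b):.2f}")
```

The printed values rise slowly at first and then sharply as b nears 1/a, which is the same curve a plot would show.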

Today, using rational agents as models, you have mathematically described why love is so complicated. I've said before that love can be modeled as caring about another's preferences, but I never thought about it in terms of game theory before...

And I thought game theory was complicated when you are playing against strangers and enemies... I'm gonna return and play with your equations later.

The funny thing is, I think humans actually *do* work a lot like this, except we usually *don't* take into account that the other person cares about our utility function as well when we are factoring them in.

Just giving it an intuitive look though... if one person's scaling function is 2000 and the other is .01, it adds to more than 1... But wouldn't the intuitive result be that the person with the smaller scaling function gets their way all the time, rather than indecision?

> if one person's scaling function is 2000 and the other is .01, it adds to more than 1... But wouldn't the intuitive result be that the person with the smaller scaling function gets their way all the time, rather than indecision?

The series U1 + aV1 + abU1 + abaV1 + ababU1 + ... diverges. The multiple orders of vicarious utility add up to an infinite amount.

If you have scaling functions of 50 and 0.01, then it converges, and indeed the person with the smaller scaling function gets their way, because whatever they want is what the other person vicariously wants for them, almost to the exclusion of their personal wants. Outside of the fictional world of Gor, I do not think this is a sustainable relationship.

But... Choice x and choice y. X gives me 1 util, y gives me 0 utils. My partner has reverse preferences. I get a = .01, she gets b = 1000, so ab = 10.

My partner gets 0 straight utils from x, 1000 vicarious utils + 10000 second order util + ... She gets 1 straight util from y, 0 vicarious utils, 10 second order utils, 100 third order ...

So even if it diverges, both divergences point to the same choice past n=1. It's not like ab>1 causes indecision paralysis.
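This claim can be checked by truncating the partner's series V1 + bU1 + baV1 + babU1 + ... at successive orders. The sketch below counts each further substitution as one order (an assumption about how to number them) and uses the a = 0.01, b = 1000 values from the example; `partner_partial_sum` is a helper name made up for this illustration.

```python
def partner_partial_sum(u1, v1, a, b, max_order):
    """Partial sum of V1 + b*U1 + b*a*V1 + b*a*b*U1 + ... up to max_order."""
    total = 0.0
    for order in range(max_order + 1):
        ab_power = (a * b) ** (order // 2)
        if order % 2 == 0:
            total += ab_power * v1        # even orders reflect her own utility
        else:
            total += b * ab_power * u1    # odd orders reflect my utility
    return total

a, b = 0.01, 1000.0
for max_order in range(6):
    vx = partner_partial_sum(1.0, 0.0, a, b, max_order)   # choice x: I get 1, she gets 0
    vy = partner_partial_sum(0.0, 1.0, a, b, max_order)   # choice y: I get 0, she gets 1
    print(max_order, "x" if vx > vy else "y")
```

Only the zeroth-order truncation, which ignores me entirely, favors y. Every truncation that includes at least the first vicarious term favors x, even though neither sum converges.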

Also, when I play with these later I'm going to find a way to dampen this, because the runaway utils aren't realistic...

Someonewrong already said that in real life we don't recurse it, so there is no series, so it doesn't explode - and I think this incorrect person is correct.

And sustainable or not, I've seen it.

> And sustainable or not, I've seen it.

I've heard of it. Probably don't want to see it.

What happens if one of the humans says, "What I *don't* want is for us to be unable to make a decision-- what can we do to get out of this?"

I suspect that the situation happens when either neither of them has a strong desire, so they feel they might as well be accommodating, or they both have inhibitions against saying what they want.