Subjective Altruism

post by Scott Garrabrant · 2013-10-18T04:06:54.419Z · LW · GW · Legacy · 29 comments

Let us assume for the purpose of this argument that Bayesian probabilities are subjective. Specifically, I am thinking in terms of the model of probability expressed in model 4 here. That is to say that the meaning of P(A)=2/3 is "I, as a decision agent, care twice as much about the possible world in which A is true relative to the possible world in which A is false." Let us also assume that it is possible to have preferences about things that we will never observe.

Consider the following thought experiment:

Alice and Bob are agents who disagree about the fairness of a coin. Alice believes that the coin will come up heads with probability 2/3 and Bob believes the coin will come up tails with probability 2/3. They discuss their reasons for a long time and realize that their disagreement comes from different initial prior assumptions, and they agree that both people have rational probabilities given their respective priors. Alice is given the opportunity to gamble on behalf of Bob. Alice must call heads or tails, then the coin will be flipped once. If Alice calls the coin correctly, then Bob will be given a dollar. If she calls the coin incorrectly, then nothing happens. Either way, nobody sees the result of the coin flip, and Alice and Bob never interact again. Should Alice call heads or tails?

The meat of this question is this: when trying to be altruistic towards a person, should you maximize their expected utility under their priors or under your own? I will present an argument below, but feel free to stop reading here, think about it on your own, and post your conclusions.

First of all, notice that there are actually three options:

1) Maximize your own expectation of Bob's utility

2) Maximize Bob's expectation of his utility

3) Maximize what Bob's expectation of his utility would be if he were to update on the evidence of everything that you have observed.

At first it may have looked like the main options were 1 and 2, but I claim that 2 is actually a very bad option and the only real question is between options 1 and 3. Option 2 is stupid because, for example, it would cause Alice to call tails even if she has already seen the coin flip and it came up heads. There is no reason for Alice not to update on all of the information she has; the only question is whose prior she should update from. In this specific thought experiment we are assuming that 2 and 3 are the same, since Alice has already convinced herself that her observations could not change Bob's mind, but I think that in general options 1 and 3 are somewhat reasonable answers, while 2 is not.
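To make the numbers concrete, here is a minimal sketch in Python of what options 1 and 3 recommend in the coin example, assuming the dollar is worth one util to Bob (the stake and the variable names are illustrative, not part of the original setup):

```python
# Credences from the thought experiment: Alice assigns 2/3 to heads,
# Bob assigns 2/3 to tails. Bob gets one util iff the call is correct.
ALICE_P_HEADS = 2 / 3
BOB_P_HEADS = 1 / 3

def expected_bob_utility(call, p_heads):
    """Bob's expected utility if Alice calls `call`, under credence p_heads for heads."""
    p_correct = p_heads if call == "heads" else 1 - p_heads
    return p_correct * 1.0

for call in ("heads", "tails"):
    print(call,
          "option 1 (Alice's credence):", round(expected_bob_utility(call, ALICE_P_HEADS), 3),
          "option 3 (Bob's credence):", round(expected_bob_utility(call, BOB_P_HEADS), 3))

# Option 1 maximizes the first column and calls heads;
# option 3 maximizes the second column and calls tails.
```

Under option 1 Alice calls heads; under option 3 (which here coincides with option 2) she calls tails.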

Option 3 has the nice property that it does not require observing Bob's utility function; it only requires observing Bob's expected utility for the different choices. This is nice because in many ways "expected utility" seems like a more fundamental and possibly better-defined concept than "utility." We are trying to be altruistic towards Bob, and it seems natural to give Bob the most utility in the possible worlds that he "cares about" the most.

On the other hand, we want the possible worlds that we care about most to be as good as possible. We may never be able to observe whether or not Bob gets the dollar, but it is not just Bob who wants Bob to get the dollar; we also want Bob to get the dollar. We want Bob to get the dollar in the most important possible worlds, the worlds we assign high probability to. What we want is for Bob to be happy in the worlds that are important. We may have subjectively assigned those possible worlds to be the most important ones, but from the standpoint of us as a decision agent, the worlds we assign high probability to really are more important than the other ones.

Option 1 is also simpler than option 3. We just have a variable for Bob's utility in our utility function, and we do what maximizes our expected utility. If we took option 3, we would be maximizing something that is not just a product of our utilities with our probabilities.
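Written out, one way to formalize the contrast (the notation here is mine, not part of the post) is:

```latex
% Option 1: weight Bob's utility by Alice's probabilities, updated on her evidence.
a^*_1 = \operatorname*{arg\,max}_{a} \sum_{w} P_{\text{Alice}}(w \mid E_{\text{Alice}})\, U_{\text{Bob}}(w, a)

% Option 3: weight Bob's utility by Bob's prior, updated on Alice's evidence.
a^*_3 = \operatorname*{arg\,max}_{a} \sum_{w} P_{\text{Bob}}(w \mid E_{\text{Alice}})\, U_{\text{Bob}}(w, a)
```

The option 1 objective is just Alice's expectation of Bob's utility; the option 3 objective mixes Alice's evidence with Bob's distribution of caring over worlds.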

Option 3 has some unfortunate consequences. For example, it might cause us to pray for a religious person even if we are very strongly atheist. 

I prefer option 1. I care about the worlds that are simple and therefore are given high probability. I want everyone to be happy in those worlds. I would not sacrifice the happiness of someone in a simple/probable/important world just because someone else thinks another world is important. Probability may be subjective, but relative to the probabilities that I use to make all my decisions, Bob's probabilities are just wrong.

Option 3 is nice in situations where Alice and Bob will continue interacting, possibly even interacting through mutual simulation. If Alice and Bob were given a symmetric scenario, then this would become a prisoner's dilemma, where Alice choosing heads corresponds to defecting and Alice choosing tails corresponds to cooperating. However, I believe this is a separate issue.

29 comments

Comments sorted by top scores.

comment by Dagon · 2013-10-18T12:27:36.821Z · LW(p) · GW(p)

This seems like a lot of setup for a trivial answer. Or more likely, the complexity isn't where you think, so you've given the wrong detail in the setup. (or I've missed the point completely, which is looking likely based on other comments from people much smarter than I).

There's no part of the scenario that allows an update, so classical decision theory is sufficient to analyze this. Alice believes that there is a 2/3 chance of heads, and she prefers a world where her guess is correct. She guesses heads. Done.

This is actually option 0: maximize your own utility, recognizing that you get utility from your beliefs about others' happiness(*).

You can add complexity that leads toward different choices, but unless there are iterated choices and memory-loss, or reverse-causality, or other decision-topology elements, it's unlikely to be anything but option 0. Bob's probability assessment is completely irrelevant if you can't update on it (which you ruled out) and if Bob can never learn that you ignored his beliefs (so there's no utility or happiness in giving up expected money for him to show him loyalty).

  • note: I say "your utility" and "others' happiness" on purpose: the terms for them in your utility function are actually referring to your model of their utility's effect on you, rather than their utility, which you cannot detect.
Replies from: Scott Garrabrant
comment by Scott Garrabrant · 2013-10-18T17:51:06.734Z · LW(p) · GW(p)

So you are saying that you have no cares whatsoever about whether or not other people's preferences are fulfilled, just about their happiness? Is this just because you cannot observe others' preferences?

I think that if you believe that, then I agree with you: there is no value in this thought experiment for you. This is one of the things I tried to say at the beginning about preferences over things you cannot observe. I probably should have said specifically how this relates to altruism.

Replies from: Dagon
comment by Dagon · 2013-10-18T20:18:44.484Z · LW(p) · GW(p)

I think you're confusing preferences about the world and preferences about an unobservable cause. As an altruist, Alice cares about Bob's preference for having a dollar or not. Bob has no way of having knowledge of (or a preference over) Alice's prediction, and she knows it, so she'd be an idiot to project that onto her choice. If she thinks Bob may be right, then she has updated her probability estimate, in contradiction of the story.

Options 2 and 3 are twice as likely to lose as option 1. This is what it means for Alice to have the belief that there is a 2/3 chance of heads.

comment by Adele_L · 2013-10-18T04:36:19.697Z · LW(p) · GW(p)

That is to say that the meaning of P(A)=2/3 is "I, as a decision agent, care twice as much about the possible world in which A is true relative to the possible world in which A is false."

Wait - I notice that I am really confused... Could an agent just pump good outcomes for themselves through sheer apathy (and a human who did this would be considered a wireheader)? Or am I misunderstanding the idea?

My first intuition was that actually, Option 1 would be good for an iterated situation, whereas Option 3 is "truly" altruistic. After doing Option 1, we expect Bob to gain more utility as far as we are concerned, and in an iterated situation, expect Bob to respond "gratefully" more often in the worlds we care about.
But doing Option 3 is what makes Bob happiest in the worlds he cares about the most, like you said. It seems that the main reason for choosing Option 2 would be to signal altruism towards Bob.

Replies from: Scott Garrabrant
comment by Scott Garrabrant · 2013-10-18T04:44:14.511Z · LW(p) · GW(p)

People do not get to choose to be apathetic, so you cannot purposefully get good outcomes for yourself by only caring about the possible worlds in which you get good outcomes. The question you are asking about pumping out good outcomes could be translated to pretty much any model of probability: "Could an agent just increase their expected utility by choosing to believe that good things will happen to them?"

I do not think that Bob would respond gratefully to option 1 in the worlds I care about. I think he would respond most gratefully to Option 3, even though he did not get the dollar.

Replies from: Adele_L
comment by Adele_L · 2013-10-18T04:51:23.808Z · LW(p) · GW(p)

Well, I agree that current humans can't choose to do this, but it seems like it might be possible through technology. It seems that it would be something highly analogous to wireheading. Thanks for rephrasing it this way, it helped the idea click for me.

I guess it depends on whether or not Bob knows your decision algorithm. If all he can see is whether he got the dollar or not, then in the worlds you care about, he gets dollars more often, and thus reciprocates more often, as far as you are concerned. But if he realizes you are using option 3, then he would be more grateful in this case.

Replies from: DanielLC, Tyrrell_McAllister, Scott Garrabrant
comment by DanielLC · 2013-10-18T06:07:28.986Z · LW(p) · GW(p)

From what I understand, you're asking about people self-modifying to believe that something they desire is true.

There is no reason to do this. The choice of self-modifying doesn't increase the probability of it being true. All it does is result in future!you believing it's true, but you don't care about what future!you believes. You care about what actually happens.

comment by Tyrrell_McAllister · 2013-10-18T18:50:41.176Z · LW(p) · GW(p)

Well, I agree that current humans can't choose to do this, but it seems like it might be possible through technology.

Even with technology, it won't be possible to decide to have already been apathetic at the time when you are deciding whether to become apathetic. Hence, when you are deciding whether to become apathetic, you will make that decision based on what you cared about pre-apathy. So if, in that pre-apathetic state, you care about things that apathy would harm, then you won't decide to become apathetic.

Replies from: Adele_L
comment by Adele_L · 2013-10-18T20:25:13.255Z · LW(p) · GW(p)

Yeah, but people are stupid, so they might do it anyway. This is why I said it was like wireheading: you are increasing your expected pleasure (in some sense) at the cost of losing lots of things you really care about.

comment by Scott Garrabrant · 2013-10-18T05:19:09.671Z · LW(p) · GW(p)

Also, an agent that could change his preferences could just give himself good outcomes by changing his preferences to match the outcomes that he will get. The point is that if you could do this, you would not increase your expected utility, but instead create a different agent with a very convenient utility function. This is not something I would want to do.

Oh, yes, you are right, I forgot that he doesn't get to see the result.

comment by [deleted] · 2013-10-21T13:39:32.310Z · LW(p) · GW(p)

I think I have an answer, but it requires repetition and the ability to gather information to be useful, and the problem above rules that out. I do see a way to get around that, but there is a possible flaw in it as well. I'll post my current thoughts.

Essentially, have Alice track information about how accurate she is in circumstances where she thinks her priors are right, another person thinks their priors are right, and no agreement can be made easily.

If Alice finds, for instance, that of the last 10 arguments she has had with someone that appeared to stem from different priors, she later found out she was incorrect in only one of them, then using that information she should probably go with her own set of priors.

Alternatively, if Alice finds that of the last 10 such arguments she was correct in only one of them, then using that information she should probably go with her interpretation of Bob's set of priors.
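As a rough sketch of the bookkeeping (assuming a simple Beta-Bernoulli model of "how often am I the one who turns out to be right in prior disputes"; the model and numbers are illustrative, not part of the comment):

```python
# Hypothetical calibration tracker: count how often Alice turned out to be
# right in past disagreements that traced back to differing priors, and use
# the Beta-Bernoulli posterior mean to decide whose priors to lean on.

def prob_alice_right(times_right, times_wrong, prior_a=1.0, prior_b=1.0):
    """Posterior mean of 'Alice is the one who is right', under a Beta(prior_a, prior_b) prior."""
    return (prior_a + times_right) / (prior_a + prior_b + times_right + times_wrong)

# The two cases from the comment: right in 9 of 10 arguments vs. right in 1 of 10.
print(prob_alice_right(9, 1))  # ~0.83 -> lean on Alice's own priors
print(prob_alice_right(1, 9))  # ~0.17 -> lean on her model of Bob's priors
```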

The problem is, in actual arguments there doesn't seem to be a good way of accounting for this. I can think of a few cases where I did change my mind on priors after reviewing them later, but I can't remember as clearly ever changing anyone else's (though I have to bear in mind that I don't have access to their brains).

Furthermore, there is the bizarre case of "I think I'm usually wrong. But sometimes, other people seem to trust my judgement and believe I'm correct more often than not." I have no idea how to process that thought using the above logic.

Replies from: ThisSpaceAvailable
comment by ThisSpaceAvailable · 2013-10-28T03:34:48.037Z · LW(p) · GW(p)

If Alice thinks that the other times she disagreed with people are representative of her current disagreement, I don't see why she shouldn't update according to the normal Bayesian rules.

comment by cousin_it · 2013-10-18T08:46:29.151Z · LW(p) · GW(p)

For what it's worth, the simple-minded answer given by UDT is 2. If Bob's argumentless utility computation is U(), whose internals encode both Bob's prior and his utility, then Bob would be willing to pay Alice a penny to maximize U(), but won't necessarily pay to maximize a version of U() that is "remixed" using Alice's beliefs. The distinction between 2 and 3 goes away because Alice is optimizing her input-output map instead of doing Bayesian updating.

That said, I'm not sure people's priors actually disagree that much. The difference between religion and atheism would probably disappear by Aumann agreement.

Replies from: Scott Garrabrant, Dagon, Scott Garrabrant, Scott Garrabrant
comment by Scott Garrabrant · 2013-10-18T18:13:13.344Z · LW(p) · GW(p)

I am not convinced that the disagreement between religion and atheism would disappear by Aumann agreement. I know at least one religious person who is actually pretty rational, but who rejects Occam's razor. I feel like option 3 would tell me that it would be altruistic to let this person die, so that they could go to heaven.

comment by Dagon · 2013-10-18T12:34:19.994Z · LW(p) · GW(p)

Wait, what? How does UDT make this calculation right for Alice? Bob is wrong (from all of her knowledge and all paths for her to update). Her utility is a direct mapping of the outcome of the bet - what path of communication does Bob's expected value take to get to her?

For a non-altruist, this is clearly a chance for Alice to money-pump Bob. But the setup of the problem is that Alice gets utility from Bob's actual money outcome, not his beliefs. Once Alice is done updating (leaving her at 2/3 chance of heads), that's her belief, and it doesn't change after that.

Replies from: Scott Garrabrant
comment by Scott Garrabrant · 2013-10-18T17:55:40.250Z · LW(p) · GW(p)

Her utility is a direct mapping of the outcome of the bet - what path of communication does Bob's expected value take to get to her?

I am not quite sure what you mean by this, but I had Alice and Bob discuss at the beginning so that Alice would know Bob's probability, and I was assuming the utility of the dollar was positive. That is all she needs in order to know Bob's (normalized) expected value.

comment by Scott Garrabrant · 2013-10-18T09:17:42.706Z · LW(p) · GW(p)

Also, it is not clear to me why what Bob would be willing to pay Alice to do is necessarily what Alice should do.

comment by Scott Garrabrant · 2013-10-18T09:09:11.302Z · LW(p) · GW(p)

I disagree. UDT does not give an answer to this.

The distinction between 2 and 3 is still there. 2 is still the strategy that chooses the action (not the function from inputs to actions) that Bob would like you to have chosen. 3 chooses the function from inputs to actions which Bob would like you to have chosen. Those two options are different no matter what decision theory you use.

From your claim that UDT gives 2, I think you did not understand what I meant by option 2. I think that what you mean when you say option 2 with UDT is what I mean when I say option 3. Yes, UDT doesn't update, but what option 3 is supposed to mean is that you choose the option you think Bob would want if he knew what you know, i.e., choose the function from input to output that Bob would want. Option 2 was really meant to ignore your input and just care about what Bob would want you to choose, not given your input. Again, I trust that this is just bad communication, and with my definitions of the 3 options you meant to say UDT says 3. Let me know if you think I am wrong about your intention.

I also disagree with the claim that UDT says 3 (or 2) is better than 1. The question is whether we should take Bob's utility as a term in our own utility function (and then multiply it by our probabilities), or take Bob's expected utility of our action as a term in our own utility function (which already has his probability built into it). This is a question about how we should allow other people's utility functions to influence our own. UDT doesn't tell us what kind of utility functions we should have.
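In symbols (my notation, not the commenter's; $\alpha$ is a hypothetical weight on Bob's welfare), the two candidate forms of the altruistic term in Alice's utility function are roughly:

```latex
% Take Bob's utility as a term and weight it by Alice's probabilities:
\text{maximize } \mathbb{E}_{P_{\text{Alice}}}\!\left[ \alpha \, U_{\text{Bob}}(w, a) \right]

% Take Bob's expected utility of the action as a term (his probabilities are built in):
\text{maximize } \alpha \, \mathbb{E}_{P_{\text{Bob}}(\cdot \mid E_{\text{Alice}})}\!\left[ U_{\text{Bob}}(w, a) \right]
```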

Replies from: cousin_it
comment by cousin_it · 2013-10-18T09:27:56.355Z · LW(p) · GW(p)

Again, I trust that this is just bad communication, and with my definitions of the 3 options you meant to say UDT says 3.

Yes. Sorry for parsing your post incorrectly.

The question is whether we should take Bob's utility as a term in our own utility function (and then multiply it by our probabilities), or take Bob's expected utility of our action as a term in our own utility function (which already has his probability built into it).

I guess the second option sounds better to me because it generalizes more easily. What if Alice and Bob have different sets of possible worlds (or "cared-about" worlds) in the first place? What if Alice can't disentangle the definition of Bob's U() into "probabilities" and "utilities", or can do it in multiple ways? My simple-minded answer still works in these cases, while the "remixing" answer seems to become more complicated.

About your other comment, it seems clear that bargaining should lead to a weighted sum like the one I described, and we get a nicer theory if altruism and bargaining are described by weighted sums of the same kind. You might disagree with arguments that rely on mathematical neatness, though...

Replies from: Vladimir_Nesov, Scott Garrabrant, Scott Garrabrant
comment by Vladimir_Nesov · 2013-10-18T10:44:52.242Z · LW(p) · GW(p)

Preference should still apply to all possible situations. If idealized Bob gains control of Alice's decision, he has access to both the action and Alice's factual knowledge, and so the decision specifies how the action depends on that knowledge. This looks more like option 3, even though I agree that separating prior and utility might be a wrong way of formulating this.

Replies from: Scott Garrabrant, cousin_it
comment by Scott Garrabrant · 2013-10-18T18:01:16.267Z · LW(p) · GW(p)

I think the only way to formulate option 1 is by separating prior and utility. (Not your own prior and utility, but having some model of the other person's prior and utility separately)

I agree that option 3 is prettier because it doesn't have to do this, but is it better?

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2013-10-18T18:56:03.573Z · LW(p) · GW(p)

This needs a distinction between a prior that is "prior to all your knowledge" and a prior that already takes into account your current knowledge and would be updated by future observations. I guess a prior in the first sense could be seen as a fixed aspect of preference, while a prior in the second sense reflects a situation where an action might be performed, so that it can be different in different situations with the same preference.

Thus, perfectly altruistic Alice should have a different prior in the second sense, taking into account Alice's knowledge rather than Bob's, but the same prior in the first sense, reflecting the same distribution of caring over possible worlds as Bob.

Replies from: Scott Garrabrant
comment by Scott Garrabrant · 2013-10-18T19:01:21.165Z · LW(p) · GW(p)

Why should Alice have the same distribution of caring as Bob?

My definition of prior was in the first sense.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2013-10-18T21:14:41.790Z · LW(p) · GW(p)

Bob's prior in the first sense is not factual knowledge; it's a description of how important Bob considers each world, so Alice can't improve on it by knowing something that Bob doesn't. A difference in priors in the first sense reflects different distributions of moral relevance associated with possibilities. When Alice knows something that Bob doesn't, it is a statement about her priors in the second sense, not the first sense.

Thus, to the extent that Alice doesn't assume Bob's priors in the first sense, Alice doesn't follow Bob's preference, which would be a failure of perfect altruism. Alice's prior doesn't reflect different (or additional) knowledge, so its use would not be an improvement in the sense of Bob's preference.

comment by cousin_it · 2013-10-18T11:15:38.358Z · LW(p) · GW(p)

Yes, when I said 2, I actually meant 3.

comment by Scott Garrabrant · 2013-10-18T09:43:14.742Z · LW(p) · GW(p)

You might disagree with arguments that rely on mathematical neatness, though...

I love mathematical neatness. The fact that the answer that felt right to me and the one that felt mathematically neat were different is what motivated me to make this post. It does not seem to me that the math of bargaining and the math of altruism should look the same, though. They are not that similar, and they really feel like they are maximizing different things.

comment by Scott Garrabrant · 2013-10-18T09:37:03.160Z · LW(p) · GW(p)

Yes, these complaints about option 1 are very real, and they bother me, which makes me unsure about my answer, and is a big part of why I created this post.

However, the fact that factoring Bob's U may not be easy or possible for Alice is not a good reason to say that Alice shouldn't try to take the action that maximizes her expectation of Bob's utility. It makes her job harder, but that doesn't mean she should try to optimize something else just because it is simpler.

I prefer 1 to 3, in spite of the fact that I think 3 actually is the more aesthetically pleasing answer.

Replies from: cousin_it
comment by cousin_it · 2013-10-18T11:51:17.291Z · LW(p) · GW(p)

If probability is caring, what does it mean for Alice to say that Bob's caring is wrong? It seems to me that the intuitions in favor of option 1 are strongest in the case where some sort of "objective probability" exists and Alice has more information than Bob, not different priors. But in that case, options 1 and 3 are equivalent.

If you want to build a toy example where two agents have different but reasonable priors, maybe Robin Hanson's pre-rationality is relevant? I'm not sure.

Note that your interpretation of altruism might make Alice go to war against Bob, even if she has no wishes of her own and cares only about being altruistic toward Bob. I guess the question is what are your desiderata for altruism?

Replies from: Scott Garrabrant
comment by Scott Garrabrant · 2013-10-18T18:06:21.832Z · LW(p) · GW(p)

If probability is caring, what does it mean for Alice to say that Bob's caring is wrong?

In the exact same way that, with subjective morality, other people's claims about morality are wrong relative to me. All I meant by that is that Alice doesn't care (in the probability sense) more about a world just because Bob does, because relative to Alice, Bob is simply caring about things that are not very important.