Babyeater's dilemma

post by Giles · 2011-11-15T20:15:25.446Z · LW · GW · Legacy · 72 comments

Imagine it's the future, and everything has gone according to plan. Humanity has worked out its own utility function, f0, and has worked out a strategy S0 to optimize it.

Humanity has also run a large number of simulations of how alien worlds evolve. It has determined that of those civilizations which reach the same level of advancement - that know their own utility function and have a strategy for optimizing it - there is an equal probability that they will end up with each of 10 possible utility functions. Call these f0...f9.

(Of course, these simulations are coarse-grained enough to satisfy the nonperson predicate).

Humanity has also worked out the optimal strategies S0...S9 for each utility function. But each strategy just happens to score poorly under all of the other utility functions:

fi(Si) = 10
fi(Sj) = 1 for i != j

In addition, there is a compromise strategy C:

fi(C) = 3 for all i.

The utility functions, f0 through f9, satisfy certain properties:

They are altruistic, in the sense that they care just as much about far-away aliens that they can't even see as they do about members of their own species.

They are additive: if one planet implements Sj and another implements Sk, then:
fi(Sj on one planet and Sk on the other) = fi(Sj) + fi(Sk).

(This is just to make things easier - the problem I'm describing will still apply in cases where this rule doesn't hold).

They are non-negotiable. They won't "change" if that civilization encounters aliens with a different utility function. So if two of these civilizations were to meet, we would expect it to be like the humans and the babyeaters: the stronger would attempt to conquer the weaker and impose their own values.

In addition, humanity has worked out that it's very likely that a lot of alien worlds exist, i.e. aliens are really really real. They are just too far away to see or exist in other Everett branches.

So given these not entirely ridiculous assumptions, it seems that we have a multiplayer prisoner's dilemma even though none of the players has any causal influence on any other. If the universe contains 10 worlds, and each chooses its own best strategy, then each expects to score 19. If they all choose the compromise strategy then each expects to score 30.
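Here is a minimal sketch of that calculation, for anyone who wants to check the arithmetic; the code and the integer encoding of strategies are just one illustration, but the payoff numbers are the ones given above.

```python
# Toy payoff model from the post: 10 worlds, utility functions f0...f9.
# fi(Si) = 10, fi(Sj) = 1 for i != j, fi(C) = 3, and utilities add across planets.

N = 10

def f(i, strategy):
    """Utility that fi assigns to a single planet running `strategy`."""
    if strategy == "C":
        return 3
    return 10 if strategy == i else 1  # strategy Sj is represented by the integer j

def total(i, strategies):
    """Additivity assumption: fi of the whole universe is the sum over planets."""
    return sum(f(i, s) for s in strategies)

print(total(0, list(range(N))))  # everyone plays their own Si: 10 + 9*1 = 19
print(total(0, ["C"] * N))       # everyone plays the compromise C: 10*3 = 30
```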

Anyone else worried by this result, or have I made a mistake?

72 comments


comment by ShardPhoenix · 2011-11-15T22:22:29.966Z · LW(p) · GW(p)

The altruistic assumption given here seems implausible for a utility function ultimately derived from evolution, so while it's an interesting exercise I'm not sure there's anything to be worried about in practice.

comment by [deleted] · 2011-11-15T21:40:56.636Z · LW(p) · GW(p)

I'm not worried by the result because there are two very implausible constraints: the number of possible utility functions and the utility of the compromise strategy. Given that there are, in fact, many possible utility functions, it seems really really unlikely that there is a strategy that has 3/10 the utility of the optimal strategy for every possible utility function. Additionally, some pairs of utility functions won't be conducive to high-utility compromise strategies. For example: what if one civilization has paperclip maximization as a value, and another has paperclip minimization as a value?

ETA: Actually, this reasoning is somewhat wrong. The compromise strategy just has to have average utility > x/n, where n is the number of utility functions, and x is the average utility of the optimal strategies for each utility function. (In the context of the original example, n = 10 and x = 10. So the compromise strategy just has to be, on average, > 1 for all utility functions, which makes sense.) I still submit that a compromise strategy having this utility is unlikely, but not as unlikely as I previously argued.

Replies from: Giles, amcknight
comment by Giles · 2011-11-15T22:10:41.561Z · LW(p) · GW(p)

The compromise strategy just has to have average utility > x/n

I'm still not sure this is right. You have to consider not just fi(Si) but all the fi(Sj)'s as well, i.e. how well each strategy scores under other planets' utility functions. So I think the relevant cutoff here is 1.9 - a compromise strategy that does better than that under everyone's utility function would be a win-win-win. The number of possible utility functions isn't important, just their relative probabilities.
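(Spelling that cutoff out with the toy numbers, as a quick sketch: if the compromise scores c under every fi, then universal defection gives each world fi(Si) + 9*fi(Sj) = 10 + 9*1 = 19, while universal compromise gives 10*c. So compromise is a win for everyone exactly when 10*c > 19, i.e. c > 1.9. With unequal probabilities, the 9 and the 10 become probability-weighted sums, but the structure is the same.)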

You're right that it's far from obvious that such a compromise strategy would exist in real life. It's worth considering that the utility functions might not be completely arbitrary, as we might expect some of them to be a result of systematizing evolved social norms. We can exclude UFAI disasters from our reference class - we can choose who we want to play PD with, as long as we expect them to choose the same way.

comment by amcknight · 2011-11-17T20:03:42.183Z · LW(p) · GW(p)

It's a toy example, but doesn't it still apply if you have an estimate of the distribution of minds that will actually be implemented within mind space? The space of possible minds is vast, but the vast majority of those minds will not be implemented (or will be implemented only extremely rarely). The math would be much more difficult, but couldn't you still estimate it in principle? I don't think your criticism actually applies.

comment by Vladimir_Nesov · 2011-11-16T21:54:29.458Z · LW(p) · GW(p)

Edit: This comment is retracted. My comment is wrong, primarily because it misses the point of the post, which simply presents a usual game theory-style payoff matrix problem statement. Thanks to Tyrrell McAllister for pointing out the error, apologies to the readers. See this comment for details. (One more data point against going on a perceptual judgement at 4AM, and not double-checking own understanding before commenting on a perceived flaw in an argument. A bit of motivated procrastination also delayed reviewing Tyrrell's response.)


Humanity has also worked out the optimal strategy S0...S9 for each utility function. But they just happen to score poorly on all of the others:

f_i(S_i) = 10
f_i(S_j) = 1 for i != j

Who is following these strategies? The only interpretation that seems to make sense is that it's humanity in each case (is this correct?), that is S2 is the strategy that, if followed by humanity, would optimize aliens #2's utility.

In this case, the question is what do the f_i(S_j) mean. These are expected utilities of a possible strategy, but how do you compute them? CDT, TDT and UDT would have it differently.

In any case, it's conventional to mean by "expected utility of a possible decision" the value that you'll actually be optimizing. With CDT, it's computed in such a way that you two-box on Newcomb as a result; in TDT and UDT the bug is fixed and you one-box, but still by optimizing expected utility (computed differently) of the decision that you'd make as a result. Similarly for PD, where you cooperate in UDT/ADT not because you take into account the utilities of different agents, but because you take into account the effect on your own utility mediated by the other agent's hypothetical response to your hypothetical decision; that is, you just compute your own expected utility more accurately, and still maximize only your own utility.

Cooperation in PD of the kind TDT and UDT enable is not about helping the other, it's about being able to take into account other's hypothetical cooperation arising in response to your hypothetical cooperation. Altruistic agents already have their altruism as part of their own utility function, it's a property of their values that's abstracted away at the level where you talk about utilities and should no longer be considered at that level.

So the answer to "What should you maximize?" is, by convention, "Your own expected utility, period." This is just what "expected utility" means (that is, where you can factor utility as expected value of a utility function over some probability distribution; otherwise you use "utility" in this role). The right question should be, "How should you compute your expected utility?", and it can't be answered given the setup in this post, since f_i are given as black boxes. (Alternatively, you could give a way of estimating utility as a black box, and then consider various ways of constructing an estimate of expected utility out of it.)

Replies from: Tyrrell_McAllister, wedrifid
comment by Tyrrell_McAllister · 2011-11-17T22:55:10.095Z · LW(p) · GW(p)

In this case, the question is what do the f_i(S_j) mean. These are expected utilities of a possible strategy, but how do you compute them? CDT, TDT and UDT would have it differently.

[...]

(Alternatively, you could give a way of estimating utility as a black box, and then consider various ways of constructing an estimate of expected utility out of it.)

The post calls the functions f_i "utility functions", not "expected utility functions". So, I take Giles to be pursuing your "alternative" approach. However, I don't think that f_i(S_j) denotes the total utility of a state of the universe. It is just one of the terms used to compute such a total utility.

From the comments about additivity, I take f_i(S_j) to be the amount by which the utility of a universe to species i would increase if a planet following strategy j were added to it (while the strategies of all other planets remained unchanged), regardless of how or by whom that planet is added. Giles's question, as I understand it, is, how should these "utility" terms be incorporated into an expected utility calculation? For example, what should the probability weights say is the probability that species i will produce a planet following the compromise strategy, given that we do?

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2011-11-18T18:54:05.626Z · LW(p) · GW(p)

You are right, I retract my comment.

The post calls the functions f_i "utility functions", not "expected utility functions".

(As an aside, some terminological confusion can result from there being a "utility relation" that compares lotteries, that can be represented by a "utility function" that takes lotteries as inputs, and separately expected utility representation of utility relation (or of "utility function") that breaks it down into a probability distribution and a "utility function" in a different sense, that takes pure outcomes as inputs.)

However, I don't think that f_i(S_j) denotes the total utility of a state of the universe. It is just one of the terms used to compute such a total utility.

Right.

From the comments about additivity, I take f_i(S_j) to be the amount by which the utility of a universe to species i would increase if a planet following strategy j were added to it (while the strategies of all other planets remained unchanged), regardless of how or by whom that planet is added.

Or, more usefully (since we can't actually add planets), the utility function of aliens #k that takes a collection S of strategies for each of the planets under consideration (i.e. a state of the world) is

F_k (S) = sum_p f_k(S_p)

Then, the decision problem is to maximize expected value of F_0(S) by controlling S_0, a standard game theory setting. It's underdetermined only to the extent PD is underdetermined, in that you should still defect against CooperationBots or DefectBots, etc.
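A minimal sketch of that setting, using the post's payoff numbers (the code and names here are an illustration, not anything from the original comment):

```python
# F_k(S) = sum_p f_k(S_p), and humanity controls only S_0.
# Against planets whose strategies are fixed unconditionally,
# defecting (playing our own optimal strategy) is the best response.

N = 10

def f(k, strategy):
    if strategy == "C":
        return 3
    return 10 if strategy == k else 1  # strategy Sj represented by the integer j

def F(k, S):
    return sum(f(k, s) for s in S)

def best_response_for_humanity(others):
    """Choose S_0 to maximize F_0, holding the other planets' strategies fixed."""
    candidates = ["C"] + list(range(N))
    return max(candidates, key=lambda s0: F(0, [s0] + others))

print(best_response_for_humanity(["C"] * 9))           # 0: defect against CooperationBots
print(best_response_for_humanity(list(range(1, 10))))  # 0: defect against DefectBots
```

What changes the answer is conditional cooperation, where the others' choices are logically linked to ours rather than fixed, which is what the surrounding discussion is about.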

comment by wedrifid · 2011-11-17T03:44:08.332Z · LW(p) · GW(p)

So the answer to "What should you maximize?" is, by convention, "Your own expected utility, period."

I'm a little less confident about the period for 'expected', but that is a whole different philosophical issue from the one that's important here!

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2011-11-17T03:48:18.308Z · LW(p) · GW(p)

I believe the issue discussed in the post doesn't exist, and only appears to be present because of the confusion described in my comment. [Edit: I believe this no longer, see the edit to the original comment.]

(I'm actually not sure what you refer to by "that philosophical issue", "one issue discussed here" and what you are less confident about.)

Replies from: wedrifid
comment by wedrifid · 2011-11-17T04:02:29.156Z · LW(p) · GW(p)

I believe the issue discussed here doesn't exist, and only appears to be present because of the confusion I discussed in my comment.

It is not absolutely determined that multiplying the probability of each universe-state by its value is what must be done, period. Another relationship between probabilities, values for states of the universe, and behavior could actually be legitimate. I noted that this is an obscure philosophical question that is not intended to detract from your point.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2011-11-17T04:18:49.595Z · LW(p) · GW(p)

Right; since probabilities (and expected utility axioms) break in some circumstances (for decision-theoretic purposes), expected utility of the usual kind isn't fundamental, but its role seems to be.

(I did anticipate this objection/clarification, see the parenthetical about utility failing to factor as expectation of a utility function...)

comment by Grognor · 2011-11-16T03:32:32.809Z · LW(p) · GW(p)

An interesting idea, but I'm afraid the idea is little more than interesting. Given all your premises, it does follow that compromise would be the optimal strategy, but I find some of them unlikely:

  • That there is a small, easily computable number of potential utility functions, like 10 as opposed to 10^(2^100)
  • I have qualms about the assumption that these computed utility functions simply add. I would more readily accept them being mutually exclusive (e.g. one potential utility function is "absorb all other worlds" or "defect in all inter-species prisoner's dilemmas for deontological reasons")
  • Though I won't give it a probability estimate, I consider "humanity has worked out that it's very likely that a lot of alien worlds exist" to be a potential defeater.

If none of those complaints holds up, I still see no reason to be worried about the result. Why worry about getting a higher score?

comment by endoself · 2011-11-15T22:52:38.597Z · LW(p) · GW(p)

I think this result means that you understand the true prisoner's dilemma and acausal trade.

Replies from: amcknight, None, endoself
comment by amcknight · 2011-11-16T04:01:51.556Z · LW(p) · GW(p)

I'm having trouble finding anything about acausal trade. Any recommended readings?

Replies from: SilasBarta, endoself
comment by SilasBarta · 2011-11-16T23:57:36.417Z · LW(p) · GW(p)

I think acausal trade is just a special case of TDT-like decision theories, which consider "acausal consequences" of your decisions. That is, you reason in the following form, "If I were to output X in condition Y, so would all other sufficiently similar instantiations of me (including simulations). Therefore, in gauging the relative impact of my actions, I must also include the effect of all those instantiations outputting X."

"Sufficiently similar" includes "different but symmetric" conditions like those described here, i.e., where you have different utility functions, but are in the same position with respect to each other.

In this case, the "acausal trade" argument is that, since everyone would behave symmetrically to you, and you would prefer that everyone do the 3-utility option, you should do it yourself, because it would entail everyone else doing so -- even though your influence on the others is not causal.

Replies from: amcknight
comment by amcknight · 2011-11-17T19:43:02.797Z · LW(p) · GW(p)

Thanks! Is anything similar to acausal trade discussed anywhere outside of LessWrong? Coming up with the simplest case where acausal trade may be required seems like a thought experiment that (at least) philosophers should be aware of.

Replies from: SilasBarta
comment by SilasBarta · 2011-11-17T20:13:07.635Z · LW(p) · GW(p)

That I don't know, and I hope someone else (lukeprog?) fills it in with a literature review.

I do, however, want to add a clarification:

TDT-like decision theories are the justification for engaging in "acausal trade", while acausal trade itself refers to the actions you take (e.g. the 3-utility option) based on such justifications. (I blurred it a little by calling acausal trade a decision theory.)

Glad to have clarified the issue and saved time for those who were wondering the same thing.

Replies from: None
comment by [deleted] · 2011-11-18T01:01:55.994Z · LW(p) · GW(p)

I've read all the literature on TDT that I can find, but I still find that I disagree with the people in this thread who claim that the compromise strategy is recommended by TDT in this problem.

Here is Yudkowsky's brief summary of TDT:

The one-sentence version is: Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.

The three-sentence version is: Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation. [...]

You treat your choice as determining the result of the logical computation, and hence all instantiations of that computation, and all instantiations of other computations dependent on that logical computation.

In the TDT pdf document, he also says:

Nonetheless, as external observers, we expect Andy8AM to correlate with AndySim, just as we expect calculators set to compute 678*987 to return the same answers at Mongolia and Neptune [...] We can organize this aspect of our uncertainty by representing the decisions of both Andy8AM and AndySim as connected to the latent node AndyPlatonic.

This refers to the idea that in a Pearlian causal graph, knowing the accurate initial physical state of two causally isolated but physically identical calculators, which are both poised to calculate 678x987, doesn’t (or shouldn’t) allow us to screen them off from each other and render them probabilistically independent. Knowing their physical state doesn’t imply that we know the answer to the calculation 678x987 – and if we press the “equals” button on one calculator and receive the answer 669186, this leads us to believe that this will be the answer displayed when we press the equals button on the other, causally isolated calculator.

Since knowing their initial physical state entirely does in fact cause us to screen off the two calculators in the causal graph, as such a graph would normally be drawn, we are led to conclude that the standard way of drawing a causal graph to represent this scenario is simply wrong. Therefore Yudkowsky includes another “latent” node with arcs to each of the calculator outputs, which represents the “platonic output” of the computation 678x987 (about which we are logically uncertain despite our physical knowledge of the calculators).

The latent node “AndyPlatonic” referred to by Yudkowsky in that quote is similar to the latent node representing the output of the platonic computation 678x987, except that in this case the computation is the computation implemented in an agent's brain that determines whether he takes one or two boxes, and the causal graph is the one used by a TDT-agent in Newcomb’s problem.

So on the one hand we have an abstract or platonic computation “678x987” which is very explicit and simple, then later on page 85 of the TDT document we are shown a causal graph which is similar except that “678x987” is replaced by a platonic computation of expected utility that occurs in a human brain, which is not made explicit and must be extremely complex. This still seems fair enough to me because despite the complexity of the computation, by specification in Newcomb’s problem Omega has access to a highly accurate physical model of the human agent so that the computation it performs is expected to be very similar (i.e. accurate with ~99% probability) to the computation implemented in the human agent’s brain.

On the other hand, in the problem under discussion in this thread, it seems that the similarity in computations implemented in the human brains and the alien brains is rather vague. Assuming that the human responsible for making the decision whether humanity implements the “selfish” 10-utilon strategy or the co-operative 3-utilon strategy is a TDT agent – because this is the winning way – I still don’t see why he would choose the 3-utilon strategy.

He has no reason to think that the aliens possess a highly accurate model of him and the computations that occur in his brain. Therefore, he should expect that the extremely complex computation occurring in his brain, which decides whether to choose the 10-utilon or the 3-utilon strategy, is not instantiated in the alien brains with anything remotely close to the probability that would be necessary for it to be optimal for him to implement the 3-utilon strategy.

It is not enough that the computation is similar in a very general way, because within that generality there is much opportunity for the output to differ. It might only take a few bits difference for the computation to determine a different choice of strategy. For example if the aliens happen to be causal decision theorists then they are bound to choose the selfish strategy.

In other words I don’t see why “sufficient similarity” should hold in this case. It seems to me that the type of computation in question (determining the choice of strategy) is inevitably extremely complex – not comparable to 678x987. There is only good reason to expect such a complex computation to be instantiated predictably (i.e. with high probability) in any particular other location in the Universe if there is a powerful optimisation process (such as Omega) attempting to and capable of realising that goal. In this case there is not.

I therefore conclude that anyone advocating that humans implement the 3-utilon strategy in this problem is mistaken.

comment by endoself · 2011-11-16T05:08:25.325Z · LW(p) · GW(p)

The links from http://wiki.lesswrong.com/wiki/Decision_theory should cover most of the main ideas. There are both more basic and more advanced ones, so you can read as many as appropriate to your current state of knowledge. It's not all relevant, but most of what is relevant is at least touched on there.

comment by [deleted] · 2011-11-15T22:57:29.544Z · LW(p) · GW(p)

Or rather: to understand this result means that you understand acausal trade. To agree with this result requires that you agree with the idea of acausal trade, as well.

Replies from: endoself
comment by endoself · 2011-11-15T23:36:03.161Z · LW(p) · GW(p)

Yes, that is what I meant. Were you confused by my less rigorous style, are you trying to point out that one can understand acausal trade without agreeing with it, or are you asking for clarification for some other reason?

Replies from: None
comment by [deleted] · 2011-11-15T23:52:48.383Z · LW(p) · GW(p)

The second.

Replies from: endoself
comment by endoself · 2011-11-16T00:59:06.088Z · LW(p) · GW(p)

I apologize for any implications of condescension in my comment. I think you are wrong, but I encourage you to present your ideas, if you want to.

Replies from: None
comment by [deleted] · 2011-11-16T01:20:11.544Z · LW(p) · GW(p)

You... think it is impossible to understand acausal trade without agreeing with it?

Replies from: endoself
comment by endoself · 2011-11-16T02:39:02.028Z · LW(p) · GW(p)

I think that acausal trade is a valid way of causing things to happen (I could have phrased that differently, but it is causation in the Pearlian sense). I think that this is somewhat value-dependent, so a general agent in reflective equilibrium need not care about acausal effects of its actions, but I think that, if it makes any sense to speak of a unique or near-unique reflective equilibrium for humans, it is very likely that almost all humans would agree with acausal trade in their reflective equilibria.

comment by endoself · 2011-11-16T14:30:26.925Z · LW(p) · GW(p)

Someone downvoted all my comments in this thread. This is the first time this has happened to me. I am not sure what exactly they meant to discourage. What is the proper procedure in this case?

Replies from: Viliam_Bur, XiXiDu
comment by Viliam_Bur · 2011-11-16T23:38:08.395Z · LW(p) · GW(p)

I did not vote, but here is the thing I disliked about your comments: You write tersely, without context, using the phrase "acausal trade" as if it were supposed to mean something well-known, though I had never heard it before; and when amcknight asks for some directions, you post a link to a page that does not contain those words.

Based on this information, my guess is that you are being intentionally cryptic (signalling deep wisdom), which I dislike, especially on a site like this.

The reason I did not downvote is that at this moment I do not trust my reasoning, because I am too tired. Also, it seemed to me that your apology somehow made things worse; it's as if you admit that you are doing something wrong, but continue doing it anyway. You seem to suggest that agreeing with "acausal trade" is somehow necessary if one understands what it means, and instead of explaining why (which could be interesting for readers) you just push the burden of proof away; in my opinion, since you introduced this phrase into this topic, the burden is obviously on you.

But this is just my impression, and the person who did downvote might have different reasons.

Replies from: endoself
comment by endoself · 2011-11-17T01:37:14.431Z · LW(p) · GW(p)

Thank you for this. Even if this is not why someone had a negative reaction toward me, I appreciate such feedback.

I am definitely not trying to be cryptic. There are a lot of posts about decision theory on LW going back a few years, which resulted in the (continuing) development of updateless decision theory. It is a fascinating subject and it is about, among other things, exactly the same topic that this post covered. I expect lesswrongers discussing decision theory to be aware of what has already been done on this website.

By your metric, I fear this may sound as dismissive as the rest of what I wrote. Does it?

Replies from: Viliam_Bur
comment by Viliam_Bur · 2011-11-20T20:24:21.266Z · LW(p) · GW(p)

I expect lesswrongers discussing decision theory to be aware of what has already been done on this website.

This is why Eliezer always uses hyperlinks, even when sometimes it seems strange. :D The LessWrong site is too big, and many people are not here from the beginning. With so many articles even people who seriously try to read the Sequences can miss a few ideas.

By your metric, I fear this may sound as dismissive as the rest of what I wrote. Does it?

No it doesn't. I feel I understand this comment completely.

Thanks for not being angry about my comment, because by the standard metric it was impolite. Somehow I felt the information was more important... and I am happy you took it this way.

Replies from: endoself
comment by endoself · 2011-11-20T23:10:38.777Z · LW(p) · GW(p)

This is why Eliezer always uses hyperlinks, even when sometimes it seems strange. :D The LessWrong site is too big, and many people are not here from the beginning. With so many articles even people who seriously try to read the Sequences can miss a few ideas.

Thank you for this advice. I will definitely try to hyperlink a lot more in the future.

By your metric, I fear this may sound as dismissive as the rest of what I wrote. Does it?

No it doesn't. I feel I understand this comment completely.

There's a good chance I went back and edited a few things after writing this sentence. :)

Thanks for not being angry for my comment, because by standard metric it was impolite. Somehow I felt the information is more important... and I am happy you took it this way.

I think this type of feedback should be the norm here. It might just be me, but I think the number of LWers who would appreciate this type of constructive criticism is greater than the number who would be offended, especially after weighting based on commenting frequency.

Replies from: Viliam_Bur
comment by Viliam_Bur · 2011-11-22T18:21:32.692Z · LW(p) · GW(p)

This type of feedback can be invited explicitly in a comment. It was suggested that LW users should be able to invite it permanently through a user profile, but this suggestion was not implemented yet.

comment by XiXiDu · 2011-11-16T15:00:47.202Z · LW(p) · GW(p)

What is the proper procedure in this case?

If, upon reflection, you have no clue why you have been downvoted, then I suggest ignoring the information as noise and continuing to express your point (maybe more thoroughly in future, in case someone just misunderstood you). I would recommend doing this until someone explains why they think that you are wrong (at least if you don't value your karma score more than additional information on why you might be mistaken).

Replies from: endoself
comment by endoself · 2011-11-16T23:07:24.188Z · LW(p) · GW(p)

I think the ideas that I was expressing were rather representative of the LWers who think a lot about decision theory, so I don't expect to encounter someone who opposes them this strongly very often. I have a few theories that might explain why I was downvoted, but none are particularly probable and none give me reason to change my mind about decision theory.

comment by antigonus · 2011-11-15T20:56:18.064Z · LW(p) · GW(p)

I'm not sure if this qualifies as a mistake per se, but it seems very implausible to me that the only advanced civilization-enabling utility functions are altruistic towards aliens. Is there evidence in favor of that hypothesis?

Replies from: antigonus
comment by antigonus · 2011-11-15T21:24:45.511Z · LW(p) · GW(p)

Hmm, on second thought, I'm not sure this is a big deal. Even if the vast majority of civilization-enabling utility functions are xenophobic, we can still play PD with those that aren't. And if Everett is correct, there are presumably still lots of altruistic, isolated civilizations.

Replies from: Giles
comment by Giles · 2011-11-15T22:01:46.037Z · LW(p) · GW(p)

Sorry, yes - this is what I meant. I should have made that clearer.

comment by Incorrect · 2011-11-15T21:39:16.281Z · LW(p) · GW(p)

To make this more interesting, interpret it as a true prisoner's dilemma, i.e. the aliens care about something stupid like maximizing paperclips.

Replies from: Giles
comment by Giles · 2011-11-15T21:55:21.287Z · LW(p) · GW(p)

I consider this to be a true prisoner's dilemma already (basically, any prisoner's dilemma is true if it's written out in numbers and you believe that the numbers really capture everything). You can make it more paperclippy by substituting fi(Sj) = 0.

comment by [deleted] · 2011-11-20T23:32:35.214Z · LW(p) · GW(p)

Anyone else worried by this result, or have I made a mistake?

To update my reply to Silas Barta, after a little reflection I would say this:

The various species are supposed to possess common knowledge of each other's utility functions, and of each other's epistemic beliefs about how these utility functions can be satisfied.

Since the various species' preferences are described by utility functions, we must assume that each species has self-modified collectively (or so the humans believe) such that they collectively obey the von Neumann-Morgenstern axioms - this eliminates much of the complexity that I had in mind when I wrote my reply to Barta.

However, one further item of common knowledge would be helpful: common knowledge of whether the various species are likely to be timeless decision theorists. If they possess this common knowledge, then the dilemma is just a disguised version of a standard Newcomblike problem: the agents possess common knowledge of all the relevant factors that might influence the abstract computation that they implement in determining which strategy to employ. This is no different to the scenario in which two AIs can read one another's source code - except in this case they do it by magic (the scenario is entirely ridiculous, I'm afraid). And in that case co-operation in the prisoner's dilemma is the optimal choice.

On the other hand if they don't possess common knowledge of whether they are TDT-agents (rather than CDT-agents for example) then whether it is wise for humans to defect or co-operate depends on their probability estimate regarding whether the aliens are mostly TDT-agents, and their estimates of the aliens' own estimates whether the other species are TDT-agents, et cetera. I don't really know how that infinite regress would be resolved by the humans, and your premises give us little way of knowing what these probability estimates might be.
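One toy way to make that dependence concrete (a sketch with assumptions not in the original comment): suppose each of the nine other worlds mirrors humanity's choice with some probability p and otherwise plays its own selfish strategy. Under the post's payoff numbers, cooperating only beats defecting when p exceeds roughly 0.39:

```python
# Toy model (an illustration only): each of the 9 other worlds mirrors our choice
# with probability p (cooperates iff we cooperate) and otherwise plays its own
# selfish strategy.  Payoffs as in the post: own strategy 10, another's 1, compromise 3.

def eu_cooperate(p):
    return 3 + 9 * (p * 3 + (1 - p) * 1)

def eu_defect(p):
    return 10 + 9 * 1  # mirrors defect along with us; non-mirrors were defecting anyway

# Cooperating wins iff 3 + 9*(1 + 2*p) > 19, i.e. p > 7/18 (about 0.39).
print(eu_cooperate(0.2) > eu_defect(0.2))  # False
print(eu_cooperate(0.5) > eu_defect(0.5))  # True
```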

comment by wedrifid · 2011-11-16T15:28:29.130Z · LW(p) · GW(p)

Anyone else worried by this result, or have I made a mistake?

This seems correct.

comment by [deleted] · 2011-11-15T22:09:52.227Z · LW(p) · GW(p)

If some far away world implements utility function f2 and strategy S2, I intuitively feel like I ought to care more about f2(S2) than about f0(S2), even if my own utility function is f0. Provided, of course, that S2 doesn't involve destroying my part of the world.

Replies from: Giles
comment by Giles · 2011-11-15T22:12:08.945Z · LW(p) · GW(p)

Out of interest, have you read the Three Worlds Collide story?

Replies from: None
comment by [deleted] · 2011-11-15T22:49:20.725Z · LW(p) · GW(p)

I have, and my intuitive feeling of the "right thing to do" is similar there as well: I have no problem with leaving the baby-eating aliens alone, with some qualifications to the effect of assuming they are not somehow mistaken about their utility function.

comment by summerstay · 2011-11-16T19:24:37.361Z · LW(p) · GW(p)

I don't expect that humans, on meeting aliens, would try to impose our ethical standards on them. We generally wouldn't see their minds as enough like ours to see their pain as real pain. The reason I think this is that very few people think we should protect all antelopes from lions, or all dolphins from sharks. So the babyeater dilemma seems unrealistic to me.

Replies from: TimS
comment by TimS · 2011-11-17T02:06:11.792Z · LW(p) · GW(p)

A person who decides not to save a deer from a wolf has committed no moral failing. But a person does commit an immoral choice by deciding not to save a human from the wolf. Both deer and human feel pain, so I think a better understanding is that only individual creatures that can (or potentially could) think recursively are entitled to moral weight.

If aliens can think recursively, then that principle states that a human would make an immoral choice not to save an alien from the wolf. If we ran into an alien species that disagreed with that principle (e.g. the Babyeaters), wouldn't we consider them immoral?

Replies from: summerstay, wedrifid, summerstay
comment by summerstay · 2011-11-18T12:32:27.988Z · LW(p) · GW(p)

Maybe the antelope was a bad example because they aren't intelligent enough or conscious in the right way to deserve our protection. So let's limit the discussion to dolphins. There are people who believe that humans killing dolphins is murder, that dolphins are as intelligent as people, just in a different way. Whether or not you agree with them, my point is that even these people don't advocate changing how the dolphins live their lives, only that we as humans shouldn't harm them. I imagine our position with aliens would be similar: for humans to do them harm is morally wrong for humans, but they have their own way of being and we should leave them to find their own way.

comment by wedrifid · 2011-11-17T03:31:28.317Z · LW(p) · GW(p)

"Thinking recursively" sounds like the wrong term for the concept you are trying to name. My computer programs can think recursively. It wouldn't surprise me if certain animals could too, with a sufficiently intelligent researcher to come up with tests.

Replies from: smk, TimS
comment by smk · 2011-11-19T09:57:44.114Z · LW(p) · GW(p)

Into the silence of Harry's spirit where before there had never been any voice but one, there came a second and unfamiliar voice, sounding distinctly worried:

"Oh, dear. This has never happened before..."

What?

"I seem to have become self-aware."

WHAT?

There was a wordless telepathic sigh. "Though I contain a substantial amount of memory and a small amount of independent processing power, my primary intelligence comes from borrowing the cognitive capacities of the children on whose heads I rest. I am in essence a sort of mirror by which children Sort themselves. But most children simply take for granted that a Hat is talking to them and do not wonder about how the Hat itself works, so that the mirror is not self-reflective. And in particular they are not explicitly wondering whether I am fully conscious in the sense of being aware of my own awareness."

-Harry Potter and the Methods of Rationality

If any snake a Parselmouth had talked to, could make other snakes self-aware by talking to them, then...

Then...

Harry didn't even know why his mind was going all "then... then..." when he knew perfectly well how the exponential progression would work, it was just the sheer moral horror of it that was blowing his mind.

And what if someone had invented a spell like that to talk to cows?

What if there were Poultrymouths?

Or for that matter...

Harry froze in sudden realization just as the forkful of carrots was about to enter his mouth.

That couldn't, couldn't possibly be true, surely no wizard would be stupid enough to do THAT...

-Harry Potter and the Methods of Rationality

I suppose these two quotes might just be referring to a confused idea that Eliezer only put in his story for fun... but then again maybe not?

comment by TimS · 2011-11-17T13:25:57.266Z · LW(p) · GW(p)

I'm trying to label the capacity of humans to create proofs like Gödel's incompleteness theorems or the undecidability of the halting problem. Cats and cows cannot create proofs like these, and it doesn't seem to be a shortfall in intelligence.

Is there a better label you would suggest?

Replies from: None
comment by [deleted] · 2011-11-18T06:52:17.468Z · LW(p) · GW(p)

What makes those proofs any different from proofs of other mathematical theorems? I imagine that the halting problem, in particular, would not be beyond the capability of some existing automated theorem prover, assuming you could encode the statement; its proof isn't too involved.

If your argument is that humans understand these proofs because of some magical out-of-the-box-thinking ability, then I am skeptical.

comment by summerstay · 2011-11-18T12:41:40.929Z · LW(p) · GW(p)

Dolphins do in fact engage in infanticide, among other behaviors we would consider evil if done by a human. But no one suggests we should be policing them to keep this from happening.

comment by rwallace · 2011-11-16T03:15:03.831Z · LW(p) · GW(p)

You are worried that, given your assumptions, civilizations might not be willing to pay an extremely high price to do things that aliens would like if they knew about them, which they don't.

But one of your assumptions is that every civilization has a moral system that advocates attacking and enslaving everyone they meet who thinks differently from them.

It would be worrying if a slightly bad assumption led to a very bad conclusion, but a very bad assumption leading to a slightly bad conclusion doesn't strike me as particularly problematic.

comment by drnickbone · 2012-05-01T21:32:10.843Z · LW(p) · GW(p)

An interesting question. Some thoughts here:

  1. Does this type of reasoning mean it is a good idea to simulate lots of alien civilizations (across lots of different worlds), to see what utility functions emerge, and how frequently each type emerges?

  2. It seems like detailed simulation is quite a sensible strategy anyway, if we're utility trading (detailed enough to create conscious beings). We could plausibly assume that each utility function f(i) assigns positive utility to the aliens of type (i) existing in a world, as long as their welfare in that world exceeds an acceptable threshold. (For instance, if we imagine worlds with or without humans, then we tend to prefer the ones with, unless they are being horribly tortured etc.) So by simulating alien species (i), and checking that they generally prefer to exist (rather than trying to commit suicide), we are likely doing them a favour according to f(i), and we can assume that since our TDT decision is linked to theirs, we are increasing the number of worlds humans exist in too.

I'm intrigued by the idea that TDT leads to a convergent "average utility" function, across all possible worlds with TDT civilizations...

comment by amcknight · 2011-11-16T21:31:16.480Z · LW(p) · GW(p)

If you knew something about the expected development process of potential alien civilizations, and could use that information to estimate a probability of them defecting in cases like this, which utility functions would you include in your set of aliens to cooperate with? Roughly, should you cooperate with each civilization in proportion to your expectation of that civilization cooperating? Also, should you cooperate in proportion to the expected number of civilizations implementing each utility function?

This seems unavoidable to me, as long as you first accept "acausal trade", which it seems you must accept if you accept Pearl-style causality and additive consequentialism.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2011-11-18T19:17:02.498Z · LW(p) · GW(p)

If aliens cooperate in a way independent of your decision, you should defect. Only if they cooperate conditionally on your cooperation might it make sense to cooperate. That is, who cooperates unconditionally is irrelevant. (Which of these do you mean? I can't tell; taken literally, you seem to be talking about unconditional cooperation.)

Replies from: amcknight
comment by amcknight · 2011-11-18T21:57:23.090Z · LW(p) · GW(p)

What I should have been saying is that the ones that cooperate conditionally are the ones that would matter. (I wasn't even thinking about conditional and unconditional cooperation, at the time.)

comment by jimrandomh · 2011-11-15T22:21:04.311Z · LW(p) · GW(p)

Humanity has also run a large number of simulations of how alien worlds evolve. It has determined that of those civilizations which reach the same level of advancement - that know their own utility function and have a strategy for optimizing it - there is an equal probability that they will end up with each of 10 possible utility functions.

No, humanity isn't going to do that. We'd be exposing ourselves to blackmail from any simulated world whose utility function had certain properties -- it'd be summoning a basilisk. For humanity's utility function in particular, there is an asymmetry such that the potential losses from acausal trade dramatically outweigh the potential gains.

comment by [deleted] · 2011-11-15T22:01:55.248Z · LW(p) · GW(p)

Assuming that TDT doesn't apply, the fact that we would be in a prisoner's dilemma is irrelevant. The only rational option for humanity would be to defect by maximising its local utility - whether humans defect or cooperate in the dilemma has no effect on what the aliens choose to do.

So really the problem is only interesting from a timeless decision theory perspective (I feel that you might have made this more explicit in your post).

According to my sketchy understanding of TDT, if in a prisoner's dilemma both parties can see the other's source code, or otherwise predict the other's behaviours with extreme accuracy then it is rational for them to co-operate because it is physically possible for each of them to be almost certain that the other will co-operate, and they would be able to tell if the other were to suddenly change his mind.

I don't see how this generalises to the situation that you have described. If anyone thinks that TDT is in fact relevant to this problem, I would be interested in hearing their (lucid) explanation why this is so.

comment by TimS · 2011-11-15T21:29:00.731Z · LW(p) · GW(p)

They are altruistic, in the sense that they care just as much about far-away aliens that they can't even see as they do about members of their own species.

comment by billswift · 2011-11-16T08:16:55.013Z · LW(p) · GW(p)

none of the players has any causal influence on any other.

An aggregate score without any causal influence is meaningless. Without influence on each other, each should pursue its own best interest, not some meaningless compromise solution.

Replies from: wedrifid
comment by wedrifid · 2011-11-16T15:38:17.480Z · LW(p) · GW(p)

How many boxes do you take in Newcomb's problem?

Replies from: billswift
comment by billswift · 2011-11-16T17:39:55.586Z · LW(p) · GW(p)

One. Either he can predict my actions and is honest and I get a million dollars, or he can't and I don't lose anything, but don't get a thousand dollars, and get to laugh at him. (Note that I think one of the reasons the problem is bogus is you are restricted by the conditions of the problem from considering any gain (ie, get to laugh at him) except the cash, which is too unrealistic to matter.) (Also note that this is from memory and I think Newcomb's and similar problems are bogus enough that I haven't read any posts, or anything else, about them in well over a year.)

Replies from: wedrifid
comment by wedrifid · 2011-11-16T17:46:05.863Z · LW(p) · GW(p)

So your objection (correct me if I am wrong) is that it makes no sense to value what the other aliens do because what they do doesn't affect you in any way. You don't have a problem with acting as if your own behavior determines what the aliens do, given that they decide their actions based on a reliable prediction of what you will do. You just don't care. You reject the premise about your own utility function depending on them as a silly utility function.

Replies from: billswift
comment by billswift · 2011-11-16T20:56:53.252Z · LW(p) · GW(p)

Your first sentence is a fair description.

As your second sentence admits, What they do is their decision. Letting their decision influence me in the way you seem to support is no different than giving in to "emotional blackmail".

Your final sentence makes no sense, I cannot figure out what you mean by it.

Replies from: lessdazed, wedrifid
comment by lessdazed · 2011-11-16T23:25:23.815Z · LW(p) · GW(p)

Your final sentence makes no sense, I cannot figure out what you mean by it.

You reject premise 1), where premise 1) is: your utility function depends on them, i.e. "They are altruistic". You reject it because you think any utility function with premise 1) is silly.

comment by wedrifid · 2011-11-16T21:21:00.636Z · LW(p) · GW(p)

Letting their decision influence me in the way you seem to support

I didn't advocate one way or the other. Utility functions simply happened to be provided for us in the post so for me the question becomes one of abstract reasoning. But I certainly don't object to you mentioning an aversion to utility functions of the type given. It is a relevant contribution and a legitimate perspective.

Your final sentence makes no sense, I cannot figure out what you mean by it.

I read it, it makes sense. See above for more explanation.

is no different than giving in to "emotional blackmail".

Or simple trade. They have preferences about the future state of their local environment and preferences about the future state of your local environment (for whatever unspecified reason). You are in a symmetrical situation. You discover that you can get an overall higher utility by doing some slightly worse things in your local environment in exchange for them doing some more preferred things where they live. This doesn't exclude them outright synthesizing humans!

is no different than giving in to "emotional blackmail".

Or trade.

comment by timtyler · 2011-11-15T20:49:39.275Z · LW(p) · GW(p)

Anyone else worried by this result, or have I made a mistake?

Surely only utilitarians would be concerned by it. Others will just reject the "altruistic" assumption as being terribly unrealistic.

Replies from: None
comment by [deleted] · 2011-11-15T23:11:52.485Z · LW(p) · GW(p)

Anyone can reject the assumption as unrealistic. I don't see what utilitarianism has to do with that.

Replies from: timtyler
comment by timtyler · 2011-11-16T12:18:44.866Z · LW(p) · GW(p)

Many utilitarians claim that they aspire to be altruistic - in the sense that they would claim to "care just as much about far-away aliens that they can't even see as they do about members of their own species".

Possibly there are others that say they aspire to this too - but it is a pretty odd thing to want. Such selflessness makes little biological sense. It looks like either an attempt at a "niceness" superstimulus (though one that is rather hampered by a lack of plausibility) - or the result of memetic manipulation, probably for the benefit of others. Those are currently my two best guesses for explaining the existence of utilitarianism.

Replies from: None
comment by [deleted] · 2011-11-18T06:55:18.097Z · LW(p) · GW(p)

Ah, I think my mistake was assuming utilitarianism meant something reasonable along the lines of consequentialism (as opposed to belief in a specific and somewhat simple utility function). I thought I already knew what it meant, you see, so I didn't see the need to click on your link.

comment by DanielLC · 2011-11-16T01:58:16.903Z · LW(p) · GW(p)

My thought is that there's no reason to believe the humans are right over any of the other groups. One person was born with one mind, another with another. There's no reason to pick one mind. As such, I'd pick the compromise, even if we additionally worked out that the other aliens wouldn't try the compromise.

comment by amcknight · 2011-11-15T22:24:45.951Z · LW(p) · GW(p)

I hope we don't need to worry about that. It's odd because you and the aliens can never know whether the others are defecting. It reminds me of Simulation Warfare (pdf) where you can change the odds that you are in a simulation just by choosing to simulate universes like yours and also change the nature of your universe by simulating types of universes you want to be in that are compatible with your universe so far. If your argument works, we would be retroactively putting ourselves into a cooperating universe... basically "causing" things to happen outside of our observable universe... (EDIT: removed expressions of future shock)