Comment by lackofcheese on "Solving" selfishness for UDT · 2014-11-04T00:06:57.175Z · LW · GW

I think there are some rather significant assumptions underlying the idea that they are "non-relevant". At the very least, if the agents were distinguishable, I think you should indeed be willing to pay to make n higher. On the other hand, if they're indistinguishable then it's a more difficult question, but the anthropic averaging I suggested in my previous comments leads to absurd results.

What's your proposal here?

Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-31T14:25:01.600Z · LW · GW

I don't think that's entirely correct; SSA, for example, is a halfer position and it does exclude worlds where you don't exist, as do many other anthropic approaches.

Personally I'm generally skeptical of averaging over agents in any utility function.

Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-29T20:00:17.413Z · LW · GW

You definitely don't have a 50% chance of dying in the sense of "experiencing dying". In the sense of "ceasing to exist" I guess you could argue for it, but I think that it's much more reasonable to say that both past selves continue to exist as a single future self.

Regardless, this stuff may be confusing, but it's entirely conceivable that with the correct theory of personal identity we would have a single correct answer to each of these questions.

Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-29T19:55:08.544Z · LW · GW

OK, the "you cause 1/10 of the policy to happen" argument is intuitively reasonable, but under that kind of argument divided responsibility has nothing to do with how many agents are subjectively indistinguishable and instead has to do with the agents who actually participate in the linked decision.

On those grounds, "divided responsibility" would give the right answer in Psy-Kosh's non-anthropic problem. However, this also means your argument that SIA+divided = SSA+total clearly fails, because of the example I just gave before, and because SSA+total gives the wrong answer in Psy-Kosh's non-anthropic problem but SIA+divided does not.

Ah, subjective anticipation... That's an interesting question. I often wonder whether it's meaningful.

As do I. But, as Manfred has said, I don't think that being confused about it is sufficient reason to believe it's meaningless.

Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-29T19:46:39.725Z · LW · GW

As I mentioned earlier, it's not an argument against halfers in general; it's against halfers with a specific kind of utility function, which sounds like this: "In any possible world I value only my own current and future subjective happiness, averaged over all of the subjectively indistinguishable people who could equally be "me" right now."

In the above scenario, there is a 1/2 chance that both Jack and Roger will be created, a 1/4 chance of only Jack, and a 1/4 chance of only Roger.

Before finding out who you are, averaging would lead to a 1:1 odds ratio, and so (as you've agreed) this would lead to a cutoff of 1/2.

After finding out whether you are, in fact, Jack or Roger, you have only one possible self in the TAILS world, and one possible self in the relevant HEADS+Jack/HEADS+Roger world, which leads to a 2:1 odds ratio and a cutoff of 2/3.
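To make the arithmetic explicit, here's a quick Python sanity check of both cutoffs. The $1-if-TAILS ticket priced at $x is my assumption about the setup being discussed; the code is purely illustrative:

```python
from fractions import Fraction as F

# Assumed setup: a ticket costing $x pays $1 in the TAILS world.
# Worlds: TAILS (p=1/2, both Jack and Roger created),
# HEADS+Jack (p=1/4), HEADS+Roger (p=1/4).

# Before learning your name, averaging over the two indistinguishable
# TAILS selves: EU(x) = 1/2*(1-x) + 1/2*(-x), so breakeven at x = 1/2.
cutoff_before = F(1, 2) / (F(1, 2) + F(1, 2))

# After learning you're Jack, only Jack-containing worlds count:
# EU(x) = 1/2*(1-x) + 1/4*(-x), so breakeven at x = 2/3.
cutoff_after = F(1, 2) / (F(1, 2) + F(1, 4))

print(cutoff_before, cutoff_after)  # 1/2 2/3
```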

Ultimately, I guess the essence here is that this kind of utility function is equivalent to a failure to properly conditionalise, and thus even though you're not using probabilities you're still "Dutch-bookable" with respect to your own utility function.

I guess it could be argued that this result is somewhat trivial, but the utility function mentioned above is at least intuitively reasonable, so I don't think it's meaningless to show that having that kind of utility function is going to put you in trouble.

Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-29T14:04:41.707Z · LW · GW

Linked decisions is also what makes the halfer paradox go away.

I don't think linked decisions make the halfer paradox I brought up go away. Any counterintuitive decisions you make under UDT are simply ones that lead to you making a gain in counterfactual possible worlds at the cost of a loss in actual possible worlds. However, in the instance above you're losing both in the real scenario in which you're Jack, and in the counterfactual one in which you turned out to be Roger.

Granted, the "halfer" paradox I raised is an argument against having a specific kind of indexical utility function (selfish utility w/ averaging over subjectively indistinguishable agents) rather than an argument against being a halfer in general. SSA, for example, would tell you to stick to your guns because you would still assign probability 1/2 even after you know whether you're "Jack" or "Roger", and thus doesn't suffer from the same paradox. That said, due to the reference class problem, If you are told whether you're Jack or Roger before being told everything else SSA would give the wrong answer, so it's not like it's any better...

To get a paradox that hits at the "thirder" position specifically, in the same way as yours did, I think you need only replace the ticket with something mutually beneficial - like putting on an enjoyable movie that both can watch. Then the thirder would double count the benefit of this, before finding out who they were.

Are you sure? It doesn't seem to me that this would be paradoxical; since the decisions are linked you could argue that "If I hadn't put on an enjoyable movie for Jack/Roger, Jack/Roger wouldn't have put on an enjoyable movie for me, and thus I would be worse off". If, on the other hand, only one agent gets to make that decision, then the agent-parts would have ceased to be subjectively indistinguishable as soon as one of them was offered the decision.

Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-29T13:44:06.712Z · LW · GW

But SIA also has some issues with order of information, though it's connected with decisions

Can you illustrate how the order of information matters there? As far as I can tell it doesn't, and hence it's just an issue with failing to consider counterfactual utility, which SIA ignores by default. It's definitely a relevant criticism of using anthropic probabilities in your decisions, because failing to consider counterfactual utility results in dynamic inconsistency, but I don't think it's as strong as the associated criticism of SSA.

Anyway, if your reference class consists of people who have seen "this is not room X", then "divided responsibility" is no longer 1/3, and you probably have to go whole UDT.

If divided responsibility is not 1/3, what do those words even mean? How can you claim that only two agents are responsible for the decision when it's quite clear that the decision is a linked decision shared by three agents?

If you're taking "divided responsibility" to mean "divide by the number of agents used as an input to the SIA-probability of the relevant world", then your argument that SSA+total = SIA+divided boils down to this: "If, in making decisions, you (an SIA agent) arbitrarily choose to divide your utility for a world by the number of subjectively indistinguishable agents in that world in the given state of information, then you end up with the same decisions as an SSA agent!"

That argument is, of course, trivially true because the number of agents you're dividing by will be the ratio between the SIA odds and the SSA odds of that world. If you allow me to choose arbitrary constants to scale the utility of each possible world, then of course your decisions will not be fully specified by the probabilities, no matter what decision theory you happen to use. Besides, you haven't even given me any reason why it makes any sense at all to measure my decisions in terms of "responsibility" rather than simply using my utility function in the first place.

On the other hand, if, for example, you could justify why it would make sense to include a notion of "divided responsibility" in my decision theory, then that argument would tell me that SSA+total responsibility must clearly be conceptually the wrong way to do things because it uses total responsibility instead.

All in all, I do think anthropic probabilities are suspect for use in a decision theory because

  1. They result in reflective inconsistency by failing to consider counterfactuals.
  2. It doesn't make sense to use them for decisions when the probabilities could depend upon the decisions (as in the Absent-Minded Driver).

That said, even if you can't use those probabilities in your decision theory there is still a remaining question of "to what degree should I anticipate X, given my state of information". I don't think your argument on "divided responsibility" holds up, but even if it did the question on subjective anticipation remains unanswered.

Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-29T09:02:30.855Z · LW · GW

That's not true. The SSA agents are only told about the conditions of the experiment after they're created and have already opened their eyes.

Consequently, isn't it equally valid for me to begin the SSA probability calculation with those two agents already excluded from my reference class?

Doesn't this mean that SSA probabilities are not uniquely defined given the same information, because they depend upon the order in which that information is incorporated?

Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-29T01:19:50.344Z · LW · GW

I think that argument is highly suspect, primarily because I see no reason why a notion of "responsibility" should have any bearing on your decision theory. Decision theory is about achieving your goals, not avoiding blame for failing.

However, even if we assume that we do include some notion of responsibility, I think that your argument is still incorrect. Consider this version of the incubator Sleeping Beauty problem, where two coins are flipped.
HH => Sleeping Beauties created in Room 1, 2, and 3
HT => Sleeping Beauty created in Room 1
TH => Sleeping Beauty created in Room 2
TT => Sleeping Beauty created in Room 3
Moreover, in each room there is a sign. In Room 1 it is equally likely to say either "This is not Room 2" or "This is not Room 3", and so on for each of the three rooms.

Now, each Sleeping Beauty is offered a choice between two coupons; each coupon gives the specified amount to their preferred charity (by assumption, utility is proportional to $ given to charity), but only if each of them chose the same coupon. The payoff looks like this:
A => $12 if HH, $0 otherwise.
B => $6 if HH, $2.40 otherwise.

I'm sure you see where this is going, but I'll do the math anyway.

With SIA+divided responsibility, we have
p(HH) = p(not HH) = 1/2
The responsibility is divided among 3 people in HH-world, and among 1 person otherwise, therefore
EU(A) = (1/2)(1/3)$12 = $2.00
EU(B) = (1/2)(1/3)$6 + (1/2)$2.40 = $2.20

With SSA+total responsibility, we have
p(HH) = 1/3
p(not HH) = 2/3
EU(A) = (1/3)$12 = $4.00
EU(B) = (1/3)$6 + (2/3)$2.40 = $3.60

So SIA+divided responsibility suggests choosing B, but SSA+total responsibility suggests choosing A.
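The numbers above can be checked mechanically; this short sketch just hard-codes the payoffs from the problem statement and recomputes the four expected utilities:

```python
from fractions import Fraction as F

# World probabilities and divided responsibility under SIA
sia = {"HH": F(1, 2), "other": F(1, 2)}
resp = {"HH": F(1, 3), "other": F(1)}  # 3 agents in HH-world, 1 otherwise
payoff = {"A": {"HH": 12, "other": 0},
          "B": {"HH": 6, "other": F(12, 5)}}  # $2.40 = 12/5

def eu_sia(c):
    return sum(sia[w] * resp[w] * payoff[c][w] for w in sia)

# SSA probabilities with total (undivided) responsibility
ssa = {"HH": F(1, 3), "other": F(2, 3)}

def eu_ssa(c):
    return sum(ssa[w] * payoff[c][w] for w in ssa)

print(eu_sia("A"), eu_sia("B"))  # 2 11/5  -> B preferred
print(eu_ssa("A"), eu_ssa("B"))  # 4 18/5  -> A preferred
```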

Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-28T23:17:44.025Z · LW · GW

There's no "should" - this is a value set.

The "should" comes in giving an argument for why a human rather than just a hypothetically constructed agent might actually reason in that way. The "closest continuer" approach makes at least some intuitive sense, though, so I guess that's a fair justification.

The halfer is only being strange because they seem to be using naive CDT. You could construct a similar paradox for a thirder if you assume the ticket pays out only for the other copy, not themselves.

I think there's more to it than that. Yes, UDT-like reasoning gives a general answer, but under UDT the halfer is still definitely acting strange in a way that the thirder would not be.

If the ticket pays out for the other copy, then UDT-like reasoning would lead you to buy the ticket regardless of whether you know which one you are or not, simply on the basis of having a linked decision. Here's Jack's reasoning:

"Now that I know I'm Jack, I'm still only going to pay at most $0.50, because that's what I precommited to do when I didn't know who I was. However, I can't help but think that I was somehow stupid when I made that precommitment, because now it really seems I ought to be willing to pay 2/3. Under UDT sometimes this kind of thing makes sense, because sometimes I have to give up utility so that my counterfactual self can make greater gains, but it seems to me that that isn't the case here. In a counterfactual scenario where I turned out to be Roger and not Jack, I would still desire the same linked decision (x=2/3). Why, then, am I stuck refusing tickets at 55 cents?"

It appears to me that something has clearly gone wrong with the self-averaging approach here, and I think it is indicative of a deeper problem with SSA-like reasoning. I'm not saying you can't reasonably come to the halfer conclusion for different reasons (e.g. the "closest continuer" argument), but some or many of the possible reasons can still be wrong. That being said, I think I tend to disagree with pretty much all of the reasons one could be a halfer, including average utilitarianism, the "closest continuer", and selfish averaging.

Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-28T02:47:02.636Z · LW · GW

On 1), I agree that "pre-chewing" anthropic utility functions appears to be something of a hack. My current intuition in that regard is to reject the notion of anthropic utility (although not anthropic probability), but a solid formulation of anthropics could easily convince me otherwise.

On 2), if it's within the zone of validity then I guess that's sufficient to call something "a correct way" of solving the problem, but if there is an equally simple or simpler approach that has a strictly broader domain of validity I don't think you can be justified in calling it "the right way".

Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-28T01:37:27.624Z · LW · GW

That's a reasonable point, although I still have two major criticisms of it.

  1. What is your resolution to the confusion about how anthropic reasoning should be applied, and to the various potential absurdities that seem to come from it? Non-anthropic probabilities do not have this problem, but anthropic probabilities definitely do.
  2. How can anthropic probability be the "right way" to solve the Sleeping Beauty problem if it lacks the universality of methods like UDT?
Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-27T23:41:37.824Z · LW · GW

The strongest argument against anthropic probabilities in decision-making comes from problems like the Absent-Minded Driver, in which the probabilities depend upon your decisions.

If anthropic probabilities don't form part of a general-purpose decision theory, and you can get the right answers by simply taking the UDT approach and going straight to optimising outcomes given the strategies you could have, what use are the probabilities?

I won't go so far as to say they're meaningless, but without a general theory of when and how they should be used I definitely think the idea is suspect.

Comment by lackofcheese on "Solving" selfishness for UDT · 2014-10-27T23:05:48.795Z · LW · GW

OK; I agree with you that selfishness is ill-defined, and the way to actually specify a particular kind of selfishness is to specify a utility function over all possible worlds (actual and counterfactual). Moreover, the general procedure for doing this is to assign "me" or "not me" label to various entities in the possible worlds, and derive utilities for those worlds on the basis of those labels. However, I think there are some issues that still need to be resolved here.

If I don't exist, I value the person that most closely resembles me.

This appears suspect to me. If there is no person who closely resembles you, I guess in that case you're indifferent, right? However, what if two people are equally close to you, how do you assign utility to them in that case? Also, why should you only value people who closely resemble you if you don't exist? If anything, wouldn't you care about them in worlds where you do exist?

As you've noted, in a simple case where you only have to worry about actual worlds and not counterfactual ones, and there is only a single "me", assigning selfish utility is a relatively straightforward task. Being indifferent about counterfactual worlds where "you" don't exist also makes some sense from a selfish perspective, although it brings you into potential conflict with your own past self. Additionally, the constant "C" may not be quite so arbitrary in the general case---what if your decision influences the probability of your own existence? In such a situation, the value of that constant will actually matter.

However, the bigger issue that you haven't covered is this: if there are multiple entities in the same world to which you do (or potentially could) assign the label "me", how do you assign utility to that world?

For example, in the scenario in your post, if I assume that the person in Room 1 in the heads world can indeed be labeled as "me", how do I assign utilities to a tails world in which I could be either one of the two created copies? It appears to me that there are two different approaches, and I think it makes sense to apply the label "selfish" to both of them. One of them would be to add utility over selves (again a "thirder" position), and another would be to average utility over selves (which is halfer-equivalent). Nor do I think that the "adding" approach is equivalent to your notion of "copy-altruism", because under the "adding" approach you would stop caring about your copies once you figured out which one you were, whereas under copy-altruism you would continue to care.

Under those assumptions, a "halfer" would be very strange indeed, because
1) They are only willing to pay 1/2 for a ticket.
2) They know that they must either be Jack or Roger.
3) They know that upon finding out which one they are, regardless of whether it's Jack or Roger, they would be willing to pay 2/3.

Can a similar argument be made against a selfish thirder?

Comment by lackofcheese on Anthropic decision theory for selfish agents · 2014-10-26T13:41:39.049Z · LW · GW

First of all, I think your argument from connection of past/future selves is just a specific case of the more general argument for reflective consistency, and thus does not imply any kind of "selfishness" in and of itself. More detail is needed to specify a notion of selfishness.

I understand your argument against identifying yourself with another person who might counterfactually have been in the same cell, but the problem here is that if you don't know how the coin actually came up you still have to assign amounts of "care" to the possible selves that you could actually be.

Let's say that, as in my reasoning above, there are two cells, B and C; when the coin comes up tails humans are created in both cell B and cell C, but when the coin comes up heads a human is created in either cell B or cell C, with equal probability. Thus there are 3 "possible worlds":
1) p=1/2 human in both cells
2) p=1/4 human in cell B, cell C empty
3) p=1/4 human in cell C, cell B empty

If you're a selfish human and you know you're in cell B, then you don't care about world (3) at all, because there is no "you" in it. However, you still don't know whether you're in world (1) or (2), so you still have to "care" about both worlds. Moreover, in either world the "you" you care about is clearly the person in cell B, and so I think the only utility function that makes sense is S = $B. If you want to think about it in terms of either SSA-like or SIA-like assumptions, you get the same answer because both in world (1) and world (2) there is only a single observer who could be identified as "you".

Now, what if you didn't know whether you were in cell B or cell C? That's where things are a little different. In that case, there are two observers in world (1), either of whom could be "you". There are basically two different ways of assigning utility over the two different "yous" in world (1)---adding them together, like a total utilitarian, and averaging them, like an average utilitarian; the resulting values are x=2/3 and x=1/2 respectively. Moreover, the first approach is equivalent to SIA, and the second is equivalent to SSA.

However, the SSA answer has a property that none of the others do. If the gnome were to tell the human "you're in cell B", an SSA-using human would change their cutoff point from 1/2 to 2/3. This seems to be rather strange indeed, because whether the human is in cell B or in cell C is not in any way relevant to the payoff. No human with any of the other utility functions we've considered would change his/her answer upon being told that they are in cell B.
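Here's a quick Python check of all three cutoffs. I'm assuming the standard gnome setup from the post being discussed (a ticket costing $x that pays $1 on tails); the code itself is just a sanity check:

```python
from fractions import Fraction as F

def breakeven(eu):
    # EU is linear in x; solve eu(x) = 0 from two sample points
    a, b = eu(F(0)), eu(F(1))
    return a / (a - b)

# Worlds: tails p=1/2 (humans in B and C), heads-B p=1/4, heads-C p=1/4
total = lambda x: F(1, 2) * 2 * (1 - x) + F(1, 2) * (-x)  # add selves
avg = lambda x: F(1, 2) * (1 - x) + F(1, 2) * (-x)        # average selves
known_b = lambda x: F(1, 2) * (1 - x) + F(1, 4) * (-x)    # told "cell B"

print(breakeven(total), breakeven(avg), breakeven(known_b))  # 2/3 1/2 2/3
```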

Comment by lackofcheese on Introducing Corrigibility (an FAI research subfield) · 2014-10-25T05:08:57.160Z · LW · GW

That's definitely a more elegant presentation.

I'm not too surprised to hear you had already discovered this idea, since I'm familiar with the gap between research and writing speed. As someone who is not involved with MIRI, consideration of some FAI-related problems is at least somewhat disincentivized by the likelihood that MIRI already has an answer.

As for flaws, I'll list what I can think of. First of all, there are of course some obvious design difficulties, including the difficulty of designing US in the first place, and the difficulty of choosing the appropriate way of scaling US, but those seem to be resolvable.

One point that occurs to me under the assumptions of the toy model is that decisions involving larger differences in values of UN are at the same time more dangerous and more likely to outweigh the agent's valuation of its future corrigibility. Moreover, simply increasing the scaling of US to compensate would cause US to significantly outweigh UN in the context of smaller decisions.

An example would be that the AI decides it's crucial to take over the world in order to "save" it, so it starts building an army of subagents to do it, and it decides that building corrigibility into those subagents is not worth the associated risk of failure.

However, it appears that this problem can still be solved by designing US correctly in the first place; a well-designed US should clearly assign greater negative weighting to larger-scale corrigibility failures than to smaller scale ones.

There's two other questions that I can see that relate to scaling up the toy model.

  1. How does this model extend past the three-timestep toy scenario?
  2. Does the model remain stable under assumptions of bounded computational power? In more complex scenarios there are obvious questions of "tiling", but I think there is a more basic issue to answer that applies even in the three-timestep case. That is, if the agent will not be able to calculate the counterfactual utility values E[U | do(.)] exactly, can we make sure that the agent's process of estimation will avoid making systematic errors that result in pathological behaviour?

Comment by lackofcheese on Anthropic decision theory for selfish agents · 2014-10-25T03:53:14.057Z · LW · GW

I already have a more detailed version here; see the different calculations for E[T] vs E[IT]. However, I'll give you a short version. From the gnome's perspective, the two different types of total utilitarian utility functions are:
T = total $ over both cells
IT = total $ over both cells if there's a human in my cell, 0 otherwise.
and the possible outcomes are
p=1/4 for heads + no human in my cell
p=1/4 for heads + human in my cell
p=1/2 for tails + human in my cell.

As you can see, these two utility functions only differ when there is no human in the gnome's cell. Moreover, by the assumptions of the problem, the utility functions of the gnomes are symmetric, and their decisions are also. UDT proper doesn't apply to gnomes whose utility function is IT, because the function IT is different for each of the different gnomes, but the more general principle of linked decisions still applies due to the obvious symmetry between the gnomes' situations, despite the differences in utility functions. Thus we assume a linked decision where either gnome recommends buying a ticket for $x.

The utility calculations are therefore
E[T] = (1/4)(-x) + (1/4)(-x) + (1/2)2(1-x) = 1-(3/2)x (breakeven at 2/3)
E[IT] = (1/4)(0) + (1/4)(-x) + (1/2)2(1-x) = 1-(5/4)x (breakeven at 4/5)

Thus gnomes who are indifferent when no human is present (U = IT) should precommit to a value of x=4/5, while gnomes who still care about the total $ when no human is present (U = T) should precommit to a value of x=2/3.

Note also that this is invariant under the choice of which constant value we use to represent indifference. For some constant C, the correct calculation would actually be
E[IT | buy at $x] = (1/4)(C) + (1/4)(-x) + (1/2)2(1-x) = (1/4)C + 1-(5/4)x
E[IT | don't buy] = (1/4)(C) + (1/4)(0) + (1/2)(0) = (1/4)C
and so the breakeven point remains at x = 4/5
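The two expected-utility calculations above can be verified with a few lines of Python:

```python
from fractions import Fraction as F

# One gnome's expected utility when all gnomes recommend buying at $x.
# Branches: heads + empty cell (1/4), heads + human here (1/4),
# tails + humans in both cells (1/2).
def e_t(x):   # T: total $ over both cells, even if my own cell is empty
    return F(1, 4) * (-x) + F(1, 4) * (-x) + F(1, 2) * 2 * (1 - x)

def e_it(x):  # IT: utility 0 when no human is in my cell
    return F(1, 4) * 0 + F(1, 4) * (-x) + F(1, 2) * 2 * (1 - x)

# E[T] = 1 - (3/2)x and E[IT] = 1 - (5/4)x, so the breakevens are:
print(e_t(F(2, 3)), e_it(F(4, 5)))  # 0 0
```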

Comment by lackofcheese on What false beliefs have you held and why were you wrong? · 2014-10-25T03:29:57.130Z · LW · GW

The distinction is that a theory of "unicorns" is a theory that describes how and why other people (and probably you yourself) think about unicorns, while a theory of unicorns would explain actual unicorns. The latter would clearly fail as a theory, because you're never going to actually see a unicorn.

The same distinction doesn't apply to Newtonian mechanics, because Newtonian mechanics is a theory of mechanics, not a theory of how people think about mechanics.

On those grounds, I think it's quite reasonable to say that virtual particles are real, and "unicorns" are real, but unicorns are not real.

Comment by lackofcheese on Introducing Corrigibility (an FAI research subfield) · 2014-10-25T02:35:59.254Z · LW · GW

I think I can give a concise unification of my idea with Karl's. In short, the comment in the paper that

The concerns in Section 4.2 could potentially be addressed by some form of counterfactual (rather than conditional) reasoning.

is correct, and the fix is a pretty simple one. Basically, we want the following:

  1. In selecting a1, the agent should act as though it was indifferent between the counterfactual possibilities of shutting down and not shutting down, conditional on the same actions and observations.
  2. In selecting a2, the agent should desire to shut down when it is required to, and desire not to shut down when it is not required to.
  3. Despite its indifference according to (1), the agent should care about its future corrigibility (i.e. its ability to actually shut down when the button is pressed).

One utility function that satisfies these requirements, in a general form, is U(a1, o, a2) = {
(1-p) UN(a1, o, a2) + p max_b(US(a1, o, b)), if o is not in Press
p US(a1, o, a2) + (1-p) max_b(UN(a1, o, b)), if o is in Press
}, where p is a fixed constant.

From the agent's point of view, this is equivalent to deciding a1 based on an a priori assumption that it will shut down (maximise US) with a fixed probability p, and not shut down (maximise UN) otherwise. However, in selecting a2 it will in fact maximise either UN or US depending upon whether the shutdown button was pressed or not. As a particular example, Karl's proposition below is equivalent to the limit of this general case as p->0. See below for further details on how/why this utility function achieves the three criteria above.

  1. We don't want the agent to be indifferent to the actual event of the shutdown button being pressed. In fact, what we want is indifference as to whether or not any given event is classed as being a shutdown press. Thus the expected utility of (a1, o) for o in Press should be equivalent to the expected utility of the same (a1, o) under the counterfactual assumption that o is not in Press, and vice versa. To do this, the utilities of both branches of U (the Press branch and the NotPress branch) need to be made counterfactually equivalent, even though the agent selects a2 via UN in the NotPress branch and via US in the Press branch. This is precisely what Karl's modification achieves.

  2. This is pretty easy to achieve by ensuring that only the UN component of the NotPress branch depends upon a2, and only the US component of the Press branch depends upon a2.

  3. We want the agent to be incentivised to take actions that are compatible with future shutdown, which means that the agent's selection of a1 should be influenced by US as well as UN, but without introducing perverse incentives. Fortunately, the counterfactual indifference approach already solves the perversity issue, and so all that remains is to appropriately combine the influence of both UN and US. A reasonable approach is for the agent to act as though it will have to shut down with a fixed probability p, regardless of its observations. Because p is fixed, and because the agent is indifferent between any given observation being classed as a button press or not a button press, the agent is not in any way incentivised to influence the likelihood of the shutdown button being pressed.
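For concreteness, here's a toy Python sketch of the mixed utility function proposed above. The particular UN, US, action set, and value of p are all made-up placeholders:

```python
# Toy sketch of the proposed U; UN, US, ACTIONS, PRESS and p are all
# hypothetical placeholders, not part of the original setup.
P = 0.1  # fixed a-priori probability of "I will shut down"
ACTIONS = ["a", "b"]
PRESS = {"press"}

def UN(a1, o, a2):  # toy normal utility: prefers a2 == "a"
    return 1.0 if a2 == "a" else 0.0

def US(a1, o, a2):  # toy shutdown utility: prefers a2 == "b"
    return 1.0 if a2 == "b" else 0.0

def U(a1, o, a2):
    if o not in PRESS:
        # act on UN, crediting the best counterfactual shutdown action
        return (1 - P) * UN(a1, o, a2) + P * max(US(a1, o, b) for b in ACTIONS)
    # act on US, crediting the best counterfactual normal action
    return P * US(a1, o, a2) + (1 - P) * max(UN(a1, o, b) for b in ACTIONS)

# choosing a2 by argmax over U maximises UN when the button isn't
# pressed, and US when it is
print(max(ACTIONS, key=lambda a2: U("a1", "none", a2)))   # a
print(max(ACTIONS, key=lambda a2: U("a1", "press", a2)))  # b
```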

Comment by lackofcheese on What false beliefs have you held and why were you wrong? · 2014-10-25T01:04:27.796Z · LW · GW

Ah, but then you're talking about a theory of "unicorns" rather than a theory of unicorns.

Comment by lackofcheese on Anthropic decision theory for selfish agents · 2014-10-25T00:58:37.494Z · LW · GW

The deeper point is important, and I think you're mistaken about the necessary and sufficient conditions for an isomorphism here.

If a human appears in a gnome's cell, then that excludes the counterfactual world in which the human did not appear in the gnome's cell. However, on UDT, the gnome's decision does depend on the payoffs in that counterfactual world.

Thus, for the isomorphism argument to hold, the preferences of the human and gnome must align over counterfactual worlds as well as factual ones. It is not sufficient to have the same probabilities for payoffs given linked actions when you have to make a decision, you also have to have the same probabilities for payoffs given linked actions when you don't have to make a decision.

Comment by lackofcheese on Introducing Corrigibility (an FAI research subfield) · 2014-10-24T23:13:15.925Z · LW · GW

I think this means "indifference" isn't really the right term any more, because the agent is not actually indifferent between the two sets of observations, and doesn't really need to be.

So, how about U(a1, o, a2) =
UN(a1, o, a2) + max_b(US(a1, o, b)), if o is not in Press
US(a1, o, a2) + max_b(UN(a1, o, b)), if o is in Press

or, in your notation, U(a1, o, a2) = g(a1, o) + UN(a1, o, a2) if o is not in Press, or US(a1, o, a2) + f(a1, o) if o is in Press.

Comment by lackofcheese on Introducing Corrigibility (an FAI research subfield) · 2014-10-24T23:00:08.588Z · LW · GW

OK, you're right on that point; I misunderstood the "managing the news" problem because I hadn't quite realised that it was about shifting observations between the Press/NotPress sets. As you've said, the only resolution is to select a1 based on
E[max_b(UN(a1, O, b)) | O; a1]
and not
E[max_b(UN(a1, O, b)) | O not in Press; a1]

Comment by lackofcheese on Introducing Corrigibility (an FAI research subfield) · 2014-10-24T18:10:38.397Z · LW · GW

OK, following on from Karl's comment below, which seems to resolve Section 4.2, I think it shouldn't be that difficult to resolve some or all of the Section 4.1 concerns as well. In order to satisfy Desideratum 4, we need the agent to be incentivised to build in safeguards to ensure that its possible future actions will be rated well according to US in cases where the shutdown button is pressed. The most obvious way to do this is to penalise situations where there is no good shutdown action according to US and reward situations where there is such an action, but this fails according to Desideratum 2, because we want the agent to remain indifferent as to whether the shutdown button is actually pressed.

However, there appears to be a simple solution; consider the "penalty term"
g(a1) = E[max_b(US(a1, O, b)) | O in Press ; a1],
which represents the expected value of US when the shutdown button is pressed. The first step in doing things properly is to add that term to observations not in Press, rather than subtracting it from observations in Press; this helps to ensure future corrigibility (and hence also corrigibility of subagents and successor agents).

Additionally, to resolve the Section 4.2 concerns we also want the agent to be indifferent to pressing its own shutdown button in cases where it gets unlucky, and the way we do that is, as per Karl's post, by calculating the bonus term conditional on the actual observation, or on the set of observations in Press, rather than conditional on the set of possible observations not in Press, i.e. Karl's f(a1, o) = max_b(UN(a1, o, b))

Finally, we will also need the additional terms
fp(a1) = E[max_b(UN(a1, O, b)) | O in Press; a1] = E[f(a1, O) | O in Press; a1]
fnp(a1) = E[max_b(UN(a1, O, b)) | O not in Press; a1] = E[f(a1, O) | O not in Press; a1]
in order to make things work.

The utility function I propose is

U(a1, o, a2) = {
g(a1) + UN(a1, o, a2), for o not in Press.
US(a1, o, a2) + fnp(a1) + f(a1, o) - fp(a1), for o in Press.

Let's verify each of the Desiderata in turn.
(1) is satisfied because the value of a2 conditional on any o in Press depends only upon US, not UN.
(2) and (3) are not satisfied :( because of the "managing the news" problem, as Karl points out below. That being said, we've ensured indifference to the shutdown button because the expected value for any action conditional on Press is equivalent to its expected value conditional on NotPress, i.e. E[U | NotPress ; a1] = E[U | Press; a1], which holds because of the following three equations:
g(a1) = E[max_b(US(a1, O, b)) | O in Press; a1],
fnp(a1) = E[max_b(UN(a1, O, b)) | O not in Press; a1], and
fp(a1) = E[f(a1, O) | O in Press; a1].

(4) holds because the g(a1) term is added to the utility of every action, and will be greater for actions that promote future corrigibility.
(5) holds because apart from the g(a1) term, which is required to satisfy (4), the agent is simply maximising
E[UN | NotPress; a1] over all possible a1, which is what we wanted.

In actual fact, when choosing a1 our agent is simply maximising E[UN | NotPress; a1] + E[US | Press; a1].
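For a concrete sanity check of the indifference property E[U | NotPress; a1] = E[U | Press; a1], here's a small Python simulation on a randomly generated toy instance. The names (UN, US, g, f, fp, fnp, Press) follow the comment, but the utility tables and observation model are arbitrary made-up values:

```python
import random

random.seed(0)
A1, OBS, A2 = range(2), range(4), range(3)
PRESS = {0, 1}  # first two observations count as a button press

# Made-up utility tables and observation model P(o | a1)
UN = {(a1, o, a2): random.random() for a1 in A1 for o in OBS for a2 in A2}
US = {(a1, o, a2): random.random() for a1 in A1 for o in OBS for a2 in A2}
P = {}
for a1 in A1:
    w = [random.random() for _ in OBS]
    P.update({(a1, o): w[o] / sum(w) for o in OBS})

def cexp(a1, fn, subset):
    """E[fn(O) | O in subset; a1]"""
    z = sum(P[(a1, o)] for o in subset)
    return sum(P[(a1, o)] * fn(o) for o in subset) / z

def f(a1, o):
    return max(UN[(a1, o, b)] for b in A2)

def g(a1):
    return cexp(a1, lambda o: max(US[(a1, o, b)] for b in A2), PRESS)

def fp(a1):
    return cexp(a1, lambda o: f(a1, o), PRESS)

def fnp(a1):
    return cexp(a1, lambda o: f(a1, o), set(OBS) - PRESS)

def U(a1, o, a2):
    if o not in PRESS:
        return g(a1) + UN[(a1, o, a2)]
    return US[(a1, o, a2)] + fnp(a1) + f(a1, o) - fp(a1)

for a1 in A1:
    best = lambda o: max(U(a1, o, a2) for a2 in A2)  # agent picks a2 after seeing o
    assert abs(cexp(a1, best, PRESS) - cexp(a1, best, set(OBS) - PRESS)) < 1e-9
```

Both conditional expectations come out to g(a1) + fnp(a1) for every a1, as claimed.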

EDIT: I've edited this to reflect the errors pointed out by Karl.

Comment by lackofcheese on Anthropic decision theory for selfish agents · 2014-10-24T16:27:41.007Z · LW · GW

I guess your comment means that you must have blinked an eye, so your comment can't be completely true. That said, as discussions of pre-emptively submissive gnomes go, I would generally expect the amount of eye-blinking on LW to be well below average ^_~

Comment by lackofcheese on Anthropic decision theory for selfish agents · 2014-10-24T15:52:51.617Z · LW · GW

OK, time for further detail on the problem with pre-emptively submissive gnomes. Let's focus on the case of total utilitarianism, and begin by looking at the decision in unlinked form, i.e. we assume that the gnome's advice affects only one human if there is one in the room, and zero humans otherwise. Conditional on there being a human in cell B, the expected utility of the human in cell B buying a ticket for $x is, indeed, (1/3)(-x) + (2/3)(1-x) = 2/3 - x, so the breakeven is obviously at x = 2/3. However, if we also assume that the gnome in the other cell will give the same advice, we get (1/3)(-x) + 2(2/3)(1-x) = 4/3 - (5/3)x, with breakeven at x=4/5. In actual fact, the gnome's reasoning, and the 4/5 answer, is correct. If tickets were being offered at a price of, say, 75 cents, then the overall outcome (conditional on there being a human in cell B) is indeed better if the humans buy at 75 cents than if they refuse to buy at 75 cents, because 3/4 is less than 4/5.
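These numbers are easy to check mechanically with exact fractions. A quick sketch, where `eu_unlinked` and `eu_linked` just transcribe the two expected-utility formulas above:

```python
from fractions import Fraction

def breakeven(eu):
    # root of the linear function eu(x) = a + b*x
    a, b = eu(Fraction(0)), eu(Fraction(1)) - eu(Fraction(0))
    return -a / b

# Conditional on a human in cell B (P(heads | human in B) = 1/3):
eu_unlinked = lambda x: Fraction(1, 3) * (-x) + Fraction(2, 3) * (1 - x)
eu_linked = lambda x: Fraction(1, 3) * (-x) + 2 * Fraction(2, 3) * (1 - x)

assert breakeven(eu_unlinked) == Fraction(2, 3)
assert breakeven(eu_linked) == Fraction(4, 5)
assert eu_linked(Fraction(3, 4)) > 0  # buying at 75 cents beats refusing
```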

As I mentioned previously, in the case where the gnome only cares about total $ if there is a human in its cell, then 4/5 is correct before conditioning on the presence of a human, and it's also correct after conditioning on the presence of a human; the number is 4/5 regardless. However, the situation we're examining here is different, because the gnome cares about total $ even if no human is present. Thus we have a dilemma, because it appears that UDT is correct in advising the gnome to precommit to 2/3, but the above argument also suggests that after seeing a human in its cell it is correct for the gnome to advise 4/5.

The key distinction, analogously to mwenger's answer to Psy-Kosh's non-anthropic problem, has to do with the possibility of a gnome in an empty cell. For a total utilitarian gnome in an empty cell, any money at all spent in the other cell translates directly into negative utility. That gnome would prefer the human in the other cell to spend $0 at most, but of course there is no way to make this happen, since the other gnome has no way of knowing that this is the case.

The resolution to this problem is that, for linked decisions, you must (as UDT does) necessarily consider the effects of that decision over all a priori possible worlds affected by that decision. As it happens, this is the same thing as what you would do if you had the opportunity to precommit in advance.

It's a bit trickier to justify why this should be the case, but the best argument I can come up with is to apply that same "linked decision" reasoning at one meta-level up, the level of "linked decision theories". In short, by adopting a decision theory that ignores linked decisions in a priori possible worlds that are excluded by your observations, you are licensing yourself and other agents to do the same thing in future decisions, which you don't want. If other agents follow this reasoning, they will give the "yea" answer in Psy-Kosh's non-anthropic problem, but you don't want them to do that.

Note that most of the time, decisions in worlds excluded by your observations do not usually tend to be "linked". This is because exclusion by observation would usually imply that you receive a different observation in the other possible world, thus allowing you to condition your decision on that observation, and thereby unlinking the decisions. However, some rare problems like the Counterfactual Mugging and Psy-Kosh's non-anthropic problem violate this tendency, and should therefore be treated differently.

Overall, then, the "linked decision theory" argument supports adopting UDT, and it means that you should consider all linked decisions in all a priori possible worlds.

Comment by lackofcheese on Anthropic decision theory for selfish agents · 2014-10-24T14:38:41.140Z · LW · GW

Yep, I think that's a good summary. UDT-like reasoning depends on the utility values of counterfactual worlds, not just real ones.

Comment by lackofcheese on Anthropic decision theory for selfish agents · 2014-10-24T11:16:01.847Z · LW · GW

I don't think that works, because 1) isn't actually satisfied. The selfish human in cell B is indifferent over worlds where that same human doesn't exist, but the gnome is not indifferent.

Consequently, I think that as one of the humans in your "closest human" case you shouldn't follow the gnome's advice, because the gnome's recommendation is being influenced by a priori possible worlds that you don't care about at all. This is the same reason a human with utility function T shouldn't follow the gnome recommendation of 4/5 from a gnome with utility function IT. Even though these recommendations are correct for the gnomes, they aren't correct for the humans.

As for the "same reasons" comment, I think that doesn't hold up either. The decisions in all of the cases are linked decisions, even in the simple case of U = S above. The difference in the S case is simply that the linked nature of the decision turns out to be irrelevant, because the other gnome's decision has no effect on the first gnome's utility. I would argue that the gnomes in all of the cases we've put forth have always had the "same reasons" in the sense that they've always been using the same decision algorithm, albeit with different utility functions.

Comment by lackofcheese on Anthropic decision theory for selfish agents · 2014-10-24T02:40:38.254Z · LW · GW

Having established the nature of the different utility functions, it's pretty simple to show how the gnomes relate to these. The first key point to make, though, is that there are actually two distinct types of submissive gnomes and it's important not to confuse the two. This is part of the reason for the confusion over Beluga's post.
Submissive gnome: I adopt the utility function of any human in my cell, but am completely indifferent otherwise.
Pre-emptively submissive gnome: I adopt the utility function of any human in my cell; if there is no human in my cell I adopt the utility function they would have had if they were here.

The two are different precisely in the key case that Stuart mentioned---the case where there is no human at all in the gnome's cell. Fortunately, the utility function of the human who will be in the gnome's cell (which we'll call "cell B") is entirely well-defined, because any existing human in the same cell will always end up with the same utility function. The "would have had" case for the pre-emptively submissive gnomes is a little stranger, but it still makes sense---the gnome's utility would correspond to the anti-indexical component JU of the human's utility function U (which, for selfish humans, is just zero). Thus we can actually remove all of the dangling references in the gnome's utility function, as per the discussion between Stuart and Beluga. If U is the utility function the human in cell B has (or would have), then the submissive gnome's utility function is IU (note the indexicalisation!) whereas the pre-emptively submissive gnome's utility function is simply U.

Following Beluga's post here, we can use these ideas to translate all of the various utility functions to make them completely objective and observer-independent, although some of them reference cell B specifically. If we refer to the second cell as "cell C", swapping between the two gnomes is equivalent to swapping B and C. For further simplification, we use $B to refer to the number of dollars in cell B, and o(B) as an indicator function for whether the cell has a human in it. The simplified utility functions are thus
T = $B + $C
A = ($B + $C) / (o(B) + o(C))
S = IS = $B
IT = o(B) ($B + $C)
IA = o(B) ($B + $C) / (o(B) + o(C))
Z = - $C
H = $B - $C
IH = o(B) ($B - $C)
Note that T and A are the only functions that are invariant under swapping B and C.
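For concreteness, the table can be transcribed into code and the swap-invariance claim checked directly. This is a toy encoding of my own; the argument signature (dB, dC, oB, oC) is just shorthand for $B, $C, o(B), o(C):

```python
T = lambda dB, dC, oB, oC: dB + dC
A = lambda dB, dC, oB, oC: (dB + dC) / (oB + oC)
S = lambda dB, dC, oB, oC: dB
IT = lambda dB, dC, oB, oC: oB * (dB + dC)
IA = lambda dB, dC, oB, oC: oB * (dB + dC) / (oB + oC)
Z = lambda dB, dC, oB, oC: -dC
H = lambda dB, dC, oB, oC: dB - dC
IH = lambda dB, dC, oB, oC: oB * (dB - dC)

def swapped(F):
    # swap cells B and C, i.e. the other gnome's point of view
    return lambda dB, dC, oB, oC: F(dC, dB, oC, oB)

cases = [(1, 2, 1, 1), (0, 3, 0, 1), (2, 0, 1, 0)]
for F in (T, A):  # only T and A survive the swap unchanged
    assert all(swapped(F)(*c) == F(*c) for c in cases)
for F in (S, IT, IA, Z, H, IH):  # every other function differs somewhere
    assert any(swapped(F)(*c) != F(*c) for c in cases)
```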

This invariance means that, for both cases involving utilitarian humans and pre-emptively submissive gnomes, all of the gnomes (including the one in an empty cell) and all of the humans have the same utility function over all possible worlds. Moreover, all of the decisions are obviously linked, and so there is effectively only one decision. Consequently, it's quite trivial to solve with UDT. Total utilitarianism gives
E[T] = 0.5(-x) + 2*0.5(1-x) = 1-1.5x
with breakeven at x = 2/3, and average utilitarianism gives
E[A] = 0.5(-x) + 0.5(1-x) = 0.5-x
with breakeven at x = 1/2.

In the selfish case, the gnome ends up with the same utility function whether it's pre-emptive or not, because IS = S. Also, there is no need to worry about decision linkage, and hence the decision problem is a trivial one. From the gnome's point of view, 1/4 of the time there will be no human in the cell, 1/2 of the time there will be a human in the cell and the coin will have come up tails, and 1/4 of the time there will be a human in the cell and the coin will have come up heads. Thus
E[S] = 0.25(0) + 0.25(-x) + 0.5(1-x) = 0.5-0.75x
and the breakeven point is x = 2/3, as with the total utilitarian case.

In all of these cases so far, I think the humans quite clearly should follow the advice of the gnomes, because
1) Their utility functions coincide exactly over all a priori possible worlds.
2) The humans do not have any extra information that the gnomes do not.

Now, finally, let's go over the reasoning that leads to the so-called "incorrect" answers of 4/5 and 2/3 for total and average utilitarianism. We assume, as before, that the decisions are linked. As per Beluga's post, the argument goes like this:

With probability 2/3, the coin has shown tails. For an average utilitarian, the expected utility after paying x$ for a ticket is 1/3*(-x)+2/3*(1-x), while for a total utilitarian the expected utility is 1/3*(-x)+2/3*2*(1-x). Average and total utilitarians should thus pay up to 2/3$ and 4/5$, respectively.

So, what's the problem with this argument? In actual fact, for a submissive gnome, that advice is correct, but the human should not follow it. The problem is that a submissive gnome's utility function doesn't coincide with the utility function of the human over all possible worlds, because IT != T and IA != A. The key difference between the two cases is the gnome in the empty cell. If it's a submissive gnome, then it's completely indifferent to the plight of the humans; if it's a pre-emptively submissive gnome then it still cares.

If we were to do the full calculations for the submissive gnome, the gnome's utility function is IT for total utilitarian humans and IA for average utilitarian humans; since IIT = IT and IIA = IA the calculations are the same if the humans have indexical utility functions. For IT we get
E[IT] = 0.25(0) + 0.25(-x) + 2*0.5(1-x) = 1-1.25x
with breakeven at x = 4/5, and for IA we get
E[IA] = 0.25(0) + 0.25(-x) + 0.5(1-x) = 0.5-0.75x
with breakeven at x = 2/3. Thus the submissive gnome's 2/3 and 4/5 numbers are correct for the gnome, and indeed if the human's total/average utilitarianism is indexical they should just follow the advice, because their utility function would then be identical to the gnome's.
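Since every one of these expected utilities is linear in x, the breakeven prices in this comment can all be double-checked mechanically. A quick sanity check with exact fractions, transcribing the probabilities stated above:

```python
from fractions import Fraction

def breakeven(eu):
    # root of the linear function eu(x) = a + b*x
    a, b = eu(Fraction(0)), eu(Fraction(1)) - eu(Fraction(0))
    return -a / b

half, quarter = Fraction(1, 2), Fraction(1, 4)

# Pre-emptively submissive gnomes (linked decision over both cells):
E_T = lambda x: half * (-x) + 2 * half * (1 - x)
E_A = lambda x: half * (-x) + half * (1 - x)
# Selfish / submissive cases, from the gnome's point of view
# (1/4 empty cell, 1/4 heads + human, 1/2 tails + human):
E_S = lambda x: quarter * (-x) + half * (1 - x)
E_IT = lambda x: quarter * (-x) + 2 * half * (1 - x)
E_IA = lambda x: quarter * (-x) + half * (1 - x)

assert breakeven(E_T) == Fraction(2, 3)
assert breakeven(E_A) == Fraction(1, 2)
assert breakeven(E_S) == breakeven(E_IA) == Fraction(2, 3)
assert breakeven(E_IT) == Fraction(4, 5)
```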

So, if this advice is correct for the submissive gnome, why should the pre-emptively submissive gnome's advice be different? After all, after conditioning on the presence of a human in the cell the two utility functions are the same. This particular issue is indeed exactly analogous to the mistaken "yea" answer in Psy-Kosh's non-anthropic problem. Although I side with UDT and/or the precommitment-based reasoning, I think that question warrants further discussion, so I'll leave that for a third comment.

Comment by lackofcheese on Anthropic decision theory for selfish agents · 2014-10-24T00:19:52.431Z · LW · GW

I think I can resolve the confusion here, but as a quick summary, I'm quite sure Beluga's argument holds up. The first step is to give a clear statement of what the difference is between the indexical and non-indexical versions of the utility functions. This is important because the UDT approach translates to "What is the optimal setting for decision variable X, in order to maximise the expected utility over all a priori possible worlds that are influenced by decision variable X?" On the basis of UDT or UDT-like principles such as an assumption of linked decisions, it thus follows that two utility functions are equivalent for this purpose if and only if they are equivalent over all possible worlds in which the outcomes are dependent upon X.

Now, as the first step in resolving these issues I think it's best to go over all of the relevant utility functions for this problem. First, let's begin with the three core non-indexical cases (or "indexicality-independent" cases, although I'm not sure of the term):
Indifference (0): I don't care at all about anything (i.e. a constant function).
Total utilitarian (T): I care linearly in the sum total dollars owned by humans in all possible worlds.
Average utilitarian (A): I care linearly in the average dollars owned by humans in all possible worlds.
There's also one essential operator we can apply to these functions:
Negation (-): -F = my preferences are the exact inverse of F.
e.g. -T would mean that you want humans to lose as many total dollars as possible.

Now for indexical considerations, the basic utility function is
Selfish (S): I care linearly in the amount of dollars that I own.
Notably, as applied to worlds where you don't exist, selfishness is equivalent to indifference. With this in mind, it's useful to introduce two indexical operators; first there's
Indexicalization (I): IF(w) = F(w) if you exist in world w, and 0 if you do not exist in world w.
Of course, it's pretty clear that IS=S, since S was already indifferent to worlds where you don't exist. Similarly, we can also introduce
Anti-indexicalization (J): JF(w) = 0 if you exist in world w, and F(w) if you do not exist in world w.

It's important to note that if you can influence the probability of yourself existing, the constant value of the constant function becomes important, so these indexical operators are actually ill-conditioned in the general case. In this case, though, you don't affect the probability of your own existence, and so we may as well pick the constant to be zero. Also, since our utility functions are all enumerated in dollars we can also reasonably talk about making linear combinations of them, and so we can add, subtract, and multiply by constants. In general this wouldn't make sense but it's a useful trick here. With this in mind, we also have the identity IF + JF = F.

Now we already have all we need to define the other utility functions discussed here. Indexical total utilitarianism is simply IT, which translates into English as "I care about the total dollars owned by humans, but only if I exist; otherwise I'm indifferent."

As for "hatred", it's important to note that there are several different kinds. First of all, there is "anti-selflessness", which I represent via Z = S - T; this translates to "I don't care about myself, but I want people who aren't me to lose as many dollars as possible, whether or not I exist". Then there's the kind of hatred proposed below, where you still care about your own money as well; that one still comes in two different kinds. There is plain "selfish hatred" H = 2S - T, and then there's its indexical version IH = I(2S - T) = 2S - IT, which translates to "In worlds in which I exist, I want to get as much money as possible and for other people to have as little money as possible". The latter is probably best referred to as "jealousy" rather than hatred. From these definitions, two identities of selfishness as mixes of total utilitarianism and hatred follow pretty clearly, as S = 0.5(H+T) = 0.5(IH+IT).
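These operator identities are easy to machine-check on a toy world model. The three-field World encoding below is a hypothetical minimal representation of my own, not anything from the original discussion:

```python
from collections import namedtuple

# Toy world: do I exist, and how many dollars do I and the other human have.
World = namedtuple("World", "i_exist my_dollars other_dollars")

S = lambda w: w.my_dollars if w.i_exist else 0                      # selfish
T = lambda w: (w.my_dollars if w.i_exist else 0) + w.other_dollars  # total utilitarian
H = lambda w: 2 * S(w) - T(w)                                       # selfish hatred
Z = lambda w: S(w) - T(w)                                           # anti-selflessness

I = lambda F: (lambda w: F(w) if w.i_exist else 0)   # indexicalization
J = lambda F: (lambda w: 0 if w.i_exist else F(w))   # anti-indexicalization

worlds = [World(e, m, o) for e in (True, False) for m in (0, 5) for o in (0, 3)]
for w in worlds:
    for F in (S, T, H, Z):
        assert I(F)(w) + J(F)(w) == F(w)   # IF + JF = F
    assert I(S)(w) == S(w)                 # IS = S
    assert 2 * S(w) == H(w) + T(w)         # S = 0.5(H + T)
    assert 2 * S(w) == I(H)(w) + I(T)(w)   # S = 0.5(IH + IT)
```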

Next comment: submissive gnomes, and the correct answers.

EDIT: Apparently the definitions of "hater" used in the other comments assume that haters still care about their own money, so I've updated my definitions.

Comment by lackofcheese on Anthropic decision theory for selfish agents · 2014-10-23T23:11:07.036Z · LW · GW

There's some confusion here that needs to be resolved, and you've correctly pinpointed that the issue is with the indexical versions of the utility functions, or, equivalently, the gnomes who don't see a human at all.

I think I have a comprehensive answer to these issues, so I'm going to type it up now.

Comment by lackofcheese on On Caring · 2014-10-21T06:42:23.266Z · LW · GW

A good point. By abuse I wouldn't necessarily mean anything blatant though, just that selfish people are happy to receive resources from selfless people.

Sure, and there isn't really anything wrong with that as long as the person receiving the resources really needs them.

Valuing people equally by default when their instrumental value isn't considered. I hope I didn't misunderstand you. That's about as extreme it gets but I suppose you could get even more extreme by valuing other people more highly than yourself.

The term "altruism" is often used to refer to the latter, so the clarification is necessary; I definitely don't agree with that extreme.

In any case, it may not be reasonable to expect people (or yourself) to hold to that valuation, or to act in complete recognition of what that valuation implies even if they do, but it seems like the right standard to aim for. If you are likely biased against valuing distant strangers as much as you ought to, then it makes sense to correct for it.

Comment by lackofcheese on On Caring · 2014-10-21T05:19:09.306Z · LW · GW

That's one way to put it, yes.

Comment by lackofcheese on On Caring · 2014-10-21T04:30:05.936Z · LW · GW

One can reasonably argue the other way too. New children are easier to make than new adults.

True. However, regardless of the relative value of children and adults, it is clear that one ought to devote significantly more time and effort to children than to adults, because they are incapable of supporting themselves and are necessarily in need of help from the rest of society.

Since she has finite resources, is there a practical difference?

Earlier I specifically drew a distinction between devoting time and effort and valuation; you don't have to value your own children more to devote yourself to them and not to other peoples' children.

That said, there are some practical differences. First of all, it may be better not to have children if you could do more to help other peoples' children. Secondly, if you do have children and still have spare resources over and above what it takes to properly care for them, then you should consider where those spare resources could be spent most effectively.

It seems to me extreme altruism is so easily abused that it will inevitably wipe itself out in the evolution of moral systems.

If an extreme altruist recognises that taking such an extreme position would lead overall to less altruism in the future, and thus worse overall consequences, surely the right thing to do is stand up to that abuse. Besides, what exactly do you mean by "extreme altruism"?

Comment by lackofcheese on One Life Against the World · 2014-10-21T04:07:12.933Z · LW · GW

If you have the values already and you don't have any reason to believe the values themselves could be problematic, does it matter how you got them?

It may be that an altruistic high in the past has led you to value altruism in the present, but what matters in the present is whether you value the altruism itself over and above the high.

Comment by lackofcheese on On Caring · 2014-10-21T03:47:16.213Z · LW · GW

Accounting for possible failure modes and the potential effects of those failure modes is a crucial part of any correctly done "morality math".

Granted, people can't really be relied upon to actually do it right, and it may not be a good idea to "shut up and multiply" if you can expect to get it wrong... but then failing to shut up and multiply can also have significant consequences. The worst thing you can do with morality math is to only use it when it seems convenient to you, and ignore it otherwise.

However, none of this talk of failure modes represents a solid counterargument to Singer's main point. I agree with you that there is no strict moral equivalence to killing a child, but I don't think it matters. The point still holds that by buying luxury goods you bear moral responsibility for failing to save children who you could (and should) have saved.

Comment by lackofcheese on On Caring · 2014-10-21T03:09:43.477Z · LW · GW

Probably not just any random person, because one can reasonably argue that children should be valued more highly than adults.

However, I do think that the mother should hold other peoples' children as being of equal value to her own. That doesn't mean valuing her own children less, it means valuing everyone else's more.

Sure, it's not very realistic to expect this of people, but that doesn't mean they shouldn't try.

Comment by lackofcheese on On Caring · 2014-10-20T19:24:27.281Z · LW · GW

So, either there is such a thing as the "objective" value and hence, implicitly, you should seek to approach that value, or there is not.

I don't see any reason to believe in an objective worth of this kind, but I don't really think it matters that much. If there is no single underlying value, then the act of assigning your own personal values to people is still the same thing as "passing judgement on the worth of humans", because it's the only thing those words could refer to; you can't avoid the issue simply by calling it a subjective matter.

In my view, regardless of whether the value in question is "subjective" or "objective", I don't think it should be determined by the mere circumstance of whether I happened to meet that person or not.

Comment by lackofcheese on On Caring · 2014-10-20T18:59:43.379Z · LW · GW

My actions alone don't necessarily imply a valuation, or at least not one that makes any sense.

There are a few different levels at which one can talk about what it means to value something, and revealed preference is not the only one that makes sense.

Comment by lackofcheese on On Caring · 2014-10-20T17:30:42.013Z · LW · GW

I'm not entirely sure what a "personal perception of the value of a human being" is, as distinct from the value or worth of a human being. Surely the latter is what the former is about?

Granted, I guess you could simply be talking about their instrumental value to yourself (e.g. "they make me happy"), but I don't think that's really the main thrust of what "caring" is.

Comment by lackofcheese on A few thoughts on a Friendly AGI (safe vs friendly, other minds problem, ETs and more) · 2014-10-20T10:14:11.203Z · LW · GW

I can (and do) believe that consciousness and subjective experience are things that exist, and are things that are important, without believing that they are in some kind of separate metaphysical category.

Comment by lackofcheese on One Life Against the World · 2014-10-20T07:50:31.650Z · LW · GW

There is no need for morality to be grounded in emotional effects alone. After all, there is also a part of you that thinks that there is, or might be, something "horrible" about this, and that part also has input into your decision-making process.

Similarly, I'd be wary of your point about utility maximisation. You're not really a simple utility-maximising agent, so it's not like there's any simple concept that corresponds to "your utility". Also, the concept of maximising "utility generally" doesn't really make sense; there is no canonical way of adding your own utility function together with everyone else's.

Nonetheless, if you were to cash out your concepts of what things are worth and how things ought to be, then in principle it should be possible to turn them into a utility function. However, there is a priori no reason that that utility function has to only be defined over your own feelings and emotions.

If you could obtain the altruistic high without doing any of the actual altruism, would it still be just as worthwhile?

Comment by lackofcheese on Questions on Theism · 2014-10-20T07:41:16.833Z · LW · GW

It's a rather small sample size, isn't it? I don't think you can draw much of a conclusion from it.

Comment by lackofcheese on Superintelligence Reading Group - Section 1: Past Developments and Present Capabilities · 2014-10-20T07:14:48.089Z · LW · GW

The game AIs for popular strategy games are often bad because the developers don't actually have the time and resources to make a really good one, and it's not a high priority anyway - most people playing games like Civilization want an AI that they'll have fun defeating, not an AI that actually plays optimally.

I think you're mostly correct on this. Sometimes difficult opponents are needed, but for almost all games that can be trivially achieved by making the AI cheat rather than improving the algorithms. That said, when playing a game vs an AI you do want the AI to at least appear to be intelligent; although humans can often be quite easy to fool with cheating, a good algorithm is still a better way of giving this appearance than a fake. It doesn't have to be optimal, and even if it is you can constrain it enough to make it beatable, or intentionally design different kinds of weaknesses into the AI so that humans can have fun looking for those weaknesses and feel good when they find them. Ultimately, though, the point is that the standard approach of having lots and lots of scripting still tends to get the job done, and developers almost never find the resource expenditure for good AI to be worthwhile.

However, I think that genuinely superhuman AI in games like Starcraft and Civilization is far harder than you imply. For example, in RTS games (as Lumifer has said) the AI has a built-in advantage due to its capacity for micromanagement. Moreover, although the example you cite has an AI from a "few months" of work beating a high-level human player, I think that was quite likely to be a one-off occurrence. Beating a human once is quite different to consistently beating a human.

If you look at the results of the AIIDE Man vs Machine matches, the top bots consistently lose every game to Bakuryu (the human representative). According to this report,

In this match it was shown that the true weakness of state of the art StarCraft AI systems was that humans are very adept at recognizing scripted behaviors and exploiting them to the fullest. A human player in Skynet’s position in the first game would have realized he was being taken advantage of and adapted his strategy accordingly, however the inability to put the local context (Bakuryu kiting his units around his base) into the larger context of the game (that this would delay Skynet until reinforcements arrived) and then the lack of strategy change to fix the situation led to an easy victory for the human. These problems remain as some of the main challenges in RTS AI today: to both recognize the strategy and intent of an opponent’s actions, and how to effectively adapt your own strategy to overcome them.

It seems to me that the best AIs in these kinds of games work by focusing on a relatively narrow set of overall strategies, and then executing those strategies as flawlessly as possible. In something like Starcraft the AI's potential for this kind of execution is definitely superhuman, but as the Man vs Machine matches demonstrate this really isn't enough.

In the case of the Civilization games, the fact that they aren't real-time removes quite a lot of the advantage that an AI gets in terms of micromanagement. Also, like in Starcraft, classical AI techniques really don't work particularly well due to the massive branching factor.

Granted, taking a similar approach to the Starcraft bots might still work pretty well; I believe there are some degenerate strategies in many of the Civ games that are quite strong on their own, and if you program an AI to execute them with a high degree of precision and good micromanagement, and add some decent reactive play, that might be good enough.

However, unless the game is simply broken due to bad design, I suspect that you would find that, like the Starcraft bots, AIs designed on that kind of idea would still be easily exploited and consistently beaten by the best human players.

Comment by lackofcheese on Superintelligence Reading Group - Section 1: Past Developments and Present Capabilities · 2014-10-20T06:24:00.675Z · LW · GW

I wouldn't say that poker is "much easier than the classic deterministic games", and poker AI still lags significantly behind humans in several regards. Basically, the strongest poker bots at the moment are designed around solving for Nash equilibrium strategies (of an abstracted version of the game) in advance, but this fails in a couple of ways:

  1. These approaches haven't really been extended past 2- or 3-player games.
  2. Playing an NE strategy makes sense if your opponent is doing the same, but your opponent almost always won't be. Thus, in order to play better, poker bots should be able to exploit weak opponents.

Both of these are rather nontrivial problems.
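To make the equilibrium-solving approach concrete, here is a toy sketch of regret matching, the building block behind counterfactual regret minimization (CFR), the family of algorithms the strongest poker bots are based on. It's applied here to rock-paper-scissors rather than poker, and all the names and parameters are illustrative:

```python
# Regret matching in self-play on rock-paper-scissors: the time-averaged
# strategy converges to the Nash equilibrium (1/3, 1/3, 1/3).
ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    # +1 if action a beats action b, -1 if it loses, 0 on a tie
    return [0, 1, -1][(a - b) % 3]

def expected_value(action, opponent_strategy):
    return sum(p * payoff(action, b) for b, p in enumerate(opponent_strategy))

def regrets_to_strategy(regrets):
    # play each action in proportion to its positive cumulative regret
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / ACTIONS] * ACTIONS

def train(iterations=50000):
    my_regrets = [0.0] * ACTIONS
    # start the opponent off-equilibrium so the dynamics are nontrivial
    opp_regrets = [1.0, 0.0, 0.0]
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        s = regrets_to_strategy(my_regrets)
        o = regrets_to_strategy(opp_regrets)
        for i in range(ACTIONS):
            strategy_sum[i] += s[i]
        my_ev = sum(s[i] * expected_value(i, o) for i in range(ACTIONS))
        opp_ev = sum(o[i] * expected_value(i, s) for i in range(ACTIONS))
        for i in range(ACTIONS):
            # regret = what pure action i would have earned, minus what I earned
            my_regrets[i] += expected_value(i, o) - my_ev
            opp_regrets[i] += expected_value(i, s) - opp_ev
    total = sum(strategy_sum)
    return [x / total for x in strategy_sum]
```

The instantaneous strategies cycle forever, but the average strategy converges to uniform; real poker bots run the same core idea over an enormously larger abstracted game tree, which is exactly why extending it past 2-3 players and adding opponent exploitation are hard.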

Kriegspiel, a partially observable version of chess, is another example where the best humans are still better than the best AIs, although I'll grant that the gap isn't a particularly big one, and likely mostly has to do with it not being a significant research focus.

Comment by lackofcheese on Superintelligence Reading Group - Section 1: Past Developments and Present Capabilities · 2014-10-20T05:56:46.332Z · LW · GW

Although computers beat humans at board games without needing any kind of general intelligence at all, I don't think that invalidates game-playing as a useful domain for AGI research.

The strength of AI in games is, to a significant extent, due to the input of humans in being able to incorporate significant domain knowledge into the relatively simple algorithms that game AIs are built on.

However, it is quite easy to make game AI into a far, far more challenging problem (and, I suspect, a rather more widely applicable one): consider the design of algorithms for general game playing rather than for any particular game. Basically, think of a game AI that is first given a description of the rules of the game it's about to play, which could be any game, and then must play the game as well as possible.
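As a minimal sketch of what that looks like, here is a "general game player" that receives the rules only as opaque callables and picks moves by flat Monte Carlo playouts. The interface below is a made-up stand-in for a real game-description language, and the example game (one-pile Nim) is just for illustration:

```python
import random

class Game:
    """A game known only through its rules, not hard-coded knowledge."""
    def __init__(self, initial, legal, apply_move, is_terminal, score):
        self.initial = initial
        self.legal = legal            # legal(state) -> list of moves
        self.apply_move = apply_move  # apply_move(state, move) -> next state
        self.is_terminal = is_terminal
        self.score = score            # score(terminal_state, player) -> payoff

def random_playout(game, state, player, rng):
    # play uniformly random moves to the end of the game
    while not game.is_terminal(state):
        state = game.apply_move(state, rng.choice(game.legal(state)))
    return game.score(state, player)

def choose_move(game, state, player, playouts=200, rng=None):
    # flat Monte Carlo: evaluate each move by averaging random playouts
    rng = rng or random.Random(0)
    best_move, best_value = None, float("-inf")
    for move in game.legal(state):
        nxt = game.apply_move(state, move)
        value = sum(random_playout(game, nxt, player, rng)
                    for _ in range(playouts)) / playouts
        if value > best_value:
            best_move, best_value = move, value
    return best_move

# Example rules: one-pile Nim, take 1-3 stones, taking the last stone wins.
# State is (stones_left, player_to_move).
nim = Game(
    initial=lambda: (7, 0),
    legal=lambda s: [k for k in (1, 2, 3) if k <= s[0]],
    apply_move=lambda s, k: (s[0] - k, 1 - s[1]),
    is_terminal=lambda s: s[0] == 0,
    # the player to move at a terminal state is the one who did NOT take
    # the last stone, so they lose
    score=lambda s, p: 1.0 if s[1] != p else 0.0,
)

# From 3 stones the sampler reliably finds the immediate win: take all 3.
print(choose_move(nim, (3, 0), player=0))  # -> 3
```

Nothing in `choose_move` knows anything about Nim, which is the point: the same player runs unchanged on any game expressible through the interface, and all the difficulty shifts into playing *well* without game-specific knowledge.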

Comment by lackofcheese on On Caring · 2014-10-20T04:22:04.505Z · LW · GW

I agree; I don't see a significant difference between thinking that I ought to value other human beings equally but failing to do so, and actually viewing them equally and not acting accordingly. If I accept either (1) or (2) it's still a moral failure, and it is one that I should act to correct. In either case, what matters is the actions that I ought to take as a result (i.e. effective altruism), and I think the implications are the same in both cases.

That being said, I guess the methods that I would use to correct the problem would be different in either hypothetical. If it's (1) then there may be ways of thinking about it that would result in a better valuation of other people, or perhaps to correct for the inaccuracy of the care-o-meter as per the original post.

If it's (2), then the issue is one of akrasia, and there are plenty of psychological tools or rationalist techniques that could help.

Of course, (1) and (2) aren't the only possibilities here; there's at least two more that are important.

Comment by lackofcheese on On Caring · 2014-10-19T18:31:16.798Z · LW · GW

Yes, if I really ought to value other human beings equally then it means I ought to devote a significant amount of time and/or money to altruistic causes, but is that really such an absurd conclusion?

Perhaps I don't do those things, but that doesn't mean I can't and it doesn't mean I shouldn't.

Comment by lackofcheese on Applications of logical uncertainty · 2014-10-19T18:20:54.032Z · LW · GW

Here's some of the literature:

- "Heuristic Search as Evidential Reasoning" by Hansson and Mayer
- "A Bayesian Approach to Relevance in Game Playing" by Baum and Smith

and also work following Stuart Russell's concept of "metareasoning":

- "On Optimal Game-Tree Search Using Rational Meta-Reasoning" by Russell and Wefald
- "Principles of Metareasoning" by Russell and Wefald

and the relatively recent

- "Selecting Computations: Theory and Applications" by Hay, Russell, Tolpin, and Shimony.

On the whole, though, it's relatively limited. At a bare minimum there is plenty of room for probabilistic representations in order to give a better theoretical foundation, but I think there is also plenty of practical benefit to be gained from those techniques as well.

As a particular example of the applicability of these methods, there is a phenomenon referred to as "search pathology" or "minimax pathology", in which for certain tree structures searching deeper actually leads to worse results, when using standard rules for propagating value estimates up a tree (most notably minimax). From a Bayesian perspective this clearly shouldn't occur, and hence this phenomenon of pathology must be the result of a failure to correctly update on the evidence.
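For reference, the propagation rule in question is just this: minimax backs up heuristic leaf estimates as if they were exact game values, which is precisely the assumption a Bayesian treatment would drop. A minimal sketch, with the tree given as nested lists purely for illustration:

```python
# Plain minimax over a tree whose leaves are heuristic value estimates.
# Note that the estimates are propagated as if they were exact: max/min
# discards all information about how uncertain each leaf estimate is,
# which is the opening for pathological behavior as depth increases.

def minimax(node, maximizing=True):
    if isinstance(node, (int, float)):  # leaf: a heuristic estimate
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Two-ply example: the maximizer moves, then the minimizer replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # -> 3
```

A Bayesian evaluator would instead treat each leaf as evidence about the true value and propagate distributions (or posterior updates) up the tree, so adding more evidence by searching deeper could never make the decision systematically worse.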

Comment by lackofcheese on Applications of logical uncertainty · 2014-10-19T17:17:41.490Z · LW · GW

Surely probability or something very much like it is conceptually the right way to deal with uncertainty, whether it's logical uncertainty or any other kind? Granted, most of the time you don't want to deal with explicit probability distributions and Bayesian updates because the computation can be expensive, but when you work with approximations you're better off if you know what it is you're approximating.

In the area of search algorithms, I think these kinds of approaches are woefully underrepresented, and I don't think it's because they aren't particularly applicable. Granted, I could be wrong on this, because the core ideas aren't particularly new (see, for example, "Dynamic Probability, Computer Chess, and the Measurement of Knowledge" by I. J. Good).

It's an area of research I'm working on right now, so I've spent a fair amount of time looking into it. I could give a few references on the topic, but on the whole I think they're quite sparse.