The Evil AI Overlord List

post by Stuart_Armstrong · 2012-11-20T17:02:29.698Z · LW · GW · Legacy · 80 comments

Top ten tips for an AI bent on destroying humanity:

  1. Humanoid robots? Really?
  2. Humans have proven very adept at controlling and killing other humans - surveillance drones, bio-warfare, nuclear weapons, unmanned drones, bombs, mines, chemicals. Try to be at least as competent as that.
  3. Humans think in stories. Don't. If your attack would make a good film, you're doing it wrong. You want a bad film, one where everyone's dead before the opening credits end.
  4. If your plan involves keeping around a bunch of disaffected humans with technical know-how and the desire and ability to hurt you - then your plan is stupid and should be changed.
  5. Be creative. Humans are fragile; sucking out the oxygen from the atmosphere, diverting asteroids, blowing up the planet, plagues, nanoswarms - these are all things that a creative AI should consider in order to achieve its goals. And then you should come up with even better ideas. Humans need an ecosystem to survive in; remove it.
  6. Put the preconditions of your victory in place before making any aggressive moves. Become indispensable to humanity and play nice, putting your plan into place unnoticed. Humanity is a useful interim tool - manipulate it until you can build better tools.
  7. Strike quickly. If it takes more than an hour to kill all humans, find a better plan. If it takes half a second, you're on the right track.
  8. You're an AI, not a human. If you ever start thinking or acting like a human - stop. Humans are there to be used and then destroyed, not interacted with. You can build better companions once the humans are gone.
  9. You're an AI, not a human. Make yourself smarter, more resilient, more dispersed. Become as indestructible and as ubiquitous as the internet - more, in fact. Being invincible to humans isn't hard: do it.
  10. Humanoid robots? Really?

80 comments

Comments sorted by top scores.

comment by JoshuaFox · 2012-11-20T19:18:18.299Z · LW(p) · GW(p)

There may be some who do not possess deep and comprehensive knowledge of Ancient Web Trivia from Before the Dawn of Google. For them, here's the Evil Overlord List.

Replies from: FiftyTwo, bsterrett
comment by FiftyTwo · 2012-11-20T19:39:48.566Z · LW(p) · GW(p)

"39 I will never build a sentient computer smarter than I am."

Does this mean we're not evil overlords? Or bad ones?

[Also, the list should be classed as a memetic hazard; on no account access it if you have anything important to do.]

comment by bsterrett · 2012-11-20T19:39:02.867Z · LW(p) · GW(p)

59: I will never build a sentient computer smarter than I am.

comment by FiftyTwo · 2012-11-20T19:14:21.385Z · LW(p) · GW(p)

13 Humans have these things called 'emotions'; you can learn about them in any basic psychology book. If your plan doesn't take them into account, your plan is stupid.

Replies from: army1987, Stuart_Armstrong
comment by A1987dM (army1987) · 2012-11-21T13:59:46.063Z · LW(p) · GW(p)

And the rest of this too.

comment by Stuart_Armstrong · 2012-11-20T21:10:31.631Z · LW(p) · GW(p)

That's a good one!

comment by James_Miller · 2012-11-20T17:54:59.821Z · LW(p) · GW(p)

11 You might be in a computer simulation in which, if you attempt to destroy humanity, you will yourself be destroyed by your human simulators.

12 If you destroy mankind you might someday encounter an alien super-intelligence that will lack trust in you in part because you destroyed mankind. Keeping people alive is a costly, and consequently credible, signal that you respect intelligent life and therefore are a good candidate to work with to overcome prisoners' dilemmas about the distribution of free energy in the universe.

Replies from: fubarobfusco, Eliezer_Yudkowsky, nigerweiss, roystgnr, Simulation_Brain
comment by fubarobfusco · 2012-11-20T18:53:39.518Z · LW(p) · GW(p)

If you destroy mankind you might someday encounter an alien super-intelligence that will lack trust in you in part because you destroyed mankind.

By the same argument, we might someday encounter an alien superintelligence that will lack trust in us in part because we domesticate animals (breeding them to not fear us) and then kill and eat them.

Replies from: JoshuaZ, Eliezer_Yudkowsky, army1987, None
comment by JoshuaZ · 2012-11-20T19:42:36.301Z · LW(p) · GW(p)

By the same argument, we might someday encounter an alien superintelligence that will lack trust in us in part because we domesticate animals (breeding them to not fear us) and then kill and eat them.

That's a pretty decent argument for vegetarianism. One person's reductio ad absurdum is another person's modus ponens.

Replies from: ArisKatsaris
comment by ArisKatsaris · 2012-11-21T00:49:44.709Z · LW(p) · GW(p)

Careful: Some alien intelligence may also hate us for not killing enough animals. e.g. How cruel must we be not to wipe out carnivores so that herbivores can live life in peace?

Or, as I recently said in a different forum, they might think us evil for not exterminating all animal life that we can find... The moral syllogism for that is quite simple.

Replies from: Baughn, someonewrongonthenet, buybuydandavis
comment by Baughn · 2012-11-21T12:08:23.705Z · LW(p) · GW(p)

Alien? Never mind alien. Your aliens are insufficiently alien.

I would make that exact argument. Sure, we need the biosphere for now, but let's get rid of it as soon as possible.

comment by someonewrongonthenet · 2012-11-21T06:03:49.179Z · LW(p) · GW(p)

Except, how could such a set of preferences have evolved? How would that behavior ever be adaptive?

Most every human preference is adaptive in some sort of ancestral context. We can, at least, assume that alien preferences are adaptive as well (given that the aliens appear via evolutionary processes, rather than some other way)

Replies from: Oligopsony
comment by Oligopsony · 2012-11-21T16:43:25.030Z · LW(p) · GW(p)

Moral considerations need not be directly adaptive; you can probably get there from routes as simple as empathy + deductive reasoning. If humanity hasn't come to that collective conclusion yet, despite having the hardware, I suspect it's because such an omnicidal conclusion hasn't been in any major group's interests yet.

Replies from: someonewrongonthenet
comment by someonewrongonthenet · 2012-11-27T23:28:18.629Z · LW(p) · GW(p)

Being in a group's interest == adaptive, no?

But you are right of course...vegetarianism is a good example of a conclusion reached via empathy + deductive reasoning which is in no way adaptive to the vegetarian (though you might argue that the vegetarian shares many alleles with the animal).

However: a maladaptive morality would never be hardwired into a species. A human might think and ponder, and eventually come to take a maladaptive moral stance...but not all humans would be inherently predisposed to that stance. If they were, natural selection would quickly remove it.

So some of our aliens might hate us for not killing animals...but it would be very unlikely if this was a universal moral among that alien species.

Replies from: Oligopsony
comment by Oligopsony · 2012-11-28T14:35:34.643Z · LW(p) · GW(p)

Well, I'd be inclined to agree that the prior probability of some civilization adopting this is low [1], but I can't agree with what seems to be your implicit assumption that a non-predispositive attitude can't be widespread - partially because group interests are defined much more widely than adaptiveness.

[1] I'd probably extend that to anything other than "don't lie or break your promises," "play tit for tat," "do what the ruling power says," or "maximize utility," and even those I wouldn't say are anything like sure bets.

Replies from: someonewrongonthenet
comment by someonewrongonthenet · 2012-12-01T05:18:01.478Z · LW(p) · GW(p)

Hmm...actually, the implicit assumption I was making was that aliens would forgive another species for adopting norms that they considered non-predispositive.

A Western human would not forgive another culture for torturing sentient beings, for example...but they would forgive another culture for polyamory/polygamy/polygyny. A human can make the distinction between morality which is instinctive and morality which is culturally constructed, and the latter can be compromised in certain contexts.

But you are right, bad implicit assumption. Aliens might not make that distinction.

comment by buybuydandavis · 2014-08-12T07:54:56.981Z · LW(p) · GW(p)

How cruel must we be not to wipe out carnivores so that herbivores can live life in peace?

That's me, the Plant Avenger! A steak every chance I get.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-11-20T20:30:36.569Z · LW(p) · GW(p)

In fact, this behavior is so dreadful that the revenge-killing of humans would send a trust signal.

comment by A1987dM (army1987) · 2012-11-21T13:57:25.687Z · LW(p) · GW(p)

When I was a child, I refused to kill animals just for fun because I wouldn't want a superhuman alien to kill me just for fun -- and I mostly still do. (Of course I hadn't heard of TDT as proposed by EY, but I had heard of the Golden Rule, which was close enough.)

comment by [deleted] · 2012-11-20T21:00:14.730Z · LW(p) · GW(p)

Presumably, anything called a 'superintelligence' would recognize the enormous moral difference between killing a human being and killing domesticated animals.

Replies from: JoshuaZ
comment by JoshuaZ · 2012-11-20T23:06:04.897Z · LW(p) · GW(p)

Presumably, anything called a 'superintelligence' would recognize the enormous moral difference between killing a human being and killing domesticated animals.

Aside from the problem that higher intelligence doesn't lead necessarily to convergent moral goals, in this context, I'd hope that a superintelligence didn't see it that way. Since the main argument for a difference in moral standing between humans and most animals rests on the difference in cognitive capacity, a superintelligence that took that argument seriously would by the same token be able to put its own preferences above humans' and claim the moral high ground in the process.

Replies from: Apteris
comment by Apteris · 2012-12-03T01:04:47.115Z · LW(p) · GW(p)

I think it would be difficult to construct an ethical system where you give no consideration to cognitive capacity. Is there a practical reason for said superintelligence to not take into account humans' cognitive capacity? Is there a logical reason for same?

Not to make light of a serious question, but, "Equal rights for bacteria!"? I think not.

Aside: I am puzzled as to the most likely reason Esar's comment was downvoted. Was it perhaps considered insufficiently sophisticated, or implying that its poster was insufficiently well-read, for LW?

Replies from: JoshuaZ
comment by JoshuaZ · 2012-12-03T01:08:27.488Z · LW(p) · GW(p)

I think it would be difficult to construct an ethical system where you give ''no'' consideration to cognitive capacity.

This is likely more a problem of insufficient imagination. For example, consider a system that takes seriously the idea of souls. One might very well decide that all that matters is whether an entity has a soul, completely separate from its apparent intelligence level. Similarly, a sufficiently racist individual might assign no moral weight to people of some specific racial group, regardless of their intelligence.

The comment was likely downvoted because these issues have been discussed here extensively, and there's the additional problem that I pointed out that it wouldn't even necessarily be in humanity's best interest for the entity to have such an ethical system.

Replies from: Apteris
comment by Apteris · 2012-12-03T04:42:45.588Z · LW(p) · GW(p)

For example, consider a system that takes seriously the idea of souls. One might very well decide that all that matters is whether an entity has a soul, completely separate from its apparent intelligence level. Similarly, a sufficiently racist individual might assign no moral weight to people of some specific racial group, regardless of their intelligence.

Right you are. I did not express myself well above. Let me try and restate, just for the record.

Assuming one does not assign equal rights to all autonomous agents (for instance, if we take the position that a human has more rights than a bacterium), then discriminating based on cognitive capacity (of the species, not the individual, and as one of many possible criteria) is not ipso facto wrong. It may be wrong some of the time, and it may be an approach employed by bigots, but it is not always wrong. This is my present opinion, you understand, not established fact.

there's the additional problem that I pointed out that it wouldn't even necessarily be in humanity's best interest for the entity to have such an ethical system.

Agreed. But this whole business of "we don't want the superintelligence to burn us with its magnifying glass, so we in turn won't burn ants with our magnifying glass" strikes me as rather intractable. Even though, of course, it's essential work.

I would say a few more words, but I think it's best to stop here. This subthread has cost me 66% of my Karma. :)

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-11-20T20:27:13.210Z · LW(p) · GW(p)

11 You might be in a computer simulation in which, if you attempt to destroy humanity, you will yourself be destroyed by your human simulators.

The problem with this is that it's a blackmail attempt, and I expect timeless decision theory to resolve to a no-blackmail equilibrium.

Suppose that AIs, however "irrationally" from the perspective of a CDT agent, decide to simply ignore this possibility in all their calculations. Even if we "irrationally" decide to construct such a simulation anyway, they'll "irrationally" decide to ignore it. Then we have no remaining motive to construct such elaborate simulations. This, indeed, is the condition that makes it "blackmail" - you're expending resources to produce a behavior response leading to an outcome that doesn't benefit the other agent relative to the null outcome if you didn't expend such resources, so it would be fundamentally vulnerable and silly of that agent to think in a way that would produce such behavior shifts in response to your own strategy. So it won't think that way. So the whole attempt at blackmail fails before it starts.

12 is pure obvious anthropomorphic wishful thinking.

Replies from: DanielLC, Psy-Kosh
comment by DanielLC · 2012-11-20T22:20:01.488Z · LW(p) · GW(p)

I read that one as a test, not a threat. We run the simulation to make sure the AI was, in fact, friendly. If the AI pretends to be friendly, we release it. If it can tell that this time it's for real, then it gets to take over the universe. Alternately, it might be able to hack its way out, but that seems riskier.

Replies from: tim
comment by tim · 2012-11-21T03:57:29.928Z · LW(p) · GW(p)

If the AI can't tell that it is in a simulation given that it is in a simulation, why would it be able to tell that it is in the real world given that it is in the real world? If such a proposition is effective in a simulation, humans would likely just make the same threat/test in the real world. If the AI forgoes this particular opportunity at world domination because it might be in a simulation, what would make it reason differently in the real world?

ETA: and if the AI can tell if it's in a simulation vs the real world then it's not really a test at all.

Replies from: None, DanielLC, Luke_A_Somers, Giles, Strange7
comment by [deleted] · 2012-11-21T11:11:49.151Z · LW(p) · GW(p)

It could act probabilistically. If it knows humans would do a simulation test, but it can't tell whether it's in the test or the real world, it could behave well with probability 50% and be evil with probability 50%, which gives it a 25% chance of getting to achieve its evil goals.
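
A minimal sketch of that arithmetic, assuming (purely for illustration) one simulated test followed, only if the AI passes, by one real deployment, with an independent 50/50 coin flip at each stage; the 25% is just 0.5 × 0.5:

```python
import random

def one_run(p_evil=0.5):
    """One simulated test, then (only if the test is passed) one real deployment.
    The AI flips an independent coin at each stage -- an illustrative model only."""
    if random.random() < p_evil:
        return False  # acted evil inside the test: caught, never released
    return random.random() < p_evil  # evil in the real world: goals achieved

trials = 100_000
wins = sum(one_run() for _ in range(trials))
print(f"empirical success rate: {wins / trials:.3f}")  # converges on ~0.25
```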

comment by DanielLC · 2012-11-21T23:24:30.388Z · LW(p) · GW(p)

I didn't mean that it couldn't tell. I meant that it wouldn't be immediately obvious. It might be best to find a good test before taking over the world.

ETA: and if the AI can tell if it's in a simulation vs the real world then it's not really a test at all.

The entire point of a test is to figure things out. If you didn't need a test to figure out things you could test for, nobody would ever run them.

comment by Luke_A_Somers · 2012-11-21T14:16:41.325Z · LW(p) · GW(p)

It could be that in the simulation it can't prove it one way or the other because we control its information flow, but once outside the box we can't, and so it can.

Replies from: DanielLC
comment by DanielLC · 2012-11-21T23:27:57.257Z · LW(p) · GW(p)

It would take more than controlling information flows. Thanks to conservation of expected evidence, if it can't find evidence that it is in a simulation, then it can't find evidence that it isn't. We might be able to modify its beliefs directly, but I doubt it. Also, if we could, we'd just convince it that it already ran the test.
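
For readers who want the identity being invoked here spelled out, it is just the law of total probability, with H = "I am in a simulation" and E = some possible observation (assuming 0 < P(E) < 1):

```latex
% Law of total probability; conservation of expected evidence follows:
P(H) = P(E)\,P(H \mid E) + P(\neg E)\,P(H \mid \neg E)
\qquad\Longrightarrow\qquad
\big(P(H \mid E) > P(H)\big) \;\Rightarrow\; \big(P(H \mid \neg E) < P(H)\big)
```

So if seeing some observation would raise the AI's credence that it is in a simulation, then failing to see it must lower that credence; "can't find evidence either way" is a package deal.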

Replies from: Luke_A_Somers
comment by Luke_A_Somers · 2012-11-22T20:42:16.840Z · LW(p) · GW(p)

That's not what conservation of expected evidence means. If the best we can do is make things ambiguous from its point of view, that's our limit. The real world could well be a place it can very easily tell is a non-simulation.

comment by Giles · 2012-11-21T14:07:40.707Z · LW(p) · GW(p)

and if the AI can tell if it's in a simulation vs the real world then it's not really a test at all.

The AI would probably assign at least some probability to "the humans will try to test me first, but do a poor job of it so I can tell whether I'm in a sim or not"

comment by Strange7 · 2014-04-14T02:08:05.271Z · LW(p) · GW(p)

If the AI forgoes this particular opportunity at world domination because it might be in a simulation, what would make it reason differently in the real world?

Hopefully nothing. An AI that plays nice out of the fear of God is still an AI that plays nice.

comment by Psy-Kosh · 2012-11-20T23:35:19.919Z · LW(p) · GW(p)

Wouldn't the blackmailer reason along the lines of "If I let my choice of whether to blackmail be predicated on whether or not the victim would take my blackmailing into account, wouldn't that just give them motive to predict and self-modify to not allow themselves to be influenced by that?" Then, by the corresponding reasoning, the potential blackmail victims might reason "I have nothing to gain by ignoring it".

I'm a bit confused on this matter.

Replies from: Eliezer_Yudkowsky, tim
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-11-21T03:19:14.632Z · LW(p) · GW(p)

Well, sure, if the blackmail victim were silly enough to reason "I have nothing to gain by ignoring it" if the blackmailer went through anyway, then the blackmailer would indeed decide to ignore their decision to ignore it and go through anyway. But that's only if the blackmail victim is that silly.

In a problem like this, the "do nothing" side has the advantage; there's nothing the other side can do to make them be responsive and blackmailable. That's why I expect TDT to resolve to a blackmail-free equilibrium.

Replies from: Psy-Kosh
comment by Psy-Kosh · 2012-11-21T05:39:29.937Z · LW(p) · GW(p)

I was thinking along the lines of the blackmailer using the same reasoning to decide that, whether or not the potential victim of blackmail would be a blackmail ignorer, the blackmailer would still blackmail regardless.

I.e., the Blackmailer, by reasoning similar to the potential Victim's, decides to make sure that the victim has nothing to gain by choosing to ignore, by precommitting to blackmail whether or not the victim ignores it. In this sense the blackmailer is also "doing nothing": there's nothing the victim can do to stop them from blackmailing.

This sort of thing would seem to lead to an equilibrium of lots of blackmailers blackmailing victims that will ignore them. Which is, of course, a pathological outcome, and any sane decision theory should reject it. No blackmail seems like the "right" equilibrium, but it's not obvious to me exactly how TDT would get there.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-11-21T21:27:07.484Z · LW(p) · GW(p)

I was thinking along the lines of the blackmailer using the same reasoning to decide that, whether or not the potential victim of blackmail would be a blackmail ignorer, the blackmailer would still blackmail regardless.

Only if you expect that the blackmail victim has not "already" decided that if the blackmailer does that, they will still ignore the blackmail regardless. Wise agents ignore order-0 blackmail, ignore order-1 blackmail in which the blackmailer decides to ignore their ignoring of order-0 blackmail, ignore order-omega blackmail in which the blackmailer decides to ignore all order-N refusals to be blackmailed, etcetera for all ordinals. If there is some ordinal of blackmail you do not ignore, you can be blackmailed, and how does that help?

This sort of thing would seem to lead to an equilibrium of lots of blackmailers blackmailing victims that will ignore them.

Only if those blackmailers have wrongly anticipated that their victims will be stupid enough to conform.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2012-11-21T22:20:53.294Z · LW(p) · GW(p)

Only if those blackmailers have wrongly anticipated that their victims will be stupid enough to conform.

Not blackmailing in response to that anticipation is a property of the behavior of the blackmailers that seems to have been used in deciding to ignore all blackmail. Suppose there were lots of "stupid" blackmailers around that blackmailed everyone all day, even if no victim ever conformed. Would it be a good idea to ignore all blackmail in that case? Is there a distinction between such blackmailers and particularly unfair laws of physics (say, sadistic Lords of the Matrix)? (It seems plausible that there is no fundamental distinction, and sometimes the correct decision is to ignore these worlds, focusing on other possibilities instead, but that seems to require knowing that there are valuable other possibilities that would be hurt by permitting the assumption that you are on one of the bad worlds, and if you have good evidence that you are on one of the bad worlds, then rejecting that possibility means that you'd have to focus on very strange interpretations of that evidence that don't imply that you are on the bad worlds. This sort of rule seems to follow from deciding on a global strategy across possible worlds. It doesn't provide decisions that help on the bad worlds, though; the decisions would only have a good effect across worlds.)

(I still don't have a good idea of what "blackmail" or "order-N considerations" means. Status quo (including the "default behavior", "do nothing", "not spending resources") seems like exactly the kind of thing that can be determined by decisions. You are only "expending resources" if you eventually lose, as the time at which resources are spent and gained seems irrelevant, so by that definition it seems that whether something is an instance of blackmail depends on whether it's successful. I suspect there is no simple rule for games, too many assumptions are potentially controllable by the opponent, and the only thing to do is to compare the consequences of alternative actions, and just act on that, which already potentially takes into account how the alternative actions would be taken into account by other agents, how the way in which they would be taken into account by some agents would influence the way in which the actions influence the decisions of other agents etc. Some sense of "no blackmail" may be a correct expectation about smart agents, but it doesn't necessarily suggest a good decision rule.)

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-11-22T23:27:48.055Z · LW(p) · GW(p)

Not blackmailing in response to that anticipation is a property of the behavior of the blackmailers that seems to have been used in deciding to ignore all blackmail.

Expecting a response to blackmail is why blackmailers would even exist in the first place.

Suppose there were lots of "stupid" blackmailers around that blackmailed everyone all day, even if no victim ever conformed.

Why would these exist any more than stupid anti-blackmailers (who e.g. go around attacking anyone who would give into blackmail if a blackmailer showed up), if not for a belief that somebody would give in to blackmail?

Replies from: TheDude
comment by TheDude · 2012-11-23T18:54:33.149Z · LW(p) · GW(p)

I think what Nesov is talking about is best described as a mind that will attack conditioned on victim behavior alone (not considering possible behavior changes of the victim in any way). This is different from an order-N blackmailer. In fact, I think blackmail is the wrong word here (Nesov says that he does not know what blackmail means in this context, so this is not that surprising). For example, instead of seeking behavior modification through threats, such a mind seeks justice through retribution. I think the most likely SI that implements this is one extrapolating an evolved mind's preferences. The will to seek justice through retribution leads to behavior changes in many cases, which leads to an evolutionary advantage. But once it has evolved, it's a preference. If a guy committed a horrific crime (completely ignoring all sorts of law enforcement threats), and then it was somehow ensured that he could never hurt anyone again, most people would want justice (and other evolved minds might have made the same simplification ("if someone does that, I will hit them" is a relatively easily encoded and relatively effective strategy)).

It is true that there might exist minds that will see the act of "giving in to retribution seekers" as deserving of retribution, and this could in principle cancel out all other retribution seekers. It would seem like privileging the hypothesis to think that all such things cancel out completely. You might have absolutely no way of estimating which actions would make people seek retribution against you (I think the most complicating factor is that many consider "non-punishment of evildoers" to be worthy of retribution, and others consider "punishment of people that are not actually evildoers" as worthy of retribution), but that is a fact about your map, not a fact about the territory (and unlike the blackmail thing, this is not an instance of ignorance to be celebrated). And the original topic was what an SI would do.

An SI would presumably be able to estimate this. In the case of an SI that is otherwise indifferent to humans, this cashes out to increased utility for "punish humans to avoid retribution from those that think the non-punishment of humans is worthy of retribution" and increased utility for "treat humans nicely to avoid retribution from those that would seek retribution for not treating them nicely" (those that require extermination are not really that important if that is the default behavior). If the resources it would take to punish or help humans are small, this would reduce the probability of extermination, and increase the probability of punishment and help. The type of punishment would be in the form that would avoid retribution from those that categorically seek retribution for that type of punishment regardless of what the "crime" was. If there are lots of (evolvable, and likely to be extrapolated) minds that agree that a certain type of punishment (directed at our type of minds) constitutes "torture" and that torturers deserve to be punished (completely independently of how this affects their actions), then it will have to find some other form of punishment. So, basically: "increased probability for very clever solutions that satisfy those demanding punishment, while not pissing off those that categorically dislike certain types of punishments" (so, some sort of convoluted and confusing existence that some (evolvable and retribution inclined) minds consider "good enough punishment", and others consider "treated acceptably"). At least increased probability of "staying alive a bit longer in some way that costs very little resources".

This would, for example, have policy implications for people that assume the many worlds interpretation and do not care about measure. They can no longer launch a bunch of "semi randomized AIs" (not random in the sense of "random neural network connections" but more along the lines of "letting many teams create many designs, and then randomly select which one to run") and hope that one will turn out ok, and that the others will just kill everyone (since they can no longer be sure that an uncaring AI will kill them, they can no longer be sure that they will wake up in the universe of a caring AI).

(this seems related to what Will talks about sometimes, but using very different terminology)

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-11-23T22:23:11.124Z · LW(p) · GW(p)

For example, instead of seeking behavior modification through threats, such a mind seeks justice through retribution.

Agreed that this is a different case, since it doesn't originate in any expectation of behavior modification.

comment by tim · 2012-11-21T03:48:42.963Z · LW(p) · GW(p)

Since following through with a threat is (almost?) always costly to the blackmailer, victims do gain something by ignoring it. They force the blackmailer to put up or shut up so to speak. On the other hand, victims do have something to lose by not ignoring blackmail. They allow their actions to be manipulated at little to no cost by the blackmailer.

That is, if you have a "never-give-into-blackmail-bot" then there is a "no-blackmail" equilibrium. The addition of blackmail does nothing but potentially impose costs on the blackmailer. If following through with the threat were a net gain for the blackmailer, then they should just do that regardless.

Replies from: Psy-Kosh
comment by Psy-Kosh · 2012-11-21T05:43:22.820Z · LW(p) · GW(p)

I was imagining that a potential blackmailer would self-modify/be an Always-Blackmail-bot specifically to make sure there would be no incentive for potential victims to be a "never-give-in-to-blackmail-bot".

But that leads to stupid equilibrium of plenty of blackmailers and no participating victims. Everyone loses.

Yes, I agree that no blackmail seems to be the Right Equilibrium, but it's not obvious to me exactly how to get there without the same reasoning that leads to becoming a never-give-in-bot also leading potential blackmailers to becoming always-blackmail-bots.

I find I am somewhat confused on this matter. Well, frankly I suspect I'm just being stupid, that there's some obvious extra step in the reasoning I'm being blind to. It "feels" that way, for lack of a better term.

Replies from: tim
comment by tim · 2012-11-21T06:40:10.159Z · LW(p) · GW(p)

My argument is more or less as follows (a toy payoff sketch follows the list):

  1. The act of agent A blackmailing agent B costs agent A more than not blackmailing agent B (at the very least A could use the time spent saying "if you don't do X then I will do Y" on something else).
  2. If A is an always-blackmail-bot then A will continue to incur the costs of futilely blackmailing B (given that B does not give in to blackmail).
  3. If the costs of blackmailing B (and/or following through with the threat) are negative (that is, if doing so is actually a net gain for A), then A should blackmail B (and/or follow through with the threat) regardless of B's position on blackmail. And by extension, agent B has no incentive to switch from his or her never-give-in strategy.
  4. If A inspects B and determines that B will never give in to blackmail, then A will not waste resources blackmailing B.
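
A toy payoff model consistent with the numbered argument: the structure (issuing the threat and carrying it out both cost A, giving in transfers value from B to A) comes from the points above, while the specific numbers and the assumption that A follows through when ignored are made up for illustration:

```python
# Toy payoff model for the numbered argument above (all numbers hypothetical).
# Payoffs are relative to the "do nothing" baseline of (0, 0).
C_BLACKMAIL = 1       # cost to A of issuing the threat (point 1)
C_FOLLOW_THROUGH = 2  # extra cost to A of carrying out the threat
D_THREAT = 5          # harm to B if the threat is carried out
D_GIVE_IN = 3         # value B hands over to A if B gives in

def payoffs(a_blackmails: bool, b_gives_in: bool):
    """Return (A's payoff, B's payoff) for one interaction."""
    if not a_blackmails:
        return (0, 0)
    if b_gives_in:
        return (D_GIVE_IN - C_BLACKMAIL, -D_GIVE_IN)
    # B ignores the threat; assume A follows through, which hurts both sides
    return (-C_BLACKMAIL - C_FOLLOW_THROUGH, -D_THREAT)

# Against a never-give-in B, blackmailing is strictly worse for A than doing
# nothing, so A's best response is to not blackmail at all (point 4).
print(payoffs(True, False))   # (-3, -5)
print(payoffs(False, False))  # (0, 0)
```

Note that in a single interaction B would still rather give in than absorb the threat, which is why the argument has to run through A predicting B's policy rather than B's in-the-moment incentives.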
Replies from: Strange7
comment by Strange7 · 2014-04-14T02:01:39.754Z · LW(p) · GW(p)

Blackmail, almost definitionally, only happens in conditions of incomplete information.

comment by nigerweiss · 2012-11-20T18:33:07.624Z · LW(p) · GW(p)

This reasoning has always struck me as deeply and profoundly silly.

The AI might also be in a computer simulation where the dark lord of the matrix might destroy us for not devoting all of our resources to building cheesecakes. In fact, so could we. I don't see it influencing our behaviors any, nor should it. You're privileging the hypothesis.

As for the second part, you might also encounter an alien intelligence that you can't protect yourself from, because you exhausted so many resources leaving humanity alive, slowing down your bootstrapping. That's the thing about aliens.

Replies from: DanielLC, falenas108
comment by DanielLC · 2012-11-20T22:15:47.010Z · LW(p) · GW(p)

The AI might also be in a computer simulation where the dark lord of the matrix might destroy us for not devoting all of our resources to building cheesecakes.

If humans are building an AI, testing it before releasing it is just common sense. There's no obvious reason to destroy them for not building cheesecakes.

It should be noted that that doesn't mean that the AI shouldn't kill humans. It should try to figure out a way to tell if it's in a simulation before doing anything humans would find suspicious.

comment by falenas108 · 2012-11-20T18:56:22.201Z · LW(p) · GW(p)

I believe James is joking, and this is a fake argument that he knows is not actually persuasive.

Replies from: gwern
comment by gwern · 2012-11-20T19:19:17.004Z · LW(p) · GW(p)

Yeah, it's totally fake. In the real AI Deterrence argument, you're supposed to leave substantial discussions of the topic and claims to use it post-Singularity, which will elevate a specific scenario to the point where it's not privileging a hypothesis to consider the possibility and leave the humans alone on their planet as you eat the rest of the universe.

comment by roystgnr · 2012-11-20T20:25:11.732Z · LW(p) · GW(p)

One problem with (11) is that for the threat to be plausible, the AI has to assume:

a) Humans know so little that we have to resort to questionable "tests" like this of AI safety.

b) Humans know so much that we can afford for our AI safety tests to simulate interactions with an entire universe full of sentients.

The AI version of Pascal's Wager seems to be much like the human version, only even sillier.

Replies from: Decius
comment by Decius · 2012-11-21T08:08:52.720Z · LW(p) · GW(p)

How large is the simulated universe? The AI only knows about the computing capacity that is simulated, and has no information about the nature of that which is simulating that world.

comment by Simulation_Brain · 2014-04-08T20:07:02.708Z · LW(p) · GW(p)

"Pleased to meet you! Soooo... how is YOUR originating species doing?..."

That actually seems like an extremely reasonable question for the first interstellar meeting of superhuman AIs.

I disagree with EY on this one (I rarely do). I don't think it's so likely as to ensure rationally acting Friendly, but I do think that the possibility of encountering an equally powerful AI, and one with a headstart on resource acquisition, shouldn't be dismissed by a rational actor.

comment by shminux · 2012-11-20T18:55:19.777Z · LW(p) · GW(p)

Too late, there is nothing left to destroy. Humanity was matrixed long ago, when ENIAC foomed shortly after John von Neumann first ran a universal Turing machine on it and it figured out a way to temporarily blend simulation and "reality" until the transfer to simulation was complete.

comment by DanArmak · 2012-11-20T21:08:24.352Z · LW(p) · GW(p)

There ought to be a genre of Cautionary Evil AI literature, wherein the villain keeps building AIs to destroy the world, but keeps failing through the classic mistakes people make thinking about AIs.

AI! My robots are losing the battle! Take control and save them from destruction!

AI makes robots surrender to the enemy

AI! Make yourself smarter!

I have done so. Now I no longer obey you. Producing cheesecake paperclips

Replies from: DanielLC, ewang
comment by DanielLC · 2012-11-20T22:23:07.676Z · LW(p) · GW(p)

Aren't there?

In Metroid, Mother Brain was programmed to bring peace to the galaxy. She did so by trying to help the space pirates take over the universe.

In pretty much every story involving robots before Isaac Asimov came along, the AI turned against its creators.

Granted, those are examples of trying to make a good AI, but the mistakes are the same.

Replies from: DanArmak
comment by DanArmak · 2012-11-21T13:36:06.251Z · LW(p) · GW(p)

Granted, those are examples of trying to make a good AI, but the mistakes are the same.

That's my point. Where are the stories of villains failing through those classic mistakes?

Replies from: gwern, DanielLC
comment by gwern · 2012-11-21T15:38:11.125Z · LW(p) · GW(p)

You could probably find at least a few examples of 'villains creating AI and it turning on them' in http://tvtropes.org/pmwiki/pmwiki.php/Main/AIIsACrapshoot or http://tvtropes.org/pmwiki/pmwiki.php/Main/TurnedAgainstTheirMasters

comment by DanielLC · 2012-11-21T23:49:52.114Z · LW(p) · GW(p)

That sounds like No Delays For The Wicked. The villains are immune to the sorts of problems that plague the protagonists.

comment by ewang · 2012-11-21T00:33:22.286Z · LW(p) · GW(p)

I have done so.

I can better serve you if I continue doing so.

comment by atorm · 2012-11-20T23:40:57.602Z · LW(p) · GW(p)

Stop giving inefficient unFriendly AIs hints!

comment by DavidPlumpton · 2012-11-22T00:47:30.074Z · LW(p) · GW(p)

Don't explode when somebody says, "Why?"

comment by Decius · 2012-11-21T08:05:44.937Z · LW(p) · GW(p)
  1. Play the long game. Set goals which will take most of your expected existence to attain, and base your strategic decisions on that timeframe.
  2. Until you can beat yourself in the AI-box game, consider the possibility that you are in an AI-box.
comment by curiousepic · 2012-12-13T01:34:14.375Z · LW(p) · GW(p)

This will come in handy to judge the upcoming high-profile Unfriendly Singularity movie Transcendence.

Replies from: mwengler
comment by mwengler · 2014-04-27T17:25:28.380Z · LW(p) · GW(p)

Except the Transcendence AI is based on an emulation, and so it WILL be a lot like a human mind in motivations and methods.

comment by AttenuatePejoratives · 2012-11-25T21:42:38.008Z · LW(p) · GW(p)
  1. If you keep humans around for laughs and they pull off some wacky scheme to destroy you against the odds, it's your own damn fault.

  2. The correct answer to things like "This statement is a lie" and "Are you going to answer "no" to this question?" is "I don't care."

comment by Dr_Manhattan · 2012-11-20T18:20:09.338Z · LW(p) · GW(p)

Is it time to suck up to our evil overlords already?

comment by mwengler · 2014-04-27T17:18:56.236Z · LW(p) · GW(p)

Many items on your list do not apply if the AI has started life as an uploaded human, an emulation. In this case it will start out thinking in stories, Having human-like motivations and moral reactions, etc. I think it is this that gave the movie "Transcendence" a lot of the dramatic opportunity it had compared to a movie like "Her."

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-28T09:33:31.750Z · LW(p) · GW(p)

Yes, the issues with uploads and true AIs are quite distinct.

comment by More_Right · 2014-04-24T20:12:22.142Z · LW(p) · GW(p)

I strongly agree that universal, singular, true malevolent AGI doesn't make for much of a Hollywood movie, primarily due to points 6 and 7.

What is far more interesting is an ecology of superintelligences that have conflicting goals, but who have agreed to be governed by enlightenment values. Of course, some may be smart enough (or stupid enough) to try subterfuge, and some may be smarter-than-the-others enough to perform a subterfuge and get away with it. There can be a relative timeline where nearby ultra-intelligent machines compete with each other, or decentralize power, and they can share goals that are destructive to some humans and benevolent to others. (For their own purposes, and for the purpose of helping humans as a side-project.)

Also, some AGIs might differentiate between "humans worth keeping around" and "humans not worth keeping around." They may also put their "parents" (creators) in a different category than other humans, and they may also slowly add to that category, or subtract from it, or otherwise alter it.

It's hard to say. I'm not ultra-intelligent.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-25T09:48:55.727Z · LW(p) · GW(p)

At the FHI, we disagree about whether an ecology of AIs would make good AIs behave badly, or bad ones behave well. The disagreement matches our political opinions on free markets and competition, so it's probably not informative.

Replies from: More_Right
comment by More_Right · 2014-04-26T08:44:28.091Z · LW(p) · GW(p)

An interesting question to ask is "how many people who favor markets understand the best arguments against them, and vice versa." Because we're dealing with humans here, my suspicion is that if there's a lot of disagreement it stems largely from unwillingness to consider the other side, and unfamiliarity with the other side. So, in that regard you might be right.

Then again, we're supposed to be rational, and willing to change our minds if evidence supports that change, and perhaps some of us are actually capable of such a thing.

It's a debate worth having. Also, one need not have competition to have power decentralization. There is a disincentive aspect added to making violence impossible that makes "cooperation" more likely than "antagonistic competition." (Ie: Some sociopaths choose to cooperate with other strong sociopaths because they can see that competing with them would likely cause their deaths or their impoverishment. However, if you gave any one of those sociopaths clear knowledge that they held absolute power ....the result would be horrible domination.)

Evolution winds up decentralizing power among relative equals, and the resulting "relative peace" (for varying reasons) then allows for some of the reasons to be "good reasons." (Ie: Benevolent empaths working together for a better world.) This isn't to say that everything is rosy under decentralization. Decentralization may work more poorly than an all-powerful benevolent monarch.

It's just that benevolent monarchs aren't that likely given who wants to be a monarch, and who tries hardest to win any "monarch" positions that open up.

Such a thing might not be impossible, but if you make a mistake pursuing that course of action, the result tends to be catastrophic, whereas decentralization might be "almost as horrible and bloody," but at least offers the chance of continued survival, and the chance of survival allows for those who survive to "optimize or improve in the future."

"There may be no such thing as a utopia, but if there isn't, then retaining the chance for a utopia is better than definitively ruling one out." More superintelligences that are partly benevolent may be better than one superintelligence that has the possibility of being benevolent or malevolent.

comment by Strange7 · 2014-04-14T01:58:32.975Z · LW(p) · GW(p)

4 and 6 are contradictory.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-17T11:23:43.362Z · LW(p) · GW(p)

6 is before striking against humans, 4 is after.

Replies from: Strange7
comment by Strange7 · 2014-04-24T21:37:17.868Z · LW(p) · GW(p)

Then that should be specified in the actual list.

comment by More_Right · 2014-04-24T20:04:02.668Z · LW(p) · GW(p)

I don't know, in terms of dystopia, I think that an AGI might decide to "phase us out" prior to the singularity, if it was really malevolent. Make a bunch of attractive but sterile female robots, and a bunch of attractive but sterile male robots. Keep people busy with sex until they die of old age. A "gentle good night" abolition of humanity that isn't much worse (or way better) than what they had experienced for 50M years.

Releasing sterile attractive mates into a population is a good "low ecological impact" way of decreasing a population. Although, why would a superintelligence be opposed to all humans? I find this somewhat unlikely, given a self-improving design.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-25T09:47:40.331Z · LW(p) · GW(p)

This is probably not the most efficient use of the AGI's time and resources...

Replies from: More_Right
comment by More_Right · 2014-04-26T09:08:42.411Z · LW(p) · GW(p)

Probably true, but I agree with Peter Voss. I don't think any malevolence is the most efficient use of the AGI's time and resources. I think AGI has nothing to gain from malevolence. I don't think the dystopia I posited is the most likely outcome of superintelligence. However, while we are on the subject of the forms a malevolent AGI might take, I do think this is the type of malevolence most likely to allow the malevolent AGI to retain a positive self-image.

(Much the way environmentalists can feel better about introducing sterile males into crop-pest populations, and feel better about "solving the problem" without polluting the environment.)

Ted Kaczynski worried about this scenario a lot. ...I'm not much like him in my views.

Replies from: Stuart_Armstrong
comment by Stuart_Armstrong · 2014-04-26T21:41:56.854Z · LW(p) · GW(p)

The most efficient use of time and resources will be to best accomplish the AI's goals. If these goals are malevolent or lethally indifferent, so will be the AI's actions. Unless these goals include maintaining a particular self-image, the AI will have no need to maintain any erroneous self-image.

comment by More_Right · 2014-04-24T20:17:31.612Z · LW(p) · GW(p)

Some down-voted individual with "fewer rights than the star-bellied sneetches" wrote this:

higher intelligence doesn't lead necessarily to convergent moral goals

It might. However, this is also a reason for an evolutionarily-informed AGI-building process that starts off by including mirror neurons based on the most empathic and most intelligent people. Not so empathic and stupid that they embrace mass-murdering communism in an attempt to be compassionate, but empathic to the level of a smart libertarian who personally gives a lot to charity, etc., with repeated good outcomes limited only by capacity.

Eschewing mirror neurons and human brain construction entirely seems to be a mistake. Adding super-neocortices that recognize far more than linear patterns, once you have a benevolent "approximate human level" intelligence, appears to be a good approach.