What is the best compact formalization of the argument for AI risk from fast takeoff?

post by utilitymonster · 2012-03-13T01:44:51.788Z · LW · GW · Legacy · 21 comments

Many people complain that the Singularity Institute's "Big Scary Idea" (AGI leads to catastrophe by default) has not been argued for with the clarity of, say, Chalmers' argument for the singularity. The idea would be to make explicit what the premise-and-inference structure of the argument is, and then argue about the strength of those premises and inferences.

Here is one way you could construe one version of the argument for the Singularity Institute's "Big Scary Idea":

  1. At some point in the development of AI, there will be a very swift increase in the optimization power of the most powerful AI, moving from a non-dangerous level to a level of superintelligence. (Fast takeoff)
  2. This AI will maximize a goal function.
  3. Given fast takeoff and maximizing a goal function, the superintelligent AI will have a decisive advantage unless adequate controls are used.
  4. Adequate controls will not be used. (E.g., the AI won’t be boxed, or boxing won’t work.)
  5. Therefore, the superintelligent AI will have a decisive advantage.
  6. Unless that AI is designed with goals that stably align with ours, if the superintelligent AI has a decisive advantage, civilization will be ruined. (Friendliness is necessary)
  7. Unless the first team that develops the superintelligent AI makes adequate preparations, the superintelligent AI will not have goals that stably align with ours.
  8. Therefore, unless the first team that develops the superintelligent AI makes adequate preparations, civilization will be ruined shortly after fast takeoff.
  9. The first team that develops the superintelligent AI will fail to make adequate preparations.
  10. Therefore, civilization will be ruined shortly after fast takeoff.
Edit to add: each premise should be read as assuming the truth of all preceding premises and conclusions. E.g., (9) assumes that we've already created an artificial agent with a decisive advantage.
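
To make the premise-and-inference structure fully explicit, here is a minimal propositional sketch in Lean. The letters and their readings are my own labels, not part of the original post: premises (1)-(4), (6), (7), and (9) are hypotheses, and (5), (8), and (10) fall out as the derived steps.

```lean
-- Minimal propositional sketch of the argument above (labels are mine):
-- F = fast takeoff occurs          M = the AI maximizes a goal function
-- C = adequate controls are used   D = the AI gains a decisive advantage
-- A = its goals stably align with ours
-- P = the first team makes adequate preparations
-- R = civilization is ruined
theorem big_scary_idea
    (F M C D A P R : Prop)
    (p1 : F)               -- (1) fast takeoff
    (p2 : M)               -- (2) goal-function maximizer
    (p3 : F → M → ¬C → D)  -- (3) takeoff + maximizer + no controls ⇒ decisive advantage
    (p4 : ¬C)              -- (4) adequate controls will not be used
    (p6 : ¬A → D → R)      -- (6) Friendliness is necessary
    (p7 : ¬P → ¬A)         -- (7) no adequate preparations ⇒ no stable alignment
    (p9 : ¬P)              -- (9) adequate preparations will not be made
    : R :=                 -- (10) civilization is ruined
  have p5 : D := p3 p1 p2 p4  -- (5) the decisive advantage follows
  p6 (p7 p9) p5               -- (8) instantiated with (9) gives (10)
```

The point of the exercise is only to show where the work is done: the inferences are trivial once the conditionals are granted, so any disagreement has to be about the premises themselves.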

My questions are:

21 comments

comment by John_Maxwell (John_Maxwell_IV) · 2012-03-13T04:20:15.176Z · LW(p) · GW(p)

I created a new page on the wiki to collect links like the Big Scary Idea one: Criticism of the sequences. If anyone knows of more links to intelligent disagreement with ideas prevailing on Less Wrong, please add them!

Replies from: lukeprog, orthonormal, XiXiDu
comment by lukeprog · 2012-03-14T13:37:49.311Z · LW(p) · GW(p)

Wasn't there a professional physicist who criticized something specific in Eliezer's QM sequence? Can't remember where...

Also, not sure if these count, but:

comment by orthonormal · 2012-03-13T22:53:02.161Z · LW(p) · GW(p)

That's an awesome idea.

comment by XiXiDu · 2012-03-13T09:25:54.136Z · LW(p) · GW(p)

If anyone knows of more links to intelligent disagreement with ideas prevailing on Less Wrong...

Well, people like user:wedrifid disagree but I don't think that the following posts are fallacious (or at least I haven't heard counterarguments that would render those posts obsolete):

(I am currently writing up a post for my personal blog where I list all requirements that need to be true in conjunction for SIAI to be the best choice when it comes to charitable giving.)

Replies from: Will_Newsome, wedrifid, lukeprog
comment by Will_Newsome · 2012-03-13T21:54:36.296Z · LW(p) · GW(p)

(I am currently writing up a post for my personal blog where I list all requirements that need to be true in conjunction for SIAI to be the best choice when it comes to charitable giving.)

Be careful, it's very common for people to gerrymander such probability estimates by unjustifiably assuming complete independence or complete dependence of certain terms. (This is true even if the "probability estimate" is only implicit in the qualitative structure of the argument.) If people think that's what you're doing then they're likely to disregard your conclusions even if the conclusions could have been supported by a weaker argument.
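
As a toy illustration of how much this matters (all credences below are made up), the same four conjunctive requirements give very different overall estimates under the two extreme dependence assumptions:

```python
# Toy illustration (made-up numbers): how the dependence assumption drives
# the estimate for a conjunction of requirements.
premises = [0.8, 0.7, 0.9, 0.6]  # hypothetical credences in the individual requirements

# Assuming complete independence, the conjunction is the product of the terms.
p_independent = 1.0
for p in premises:
    p_independent *= p

# Assuming complete (positive) dependence, the conjunction is only as
# improbable as its least probable term.
p_dependent = min(premises)

print(f"complete independence: {p_independent:.3f}")  # 0.302
print(f"complete dependence:   {p_dependent:.3f}")    # 0.600
```

Which end of that range is appropriate depends on how correlated the requirements actually are, which is exactly the part that tends to get gerrymandered.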

Replies from: gwern
comment by gwern · 2012-03-13T22:16:58.722Z · LW(p) · GW(p)

I've just pointed out something very similar.

comment by wedrifid · 2012-03-14T01:38:31.561Z · LW(p) · GW(p)

Well, people like user:wedrifid disagree but I don't think that the following posts are fallacious

I can confirm this. Or at least the second of the links is fallacious. The first was merely overwhelmingly weak (and so only fallacious to the extent that strong conclusions were declared).

comment by lukeprog · 2012-03-13T18:35:59.692Z · LW(p) · GW(p)

XiXiDu has also replied with Risks from AI and Charitable Giving.

comment by lukeprog · 2012-03-13T01:54:52.080Z · LW(p) · GW(p)

Good work.

Alternatively, one might construe the argument this way:

  1. There will be AI++ (before too long, absent defeaters). [See Chalmers.]
  2. If the goals of the AI++ differ significantly from the goals of human civilization, human civilization will be ruined soon after the arrival of AI++.
  3. Without a massive effort the goals of the AI++ will differ significantly from the goals of human civilization.
  4. Therefore, without a massive effort human civilization will be ruined soon after the arrival of AI++.

But this may be a less useful structure than the more detailed one you propose. My version simply packs more sub-arguments and discussion into each premise.
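
The compressed version has a correspondingly shorter skeleton. Here is a minimal Lean sketch under the same conventions as the one above (the letters are my own labels):

```lean
-- Minimal sketch of the compressed argument (labels are mine):
-- E = AI++ arrives, G = its goals differ significantly from human civilization's,
-- M = a massive (alignment) effort is made, R = civilization is ruined.
theorem compressed_version
    (E G M R : Prop)
    (p1 : E)            -- (1) there will be AI++
    (p2 : E → G → R)    -- (2) divergent goals ⇒ ruin
    (p3 : E → ¬M → G)   -- (3) no massive effort ⇒ divergent goals
    : ¬M → R :=         -- (4) without a massive effort, ruin
  fun noEffort => p2 p1 (p3 p1 noEffort)
```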

The premises (in your argument) that I feel least confident about are #1, #2, and #4.

Replies from: amcknight
comment by amcknight · 2012-03-14T07:31:28.472Z · LW(p) · GW(p)

Premise #2 seems very likely to me. Can you provide me with reasons why it wouldn't be likely?

Replies from: lukeprog
comment by lukeprog · 2012-03-14T09:21:03.668Z · LW(p) · GW(p)

Premise 2 in my version or utilitymonster's version?

Replies from: amcknight
comment by amcknight · 2012-03-14T18:59:13.819Z · LW(p) · GW(p)

Sorry, utilitymonster's version.

comment by utilitymonster · 2012-03-14T20:20:36.373Z · LW(p) · GW(p)

I prefer this briefer formalization, since it avoids some of the vagueness of "adequate preparations" and makes premise (6) clearer.

  1. At some point in the development of AI, there will be a very swift increase in the optimization power of the most powerful AI, moving from a non-dangerous level to a level of superintelligence. (Fast take-off)
  2. This AI will maximize a goal function.
  3. Given fast take-off and maximizing a goal function, the superintelligent AI will have a decisive advantage unless adequate controls are used.
  4. Adequate controls will not be used. (E.g., the AI won’t be boxed, or boxing won’t work.)
  5. Therefore, the superintelligent AI will have a decisive advantage.
  6. Unless that AI is designed with goals that stably and extremely closely align with ours, if the superintelligent AI has a decisive advantage, civilization will be ruined. (Friendliness is necessary)
  7. The AI will not be designed with goals that stably and extremely closely align with ours.
  8. Therefore, civilization will be ruined shortly after fast take-off.
comment by [deleted] · 2012-03-15T20:24:24.514Z · LW(p) · GW(p)

Where does premise 6 come from? That seems to be a weak point in the argument.

comment by Dmytry · 2012-03-14T07:43:14.132Z · LW(p) · GW(p)

Ahh, by the way, the points I have the most confidence about: 4 and 9. It seems virtually certain to me that precautions will not be adequate. The situation is similar to getting a server of some kind unhackable on the first compile and run.

The same goes for the creation of friendly AI. The situation is worse than writing the first autopilot ever and, on the first run of that autopilot software, flying in it complete with automated takeoff and landing. The plane is just going to crash, period. We are that sloppy at software development, and there is nothing we can do about it. The worst that can happen is an AI that is not friendly but does treat humans as special; it can euthanise humans even if we are otherwise useful to it, for example. A buggy friendly AI is probably the worst outcome. Seriously, people who don't develop software have all sorts of entirely wrong intuitions about the ability to make something work right on the first try (even with automated theorem proving). Furthermore, a very careful try is also a very slow one, and is unlikely to be the first. What I am hoping for is that the AIs will just quietly wirehead themselves.

comment by Dmytry · 2012-03-13T16:25:39.155Z · LW(p) · GW(p)

Well, the critique I have:

1: We don't know that AI can go FOOM. It may be just as hard to prevent a self-improving AI from wireheading (when it becomes super-intelligent) as it is to ensure friendliness. Note: perfect wireheading has infinite utility according to an agent prone to wireheading; the length of the wireheading experience in time (or its volume in space) is then irrelevant. The whole premise of fear of UFAI is that intelligence (human intelligence) can have faulty self-improvement; it's inconsistent to assume that about human intelligence but not about any AI.

2: We don't know that the AI would be likely to be substantially unfriendly. Other humans, and especially groups of humans (corporations, governments), are non-you, non-friendly-to-you intelligences too, with historical examples of extreme unfriendliness (I'm going to coin a law that the (un)friendly intelligence discussion is incomplete without mention of Nazis), yet they can be friendly enough - permitting you to live a normal life while paying taxes (but note the military draft, which happens when the meta-organism is threatened). It is plausible enough that the AI would be friendly enough. Humans would be cheap to store.

3: We may get there by mind uploading, which seems to me like the safest option.

4: We don't actually know whether an FAI attempt is more or less dangerous than a messy AI like 'replicate the function of cortical columns, simulate a lot of cortical columns'. An FAI attempt could just as well be more dangerous: get it wrong, and the AI euthanizes you with the best intentions. The extrapolated volition idea, by the way, entirely ignores the fact that you are a massively parallel system that can have different goals in different parts of itself (and mankind too is that kind of system, albeit less well connected).

The argumentation everywhere has very low external probabilities (when I evaluate probabilities and see conflicting arguments that point in opposite directions and look similarly plausible, I assign an external probability of zero, even if it is 1 argument vs. 10, and much more so for 10 arguments vs. 10), and so acting upon those arguments has rather low utility value.

comment by timtyler · 2012-03-13T20:10:47.903Z · LW(p) · GW(p)
  1. At some point in the development of AI, there will be a very swift increase in the optimization power of the most powerful AI, moving from a non-dangerous level to a level of superintelligence. (Fast takeoff)

...unless people want it to go slowly. It isn't a law of nature that things will go quickly. It seems likely that a more unified society will be able to progress as slowly as it wants to. There are plenty of proposals to throttle development - via "nannies" or other kinds of safety valve.

Insistence on a rapid takeoff arises from a position of technological determinism. It ignores sociological factors.

IMO, the "rapid takeoff" idea should probably be seen as a fundraising ploy. It's big, scary, and it could conceivably happen - just the kind of thing for stimulating donations.

Replies from: utilitymonster
comment by utilitymonster · 2012-03-14T00:04:08.284Z · LW(p) · GW(p)

IMO, the "rapid takeoff" idea should probably be seen as a fundraising ploy. It's big, scary, and it could conceivably happen - just the kind of thing for stimulating donations.

It seems that SIAI would have more effective methods for fundraising, e.g. simply capitalizing on "Rah Singularity!". I therefore find this objection somewhat implausible.

Replies from: timtyler
comment by timtyler · 2012-03-14T10:23:38.945Z · LW(p) · GW(p)

They did originally try the "Rah Singularity" strategy. Only more recently did they switch to using more negative marketing.

Replies from: CarlShulman
comment by CarlShulman · 2012-03-14T19:24:36.931Z · LW(p) · GW(p)

See Singularity University and the Singularity Summit. Cheerleading is more effective in raising money (and there are many more things to do in that line), but it is money to cheerlead and accelerate, not to try to shape outcomes.