Students asked to defend AGI danger update in favor of AGI riskiness

post by lukeprog · 2011-10-18T05:24:33.812Z · LW · GW · Legacy · 38 comments

From Geoff Anders of Leverage Research:

In the Spring semester of 2011, I decided to see how effectively I could communicate the idea of a threat from AGI to my undergraduate classes. I spent three sessions on this for each of my two classes. My goal was to convince my students that all of us are going to be killed by an artificial intelligence. My strategy was to induce the students to come up with the ideas themselves. I gave out a survey before and after. An analysis of the survey responses indicates that the students underwent a statistically significant shift in their reported attitudes. After the three sessions, students reported believing that AGI would have a larger impact[1] and also a worse impact[2] than they originally reported believing.

Not a surprising result, perhaps, but the details of how Geoff taught AGI danger and the reactions of his students are quite interesting.
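The "statistically significant shift" Anders reports is a standard paired before/after comparison. As a sketch only (the PDF specifies the actual test and data; the ratings below are hypothetical), a paired t-statistic over each student's change can be computed like this:

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t-statistic on per-student change scores (positive = increase)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical 1-10 "impact" ratings from the same eight students, before and after
before = [4, 5, 3, 6, 5, 4, 5, 3]
after  = [6, 7, 5, 7, 6, 6, 7, 5]

t = paired_t(before, after)
# With n - 1 = 7 degrees of freedom, |t| > 2.365 is significant at p < .05
print(round(t, 2))  # 10.69
```

Because each student serves as their own control, the test is on the differences, which is why dropping students missing either survey (as the PDF does on p. 28) matters.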

Comments sorted by top scores.

comment by lessdazed · 2011-10-18T10:36:08.838Z · LW(p) · GW(p)

In the (season) semester of (year), I decided to see how effectively I could communicate the idea of a threat from (noun) to my undergraduate classes. I spent three sessions on this for each of my two classes. My goal was to convince my students that all of us are going to be killed by a(n) (noun). My strategy was to induce the students to come up with the ideas themselves. I gave out a survey before and after. An analysis of the survey responses indicates that the students underwent a statistically significant shift in their reported attitudes. After the three sessions, students reported believing that (noun) would have a larger impact[1] and also a worse impact[2] than they originally reported believing.

This is less than surprising. I can't think of any threat already existing in the minds of some undergraduates whose perceived severity a competent believing professor requiring attendance couldn't, on average, increase. Control groups are needed.

Replies from: Normal_Anomaly
comment by Normal_Anomaly · 2011-10-18T21:07:34.063Z · LW(p) · GW(p)

What would you want to do with the control groups? Teach them that AGI won't destroy the world? Not teach them anything in particular about AI? Teach them that invading aliens will destroy the world, or that the biblical End Times are near? Any of these would yield useful information. Which one(s) do you favor?

Replies from: lessdazed
comment by lessdazed · 2011-10-18T22:15:58.055Z · LW(p) · GW(p)

Not teach them anything in particular about AI? Teach them that...the biblical End Times are near?

I was specifically thinking of these exact two conditions, which is why I said "groups", for they are different in kind. The aliens example is even better than the supernatural end times one.

I thought of but rejected "Teach them that AGI won't destroy the world?" when I couldn't think of how to implement that neutrally. How would one do that?

Replies from: Normal_Anomaly
comment by Normal_Anomaly · 2011-10-18T23:35:05.043Z · LW(p) · GW(p)

I thought of but rejected "Teach them that AGI won't destroy the world?" when I couldn't think of how to implement that neutrally.

True. Most arguments against the AGI-apocalypse scenario are responses to arguments for it; it would be difficult to present only one side of the question.

comment by shminux · 2011-10-18T18:31:05.312Z · LW(p) · GW(p)

This seems like an incredibly biased presentation, with the author never realizing the depth of the bias. Then again, he writes "My goal was to convince my students that all of us are going to be killed by an artificial intelligence," not to probe the validity of the point, so his bottom line was already written.

He says "I presented a neutral summary" after judiciously guiding the students through one-sided claims about AGI and their refutations (an AI can never play chess -- it plays chess), while skipping the claims that have not (yet) been refuted, then spicing it up with the Terminator quotes.

He says "At all points in the discussion I did my best to appear neutral and to not reveal my views," right after scaring them with a bomb in a trashcan.

He assigned no homework and gave no time outside the class for the students to come up with counter-arguments.

He writes:

In my classes, my primary goal was to teach students how to construct and assess arguments... Arguments can be assessed. If an argument has flaws, you can find those flaws. If you find flaws in an argument, the argument is refuted. If you cannot find flaws, you can take time to think about it more. If you still cannot find flaws, you should consider the possibility that the argument has no flaws. And if there are no flaws in an argument, then the conclusion of that argument has to be true, no matter what that conclusion might be.

The idea that an argument can sometimes be tested experimentally seems utterly foreign to him (even when it is in his favor, like the "AI can never be better at chess" one). Must be something about philosophers in general, I suppose.

He primed his students in advance:

Up to this point, I had not presented any AI material to anyone in any of the classes. I had only remarked a couple of times that the AI arguments were “awesome” or “epic” or some such.

He did not attempt to provide a balanced context by inviting (or at least quoting) an expert in the area who does not share his views.

So his conclusion, that it is possible to convince a person who never thought about a topic before of the dangers of an AGI, was a foregone one. He could probably have convinced them that AGI is the second coming of Christ, if he bothered (it is a Catholic college, so the leap is not that large).

Replies from: lessdazed
comment by lessdazed · 2011-10-19T05:07:30.241Z · LW(p) · GW(p)

"My goal was to convince my students that all of us are going to be killed by an artificial intelligence," not to probe the validity of the point, so his bottom line was already written.

Sort of. Assuming he was basically convinced that "...all of us are going to be killed by an artificial intelligence," he knew he was trying to convince his students of that but he did not know if he would succeed at doing so with this method. He wasn't testing the dangers of AI, he was testing a method of persuasion.

comment by timtyler · 2011-10-18T13:45:48.345Z · LW(p) · GW(p)

The paper gives what it describes as the “AGI Apocalypse Argument” - which ends with the following steps:

12. For almost any goals that the AGI would have, if those goals are pursued in a way that would yield an overwhelmingly large impact on the world, then this would result in a catastrophe for humans.

13. Therefore, if an AGI with almost any goals is invented, then there will be a catastrophe for humans.

14. If humans will invent an AGI soon and if an AGI with almost any goals is invented, then there will be a catastrophe for humans, then there will be an AGI catastrophe soon.

15. Therefore, there will be an AGI catastrophe soon.

It is hard to tell whether anyone took this seriously - but it seems that an isomorphic argument 'proves' that computer programs will crash - since "almost any" computer program crashes. The “AGI Apocalypse Argument” as stated thus appears to be rather silly.

If the stated aim was: "to convince my students that all of us are going to be killed by an artificial intelligence" - why start with such a flawed argument?

Replies from: Bongo, shinoteki, jhuffman
comment by Bongo · 2011-10-19T22:34:58.752Z · LW(p) · GW(p)

it seems that an isomorphic argument 'proves' that computer programs will crash - since "almost any" computer program crashes.

More obviously, an isomorphic argument 'proves' that books will be gibberish - since "almost any" string of characters is gibberish. An additional argument that non-gibberish books are very difficult to write and that naively attempting to write a non-gibberish book will almost certainly fail on the first try, is required. The analogous argument exists for AGI, of course, but is not given there.

Replies from: timtyler
comment by timtyler · 2011-10-20T15:53:20.149Z · LW(p) · GW(p)

Right - so we have already had 50+ years of trying and failing. A theoretical argument that we won't succeed the first time does not tell us very much that we didn't already know.

What is more interesting is the track record of engineers of not screwing up or killing people the first time.

We have records about engineers killing people for cars, trains, ships, aeroplanes and rockets. We have failure records from bridges, tunnels and skyscrapers.

Engineers do kill people - but often it is deliberate - e.g. nuclear bombs - or with society's approval - e.g. car accidents. There are some accidents which are not obviously attributable to calculated risks - e.g. the Titanic, or the Tacoma Narrows bridge - but they typically represent a small fraction of the overall risks involved.

comment by shinoteki · 2011-10-18T16:35:56.875Z · LW(p) · GW(p)

It is hard to tell whether anyone took this seriously - but it seems that an isomorphic argument 'proves' that computer programs will crash - since "almost any" computer program crashes. The “AGI Apocalypse Argument” as stated thus appears to be rather silly.

I don't see why this makes the argument seem silly. It seems to me that the isomorphic argument is correct, and that computer programs do crash.

Replies from: timtyler
comment by timtyler · 2011-10-18T21:26:42.123Z · LW(p) · GW(p)

Some computer programs crash - just as some possible superintelligences would kill all humans.

However, the behavior of a computer program chosen at random tells you very little about how an actual real-world computer program will behave - since computer programs are typically produced by selection processes performed by intelligent agents.

The "for almost any goals" argument is bunk.

Replies from: Eugine_Nier, J_Taylor, Logos01
comment by Eugine_Nier · 2011-10-19T06:07:17.706Z · LW(p) · GW(p)

Some computer programs crash - just as some possible superintelligences would kill all humans.

No, *most* computer programs crash - it's just that most people never see them crash, because said programs are repeatedly tested and modified until they no longer crash before being shown to people (this process is called "debugging"). With a self-modifying AI this is a lot harder to do.

Replies from: timtyler
comment by timtyler · 2011-10-19T17:34:12.237Z · LW(p) · GW(p)

Some computer programs crash - just as some possible superintelligences would kill all humans.

No, *most* computer programs crash [...]

By "no", you apparently mean "yes".

With a self-modifying AI this is a lot harder to do.

Well, that is a completely different argument - and one that would appear to be in need of supporting evidence - since automated testing, linting and the ability to program in high-level languages are all improving simultaneously.

I am not aware of any evidence that real computer programs are getting more crash-prone with the passage of time.

Replies from: Eugine_Nier
comment by Eugine_Nier · 2011-10-19T23:35:51.250Z · LW(p) · GW(p)

With a self-modifying AI this is a lot harder to do.

Well, that is a completely different argument - and one that would appear to be in need of supporting evidence - since automated testing, linting and the ability to program in high-level languages are all improving simultaneously.

The point is that the first time you run the seed AI it will attempt to take over the world, so you don't have the luxury of debugging it.

Replies from: timtyler, asr
comment by timtyler · 2011-10-20T15:01:05.874Z · LW(p) · GW(p)

The point is that the first time you run the seed AI it will attempt to take over the world, so you don't have the luxury of debugging it.

That is not a very impressive argument, IMHO.

We will have better test harnesses by then - allowing such machines to be debugged.

comment by asr · 2011-10-20T00:47:20.113Z · LW(p) · GW(p)

Almost certainly, the first time you run the seed AI, it'll crash quickly. I think it's very unlikely that you construct a successful-enough-to-be-dangerous AI without a lot of mentally crippled ones first.

Replies from: wedrifid
comment by wedrifid · 2011-10-20T13:48:04.733Z · LW(p) · GW(p)

Almost certainly, the first time you run the seed AI, it'll crash quickly. I think it's very unlikely that you construct a successful-enough-to-be-dangerous AI without a lot of mentally crippled ones first.

If so then we are all going to die. That is, if you have that level of buggy code then it is absurdly unlikely that the first time the "intelligence" part works at all it works well enough to be friendly. (And that scenario seems likely.)

Replies from: timtyler
comment by timtyler · 2011-10-20T15:03:06.971Z · LW(p) · GW(p)

The first machine intelligences we build will be stupid ones.

By the time smarter ones are under development we will have other trustworthy smart machines on hand to help keep the newcomers in check.

comment by J_Taylor · 2011-10-19T03:14:08.461Z · LW(p) · GW(p)

I am not entirely sure I disagree with you. However, I am having difficulty modeling you.

"Achieving a goal" seems to mean, for our purposes, something along the lines of "Bringing about a world-state." Most possible world-states do not involve human existence. Thus, it seems that for most possible goals, achieving a goal entails human extinction.

However, your mention of computer programs being produced by intelligent agents is interesting. Are you implying that most AGI's (assume these intelligences can go FOOM) would not result in human extinction?

If this is not what you were implying, I apologize for modeling you poorly. If this is what you were implying, I would like to indicate that this post was non-hostile.

Replies from: timtyler
comment by timtyler · 2011-10-19T17:41:04.571Z · LW(p) · GW(p)

Are you implying that most AGI's (assume these intelligences can go FOOM) would not result in human extinction?

Questions about fractions of infinite sets require an enumeration strategy to be specified - or they don't make much sense. Assuming lexicographic ordering of their source code - and only considering the set of superintelligent programs - no: I don't mean to imply that.

comment by Logos01 · 2011-10-19T03:24:44.210Z · LW(p) · GW(p)

The "for almost any goals" argument is bunk.

A statement which we can derive from the simple fact that the mere existence of general intelligence (apes) does not result automatically in catastrophe.

I wonder how long it'll take before people catch onto the notion that artificial "dumbness" is in many ways a more interesting field than artificial "intelligence"? (As in, how much could an AGI no smarter than a dog, but hooked into expert systems similar to Watson, do?)

Replies from: TheOtherDave
comment by TheOtherDave · 2011-10-19T03:53:57.503Z · LW(p) · GW(p)

It was pretty well accepted at MIT's Media Lab back when my orbit took me around there periodically, a decade or so ago, that there was a huge amount of low-hanging fruit in this area... not necessarily of academic interest, but damned useful (and commercial).

Replies from: JoshuaZ, Logos01
comment by JoshuaZ · 2011-10-19T04:02:35.501Z · LW(p) · GW(p)

That's interesting since my impression if anything is the exact opposite. There seem to be a lot of people trying to apply Bayesian learning systems and expert learning systems to all sorts of different practical problems. I wonder if this is a new thing or whether I simply don't have a good view of the field.

Replies from: Logos01
comment by Logos01 · 2011-10-19T04:07:12.112Z · LW(p) · GW(p)

For what it's worth, I consider Bayesian learning systems and expert learning systems to be "narrow" AI -- hence the example I gave of Watson.

I think Ben Goertzel's Novamente project is the closest extant project to a 'general' AI of any form that I've heard of.

Replies from: JoshuaZ
comment by JoshuaZ · 2011-10-19T04:09:54.755Z · LW(p) · GW(p)

I can see that for expert systems, but Bayesian learning systems seem to be a distinct category. The primary limits seem to be scalability, not architecture.

Replies from: Logos01
comment by Logos01 · 2011-10-19T06:12:58.348Z · LW(p) · GW(p)

Bayesian learning systems are essentially another form of trainable neural network. That makes them very good in a narrow range of categories but also makes them insufficient to the cause of achieving general intelligence.

I do not see that scaling Bayesian learning networks would ever achieve general intelligence. No matter how big the hammer, it'll never be a wrench. That being said, I do believe that some form of pattern recognition and 'selective forgetting' is important to cognition and as such Bayesian learning architecture is a good tool towards that end.

comment by Logos01 · 2011-10-19T12:50:16.977Z · LW(p) · GW(p)

not necessarily of academic interest, but damned useful (and commercial).

Actually, I'm curious that this isn't seen as an area of significant academic interest -- designing artificial systems around being efficient parsers of extraneous data. I recall that one of the major differences between Deep Blue and Deep Fritz in the Kasparov chess matches was precisely that Fritz was designed around not probing every last possible set of playable moves; that is, Deep Fritz was "learning to forget the right things".

It seems to me that understanding this mechanism and how it behaves in humans could have huge potential for opening up the understanding of general intelligence and cognition. And that's a very academic concern.

comment by jhuffman · 2011-10-18T14:27:37.454Z · LW(p) · GW(p)

Yes, 15 does not follow unless we resolve this question from 14:

14. If humans will invent an AGI soon

comment by [deleted] · 2011-10-18T13:32:57.850Z · LW(p) · GW(p)

Does anyone else see a problem with the data table on page 22 of that PDF file?

The paper mentions this criterion on page 22:

Cells in the "change" column where I did not receive both a "before" answer and an "after" answer have been left blank.

Yet the data points 15, 16, and 17 in the 8:30 a.m. class on page 22 all appear to have a blank Before, a present After, and a Change identical to the After column.

This also appears to affect the change-column statistics on page 24 (8:30 a.m.) and on page 26 (Both Classes Combined). On page 28, the people who only have one survey are dropped entirely, and I no longer see this problem.

Since he uses the statistics on page 28 for his conclusion, this may not change the conclusion, but I did want to point it out.
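The rule Michaelos quotes - leave Change blank unless both a Before and an After answer exist - is easy to enforce mechanically. A sketch with hypothetical survey data, using `None` to mark a missing answer:

```python
def change_column(before, after):
    """Change = after - before, but blank (None) when either survey is missing."""
    return [
        a - b if a is not None and b is not None else None
        for b, a in zip(before, after)
    ]

# Hypothetical ratings: student 2 skipped the "before" survey,
# student 3 skipped the "after" survey
before = [3, None, 5, 4]
after  = [6, 7, None, 5]
print(change_column(before, after))  # [3, None, None, 1]
```

Computing Change this way (rather than copying After into empty cells) avoids exactly the inflated change-column statistics described above.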

Replies from: Geoff_Anders
comment by Geoff_Anders · 2011-10-19T17:28:10.269Z · LW(p) · GW(p)

Thanks for pointing this out. There was in fact an error. I've fixed the error and updated the study. Some of the conclusions embedded in tables change; the final conclusions reported stay the same.

I've credited you on p.3 of the new version. If you want me to credit you by name, please let me know.

Thanks again!

comment by Geoff_Anders · 2011-10-19T18:56:07.018Z · LW(p) · GW(p)

Hi everyone. Thanks for taking an interest. I'm especially interested in (a) errors committed in the study, (b) what sorts of follow-up studies would be the most useful, (c) how the written presentation of the study could be clarified.

On errors, Michaelos already found one - I forgot to delete some numbers from one of the tables. That error has been fixed and Michaelos has been credited. Can anyone see any other errors?

On follow-up studies, lessdazed has suggested some. I don't know if we need to see what happens when nothing is presented on AGI; I think our "before" surveys are sufficient here. But trying to teach some alternative threat is an interesting idea. I'm interested in other ideas as well.

On clarity of presentation, it will be worth clarifying a few things. For instance, the point of the study was to test a method of persuasion, not to see what students would do with an unbiased presentation of evidence. I'll try to make that more obvious in the next version of the document. It would be good to know what other things might be misunderstood.

Replies from: None
comment by [deleted] · 2011-10-20T12:57:21.167Z · LW(p) · GW(p)

I appreciate the credit and have sent you a message with my name, but I have to let you know that while Version 1.2 contains the fix, version 1.3 appears to have reverted back to the unfixed version as if it was made off of version 1.0 instead of 1.2.

Replies from: Geoff_Anders
comment by Geoff_Anders · 2011-10-21T00:53:28.208Z · LW(p) · GW(p)

Fixed.

comment by Eugine_Nier · 2011-10-19T06:10:06.249Z · LW(p) · GW(p)

This post seems to serve no purpose except to promote the dark arts.

Replies from: PhilGoetz
comment by PhilGoetz · 2011-10-21T16:41:00.694Z · LW(p) · GW(p)

A quote from the PDF:

Also, wherever possible I tried to choose material that was freaky. The Big Dog video is a particularly good example, as lots of people seem to find Big Dog freaky. Of course, at no point did I comment on the freakiness. I did not want my students to think that I wanted to unsettle them. I simply wanted them to experience their own natural reactions as they witnessed the power of artificial intelligence unfolding in front of them.

I then played two clips from the movie Terminator 2: Judgment Day. The first clip, from the very beginning of the movie, showed the future war between humans and robots. The second clip showed John Connor, Sarah Connor and the Terminator discussing the future of humanity and how the artificial intelligence Skynet was built. I chose the first clip in order to vividly present the image of an AGI catastrophe. I chose the second clip in order to present the following pieces of dialogue. ... These were the most ominous and portentous bits of dialogue I could find.

So, yes, dark arts. But the way he kept asking "And how would the AI do that?" was excellent.

comment by Douglas_Knight · 2011-10-18T17:33:59.661Z · LW(p) · GW(p)

I found this an extremely surprising result. Geoff Anders claims immediate effects from essentially only two interventions:

But you see, a plan can’t be very good if it can be thwarted by some mild fluctuations in the weather. Let’s say there’s a thunderstorm and the power goes out. Well, then the AGI will turn off. And if it turns off, it won’t be able to accomplish its goal of becoming the best possible chess player. You see, if we humans executed your plans, we would all die of starvation. We would study the rules of chess, we’d calculate chess moves and then we’d die...

and

But what if humans don’t want to install a backup generator? ... Alright, you have made some progress. You’ve solved the power source problem. But in doing so you replaced it with another problem: the human compliance problem.

There were more interventions between these and the surveys of average belief, but these interventions caused at least a few students to generate the idea that AGIs are much more creative and powerful than in Terminator 2. The effect on the tail seems to me more important and surprising than the effect on the mean.

There were a lot of interventions before these two, including whatever idiosyncrasies Anders's philosophy course had, but the outcome before these two interventions seemed pretty standard. The first AI day seemed pretty standard. The chess exercise is probably not common and the two quotes above require its context, but the initial reaction to the chess exercise did not surprise me.

comment by JoshuaZ · 2011-10-18T12:48:59.519Z · LW(p) · GW(p)

I don't know of any studies backing this up, but I'd expect that if a position doesn't go against deeply held beliefs, then having to argue for it will make you update in that direction.

Replies from: Logos01
comment by Logos01 · 2011-10-19T03:17:33.602Z · LW(p) · GW(p)

Sounds like a classic case of priming to me.