Response to Oren Etzioni's "How to know if artificial intelligence is about to destroy civilization"

post by Daniel Kokotajlo (daniel-kokotajlo) · 2020-02-27T18:10:11.129Z · LW · GW · 5 comments

Oren Etzioni recently wrote a popular article titled "How to know if artificial intelligence is about to destroy civilization." I think it's a good idea to write publicly available responses to articles like this, in case interested people are googling around looking for such. If I get lucky, perhaps Oren himself will come across this! Then we could have a proper discussion and we'd probably both learn things.

For most readers of LW, it's probably not worth reading further, since you probably already agree with what I'm saying.

Here is the full text of the article, interspersed with my comments:

Could we wake up one morning dumbstruck that a super-powerful AI has emerged, with disastrous consequences? Books like Superintelligence by Nick Bostrom and Life 3.0 by Max Tegmark, as well as more recent articles, argue that malevolent superintelligence is an existential risk for humanity.

But one can speculate endlessly. It’s better to ask a more concrete, empirical question: What would alert us that superintelligence is indeed around the corner?

It is unfair to claim that Bostrom is speculating whereas your question is concrete and empirical. For one thing, Bostrom and others you denigrate have already asked the same question you now raise; there is not a consensus as to the answer yet, but plenty of arguments have been made. See e.g. Yudkowsky's "There's No Fire Alarm for Artificial General Intelligence." More importantly though, how is your question any less speculative than the others? You too are making guesses about what future technologies will come when and in what order, and you too have zero historical data to draw from. Perhaps you could make arguments based on analogy to other technologies -- e.g. as Grace and Christiano have argued about "slow takeoff" and "likelihood of discontinuous progress" -- but funnily enough (looking ahead at the rest of the article) you don't even do that; you just rely on your intuition!

We might call such harbingers canaries in the coal mines of AI. If an artificial-intelligence program develops a fundamental new capability, that’s the equivalent of a canary collapsing: an early warning of AI breakthroughs on the horizon.

Could the famous Turing test serve as a canary? The test, invented by Alan Turing in 1950, posits that human-level AI will be achieved when a person can’t distinguish conversing with a human from conversing with a computer. It’s an important test, but it’s not a canary; it is, rather, the sign that human-level AI has already arrived. Many computer scientists believe that if that moment does arrive, superintelligence will quickly follow. We need more intermediate milestones.

If you think "many computer scientists believe X about AI" is a good reason to take X seriously, then... well, you'll be interested to know that many computer scientists believe superintelligent AI will come in the next 15 years or so, and also many computer scientists believe AI is an existential risk.

Is AI’s performance in games such as Go, poker or Quake 3, a canary? It is not. The bulk of so-called artificial intelligence in these games is actually human work to frame the problem and design the solution. AlphaGo’s victory over human Go champions was a credit to the talented human team at DeepMind, not to the machine, which merely ran the algorithm the people had created. This explains why it takes years of hard work to translate AI success from one narrow challenge to the next. Even AlphaZero, which learned to play world-class Go in a few hours, hasn’t substantially broadened its scope since 2017. Methods such as deep learning are general, but their successful application to a particular task requires extensive human intervention.

I agree with you here; rapid progress in AI gaming is evidence, but not nearly conclusive evidence, that human-level AGI is coming soon.

More broadly, machine learning is at the core of AI’s successes over the last decade or so. Yet the term “machine learning” is a misnomer. Machines possess only a narrow sliver of humans’ rich and versatile learning abilities. To say that machines learn is like saying that baby penguins know how to fish. The reality is, adult penguins swim, capture fish, digest it, regurgitate into their beaks, and place morsels into their children’s mouths. AI is likewise being spoon-fed by human scientists and engineers.

What about GPT-2? There was relatively little spoon-feeding involved there; they just gave it pretty much the whole Internet and told it to start reading. For comparison, I received much more spoon-feeding myself during my education, yet I managed to reach human-level intelligence pretty quickly!

Anyhow, I agree that machines can't currently learn as well as humans can. But even you must admit that over the past few decades there has been steady progress in that direction. And it seems that this progress is ongoing and will continue.

In contrast to machine learning, human learning maps a personal motivation (“I want to drive to be independent of my parents”) to a strategic learning plan (“Take driver’s ed and practice on weekends”). A human formulates specific learning targets (“Get better at parallel parking”), collects and labels data (“The angle was wrong this time”), and incorporates external feedback and background knowledge (“The instructor explained how to use the side mirrors”). Humans identify, frame, and shape learning problems. None of these human abilities is even remotely replicated by machines. Machines can perform superhuman statistical calculations, but that is merely the last mile of learning.

I think I agree with you here, that we don't really have an impressive example of an AI organically identifying, framing, and shaping learning problems. Moreover I agree that this probably won't happen in the next ten years, based on my guesses about current rates of progress. And I agree that this would be a canary -- once we have impressive AI doing this organically, then human-level AGI is probably very close. But shouldn't we start preparing for human-level AGI before it is probably very close?

The automatic formulation of learning problems, then, is our first canary. It does not appear to be anywhere close to dying.

Self-driving cars are a second canary. They are further in the future than anticipated by boosters like Elon Musk. AI can fail catastrophically in atypical situations, like when a person in a wheelchair is crossing the street. Driving is far more challenging than previous AI tasks because it requires making life-critical, real-time decisions based on both the unpredictable physical world and interaction with human drivers, pedestrians, and others. Of course, we should deploy limited self-driving cars once they reduce accident rates, but only when human-level driving is achieved can this canary be said to have keeled over.

AI doctors are a third canary. AI can already analyze medical images with superhuman accuracy, but that is only a narrow slice of a human doctor’s job. An AI doctor would have to interview patients, consider complications, consult other doctors, and more. These are challenging tasks that require understanding people, language, and medicine. Such a doctor would not have to fool a patient into thinking it is human—that’s why this is different from the Turing test. But it would have to approximate the abilities of human doctors across a wide range of tasks and unanticipated circumstances.

Again, I agree that we are far from being able to do that -- being a doctor requires a lot of very general intelligence, in the sense that you have to be good at a lot of different things simultaneously and also good at integrating those skills. And I agree that once we have AI that can do this, human-level AGI is not far away. But again, shouldn't we get started preparing before that point?

And though the Turing test itself is not a good canary, limited versions of the test could serve as canaries. Existing AIs are unable to understand people and their motivations, or even basic physical questions like “Will a jumbo jet fit through a window?” We can administer a partial Turing test by conversing with an AI like Alexa or Google Home for a few minutes, which quickly exposes their limited understanding of language and the world. Consider a very simple example based on the Winograd schemas proposed by computer scientist Hector Levesque. I said to Alexa: “My trophy doesn’t fit into my carry-on because it is too large. What should I do?” Alexa’s answer was “I don’t know that one.” Since Alexa can’t reason about sizes of objects, it can’t decide whether “it” refers to the trophy or to the carry-on. When AI can’t understand the meaning of “it,” it’s hard to believe it is poised to take over the world. If Alexa were able to have a substantive dialogue on a rich topic, that would be a fourth canary.

Yep. Same response.

Current AIs are idiots savants: successful on narrow tasks, such as playing Go or categorizing MRI images, but lacking the generality and versatility of humans. Each idiot savant is constructed manually and separately, and we are decades away from the versatile abilities of a five-year-old child. The canaries I propose, in contrast, indicate inflection points for the field of AI.

Some theorists, like Bostrom, argue that we must nonetheless plan for very low-probability but high-consequence events as though they were inevitable. The consequences, they say, are so profound that our estimates of their likelihood aren’t important. This is a silly argument: it can be used to justify just about anything. It is a modern-day version of the argument by the 17th-century philosopher Blaise Pascal that it is worth acting as if a Christian God exists because otherwise you are at risk of an everlasting hell. He used the infinite cost of an error to argue that a particular course of action is “rational” even if it is based on a highly improbable premise. But arguments based on infinite costs can support contradictory beliefs. For instance, consider an anti-Christian God who promises everlasting hell for every Christian act. That’s highly improbable as well; from a logical point of view, though, it is just as reasonable a wager as believing in the god of the Bible. This contradiction shows a flaw in arguments based on infinite costs.

First of all, this isn't an argument based on tiny probabilities of infinite costs. The probability that human-level AI will arrive soon may be small but it is much much higher than other probabilities that you regularly prepare for, such as the probability that you will be in a car accident today, or the probability that your house will burn down. If you think this is a Pascal's Wager, then you must think buckling your seatbelt and buying insurance are too.

Secondly, again, the rationale for preparing isn't just that AGI might arrive soon; it's also that it is good to start preparing before AGI is about to arrive. Suppose you are right and there is only a tiny tiny chance that AGI will arrive before these canaries. Still, preparing for AGI is an important and difficult task; it might take several -- even many! -- years to complete. So we should get started now.

My catalogue of early warning signals, or canaries, is illustrative rather than comprehensive, but it shows how far we are from human-level AI. If and when a canary “collapses,” we will have ample time before the emergence of human-level AI to design robust “off-switches” and to identify red lines we don’t want AI to cross.

What? No we won't; you yourself said that human-level AGI will not be far away when these canaries start collapsing. And moreover, you seem to think here that preparation will be easy: all we need to do is design some off-switches and specify some red lines... This is really naive. You should read the literature, which contains lots of detailed discussion of why solutions like that won't work. I suggest starting with Bostrom's Superintelligence, one of the early academic works on the topic, and then branching out to skim the many newer developments that have arisen since then.

AI eschatology without empirical canaries is a distraction from addressing existing issues like how to regulate AI’s impact on employment or ensure that its use in criminal sentencing or credit scoring doesn’t discriminate against certain groups.

As Andrew Ng, one of the world’s most prominent AI experts, has said, “Worrying about AI turning evil is a little bit like worrying about overpopulation on Mars.” Until the canaries start dying, he is entirely correct.

Here's another argument that has the same structure as your argument: "Some people think we should give funding to the CDC and other organizations to help prepare the world for a possible pandemic. But we have no idea when such a pandemic will arise -- and moreover, we can be sure that there will be 'canaries' that will start dying beforehand. For example, before there is a worldwide pandemic, there will be a local epidemic that infects just a few people but seems to be spreading rapidly. At that point, it makes sense to start funding the CDC. But anything beforehand is merely un-empirical speculation that distracts from addressing existing issues like how to stay safe from the flu. Oh, what's that? This COVID-19 thing looks like it might become a pandemic? OK sure, now we should start sending money to the CDC. But Trump was right to slash its budget just a few months earlier, since at that point the canary still lived."

Also, Andrew Ng, the concern is not that AI will turn evil. That's a straw man which you would realize is a straw man if you read the literature. Instead, the concern is that AI will turn competent. As leading AI expert Stuart Russell puts it, we are currently working very hard to build an AI that is smarter than us. What if we succeed? Well then, by the time that happens, we'd better have also worked hard to make it share our values. Otherwise we are in deep trouble.

5 comments

Comments sorted by top scores.

comment by Steven Byrnes (steve2152) · 2020-02-27T22:53:36.587Z · LW(p) · GW(p)

I would also object to his implication that the only time-sensitive thing about AGI safety is to figure it out before AGI comes. The other thing is determining which of today's many ongoing research paths towards AGI is likeliest to lead to an AGI architecture that's amenable to being used safely and beneficially. ...And that kind of information is only useful if we get it WAY before AGI is around the corner. :-)

comment by Sammy Martin (SDM) · 2020-02-27T19:14:55.449Z · LW(p) · GW(p)

The sad/good thing is that this article represents progress. I recall that in Human Compatible Stuart Russell said that there was a joint declaration from some ML researchers that AGI is completely impossible, and it's clear from this article that Oren is at least thinking about it as a real possibility that isn't hundreds of years away. Automatically forming learning problems sounds a lot like automatically discovering actions, which is something Stuart Russell also mentioned in a list of necessary breakthroughs to reach AGI, so maybe there's some widespread agreement about what is still missing.

That aside, even by some of Oren's own metrics, we've made quite substantial progress. He mentions the Winograd schemas as a good test of when we're approaching human-like language understanding and common sense, but what he may not know is that GPT-2 actually bridged a significant fraction of the gap on Winograd schema performance between the best existing language models and humans: from 63% to 71%, with humans at about 92% accuracy according to DeepMind. That is a good object lesson in how the speed of progress can surprise you.
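
To put a rough number on "a significant fraction of the gap," here is a quick sketch that uses only the three figures quoted above (63%, 71%, and about 92% for humans). The function name is made up for illustration, and the human figure is approximate, so treat the result as a ballpark rather than a precise measurement.

```python
def fraction_of_gap_closed(previous: float, new: float, human: float) -> float:
    """Share of the gap between the previous best score and the human
    baseline that the new score closes (all scores in percent)."""
    gap = human - previous
    if gap <= 0:
        raise ValueError("human baseline must exceed the previous score")
    return (new - previous) / gap


# Figures as quoted in the comment; the human baseline is "about 92%".
closed = fraction_of_gap_closed(previous=63.0, new=71.0, human=92.0)
print(f"{closed:.1%}")  # prints 27.6%
```

So on these numbers, one model closed a bit more than a quarter of the remaining distance to human performance.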

Replies from: None

↑ comment by [deleted] · 2020-02-28T00:56:09.951Z · LW(p) · GW(p)

I find that the Winograd schemas are more useful as a guideline to adversarial queries to stump AIs than as an actual test. An AI reaching human-level accuracy on Winograd schemas would be much less impressive to me than an AI passing the traditional Turing test conducted by an expert who is aware of Winograd schemas and experienced in adversarial queries in general. The former is more susceptible to Goodhart's law due to the stringent format and limited problem space.

comment by Eli Tyre (elityre) · 2020-02-27T18:32:00.270Z · LW(p) · GW(p)

Chapter XXX of Bostrom's Superintelligence.

Did you mean to fill this in?

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2020-02-27T19:33:42.141Z · LW(p) · GW(p)

Ooops, yes I did, thanks for catching that.
