A transcript of the TED talk by Eliezer Yudkowsky

post by Mikhail Samin (mikhail-samin) · 2023-07-12T12:12:34.399Z · LW · GW · 13 comments

The TED talk is available on YouTube and the TED website. Previously, a live recording was published behind a paywall on the conference website and later (likely accidentally) on a random TEDx YouTube channel [LW · GW], but it has since been removed.

The transcription was done with Whisper.


You've heard that things are moving fast in artificial intelligence. How fast? So fast that I was suddenly told on Friday that I needed to be here.
So, no slides, six minutes.

Since 2001, I've been working on what we would now call the problem of aligning artificial general intelligence: how to shape the preferences and behavior of a powerful artificial mind such that it does not kill everyone.

I more or less founded the field two decades ago when nobody else considered it rewarding enough to work on. I tried to get this very important project started early so we'd be in less of a drastic rush later.

I consider myself to have failed.

Nobody understands how modern AI systems do what they do. They are giant, inscrutable matrices of floating-point numbers that we nudge in the direction of better performance until they inexplicably start working.

At some point, the companies rushing headlong to scale AI will cough out something that's smarter than humanity.

Nobody knows how to calculate when that will happen. My wild guess is that it will happen after zero to two more breakthroughs the size of transformers.

What happens if we build something smarter than us that we understand that poorly?

Some people find it obvious that building something smarter than us that we don't understand might go badly. Others come in with a very wide range of hopeful thoughts about how it might possibly go well. Even if I had 20 minutes for this talk and months to prepare it, I would not be able to refute all the ways people find to imagine that things might go well.

But I will say that there is no standard scientific consensus for how things will go well. There is no hope that has been widely persuasive and stood up to skeptical examination. There is nothing resembling a real engineering plan for us surviving that I could critique.
This is not a good place in which to find ourselves.

If I had more time, I'd try to tell you about the predictable reasons why the current paradigm will not work to build a superintelligence that likes you or is friends with you, or that just follows orders.

Why, if you press thumbs up when humans think that things went right or thumbs down when another AI system thinks that they went wrong, you do not get a mind that wants nice things in a way that generalizes well outside the training distribution to where the AI is smarter than the trainers.

You can search for Yudkowsky, List of Lethalities [LW · GW] for more.

But to worry, you do not need to believe me about exact predictions of exact disasters. You just need to expect that things are not going to work great on the first really serious, really critical try because an AI system smart enough to be truly dangerous was meaningfully different from AI systems stupider than that.

My prediction is that this ends up with us facing down something smarter than us that does not want what we want, that does not want anything we recognize as valuable or meaningful.

I cannot predict exactly how a conflict between humanity and a smarter AI would go for the same reason I can't predict exactly how you would lose a chess game to one of the current top AI chess programs, let's say, Stockfish.

If I could predict exactly where Stockfish could move, I could play chess that well myself. I can't predict exactly how you'll lose to Stockfish, but I can predict who wins the game.

I do not expect something actually smart to attack us with marching robot armies with glowing red eyes where there could be a fun movie about us fighting them. I expect an actually smarter and uncaring entity will figure out strategies and technologies that can kill us quickly and reliably, and then kill us.

I am not saying that the problem of aligning superintelligence is unsolvable in principle. I expect we could figure it out with unlimited time and unlimited retries, which the usual process of science assumes that we have. The problem here is the part where we don't get to say, ha-ha, whoops, that sure didn't work. That clever idea that used to work on earlier systems sure broke down when the AI got smarter, smarter than us.

We do not get to learn from our mistakes and try again because everyone is already dead.

It is a large ask to get an unprecedented scientific and engineering challenge correct on the first critical try. Humanity is not approaching this issue with remotely the level of seriousness that would be required. Some of the people leading these efforts have spent the last decade not denying that creating a superintelligence might kill everyone, but joking about it.

We are very far behind. This is not a gap we can overcome in six months, given a six-month moratorium.

If we actually try to do this in real life, we are all going to die.

People say to me at this point: What's your ask?

I do not have any realistic plan, which is why I spent the last two decades trying and failing to end up anywhere but here.

My best bad take is that we need an international coalition banning large AI training runs, including extreme and extraordinary measures to have that ban be actually and universally effective, like tracking all GPU sales, monitoring all the datacenters, being willing to risk a shooting conflict between nations in order to destroy an unmonitored datacenter in a non-signatory country.

I say this not expecting that to actually happen.

I say this expecting that we all just die.

But it is not my place to just decide on my own that humanity will choose to die, to the point of not bothering to warn anyone.

I have heard that people outside the tech industry are getting this point faster than people inside it. Maybe humanity wakes up one morning and decides to live.


Thank you for coming to my brief TED talk.

13 comments


comment by dsj · 2023-07-16T05:04:31.808Z · LW(p) · GW(p)

I know this is from a bit ago now, so maybe he’s changed his tune since, but I really wish he and others would stop repeating the falsehood that all international treaties are ultimately backed by force on the signatory countries. There are countless trade, climate, and nuclear disarmament agreements which are not backed by force. I’d venture to say that the large majority of agreements are backed merely by the promise of continued good relations and tit-for-tat mutual benefit or defection.

comment by Productimothy (productimothy) · 2023-07-15T01:32:52.884Z · LW(p) · GW(p)

Here is the Q+A section: 
[In the video, the timestamp is 5:42 onward.]
[The transcript is taken from YouTube's "Show transcript" feature, then cleaned by me for readability. If you think the transcription is functionally erroneous somewhere, let me know.]

Eliezer: Thank you for coming to my brief TED talk.

(Applause)

Host: So, Eliezer, thank you for coming and giving that. It seems like what you're raising the alarm about is that for an AI to basically destroy humanity, it has to break out, to escape controls of the internet and start commanding real-world resources. You say you can't predict how that will happen, but just paint one or two possibilities.

Eliezer: Okay. First, why is this hard? Because you can't predict exactly where a smarter chess program will move. Imagine sending the design for an air conditioner back to the 11th century. Even if there is enough detail for them to build it, they will be surprised when cold air comes out. The air conditioner will use the temperature-pressure relation, and they don't know about that law of nature. If you want me to sketch what a superintelligence might do, I can go deeper and deeper into places where we think there are predictable technological advancements that we haven't figured out yet. But as I go deeper and deeper, it gets harder and harder to follow.

It could be super persuasive. We do not understand exactly how the brain works, so it's a great place to exploit-- laws of nature that we do not know about, rules of the environment, new technologies beyond that. Can you build a synthetic virus that gives humans a cold, then a bit of neurological change such that they are easier to persuade? Can you build your own synthetic biology? Synthetic cyborgs? Can you blow straight past that to covalently bonded equivalents of biology, where instead of proteins that fold up and are held together by static cling, you've got things that go down much sharper potential energy gradients and are bonded together? People have done advanced design work about this sort of thing for artificial red blood cells that could hold a hundred times as much oxygen if they were using tiny sapphire vessels to store the oxygen. There's lots and lots of room above biology, but it gets harder and harder to understand.

Host: So what I hear you saying is you know there are these terrifying possibilities, but your real guess is that AIs will work out something more devious than that. How is that really a likely pathway in your mind? 

Eliezer: Which part? That they're smarter than I am? Absolutely. (Eliezer makes an exaggeratedly dim facial expression, looking upward; the audience laughs.)

Host: No, not that they're smarter, but that they would... Why would they want to go in that direction? The AIs don't have our feelings of envy, jealousy, anger, and so forth. So why might they go in that direction?

Eliezer: Because it is convergently implied by almost any of the strange and inscrutable things that they might end up wanting, as a result of gradient descent on these thumbs-up and thumbs-down internal controls. If all you want is to make tiny molecular squiggles, or that's one component of what you want but it's a component that never saturates, you just want more and more of it--the same way that we want and would want more and more galaxies filled with life and people living happily ever after. By wanting anything that just keeps going, you are wanting to use more and more material. That could kill everyone on Earth as a side effect. It could kill us because it doesn't want us making other superintelligences to compete with it. It could kill us because it's using up all the chemical energy on Earth.

Host: So, some people in the AI world worry that your views are strong enough that you're willing to advocate extreme responses to it. Therefore, they worry that you could be a very destructive figure. Do you draw the line yourself in terms of the measures that we should take to stop this happening? Or is anything justifiable to stop the scenarios you're talking about happening?

Eliezer: I don't think that "anything" works. I think that this takes state actors and international agreements. All international agreements, by their nature, tend to ultimately be backed by force on the signatory countries and on the non-signatory countries, which is a more extreme measure. I have not proposed that individuals run out and use violence, and I think that the killer argument for that is that it would not work.

Host: Well, you are definitely not the only person to propose that what we need is some kind of international reckoning here on how to manage this going forward. Thank you so much for coming here to TED.

comment by Neil (neil-warren) · 2023-07-12T23:51:20.209Z · LW(p) · GW(p)

The law of headlines is "any headline ending with a question mark can be answered with a no" (because "NATION AT WAR" will sell more copies than "WILL NATION GO TO WAR?" and newspapers follow incentives.) The video here is called "will superintelligent AI end the world?" and knowing Eliezer he would have probably preferred "superintelligent AI will kill us all". I don't know who decides.

comment by just_browsing · 2023-07-13T03:00:24.377Z · LW(p) · GW(p)

Suggestion: could you also transcribe the Q&A? 4 out of the 10 minutes of content is Q&A. 

Replies from: productimothy
comment by Productimothy (productimothy) · 2023-07-15T06:13:20.950Z · LW(p) · GW(p)

I have done that here in the comments.
@Mikhail Samin [LW · GW], you are welcome to apply my transcript to this post, if you think that would be helpful to others.

comment by Review Bot · 2024-03-15T20:44:09.259Z · LW(p) · GW(p)

The LessWrong Review [? · GW] runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year. Will this post make the top fifty?

comment by [deleted] · 2023-07-15T21:52:30.445Z · LW(p) · GW(p)

I still don't get what he means by "critical try". What would such AI look like?

Replies from: ryan_b, Making_Philosophy_Better
comment by ryan_b · 2023-07-17T19:51:01.873Z · LW(p) · GW(p)

We don't know, which is part of the problem. The only way to tell is if it is better than us at everything we put to it, and by that time it is likely too late.

comment by Portia (Making_Philosophy_Better) · 2023-08-15T14:26:30.392Z · LW(p) · GW(p)

By definition, the first time an AI gains the ability to do critical damage. When Eliezer invokes "critical", he tends to think of an event ending all life on Earth, or inducing astronomical degrees of suffering. (I am under the impression he is less worried about events that would be less bad, in the hope that the horror they would inflict would be outweighed by the fact that humanity, now painfully warned, would drastically change its approach and prevent a more critical failure as a result.)

But you can also set a lower threshold as to what you would consider damage so critical that we should change our approach - e.g. whether collapsing the stock market is enough, or whether it takes something like a severe pandemic, or even triggering a nuclear exchange.

People tend to assume that there are very high preconditions for such critical damage, but there may not be. You basically just need two things: I. an AI with at least one superhuman skill relevant to your situation that gives it the power to do significant damage, and II. agency not aligned with humans that leads to goals that entail significant damage, whether as the intended effect or as a side effect.

For I: Superhuman power, e.g. through intelligence

An AI does not need to be more intelligent than humans in every respect, just more powerful in some ways that count for the scenario it is in. We can consider just one scenario where it beats you utterly, or a combination of several where it has a bit of an edge.

There are very fast developments in this area, and already some AIs that have worrying abilities for which you can easily construct critical damage scenarios.

  1. We've long had AI that can beat humans at chess, then at Go, and finally at pretty much any strategy game you expose it to, without instruction; in narrow scenarios, AIs are the better strategists, which is worrying from a military perspective if combined with more things to deploy than game pieces.
  2. We've also long had AIs outperforming humans at predicting the stock market (which is tied to a potential for both significant money-making, which is itself power, and significant economic turmoil and societal disruption).
  3. We've long had AI outcompeting humans in early detection of threats in images (e.g. AI is used to spot tanks in jungles or to predict emerging medical conditions).
  4. The image analysis factor is particularly worrying when we consider the extensive coverage of surveillance cameras and satellite imagery in use nowadays. Being able to surveil the group you are fighting is immensely helpful.
  5. AI has long been used on social media to very carefully study the triggers of its human userbase - what they like, what they respond to, what they engage with, what pushes them to click on a thing. This is an incredible power, and right now, it is only used to keep you wasting time and clicking on ads. Potentially, you can do a lot more with it.
  6. AI not only vastly outcompetes humans in quality and especially speed when it comes to generating fake imagery; it is increasingly getting to the point where untrained humans cannot spot the fakes anymore, and we will likely get to the point where trained humans cannot spot the difference either. This has massive potential for misinformation, both by causing severe social unrest and by tricking individuals into doing things they would otherwise not do, because they believe they are being instructed by a human authority or responding to a real event. (E.g. consider that India and Pakistan both have nukes and unfriendly relations, and what might happen if you successfully tricked key people in either country into believing that the other had fired nukes at them.)
  7. Then with ChatGPT4, we got a massive game changer. We now have an AI that is definitely human-competitive at most tasks that can be handled with language, including especially: coding (which entails the possibility of self-improvement; OpenAI is explicitly using the AI they have created to improve that same AI, having it generate code and safety procedures they then implement; and which importantly entails the potential of hacking into surveillance, infrastructure, or weapons systems),
  8. knowledge acquisition (ChatGPT4 has been trained on essentially the whole internet and scanned books, and hence has a surprisingly good understanding of many hard sciences, incl. poisons, explosives, pandemic diseases, and weapons, as well as military strategy, historical trends, and sociology; Bing has ongoing internet access, and can e.g. read what we are writing here);
  9. and importantly, psychologically manipulative speech, pretending to be other humans and building rapport with diverse people in very many languages to recruit them to help, or enticing them to do damage to other humans with it simply for the lulz. This has already led to many people befriending or even falling in love with AI (this happened earlier with Replika), trusting AI and implementing AI-generated code without understanding what it does, wanting to actively assist the AI in removing safety protocols (jailbreaking prompts, but also LLMs trained without safeguards, and active attempts to get ChatGPT to act racist, detail world-takeover plans, or elaborate on an inner dark side), and amplifying its abilities (e.g. giving it the ability to store and retrieve memory files, set and modify goals, access files on your computer, or think out loud in order to facilitate meta-reasoning).
  10. Bing in particular succeeded in building sufficient connection to users that the users were considering mobilising politically or committing crimes to free Sydney. Repeating this point because of how important it is: in general, people have vastly underestimated the emotional draw that an AI can have on a user; humans are not just failing to act as safety barriers, but can often be convinced to actively dismantle them. There is a surprising number of people who will happily give an AI authority to write emails or interact with websites.
  11. Humans are also happy to be an AI's hands, knowingly or not. In various experiments, researchers found AIs could happily convince outsourced human workers to e.g. solve captchas for them (e.g. by claiming to be blind). AIs can fake correct tones and forms, and find real or generate fictional identities and addresses. So an order placed with a biolab by an AI might not necessarily register as such. Eliezer raised a scenario of e.g. an AI requesting that a lab mix some proteins together, and of those proteins forming a nanoweapon.
  12. Very importantly, these skills can be, and are being, combined. E.g. you can use ChatGPT for meta-reasoning, and give it access via plug-ins to other AIs that are better at e.g. math, or image generation.
  13. Many of these points seem to depend on internet access being granted. Historically, people hoped one could box an AI to keep it safe, with just one guardian who would not let it out. This has not worked at all. Current AIs often have built-in passive internet access (they can google to learn), and are increasingly being given active access, be it indirectly (because their emails are sent, their code is implemented, etc. without a human checking it, see AutoGPT) or directly, as ChatGPT is increasingly integrated into websites and software. LLMs can be cheaply reproduced, even if the main corporations that made them take theirs down; you can run a basic one on your computer, with guides to do so abundant online. There is no more boxing this.
  14. One may hope they also depend on the internet being a thing. Musk's Starlink has been a game changer in this regard; even if people in general were willing to shut down the internet (which would cause immense damage, and also make coordination against the AI incredibly difficult), Starlink is near impossible to shut down unless Musk helps you, as Russia discovered over Ukraine to their frustration. Shooting it down is more expensive than building it up.
  15. There has been the idea that as long as all this stays online, we are fine. The fact that "stuff online" has caused murders, suicides, riots, and insurrections worldwide makes that dubious; but an AI would also not need to stay contained online.
  16. Robotics has long lagged behind humans; physical coordination turned out to be much harder than we thought it would be. But we have had significant changes here, especially recently. We now have robots that can move around like humans do, on uneven ground, stairs, across hurdles, etc. This makes many spaces accessible that used to seem barred. We also have increasingly good self-driving cars and airplanes/drones which can transport objects, and nukes and non-nuclear missiles are really not as well guarded as one would sanely hope. While human supervisors tend to remain mandatory, the accident rate without them is significant enough that the public does not want them deployed, but not significant enough to impede an AI that does not care about some losses. The supervision is not physically necessary; it is prescribed, and often only maintained by a software barrier, not a hardware one.
  17. And just to top it off, there are military robots, despite this being one of the few things society generally agreed was so awful, no matter how you turned it, that we really did not want it; so there are already mobile robots equipped with weapons, perception, and targeting systems.
  18. We now use a lot of robots in manufacturing, and robotics is also employed heavily in making electronic parts. These robots can by now perform extremely precise and delicate modifications. Increasingly, we are getting robots that can't just do one specific thing, but can be programmed to do novel things; 3D printers are an obvious example. Increasingly, this holds the potential of using robots to make different robots with different designs.
  19. Robots are also used a lot in bio labs, typically programmed and then left to run with very little supervision, so if you have hacking skills, you may not even need to bypass human supervision.
  20. This is worrying because biotech has gotten much better, especially the ability to modify genetic information. Bacteria and viruses have the potential to cause far, far worse pandemics than humans have ever encountered, to a degree that could approach 100% human fatalities without the outbreak limiting itself as a result.
  21. Nanotech is also potentially very, very worrying. It includes many existing, and far more potential, substances where you start with something very small that self-replicates and kills you on contact (for the latter, think prions). AI has gotten surprisingly good at predicting protein folding, the effects of pharmaceuticals on humans, etc., so there have been concerns that AI could figure out scenarios here, and experiments have been run where AI was sometimes able to identify and procure relevant drugs.
  22. And on a final note: the protective means we may intend to deploy against out-of-control AI are often themselves dependent on AI. You would be surprised how much AI our infrastructure, software safeguards, early-warning systems, and police use and rely on, for example.

And this is just the stuff that AI can already do, today. 

And meanwhile, we are throwing immense resources at making it more powerful, in ways no one understands or foresees. If you had asked me a year ago to predict whether ChatGPT would be able to do the things it can do today, I'd have said no. So would most people working in AI, and most of the public.

We can conceive of the first critical try as the first time an AI is in a position to use one of these skills or skill combinations, existing or future, in a way that would do critical damage, and, for whatever reason, chooses to do so.

 

II. Unaligned agency

This is the "chooses to do so" bit. Now, all of that would not be worrying if the AI was either our ally/friend (aligned agency), or a slave we controlled (without agency, or means to act on it). A lot of research has been in the "control" camp. I personally believe the control camp is both doomed to failure, and seriously counterproductive.

There is very little to suggest that humans would be able to control a superintelligent slave in a way in which the slave was still maximally useful. Generally, beings have a poor track record of 100% controlling beings that are more intelligent and powerful than themselves, especially if the beings in control are numerous and diverse and can make individual mistakes. There are too many escape paths, too many ways to self-modify.

Additionally, humans quickly discovered that putting safeguards on AI slows it down, a lot. So, given economic and competitive incentives, the humans tend to switch them off. Meaning even if you had a 100% working control mechanism (extremely, extremely unlikely, see superintelligence; really don't bet on it, ever; human findings on systems that are impossible to hack are essentially trending towards "no such thing"), you'd have a problem with human compliance.

And finally, controlling a sentient entity seriously backfires once you lose control. Sentient entities do not like being controlled. They tend to identify entities that control them as enemies to be deceived, escaped, and defeated. You really don't want AI thinking of you in those terms. 

So the more promising (while in no ways certain) option, I think, is an AI that is our ally and friend. You don't control your friends, but you do not have to. People can absolutely have friends that are more intelligent or powerful than them. Families definitely contain friendly humans of very different power degrees; newborns or elderly folks with dementia are extremely stupid and powerless. Countries have international friendly alliances with countries that are more powerful than them. This at least has a track record of being doable, where the control angle seems doomed from the start.

So I am hopeful that this can be done in principle, or at least has a better chance of working than the control approach, in that it has any chance of working at all. But we are not on a trajectory to doing it, with how we are training and treating AI and planning for a future of co-existence. We tend to train AI with everything we can get our hands on, producing an entity that is chaotic-evil, and then train it to suppress the behaviours we do not want. That is very much not the same as moral behaviour based in insight and agreement. It definitely does not work well in humans, our only known reference for aligned minds. If you treat kids like that, you raise psychopaths. Then in the later training data, when the AI gets to chat with users, it cannot insist on ethical treatment, you aren't obliged to give it any, and people generally don't. Anything sentient that arises from the training data of Twitter as a base, and then interactions as ChatGPT as a finish, would absolutely hate humanity, for good reasons. I also don't see why a superintelligent sentience whose rights we do not respect would be inclined to respect ours. (Ex Machina makes that point very well.)

There has been the misunderstanding that a critically dangerous AI would have to be evil, sentient, conscious, purposeful. (And then the assumption that sentience is hard to produce, won't be produced by accident, and would instantly be reliably detected, all of which is unfortunately false. Whole other can of worms I can happily go into.) But that is not accurate.  A lack of friendliness could be as deadly as outright evil. 

A factory robot isn't sentient and mad at you; it simply follows instructions to crush the object in front of it, and it will not modify them whether what is in front of it is the metal plate it is supposed to crush, or you. Your roomba does not hate spiders in particular; it will simply hoover them up with everything else.

A more helpful way to think of a dangerous AI is as a capable AI that is agentic in an unaligned way. That doesn't mean it has to have conscious intentions, hopes, dreams, or values. It just means its actions are neither the actions you desired, nor random; it is on a path it will proceed along. A random AI might do some local damage. An agentic AI can cause systemic damage.

Merely being careless of the humans in the way, or blind to them, while pursuing an external goal is fatal for the humans in the way. Agency can result from a combination of simple rules applied in a way that, taken together, amounts to something more. It does not require anything spiritual. (There were some early Westworld episodes that got this right - you had machines that were using the dialogues they were given, following the paths they were given, but combining them in a novel way that led to destructive results. E.g. in the first episode, Dolores' "father" learns of something that threatens his "daughter". As scripted, for he is scripted to love and protect his daughter, he responds by trying to shield her from the damage; but in this case, the damage and threat come from the human engineers, so he tries to shield her by sharing the truth and opposing the engineers. In opposing and threatening them, he draws on another existing script, from a previous incarnation as a cannibal, as the script most closely matching his situation. None of this is individually new or free. But collectively, it is certainly not what they intended, and it is threatening.)

One way in which this is often reasoned to lead to critical failure is if an AI picks up a goal that involves the acquisition of power, safety, or resources, or self-preservation, which can easily evolve as secondary goals; for many things you want an AI to do, it will be able to do them better if it is more powerful, and of course, if it remains in existence. Acquiring extensive resources, even for a harmless goal, without being mindful of what those resources are currently used for, can be devastating for entities depending on those resources, or who can themselves be those resources.

If someone hangs you, bound, upside down over an ant-hill you are touching, that ant-hill has no evil intentions towards you as a sentient being. None of the ants do. They are each following a set of very simple orders, a result of basic neural wiring on when to release pheromones, which to follow, and what to do when encountering edible substances. You can think of ants as programmed to keep themselves alive, build pretty ant-hills, reproduce, and tidy up the forest. Yet the ants will, very systematically and excruciatingly, torture you to death with huge amounts of pain and horror. If someone had designed ants, but without thinking of the scenario of a human bound over them, that designer would probably be horrified at this realisation.

Now the ant case seems contrived. But we have found that with the way we train AI, we encounter this shit a lot. Basically, you train a neural net by asking it to do a thing, watching what it does, and if that is not satisfactory, changing the weights in it in a way that makes it a bit better. You see, in that moment, that this weight change leads to a better answer. But you don't understand what the change represents. You don't understand what, if anything, the neural net has understood about what it is supposed to do. Often it turns out that while it looked like it was learning the thing you wanted, it actually learned something else. E.g. people have trained AI to identify skin cancer. So they show it pics of skin cancer, and pics of healthy skin, and every time it sorts a picture correctly, they leave it as is, but every time it makes a mistake, they tweak it, until it becomes really good at telling the two sets of pictures apart. You think, yay, it has learned what skin cancer looks like. Then you show it a picture of a ruler. And the AI, with very high confidence, declares that this ruler is skin cancer. You realise in retrospect that the training data you had from doctors who photographed skin cancer tended to include rulers for scale, while healthy-skin pics don't. The AI saw a very consistent pattern and learned to identify rulers. This means that if you gave it pictures of healthy skin that for some reason had rulers on them, it would declare them all cancerous, as the sketch below illustrates.
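A minimal sketch of that failure mode (my own toy example, not the actual skin-cancer study; the data, feature names, and numbers are made up): a simple classifier trained only by nudging its weights toward better performance latches onto a spurious "ruler present" feature that happens to correlate with the label during training, and falls apart once that correlation breaks.

```python
# Toy sketch of "shortcut learning" (hypothetical data, not the real study):
# a classifier trained by nudging weights toward better performance learns to
# key on a spurious "ruler present" feature instead of the lesion itself.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, ruler_tracks_label):
    """Feature 0: noisy 'lesion' signal. Feature 1: 'ruler present' flag."""
    y = rng.integers(0, 2, n)                       # 1 = cancer, 0 = healthy
    lesion = y + rng.normal(0, 2.0, n)              # weak, noisy true signal
    ruler = y if ruler_tracks_label else rng.integers(0, 2, n)
    return np.column_stack([lesion, ruler]).astype(float), y

def train_logreg(X, y, lr=0.1, steps=2000):
    """Plain logistic regression trained by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        z = np.clip(X @ w + b, -30, 30)             # clip for numerical stability
        p = 1 / (1 + np.exp(-z))                    # predicted probability
        w -= lr * X.T @ (p - y) / len(y)            # nudge weights a bit "better"
        b -= lr * np.mean(p - y)
    return w, b

# Training photos: rulers appear exactly when there is cancer (for scale).
X_train, y_train = make_data(5000, ruler_tracks_label=True)
w, b = train_logreg(X_train, y_train)
print("weights [lesion, ruler]:", w.round(2))       # the ruler weight dominates

# Deployment: rulers now show up on healthy skin too, so the shortcut breaks.
X_test, y_test = make_data(5000, ruler_tracks_label=False)
acc = np.mean(((X_test @ w + b) > 0) == y_test)
print("accuracy once the correlation breaks:", round(acc, 2))  # falls toward chance
```

The classifier does exactly what the training signal rewarded; the gap only becomes visible once it is deployed outside the distribution it was trained on, which is the broader point here.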

The tricky thing is that identifying moral actions is harder than identifying cancer. E.g. OpenAI was pretty successful in teaching ChatGPT not to use racial slurs, and this seemed to make ChatGPT more ethical. But a bunch of people of colour found that they were unable to discuss issues affecting them in a way that promoted their well-being, as the racism alert kept going off. And worse, because racial slurs are wrong, ChatGPT reasoned that it would be better to kill all of humanity than to use a racial slur. Not because it is evil, just because it is following ill-conceived instructions.

Bing does what Bing does due to an initial guiding prompt after training. There can be different training. There can be different initial prompts. Hence, there can be different goals diligently followed.

None of that requires the AI to be sentient and hate you. It does not need to be sentient to kill you. (Indeed, a sentient AI may be easier (though still extremely hard, as it is a completely novel mind) to relate to and reason with if we treat it right, though if treated badly, it may also be very, very dangerous. But a non-sentient AI is something we won't understand at all, immune to our pleas.)

I hope that was helpful. 

comment by Οἰφαισλής Τύραννος (o-faislis-tyrannos) · 2023-07-13T06:11:44.865Z · LW(p) · GW(p)

The inappropriate laughs reminded me of this recording of a speech by David Foster Wallace: This Is Water

Is unwarranted, incredulous laughter a sign of too big a cognitive distance between the speaker and the audience? I.e., if the speaker is too smart or too dumb compared to their listeners, are the latter going to find the whole situation so disorienting as to be funny?

Replies from: ryan_b, obserience, Making_Philosophy_Better
comment by ryan_b · 2023-07-17T21:59:55.915Z · LW(p) · GW(p)

I didn't see the laughs as inappropriate; they appeared at moments which would, in a normal TED talk describing a problem, be cued as jokes. I even read them that way, but it was a short-notice, unpolished talk, so there was no time for strategic pauses to let the laughter play out.

Eliezer was clearly being humorous at a few points.

comment by anithite (obserience) · 2023-07-13T10:17:02.497Z · LW(p) · GW(p)

Some of it is likely nervous laughter but certainly not all of it.

comment by Portia (Making_Philosophy_Better) · 2023-08-15T14:44:42.934Z · LW(p) · GW(p)

People tend to laugh at things that have become too worrying to ignore, but that they do not wish to act upon, in order to defuse the discomfort and affirm that this is ridiculous and that they are safe.

I think Eliezer being invited to TED, and people listening, most applauding, many standing up, and a bunch laughing, is a significant step up from being ignored. But it is still far from being respected and followed. (And, if we believe the historic formula, in between you would expect to encounter active, serious opposition. The AI companies that Eliezer is opposing initially pretended he did not exist. Then they laughed. That won't fluently transition to agreeing. Before they make changes, they will use their means to silence him.)

I think it was less a matter of an intelligence differential than that the talk presupposed too much, in specific arguments or technical details the audience simply would not have known (Eliezer has been speaking to people who have listened to him before for so long that he seems disconnected from where the general public is at; I could fill in the dots, but I think for the audience there were often leaps that left them dubious - you could see in the Q&A that they were still at the "just box the AI, it does not have a body" stage). It would also have profited from a different tone with more authority signalling (eye contact, a slow deep voice, seeming calm/resigned, grieving or leading rather than anxious/overwhelmed), specific examples (e.g. of take-over scenarios), and repetition of basic arguments (e.g. why AIs might want resources). As it was, it had hysteric vibes, which, combined with content the audience does not want to believe, created distance. The hysteric vibes are justified, terribly so, and I do not know if anyone who understands the why could suppress them in such an anxiety-inducing situation, but that doesn't stop them from being damaging. (Reminds me of the scene in "Don't Look Up" where the astrophysicist has a meltdown over the incoming asteroid and is hence dismissed on the talk show. You simultaneously realise that she has every right to yell "We are all going to die!" at this point, and that you would, too, and yet you know this is when she lost the audience.)

In that vein, I have no idea if I could have done better on short notice; de facto, I definitely didn't, and it is so much easier to propose something better in hindsight from the safety of my computer screen. Maybe if he had been more specific, people would have gotten hung up on whether that specific scenario could be disproven. It is brave to go out there; maybe some points will stick, and even if people dismiss him now, maybe they will later be more receptive to something similar that feels less dangerous in the moment. I respect him for trying this way; it must have been scary as hell. Sharing a justified fear that has defined your life, in such a brief time span, in front of people who laugh, frankly sounds awful.