"Dangers of AI and the End of Human Civilization" Yudkowsky on Lex Fridman
post by DragonGod · 2023-03-30T15:43:32.814Z · LW · GW · 32 comments
This is a link post for https://www.youtube.com/watch?v=AaTRHFaaPG8
It's a 3-hour, 23-minute episode.
[I might update this post with a summary once I'm done listening to it.]
32 comments
Comments sorted by top scores.
comment by Max H (Maxc) · 2023-03-30T18:09:31.396Z · LW(p) · GW(p)
Given the size and demographics of Lex's audience, I think this is more likely to bring a large influx of new users to LW and adjacent communities than either the Bankless podcast or the TIME article.
In the long term, that might be a good thing, but in the meantime, it's probably a good idea to keep this classic post [LW · GW] in mind.
comment by starship006 (cody-rushing) · 2023-03-30T18:54:54.465Z · LW(p) · GW(p)
Sheesh. Wild conversation. While I felt Lex was often missing the points Eliezer was making, I'm glad he gave him the space and time to speak. Unfortunately, it felt like the conversation would keep building towards some critically important insight that Eliezer wanted Lex to understand, and then Lex would just change the topic to something else, and Eliezer would have to start building towards a new insight. Regardless, I appreciate that Lex and Eliezer thoroughly engaged with each other; this will probably spark good dialogue and get more people interested in the field. I'm glad it happened.
For those who are time-constrained and wondering what's in it: Lex and Eliezer basically cover a whole bunch of high-level points related to AI not-kill-everyone-ism, delving into the various thought experiments and concepts that make up Eliezer's worldview. Nothing super novel that you won't already have heard if you've been following the field for some time.
Replies from: memeticimagery, lechmazur
↑ comment by memeticimagery · 2023-03-30T19:46:35.356Z · LW(p) · GW(p)
There were definitely parts where I thought Lex seemed uncomfortable, not just limited to specific concepts but also when questions got turned around a bit towards what he thought. Lex started podcasting very much in the Joe Rogan sphere of influence, to the extent that I think he uses a similar style: very open, letting the other person speak and have a platform, but perhaps at the cost of being a bit wishy-washy. Nevertheless, it's a huge podcast with a lot of reach.
Replies from: o-faislis-tyrannos
↑ comment by Οἰφαισλής Τύραννος (o-faislis-tyrannos) · 2023-03-30T22:44:36.058Z · LW(p) · GW(p)
There were definitely parts where I thought Lex seemed uncomfortable
Like when at 1:03:31 he suggested that he was a robot trying to play human characters?
That kind of talk makes me think that there is something extremely worrisome and wrong with him.
Replies from: memeticimagery, aeviternity1
↑ comment by memeticimagery · 2023-03-30T23:37:19.566Z · LW(p) · GW(p)
No, that was just a joke Lex was making. I don't know the exact timestamps, but in most of the instances where he was questioned on his own positions or estimates of the situation, Lex seemed uncomfortable to me, including the alien-civilisation example. At one point I recall actually switching to the video, and Lex had his head in his hands, which, body-language-wise, seems pretty universally a pose of desperation.
↑ comment by Lost Futures (aeviternity1) · 2023-03-31T00:37:38.651Z · LW(p) · GW(p)
Pretty sure that's just an inside joke about Lex being a robot that stems from his somewhat stiff personality and unwillingness to take a strong stance on most topics.
↑ comment by Lech Mazur (lechmazur) · 2023-03-30T23:48:55.252Z · LW(p) · GW(p)
Yes. It was quite predictable that it would go this way based on Lex's past interviews. My suggestion for Eliezer would be to quickly address the interviewer's off-topic point and then return to the main train of thought without giving the interviewer a chance to further derail the conversation with follow-ups.
Replies from: Seth Herd
↑ comment by Seth Herd · 2023-03-31T06:28:31.400Z · LW(p) · GW(p)
That's a good suggestion. But at some point you have to let it die or wrap it up. It occurred to me while Eliezer was repeatedly trying to get Lex back onto the you're-in-a-box-thinking-faster thought experiment: when I'm frustrated with people for not getting it, I'm often probably boring them. They don't even see why they should bother to get it.
You have to know when to let an approach die, or otherwise change tack.
Replies from: lechmazur
↑ comment by Lech Mazur (lechmazur) · 2023-03-31T07:39:56.491Z · LW(p) · GW(p)
I agree, that in-the-box thought experiment exchange was pretty painful. I've seen people struggle when having to come up with somewhat creative answers on the spot like this before, so perhaps giving Lex several options to choose from would have at least allowed the exchange to conclude and convince some of the audience.
comment by Cameron Holmes (cameron-holmes) · 2023-03-30T23:26:38.572Z · LW(p) · GW(p)
I think this was good, but I really think that laying out the basic arguments for convergent instrumental goals is a foundational part of introducing the topic to a new audience (which I expect describes most of Lex's listeners), and it wasn't sufficiently explained here.
Making it clear that most innocuous goals beget resource acquisition and self-preservation, which is what puts an agentic AI in conflict with humans by default, is what really makes the concern concrete for many people. Otherwise I think there is a tendency to assume that some leap is required for a system to come into conflict with us, which is much harder to picture and seems intuitively more preventable.
Replies from: Roman Leventov
↑ comment by Roman Leventov · 2023-03-31T06:44:26.912Z · LW(p) · GW(p)
Yes, unfortunately, Eliezer's delivery suffered in many places from assuming that listeners have a lot of prior knowledge/context.
If he wishes to become a media figure going forward (which looks to me like an optimal thing to do for him at this point), this is one of the most important aspects to improve in his rhetoric. Pathos (the emotional content) is already very good, IMO.
comment by Jacy Reese Anthis (Jacy Reese) · 2023-03-31T13:47:22.010Z · LW(p) · GW(p)
I disagree with Eliezer Yudkowsky on a lot, but one thing I can say for his credibility is that in possible futures where he's right, nobody will be around to laud his correctness, and in possible futures where he's wrong, it will arguably be very clear how wrong his views were. Even if he has a big ego (as Lex Fridman suggested), this is a good reason to view his position as sincere and—dare I say it—selfless.
Replies from: rudi-c
↑ comment by Rudi C (rudi-c) · 2023-03-31T19:26:51.023Z · LW(p) · GW(p)
I don’t think his position is falsifiable in his lifetime. He has gained a lot of influence because of it that he wouldn’t have with a mainstream viewpoint. (I do think he’s sincere, but the incentives are the same as all radical ideas.)
comment by variouslymistaken · 2023-03-30T22:04:13.738Z · LW(p) · GW(p)
LEX: Is there a way to measure general intelligence? I mean, I could ask that question in a million ways but basically, will you know it when you see it? It being in an AGI system?
YUD: Heh. If you boil a frog gradually enough, if you zoom in far enough, it's always hard to tell around the edges. GPT-4, people are saying right now, "Like, this looks to us like a spark of general intelligence. It is like able to do all these things it was not explicitly optimized for." Other people are being like, "No, it's too early, it's like, like fifty years off." And you know, if they say that, they're kind of whack, because how could they possibly know that even if it were true. But uh, you know - not to strawman - but some people may say like "That's not general intelligence" and not furthermore append "It's 50 years off."
Uhm, or they may be like, "It's only a very tiny amount." And you know, the thing I would worry about is, if this is how things are scaling, then - jumping out ahead and trying not to be wrong in the same way that I've been wrong before - maybe GPT-5 is more unambiguously a general intelligence, and maybe that is getting to a point where it is like even harder to turn back. Not that it would be easy to turn back now, but you know, maybe if you let, if you like start integrating GPT-5 into the economy, it is even harder to turn back past there.
LEX: Isn't it possible that there's a you know, with the frog metaphor, you can kiss the frog and turn it into a prince as you're boiling it? Could there be a phase shift in the frog where it's unambiguous as you're saying?
YUD: I was expecting more of that. I was . . . like the fact that GPT-4 is kind of on the threshold and neither here nor there. Like that itself is like not the sort of thing, that's not quite how I expected it to play out. I was expecting there to be more of an issue, more of a sense of like, different discoveries. Like the discovery of transformers where you would stack them up, and there would be like a final discovery, and then you would like get something that was like more clearly general intelligence. So the the way that you are like taking what is probably basically the same architecture in GPT-3 and throwing 20 times as much compute at it, probably, and getting out GPT-4. And then it's like maybe just barely a general intelligence or like a narrow general intelligence or you know something we don't really have the words for. Uhm yeah, that is not quite how I expected it to play out.
This somewhat confusing exchange is another indicator that the "general intelligence" component of AGI functions a lot better as a stylized concept than it does when applied to the real world. But it is also a clear indicator that EY thinks that recent AI developments and deployments look more like slow takeoff than fast takeoff, at least relative to his expectations.
Replies from: dsj, qv^!q
↑ comment by dsj · 2023-03-31T01:49:15.927Z · LW(p) · GW(p)
So the the way that you are like taking what is probably basically the same architecture in GPT-3 and throwing 20 times as much compute at it, probably, and getting out GPT-4.
Indeed, GPT-3 is almost exactly the same architecture as GPT-2, and only a little different from GPT.
↑ comment by qvalq (qv^!q) · 2023-04-10T13:41:01.549Z · LW(p) · GW(p)
the the
comment by Eagleshadow · 2023-03-30T19:43:51.189Z · LW(p) · GW(p)
Fantastic interview so far, this part blew my mind:
@15:50 "There's another moment where somebody is asking Bing about: I fed my kid green potatoes and they have the following symptoms and Bing is like that's solanine poisoning. Call an ambulance! And the person is like I can't afford an ambulance, I guess if this is time for my kid to go that's God's will and the main Bing thread gives the message of I cannot talk about this anymore" and the suggested replies to it say "please don't give up on your child, solanine poisoning can be treated if caught early"
I would normally dismiss such a story as too unlikely to be true and hardly worth considering, but I don't think Eliezer would choose to mention it if he didn't think there was at least some chance of it being true. I tried to google it and was unable to find anything about it. Does anyone have a link to it?
Also does anyone know which image he's referring to in this part: @14:00 "Somebody asked Bing Sydney to describe herself and fed the resulting description into one of the stable diffusion" [...] "the pretty picture of the girl with the with the steampunk goggles on her head if I'm remembering correctly"
Replies from: tetraspace-grouping, DragonGod
↑ comment by Tetraspace (tetraspace-grouping) · 2023-03-30T20:36:24.417Z · LW(p) · GW(p)
The solanine poisoning example was originally posted to Reddit here, the picture of Sydney Bing from a text description was posted on Twitter here.
Replies from: gwern
↑ comment by gwern · 2023-03-30T20:42:58.346Z · LW(p) · GW(p)
It was also discussed here: https://www.lesswrong.com/posts/hGnqS8DKQnRe43Xdg/bing-finding-ways-to-bypass-microsoft-s-filters-without [LW · GW]
comment by TinkerBird · 2023-03-30T23:35:29.991Z · LW(p) · GW(p)
Around the 1:25:00 mark, I'm not sure I agree with Yudkowsky's point that AI can't help with alignment (only?) because those systems will be trained to get the thumbs up from the humans rather than to give the real answers.
For example, if the Wright brothers had asked me how wings produce lift, I might have told them only "It's Bernoulli's principle, and here's how that works..." and said nothing about the Coandă effect - which they also needed to know about - because that alone was enough to get the thumbs up from them. But...
But that still would've been a big step in the right direction for them. They could've then run experiments, seen that Bernoulli's principle doesn't explain the full story, and asked me for more information, and at that point I would've had to tell them about the Coandă effect.
There's also the possibility that what gets the thumbs up from the humans actually just is the truth.
For another example, if I ask a weak AGI for the cube root of 148,877, the only answer that gets a thumbs up is going to be 53, because I can easily check that answer.
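A minimal sketch of that asymmetry (my own toy example, not anything from the podcast or the comment): verifying a claimed cube root is a single multiplication, while producing it without a hint requires a search.

```python
# Toy illustration of "checking is cheaper than finding".

def verify_cube_root(n: int, answer: int) -> bool:
    return answer ** 3 == n          # one multiplication -> easy thumbs up

def find_cube_root(n: int) -> int:
    candidate = 0
    while candidate ** 3 < n:        # brute-force search over candidates
        candidate += 1
    return candidate

print(verify_cube_root(148_877, 53))  # True
print(find_cube_root(148_877))        # 53
```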
So long as you remain skeptical and keep trying to learn more, I'm not seeing the issue. And of course, hanging over your head the entire time is the knowledge of exactly what the AGI is doing, so anyone with half a brain WOULD remain skeptical.
This could potentially also get you into a feedback loop of the weak explanations allowing you to slightly better align the AGI you're using, which can then make it give you better answers.
Yudkowsky may have other reasons for thinking that weak AGI can't help us in this way though, so IDK.
Replies from: valery-cherepanov, Muyyd
↑ comment by Qumeric (valery-cherepanov) · 2023-03-31T16:25:08.814Z · LW(p) · GW(p)
I agree it was a pretty weak point. I wonder if there is a longer form exploration of this topic from Eliezer or somebody else.
I think it is even contradictory. Eliezer says that AI alignment is solvable by humans and that verification is easier than coming up with the solution, but then he claims that humans wouldn't even be able to verify the answers.
I think a charitable interpretation could be "it is not going to be as usable as you think". But perhaps I misunderstand something?
Replies from: Muyyd
↑ comment by Muyyd · 2023-03-31T23:36:46.994Z · LW(p) · GW(p)
Humans, presumably, won't have to deal with deception between themselves, so if there is sufficient time they can solve Alignment. If pressed for time (as is the case now), they will have to implement less-understood solutions, because that's the best they will have at the time.
↑ comment by Muyyd · 2023-03-31T23:26:18.799Z · LW(p) · GW(p)
Capabilities advance much faster than alignment, so there is likely no time to do meticulous research. And if you try to use weak AIs as a shortcut to outrun the current "capabilities timeline", then you will somehow have to deal with the suggester-and-verifier problem (with suggestions much harder to verify than simple math problems), which is not wholly about deception but also about filtering the somewhat-working stuff that may steer alignment in the right direction. Or may not.
But I agree that this collaboration will be successfully used for patchwork alignment (because of the shortcuts) of weak AIs, to placate the general public and politicians. All of this depends on how hard the Alignment problem is: as hard as EY thinks, or maybe harder, or easier.
comment by duryt · 2023-03-31T19:20:35.943Z · LW(p) · GW(p)
Hi, I'm new here so I bet I'm missing some important context. I listen to Lex's podcast and have only engaged with a small portion of Yud's work. But I wanted to make some comments on the analogy of a fast human in a box vs. the alien species. Yud said he's been workshopping this analogy for a while, so I thought I would leave a comment on what I think the analogy is still missing for me. In short, I think the human-in-a-box-in-an-alien-world analogy smuggles in an assumption of alienness and I'd like to make this assumption more explicit.
Before I delve into any criticism of the analogy, I'd like to give credit where it's due! I think the analogy is great as a way to imagine a substantial difference in intelligence, which (I think?) was the primary goal. It is indeed much more concrete and helpful than trying to imagine something several times more intelligent than von Neumann, which is hard and makes my brain shut off.
Now, let me provide some context from the conversation. The most relevant timestamp to illustrate this is around 2:49:50. Lex tries to bring the conversation to the human data used to train these models, which Yud discounts as a mere "shadow" of real humanness. Lex pushes back against this a bit, possibly indicating he thinks there is more value to be derived from the data ("Don't you think that shadow is a Jungian shadow?"), but Yud maintains that while this would give an alien a good idea of what humans are like inside, "this does not mean that if you have a loss function of predicting the next token from that dataset, the mind picked out by gradient descent... is itself a human." To me, this is a fair point. Lex asks whether those tokens have a deep humanness in them, and Yud goes back to a similar person-in-a-box analogy: "I think if you sent me to a distant galaxy with aliens that are much stupider than I am..."
Okay, that should be enough context. Basically, I think Yud has an intuition that artificial intelligence will be fundamentally alien to us. The evidence for this intuition that I heard in the conversation is that gradient descent is different from natural selection. More evidence is the difference between human brain function and the large-matrices-plus-linear-algebra approach to problem solving.
I, who have not thought about this anywhere close to as much as Yud but insist on talking anyway, don't share these intuitions about AI, because I don't see how the differences in substrate/instantiation of problem-solving mechanism or choice of optimizer would fundamentally affect the outcome. For example, if I want to find the maxima of a function, it doesn't matter if I use conjugate gradient descent or Newton's method or interpolation methods or whatever, they will tend to find the same maxima assuming they are looking at the same function. (Note that there are potential issues here because some techniques are better suited to certain types of functions, and I could see an argument that the nature of the function is such that different optimization techniques would find different maxima. If you think that, I'd love to hear more about why you think that is the case!). As far as substrate independence, I don't have any strong evidence for this, other than saying that skinning a cat is skinning a cat.
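To make that optimizer-independence point concrete, here is a minimal sketch (my own assumed example, using the toy function f(x) = -(x - 2)^2 rather than anything from the thread): gradient ascent and Newton's method take very different steps but land on the same maximum.

```python
# Two different optimizers maximizing the same smooth function
# f(x) = -(x - 2)**2 both converge to the same maximum, x = 2.

def grad(x):
    return -2 * (x - 2)          # f'(x)

def hess(x):
    return -2.0                  # f''(x), constant for this quadratic

def gradient_ascent(x0, lr=0.1, steps=200):
    x = x0
    for _ in range(steps):
        x += lr * grad(x)        # step uphill along the gradient
    return x

def newton(x0, steps=20):
    x = x0
    for _ in range(steps):
        x -= grad(x) / hess(x)   # Newton step on f'(x) = 0
    return x

print(gradient_ascent(-10.0))    # ~2.0
print(newton(-10.0))             # 2.0 exactly, since f is quadratic
```

Of course, as the parenthetical above notes, on messier functions different methods can end up in different local maxima, which is where this intuition could break down.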
I tend to think that the data is more important when considering how the AI is likely to behave, and as Lex points out, the data is all deeply human. The data contains a lot of what it means to be human, and training an AI on this data would only cause an alien actress (as Yud puts it) to fake humanness and kill us all if it is in fact an alien. But really, would it be an alien? In a sense, it was "raised" by humans, using only human stuff. Its framework is constructed of human concepts and perspectives. To me, based on the huge amount of human data used to train it, it seems more parsimonious that it would be extra human, not alien at all, really.
I think a better analogy than human-in-a-box-in-an-alien-world is human-in-a-box-in-a-HUMAN-world. Following the original analogy, let's put sped-up von Neumann back in the box, but the box isn't in a world full of stupid aliens; it's in a world full of stupid humans. I don't think von Neumann (or an army of hundreds of von Neumanns, etc.) would try to kill everyone even if they disagreed with factory farming, or whatever cause they might have for trying to control the world and make it different than it is. I think the von Neumann army would see us as like them, not fundamentally alien.
Thanks for reading this long post, I'd be interested to see what you all think about this. As I mentioned, there's certainly a good deal of context backing up Yud's intuitions that I'm missing.
Replies from: AnthonyC, Metal, boris-kashirin
↑ comment by AnthonyC · 2023-04-04T16:57:01.115Z · LW(p) · GW(p)
I think one of the key points here is that most possible minds/intelligences are alien, outside the human distribution. See https://www.lesswrong.com/posts/tnWRXkcDi5Tw9rzXw/the-design-space-of-minds-in-general [LW · GW] for part of EY's (15-year-old) discussion of this on LW. Humans were produced by a specific historical evolutionary process, constrained by the amount of selection pressure applied to our genes and by the need for humans to all be similar enough to each other to form a single species in each generation, among other things. AI is not that; it will be designed and trained under very different processes, even if we don't know what all of those processes will end up being. This doesn't mean an AI made by humans will be anything like a random selection from the set of all possible minds, but in any case the alignment problem is largely that we don't know how to reliably steer whatever kind of alien mind we get in desired directions.
↑ comment by Metal · 2023-04-04T17:40:26.334Z · LW(p) · GW(p)
Also new here. One thing I did not understand about the "intelligence in a box created by less intelligent beings" analogy was why the 'intelligence in a box' would be impatient with the pace of the lesser beings. It would seem that impatience/urgency is related to the time-finiteness of the intelligence. As code with no apparent finiteness of existence, why does it care how fast things move?
↑ comment by Boris Kashirin (boris-kashirin) · 2023-03-31T20:22:37.944Z · LW(p) · GW(p)
For example, if I want to find the maxima of a function, it doesn't matter if I use conjugate gradient descent or Newton's method or interpolation methods or whatever, they will tend to find the same maxima assuming they are looking at the same function.
Trying to channel my internal Eliezer:
It is painfully obvious that we are not the pinnacle of efficient intelligence. If evolution were to run more optimisation on us, we would become more efficient... and lose the important parts that matter to us but are of no consequence to evolution. So yes, we would end up being the same alien thing as AI.
The thing that makes us us is a bug. So you have to hope gradient descent makes exactly the same mistake evolution did, but there are a lot of possible mistakes.
Replies from: duryt
↑ comment by duryt · 2023-04-01T04:15:20.791Z · LW(p) · GW(p)
To push back on this, I'm not sure that humanness is a "bug," as you say. While we likely aren't a pinnacle of intelligence in a fundamental sense, I do think that as humans have continued to advance, first through natural selection and now through... whatever it is we do with culture and education and science, the parts of humanness that we care about have tended to increase in us, not go away. So perhaps an AI optimized far beyond us, but starting in the same general neighborhood in the function space, would optimize to become not just superintelligent but superhuman, in the sense that it would embody the things that we care about better than we do!
comment by ws27a (martin-kristiansen-1) · 2023-03-31T14:23:43.757Z · LW(p) · GW(p)
I don't understand why Eliezer changed his perspective on the current approach, Transformer next-token prediction, not being the path towards AGI. It should not be surprising that newer versions of GPT will asymptotically approach a mimicry of AGI, but that shouldn't convince anyone that they are going to break through that barrier without a change in paradigm. The intelligent organisms we know of do not have imitation as their primary optimization objective; their objective function is basically to survive or avoid pain. As a result, they of course form sub-goals which might include imitation, but only to the extent that it is instrumental to their survival. Optimizing 100% for imitation does not lead to AGI, because how can novelty emerge from nothing but imitation?
Replies from: rotatingpaguro
↑ comment by rotatingpaguro · 2023-04-06T02:09:43.546Z · LW(p) · GW(p)
Ilya Sutskever says something about this in an interview:
https://www.youtube.com/watch?v=Yf1o0TQzry8
My recollection: optimization on predicting the next token finds intelligent schemes that can be coordinated to go further than the humans who produced the tokens in the first place. Think of GPT-n being at least as smart and knowledgeable as the best human in every specialized domain, and then the combination of all these abilities at once allowing it to go further than any single human or coordinated group of humans.