Yudkowsky on AGI risk on the Bankless podcast

post by Rob Bensinger (RobbBB) · 2023-03-13T00:42:22.694Z · LW · GW · 5 comments


comment by Rob Bensinger (RobbBB) · 2023-03-13T02:17:33.652Z · LW(p) · GW(p)

Gratitude to Andrea_Miotti, remember, and vonk for posting more-timely transcripts of this so LW could talk about it at the time -- and for providing a v1 transcript to give me a head start.

Here's a small sample of the edits I made to the previous Bankless transcript [LW · GW] on LW, focusing on ones where someone may have come away from the original transcript with a wrong interpretation or important missing information (as opposed to, e.g., the sentences that are just very hard to parse in the original transcript because too many filler words and false starts to sentences were left in):

  • Predictions are hard, especially about the future. I sure hope that this is where it saturates. This is like the next generation. It goes only this far, it goes no further
    • Predictions are hard, especially about the future. I sure hope that this is where it saturates — this or the next generation, it goes only this far, it goes no further
  • the large language model technologies, basic vulnerabilities, that's not reliable.
    • the large language model technologies’ basic vulnerability is that it’s not reliable
  • So you're saying this is super intelligence, we'd have to imagine something that knows all of the chess moves in advance. But here we're not talking about chess, we're talking about everything.
    • So you're saying [if something is a] superintelligence, we'd have to imagine something that knows all of the chess moves in advance. But here we're not talking about chess, we're talking about everything.
  • Ryan: The dumb way to ask that question too is like, Eliezer, why do you think that the AI automatically hates us? Why is it going to- It doesn't hate you. Why does it want to kill us all?
    • Ryan: The dumb way to ask that question too is like, Eliezer, why do you think that the AI automatically hates us? Why is it going to—

      Eliezer: It doesn't hate you.

      Ryan: Why does it want to kill us all?
  • That's an irreducible source of uncertainty with respect to superintelligence or anything that's smarter than you. If you could predict exactly what it would do, it'd be that smart. Yourself, it doesn't mean you can predict no facts about it.
    • That's an irreducible source of uncertainty with respect to superintelligence or anything that's smarter than you. If you could predict exactly what it would do, you'd be that smart yourself. It doesn't mean you can predict no facts about it.
  • Eliezer: I mean, I could say something like shut down all the large GPU clusters. How long do I have God mode? Do I get to like stick around?
    • Eliezer: I mean, I could say something like shut down all the large GPU clusters. How long do I have God mode? Do I get to like stick around for seventy years?
  • Ryan: And do you think that's what happens? Yeah, it doesn't help with that. We would see evidence of AIs, wouldn't we?

    Ryan:  Yeah. Yes. So why don't we?
    • Ryan: And do you think that's what happens? Yeah, it doesn't help with that. We would see evidence of AIs, wouldn't we?

      Eliezer: Yeah.

      Ryan: Yes. So why don't we?
  • It's surprising if the thing that you're wrong about causes the rocket to go twice as high on half the fuel you thought was required and be much easier to steer than you were afraid of. The analogy I usually use for this is, very early on in the Manhattan Project, they were worried about what if the nuclear weapons can ignite fusion in the nitrogen in the atmosphere. 
    • It's surprising if the thing that you're wrong about causes the rocket to go twice as high on half the fuel you thought was required and be much easier to steer than you were afraid of.

      Ryan: So, are you...

      David: Where the alternative was, “If you’re wrong about something, the rocket blows up.”

      Eliezer: Yeah. And then the rocket ignites the atmosphere, is the problem there.

      Or rather: a bunch of rockets blow up, a bunch of rockets go places... The analogy I usually use for this is, very early on in the Manhattan Project, they were worried about “What if the nuclear weapons can ignite fusion in the nitrogen in the atmosphere?”
  • But you're saying if we do that too much, all of a sudden the system will ignite the whole entire sky, and then we will all know.

    Eliezer: You can run chatGPT any number of times without igniting the atmosphere.
    • But you're saying if we do that too much, all of a sudden the system will ignite the whole entire sky, and then we will all...

      Eliezer: Well, no. You can run ChatGPT any number of times without igniting the atmosphere.
  • I mean, we have so far not destroyed the world with nuclear weapons, and we've had them since the 1940s. Yeah, this is harder than nuclear weapons. Why is this harder?
    • I mean, we have so far not destroyed the world with nuclear weapons, and we've had them since the 1940s.

      Eliezer: Yeah, this is harder than nuclear weapons. This is a lot harder than nuclear weapons.

      Ryan: Why is this harder?
  • And there's all kinds of, like, fake security. It's got a password file. This system is secure. It only lets you in if you type a password.
    • And there's all kinds of, like, fake security. “It's got a password file! This system is secure! It only lets you in if you type a password!”
  • And if you never go up against a really smart attacker, if you never go far to distribution against a powerful optimization process looking for holes,
    • And if you never go up against a really smart attacker, if you never go far out of distribution against a powerful optimization process looking for holes,
  • Do they do, are we installing UVC lights in public, in, in public spaces or in ventilation systems to prevent the next respiratory born pandemic respiratory pandemic? It is, you know, we, we, we, we lost a million people and we sure did not learn very much as far as I can tell for next time. We could have an AI disaster that kills a hundred thousand people. How do you even do that? Robotic cars crashing into each other, have a bunch of robotic cars crashing into each other.
    • Are we installing UV-C lights in public spaces or in ventilation systems to prevent the next respiratory pandemic? You know, we lost a million people and we sure did not learn very much as far as I can tell for next time.

      We could have an AI disaster that kills a hundred thousand people—how do you even do that? Robotic cars crashing into each other? Have a bunch of robotic cars crashing into each other! It's not going to look like that was the fault of artificial general intelligence because they're not going to put AGIs in charge of cars.
  • Guern
    • Gwern
  • When I dive back into the pool, I don't know, maybe I will go off to conjecture or anthropic or one of the smaller concerns like Redwood Research, being the only ones I really trust at this point, but they're tiny, and try to figure out if I can see anything clever to do with the giant inscrutable matrices of floating point numbers.
    • When I dive back into the pool, I don't know, maybe I will go off to Conjecture or Anthropic or one of the smaller concerns like Redwood Research—Redwood Research being the only ones I really trust at this point, but they're tiny—and try to figure out if I can see anything clever to do with the giant inscrutable matrices of floating point numbers.
  • We have people in crypto who are good at breaking things, and they're the reason why anything is not on fire. Some of them might go into breaking AI systems instead because that's where you learn anything. Any fool can build a crypto system that they think will work. Breaking existing crypto systems, cryptographical systems is how we learn who the real experts are.
    • We have people in crypto[graphy] who are good at breaking things, and they're the reason why anything is not on fire. Some of them might go into breaking AI systems instead, because that's where you learn anything.

      You know: Any fool can build a crypto[graphy] system that they think will work. Breaking existing cryptographical systems is how we learn who the real experts are.
  • And who else disagrees with me? I'm sure Robin Hanson would be happy to come up. Well, I'm not sure he'd be happy to come on this podcast, but Robin Hanson disagrees with me, and I feel like the famous argument we had back in the early 2010s, late 2000s about how this would all play out. I basically feel like this was the Yudkowsky position, this is the Hanson position, and then reality was over here, well to the Yudkowsky side of the Yudkowsky position in the Yudkowsky-Hanson debate.
    • Who else disagrees with me? I'm sure Robin Hanson would be happy to come on... well, I'm not sure he'd be happy to come on this podcast, but Robin Hanson disagrees with me, and I kind of feel like the famous argument we had [? · GW] back in the early 2010s, late 2000s about how this would all play out—I basically feel like this was the Yudkowsky position, this is the Hanson position, and then reality was over here, well to the Yudkowsky side of the Yudkowsky position in the Yudkowsky-Hanson debate.
  • But Robin Hanson does not feel that way. I would probably be happy to expound on that at length.
    • But Robin Hanson does not feel that way, and would probably be happy to expound on that at length. 
  • Open sourcing all the demon summoning circles is not the correct solution. I'm not even using, and I'm using Elon Musk's own terminology here. And they talk about AI is summoning the demon,
    • Open sourcing all the demon summoning circles is not the correct solution. And I'm using Elon Musk's own terminology here. He talked about AI as “summoning the demon”,
  • You know, now, now the stuff that would, that was obvious back in 2015 is, you know, starting to become visible and distance to others and not just like completely invisible. 
    • You know, now the stuff that was obvious back in 2015 is, you know, starting to become visible in the distance to others and not just completely invisible.
  • I, I suspect that if there's hope at all, it comes from a technical solution because the difference between technical solution, technical problems and political problems is at least the technical problems have solutions in principle.
    • I suspect that if there's hope at all, it comes from a technical solution, because the difference between technical problems and political problems is at least the technical problems have solutions in principle.

The Q&A transcript on LW is drastically worse, to the point that it might well reduce the net accuracy of readers' beliefs if they aren't careful? I won't try to summarize all the important fixes I made to that transcript, because there are so many. I also cut out the first 15 minutes of the Q&A, which are Eliezerless and mostly consist of Bankless ads and announcements.

Replies from: Rana Dexsin
comment by Rana Dexsin · 2023-03-13T04:23:52.158Z · LW(p) · GW(p)

We have people in crypto[graphy] who are good at breaking things, and they're the reason why anything is not on fire. Some of them might go into breaking AI systems instead, because that's where you learn anything.

Was there out-of-band clarification that Eliezer meant “cryptography” here (at 01:28:41)? He verbalized “crypto”, and I interpreted it as “cryptocurrency” myself, partly to tie things in with both the overall context of the podcast and the hosts' earlier preemptively-retracted question which was more clearly about cryptocurrency. Certainly I would guess that the first statement there is informally true either way, and there's a lot of overlap. (I don't interpret the “cryptosystem” reference a few sentences later to bias it much, to be clear, due to that overlap.)

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2023-03-13T05:22:15.167Z · LW(p) · GW(p)

The verbatim statement is:

We have people in crypto who are good at breaking things, and they're the reason why anything is not on fire. And some of them might go into breaking AI systems instead, 'cause that's where you learn anything.

You know, you know, any fool can build a crypto system that they think will work. Breaking existing crypto systems -- cryptographical systems -- is how we learn who the real experts are. So maybe the people finding weird stuff to do with AIs, maybe those people will come up with some truth about these systems that makes them easier to align than I suspect.

When he says "cryptographical systems", he's clarifying what he meant by "crypto" in the previous few clauses (this is a bit clearer from the video, where you can hear his tone). He often says stuff like this about cryptography and computer security; e.g., see the article Eliezer wrote on Arbital called Show me what you've broken:

See AI safety mindset. If you want to demonstrate competence at computer security, cryptography, or AI alignment theory, you should first think in terms of exposing technically demonstrable flaws in existing solutions, rather than solving entire problems yourself. Relevant Bruce Schneier quotes: "Good engineering involves thinking about how things can be made to work; the security mindset involves thinking about how things can be made to fail" and "Anyone can invent a security system that he himself cannot break. Show me what you've broken to demonstrate that your assertion of the system's security means something."

See also So Far: Unfriendly AI Edition:

And above all, aligning superhuman AI is hard for similar reasons to why cryptography is hard. If you do everything right, the AI won’t oppose you intelligently; but if something goes wrong at any level of abstraction, there may be powerful cognitive processes seeking out flaws and loopholes in your safety measures.

When you think a goal criterion implies something you want, you may have failed to see where the real maximum lies. When you try to block one behavior mode, the next result of the search may be another very similar behavior mode that you failed to block. This means that safe practice in this field needs to obey the same kind of mindset as appears in cryptography, of “Don’t roll your own crypto” and “Don’t tell me about the safe systems you’ve designed, tell me what you’ve broken if you want me to respect you” and “Literally anyone can design a code they can’t break themselves, see if other people can break it” and “Nearly all verbal arguments for why you’ll be fine are wrong, try to put it in a sufficiently crisp form that we can talk math about it” and so on. (AI safety mindset)

And Security Mindset and Ordinary Paranoia.

Replies from: Rana Dexsin
comment by Rana Dexsin · 2023-03-13T05:41:18.026Z · LW(p) · GW(p)

I did in fact go back and listen to that part, but I interpreted that clarifying expansion as referring to the latter part of your quoted segment only, and the former part of your quoted segment to be separate—using cryptocurrency as a bridging topic to get to cryptography afterwards. Anyway, your interpretation is entirely reasonable as well, and you probably have a much better Eliezer-predictor than I do; it just seemed oddly unconservative to interpolate that much into a transcript proper as part of what was otherwise described as an error correction pass.

comment by Gabriel Mukobi (gabe-mukobi) · 2023-03-13T05:54:03.454Z · LW(p) · GW(p)

Thanks for posting this! I listened to the podcast when it came out but totally missed the Twitter Spaces follow-up Q&A, which you linked and summarized [LW · GW] (tip for others: go to 14:50 for when Yudkowsky joins the Twitter Q&A). I found the follow-up Q&A somewhat interesting (albeit less useful than the full podcast); it might be worth highlighting that more, perhaps even in the title.