Robin Hanson AI X-Risk Debate — Highlights and Analysis
post by Liron · 2024-07-12T21:31:02.222Z
This is a link post for https://www.youtube.com/watch?v=i0DxeBhjzlY
This linkpost contains a lightly-edited transcript of highlights of my recent AI x-risk debate with Robin Hanson [LW · GW], and a written version of what I said in the post-debate analysis episode of my Doom Debates podcast.
Introduction
I've pored over my recent 2-hour AI x-risk debate with Robin Hanson [LW · GW] to clip the highlights and write up a post-debate analysis, including new arguments I thought of after the debate was over.
I've read everybody's feedback on YouTube and Twitter, and the consensus seems to be that it was a good debate. There were many topics brought up that were kind of deep cuts into stuff that Robin says.
On the critical side, people were saying that it came off more like an interview than a debate. I asked Robin a lot of questions about how he sees the world and I didn't "nail" him. And people were saying I wasn't quite as tough and forceful as I am on other guests. That's good feedback; I think it could have been a little less of an interview and a bit more about my own position, which is also something Robin pointed out at the end.
There's a reason why the Robin Hanson debate felt more like an interview. Let me explain:
Most people I debate have to do a lot of thinking on the spot because their position just isn't grounded in that many connected beliefs. They have like a few beliefs. They haven't thought that much about it. When I raise a question, they have to think about the answer for the first time.
And usually their answer is weak. So what often happens, my usual MO, is I come in like Kirby, the Nintendo character: I first suck up the other person's position and pass their Ideological Turing Test. (Speaking of which, I actually did an elaborate Robin Hanson Ideological Turing Test exercise beforehand, but it wasn't quite enough to fully anticipate the real Robin's answers.)
With a normal guest, it doesn't take me that long because their position is pretty compact; I can kind of make it up the same way that they can. With Robin Hanson, I come in as Kirby and he comes in as a pufferfish. His position is actually quite complex, connected to a lot of different supporting beliefs. I ask him about one thing and he's like, ah, well, look at this study. He's got a whole reinforced lattice of different claims and beliefs. I just wanted to make sure I saw what I was arguing against.
I was aiming to make this the authoritative follow-up to the 2008 Foom Debate [? · GW] that he had on Overcoming Bias with Eliezer Yudkowsky. I wanted to add another chapter to that, potentially a final chapter, because I don't know how many more of these debates he wants to do. I think Eliezer has thrown in the towel on debating Robin again; he's already said what he wants to say.
Another thing I noticed going back over the debate is that the arguments I gave during the debate were about 60% of what I could do if I could stop time. I wasn't at 100%, and that's simply because realtime debates are hard. You have to think of exactly what you're going to say in realtime, you have to move the conversation to the right place, and you have to hear what the other person is saying. And if there's a logical flaw, you have to narrow down that logical flaw in about five seconds. So it is kind of hard-mode to answer in realtime.
I don't mind it. I'm not complaining. I think realtime is still a good format. I think Robin himself didn't have a problem answering me in realtime. But I did notice that when I went back over the debate, and I actually spent five hours on this, I was able to craft significantly better counterarguments to the stuff that Robin was saying, mostly just because I had time to understand it in a little bit more detail.
When I'm not inside the debate, just listening to it on my own, the quality of my listening roughly doubles. I'm pausing and really thinking: why is Robin saying this? Is he referencing something? Is he connecting it to another idea that he's had before? I just have more time to process offline.
So you're going to read some arguments now that are better than what I said in the debate. However, I do think my arguments during the debate were good enough that we did expose the crux of our disagreement. I think there's enough back and forth in the debate where you will be able to see that Robin sees the world one way and I see it a different way, and you'll see exactly where it clashes, and exactly which beliefs, if one of us were to change our mind, could change the whole argument.
And that's what rationalists call the crux of disagreement [? · GW]. The crux is not just some random belief you have. It's a particular belief you had that, if you switched it, then you would switch your conclusion.
When I debate, all I do is just look for the crux. I don't even try to "win". I don't try to convince the other person. I just try to get them to agree what the crux is and what they would need to be convinced of. And then if they want to go dig further into that crux, if they want to volunteer to change their mind, that's great. But that's not my goal because I don't think that's a realistic goal.
I think that just identifying the crux is highly productive, regardless. I think it brings out good content, the listeners like it. So that's what I do here at Doom Debates. The other thing to note about the debate is I came in with a big outline. I'd done a ton of research about Robin. I'd read pretty much everything he's ever written about AI doom, I listened to interviews.
So I came in with a big outline and as he was talking, I wasn't just trying to respond to exactly what he was saying. I was also trying to guide the conversation to hit on various topics in the outline. And that's part of why I didn't give the perfect, directed response to exactly what he was saying. But now I'm able to do it.
I think it's going to be pretty rare for me to have such a big outline for other guests, largely because other guests' positions haven't been as fleshed out and as interconnected as Robin's. It's interesting to observe how having an outline of topics changes the kind of debate you get.
All right, let's go through the debate. I've clipped it down to the 30% that I think is the most substantive, the most relevant to analyze. And we're just going to go clip by clip, and I'll give you some new thoughts and some new counterarguments.
Robin's AI Timelines
Robin has unusually long AI timelines. He says it could take a hundred years to get to AGI, and he bases that on a key metric: human job replacement. He's extrapolating the trend of AI taking over human jobs and creating the economic value that humans currently create. That's his key metric.
I have a different key metric because I think of things in terms of optimization power. My key metric is the breadth and depth of optimization power. How many different domains are AIs entering? And how strong are they in each of those domains? So when I see self-driving cars, maybe I don't see them displacing that many human employees yet, but I see that they can handle more edge cases than ever. They can now drive across the San Francisco Bay Area, thanks to Waymo. Last I checked, Waymo is something like 10 times safer than a human driver. So that's depth of optimization: driving better than a human across the entire San Francisco Bay Area. That's what I'm looking at.
That trend seems to be going like a freight train. It seems to be accelerating and opening new domains all the time. As for breadth: LLMs can now handle arbitrary English queries, connect topics across domains in a way that's never been done before, do a primitive form of reasoning when they give you an answer, and essentially solve the symbol grounding problem in arbitrary domains. So I'm seeing all this smoke in terms of AIs getting better at both the breadth and the depth of their optimization.
But Robin has a totally different key metric, and that's where his estimate is coming from.
Robin: For me, the most relevant metric is when are they able to do most jobs, say, more cost effectively than humans? I mean, what percentage of jobs can they do? Basically, what percentage of the economy does AI or computer automation take?
That's to me the most interesting metric.
Liron: So in your view, it's pretty plausible that we can get to 2100 and there's still jobs that humans are doing better than AIs?
Robin: Right. That's not at all crazy.
To get Robin worried about AI doom, I'd need to convince him that there's a different metric he needs to be tracking, which is on track to get dangerous.
Robin: I have to ask, okay, what's this theory by which something else is going to happen that I need to track other things, and do I believe it?
Here's where I explained to Robin about my alternate metric, which is optimization power. I tell him about natural selection, human brains, AGI.
Culture vs. Intelligence
Robin doesn't even see human brains as the only major milestone in the optimization power story. He talks a lot about culture.
Robin: Let's take the optimization framing. I might say culture deserves to be on the list of historical optimization machines, after brains. And I might object to trying to do a history in terms of optimization without noticing that culture should be on the list. I would find that suspicious if you...
Liron: And why should culture be on the list in your view?
Robin: Because that's humanity's superpower. That's the thing that distinguishes, I mean, we've had brains for half a million years, right? What distinguished humans wasn't so much having bigger brains, it was having the sort of brains that could enable culture to take off and culture is the thing that's allowed us to become optimized much faster than other animals, not merely having bigger brains.
So I think if you talk about human brains as an event in optimization that's just very puzzling because human brains weren't that much bigger than other brains and brains happened a long time ago. So what's the recent event in the optimization story? It wasn't brains. It was culture.
Here I should have done a better job drilling down into culture versus brains, because that's an interesting crux of where I disagree with Robin.
Culture is basically multiple brains passing notes. The ability to understand any individual concept or innovation happens in one brain. Culture doesn't give you that.
Sure, culture collects innovations for you to understand. "Ape culture" by itself, without that brain support, doesn't make any economic progress. But on the other hand, if you just give apes better brains, I'm pretty confident you'll get better ape culture. And you'll get exponential economic ape growth.
Robin is saying, look, we humans have had brains for half a million years. So culture must have been a key thing we had to mix in before we got rapid human progress, right? I agree that there's a cascade where human level brains don't instantly foom. They build supports like culture. But I see a level distinction.
Liron: Let's classify these types of events into three levels. This was originally introduced in your foom debate with Eliezer. The three levels are:
- The dominant optimization process, like natural selection, human brains, AGI.
- Meta-level improvements to that process, so like cells, sex, writing, science, and as you just say now, culture, I believe, goes on level two.
- Object level innovations, like light bulb, automobile, farming
Is that a useful categorization?
Robin: I don't accept your distinction between (1) and (2) levels at the moment. That is, I don't see why culture should be on level two with writing, as opposed to a level one with DNA evolution.
So that's a crux. Robin thinks culture is as fundamental of a force as brains and natural selection. I think it's definitely not. When we get a superintelligent AI that disempowers humanity, it very likely won't even have culture, because culture is only helpful to an agent if that agent is dependent on other agents.
Innovation Accumulation vs. Intelligence
Now we get to Robin's unified model of all the data points that accelerated economic growth. He says it's about the rate of innovation and the diffusion of innovation.
Robin: So, diffusion of innovation was the key thing that allowed farming to innovate fast. So basically in all the eras so far, the key thing was always the innovation rate. That's what allowed growth. And innovation is the combination of invention and diffusion and typically it's diffusion that matters more not invention.
Did you catch that? He said diffusion matters more. But when we have a superintelligent AI, a brain with more computing power and better algorithms than the sum of humanity, diffusion will be trivial.
Diffusion is just one part of this giant AI talking to another part of this giant AI in under a millisecond.
This is an argument for why to expect a foom, a singularity. We're setting diffusion-time to zero in Robin's model. I would argue that the innovation part of the equation will be vastly faster too. But the argument from instant diffusion of innovations seems pretty powerful, especially since Robin actually thinks diffusion matters more.
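To make that concrete, here's a toy decomposition of my own (nothing Robin endorsed, just to show the structure of the claim):

$$\text{time per increment of innovation} \;=\; T_{\text{invent}} + T_{\text{diffuse}}$$

If Robin is right that historically $T_{\text{diffuse}}$ dominates $T_{\text{invent}}$, then an AI that drives $T_{\text{diffuse}}$ toward zero more than doubles the rate at which innovations accumulate, before we even count any speedup on the invention side.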
Another crux is the difference between my notion of optimization power and Robin's notion of accumulation of optimizations.
Robin: Optimization power isn't quite the same as innovation. We have to talk about the accumulation of optimizations.
I don't know, Robin, how different are these notions really: optimization power vs. innovation, or optimization power vs. accumulation of optimizations?
If Albert Einstein invents Special Relativity in 1905, and then Albert Einstein invents General Relativity in 1915, it seems like Einstein's brain is a one-man optimization accumulator. "Innovation accumulation" seems like a weird way to describe the cognitive work being done in the field of physics, the work of mapping observations to mathematical theories.
I wouldn't say "theoretical physicists, thanks to their culture, accumulate innovations that improve their theories". I'd say that Einstein had high optimization power in the domain of theoretical physics. Einstein used that power to map observations to mathematical physics. He was very powerful as an optimizer for a human.
Unfortunately, he is now a corpse, so his brain no longer has optimization power. So we need other brains to step in and continue the work. That's very different from saying, "Hail almighty culture!"
Optimization = Collecting + Spreading?
When Robin says the key to economic growth is to "collect and spread innovations", he's factoring optimization power into these two components that don't have to be factored. He's not seeing that the nature of the work is a fundamentally mental operation and algorithm. It's goal-optimization work.
Imagine it's 1970 when people didn't know if computers would ever beat humans at chess. Robin might argue:
"The key reason humans play chess well is because we have culture. Humans write books of moves and strategies that other humans study. Humans play games with other humans, and they write down lessons from those games."
In 1970, that would seem like a plausible argument. After all, you can't algorithmically solve chess. There's no special deep insight for chess, is there?
Today we have AlphaZero, which blew past human-level play within hours, starting from nothing but the rules of chess and a general-purpose machine learning algorithm. So the decomposition Robin likes, where instead of talking about optimization power we talk about accumulating and diffusing innovation, isn't useful for understanding what's happening with AI.
Worldwide Growth
Another point Robin makes is that "small parts of the world find it hard to grow faster than the world".
Robin: Small parts of the world find it hard to grow much faster than the world. That is the main way small parts of the world can grow, is to find a way to trade and interact with the rest of the world so that the entire world can grow together. Obviously you have seen some places growing faster than the world for a time, but mostly what we see is a world that grows together.
The longest sustained exponential growth we've seen has been of the entire world over a longer period. Smaller parts of the world have had briefer periods of acceleration, but then they decline. There's been the rise and fall of civilizations, for example, and species, etc. So the safest thing to predict is that in the future, the world might find a way for the entire world to accelerate faster.
Wait a minute, it's not the entire world that's been accelerating. It's humans. Apes aren't accelerating; they're suffering habitat loss and going extinct. Species that get in our way are going extinct.
The human world is growing because (1) humans have something of value to offer other humans and (2) humans care about the welfare of other humans. That's why we're a unified economy.
It's true that the human world grows as a whole because different parts of the human world are useful inputs to one another. I agree that a system grows together with its inputs. It's just worth noticing that the boundary between the system and its inputs can shift. We no longer use horses as inputs for transportation, so horses aren't growing with the human economy anymore. They're not part of our world.
I assume Robin would respond that only entities capable of copying innovations quickly enough are part of the growing world, and in modern times humans are the only entities beyond that threshold of copying ability that lets them be part of the growing world.
But then we have to ask, why exactly are humans with IQ 80 currently growing together with the world of humans with IQ 120? It's because:
- Lower-IQ humans can still perform jobs with appreciable market value
- Higher-IQ humans value lower-IQ humans; they don't want to split off their own warring faction and commit IQ genocide
What does it matter that IQ 80 humans have some ability to copy innovation? It only matters to the extent it lets them continue to perform jobs with appreciable market value.
Maybe that's connected to why Robin suspects that humans will keep holding their value in the job market for a long time? If Robin thought automation was coming soon for IQ 80 humans (leaving IQ 120 humans employed for a while), it'd undermine his claim that smarter agents tend to pull other intelligences along for the economic growth ride.
Extrapolating Robust Trends
What kind of trends does Robin focus on exactly? He usually tries to focus on economic growth trends, but he also extrapolates farther back to look at data that was "a foreshadowing of what was to come".
Liron: Can you clarify what metric is doubling? I get GDP for the economic era, but what about before that? What metric is it?
Robin: If you ask what was the thing that was changing that we look back on and say, that was a foreshadowing of what was to come and that was important, it's brain size. So that's what I would pick out to look at before humans: brain size.
If you go back before animal brains, you can look at the size of genomes, and there's an even slower growth rate in the size of genomes.
Liron: It sounds like you've got a pretty significant change of metric, right? Where with the animal brains, you're just saying, yeah, the neuron count is doubling, but that's a precursor to when the amount of utility is going to double.
Robin: It's just the trend that matters for the next transitions, we think. So we do think one of the things that enabled humans was big enough brains, which enabled strong enough culture.
Robin is focusing on "the trend that matters for the next transitions". While it's nice that we can look back and retroactively see which trends mattered, which trends foreshadowed major transitions, our current predicament is that we're at the dawn of a huge new trend.
We don't have the data for a superintelligent AI foom today. That data might get logged over the course of a year, a month, a day, and then we're dead. We need to understand the mechanism of what's going to spark a new trend.
Liron: So it sounds like you're just applying some judgment here, right? Like there's no one metric that's an absolute, but you're just kind of trying to connect the dots in whatever way seems like you get the most robust trend?
Robin: Well, for the last three periods, we can talk about world GDP. And then if you want to go further, you can't do that. You gotta pick something else, and so I looked for the best thing I could find. Or you could just say, hey, we just can't project before that, and I'm fine if you want to do that too.
Seeing Optimization-Work
Robin is willing to concede that maybe he should narrow his extrapolation down to just world GDP in the human era so that we have a consistent metric. But I actually agree with Robin's hunch that brain size trend was a highly relevant precursor to human economic growth. I agree that there's some deep, common factor to all this big change that's been happening in the historical record.
I don't know why Robin can't get on the same page: that there's a type of work being done by brains when they increase the fitness of an organism, and there's a type of work being done by humans when they create economic value. That what we've seen so far is not the ideal version of this type of work, but a rough version. And that now, for the first time in history, we're setting out to create the ideal version.
Space travel became possible when we understood rocketry and orbital mechanics. Everything animals do to travel on earth is a version of travel that doesn't generalize to space until human technologists set out to make it generalize.
We now understand that an optimization algorithm, one that human brains manage to implement at a very basic level of proficiency (it's not like humans are even that smart), is both the ultimate source of power in biological ecosystems (since it lets humans win competitions for any niche) and the source of innovations that one can "accumulate and spread" at a faster and faster timescale.
You want to analyze the economy by talking about innovations? How can we even define what an "innovation" is without saying it's something that helps economic growth? You know, a circular definition.
Robin could give a non-circular definition of innovation like "knowledge or processes that let you do things better". But I think any good definition of innovation is just groping toward the more precise notion of helping optimization processes hit goals. An innovation is a novel thing that grows your ability to hit narrow targets in future state space, grows your ability to successfully hit a target outcome in the future by choosing actions in the present that lead there.
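One way to make that less hand-wavy, roughly following the standard LessWrong move of measuring optimization in bits (my gloss, not a definition Robin signed onto):

$$\mathrm{OP}(s^{*}) \;=\; -\log_2 \frac{\left|\{\, s \in S : U(s) \ge U(s^{*}) \,\}\right|}{|S|}$$

where $S$ is the space of possible outcomes, $U$ is the optimizer's preference ordering, and $s^{*}$ is the outcome actually achieved. The more improbably high-ranked the achieved outcome is relative to the default space of outcomes, the more optimization-work was done; an "innovation" is then whatever durably raises this number for future goals.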
Exponential Growth With Respect To...
Robin insists on staying close to the data without trying to impose too much of a deep model on it, but there are equally valid ways to model the same data. In particular, instead of modeling economic output versus elapsed time, you could model economic output versus optimization input.
Liron: If you look at the forager-farming-industry transition, yeah, on the x-axis of time it's an exponential, but that's because you're holding the human brain's optimization power constant, right? So if your model is that the x-axis is actually optimization power, it might turn out that you get a hyper exponential foom once AGI starts modifying the underlying optimizer, right? So your model then diverges, potentially.
Robin: Well, in all the previous growth modes, lots of things changed. And of course, every transition was some limiting factor being released. We have this history of a relatively small number of transitions and between each transition, roughly exponential growth, and then really big jumps in the growth rate at each transition. That's what history roughly looks like.
His point here is that he's already modeling that different eras had a discontinuous change in the doubling time. So when we get higher intelligence, that can just be the next change that bumps us up to a faster doubling time. So his choice of x-axis, which is time, can still keep applying to the trend, even if suddenly there's another discontinuous change. In fact, his mainline scenario is that something in the near future discontinuously pushes the economic doubling time from 15 years down to two months.
I'd still argue it's pretty likely that we'll get an AI foom that's even faster than an exponential with a two-month doubling time. If you plot economic output against optimization input rather than against time, then once AI starts growing the optimization input itself, mapping that curve back onto the time axis gives you something hyperexponential.
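Here's a minimal toy model of that re-plotting argument (my own illustration, with made-up functional forms): let output grow at a rate proportional to the optimization input applied to the economy,

$$\frac{dY}{dt} = k \, O \, Y.$$

When the optimization input $O$ is roughly fixed (human brains running at biological speed), this is ordinary exponential growth in time, which is Robin's within-era pattern. But if AI lets the optimization input scale with output, say $O \propto Y$, then $\dot{Y} \propto Y^{2}$, whose solution $Y(t) = Y_0 / (1 - c\,Y_0\,t)$ blows up in finite time: not just a faster doubling time, but a doubling time that keeps shrinking.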
But regardless, even if it is just a matter of AI doubling its intelligence a few times, it still leaves my doom claim intact. My doom claim just rests on AI being able to outmaneuver humanity, to out-optimize humanity, to disempower humanity, and for that AI to stop responding to human commands, and for that AI to not be optimizing for human values. It doesn't require the foom to happen at a certain rate, just pretty quickly. Even if it takes years or decades, that's fast enough, unless humans can align it and raise their own intelligence fast enough to catch up, which doesn't look likely.
Can Robin's Methodology Notice Foom In Time?
This section of the debate gets to the crux of why I haven't been convinced to adopt Robin's methodology that makes him think P(doom) is low.
Liron: Let's project Robin Hanson's way of thinking about the world back to the dawn of humanity, say a million years ago. What abstraction, what way of reasoning, would have let us correctly predict humans' present level of power and control? What would the Robin Hanson of a million years ago have explained about the future?
Robin: The key human superpower was culture. So if you could have looked at the proto-humans and seen their early versions of culture, and how they were able to spread innovations among themselves faster than other animals could through their simpler, early versions of culture, you would then predict that there will be the meta process of culture inventing new cultural processes that allow culture to work better.
And that did in fact happen, slowly, over a very long time. And you might then have predicted an acceleration of growth rates in the long run. The ability of culture to improve the ability of culture to work would allow humans not to just accumulate innovation, but to accumulate innovation faster, which we did.
The harder part might've been to anticipate the stepwise nature of that. That is, looking back we can see that it happened in some discrete steps, but looking ahead, you might not have been able to anticipate those discrete steps. You might've just been able to see the large shape of acceleration of some sort.
Liron: Let's say Devil's Advocate Dan comes and says, look Robin, look at the academic literature. We know that natural selection accumulates genes and then organisms adapt to their niche. How are you proposing that this magical thing called "culture" is going to get you a single species that starts occupying every imaginable niche? How are you supporting that with evidence? What would you say?
Robin: I'm happy to admit that there's just a lot we find it hard to predict about the future of AI. That's one of the defining characteristics of AI: it's one of the hardest things to envision and predict how and where it will go.
Liron: This is pretty important to me, because it sounds like you agree that a version of your methodology transported to the dawn of humanity would be blind to what's about to happen with humanity, right? So I'm trying to make the analogy where it seems like maybe you today are blind to what's about to happen with AI foom.
Robin: I'm happy to admit that we have a lot of uncertainty, but you'll have to make the argument why uncertainty translates into a 50% chance of us all dying in the next generation.
Liron: It's almost like you're admitting, look, I have a methodology that isn't going to be great at noticing when there's going to be this huge disruption, but I'm mostly going to be right 'cause there mostly aren't huge disruptions. Am I characterizing you right?
Robin: Certainly if we look over history, say the last, you know, million years, we see relatively steady change — punctuated by accelerations of change, but even these accelerations, in the moment, you can see things accelerating. And so that does suggest that in the future we will see accelerations, but then we will see the beginnings of those accelerations, and we will see them speeding up and we will be able to track the acceleration as it happens. That's what history suggests about future accelerations.
Liron: So in this thought experiment, the Robin of the past is going to observe a 10,000-year slice of humans starting to have culture and be like, aha, this is a new dynamic, this is a new regime. You think you'd notice it then?
Robin: Well, it depends on, you know, what stats we're tracking. At the moment, this past trend projected to the future says that the world together would start to accelerate in its growth rate. We are definitely tracking world growth rates in quite a bit of detail. And it wouldn't happen on one weekend. That is, it would be, say, a five year period.
Liron: But there's something else in this thought experiment that I think is alarming, which is that humans are taking over the niches of other animals. So we're the first multi-niche species, or the first niche-general species, right? And that seems like something that you can't extrapolate.
Robin: There have been other species that were more general than others. Certainly we were unusually general compared to most species, but there have been many other fairly general species.
Liron: We're unusual, and we're so general that if we really wanted to, technically we haven't done it yet, but we could potentially colonize the moon, right?
Robin: We probably will in the next 10,000 years.
Liron: What's salient to me is, I feel like you're not bringing a methodology that's going to notice when a new type of species is coming that doesn't fit the trends.
Robin: Again, the question is, what are you tracking and will those capture whatever it is you expect to appear? In the past, new growth has happened gradually so that if you were in the right place tracking the right things, you would see the new growth. So the key question is, how sudden or local could the next growth spurt be compared to past growth spurts? That's the question we're asking.
So you might go back and say, well, look, if you're tracking all the species on Earth except, you know, the big primates, you might've missed the humans growing. So what's the chance you'll not track at a fine enough resolution to see the key change?
Even humans, we started a million years ago to sort of double faster, but that was still only doubling every quarter million years. And you would still have to be looking at the granularity of humans and the nearby species and at their doubling time. If you were just looking at larger units, maybe all mammals or larger continents, you might miss that trend.
So you can raise the question, what trends will you need to be looking at? That's why I might say, look, my key priority is when will AIs take most jobs? So if I decide that that's the thing I care about and that's the thing I track, there's much less of a risk I'll miss it.
You might counterargue that that's not the thing to be tracking and I'm happy to hear that argument, but if I think this is the thing I care about and I'm tracking it, I feel like I got my eye on the thing.
Robin is saying, sure, maybe a foom will start, but we'll have time to adjust when we see the job displacement data picking up steam.
But if you're a tiger species used to having your own ecological niche, and suddenly it's the late 1700s and you see the Industrial Revolution starting, and you see the doubling time of the human economy growing, what do you do then?
(I'm just using tigers as an example because humans drove the Tasmanian tiger to extinction via hunting and habitat destruction; the last known one died in 1936.)
That tiger part of the world won't grow together with the human economy; it's going to go extinct unless tigers can adapt fast enough to maintain their level of tiger power. If you're a tiger species, you get a few decades to react to the faster doubling time you see in the data during the Industrial Revolution. But your adaptation process, your gene-selection process, takes hundreds of thousands of years to react to environmental changes by selecting for new adaptations. So your reaction time is off by a factor of thousands.
Liron: I just want to throw in the analogy that we talked about before, which is to what humans did in evolutionary time. It looks very much like a foom, in that usually species stay within a niche, right? They don't explode and go take over other niches, because normally when they do that, the other species push back and you have an ecosystem where all the species can react in evolutionary time. You have an arms race in evolutionary time.
The whole idea that humans are part of an ecosystem and are kept in check by the ecosystem, that is no longer correct, right? So would you be open to an analogy where AI becomes not part of the economy anymore?
Robin: But again, we'll need to talk more specifically about how and why. Other species on the planet a million years ago had very poor abilities to monitor and coordinate their actions. They didn't have councils observing cyber attacks and reporting crimes to a central Bureau of Enforcement to look for things that could go wrong and monitor.
What would it look like for a tiger species to successfully react to humanity?
Robin would probably say that the key is for the tiger species to notice an existential threat emerging all the way back before the industrial revolution, before the human farming revolution, at the dawn of human forager tribes that had culture. The key is noticing early enough that disruption is on the way.
We have to be early enough to stop an exponential... but we know that's tricky business, right? Like trying to eliminate COVID in the early stages by having everyone stay home for two weeks. It's theoretically possible, but it's hard, and it's tempting to react way too late.
My disagreement with Robin becomes about how much smoke we're already seeing from a possible fire.
In my view, being a tiger species witnessing the dawn of human culture is analogous to being a human witnessing the dawn of deep learning in the 2010s. Or even being a human witnessing the dawn of electronic computers in the 1950s. I.J. Good already noticed back in 1965 that, based on the progress of these computers, we might be on a path to an intelligence explosion leading to catastrophic risks for humanity.
The only difference between me and Robin is that Robin thinks we have the luxury of waiting until we observe that AI starts automating jobs at a faster rate, while I think the jobs data won't give us that reaction time. I think we're already the tigers watching the humans building the factories. We're already seeing the early stages of an intelligence explosion that's about to disempower us.
We're about to find ourselves like the tigers kicking themselves, saying darn it, we should've put a lid on those humans when their primitive tribes were starting to sit around a campfire and tell stories. That's when we should have acted to save ourselves. We should have noticed that those stories that foragers were telling around a fire, they were a call to action for us tigers. Then we would have had time to evolve our tiger genes to stay competitive with human genes. That's what Robin is saying in this analogy.
Foom Argument As Conjunction
Robin says he arrives at P(doom) < 1% because he multiplies out a conjunction of independent assumptions:
- An initially weak system rapidly improves itself to become very strong, improving by many orders of magnitude
- We fail to monitor and stop it
- The system has a broad range of improved capabilities over many domains
- It has goals and plans and it acts on them
- It's not subservient to its creators or its human masters
- Its values changed from the values of its creators
- None of the other AIs notice and oppose this AI's power grab
That's a conjunction of seven assumptions. If each assumption is a bit hard to believe, say only 50% likely, and each is independent of the other assumptions, then the probability of the whole conjunction is below 1%; that's basically what Robin is arguing.
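Spelling out the arithmetic behind that framing (the 50% figure is my stand-in, not a number Robin committed to):

$$P(\text{foom doom}) \approx \prod_{i=1}^{7} P(A_i) = 0.5^{7} \approx 0.008 < 1\%.$$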
Robin: It's a whole bunch of elements, each of which is kind of unlikely. And then the whole thing adds up to pretty unlikely.
But this kind of conjunction argument is a known trick, and I did call him out on that.
Liron: There's a known trick where you can frame any position as a conjunction to make it seem unlikely, right? So I could do the opposite way, and I could say, hey, if you think foom is not going to happen:
First you have to agree that AI labs have really great security precautions. Then you have to agree that government regulators can pause it at the times when it's unsafe. Etc. I could frame it as a conjunction too, right?
So that is a little bit of a trick. I kind of object to framing it as a big conjunction like that. Because I have a framing where it just doesn't sound like such a big, scary conjunction.
Robin: Okay, so then your framing would have to make what, in my framing, look like independent choices, somehow be natural consequences of each other. So maybe there's an underlying event that would cause these as a correlated event somehow, but then you'll have to tell me what's the underlying event that causes this correlated set of events all to happen together.
Right. For example, to take one of Robin's claims, that we won't effectively monitor and shut down a rogue AGI — that might be a questionable assumption when taken on its own, but if you accept a couple of the other assumptions, like the assumption that a system rapidly improves by orders of magnitude and has goals that don't align with human values, well, entertaining that scenario gets you most of the way toward accepting that monitoring would probably have failed somewhere along the way. So it's not like these assumptions are independent probabilities.
When I reason about a future where superintelligent AI exists, I'm reasoning about likely doom scenarios in a way that simultaneously raises the probability of all those scary assumptions in Robin's list.
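Here's a sketch of why the independence assumption is doing the real work. By the chain rule, the conjunction is really a product of conditional probabilities:

$$P(A_1 \wedge \cdots \wedge A_7) = P(A_1)\,P(A_2 \mid A_1)\cdots P(A_7 \mid A_1, \ldots, A_6).$$

If the later conditionals are close to 1 once the early assumptions hold (for instance, monitoring very plausibly fails given a system that has already improved by orders of magnitude and is pursuing goals its creators didn't intend), then the product is dominated by the first factor or two, rather than being driven below 1% by multiplying seven coin flips.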
Headroom Above Human Intelligence
Now we get into why my mainline future scenario is what it is.
A major load-bearing piece of my position is how easy I think it will be for the right algorithm to blow way past human-level intelligence. I see the human brain as a primitive implementation of a goal-optimizer algorithm. I'm pretty sure there's a much better goal-optimizer algorithm that's possible to implement, and it's only a matter of time before it is implemented.
In Robin's worldview, he agrees there's plenty of "capacity" above the human level, but he's skeptical that a "goal optimizer algorithm" with "higher intelligence" is a key piece of that capacity. That's why I'm asking him here about headroom above human intelligence.
Liron: Do you agree that there's a lot of headroom above human intelligence? Look at the progress from ape to human IQ. Does that dial turn much farther if you want to call it like a thousand IQ, that kind of thing?
Robin: Well, I'm happy to say there's enormous headroom in capacity. What I hesitate is when you want to divide that capacity into IQ versus other things. Definitely our descendants will be vastly more capable than us in an enormous number of ways.
I'm just not sure when you say, "oh, because of intelligence", which part of that vast increase in capacity that we're trying to refer to in that sense.
In my worldview, a single human brain is powerful because it's the best implementation of a goal-optimizer algorithm in the known universe. When I look at how quickly human brains started getting bigger in evolutionary time after branching from other apes and reaching some critical threshold of general intelligence, that's hugely meaningful to me.
There's something called "encephalization quotient", which measures how unusually large an organism's brain is relative to its body mass, and humans are the highest on that measure by far. I see this as a highly suggestive clue that something about what makes humans powerful can be traced to the phenotype of a single human brain.
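For reference, the usual Jerison-style definition (the constant and exponent vary somewhat across studies):

$$\mathrm{EQ} = \frac{E_{\text{brain}}}{0.12 \cdot M_{\text{body}}^{2/3}}$$

with brain mass and body mass in grams; humans come out around 7-8 on this measure, versus roughly 2-2.5 for chimpanzees.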
Sure, humans are still dependent on their environment, including human society, and much of their brain function adapts around that. But the human brain reached a point where it's adapted to tackling any problem; even colonizing the moon is possible using the same human brain and body.
Furthermore, the human brain doesn't look like a finished product. It's already off the charts in size, and it seems like it would have grown even bigger by now if not for other massive biological constraints: the duration of gestation in the womb, the duration of infancy, and the size of the mother's pelvis, which simultaneously has to be small enough for walking and big enough for childbirth.
I see the evolution of human brain size as a big clue that there's a steep gradient of intelligence near the human level; i.e. once we get the first human level AGI, I expect we'll see vastly superhuman AIs come soon after. Let's see what Robin thinks.
Liron: Is there a steep gradient of intelligence increase near the human level? It really seems like natural selection was saying yes, bigger head, bigger head. Oh crap, now the mother can't walk, we better slow our roll.
Robin: We have many kinds of capacities and none of them are obviously near fundamental limits. But I'm just, again, not sure which thing you mean by intelligence exactly, but I'm not sure that matters here.
Liron: Don't you think the genetic modifications that separated the human brain from the ape brain were very important? Weren't those extremely high-return? And doesn't it seem like there are a few more that could be made that are also extremely high-return, if only that head could fit through that pelvis?
Robin: I'm not sure that there are big gains in architectural restructuring of the human brain at a similar size, but I am sure that humans were enormously more able than other animals, again, primarily because of culture. So, we have been drastically increasing the capacity of humans through culture, which increases the capacity of each human so far, and we are nowhere near limits, certainly.
Liron: When you talk about culture though, you're holding the genetics of the human brain constant, but I'm pointing out that it seems like there was a steep gradient, a high return on investment of changes to the genes and changes to the phenotype.
Like if we just made the pelvis bigger so that a bigger head could fit through, doesn't that trend show that we could have a significantly smarter human the same way humans are smarter than apes?
Robin: I'm just not sure what you're talking about here, but I'm not sure it matters. We have so many ways to improve on humans.
One way might be to have just a bigger physical brain, but that just doesn't stand out as one of the most promising or important ways you could improve a human.
Liron: But I have a specific reason to think it's promising, which is that natural selection tried really hard to keep increasing the human brain until it got a, you know, the fact that so many human babies historically die in childbirth...
Robin: I mean, sure. So imagine a brain, you know, eight times as big, twice as big on each axis. A human brain that size would certainly have more capacity. I don't know how much more, because it would just be a bigger brain.
More on Culture vs. Intelligence
Robin doesn't see the human brain as having this special "intelligence power" compared to other ape brains. He just thinks the human brain is better at absorbing culture compared to apes. And maybe the human brain has picked up other specific skills that apes don't have. Robin doesn't see a single axis where you can compare humans versus apes as being smarter versus stupider.
Liron: The difference between human intelligence and ape intelligence appears to me very significant, right? Like, culture and science is not going to teach a present day ape to be a useful scientific contributor. Is that fair to say?
Robin: No, actually. That is, if, apes had had culture, then they could have done science. That is, it was culture that made us be able to do science.
Liron: Imagine that an ape is born today, just an actual modern ape, and you put on a VR headset on the ape. So the ape grows up with a perfect simulation of really enlightened ape culture. Could that ape then grow up to be a scientist among human scientists?
Robin: It's not enough to put a VR headset on an ape to make them capable of culture. That's not how it works. Culture is the capacity for humans or whoever to attend to the behavior of others, figure out the relevant things to copy, and then actually successfully copy them. That is what culture is, and that's the thing that we have that apes don't.
Hmm, have you ever heard the phrase "monkey see monkey do"? It seems like that ought to fit Robin's definition of monkeys having culture:
Robin: Culture is the capacity for humans or whoever to attend to the behavior of others, figure out the relevant things to copy, and then actually successfully copy them. That is what culture is, and that's the thing that we have that apes don't.
This is pretty surprising. Robin thinks apes just need to be able to copy one another better, and then they'd get exponential economic growth the way humans have had.
I don't get how apes that are really good at copying each other give you ape scientists. If you have an ape who can copy human physicists really well, can that ape invent the theory of relativity?
I don't get why Robin is so hesitant to invoke the concept of doing general cognitive work, having a certain degree of general intelligence, doing optimization-work on an arbitrary domain. There's obviously more to it than making apes better at copying.
Liron: Do you think that most of the delta of effectiveness in going from the ape brain to the human brain is increased capacity to absorb culture? Do you feel like that's like the key dimension?
Robin: It's not clear that a brain one quarter the size of a human brain couldn't have done it as well, but there were a number of particular features the brain had to have in order to be able to support culture. And that was the human superpower.
Once you had that first set of features that could support culture, then we could use culture to collect lots and lots more features that enabled us to take off compared to the other animals.
Liron: Okay. It seems like I'm pointing out a mystery that I'm not sure if you agree with, like the mystery of why natural selection cared so much to make the human head and brain as big as possible. Like, do you agree that there's something that calls out for explanation?
Robin: There's definitely like, brains were valuable. So clearly at that point in evolution, clearly evolution was going, can I get a bigger brain here? That looks like a good deal. Let's figure out how I can get a bigger brain. I reached the limitations. I just can't get a bigger brain, but really plausibly bigger brains would be valuable.
Liron: It sounds like you're saying, okay, you got culture. You don't need that much bigger of a brain for culture. We've got culture nailed down. So why do you think the brain kept trying to get bigger?
Robin: A standard social brain hypothesis is that we had a complicated social world and there were large returns to more clever analysis of our social strategic situations. And that doesn't seem terribly wrong, but brains also probably let us use culture and take advantage of it more. So there's just lots of ways brains are good.
So this incredibly valuable thing Robin thinks big brains do is "clever analysis of our social strategic situations". Clever analysis?
How about using clever analysis to design a better tool, instead of just copying a tool you've already seen? This clever analysis power that you think potentially 3/4 of the human brain is for, why isn't that the key explanatory factor in human success? Why do you only want to say that culture is the key as in the capacity to copy others well?
The Goal-Completeness Dimension
A major crux of disagreement for Robin is whether my concept of general intelligence is a key concept with lots of predictive power, and whether we can expect big rapid consequences from dialing up the intelligence level in our own lifetimes. It's interesting to see where Robin objects to my explanation of why we should expect rapid intelligence increases going far beyond the human level.
Liron: Let me introduce a term I coined called "goal-completeness [LW · GW]".
Goal-completeness just means that you have an AI that can accept as input any goal and then go reach that goal. And it's not like that whole architecture of the AI is only for playing chess or only for driving a car. It can just accept arbitrary end states in the physical universe.
Robin: Assume we've got something with goal generality. Then what?
Liron: Now that we've identified a dimension, right, like effectiveness of a goal-complete AI, is it possible that the AI will just get much better on this dimension? The same way that just a few genetic tweaks, just a few scalings in size, I believe made humans much better than apes at this goal-completeness optimization problem.
Robin: I would say eventually our descendants will be better at identifying, understanding, and completing goals. Sure.
Liron: When we identify some dimension that humans and animals have and then we try to build technology to surpass humans and animals on that dimension, we tend to succeed quite rapidly and by quite a lot, right?
Transportation, for example. If you just identify, hey, how do you get somewhere as fast as possible and transport as much weight as possible? At this point, we're far beyond anything biology ever designed, right? Is that a fair robust trend?
Robin: I'll continue to say, yes, our descendants will be vastly better on us in a great many capacities, but...
Liron: But it's not just our descendants. If you look at the actual rate of this kind of stuff, the progress happens pretty fast. If you could continue the trend of how the human brain differs from the ape brain by letting it grow bigger, an opportunity evolution didn't have, you might get something much smarter.
The only thing that improves is the mapping between a goal you can give it and the quality of the action plan, the effectiveness of the action plan to achieve that goal. The optimization power.
Robin: Why isn't that just a summary of our overall capacities?
Liron: I'm trying to convince you that there's a dimension, the optimization power dimension. And I'm trying to show you arguments why I think that dimension could be scaled very, very rapidly. Within, let's say a year, something like that. Much less than a 20 year economic doubling time. Are you buying that at all?
Robin: You're trying somehow to factor this thing out from the general capacity of the world economy.
Liron: Yeah, exactly, because I think it naturally does factor out the same way you can. It's like transportation, right? Optimizing the speed of transportation. Doesn't that factor out from the world economy?
Robin: Well, you can identify the speed of transportation, but just making transportation twice as good doesn't make the world economy twice as good. So you're trying to argue that there's this factor that has a self-improvement element to it, in a way transportation doesn't, right? If you make transportation faster, that doesn't make it that much easier to make transportation faster.
Liron: Let's not even talk about self-improvement. If evolution got to make the brain twice as big, that's not even self-improvement. That's just a path of tweaking genes and getting a little bit of consequentialist feedback, I guess. But I mean, it's almost like copying and pasting the algorithm to generate brain regions might've worked.
So what worries me is, I see a lot of headroom above human intelligence, and I'm expecting that within a few years (maybe 20 if we're lucky, or even 30, but very plausibly 3-10 years), we're just going to have these machines that play the real world as a strategy game, the same way Stockfish plays chess. They just tell you what to do to get an outcome, or they do it themselves. And at that point, no matter what happens, no matter what regulation you do, no matter how companies cooperate, no matter whether it's multipolar, whatever it is, you now just have these agents of chaos that are unstoppable if they try. That's roughly what I'm expecting.
Robin: I have doubts about that, but I still haven't seen you connect to that to the foom scenario.
Well, I tried, but I couldn't convince Robin that we're about to rapidly increase machine intelligence beyond the human level. He didn't buy my argument from the recent history of human brain evolution, or from looking at how quickly human technological progress surpasses nature on various dimensions. Robin knows it didn't take us that long to get to the point where an AI pilot can fly an F-16 fighter plane and dogfight better than a human pilot. But he doesn't expect something like general intelligence or optimization power to get cranked up the way so many specific skills have been getting cranked up.
AI Doom Scenario
We moved on to talk about what a foom scenario looks like, and why I think it happens locally instead of pulling along the whole world economy.
Robin: How do agents of chaos make foom?
Liron: Imagine that GPT-10 says, Oh, you want to make more money for your business? Great, here's a script. You should run the script. And then they run the script. But of course, it has all these other ideas about what it should do, and now it's unstoppable. And it wasn't what you truly meant. That's the misalignment scenario.
That's not "Can it end the world?", it's "Will it end the world?" when we're talking about alignment.
When we were having a discussion about alignment, which generally presupposes strong AI capabilities, Robin didn't want to run with the premise that you have a single AI which is incredibly powerful and can go rogue and outmaneuver humanity, and that's the thing you have to align. So he kept trying to compare it to humans giving advice to other humans, which I don't even think is comparable.
Robin: Why is it such a problem if we have not entirely trustworthy, but pretty good advisors?
Liron: So the problem is that these advisors output these point-and-shoot plans, that you can just press enter to execute the plan. And the plan is just so consequential. It's this runaway plan: it makes sub-agents, it has huge impacts, and you basically just have to decide, am I going forward with this or not? And you get disempowered in the process.
Robin: Businesses already have many consultants they could hire each of who could give them advice on various things, some of which advice could go wrong. Why will that whole process suddenly go wrong when AIs are the advisors?
Liron: If you're already accepting the premise that they're that powerful, you really just need one person to run an agent that's not great for whatever reason. And then you're screwed. Even if it just pulls forward a few decades of progress into a few months, that's already a very chaotic environment.
Robin: But all these other people with their AIs could be pushing similar buttons, increasing their own capacities, but those are, you know, rivalrous capacities.
So now we get to Robin's argument that it'll probably be fine, as long as everyone is getting an increasingly powerful AI at the same time.
In my view, there's going to be some short period of time, say a year, when suddenly the latest AIs are all vastly smarter than humans. We're going to see that happen in our lifetimes and be stuck in a world where our human brains no longer have a vote in the future, unless the AIs still want to give us a vote.
In Robin's view, it's just going to be teams of humans and AIs working together in increasingly complicated strategic battles, but somehow with no terrifying scenario where a rogue AI permanently disempowers humanity.
Liron: I agree this crazy situation can happen, but I don't see how that works out with humanity still maintaining a reasonable standard of living.
Robin: I'm really failing to understand what you're worried about here. You're just imagining a world where there's lots of advisors and consultants available and that that goes wrong because one of the things these advisors could do is advise you to push a button that then increases the capacity of something you control, which most people would typically do now, and probably do then.
And that sounds good because we just have all these increases in capacity of all these things in the world. Is this your main point? That a world of people listening to AI advisors could go wrong because they could get bad advice?
Liron: I think that's not a good characterization because "get bad advice" seems like there's going to be a step where the human slowly takes the advice, but more realistically, it's more like the human...
Robin: Authorizes bad projects under the advice of an AI, say.
Liron: Sure, but the authorization step is trivial, right? It's just somebody pressing enter, essentially. So the issue is what the AI does when it has all this capacity to decide what to do and then do it.
Robin: So, AI agents need to get approval from humans for budgets and powers of authorization. Sometimes they get powers and authorization that maybe weren't a good idea, because some humans are stupid. But then what? So, some projects go wrong. But there's a world of all the other people with their AIs protecting themselves against that, right?
Liron: The scenario is that the AI gets it into its head, okay, I need to get myself as much energy as possible to support my project, so I'm just going to beg, borrow, or steal. I'm just going to do whatever it takes to get energy. And if an AI is so much smarter than humanity, you can imagine it simultaneously manipulating everybody on the internet, right? So suddenly it's winning over humans to its cause. It's having massive effects.
Robin: Well if there was only one of them. But if there's billions of them at similar levels...
Corporations as Superintelligences
Robin makes a common claim of AI non-doomers:
Today's corporations are already superintelligences, yet humans managed to benefit from coexisting with corporations. Shouldn't that make us optimistic about coexisting with superintelligent AIs?
Robin: It's important to notice that we live in a world of corporations which, in an important sense, are superintelligent. That is, compared to any one of us, corporations can run circles around us in analysis of marketing, product design and all sorts of things.
So each of us, as an ordinary human, is subject to all these potential superintelligences trying to sell us products and services or hire us for jobs. And how is it that we can survive at all in this world of very powerful superintelligences trying to trick us?
And they do, they do try to trick us. They try to trick us into bad jobs, buying bad products, et cetera, all the time. And they way outclass us. That is, whenever a corporation is sitting there thinking about how to make a commercial or how to design a product to trick us into wanting it, they're just so much better than we are when we think and reason about, gee, do I want this product, right?
Of course, the key difference is that corporations are only mildly superintelligent. If they tried to overthrow the government, they'd still be bottlenecked by the number of humans on their team and by the optimization power of the brains of their human employees. Still, Robin argues that competition can keep superintelligent AIs in check.
Robin: Competition is a great discipline, not only of corporations but also of governments. Democratic competition lets us judge among the different people who try to run the government. Commercial competition lets us judge among the different people we might buy from or be employed by.
The thing that makes our world work today is competition among superintelligences. Why can't AIs function similarly?
Liron: In our world today, one reason a corporation doesn't run away and start destroying the world to get another dollar out of you is that it knows the combination of other actors is going to push back.
My scenario is that that balance is going to be disturbed. We're going to have agents that think they can go do something which they know humans would love to push back on, but they also know that it's too late and the humans can't push back.
Robin: Are there such things in our world, and if not, why does that world have such things when our world now does not?
Liron: The key difference is just that you're going to have an agent that has more of this optimization power, right? It's like bringing a band of modern humans into an ancient tribe or an ape tribe.
Robin: But these superintelligent corporations already have vastly more optimization power than we do.
There's a difference between mildly superintelligent and very superintelligent. It's a big difference. When I talk about a superintelligent AI, I'm talking about something that can copy itself a billion times and each copy is better than the best human at everything. Much better.
Einstein was an impressive human physicist because he came up with Special Relativity and General Relativity in the same decade. I'm expecting superintelligent AI to be able to spit out the next century's worth of human theoretical physics, the Grand Unified Theory of Everything, the moment it's turned on. We're not dealing with Walmart here.
Monitoring for Signs of Superintelligence
Next, we come back to the question of how fast a single AI will increase its capabilities, and how humans can monitor for that, the same way tigers would've liked to monitor for a new kind of threatening species.
Robin: We should continue to monitor for ways things could go wrong with the AIs, but I don't see a better solution than to wait until they're here and start looking for such things. I think it'll be very hard to anticipate them far in advance, to be able to guess how they could go wrong and try to prevent that.
For most technologies, the time to deal with their problems is when they are realized in a concrete enough form that you can see concretely what sorts of things go wrong, track statistics about them, and run tests on various scenarios. That's how you keep technology in check: by testing actual concrete versions.
The problem at the moment is AI is just too far away from us to do that. We have this abstract conception of what it might eventually become, but we can't use that abstract conception to do very much now about the problems that might arise. We'll need to wait until they are realized more.
Again, I don't know how this kind of thinking lets tiger species survive the human foom in evolutionary time, because by the time they concretely observe the Industrial Revolution, it's way too late for their genes to adapt.
Robin's position is that if I'm right, and superhuman machine intelligence is a much bigger threat to humanity than he thinks, we still shouldn't hope to stop it in advance of seeing it be smarter than it is today. I think he's making a very optimistic assumption about how much time we'll have at that point. He's banking on the hope that there won't be a rapid intelligence increase, or that a rapid intelligence increase is an incoherent concept.
Will AI Society's Laws Protect Humans?
Robin thinks humans can hope to participate in AI's society, even if the AIs are much smarter than we are.
Robin: Mostly when we travel around the world, the thing we check for in deciding whether we'll be safe somewhere isn't some poll of the people and their values and how aligned those values are with ours. That's not how we do it, right? We have systems of law, and we check statistics about how often the law is violated. And then we think, I might be at risk if the law is violated here a lot, i.e. there's a high crime rate.
And so, to the extent that you can verify that the laws there are the kind that would punish people for hurting you and let you sue for damages, then low rates of crime and damages would be enough to convince you that it looks okay there.
Liron: So what about a scenario like my grandpa's family, where they were Jews in Poland? It seems like the assumption that society is going to respect the rule of law for them sometimes breaks down.
Robin: Sure, there are sudden changes in law that you might have to look out for.
Liron: How do we characterize when a weaker demographic tends to get cut out?
Robin: If you're planning to visit somewhere for a week of travel, that's less of an issue. If you're planning on moving there for retirement, you'll need to make longer-term estimates about the rule of law.
Liron: That's true, a holocaust only unfolds over the longer term, so you can always kind of dip in and out of places that have had the rule of law for a long time.
Robin: Even in 1935, it might've been okay to just visit Germany for a week on a trip.
Liron: Fair enough. But you really think we're going to be able to have money that the AIs accept, and have property that they respect, that's your mainline scenario?
Robin: It's true for ems as well as AI. I think one of the biggest risks is that, because we think of them as a separate category, we end up creating separate systems for them: separate legal systems, separate financial systems, separate political systems. And in separate systems, they are less tied to us. And then when they get more powerful than us, they may care less about disrupting us.
I think the more we share systems of governance, finance, employment, relationships, et cetera, the more plausible it is that they would feel ties to us and not try to kill us all. So I don't want these separate systems. I want shared, mixed-up systems. That's just the general best strategy throughout human history for limiting harm between groups of people.
If you're a Jew in Poland in 1940 and you don't have somewhere else to escape to, you're not going to be saved by the rule of law anywhere. I should have clarified that Poland was supposed to represent the whole world of AI society, not just a place you can decide whether or not to visit.
If you're in a weak faction and there's a stronger faction than you, it's up to them whether they want to cut you out of their legal system, their justice system, their economy, and so on. In human society there are occasionally faction-on-faction civil wars, but it's nothing like an AI vs. humanity scenario, where one faction (AI) is vastly more powerful than all the other factions combined.
Robin is generally great at thinking about human society, but he's just not accepting the premise that there's going to be a vastly higher intelligence than humanity, and that it won't be useful for that intelligence to reason about the optimization-work it's doing by invoking the concept of being in a society with you and me.
I guess it was pointless to even bring up the rule of law as a topic in this debate, when the only crux between Robin and me is whether there'll be a huge-intelligence-gap scenario in the first place.
Feasibility of ASI Alignment
Lastly, we talk about the feasibility of aligning superintelligent AI.
Liron: OpenAI is admitting that RLHF is not going to cut it when the AI becomes superintelligent. Anthropic is admitting that. Does it concern you at all that they're admitting that alignment is an open problem and RLHF doesn't scale?
Robin: Not especially. That's the easy thing to predict. Yes, of course, all of these firms are going to admit any sort of criticism. They are eager to show that they are concerned. I mean, in the modern world, almost no firm with a product that some people think is risky wants to pretend oh, there's nothing to worry about there. That's just not a winning PR stance for any company in our modern world.
So yes, of course, they're going to admit to whatever problems other people raise. As long as it's not a current problem with their current product, but rather some hypothetical future version of the product, they're happy to say, yeah, you know, that'll be something we'll be watching out for and worrying about.
Ok, you can argue why we shouldn't take the AI labs' words on this topic as evidence, but it's pretty obvious why RLHF really won't scale to superintelligence.
The feedback loop of whether a chatbot gave a good answer doesn't scale when the topic at hand is something the AI is much better at than you'll ever be, or when the AI shows you a giant piece of code with a huge manual explaining it and asks for a thumbs up or a thumbs down. That's not going to cut it as a strong enough feedback loop for superintelligent alignment.
If I pressed Robin to give me a more substantive answer about RLHF, I think he would've just said: "It doesn't matter if it's not a perfect technique. We'll just augment it. Or we'll find some sort of iterative process to make each version of AI be adequate for our needs." That's what Robin would probably claim, even though the safety teams at the AI labs are raising the alarm that superalignment is an important unsolved problem.
But I think Robin would acknowledge that many of his views are outside the current mainstream. Like, he doesn't mind predicting that AGI might still be a century away when most other experts are predicting 5-20 years. So, again, it comes down to the crux where Robin just doesn't think there's going to be a huge intelligence gap between AIs and humans at any point in time. So he's just not on the same page that ASI alignment is a huge open problem.
Robin's Warning Shot
Liron: What is the event or threshold or warning shot that would make you concerned about rapid human extinction from AI?
Robin: The main thing I'm going to be tracking is automation of jobs. If I start to see a substantial uptick in the rate at which job tasks get automated, which will of course correlate with a substantial uptick in the revenue going to the companies supplying that automation and the supporting infrastructure they pay for, then that would show a deviation from trend.
And then I want to look more closely to see how fast that's accelerating and in what areas that's accelerating, and that would be the places we should all watch more carefully. Watch carefully wherever things are deviating from trend.
But again, I still think there won't be much we can do at the abstract level to constrain AI to prevent problems. We'll mostly have to wait until concrete systems have concrete behaviors and monitor those and test them. And the more capable systems get, the more we should do that.
Well, I hope we get to see 10x more jobs being done by AI while still having an extra decade or two before AI is truly superintelligent and there's no taking back power from the AIs. But I think it's reckless to just hope things play out that way.
Robin doesn't see it as reckless because he doesn't see intelligence as a single trait that can suddenly get vastly higher within a single mind, so he doesn't imagine any particular AI doom scenario being particularly plausible in the first place.
The Cruxes
In my opinion, this debate successfully identified the main cruxes of disagreement between our two views:
Crux #1: Can a localized mind be a vastly superhuman optimizer?
I think intelligence, i.e. optimization power, is a single dimension that can be rapidly increased far beyond human level, all within a single mind. Robin has a different model where there's no single localizable engine of optimization power, but capabilities come from a global culture of accumulating and diffusing innovations.
Crux #2: Can we rely on future economic data to be a warning shot?
Robin thinks we can look at data from trends, such as job replacement, to predict if and when we should be worried about doom. I think it'll be too late by the time we see such data, unless you count the kinds of early data that we're seeing right now.
How to Present the AI Doom Argument
Lastly, we reflect on how I presented my side of the argument.
Robin: It would have been good ahead of time if you had presented the scenario you're most worried about. Maybe I could have read about that ahead of time, thought about it, and then we could have dived into the particular scenario because I'm still not very clear about it. It isn't the same scenario as I thought you might be focused on.
There's value in general, in all these conversations, in somebody summarizing their point of view as concisely and thoughtfully as they can, and then the other party reading it.
Maybe you should write a little essay with the scenario you're most worried about, lay it out as clearly as you can. And maybe that would be a way to refine your thinking and then be easier for other people to talk to.
Eliezer has pointed out a few times that from the doomers' point of view, doomers are just taking the simple default position, and all we can hope to do is respond with counterarguments tailored to a particular non-doomer's objections, or else write up a giant fractal of counterarguments.
The giant fractal write-up has been done; it's called AISafety.info. Check it out.
The simple default position is what I said to Robin in my opening statement: we're close to building superintelligent AI, but we're not close to understanding how to make it aligned or controllable, and that doesn't bode well for our species.
Robin's particular objection turned out to be that, in his view, intelligence isn't a thing that can run out of control, and that mainstream talk of a rapid path to superintelligence is wrong. I think our debate did a solid job hitting on those particular objections. I'm not sure if explaining my view further would have helped, but I'll keep thinking about it.
And I'm still open to everyone's feedback about how to improve my approach to these doom debates. I love reading your comments, critique of my debate style, recommendations for how to do better, suggestions for who to invite, intros, and any other engagement you have to offer.
About Doom Debates
My podcast, Doom Debates, hosts high-quality debates between people who don't see eye-to-eye on the urgent issue of AI extinction risk.
All kinds of guests are welcome, from luminaries to curious randos. If you're interested in being part of an episode, DM me here or contact me via Twitter or email.
If you're interested in the content, please subscribe and share it to help grow its reach.
7 comments
Comments sorted by top scores.
comment by Chris_Leong · 2024-07-13T12:57:14.472Z · LW(p) · GW(p)
Cool to see you doing this!
comment by Nikita Sokolsky (nikita-sokolsky) · 2024-07-15T00:20:50.003Z · LW(p) · GW(p)
I've watched the debate and read your analysis. The Youtube channel is great, doubly so given that you're just starting out and it will only get better from here.
Do you imagine there could be someone out there who could possibly persuade you to lower your P(doom)? In other words, do you think there could be a collection of arguments that are so convincing and powerful taken together that you'll change your mind significantly about the risks of AGI, at least when it comes to this century?
Replies from: Liron↑ comment by Liron · 2024-07-15T00:37:22.179Z · LW(p) · GW(p)
Thanks. Sure, I’m always happy to update on new arguments and evidence. The most likely way I see possibly updating is to realize the gap between current AIs and human intelligence is actually much larger than it currently seems, e.g. 50+ years as Robin seems to think. Then AI alignment research has a larger chance of working.
I also might lower P(doom) if international govs start treating this like the emergency it is and do their best to coordinate to pause. Though unfortunately even that probably only buys a few years of time.
Finally I can imagine somehow updating that alignment is easier than it seems, or less of a problem to begin with. But the fact that all the arguments I’ve heard on that front seem very weak and misguided to me, makes that unlikely.
Replies from: Writer↑ comment by Writer · 2024-07-17T17:21:42.628Z · LW(p) · GW(p)
I think it would be very interesting to see you and @TurnTrout [LW · GW] debate with the same depth, preparation, and clarity that you brought to the debate with Robin Hanson.
Edit: Also, tentatively, @Rohin Shah [LW · GW] because I find this point [? · GW] he's written about quite cruxy.
Replies from: Liron↑ comment by Liron · 2024-07-19T00:54:13.886Z · LW(p) · GW(p)
I'm happy to have that kind of debate.
My position is "goal-directedness is an attractor state that is incredibly dangerous and uncontrollable if it's somewhat beyond human-level in the near future".
The form of those arguments seems to be like "technically it doesn't have to be". But realistically it will be lol. Not sure how much more there will be to say.
comment by Daniel V · 2024-07-13T12:34:02.673Z · LW(p) · GW(p)
To get Robin worried about AI doom, I'd need to convince him that there's a different metric he needs to be tracking
That, or explain the factors/why Robin should update his timeline for AI/computer automation taking "most" of the jobs.
AI Doom Scenario
Robin's take here strikes me both as uncooperative thought-experiment participation and as a decently considered position. It's like he hasn't actually skimmed the top doom scenarios discussed in this space (and that's coming from me...someone who has probably thought less about this space than Robin) (also see his equating corporations with superintelligence - he's not keyed into the doomer use of the term and not paying attention to the range of values it could take).
On the other hand, I find there is some affinity with my skepticism of AI doom, with my vibe being it's in the notion that authorization lines will be important.
On the other other hand, once the authorization bailey is under siege by the superhuman intelligence aspect of the scenario, Robin retreats to the motte that there will be billions of AIs and (I guess unlike humans?) they can't coordinate. Sure, corporations haven't taken over the government and there isn't one world government, but in many cases, tens of millions of people coordinate to form a polity, so why would we assume all AI agents will counteract each other?
It was definitely a fun section and I appreciate Robin making these points, but I'm finding myself about as unassuaged by Robin's thoughts here as I am by my own.
Robin: We have this abstract conception of what it might eventually become, but we can't use that abstract conception to do very much now about the problems that might arise. We'll need to wait until they are realized more.
When talking about doom, I think a pretty natural comparison is nuclear weapon development. And I believe that analogy highlights how much more right Robin is here than doomers might give him credit for. Obviously a lot of abstract thinking and scenario consideration went into developing the atomic bomb, but also a lot of safeguards were developed as they built prototypes and encountered snags. If Robin is so correct that no prototype or abstraction will allow us to address safety concerns, so we need to be dealing with the real thing to understand it, then I think a biosafety analogy still helps his point. If you're dealing with GPT-10 before public release, train it, give it no authorization lines, and train people (plural) studying it to not follow its directions. In line with Robin's competition views, use GPT-9 agents to help out on assessments if need be. But again, Robin's perspective here falls flat and is of little assurance if it just devolves into "let it into the wild, then deal with it."
A great debate and post, thanks!
Replies from: Liron↑ comment by Liron · 2024-07-13T14:26:11.821Z · LW(p) · GW(p)
Thanks for your comments. I don’t get how nuclear and biosafety represent models of success. Humanity rose to meet those challenges not quite adequately, and half the reason society hasn’t collapsed from e.g. a first thermonuclear explosion going off either intentionally or accidentally is pure luck. All it takes to topple humanity is something like nukes but a little harder to coordinate on (or much harder).