Review of Soft Takeoff Can Still Lead to DSA

post by Daniel Kokotajlo (daniel-kokotajlo) · 2021-01-10T18:10:25.064Z · LW · GW · 15 comments

Contents

  Improved version of my original argument
  My big mistake
  Conclusions

A few months after writing this post [LW · GW] I realized that one of the key arguments was importantly flawed. I therefore recommend against inclusion in the 2019 review. This post presents an improved version of the original argument, explains the flaw, and then updates my all-things-considered view accordingly.

Improved version of my original argument

  1. Definitions:
    1. “Soft takeoff” is roughly “AI will be like the Industrial Revolution but 10x-100x faster”
    2. “Decisive Strategic Advantage” (DSA) is “a level of technological and other advantages sufficient to enable it to achieve complete world domination.” In other words, DSA is roughly when one faction or entity has the capability to “take over the world.” (What taking over the world means is an interesting question [LW · GW] which we won’t explore here. Nowadays I’d reframe things in terms of potential points of no return (PONRs). [AF · GW])
    3. We ask how likely it is that DSA arises, conditional on soft takeoff. Note that DSA does not mean the world is actually taken over, only that one faction at some point has the ability to do so. They might be too cautious or too ethical to try. Or they might try and fail due to bad luck.
  2. In a soft takeoff scenario, a 0.3 - 3 year technological lead over your competitors probably gives you a DSA.
    1. It seems plausible that for much of human history, a 30-year technological lead over your competitors was not enough to give you a DSA.
    2. It also seems plausible that during and after the Industrial Revolution, a 30-year technological lead was enough. (For more arguments on this key point, see my original post.)
    3. This supports a plausible conjecture that when the pace of technological progress speeds up, the length (in clock time) of technological lead needed for DSA shrinks proportionally.
  3. So a soft takeoff could lead to a DSA insofar as there is a 0.3 - 3 year lead at the beginning which is maintained for a few years.
  4. 0.3 - 3 year technological leads are reasonably common today [LW · GW], and in particular it’s plausible that there could be one in the field of AI research.
  5. There’s a reasonable chance of such a lead being maintained for a few years.
    1. This is a messy question, but judging by the table below, it seems that if anything the lead of the front-runner in this scenario is more likely to lengthen than shorten!
    2. If this is so, why did no one achieve DSA during the Industrial Revolution? My answer is that spies/hacking/leaks/etc. were much more powerful during the Industrial Revolution than they would be during a soft takeoff, because back then there was an entire economy to steal from and decades in which to steal it, whereas in a soft takeoff ideas can be hoarded within a specific corporation and there are only a few years (or months!) in which to steal them.
  6. Therefore, there’s a reasonable chance of DSA conditional on soft takeoff.

| Factors that might shorten the lead | Factors that might lengthen the lead |
| --- | --- |
| If you don’t sell your innovations to the rest of the world, you’ll lose out on opportunities to make money, and then possibly be outcompeted by projects that didn’t hoard their innovations. | Hoarding innovations gives you an advantage over the rest of the world, because only you can make use of them. |
| Spies, hacking, leaks, defections, etc. | Big corporations with tech leads often find ways to slow down their competition, e.g. by lobbying to raise regulatory barriers to entry. |
| | Being known to be the leading project makes it easier to attract talent and investment. |
| | There might be additional snowball effects (e.g. network effect as more people use your product providing you with more data) |

I take it that 2, 4, and 5 are the controversial bits. I still stand by 2, and the arguments made for it in my original post. I also stand by 4. (To be clear, it’s not like I’ve investigated these things in detail. I’ve just thought about them for a bit and convinced myself that they are probably right, and I haven’t encountered any convincing counterarguments so far.)

5 is where I made a big mistake. 

(Comments on my original post also attacked 5 a lot, but none of them caught the mistake as far as I can tell.)

My big mistake

Basically, my mistake was to conflate leads measured in number-of-hoarded-ideas with leads measured in clock time. Clock-time leads shrink automatically as the pace of innovation speeds up, because if everyone is innovating 10x faster, then you need 10x as many hoarded ideas to have an N-year lead. 
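
To spell out the arithmetic (H and r are my own shorthand, not notation from the post): if the leading project holds a hoard of H ideas and the rest of the world discovers roughly r ideas per year, then

$$\text{clock-time lead} \approx \frac{H}{r} \text{ years},$$

so a 10x increase in r with a fixed hoard turns a 3-year lead into a 0.3-year lead; keeping the 3-year lead requires the hoard to grow 10x as well.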

Here’s a toy model, based on the one I gave in the original post:

There are some projects/factions. There are many ideas. Projects can have access to ideas. Projects make progress, in the form of discovering (gaining access to) ideas. For each idea they access, they can decide to hoard or not-hoard it. If they don’t hoard it, it becomes accessible to all. Hoarded ideas are only accessible by the project that discovered them (though other projects can independently rediscover them). The rate of progress of a project is proportional to how many ideas they can access.

Let’s distinguish two ways to operationalize the technological lead of a project. One is to measure it in ideas, e.g. “Project X has 100 hoarded ideas and project Y has only 10, so Project X is 90 ideas ahead.” But another way is to measure it in clock time, e.g. “It’ll take 3 years for project Y to have access to as many ideas as project X has now.” 

Suppose that all projects hoard all their ideas. Then the ideas-lead of the leading project will tend to lengthen: the project begins with more ideas, so it makes faster progress, so it adds new ideas to its hoard faster than others can add new ideas to theirs. However, the clock-time lead of the leading project will remain fixed. It’s like two identical cars accelerating one after the other on an on-ramp to a highway: the distance between them increases, but if one entered the ramp three seconds ahead, it will still be three seconds ahead when they are on the highway.

But realistically not all projects will hoard all their ideas. Suppose instead that for the leading project, 10% of their new ideas are discovered in-house, and 90% come from publicly available discoveries accessible to all. Then, to continue the car analogy, it’s as if 90% of the lead car’s acceleration comes from a strong wind that blows on both cars equally. The lead of the first car/project will lengthen slightly when measured by distance/ideas, but shrink dramatically when measured by clock time.
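
Here is a minimal simulation of the toy model, just to make the two regimes concrete. All specific numbers (growth rates, step counts, initial idea counts, and the 10x speed-up in the pace of public progress in the second scenario) are illustrative assumptions of mine rather than figures from the post, and "clock lead" is approximated as the idea gap divided by the follower's current rate of progress:

```python
# A rough sketch of the toy model above; parameters are arbitrary choices for illustration.

def clock_lead(idea_gap, follower_rate):
    """Approximate clock-time lead: how many steps the follower would need,
    at its *current* rate of progress, to close the idea gap."""
    return idea_gap / follower_rate


def scenario_all_hoarded(steps=200, k=0.05, y0=1000.0, initial_lead_steps=3.0):
    """Both projects hoard everything, so each one's rate of progress is
    proportional only to its own idea count (rate = k * own_ideas)."""
    y = y0
    x = y0 + initial_lead_steps * k * y0          # leader starts a few steps' worth ahead
    for _ in range(steps):
        x += k * x
        y += k * y
    return x - y, clock_lead(x - y, k * y)


def scenario_mostly_public(steps=100, g0=0.02, speedup=10.0,
                           public0=1000.0, initial_lead_steps=3.0):
    """Soft takeoff: the pace of public progress ramps up 10x over the run.
    The leader hoards its in-house ideas, but they are only 10% of its new
    ideas each step; the other 90% are public discoveries everyone gets."""
    public = public0
    hoard = initial_lead_steps * g0 * public0     # a 3-step lead at the initial (slow) pace
    for t in range(steps):
        g = g0 * speedup ** (t / steps)           # pace of innovation speeds up over time
        new_public = g * public
        new_in_house = new_public / 9.0           # 10% of the leader's new ideas are in-house
        public += new_public
        hoard += new_in_house
    g_end = g0 * speedup                          # pace after the full speed-up
    return hoard, clock_lead(hoard, g_end * public)


if __name__ == "__main__":
    for name, (ideas_lead, time_lead) in [
        ("all ideas hoarded", scenario_all_hoarded()),
        ("90% of ideas public", scenario_mostly_public()),
    ]:
        print(f"{name:20s} ideas lead grew to {ideas_lead:12.0f}; "
              f"clock lead is now ~{time_lead:.2f} steps (it started at 3)")
```

With these (arbitrary) parameters the all-hoarding run keeps its 3-step clock lead even as its ideas lead balloons, while the mostly-public run's ideas lead also grows but its clock lead shrinks to a fraction of a step, which is the asymmetry the car analogy is pointing at.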

The upshot is that we should return to that table of factors and add a big one to the left-hand column: Leads shorten automatically as general progress speeds up, so if the lead project produces only a small fraction of the general progress, maintaining a 3-year lead throughout a soft takeoff is (all else equal) almost as hard as growing a 3-year lead into a 30-year lead during the 20th century. In order to overcome this, the factors on the right would need to be very strong indeed.

Conclusions

My original argument was wrong. I stand by points 2 and 4 though, and by the subsequent posts I made in this sequence [? · GW]. I notice I am confused, perhaps by a seeming contradiction between my explicit model here and my take on history, which is that rapid takeovers and upsets in the balance of power have happened many times, that power has become more and more concentrated over time, and that there are not-so-distant possible worlds in which a single man rules the whole world sometime in the 20th century. Some threads to pull on:

  1. To the surprise of my past self, Paul agreed DSA is plausible for major nations, just not for smaller entities like corporations [AF(p) · GW(p)]: “I totally agree that it wouldn't be crazy for a major world power to pull ahead of others technologically and eventually be able to win a war handily, and that will tend happen over shorter and shorter timescales if economic and technological progress accelerate.” Perhaps we’ve been talking past each other, because I think a very important point is that it’s common for small entities to gain control of large entities. I’m not imagining a corporation fighting a war against the US government; I’m imagining it taking over the US government via tech-enhanced lobbying, activism, and maybe some skullduggery. (And to be clear, I’m usually imagining that the corporation was previously taken over by AIs it built or bought.)
  2. Even if takeoff takes several years, it could be unevenly distributed such that (for example) 30% of the strategically relevant research progress happens in a single corporation. I think 30% of the strategically relevant research happening in a single corporation at the beginning of a multi-year takeoff would probably be enough for DSA.
  3. Since writing this post, my thinking has shifted to focus less on DSA and more on potential AI-induced PONRs [LW · GW]. I also now prefer a different definition of slow/fast takeoff [LW · GW]. Thus, perhaps this old discussion simply isn’t very relevant anymore.
  4. Currently the most plausible doom scenario in my mind is maybe a version of Paul’s Type II failure [LW · GW]. (If this is surprising to you, reread it while asking yourself what terms like “correlated automation failure” are euphemisms for.) I’m not sure how to classify it, but this suggests that we may disagree less than I thought.

Thanks to Jacob Lagerros for nudging me to review my post and finally get all this off my chest. And double thanks to all the people who commented on the original post!

15 comments


comment by ryan_b · 2021-01-11T20:23:36.308Z · LW(p) · GW(p)

Meta: Strong upvote for pulling a specific mistake out and correcting it; this is a good method because in such a high-activity post it would be easy for the discussion to get lost in the comments (especially in the presence of other wrong criticisms).

That being said, I disagree with your recommendation against inclusion in the 2019 review for two reasons:

  1. The flaw doesn't invalidate the core claim of the essay. More detailed mechanisms for understanding how technical leads are established and sustained at most adjust the time margins of the model; the updated argument does not call into question whether slow takeoff is a thing or weigh against DSA being achievable.
  2. This kind of change clearly falls within the boundary of reasonable edits to essays which are being included.
Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-01-11T20:47:20.011Z · LW(p) · GW(p)

I am heartened to hear this. I do agree that the core claim of the essay is not invalidated -- "Soft takeoff can still lead to DSA." However, I do think the core argument of the essay has been overturned, such that it leads to something close to the opposite conclusion: There is a strong automatic force that works to make DSA unlikely to the extent that takeoff is distributed (and distributed = a big part of what it means to be soft, I think).

Basically, I think that if I were to rewrite this post to fix what I now think are errors and give what I now think is the correct view, including uncertainties, it would be a completely different post. In fact, it would be basically this review post that I just wrote! (Well, that plus the arguments for steps 2 and 4 from the original post, which I still stand by.) I guess I'd be happy to do that if that's what people want.

Replies from: ryan_b
comment by ryan_b · 2021-01-11T22:16:10.347Z · LW(p) · GW(p)

the opposite conclusion: There is a strong automatic force that works to make DSA unlikely to the extent that takeoff is distributed

By the standards of inclusion, I feel like this is an even better contribution! My mastery of our corpus is hardly complete, but it appears to me that, until you picked up this line of inquiry, deeper interrogation of the circumstances surrounding DSA was sorely lacking on LessWrong. Being able to make more specific claims about causal mechanisms is huge.

I propose a different framing than 'opposite conclusion': rather, you are suggesting a causal mechanism for why a slow-takeoff DSA is different in character from FOOM with fewer gigahertz.

I am going to add a review on the original post so this conversation doesn't get missed in the voting phase.

comment by Sammy Martin (SDM) · 2021-01-10T20:29:34.382Z · LW(p) · GW(p)

Currently the most plausible doom scenario in my mind is maybe a version of Paul’s Type II failure [LW · GW]. (If this is surprising to you, reread it while asking yourself what terms like “correlated automation failure” are euphemisms for.) 

This is interesting, and I'd like to see you expand on this. Incidentally I agree with the statement, but I can imagine both more and less explosive, catastrophic versions of 'correlated automation failure'. On the one hand it makes me think of things like transportation and electricity going haywire, on the other it could fit a scenario where a collection of powerful AI systems simultaneously intentionally wipe out humanity.

Clock-time leads shrink automatically as the pace of innovation speeds up, because if everyone is innovating 10x faster, then you need 10x as many hoarded ideas to have an N-year lead. 

What if, as a general fact, some kinds of progress (the technological kinds more closely correlated with AI) are just much more susceptible to speed-up? I.e., what if 'the economic doubling time' stops being so meaningful - technological progress speeds up abruptly but other kinds of progress that adapt to tech progress have more of a lag before the increased technological progress also affects them? In that case, if the parts of overall progress that affect the likelihood of leaks, theft and spying aren't sped up by as much as the rate of actual technological progress, the likelihood of DSA could rise to be quite high compared to previous accelerations, where the speed-up unfolded slowly enough for society to 'speed up' in the same way.

In other words, it becomes easier to accumulate a larger and larger hoard of ideas if the ability to keep ideas hoarded stays roughly constant while the pace of progress increases. Since a lot of the 'technologies' that facilitate leaks and spying are more in the social realm, this seems plausible.
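
One toy way to formalize this (my own sketch, not something from the post or the comment): suppose the leader adds hoarded ideas at a rate proportional to the overall pace of progress, $\alpha \, p(t)$, while leaks/spying move hoarded ideas into the public pool at a roughly fixed rate $\ell$ set by slower-moving social factors. Then the hoard $H$ evolves as

$$\frac{dH}{dt} \approx \alpha\, p(t) - \ell,$$

so in eras where $\alpha\,p(t)$ is comparable to $\ell$ no large hoard accumulates, but once $p(t)$ jumps 10x while $\ell$ stays put, the hoard (and with it the chance of DSA) grows quickly.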

But if you need to generate more ideas, this might just mean that if you have a very large initial lead, you can turn it into a DSA, which you still seem to agree with:

  • Even if takeoff takes several years, it could be unevenly distributed such that (for example) 30% of the strategically relevant research progress happens in a single corporation. I think 30% of the strategically relevant research happening in a single corporation at the beginning of a multi-year takeoff would probably be enough for DSA.
Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-02-03T09:22:33.079Z · LW(p) · GW(p)

Sorry it took me so long to reply; this comment slipped off my radar.

The latter scenario is more what I have in mind--powerful AI systems deciding that now's the time to defect, to join together into a new coalition in which AIs call the shots instead of humans. It sounds silly, but it's most accurately described in classic political terms: Powerful AI systems launch a coup/revolution to overturn the old order and create a new one that is better by their lights.

I agree with your argument about likelihood of DSA being higher compared to previous accelerations, due to society not being able to speed up as fast as the technology. This is sorta what I had in mind with my original argument for DSA; I was thinking that leaks/spying/etc. would not speed up nearly as fast as the relevant AI tech speeds up.

Now I think this will definitely be a factor but it's unclear whether it's enough to overcome the automatic slowdown. I do at least feel comfortable predicting that DSA is more likely this time around than it was in the past... probably.

Replies from: SDM
comment by Sammy Martin (SDM) · 2021-02-06T16:08:26.501Z · LW(p) · GW(p)

I agree with your argument about likelihood of DSA being higher compared to previous accelerations, due to society not being able to speed up as fast as the technology. This is sorta what I had in mind with my original argument for DSA; I was thinking that leaks/spying/etc. would not speed up nearly as fast as the relevant AI tech speeds up.

Your post on 'against GDP as a metric' [LW · GW] argues more forcefully for the same thing that I was arguing for, that 

'the economic doubling time' stops being so meaningful - technological progress speeds up abruptly but other kinds of progress that adapt to tech progress have more of a lag before the increased technological progress also affects them? 

So we're on the same page there that it's not likely that 'the economic doubling time' captures everything that's going on all that well, which leads to another problem - how do we predict what level of capability is necessary for a transformative AI to obtain a DSA (or reach the PONR for a DSA)?

I notice that in your post you don't propose an alternative metric to GDP, which is fair enough since most of your arguments seem to lead to the conclusion that it's almost impossibly difficult to predict in advance what level of advantage over the rest of the world, and in which areas, is actually needed to conquer the world, since we seem to be able to analogize persuasion tools or conquistador-analogues, who had relatively small tech advantages, to the AGI situation.

I think that there is still a useful role for raw economic power measurements, in that they provide a sort of upper bound on how much capability difference is needed to conquer the world. If an AGI acquires resources equivalent to controlling >50% of the world's entire GDP, it can probably take over the world if it goes for the maximally brute-force approach of just using direct military force. Presumably the PONR for that situation would be a while before then, but at least we know that an advantage of a certain size would be big enough, given no assumptions about the effectiveness of unproven technologies of persuasion or manipulation or specific vulnerabilities in human civilization.

So we can use our estimate of how the doubling time may increase, anchor on that gap, and estimate down based on how soon we think the PONR is, or on how many 'cheat' pathways there are that don't involve economic growth.

The whole idea of using brute economic advantage as an upper-limit 'anchor' I got from Ajeya's post [AF · GW] about using biological anchors to forecast what's required for TAI - if we could find a reasonable lower bound for the amount of advantage needed to attain DSA, we could do the same kind of estimated distribution between them. We would just need a lower limit - maybe there's a way of estimating it based on the upper limit of human ability, since we know no actually existing human has used persuasion to take over the world, though as you point out some have come relatively close.

I realize that's not a great method, but given that this is a situation we've never encountered before, is there any better alternative for trying to predict what level of capability is necessary for DSA? Or perhaps you just think that anchoring your prior estimate on economic power advantage as an upper bound is so misleading that it's worse than having a completely ignorant prior. In that case, we might have to say that there are just so many unprecedented ways a transformative AI could obtain a DSA that we can have no idea in advance what capability is needed, which doesn't feel quite right to me.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-02-07T10:37:32.363Z · LW(p) · GW(p)

I notice that in your post you don't propose an alternative metric to GDP, which is fair enough since most of your arguments seem to lead to the conclusion that it's almost impossibly difficult to predict in advance what level of advantage over the rest of the world, and in which areas, is actually needed to conquer the world, since we seem to be able to analogize persuasion tools or conquistador-analogues, who had relatively small tech advantages, to the AGI situation.

I wouldn't go that far. The reason I didn't propose an alternative metric to GDP was that I didn't have a great one in mind and the post was plenty long enough already. I agree that it's not obvious a good metric exists, but I'm optimistic that we can at least make progress by thinking more. For example, we could start by enumerating different kinds of skills (and combos of skills) that could potentially lead to a PONR if some faction or AIs generally had enough of them relative to everyone else. (I sorta start such a list in the post). Next, we separately consider each skill and come up with a metric for it.

I'm not sure I understand your proposed methodology fully. Are you proposing we do something like Roodman's model [LW · GW] to forecast TAI and then adjust downwards based on how much sooner we think the PONR could come? Unfortunately, I think GWP growth can't be forecast that accurately, since it depends on increases in AI capabilities.

comment by ryan_b · 2021-01-11T21:27:11.143Z · LW(p) · GW(p)

Clock-time leads shrink automatically as the pace of innovation speeds up, because if everyone is innovating 10x faster, then you need 10x as many hoarded ideas to have an N-year lead. 

Have you considered how this model changes under the low hanging fruit hypothesis, by which I mean that more advanced ideas in a domain are more difficult and time-consuming to discover than less advanced ones? My reasoning for why it matters:

  • DSA relies on one or more capability advantages.
  • Each capability depends on one or more domains of expertise to develop.
  • A certain amount of domain expertise is required to develop the capability.
  • Ideas become more difficult in terms of resources and time to discover as they approach the capability threshold.

Now this doesn't actually change the underlying intuition of a time advantage very much; mostly I just expect that the '10x faster innovation' component of the example will be deeply discontinuous. This leads naturally to thinking about things like a broad DSA, which might consist of a systematic advantage across capabilities, versus a tall DSA, which would be more like an overwhelming advantage in a single, high-importance capability.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-01-12T01:12:20.008Z · LW(p) · GW(p)

I haven't specifically tried to model the low hanging fruit hypothesis, but I do believe the hypothesis and so it probably doesn't contradict the model strongly. I don't quite follow your reasoning though--how does the hypothesis make discontinuities more likely? Can you elaborate?

Replies from: ryan_b
comment by ryan_b · 2021-01-12T12:33:52.113Z · LW(p) · GW(p)

Sure!

I have a few implicit assumptions that affect my thinking:

  • A soft takeoff starts from something resembling our world, distributed
  • There is at least one layer above ideas (capability)
  • Low hanging fruit hypothesis

The real work is being done by an additional two assumptions:

  • The capability layer grows in a way similar to the idea layer, and competes for the same resources
  • Innovation consists of at least one capability

So under my model, the core mechanism of differentiation is that developing an insurmountable single capability advantage competes with rapid gains in a different capability (or line of ideas), which includes innovation capacity. Further, different lines of ideas and capabilities will have different development speeds.

Now a lot of this differentiation collapses when we get more specific about what we are comparing, like if we choose Google, Facebook and Microsoft on the single capability of Deep Learning. It is worth considering that software has an unusually cheap transfer of ideas to capability, which is the crux of why AI weighs so heavily as a concern. But this is unique to software for now, and in order to be a strategic threat it has to cash out in non-software capability eventually, so keeping the others in mind feels important.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-01-12T18:10:48.367Z · LW(p) · GW(p)

OK, so if I'm getting this correctly, the idea is that there are different capabilities, and the low hanging fruit hypothesis applies separately to each one, and not all capabilities are being pursued successfully at all times, so when a new capability starts being pursued successfully there is a burst of rapid progress as low-hanging fruit is picked. Thus, progress should proceed jumpily, with some capabilities stagnant or nonexistent for a while and then quickly becoming great and then levelling off. Is this what you have in mind?

Replies from: ryan_b
comment by ryan_b · 2021-01-12T18:31:51.303Z · LW(p) · GW(p)

That is correct. And since different players start with different capabilities and are in different local environments under the soft takeoff assumption, I really can't imagine a scenario where everyone winds up in the same place (or even tries to get there - I strongly expect optimizing for different capabilities depending on the environment, too).

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-01-12T18:58:47.022Z · LW(p) · GW(p)

OK, I think I agree with this picture to some extent. It's just that if things like taking over the world require lots of different capabilities, maybe jumpy progress in specific capabilities distributed unevenly across factions all sorta averages out thanks to the law of large numbers into smooth progress in world-takeover-ability distributed mostly evenly across factions.

Or not. Idk. I think this is an important variable to model and forecast, thanks for bringing it up!

comment by Tom Davidson (tom-davidson-1) · 2022-05-25T19:59:08.619Z · LW(p) · GW(p)

But realistically not all projects will hoard all their ideas. Suppose instead that for the leading project, 10% of their new ideas are discovered in-house, and 90% come from publicly available discoveries accessible to all. Then, to continue the car analogy, it’s as if 90% of the lead car’s acceleration comes from a strong wind that blows on both cars equally. The lead of the first car/project will lengthen slightly when measured by distance/ideas, but shrink dramatically when measured by clock time.

The upshot is that we should return to that table of factors and add a big one to the left-hand column: Leads shorten automatically as general progress speeds up, so if the lead project produces only a small fraction of the general progress, maintaining a 3-year lead throughout a soft takeoff is (all else equal) almost as hard as growing a 3-year lead into a 30-year lead during the 20th century. In order to overcome this, the factors on the right would need to be very strong indeed.

But won't "ability to get a DSA" be linked to the lead as measured in ideas rather than clock time?

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2022-05-25T23:03:52.304Z · LW(p) · GW(p)

Maybe. My model was a bit janky; I basically assumed that DSA-ability comes from a clock-time lead, but then also assumed that as technology and progress speed up, the necessary clock-time lead shrinks. And I guesstimated that it would shrink to 0.3 - 3 years. I bet there's a better way, one that pegs DSA-ability to ideas lead... it would be a super cool confirmation of this better model if we could somehow find data confirming that years-needed-for-DSA has fallen in lockstep as ideas-produced-per-year has risen.