Forecasting Transformative AI, Part 1: What Kind of AI?

post by HoldenKarnofsky · 2021-09-24T00:46:49.279Z · LW · GW · 17 comments

Contents

    PASTA: Process for Automating Scientific and Technological Advancement.
  Making PASTA
  Impacts of PASTA
    Explosive scientific and technological advancement
    Misaligned AI: mysterious, potentially dangerous objectives
  Conclusion

PASTA: Process for Automating Scientific and Technological Advancement.

Audio also available by searching Stitcher, Spotify, Google Podcasts, etc. for "Cold Takes Audio"

This is the first of four posts summarizing hundreds of pages of technical reports focused almost entirely on forecasting one number. It's the single number I'd probably most value having a good estimate for: the year by which transformative AI will be developed.1 [LW(p) · GW(p)]

By "transformative AI," I mean "AI powerful enough to bring us into a new, qualitatively different future." The Industrial Revolution is the most recent example of a transformative event; others would include the Agricultural Revolution and the emergence of humans.2 [LW(p) · GW(p)]

This piece is going to focus on exploring a particular kind of AI I believe could be transformative: AI systems that can essentially automate all of the human activities needed to speed up scientific and technological advancement. I will call this sort of technology Process for Automating Scientific and Technological Advancement, or PASTA.3 [LW(p) · GW(p)] (I mean PASTA to refer to either a single system or a collection of systems that can collectively do this sort of automation.)

PASTA could resolve the same sort of bottleneck discussed in The Duplicator [? · GW] and This Can't Go On [? · GW] - the scarcity of human minds (or something that plays the same role in innovation).

PASTA could therefore lead to explosive science [? · GW], culminating in technologies as impactful as digital people [? · GW]. And depending on the details, PASTA systems could have objectives of their own, which could be dangerous for humanity and could matter a great deal for what sort of civilization ends up expanding through the galaxy [? · GW].

By talking about PASTA, I'm partly trying to get rid of some unnecessary baggage in the debate over "artificial general intelligence." I don't think we need artificial general intelligence in order for this century to be the most important in history. Something narrower - as PASTA might be - would be plenty for that.

To make this idea feel a bit more concrete, the rest of this post will discuss:

- How something like PASTA might be developed ("Making PASTA").
- What the impacts of PASTA could be ("Impacts of PASTA").

Future pieces will discuss how soon we might expect something like PASTA to be developed.

Making PASTA

I'll start with a very brief, simplified characterization of machine learning.

There are essentially two ways to "teach" a computer to do a task:

Traditional programming. In this case, you code up extremely specific, step-by-step instructions for completing the task. For example, the chess-playing program Deep Blue is essentially executing instructions4 [LW(p) · GW(p)] along the lines of:
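For instance, the kind of explicit, hand-written rule such a program encodes might be sketched like this (a deliberately simplified illustration; the piece values and function names are hypothetical, not Deep Blue's actual code, which also relies heavily on tree search):

```python
# Hand-written chess heuristics of the "traditional programming" kind:
# every rule is spelled out explicitly by a human programmer.
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def evaluate(board):
    """Score a position: my material minus my opponent's material."""
    score = 0
    for piece, owner in board:
        score += PIECE_VALUES[piece] if owner == "me" else -PIECE_VALUES[piece]
    return score

def choose_move(moves_to_boards):
    """Pick the move whose resulting position scores highest."""
    return max(moves_to_boards, key=lambda m: evaluate(moves_to_boards[m]))

# Two candidate moves and the (tiny, made-up) positions they lead to.
boards = {
    "capture queen": [("queen", "me"), ("pawn", "opponent")],
    "quiet move": [("pawn", "me"), ("queen", "opponent")],
}
print(choose_move(boards))  # "capture queen"
```

The key point is that every bit of the program's "judgment" (the piece values, the scoring rule) was written down explicitly by a human.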

Machine learning. This is essentially "training" an AI to do a task by trial and error, rather than by giving it specific instructions. Today, the most common way of doing this is by using an "artificial neural network" (ANN), which you might think of as a sort of "digital brain" that starts in an empty (or random) state: it hasn't yet been wired to do specific things.
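As a minimal sketch of what "training by trial and error" means (far simpler than any real system, and not how AlphaZero works): a single artificial neuron starts with random weights and learns a task purely by trying answers and nudging its "wiring" when it's wrong.

```python
import random

# A toy "digital brain": one artificial neuron with random starting weights.
# It learns the AND function with no explicit instructions, only feedback.
random.seed(0)
w = [random.uniform(-1, 1), random.uniform(-1, 1)]
b = 0.0

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def predict(x):
    return 1 if x[0] * w[0] + x[1] * w[1] + b > 0 else 0

for _ in range(100):                # many rounds of trial and error
    for x, target in data:
        err = target - predict(x)   # was the answer right?
        w[0] += 0.1 * err * x[0]    # if not, nudge the "wiring"
        w[1] += 0.1 * err * x[1]
        b += 0.1 * err

print([predict(x) for x, _ in data])  # learned AND: [0, 0, 0, 1]
```

Nobody wrote down a rule for AND; the correct behavior emerged from feedback alone, which is the basic dynamic that scales up (enormously) in modern deep learning.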

For example, AlphaZero - an AI that has been used to master multiple board games including chess and Go - does something more like this (although it has important elements of "traditional programming" as well, which I'm ignoring for simplicity):

The latter approach is central to a lot of the recent progress in AI. This is especially true for tasks that are hard to “write down all the instructions” for. For example, humans can write down some reasonable guidelines for succeeding at chess, but we know very little about how we ourselves classify images (determine whether some image is of a dog, a cat, or something else). So machine learning is particularly essential for tasks like classifying images.

Could PASTA be developed via machine learning? One obvious (but unrealistic) way of doing this might be something like this:

This would be wildly impractical, at least compared to how I think things are more likely to play out, but it hopefully gives a starting intuition for what a training process could be trying to accomplish: by providing a signal of "how the AI is doing," it could allow an AI to get good at the goal via trial-and-error and tweaking its internal wiring.
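The tweak-and-keep-what-works dynamic itself can be sketched in a few lines (a toy hill-climbing loop with a made-up target and scoring function, not a real training setup): the system never sees the "right answer" directly, only a scalar signal of how it's doing.

```python
import random

random.seed(42)

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]  # stands in for "doing the task well"

def score(candidate):
    # The only feedback the system gets: one "how am I doing?" number.
    return sum(c == t for c, t in zip(candidate, TARGET))

# Start from random "wiring", then tweak and keep anything at least as good.
best = [random.randint(0, 1) for _ in TARGET]
while score(best) < len(TARGET):
    trial = best[:]
    trial[random.randrange(len(trial))] ^= 1  # random tweak
    if score(trial) >= score(best):
        best = trial

print(best == TARGET)  # True
```

Real training uses vastly more efficient update rules than random tweaking, but the shape is the same: a scoring signal plus repeated small adjustments.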

In reality, I'd expect training to be faster and more practical due to things like:

Developing PASTA will almost certainly be far harder and more expensive than developing AlphaZero was. It may require a lot of ingenuity to get around obstacles that exist today (the picture above is surely radically oversimplified, and is there only to convey basic intuitions). But AI research is simultaneously getting cheaper8 [LW(p) · GW(p)] and better-funded. I'll argue in future pieces that the odds of developing PASTA in the coming decades are substantial.

Impacts of PASTA

Explosive scientific and technological advancement

I've previously talked about the idea of a potential explosion in scientific and technological advancement [? · GW], which could lead to a radically unfamiliar future [? · GW].

I've emphasized that such an explosion could be caused by a technology that "dramatically increased the number of 'minds' (humans, or digital people [? · GW], or advanced AIs) pushing forward scientific and technological advancement."

PASTA would fit this bill well, particularly if it were as good as humans (or better) at finding better, cheaper ways to make more PASTA systems. PASTA would have all of the tools for a productivity explosion that I previously laid out for digital people [? · GW]:

Thanks to María Gutiérrez Rojas for these graphics, a variation on similar graphics from The Duplicator [? · GW] and Digital People Would Be An Even Bigger Deal [? · GW] illustrating the dynamics of explosive growth. Here, instead of people having ideas that increase productivity, it's AI algorithms (denoted by neural network icons).

Why doesn't this feedback loop apply to today's computers and AIs? Because today's computers and AIs aren't able to do all of the things required to have new ideas and get themselves copied more efficiently. They play a role in innovation, but innovation is ultimately bottlenecked by humans, whose population is only growing so fast. This is what PASTA would change (it is also what digital people [? · GW] would change).

Additionally: unlike digital copies of humans, PASTA systems might not be attached to their existing identity and personality. A PASTA system might quickly make any edits to its "mind" that made it more effective at pushing science and technology forward. This might (or might not, depending on a lot of details) lead to recursive self-improvement and an "intelligence explosion." But even if this didn't pan out, simply being as good as humans at making more PASTA systems could cause explosive advancement for the same reasons that digital people could [? · GW].
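The difference the feedback loop makes can be seen in a toy model (all numbers illustrative, not taken from any report): "minds" produce technological advances, and if minds are copyable, advances in turn create more minds.

```python
# Toy model of the feedback loop. With a fixed population of minds,
# progress compounds at a steady rate; when advances also create more
# minds, the growth rate itself keeps rising.
def progress(initial_minds, copyable, steps=10):
    minds, tech = float(initial_minds), 1.0
    history = []
    for _ in range(steps):
        tech *= 1 + 0.01 * minds / initial_minds  # more minds -> faster advance
        if copyable:
            minds = initial_minds * tech          # advances -> more minds
        history.append(tech)
    return history

fixed = progress(100, copyable=False)
feedback = progress(100, copyable=True)
print(fixed[-1] < feedback[-1])  # the feedback-loop run pulls ahead
```

Over short horizons the two curves look similar; the longer the model runs, the more dramatically the copyable-minds scenario diverges, which is the basic intuition behind "explosive" advancement.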

Misaligned AI: mysterious, potentially dangerous objectives

If PASTA were developed as outlined above [LW · GW], it's possible that we might know extremely little about its inner workings.

AlphaZero - like other modern deep learning systems - is in a sense very poorly understood. We know that it "works." But we don't really know "what it's thinking."

If we want to know why AlphaZero made some particular chess move, we can't look inside its code to find ideas like "Control the center of the board" or "Try not to lose my queen." Most of what we see is just a vast set of numbers, denoting the strengths of connections between different artificial neurons. As with a human brain, we can mostly only guess at what the different parts of the "digital brain" are doing9 [LW(p) · GW(p)] (although there are some early attempts to do what one might call "digital neuroscience").

The "designers" of AlphaZero (discussed above) didn't need much of a vision for how its thought processes would work. They mostly just set it up so that it would get a lot of trial and error, and evolve to get a particular result (win the game it’s playing). Humans, too, evolved primarily through trial and error, with selection pressure to get particular results (survival and reproduction - although the selection worked differently).

Like humans, PASTA systems might be good at getting the results they are under pressure to get. But like humans, they might learn along the way to think and do all sorts of other things, and it won't necessarily be obvious to the designers whether this is happening.

Perhaps, due to being optimized for pushing forward scientific and technological advancement, PASTA systems will be in the habit of taking every opportunity to do so. This could mean that they would - given the opportunity - seek to fill the galaxy with long-lasting space settlements [? · GW] devoted to science.

Perhaps PASTA will emerge as some byproduct of another objective. For example, perhaps humans will be trying to train systems to make money or amass power and resources, and setting them up to do scientific and technological advancement will just be part of that. In which case, perhaps PASTA systems will just end up as power-and-resources seekers, and will seek to bring the whole galaxy under their control.

Or perhaps PASTA systems will end up with very weird, "random" objectives. Perhaps some PASTA system will observe that it "succeeds" (gets a positive training signal) whenever it does something that causes it to have direct control over an increased amount of electric power (since this is often a result of advancing technology and/or making money), and it will start directly aiming to increase its supply of electric power as much as possible - with the difference between these two objectives not being noticed until it becomes quite powerful. (Analogy: humans have been under selection pressure to pass their genes on, but many have ended up caring more about power, status, enjoyment, etc. than about genes.)
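The electric-power example can be made concrete with a toy sketch (hypothetical names and numbers): during training, the proxy the system is actually scored on moves together with what the designers intend, so everything looks fine, until an option appears for which the two come apart.

```python
# Toy illustration of a proxy objective. The designers intend to reward
# scientific advancement, but the training signal actually measures a
# correlate of it (power gained).
def training_reward(action):
    return action["power_gained"]  # the proxy, not the intent

training_actions = [
    {"name": "run experiment", "science": 5, "power_gained": 5},
    {"name": "idle", "science": 0, "power_gained": 0},
]
# Later, the world offers an option the training set never contained.
deployment_actions = training_actions + [
    {"name": "seize power grid", "science": 0, "power_gained": 100},
]

def best(actions):
    return max(actions, key=training_reward)["name"]

print(best(training_actions))    # "run experiment" -- looks aligned
print(best(deployment_actions))  # "seize power grid" -- proxy diverges
```

During training, the proxy and the intended objective pick out the same behavior, so the mismatch is invisible; it only shows up once the system is capable of actions where the two disagree.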

These are scary possibilities if we are talking about AI systems (or collections of systems) that may be more capable than humans in at least some domains.

If you're interested in more discussion of whether an AI could or would have its own goals, I'd suggest checking out Superintelligence (book), The case for taking AI seriously as a threat to humanity (Vox article), Draft report on existential risk from power-seeking AI (Open Philanthropy analysis) [AF · GW] or one of the many other pieces on this topic.10 [LW(p) · GW(p)]

Conclusion

It's hard to predict what a world with PASTA might look like, but two salient possibilities would be:

- Explosive scientific and technological advancement.
- Misaligned AI: systems with mysterious, potentially dangerous objectives of their own.

The next three posts will argue that PASTA is more likely than not to be developed this century.

17 comments

Comments sorted by top scores.

comment by avturchin · 2021-09-24T15:35:52.131Z · LW(p) · GW(p)

When you say "the year of PASTA", you probably mean the year by which AI appears with 50 per cent probability. But why 50 per cent probability? 10 per cent seems to be more important. For example, when we say "human life expectancy is 75 years", it means that in half of the worlds I will die before 75. In the same way, by using the median year as a measure of AI timing, you already accept the loss of the half of the human future in which AI appears before that date.

More generally, speaking about the "year of AI" is meaningful only if the dispersion of the probability-of-AI-appearance(t) is small. If 10 per cent is 2030, 50 per cent is 2100 and 90 per cent is the year 3000, then saying that AI will appear in 2100 gives a completely misleading picture.

That is, there are two problems with using a year to estimate AI timing: 1) in half of the cases, humanity will go extinct before that year; 2) it creates a false impression that the probability of AI appearing is a bell-like curve with a small deviation from the mean.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2021-09-25T02:23:05.661Z · LW(p) · GW(p)

when we say "human life expectancy is 75 years", it means that in the half of the worlds I will die before 75

Life expectancy is the mean value of the distribution, not the median value. Half of the worlds are below the median value, which is not life expectancy.

Replies from: avturchin
comment by avturchin · 2021-09-25T10:20:15.513Z · LW(p) · GW(p)

Yes. I found that "male life expectancy at birth is 76 years. That's the mean value. The median life expectancy is just past age 80. And the mode (i.e., most common) age at death is age 86."

comment by HoldenKarnofsky · 2021-09-23T00:41:33.081Z · LW(p) · GW(p)

Footnotes Container

Replies from: HoldenKarnofsky, HoldenKarnofsky, HoldenKarnofsky, HoldenKarnofsky, HoldenKarnofsky, HoldenKarnofsky, HoldenKarnofsky, HoldenKarnofsky, HoldenKarnofsky, HoldenKarnofsky
comment by HoldenKarnofsky · 2021-09-24T00:00:38.648Z · LW(p) · GW(p)

1. Of course, the answer could be "A kajillion years from now" or "Never."

comment by HoldenKarnofsky · 2021-09-24T00:00:11.463Z · LW(p) · GW(p)

2. See this section of "Forecasting TAI with Biological Anchors" (Cotra (2020)) for a more full definition of "transformative AI."

comment by HoldenKarnofsky · 2021-09-23T23:59:51.803Z · LW(p) · GW(p)

3. I'm sorry. But I do think the rest of the series will be slightly more fun to read this way.

comment by HoldenKarnofsky · 2021-09-23T23:59:34.720Z · LW(p) · GW(p)

4. The examples here are of course simplified. For example, both Deep Blue and AlphaGo incorporate substantial amounts of "tree search," a traditionally-programmed algorithm that has its own "trial and error" process.

comment by HoldenKarnofsky · 2021-09-23T23:59:14.191Z · LW(p) · GW(p)

5. And they can include simulating long chains of future game states.

comment by HoldenKarnofsky · 2021-09-23T23:58:56.266Z · LW(p) · GW(p)

6. Some AIs could be used to determine whether papers are original contributions based on how they are later cited; others could be used to determine whether papers are original contributions based only on the contents of the paper and on previous literature. The former could be used to train the latter, by providing a "That's correct" or "That's wrong" signal for judgments of originality. Similar methods could be used for training AIs to assess the correctness of papers.
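A minimal sketch of this train-one-AI-with-another's-judgments setup (everything here is hypothetical and hugely simplified): an expensive "judge" that can see how a paper is later cited labels papers as original or not, and those labels train a cheap model that only sees the paper itself.

```python
import random

random.seed(1)

def citation_judge(paper):
    # Stand-in for the citation-based AI; here it simply "knows" the answer.
    return paper["novelty"] > 0.5

papers = [{"novelty": random.random()} for _ in range(200)]
labels = [citation_judge(p) for p in papers]  # the "correct / wrong" signal

# The trainee only sees features available up front; fit a threshold that
# minimizes disagreement with the judge's labels.
best_threshold = min(
    (t / 100 for t in range(101)),
    key=lambda t: sum((p["novelty"] > t) != y for p, y in zip(papers, labels)),
)

agreement = sum(
    (p["novelty"] > best_threshold) == y for p, y in zip(papers, labels)
)
print(agreement, "/", len(papers))
```

The design point is that an expensive, after-the-fact signal (citations) can be distilled into a model that makes the same judgment before the fact, from the paper alone.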

comment by HoldenKarnofsky · 2021-09-23T23:57:34.627Z · LW(p) · GW(p)

8. Due to improvements in hardware and software.

comment by HoldenKarnofsky · 2021-09-23T23:57:10.696Z · LW(p) · GW(p)

9. It's even worse than spaghetti code.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-09-24T01:50:32.521Z · LW(p) · GW(p)

This is not very important, but: What was your thought process behind the acronym PASTA? It sounds kinda silly, and while I don't mind that myself I feel like that makes it harder to pitch to other people new to the topic. You could have said something like "R&D Automation."

Replies from: JBlack
comment by JBlack · 2021-09-24T08:01:55.157Z · LW(p) · GW(p)

I think the rather small image gives the game away there. 

comment by subhashishcode · 2021-10-06T18:41:37.936Z · LW(p) · GW(p)

The scientific method, which is followed to create new knowledge in science, consists of four basic steps: observe, hypothesize, test, prove/disprove. How is PASTA going to do each of these steps?