# Christiano, Cotra, and Yudkowsky on AI progress

post by Eliezer Yudkowsky (Eliezer_Yudkowsky), Ajeya Cotra (ajeya-cotra) · 2021-11-25T16:45:32.482Z · LW · GW · 95 comments

## Contents

  8. September 20 conversation
8.1. Chess and Evergrande
9. September 21 conversation
9.1. AlphaZero, innovation vs. industry, the Wright Flyer, and the Manhattan Project
9.2. AI alignment vs. biosafety, and measuring progress
9.3. Requirements for FOOM
9.4. AI-driven accelerating economic growth
9.5. Brain size and evolutionary history
9.6. Architectural innovation in AI and in evolutionary history
9.7. Styles of thinking in forecasting
9.8. Moravec's prediction
9.9. Prediction disagreements and bets
9.10. Prediction disagreements and bets: Standard superforecaster techniques
9.11. Prediction disagreements and bets: Late-stage predictions, and betting against superforecasters
9.12. Self-duplicating factories, AI spending, and Turing test variants
9.13. GPT-n and small architectural innovations vs. large ones


This post is a transcript of a discussion between Paul Christiano, Ajeya Cotra, and Eliezer Yudkowsky on AGI forecasting, following up on Paul and Eliezer's "Takeoff Speeds" discussion [? · GW].

# 9. September 21 conversation

## 9.13. GPT-n and small architectural innovations vs. large ones

comment by jessicata (jessica.liu.taylor) · 2021-11-25T19:02:43.628Z · LW(p) · GW(p)

A bunch of this was frustrating to read because it seemed like Paul was yelling "we should model continuous changes!" and Eliezer was yelling "we should model discrete events!" and these were treated as counter-arguments to each other.

It seems obvious from having read about dynamical systems that continuous models still have discrete phase changes. E.g. consider boiling water. As you put in energy the temperature increases until it gets to the boiling point, at which point more energy put in doesn't increase the temperature further (for a while), it converts more of the water to steam; after all the water is converted to steam, more energy put in increases the temperature further.

So there are discrete transitions from (a) energy put in increases water temperature to (b) energy put in converts water to steam to (c) energy put in increases steam temperature.
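As an illustration of this kind of continuous-input, discrete-regime behavior (my sketch, not part of the original comment), the water example can be written as a single function of energy input; the specific-heat and latent-heat constants below are standard round numbers for one kilogram of water:

```python
C_WATER = 4.18e3   # J/(kg*K), specific heat of liquid water
C_STEAM = 2.0e3    # J/(kg*K), approximate specific heat of steam
L_VAPOR = 2.26e6   # J/kg, latent heat of vaporization
T0, T_BOIL = 20.0, 100.0

def temperature(energy_j, mass_kg=1.0):
    """Temperature of `mass_kg` of water after adding `energy_j` joules,
    starting from T0. Continuous in energy, with two discrete regime changes."""
    e_heat = mass_kg * C_WATER * (T_BOIL - T0)   # (a) warm the liquid to 100 C
    e_boil = mass_kg * L_VAPOR                   # (b) convert liquid to steam
    if energy_j <= e_heat:
        return T0 + energy_j / (mass_kg * C_WATER)
    if energy_j <= e_heat + e_boil:
        return T_BOIL                            # plateau: energy drives the phase change
    return T_BOIL + (energy_j - e_heat - e_boil) / (mass_kg * C_STEAM)
```

The function is continuous in its input throughout, yet the system's qualitative behavior changes discretely at the two regime boundaries — the point about continuous models containing discrete transitions.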

In the case of AI improving AI vs. humans improving AI, a simple model to make would be one where AI quality is modeled as a variable, $y$, with the following dynamical equation:

$$\frac{dy}{dt} = a + by$$

where $a$ is the speed at which humans improve AI and $b$ is a recursive self-improvement efficiency factor. The curve transitions from a line at early times (where $a \gg by$) to an exponential at later times (where $by \gg a$). It could be approximated as a piecewise function with a linear part followed by an exponential part, which is a more-discrete approximation than the original function, which has a continuous transition between linear and exponential.
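A sketch of how the described model plays out numerically (my illustration; the constants chosen for the human-driven rate and the self-improvement factor are arbitrary, not estimates):

```python
import math

def ai_quality(t, a=1.0, b=0.1, y0=0.0):
    """Closed-form solution of the linear ODE dy/dt = a + b*y with y(0) = y0,
    where a is the human-driven improvement rate and b is the recursive
    self-improvement efficiency factor."""
    return (y0 + a / b) * math.exp(b * t) - a / b

# Early on the human-driven term dominates and growth looks linear...
early_rate = ai_quality(1.01) - ai_quality(1.0)
# ...much later the self-improvement term dominates and growth is exponential.
late_rate = ai_quality(100.01) - ai_quality(100.0)
```

Plotting `ai_quality` shows the smooth transition between the linear and exponential regimes that a piecewise approximation would render as a discrete switch.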

This is nowhere near an adequate model of AI progress, but it's the sort of model that would be created in the course of a mathematically competent discourse on this subject on the way to creating an adequate model.

Dynamical systems theory contains many beautiful and useful concepts like basins of attraction which make sense of discrete and continuous phenomena simultaneously (i.e. there are a discrete number of basins of attraction which points fall into based on their continuous properties).

I've found Strogatz's book, Nonlinear Dynamics and Chaos, helpful for explaining the basics of dynamical systems.

Replies from: paulfchristiano, paulfchristiano, matthew-barnett
comment by paulfchristiano · 2021-11-25T23:21:33.854Z · LW(p) · GW(p)

I don’t really feel like anything you are saying undermines my position here, or defends the part of Eliezer’s picture I’m objecting to.

(ETA: but I agree with you that it's the right kind of model to be talking about and is good to bring up explicitly in discussion. I think my failure to do so is mostly a failure of communication.)

I usually think about models that show the same kind of phase transition you discuss, though usually significantly more sophisticated models and moving from exponential to hyperbolic growth (you only get an exponential in your model because of the specific and somewhat implausible functional form for technology in your equation).

With humans alone I expect efficiency to double roughly every year based on the empirical returns curves, though it depends a lot on the trajectory of investment over the coming years. I've spent a long time thinking and talking with people about these issues.

At the point when the work is largely done by AI, I expect progress to be maybe 2x faster, so doubling every 6 months. And then from there I expect a roughly hyperbolic trajectory over successive doublings.

If takeoff is fast I still expect it to most likely be through a similar situation, where e.g. total human investment in AI R&D never grows above 1% and so at the time when takeoff occurs the AI companies are still only 1% of the economy.

Replies from: conor-sullivan
comment by Conor Sullivan (conor-sullivan) · 2021-11-27T01:49:08.901Z · LW(p) · GW(p)

Excuse my ignorance, what does a hyperbolic function look like? If an exponential is f(x) = r^x, what is f(x) for a hyperbolic function?

Replies from: paulfchristiano, So8res
comment by paulfchristiano · 2021-11-27T06:24:24.271Z · LW(p) · GW(p)

$f(x) = \frac{1}{T-x}$. It's the solution to the differential equation $f' = f^2$ instead of $f' = f$. I usually use it more broadly for $f(x) = \frac{1}{(T-x)^\alpha}$, which is the solution to $f' = f^{1+\alpha}$
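To make the contrast concrete (my illustration, not Paul's): the exponential $f' = f$ stays finite at every time, while the hyperbolic $f' = f^2$ reaches a singularity in finite time. A crude forward-Euler integration shows the difference; the step size, horizon, and cap are arbitrary choices:

```python
def integrate(rate, f0=1.0, dt=1e-4, t_max=2.0, cap=1e12):
    """Forward-Euler integration of f' = rate(f) from f(0) = f0.
    Returns (t, f) at the first time f exceeds cap, or at t_max."""
    t, f = 0.0, f0
    while t < t_max and f < cap:
        f += rate(f) * dt
        t += dt
    return t, f

t_exp, f_exp = integrate(lambda f: f)        # exponential: f(t) = e^t
t_hyp, f_hyp = integrate(lambda f: f * f)    # hyperbolic: f(t) = 1/(1 - t)
```

The exponential run ends the two-unit horizon near $e^2 \approx 7.4$; the hyperbolic run blows through the cap shortly after $t = 1$, the location of its singularity.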

Replies from: TekhneMakre, rohinmshah
comment by TekhneMakre · 2021-11-27T09:45:11.197Z · LW(p) · GW(p)

Why do you use this form? Do you lean more on:
1. Historical trends that look hyperbolic;
2. Specific dynamical models like: let α be the synergy between "different innovations" as they're producing more innovations; this gives f'(x) = f(x)^(1+α) *; or another such model?;
3. Something else?

I wonder if there's a Paul-Eliezer crux here about plausible functional forms. For example, if Eliezer thinks that there's very likely also a tech tree of innovations that change the synergy factor α, we get something like e.g. (a lower bound of) f'(x) = f(x)^f(x). IDK if there's any help from specific forms; just that, it's plausible that there's forms that are (1) pretty simple, pretty straightforward lower bounds from simple (not necessarily high confidence) considerations of the dynamics of intelligence, and (2) look pretty similar to hyperbolic growth, until they don't, and the transition happens quickly. Though maybe, if Eliezer thinks any of this and also thinks that these superhyperbolic synergy dynamics are already going on, and we instead use a stochastic differential equation, there should be something more to say about variance or something pre-End-times.

*ETA: for example, if every innovation combines with every other existing innovation to give one unit of progress per time, we get the hyperbolic f'(x) = f(x)^2; if innovations each give one progress per time but don't combine, we get the exponential f'(x) = f(x).

Replies from: paulfchristiano
comment by paulfchristiano · 2021-11-27T17:00:21.538Z · LW(p) · GW(p)

I think there are two easy ways to get hyperbolic growth:

• As long as there is free energy in the environment, without any technological change you can grow like $e^t$. Then if there is any technological progress that can be driven by your expanding physical civilization, then you get $f(t) = \frac{1}{(T-t)^\alpha}$, where $\alpha$ depends on how fast the returns to technology diminish.
• Even without physical growth, if you have sufficiently good returns to technology (as we observe for historical technologies, if you treat doubling food as doubling output, or for modern information technology) then you end up with a similar functional form.

That would feel more like "plausible guess" if we didn't have any historical data, but given that historical growth has in fact accelerated a huge amount it seems like a solid best guess to me. There's been a bunch of debate about whether the historical data implies something kind of like this kind of functional form, or merely implies some kind of dramatic acceleration and is consistent with this functional form. But either way, it seems like the good bet is further dramatic acceleration if we either start returning energy capture to output (via AI) or start getting overall technological progress that is similar to existing rates of progress in computer hardware and software (via AI).
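A minimal toy of the first mechanism (my construction, not a model from the thread): a physical stock $P$ grows at a rate set by technology $A$, and technology improves in proportion to the stock available to drive it, with a diminishing-returns parameter $\alpha$. All constants are arbitrary:

```python
def simulate(alpha=0.5, dt=1e-4, t_max=2.0, cap=1e9):
    """Forward-Euler integration of the coupled system
        dP/dt = A * P               (stock grows at a technology-set rate)
        dA/dt = P * A**(1 - alpha)  (technology driven by the expanding stock,
                                     with diminishing returns for alpha > 0)
    Records the time at which P completes each doubling."""
    P, A, t = 1.0, 1.0, 0.0
    doublings, target = [], 2.0
    while t < t_max and P < cap:
        P, A = P + A * P * dt, A + P * A ** (1 - alpha) * dt
        t += dt
        while P >= target:
            doublings.append(t)
            target *= 2
    return doublings

doubling_times = simulate()
intervals = [b - a for a, b in zip(doubling_times, doubling_times[1:])]
# Successive doublings of P arrive faster and faster.
```

Each doubling of $P$ takes less time than the one before — accelerating, hyperbolic-like growth out of two individually unremarkable feedback loops.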

comment by rohinmshah · 2021-11-30T13:19:35.759Z · LW(p) · GW(p)

Nitpick: Isn't $\frac{1}{(T-x)^{1/\alpha}}$ the solution for $f' = f^{1+\alpha}$, modulo constants? Or equivalently, $\frac{1}{(T-x)^\alpha}$ is the solution to $f' = f^{1+1/\alpha}$.

Replies from: paulfchristiano
comment by paulfchristiano · 2021-11-30T16:38:18.780Z · LW(p) · GW(p)

Yep, will fix.

comment by So8res · 2021-11-27T02:16:00.810Z · LW(p) · GW(p)

f(x) = -r/x

Replies from: conor-sullivan
comment by Conor Sullivan (conor-sullivan) · 2021-11-27T02:59:06.461Z · LW(p) · GW(p)

Finally a definition of The Singularity that actually involves a mathematical singularity! Thank you.

comment by paulfchristiano · 2021-11-26T08:00:29.684Z · LW(p) · GW(p)

(I'm interested in which of my claims seem to dismiss or not adequately account for the possibility that continuous systems have phase changes.)

Replies from: jessica.liu.taylor
comment by jessicata (jessica.liu.taylor) · 2021-11-26T18:34:49.603Z · LW(p) · GW(p)

This section seemed like an instance of you and Eliezer talking past each other in a way that wasn't locating a mathematical model containing the features you both believed were important (e.g. things could go "whoosh" while still being continuous):

[Christiano][13:46]

Even if we just assume that your AI needs to go off in the corner and not interact with humans, there’s still a question of why the self-contained AI civilization is making ~0 progress and then all of a sudden very rapid progress

[Yudkowsky][13:46]

unfortunately a lot of what you are saying, from my perspective, has the flavor of, “but can’t you tell me about your predictions earlier on of the impact on global warming at the Homo erectus level”

you have stories about why this is like totally not a fair comparison

I do not share these stories

[Christiano][13:46]

I don’t understand either your objection or the reductio

like, here’s how I think it works: AI systems improve gradually, including on metrics like “How long does it take them to do task X?” or “How high-quality is their output on task X?”

[Yudkowsky][13:47]

I feel like the thing we know is something like, there is a sufficiently high level where things go whooosh humans-from-hominids style

[Christiano][13:47]

We can measure the performance of AI on tasks like “Make further AI progress, without human input”

Any way I can slice the analogy, it looks like AI will get continuously better at that task

Replies from: paulfchristiano
comment by paulfchristiano · 2021-11-26T20:07:47.181Z · LW(p) · GW(p)

My claim is that the timescale of AI self-improvement, at the point it takes over from humans, is the same as the previous timescale of human-driven AI improvement. If it was a lot faster, you would have seen a takeover earlier instead.

This claim is true in your model. It also seems true to me about hominids, that is I think that cultural evolution took over roughly when its timescale was comparable to the timescale for biological improvements, though Eliezer disagrees.

I thought Eliezer's comment "there is a sufficiently high level where things go whooosh humans-from-hominids style" was missing the point. I think it might have been good to offer some quantitative models at that point though I haven't had much luck with that.

I can totally grant there are possible models for why the AI moves quickly from "much slower than humans" to "much faster than humans," but I wanted to get some model from Eliezer to see what he had in mind.

(I find fast takeoff from various frictions more plausible, so that the question mostly becomes one about how close we are to various kinds of efficient frontiers, and where we respectively predict civilization to be adequate/inadequate or progress to be predictable/jumpy.)

Replies from: conor-sullivan, JBlack
comment by Conor Sullivan (conor-sullivan) · 2021-11-27T02:03:18.704Z · LW(p) · GW(p)

It seems to me that Eliezer's model of AGI is a bit like an engine, where if any important part is missing, the entire engine doesn't move. You can move a broken steam locomotive as fast as you can push it, maybe 1km/h. The moment you insert the missing part, the steam locomotive accelerates up to 100km/h. Paul is asking "when does the locomotive move at 20km/h" and Eliezer says "when the locomotive is already at full steam and accelerating to 100km/h." There's no point where the locomotive is moving at 20km/h and not accelerating, because humans can't push it that fast, and once the engine is working, it's already accelerating to a much faster speed.

In Paul's model, there IS such a thing as 95% AGI, and it's 80% or 20% or 2% as powerful on some metric we can measure, whereas in Eliezer's model there's no such thing as 95% AGI. The 95% AGI is like a steam engine that's missing its pistons, or some critical valve, and so it doesn't provide any motive power at all. It can move as fast as humans can push it, but it doesn't provide any power of its own.

Replies from: TekhneMakre
comment by TekhneMakre · 2021-11-27T10:22:48.930Z · LW(p) · GW(p)

And then Paul's response to Eliezer is like "but engines don't just appear without precedent, there's worse partial versions of them beforehand, much more so if people are actually trying to do locomotion; so even if knocking out a piece of the AI that FOOMs would make it FOOM much slower, that doesn't tell us much about the lead-up to FOOM, and doesn't tell us that the design considerations that go into the FOOMer are particularly discontinuous with previously explored design considerations"?

Replies from: conor-sullivan
comment by Conor Sullivan (conor-sullivan) · 2021-11-28T04:11:14.680Z · LW(p) · GW(p)

Right, and history sides with Paul. The earliest steam engines were missing key insights and so operated slowly, used their energy very inefficiently, and were limited in what they could do. The first steam engines were used as pumps, and it took a while before they were powerful enough to even move their own weight (locomotion). Each progressive invention, from Savery to Newcomen to Watt dramatically improved the efficiency of the engine, and over time engines could do more and more things, from pumping to locomotion to machining to flight. It wasn't just one sudden innovation and now we have an engine that can do all the things including even lifting itself against the pull of Earth's gravity. It took time, and progress on smooth metrics, before we had extremely powerful and useful engines that powered the industrial revolution. That's why the industrial revolution(s) took hundreds of years. It wasn't one sudden insight that made it all click.

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2021-11-30T17:34:51.059Z · LW(p) · GW(p)

To which my Eliezer-model's response is "Indeed, we should expect that the first AGI systems will be pathetic in relative terms, comparing them to later AGI systems. But the impact of the first AGI systems in absolute terms is dependent on computer-science facts, just as the impact of the first nuclear bombs was dependent on facts of nuclear physics. Nuclear bombs have improved enormously since Trinity and Little Boy, but there is no law of nature requiring all prototypes to have approximately the same real-world impact, independent of what the thing is a prototype of."

comment by JBlack · 2021-11-27T03:28:06.498Z · LW(p) · GW(p)

My main concern is that progress on the frontier tends to be bursty.

There are many metrics of AI performance on particular tasks where performance does indeed increase fairly continuously on the larger scale, but not in detail. Over the scale of many years it goes from abysmal to terrible to merely bad to nearly human to worse than human in some ways but better than human in others, and then to superhuman. Each of these transitions is often a sharp jump, but you see steady progress if you plot it on a graph. When you combine with having thousands of types of tasks, you end up with an overview of even smoother progress over the whole field.

There are three problems I'm worried about.

The first is that "designing better AIs" may turn out to be a relatively narrow task, and subject to a lot more burstiness than broad spectrum performance that could steadily increase world GDP.

The second is that for purposes of the future of humanity, only the last step from human-adjacent to strictly superhuman really matters. On the scale of intelligence for all the beings we know about, chimpanzees are very nearly human, but the economic effect of chimpanzees is essentially zero.

The third is that we are nowhere near fully exploiting the hardware we have for AI, and I expect that to continue for quite a while.

I think any two of these three are enough for a fast takeoff with little warning.

comment by Matthew Barnett (matthew-barnett) · 2021-11-25T19:29:07.079Z · LW(p) · GW(p)

+1 on using dynamical systems models to try to formalize the frameworks in this debate. I also give Eliezer points for trying to do something similar in Intelligence Explosion Microeconomics (and to people who have looked at this from the macro perspective).

comment by Rob Bensinger (RobbBB) · 2021-11-30T18:44:54.874Z · LW(p) · GW(p)

Found two Eliezer-posts from 2016 (on Facebook) that I feel helped me better grok his perspective.

It is amazing that our neural networks work at all; terrifying that we can dump in so much GPU power that our training methods work at all; and the fact that AlphaGo can even exist is still blowing my mind. It's like watching a trillion spiders with the intelligence of earthworms, working for 100,000 years, using tissue paper to construct nuclear weapons.

And earlier, Jan. 27, 2016:

People occasionally ask me about signs that the remaining timeline might be short. It's very easy for nonprofessionals to take too much alarm too easily. Deep Blue beating Kasparov at chess was not such a sign. Robotic cars are not such a sign.

This is.

"Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves... Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0."

Repeat: IT DEFEATED THE EUROPEAN GO CHAMPION 5-0.

As the authors observe, this represents a break of at least one decade faster than trend in computer Go.

This matches something I've previously named in private conversation as a warning sign - sharply above-trend performance at Go from a neural algorithm. What this indicates is not that deep learning in particular is going to be the Game Over algorithm. Rather, the background variables are looking more like "Human neural intelligence is not that complicated and current algorithms are touching on keystone, foundational aspects of it." What's alarming is not this particular breakthrough, but what it implies about the general background settings of the computational universe.

To try spelling out the details more explicitly, Go is a game that is very computationally difficult for traditional chess-style techniques. Human masters learn to play Go very intuitively, because the human cortical algorithm turns out to generalize well. If deep learning can do something similar, plus (a previous real sign) have a single network architecture learn to play loads of different old computer games, that may indicate we're starting to get into the range of "neural algorithms that generalize well, the way that the human cortical algorithm generalizes well".

This result also supports that "Everything always stays on a smooth exponential trend, you don't get discontinuous competence boosts from new algorithmic insights" is false even for the non-recursive case, but that was already obvious from my perspective. Evidence that's more easily interpreted by a wider set of eyes is always helpful, I guess.

Next sign up might be, e.g., a similar discontinuous jump in machine programming ability - not to human level, but to doing things previously considered impossibly difficult for AI algorithms.

I hope that everyone in 2005 who tried to eyeball the AI alignment problem, and concluded with their own eyeballs that we had until 2050 to start really worrying about it, enjoyed their use of whatever resources they decided not to devote to the problem at that time.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-11-25T18:18:27.534Z · LW(p) · GW(p)

I feel like the biggest subjective thing is that I don't feel like there is a "core of generality" that GPT-3 is missing

I just expect it to gracefully glide up to a human-level foom-ing intelligence

This is a place where I suspect we have a large difference of underlying models.  What sort of surface-level capabilities do you, Paul, predict that we might get (or should not get) in the next 5 years from Stack More Layers?  Particularly if you have an answer to anything that sounds like it's in the style of Gwern's questions [LW(p) · GW(p)], because I think those are the things that actually matter and which are hard to predict from trendlines and which ought to depend on somebody's model of "what kind of generality makes it into GPT-3's successors".

Replies from: paulfchristiano, paulfchristiano
comment by paulfchristiano · 2021-11-25T23:28:39.427Z · LW(p) · GW(p)

If you give me 1 or 10 examples of surface capabilities I'm happy to opine. If you want me to name industries or benchmarks, I'm happy to opine on rates of progress. I don't like the game where you say "Hey, say some stuff. I'm not going to predict anything and I probably won't engage quantitatively with it since I don't think much about benchmarks or economic impacts or anything else that we can even talk about precisely in hindsight for GPT-3."

I don't even know which of Gwern's questions you think are interesting/meaningful. "Good meta-learning"--I don't know what this means but if you actually ask a real question I can guess. Qualitative descriptions---what is even a qualitative description of GPT-3? "Causality"---I think that's not very meaningful and will be used to describe quantitative improvements at some level made up by the speaker.  The spikes in capabilities Gwern talks about seem to be basically measurement artifacts, but if you want to describe a particular measurement I can tell you whether it will have similar artifacts. (How much economic value I can talk about, but you don't seem interested.)

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-11-26T00:13:02.571Z · LW(p) · GW(p)

Mostly, I think the Future is not very predictable in some ways, and this extends to, for example, it being possible that 2022 is the year where we start Final Descent and by 2024 it's over, because it so happened that although all the warning signs were Very Obvious In Retrospect they were not obvious in antecedent and so stuff just started happening one day.  The places where I dare to extend out small tendrils of prediction are the rare exception to this rule; other times, people go about saying, "Oh, no, it definitely couldn't start in 2022" and then I say "Starting in 2022 would not surprise me" by way of making an antiprediction that contradicts them.  It may sound bold and startling to them, but from my own perspective I'm just expressing my ignorance.  That's one reason why I keep saying, if you think the world more orderly than that, why not opine on it yourself to get the Bayes points for it - why wait for me to ask you?

If you ask me to extend out a rare tendril of guessing, I might guess, for example, that it seems to me that GPT-3's current text prediction-hence-production capabilities are sufficiently good that it seems like somewhere inside GPT-3 must be represented a level of understanding which seems like it should also suffice to, for example, translate Chinese to English or vice-versa in a way that comes out sounding like a native speaker, and being recognized as basically faithful to the original meaning.  We haven't figured out how to train this input-output behavior using loss functions, but gradient descent on stacked layers the size of GPT-3 seems to me like it ought to be able to find that functional behavior in the search space, if we knew how to apply the amounts of compute we've already applied using the right loss functions.

So there's a qualitative guess at a surface capability we might see soon - but when is "soon"?  I don't know; history suggests that even what predictably happens later is extremely hard to time.  There are subpredictions of the Yudkowskian imagery that you could extract from here, including such minor and perhaps-wrong but still suggestive implications like, "170B weights is probably enough for this first amazing translator, rather than it being a matter of somebody deciding to expend 1.7T (non-MoE) weights, once they figure out the underlying setup and how to apply the gradient descent" and "the architecture can potentially look like somebody Stacked More Layers and like it didn't need key architectural changes like Yudkowsky suspects may be needed to go beyond GPT-3 in other ways" and "once things are sufficiently well understood, it will look clear in retrospect that we could've gotten this translation ability in 2020 if we'd spent compute the right way".

It is, alas, nowhere written in this prophecy that we must see even more un-Paul-ish phenomena, like translation capabilities taking a sudden jump without intermediates.  Nothing rules out a long wandering road to the destination of good translation in which people figure out lots of little things before they figure out a big thing, maybe to the point of nobody figuring out until 20 years later the simple trick that would've gotten it done in 2020, a la ReLUs vs sigmoids.  Nor can I say that such a thing will happen in 2022 or 2025, because I don't know how long it takes to figure out how to do what you clearly ought to be able to do.

I invite you to express a different take on machine translation; if it is narrower, more quantitative, more falsifiable, and doesn't achieve this just by narrowing its focus to metrics whose connection to the further real-world consequences is itself unclear, and then it comes true, you don't need to have explicitly bet against me to have gained more virtue points.

Replies from: paulfchristiano
comment by paulfchristiano · 2021-11-26T06:58:16.252Z · LW(p) · GW(p)

I'm mostly not looking for virtue points, I'm looking for: (i) if your view is right then I get some kind of indication of that so that I can take it more seriously, (ii) if your view is wrong then you get some feedback to help snap you out of it.

I don't think it's surprising if a GPT-3 sized model can do relatively good translation. If talking about this prediction, and if you aren't happy just predicting numbers for overall value added from machine translation, I'd kind of like to get some concrete examples of mediocre translations or concrete problems with existing NMT that you are predicting can be improved.

It seems like Eliezer is mostly just more uncertain about the near future than you are, so it doesn't seem like you'll be able to find (ii) by looking at predictions for the near future.

Replies from: paulfchristiano
comment by paulfchristiano · 2021-11-26T08:05:04.988Z · LW(p) · GW(p)

It seems to me like Eliezer rejects a lot of important heuristics like "things change slowly" and "most innovations aren't big deals" and so on. One reason he may do that is because he literally doesn't know how to operate those heuristics, and so when he applies them retroactively they seem obviously stupid. But if we actually walked through predictions in advance, I think he'd see that actual gradualists are much better predictors than he imagines.

That seems a bit uncharitable to me. I doubt he rejects those heuristics wholesale. I'd guess that he thinks that e.g. recursive self improvement is one of those things where these heuristics don't apply, and that this is foreseeable because of e.g. the nature of recursion. I'd love to hear more about what sort of knowledge about "operating these heuristics" you think he's missing!

Anyway, it seems like he expects things to seem more-or-less gradual up until FOOM, so I think my original point still applies: I think his model would not be "shaken out" of his fast-takeoff view due to successful future predictions (until it's too late).

Replies from: paulfchristiano
comment by paulfchristiano · 2021-11-26T19:32:16.949Z · LW(p) · GW(p)

He says things like AlphaGo or GPT-3 being really surprising to gradualists, suggesting he thinks that gradualism only works in hindsight.

I agree that after shaking out the other disagreements, we could just end up with Eliezer saying "yeah but automating AI R&D is just fundamentally unlike all the other tasks to which we've applied AI" (or "AI improving AI will be fundamentally unlike automating humans improving AI") but I don't think that's the core of his position right now.

comment by paulfchristiano · 2021-11-26T07:53:17.980Z · LW(p) · GW(p)

I agree we seem to have some kind of deeper disagreement here.

I think stack more layers + known training strategies (nothing clever) + simple strategies for using test-time compute (nothing clever, nothing that doesn't use the ML as a black box) can get continuous improvements in tasks like reasoning (e.g. theorem-proving), meta-learning (e.g. learning to learn new motor skills), automating R&D (including automating executing ML experiments, or proposing new ML experiments), or basically whatever.

I think these won't get to human level in the next 5 years. We'll have crappy versions of all of them. So it seems like we basically have to get quantitative. If you want to talk about something we aren't currently measuring, then that probably takes effort, and so it would probably be good if you picked some capability where you won't just say "the Future is hard to predict." (Though separately I expect to make somewhat better predictions than you in most of these domains.)

A plausible example is that I think it's pretty likely that in 5 years, with mere stack more layers + known techniques (nothing clever), you can have a system which is clearly (by your+my judgment) "on track" to improve itself and eventually foom, e.g. that can propose and evaluate improvements to itself, whose ability to evaluate proposals is good enough that it will actually move in the right direction and eventually get better at the process, etc., but that it will just take a long time for it to make progress. I'd guess that it looks a lot like a dumb kid in terms of the kind of stuff it proposes and its bad judgment (but radically more focused on the task and conscientious and wise than any kid would be). Maybe I think that's 10% unconditionally, but much higher given a serious effort. My impression is that you think this is unlikely without adding in some missing secret sauce to GPT, and that my picture is generally quite different from your criticality-flavored model of takeoff.

Replies from: soren-elverlin-1
comment by Søren Elverlin (soren-elverlin-1) · 2021-11-26T14:42:35.840Z · LW(p) · GW(p)

How much time do you see between "1 AI clearly on track to Foom" and "First AI to actually Foom"? My weak guess is Eliezer would say "Probably quite little time", but your model of the world requires the GWP to double over a 4 year period, and I'm guessing that period probably starts later than 2026.

I would be surprised if by 2027, I could point to an AI that for a full year had been on track to Foom, without Foom happening.

Replies from: paulfchristiano
comment by paulfchristiano · 2021-11-26T19:29:49.221Z · LW(p) · GW(p)

I think "on track to foom" is a very long way before "actually fooms."

comment by Lanrian · 2021-11-25T23:02:05.145Z · LW(p) · GW(p)
and some of my sense here is that if Paul offered a portfolio bet of this kind, I might not take it myself, but EAs who were better at noticing their own surprise might say, "Wait, that's how unpredictable Paul thinks the world is?"

If Eliezer endorses this on reflection, that would seem to suggest that Paul actually has good models about how often trend breaks happen, and that the problem-by-Eliezer's-lights is relatively more about either:

• that Paul's long-term predictions do not adequately take into account his good sense of short-term trend breaks.
• that Paul's long-term predictions are actually fine and good, but that his communication about it is somehow misleading to EAs.

That would be a very different kind of disagreement than I thought this was about. (Though actually kind-of consistent with the way that Eliezer previously didn't quite diss Paul's track-record, but instead dissed "the sort of person who is taken in by this essay [is the same sort of person who gets taken in by Hanson's arguments in 2008 and gets caught flatfooted by AlphaGo and GPT-3 and AlphaFold 2]"?)

Also, none of this erases the value of putting forward the predictions mentioned in the original quote, since that would then be a good method of communicating Paul's (supposedly miscommunicated) views.

Replies from: conor-sullivan
comment by Conor Sullivan (conor-sullivan) · 2021-11-27T02:44:06.677Z · LW(p) · GW(p)

Apologies for my ignorance, does EA mean Effective Altruist?

Replies from: sil-ver
comment by Rafael Harth (sil-ver) · 2021-11-27T03:53:33.825Z · LW(p) · GW(p)

Yup. Both Effective Altruism and Effective Altruist are abbreviated as EA.

comment by johnswentworth · 2021-11-25T22:16:54.716Z · LW(p) · GW(p)

Some thinking-out-loud on how I'd go about looking for testable/bettable prediction differences here...

I think my models overlap mostly with Eliezer's in the relevant places, so I'll use my own models as a proxy for his, and think about how to find testable/bettable predictions with Paul (or Ajeya, or someone else in their cluster).

One historical example immediately springs to mind where something-I'd-consider-a-Paul-esque-model utterly failed predictively: the breakdown of the Phillips curve. The original Phillips curve was based on just fitting a curve to inflation-vs-unemployment data; Friedman and Phelps both independently came up with theoretical models for that relationship in the late sixties ('67-'68), and Friedman correctly forecasted that the curve would break down in the next recession (i.e. the "stagflation" of '73-'75). This all led up to the Lucas Critique, which I'd consider the canonical case-against-what-I'd-call-Paul-esque-worldviews within economics. The main idea which seems transportable to other contexts is that surface relations (like the Phillips curve) break down under distribution shifts in the underlying factors.

So, how would I look for something analogous to that situation in today's AI? We need something with an established trend, but where a distribution shift happens in some underlying factor. One possible place to look: I've heard that OpenAI plans to make the next generation of GPT not actually much bigger than the previous generation; they're trying to achieve improvement through strategies other than Stack More Layers. Assuming that's true, it seems like a naive Paul-esque model would predict that the next GPT is relatively unimpressive compared to e.g. the GPT-2 → GPT-3 delta? Whereas my models (or I'd guess Eliezer's models) would predict that it's relatively more impressive, compared to the expectations of Paul-esque models (derived by e.g. extrapolating previous performance as a function of model size and then plugging in actual size of the next GPT)? I wouldn't expect either view to make crisp high-certainty predictions here, but enough to get decent Bayesian evidence.

Other than distribution shifts, the other major place I'd look for different predictions is in the extent to which aggregates tell us useful things. The post got into that in a little detail, but I think there's probably still room there. For instance, I recently sat down and played with some toy examples of GDP growth induced by tech shifts, and I was surprised by how smooth GDP was even in scenarios with tech shifts which seemed very impactful to me [LW · GW]. I expect that Paul would be even more surprised by this if he were to do the same exercise. In particular, this quote seems relevant:

the point is that housing and healthcare are not central examples of things that scale up at the beginning of explosive growth, regardless of whether it's hard or soft

It is surprisingly difficult to come up with a scenario where GDP growth looks smooth AND housing+healthcare don't grow much AND GDP growth accelerates to a rate much faster than now. If everything except housing and healthcare are getting cheaper, then housing and healthcare will likely play a much larger role in GDP (and together they're 30-35% already), eventually dominating GDP. This isn't a logical necessity; in principle we could consume so much more of everything else that the housing+healthcare share shrinks, but I think that would probably diverge from past trends (though I have not checked). What I actually expect is that as people get richer, they spend a larger fraction on things which have a high capacity to absorb marginal income, of which housing and healthcare are central examples.

If housing and healthcare aren't getting cheaper, and we're not spending a smaller fraction of income on them (by buying way way more of the things which are getting cheaper), then that puts a pretty stiff cap on how much GDP can grow.
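The cap gestured at here can be made concrete with a back-of-the-envelope calculation. This is a sketch with illustrative numbers (the share floor in particular is an assumption of mine), not a claim about actual GDP data:

```python
# Toy arithmetic (illustrative numbers): if the sectors currently holding
# `current_share` of GDP stay flat in absolute terms, and their share of
# total spending never falls below `share_floor`, then total GDP is bounded
# by (fixed spending) / (share floor).
def max_gdp_multiplier(current_share, share_floor):
    fixed_spending = current_share  # normalize today's GDP to 1.0
    return fixed_spending / share_floor

# Housing + healthcare at ~32% of GDP, with an assumed 10% share floor:
print(max_gdp_multiplier(0.32, 0.10))  # ~3.2x, far short of "explosive" growth
```

Under these made-up numbers, GDP can at most roughly triple before the flat sectors' share hits the floor, which is the "stiff cap" in question.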

Zooming out a meta-level, I think GDP is a particularly good example of a big aggregate metric which approximately-always looks smooth in hindsight, even when the underlying factors of interest undergo large jumps. I think Paul would probably update toward that view if he spent some time playing around with examples (similar to this post [LW · GW]).

Similarly, I've heard that during training of GPT-3, while aggregate performance improves smoothly, performance on any particular task (like e.g. addition) is usually pretty binary - i.e. performance on any particular task tends to jump quickly from near-zero to near-maximum-level. Assuming this is true, presumably Paul already knows about it, and would argue that what matters-for-impact is ability at lots of different tasks rather than one (or a few) particular tasks/kinds-of-tasks? If so, that opens up a different line of debate, about the extent to which individual humans' success today hinges on lots of different skills vs a few, and in which areas.
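The pattern described in the two paragraphs above (individually step-like components aggregating into a smooth curve) is easy to reproduce in a toy model. The following is entirely my own construction, not anyone's actual experiment:

```python
import math

# Each "task" has near-binary performance: ~0 before its threshold, ~1 after.
def task_performance(t, threshold, sharpness=20.0):
    return 1.0 / (1.0 + math.exp(-sharpness * (t - threshold)))

# The aggregate over many tasks with spread-out thresholds is smooth.
def aggregate(t, thresholds):
    return sum(task_performance(t, th) for th in thresholds) / len(thresholds)

thresholds = [i / 50 for i in range(50)]  # 50 tasks, switch points spread over [0, 1)
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(f"t={t:.2f}  aggregate={aggregate(t, thresholds):.2f}")
```

Each individual task jumps from roughly 2% to 98% over a fifth of the time axis, while the average climbs near-linearly; whether impact tracks the aggregate or a few individual tasks is exactly the line of debate suggested above.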

Replies from: rohinmshah, Eliezer_Yudkowsky, amc
comment by rohinmshah · 2021-11-30T10:13:28.568Z · LW(p) · GW(p)

The "continuous view" as I understand it doesn't predict that all straight lines always stay straight. My version of it (which may or may not be Paul's version) predicts that in domains where people are putting in lots of effort to optimize a metric, that metric will grow relatively continuously. In other words, the more effort put in to optimize the metric, the more you can rely on straight lines for that metric staying straight (assuming that the trends in effort are also staying straight).

In its application to AI, this is combined with a prediction that people will in fact be putting in lots of effort into making AI systems intelligent / powerful / able to automate AI R&D / etc, before AI has reached a point where it can execute a pivotal act. This second prediction comes for totally different reasons, like "look at what AI researchers are already trying to do" combined with "it doesn't seem like AI is anywhere near the point of executing a pivotal act yet".

(I think on Paul's view the second prediction is also bolstered by observing that most industries / things that had big economic impacts also seemed to have crappier predecessors. This feels intuitive to me but is not something I've checked and so isn't my personal main reason for believing the second prediction.)

One historical example immediately springs to mind where something-I'd-consider-a-Paul-esque-model utterly failed predictively: the breakdown of the Phillips curve.

I'm not very familiar with this (I've only seen your discussion and the discussion in IEM) but it does not seem like the sort of thing where the argument I laid out above would have had a strong opinion. Was the y-axis of the straight line graph a metric that people were trying to optimize? If so, did the change in policy not represent a change in the amount of effort put into optimizing the metric? (I haven't looked at the details here, maybe the answer is yes to both, in which case I would be interested in looking at the details.)

Zooming out a meta-level, I think GDP is a particularly good example of a big aggregate metric which approximately-always looks smooth in hindsight, even when the underlying factors of interest undergo large jumps.

This seems plausible but it also seems like you can apply the above argument to a bunch of other topics besides GDP, like the ones listed in this comment [LW(p) · GW(p)], so it still seems like you should be able to exhibit a failure of the argument on those topics.

Replies from: johnswentworth, SDM
comment by johnswentworth · 2021-11-30T16:39:40.420Z · LW(p) · GW(p)

My version of it (which may or may not be Paul's version) predicts that in domains where people are putting in lots of effort to optimize a metric, that metric will grow relatively continuously. In other words, the more effort put in to optimize the metric, the more you can rely on straight lines for that metric staying straight (assuming that the trends in effort are also staying straight).

This is super helpful, thanks. Good explanation.

With this formulation of the "continuous view", I can immediately think of places where I'd bet against it. The first which springs to mind is aging: I'd bet that we'll see a discontinuous jump in achievable lifespan of mice. The gears here are nicely analogous to AGI too: I expect that [? · GW] there's a "common core" (or shared cause) underlying all the major diseases of aging, and fixing that core issue will fix all of them at once, in much the same way that figuring out the "core" of intelligence will lead to a big discontinuous jump in AI capabilities. I can also point to current empirical evidence for the existence of a common core in aging, which might suggest analogous types of evidence to look at in the intelligence context.

Thinking about other analogous places... presumably we saw a discontinuous jump in flight range when Sputnik entered orbit. That one seems extremely closely analogous to AGI. There it's less about the "common core" thing, and more about crossing some critical threshold. Nuclear weapons and superconductors both stand out a priori as places where we'd expect a critical-threshold-related discontinuity, though I don't think people were optimizing hard enough in superconductor-esque directions for the continuous view to make a strong prediction there (at least for the original discovery of superconductors).

Replies from: rohinmshah, Vaniver
comment by rohinmshah · 2021-12-01T10:21:35.609Z · LW(p) · GW(p)

I agree that when you know about a critical threshold, as with nukes or orbits, you can and should predict a discontinuity there. (Sufficient specific knowledge is always going to allow you to outperform a general heuristic.) I think that (a) such thresholds are rare in general and (b) in AI in particular there is no such threshold. (According to me (b) seems like the biggest difference between Eliezer and Paul.)

Some thoughts on aging:

• It does in fact seem surprising, given the complexity of biology relative to physics, if there is a single core cause and core solution that leads to a discontinuity.
• I would a priori guess that there won't be a core solution. (A core cause seems more plausible, and I'll roll with it for now.) Instead, we see a sequence of solutions that intervene on the core problem in different ways, each of which leads to some improvement on lifespan, and discovering these at different times leads to a smoother graph.
• That being said, are people putting in a lot of effort into solving aging in mice? Everyone seems to constantly be saying that we're putting in almost no effort whatsoever. If that's true then a jumpy graph would be much less surprising.
• As a more specific scenario, it seems possible that the graph of mouse lifespan over time looks basically flat, because we were making no progress due to putting in ~no effort. I could totally believe in this world that someone puts in some effort and we get a discontinuity, or even that the near-zero effort we're putting in finds some intervention this year (but not in previous years) which then looks like a discontinuity.

If we had a good operationalization, and people are in fact putting in a lot of effort now, I could imagine putting my $100 to your $300 on this (not going beyond 1:3 odds simply because you know way more about aging than I do).

Replies from: johnswentworth
comment by johnswentworth · 2021-12-03T16:50:50.560Z · LW(p) · GW(p)

I'm not particularly enthusiastic about betting at 75%, that seems like it's already in the right ballpark for where the probability should be. So I guess we've successfully Aumann agreed on that particular prediction.

comment by Vaniver · 2021-11-30T17:44:11.264Z · LW(p) · GW(p)

presumably we saw a discontinuous jump in flight range when Sputnik entered orbit.

While I think orbit is the right sort of discontinuity for this, I think you need to specify 'flight range' in a way that clearly favors orbits for this to be correct, mostly because about a month before was the manhole cover launched/vaporized with a nuke.

[But in terms of something like "altitude achieved", I think Sputnik is probably part of a continuous graph, and probably not the most extreme member of the graph?]

Replies from: johnswentworth
comment by johnswentworth · 2021-11-30T17:54:48.269Z · LW(p) · GW(p)

My understanding is that Sputnik was a big discontinuous jump in "distance which a payload (i.e. nuclear bomb) can be delivered" (or at least it was a conclusive proof-of-concept of a discontinuous jump in that metric). That metric was presumably under heavy optimization pressure at the time, and was the main reason for strategic interest in Sputnik, so it lines up very well with the preconditions for the continuous view.

Replies from: Vaniver
comment by Vaniver · 2021-11-30T18:07:34.965Z · LW(p) · GW(p)

So it looks like the R-7 (which launched Sputnik) was the first ICBM, and the range is way longer than the V-2s of ~15 years earlier, but I'm not easily finding a graph of range over those intervening years. (And the R-7 range is only about double the range of a WW2-era bomber, which further smooths the overall graph.)

[And, implicitly, the reason we care about ICBMs is because the US and the USSR were on different continents; if the distance between their major centers was comparable to England and France's distance instead, then the same strategic considerations would have been hit much sooner.]

comment by Sammy Martin (SDM) · 2021-11-30T17:18:50.819Z · LW(p) · GW(p)

One of the problems here is that, as well as disagreeing about underlying world models and about the likelihoods of some pre-AGI events, Paul and Eliezer often just make predictions about different things by default. But they do (and must, logically) predict some of the same world events differently.

My very rough model of how their beliefs flow forward is:

## Paul

Low initial confidence on truth/coherence of 'core of generality'

Human Evolution tells us very little about the 'cognitive landscape of all minds' (if that's even a coherent idea) - it's simply a loosely analogous individual historical example. Natural selection wasn't intelligently aiming for powerful world-affecting capabilities, and so stumbled on them relatively suddenly with humans. Therefore, we learn very little about whether there will/won't be a spectrum of powerful intermediately general AIs from the historical case of evolution - all we know is that it didn't happen during evolution, and we've got good reasons to think it's a lot more likely to happen for AI. For other reasons (precedents already exist - MuZero is insect-brained but better at chess or go than a chimp, plus that's the default with technology we're heavily investing in), we should expect there will be powerful, intermediately general AIs by default (and our best guess of the timescale should be anchored to the speed of human-driven progress, since that's where it will start) - No core of generality

Then, from there:

No core of generality and extrapolation of quantitative metrics for things we care about and lack of common huge secrets in relevant tech progress reference class → Qualitative prediction of more common continuous progress on the 'intelligence' of narrow AI and prediction of continuous takeoff

## Eliezer

High initial confidence on truth/coherence of 'core of generality'

Even though there are some disanalogies between Evolution and AI progress, the exact details of how closely analogous the two situations are don't matter that much. Rather, we learn a generalizable fact about the overall cognitive landscape from human evolution - that there is a way to reach the core of generality quickly. This doesn't make it certain that AGI development will go the same way, but it's fairly strong evidence. The disanalogies between evolution and ML are indeed a slight update in Paul's direction and suggest that AI could in principle take a smoother route to general intelligence, but we've never historically seen this smoother route (and it has to be not just technically 'smooth' but sufficiently smooth to give us a full 4-year economic doubling) or these intermediate powerful agents, so this correction is weak compared to the broader knowledge we gain from evolution. In other words, all we know is that there is a fast route to the core of generality but that it's imaginable that there's a slow route we've not yet seen - Core of generality

Then, from there:

Core of generality and very common presence of huge secrets in relevant tech progress reference class → Qualitative prediction of less common continuous progress on the 'intelligence' of narrow AI and prediction of discontinuous takeoff

Eliezer doesn’t have especially divergent views about benchmarks like perplexity because he thinks they're not informative, but differs from Paul on qualitative predictions of how smoothly various practical capabilities/signs of 'intelligence' will emerge - he's getting his qualitative predictions about this ultimately from interrogating his 'cognitive landscape' abstraction, while Paul is getting his from trend extrapolation on measures of practical capabilities and then translating those to qualitative predictions. These are very different origins, but they do eventually give different predictions about the likelihood of the same real-world events.

Since they only reach the point of discussing the same things at a very vague, qualitative level of detail, in order to get to a bet you have to back-track from both of their qualitative predictions of how likely the sudden emergence of various types of narrow intelligent behaviour are, find some clear metric for the narrow intelligent behaviour that we can apply fairly, and then there should be a difference in beliefs about the world before AI takeoff.

Replies from: SDM
comment by Sammy Martin (SDM) · 2021-12-02T18:55:48.636Z · LW(p) · GW(p)

Updates on this after reflection and discussion (thanks to Rohin):

Human Evolution tells us very little about the 'cognitive landscape of all minds' (if that's even a coherent idea) - it's simply a loosely analogous individual historical example

Saying Paul's view is that the cognitive landscape of minds might be simply incoherent isn't quite right - at the very least you can talk about the distribution over programs implied by the random initialization of a neural network.

I could have just said 'Paul doesn't see this strong generality attractor in the cognitive landscape' but it seems to me that it's not just a disagreement about the abstraction, but that he trusts claims made on the basis of these sorts of abstractions less than Eliezer.

Also, on Paul's view, it's not that evolution is irrelevant as a counterexample. Rather, the specific fact of 'evolution gave us general intelligence suddenly by evolutionary timescales' is an unimportant surface fact, and the real truth about evolution is consistent with the continuous view.

No core of generality and extrapolation of quantitative metrics for things we care about and lack of common huge secrets in relevant tech progress reference class

These two initial claims are connected in a way I didn't make explicit - No core of generality and lack of common secrets in the reference class together imply that there are lots of paths to improving on practical metrics (not just those that give us generality), that we are putting in lots of effort into improving such metrics and that we tend to take the best ones first, so the metric improves continuously, and trend extrapolation will be especially correct.

Core of generality and very common presence of huge secrets in relevant tech progress reference class

The first clause already implies the second clause (since "how to get the core of generality" is itself a huge secret), but Eliezer seems to use non-intelligence related examples of sudden tech progress as evidence that huge secrets are common in tech progress in general, independent of the specific reason to think generality is one such secret.

## Nate's Summary [? · GW]

... Eliezer was saying something like "the fact that humans go around doing something vaguely like weighting outcomes by possibility and also by attractiveness, which they then roughly multiply, is quite sufficient evidence for my purposes, as one who does not pay tribute to the gods of modesty", while Richard protested something more like "but aren't you trying to use your concept to carry a whole lot more weight than that amount of evidence supports?" ...

And, ofc, at this point, my Eliezer-model is again saying "This is why we should be discussing things concretely! It is quite telling that all the plans we can concretely visualize for saving our skins, are scary-adjacent; and all the non-scary plans, can't save our skins!"

Nate's summary brings up two points I more or less ignored in my summary because I wasn't sure what I thought - one is, just what role do the considerations about expected incompetent response/regulatory barriers/mistakes in choosing alignment strategies play? Are they necessary for a high likelihood of doom, or just peripheral assumptions? Clearly, you have to posit some level of "civilization fails to do the x-risk-minimizing thing" if you want to argue doom, but how extreme are the scenarios Eliezer is imagining where success is likely?

The other is the role that the modesty worldview plays in Eliezer's objections.

I feel confused/suspect we might have all lost track of what Modesty epistemology is supposed to consist of - I thought it was something like "overuse of the outside view, especially in a social cognition context".

Which of the following is:

a) probably the product of a Modesty world-view?

b) no good reason to think comes from a Modesty world-view but still bad epistemology?

c) good epistemology?

1. Not believing theories which don’t make new testable predictions just because they retrodict lots of things in a way that the theory's proponents claim is more natural, but that you don’t understand, because that seems generally suspicious
2. Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because you don’t trust your own assessments of naturalness that much in the absence of discriminating evidence
3. Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because most powerful theories which cause conceptual revolutions also make new testable predictions, so it’s a bad sign if the newly proposed theory doesn’t.
4. As a general matter, accepting that there are lots of cases of theories which are knowably true independent of any new testable predictions they make because of features of the theory. Things like the implication of general relativity from the equivalence principle, or conservation laws from Noether’s theorem, or many-worlds from QM are real, but you’ll only believe you’ve found a case like this if you’re walked through to the conclusion [LW(p) · GW(p)], so you're sure that the underlying concepts are clear and applicable, or there’s already a scientific consensus behind it.
Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2021-12-03T02:17:04.006Z · LW(p) · GW(p)

Not believing theories which don’t make new testable predictions just because they retrodict lots of things in a way that the theory's proponents claim is more natural, but that you don’t understand, because that seems generally suspicious

My Eliezer-model doesn't categorically object to this. See, e.g., Fake Causality [LW · GW]:

[Phlogiston] feels like an explanation. It’s represented using the same cognitive data format. But the human mind does not automatically detect when a cause has an unconstraining arrow to its effect. Worse, thanks to hindsight bias, it may feel like the cause constrains the effect, when it was merely fitted [? · GW] to the effect.

[...] Thanks to hindsight bias, it’s also not enough to check how well your theory “predicts” facts you already know. You’ve got to predict for tomorrow, not yesterday.

Nineteenth century evolutionism made no quantitative predictions. It was not readily subject to falsification. It was largely an explanation of what had already been seen. It lacked an underlying mechanism, as no one then knew about DNA. It even contradicted the nineteenth century laws of physics. Yet natural selection was such an amazingly good post facto explanation that people flocked to it, and they turned out to be right. Science, as a human endeavor, requires advance prediction. Probability theory, as math, does not distinguish between post facto and advance prediction, because probability theory assumes that probability distributions are fixed properties of a hypothesis.

The rule about advance prediction is a rule of the social process of science—a moral custom and not a theorem. The moral custom exists to prevent human beings from making human mistakes that are hard to even describe in the language of probability theory, like tinkering after the fact with what you claim your hypothesis predicts. People concluded that nineteenth century evolutionism was an excellent explanation, even if it was post facto. That reasoning was correct as probability theory, which is why it worked despite all scientific sins. Probability theory is math. The social process of science is a set of legal conventions to keep people from cheating on the math.

Yet it is also true that, compared to a modern-day evolutionary theorist, evolutionary theorists of the late nineteenth and early twentieth century often went sadly astray. Darwin, who was bright enough to invent the theory, got an amazing amount right. But Darwin’s successors, who were only bright enough to accept the theory, misunderstood evolution frequently and seriously. The usual process of science was then required to correct their mistakes.

My Eliezer-model does object to things like 'since I (from my position as someone who doesn't understand the model) find the retrodictions and obvious-seeming predictions suspicious, you should share my worry and have relatively low confidence in the model's applicability'. Or 'since the case for this model's applicability isn't iron-clad, you should sprinkle in a lot more expressions of verbal doubt'. My Eliezer-model views these as isolated demands for rigor, or as isolated demands for social meekness.

Part of his general anti-modesty and pro-Thielian-secrets view is that it's very possible for other people to know things that justifiably make them much more confident than you are. So if you can't pass the other person's ITT / you don't understand how they're arriving at their conclusion (and you have no principled reason to think they can't have a good model here), then you should be a lot more wary of inferring from their confidence that they're biased.

Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because you don’t trust your own assessments of naturalness that much in the absence of discriminating evidence

My Eliezer-model thinks it's possible to be so bad at scientific reasoning that you need to be hit over the head with lots of advance predictive successes in order to justifiably trust a model. But my Eliezer-model thinks people like Richard are way better than that, and are (for modesty-ish reasons) overly distrusting their ability to do inside-view reasoning, and (as a consequence) aren't building up their inside-view-reasoning skills nearly as much as they could. (At least in domains like AGI, where you stand to look a lot sillier to others if you go around expressing confident inside-view models that others don't share.)

Not believing theories which don’t make new testable predictions just because they retrodict lots of things in the world naturally (in a way you sort of get intuitively), because most powerful theories which cause conceptual revolutions also make new testable predictions, so it’s a bad sign if the newly proposed theory doesn’t.

My Eliezer-model thinks this is correct as stated, but thinks this is a claim that applies to things like Newtonian gravity and not to things like probability theory [LW(p) · GW(p)]. (He's also suspicious that modest-epistemology pressures have something to do with this being non-obvious — e.g., because modesty discourages you from trusting your own internal understanding of things like probability theory, and instead encourages you to look at external public signs of probability theory's impressiveness, of a sort that could be egalitarianly accepted even by people who don't understand probability theory.)

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-11-25T22:45:42.059Z · LW(p) · GW(p)

I don't necessarily expect GPT-4 to do better on perplexity than would be predicted by a linear model fit to neuron count plus algorithmic progress over time; my guess for why they're not scaling it bigger would be that Stack More Layers just basically stopped scaling in real output quality at the GPT-3 level.  They can afford to scale up an OOM to 1.75 trillion weights, easily, given their funding, so if they're not doing that, an obvious guess is that it's because they're not getting a big win from that.  As for their ability to then make algorithmic progress, depends on how good their researchers are, I expect; most algorithmic tricks you try in ML won't work, but maybe they've got enough people trying things to find some?  But it's hard to outpace a field that way without supergeniuses, and the modern world has forgotten how to rear those.

Replies from: Lanrian
comment by Lanrian · 2021-11-25T23:10:26.977Z · LW(p) · GW(p)

While GPT-4 wouldn't be a lot bigger than GPT-3, Sam Altman did indicate that it'd use a lot more compute. That's consistent with Stack More Layers still working; they might just have found an even better use for compute.

(The increased compute-usage also makes me think that a Paul-esque view would allow for GPT-4 to be a lot more impressive than GPT-3, beyond just modest algorithmic improvements.)

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-11-25T23:39:41.658Z · LW(p) · GW(p)

If they've found some way to put a lot more compute into GPT-4 without making the model bigger, that's a very different - and unnerving - development.

Replies from: calef, RomanS
comment by calef · 2021-11-26T07:36:25.040Z · LW(p) · GW(p)

I believe Sam Altman implied they’re simply training a GPT-3-variant for significantly longer for “GPT-4”. The GPT-3 model in prod is nowhere near converged on its training data.

Edit: changed to be less certain, pretty sure this follows from public comments by Sam, but he has not said this exactly

Replies from: Lanrian
comment by Lanrian · 2021-11-26T09:54:48.255Z · LW(p) · GW(p)

Say more about the source for this claim? I'm pretty sure he didn't say that during the Q&A I'm sourcing my info from. And my impression is that they're doing something more than this, both on priors (scaling laws say that optimal compute usage means you shouldn't train to convergence — why would they start now?) and based on what he said during that Q&A.

Replies from: calef
comment by calef · 2021-11-26T19:41:06.524Z · LW(p) · GW(p)

This is based on:

1. The Q&A you mention
2. GPT-3 not being trained on even one pass of its training dataset
3. “Use way more compute” achieving outsized gains by training longer, rather than by most other architectural modifications, for a fixed model size (while you’re correct that bigger model = faster training, you’re trading off against ease of deployment, and models much bigger than GPT-3 become increasingly difficult to serve in prod. Plus, we know it’s about the same size, from the Q&A)
4. Some experience with undertrained enormous language models underperforming relative to expectation

This is not to say that GPT-4 won't have architectural changes. Sam mentioned a longer context at the least. But these sorts of architectural changes probably qualify as “small” in the parlance of the above conversation.

Replies from: Lanrian
comment by Lanrian · 2021-11-26T20:04:53.553Z · LW(p) · GW(p)

To be clear: Do you remember Sam Altman saying that "they’re simply training a GPT-3-variant for significantly longer", or is that an inference from ~"it will use a lot more compute" and ~"it will not be much bigger"?

Because if you remember him saying that, then that contradicts my memory (and, uh, the notes that people took that I remember reading), and I'm confused.

While if it's an inference: sure, that's a non-crazy guess, and I take your point that smaller models are easier to deploy. I just want it to be flagged as a claimed deduction, not as a remembered statement.

(And I maintain my impression that something more is going on; especially since I remember Sam generally talking about how models might use more test-time compute in the future, and be able to think for longer on harder questions.)

Replies from: calef
comment by calef · 2021-11-26T20:10:20.126Z · LW(p) · GW(p)

Honestly, at this point, I don’t remember if it’s inferred or primary-sourced. Edited the above for clarity.

comment by RomanS · 2021-11-26T15:10:48.665Z · LW(p) · GW(p)

One way they could do that, is by pitting the model against modified versions of itself, like they did in OpenAI Five (for Dota).

From the minimizing-X-risk perspective, it might be the worst possible way to train AIs.

As Jeff Clune (Uber AI) put it:

[O]ne can imagine that some ways of configuring AI-GAs (i.e. ways of incentivizing progress) that would make AI-GAs more likely to succeed in producing general AI also make their value systems more dangerous. For example, some researchers might try to replicate a basic principle of Darwinian evolution: that it is ‘red in tooth and claw.’

If a researcher tried to catalyze the creation of an AI-GA by creating conditions similar to those on Earth, the results might be similar. We might thus produce an AI with human vices, such as violence, hatred, jealousy, deception, cunning, or worse, simply because those attributes make an AI more likely to survive and succeed in a particular type of competitive simulated world. Note that one might create such an unsavory AI unintentionally by not realizing that the incentive structure they defined encourages such behavior.

Additionally, if you train a language model to outsmart millions of increasingly more intelligent copies of itself, you might end up with the perfect AI-box escape artist.

comment by amc · 2021-11-27T23:03:13.362Z · LW(p) · GW(p)

I was under the impression that GPT-4 would be gigantic, according to this quote from this Wired article:

“From talking to OpenAI, GPT-4 will be about 100 trillion parameters,” Feldman says. “That won’t be ready for several years.”

Replies from: Lanrian
comment by Lanrian · 2021-11-28T13:16:29.046Z · LW(p) · GW(p)

comment by Matthew Barnett (matthew-barnett) · 2021-11-25T21:31:49.676Z · LW(p) · GW(p)

superforecasters were claiming that AlphaGo had a 20% chance of beating Lee Se-dol and I didn't disagree with that at the time

Good Judgment Open had the probability at 65% on March 8th 2016, with a generally stable forecast since early February (Wikipedia says that the first match was on March 9th).

Metaculus had the probability at 64% with similar stability over time. Of course, there might be another source that Eliezer is referring to, but for now I think it's right to flag this statement as false.

Replies from: matthew-barnett, Eliezer_Yudkowsky
comment by Matthew Barnett (matthew-barnett) · 2021-11-25T21:59:17.240Z · LW(p) · GW(p)

A note I want to add, if this fact-check ends up being valid:

It appears that a significant fraction of Eliezer's argument relies on AlphaGo being surprising. But then his evidence for it being surprising seems to rest substantially on something that was misremembered. That seems important if true.

I would point to, for example, this quote, "I mean the superforecasters did already suck once in my observation, which was AlphaGo, but I did not bet against them there, I bet with them and then updated afterwards." It seems like the lesson here, if indeed superforecasters got AlphaGo right and Eliezer got it wrong, is that we should update a little bit towards superforecasting, and against Eliezer.

Replies from: Benito
comment by Ben Pace (Benito) · 2021-11-25T22:07:16.121Z · LW(p) · GW(p)

Adding my recollection of that period: some people made the relevant updates when DeepMind's system beat the European Champion Fan Hui (in October 2015). My hazy recollection is that beating Fan Hui started some people going "Oh huh, I think this is going to happen" and then when AlphaGo beat Lee Sedol (in March 2016) everyone said "Now it is happening".

Replies from: matthew-barnett
comment by Matthew Barnett (matthew-barnett) · 2021-11-25T22:27:20.470Z · LW(p) · GW(p)

It seems from this Metaculus question that people indeed were surprised by the announcement of the match between Fan Hui and AlphaGo (which was disclosed in January, despite the match happening months earlier, according to Wikipedia).

It seems hard to interpret this as AlphaGo being inherently surprising though, because the relevant fact is that the question was referring only to 2016. It seems somewhat reasonable to think that even if a breakthrough is on the horizon, it won't happen imminently with high probability.

Perhaps a better source of evidence of AlphaGo's surprisingness comes from Nick Bostrom's 2014 book Superintelligence in which he says, "Go-playing amateur programs have been improving at a rate of about 1 level dan/year in recent years. If this rate of improvement continues, they might beat the human world champion in about a decade." (Chapter 1).

This vindicates AlphaGo being an impressive discontinuity from pre-2015 progress. Though one can reasonably dispute whether superforecasters thought that the milestone was still far away after being told that Google and Facebook made big investments into it (as was the case in late 2015).

Replies from: Benito
comment by Ben Pace (Benito) · 2021-11-25T22:51:32.596Z · LW(p) · GW(p)

Wow thanks for pulling that up. I've gotta say, having records of people's predictions is pretty sweet. Similarly, solid find on the Bostrom quote.

Do you think that might be the 20% number that Eliezer is remembering? Eliezer, interested in whether you have a recollection of this or not. [Added: It seems from a comment upthread that EY was talking about superforecasters in Feb 2016, which is after Fan Hui.]

Replies from: greg-colbourn
comment by Greg C (greg-colbourn) · 2021-12-03T12:18:31.365Z · LW(p) · GW(p)

There was still a big update from ~20%->90%, which is what is relevant for Eliezer's argument, even if he misremembered the timing. The fact that the update was from the Fan Hui match rather than the Lee Sedol match doesn't seem that important to the argument [for superforecasters being caught flatfooted by discontinuous AI-Go progress].

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-11-25T22:37:31.386Z · LW(p) · GW(p)

My memory of the past is not great in general, but considering that I bet sums of my own money and advised others to do so, I am surprised that my memory here would be that bad, if it was.

Neither GJO nor Metaculus is restricted to only past superforecasters, as I understand it; and my recollection is that superforecasters in particular, not all participants at GJO or Metaculus, were saying in the range of 20%.  Here's an example of one such, which I have a potentially false memory of having maybe read at the time: https://www.gjopen.com/comments/118530

Replies from: matthew-barnett
comment by Matthew Barnett (matthew-barnett) · 2021-11-25T22:44:36.454Z · LW(p) · GW(p)

Thanks for clarifying. That makes sense that you may have been referring to a specific subset of forecasters. I do think that some forecasters tend to be much more reliable than others (and maybe there was/is a way to restrict to "superforecasters" in the UI).

I will add the following piece of evidence, which I don't think counts much for or against your memory, but which still seems relevant. Metaculus shows a histogram of predictions. On the relevant question, a relatively high fraction of people put a 20% chance, but it also looks like over 80% of forecasters put higher credences.

comment by landfish (jeff-ladish) · 2021-11-29T21:49:20.949Z · LW(p) · GW(p)

After reading these two Eliezer <> Paul discussions, I realize I'm confused about what the importance of their disagreement is.

It's very clear to me why Richard & Eliezer's disagreement is important. Alignment being extremely hard suggests AI companies should work a lot harder to avoid accidentally destroying the world, and suggests alignment researchers should be wary of easy-seeming alignment approaches.

But it seems like Paul & Eliezer basically agree about all of that. They disagree about... what the world looks like shortly before the end? Which, sure, does have some strategic implications. You might be able to make a ton of money by betting on AI companies and thus have a lot of power in the few years before the world drastically changes. That does seem important, but it doesn't seem nearly as important as the difficulty of alignment.

I wonder if there are other things Paul & Eliezer disagree about that are more important. Or if I'm underrating the importance of the ways they disagree here. Paul wants Eliezer to bet on things so Paul can have a chance to update to his view in the future if things end up being really different than he thinks. Okay, but what will he do differently in those worlds? Imo he'd just be doing the same things he's trying now if Eliezer was right. And maybe there is something implicit in Paul's "smooth line" forecasting beliefs that makes his prosaic alignment strategy more likely to work in worlds where he's right, but I currently don't see it.

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2021-12-01T20:24:42.739Z · LW(p) · GW(p)

I would frame the question more as 'Is this question important for the entire chain of actions humanity needs to select in order to steer to good outcomes?', rather than 'Is there a specific thing Paul or Eliezer personally should do differently tomorrow if they update to the other's view?' (though the latter is an interesting question too).

Some implications of having a more Eliezer-ish view include:

• In the Eliezer-world, humanity's task is more foresight-loaded. You don't get a long period of time in advance of AGI where the path to AGI is clear; nor do you get a long period of time of working with proto-AGI or weak AGI where we can safely learn all the relevant principles and meta-principles via trial and error. You need to see far more of the bullets coming in advance of the experiment, which means developing more of the technical knowledge to exercise that kind of foresight, and also developing more of the base skills of thinking well about AGI even where our technical models and our data are both thin.
• My Paul-model says: 'Humans are just really bad at foresight, and it seems like AI just isn't very amenable to understanding; so we're forced to rely mostly on surface trends and empirical feedback loops. Fortunately, AGI itself is pretty simple and obvious (just keep scaling stuff similar to GPT-3 and you're done), and progress is likely to be relatively slow and gradual, so surface trends will be a great guide and empirical feedback loops will be abundant.'
• My Eliezer-model says: 'AI foresight may be hard, but it seems overwhelmingly necessary; either we see the bullets coming in advance, or we die. So we need to try to master foresight, even though we can't be sure [LW · GW] of success in advance. In the end, this is a novel domain, and humanity hasn't put much effort into developing good foresight here; it would be foolish to despair before we've made a proper attempt. We need to try to overcome important biases, think like reality [LW · GW], and become capable of good inside-view reasoning about AGI. We need to hone and refine our gut-level pattern-matching, as well as our explicit models of AGI, as well as the metacognition that helps us improve the former capacities.'
• In the Eliezer-world, small actors matter more in expectation; there's no guarantee that the largest and most well-established ML groups will get to AGI first. Governments in particular matter less in expectation.
• In the Eliezer-world, single organizations matter more: there's more potential for a single group to have a lead, and for other groups to be passive or oblivious. This means that you can get more bang for your buck by figuring out how to make a really excellent organization full of excellent people; and you get comparatively less bang for your buck from improving relations between organizations, between governments, etc.
• The Eliezer-world is less adequate overall, and also has more capabilities (and alignment) secrets.
• So, e.g., research closure matters more — both because more secrets [LW · GW] exist, and because it's less likely that there will be multiple independent discoveries of any given secret at around the same time.
• Also, if your background view of the world is more adequate, you should be less worried about alignment (both out of deference to the ML mainstream that is at least moderately less worried about alignment; and out of expectation that the ML mainstream will update and change course as needed).
• Relatedly, in Eliezer-world you have to do more work to actively recruit the world's clearest and best thinkers to helping solve alignment. In Paul-world, you can rely more on future AI progress, warning shots, etc. to naturally grow the alignment field.
• In the Eliezer-world, timelines are both shorter and less predictable. There's more potential for AGI to be early-paradigm rather than late-paradigm; and even if it's late-paradigm, it may be late into a paradigm that doesn't look very much like GPT-3 or other circa-2021 systems.
• In the Eliezer-world, there are many different paths to AGI, and it may be key to humanity's survival that we pick a relatively good path years in advance, and deliberately steer toward more alignable approaches to AGI. In the Paul-world, there's one path to AGI, and it's big and obvious.
comment by landfish (jeff-ladish) · 2021-12-01T20:47:56.788Z · LW(p) · GW(p)

Thanks, this is helpful! I'd be very curious to see where Paul agrees / disagrees with the summary / implications of his view here.

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2021-12-01T21:23:38.871Z · LW(p) · GW(p)

(I'll emphasize again, by the way, that this is a relative comparison of my model of Paul vs. Eliezer. If Paul and Eliezer's views on some topic are pretty close in absolute terms, the above might misleadingly suggest more disagreement than there in fact is.)

comment by Rob Bensinger (RobbBB) · 2021-11-28T04:50:56.366Z · LW(p) · GW(p)

Transcript error fixed -- the line that previously read

should be

comment by Vanessa Kosoy (vanessa-kosoy) · 2021-12-01T19:47:48.231Z · LW(p) · GW(p)

Christiano predicts progress will be (approximately) a smooth curve, whereas Yudkowsky predicts there will be discontinuous-ish "jumps", but there's another thing that can happen that both of them seem to dismiss: progress hitting a major obstacle and plateauing for a while (i.e. the progress curve looking locally like a sigmoid). I guess that the reason they dismiss it is related to this quote [AF · GW] by Soares:

I observe that, 15 years ago, everyone was saying AGI is far off because of what it couldn't do -- basic image recognition, go, starcraft, winograd schemas, programmer assistance. But basically all that has fallen. The gap between us and AGI is made mostly of intangibles.

However, I think this is not entirely accurate. Some games are still unsolved without "cheating", where by cheating I mean using human demonstrations or handcrafted rewards, and that includes Montezuma's Revenge, StarCraft II and Dota 2 (and Dota 2 with unlimited hero selection is even more unsolved). Moreover, we haven't seen RL show superhuman performance on any task in which the environment is substantially more complex than the agent in important ways (this rules out all video games, unless winning the game requires a good theory of mind of your opponents[1], which is arguably never the case for zero-sum two-player games). Language models have made impressive progress, but I don't think they are superhuman along any interesting dimension. Classifiers still struggle with adversarial examples (although, this is not necessarily an important limitation; maybe humans have "adversarial examples" too).

So, it is certainly possible that it's a "clear runway" from here to superintelligence. But I don't think it's obvious.

1. I know there are strong poker AIs, but I suspect they win via something other than theory of mind. Maybe someone who knows the topic can comment. ↩︎

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2021-12-01T20:09:52.244Z · LW(p) · GW(p)

My Eliezer-model is a lot less surprised by lulls than my Paul-model (because we're missing key insights for AGI, progress on insights is jumpy and hard to predict, the future is generally very unpredictable, etc.). I don't know exactly how large of a lull or winter would start to surprise Eliezer (or how much that surprise would change if the lull is occurring two years from now, vs. ten years from now, for example).

In Yudkowsky and Christiano Discuss "Takeoff Speeds" [LW · GW], Eliezer says:

I have a rough intuitive feeling that it [AI progress] was going faster in 2015-2017 than 2018-2020.

So in that sense Eliezer thinks we're already in a slowdown to some degree (as of 2020), though I gather you're talking about a much larger and more long-lasting slowdown.

Replies from: paulfchristiano, vanessa-kosoy
comment by paulfchristiano · 2021-12-02T06:26:23.972Z · LW(p) · GW(p)

I generally expect smoother progress, but predictions about lulls are probably dominated by Eliezer's shorter timelines. Also lulls are generally easier than spurts, e.g. I think that if you just slow investment growth you get a lull and that's not too unlikely (whereas part of why it's hard to get a spurt is that investment rises to levels where you can't rapidly grow it further).

comment by Vanessa Kosoy (vanessa-kosoy) · 2021-12-01T21:37:10.466Z · LW(p) · GW(p)

Makes some sense, but Yudkowsky's prediction that TAI will arrive before AI has large economic impact does forbid a lot of plateau scenarios. Given a plateau that's sufficiently high and sufficiently long, AI will land in the market, I think. Even if regulatory hurdles are the bottleneck for a lot of things atm, eventually in some country AI will become important and the others will have to follow or fall behind.

comment by RomanS · 2021-11-25T21:08:43.960Z · LW(p) · GW(p)

why aren't elephants GI?

As Herculano-Houzel called it, the human brain is a remarkable, yet not extraordinary, scaled-up primate brain. It seems that our main advantage in hardware is quantitative: more cortical columns to process more reference frames to predict more stuff.

And the primate brain is mostly the same as of other mammals (which shouldn't be surprising, as the source code is mostly the same).

And the intelligence of mammals seems to be rather general. It allows them to solve a highly diverse set of cognitive tasks, including the task of learning to navigate at Level 5 autonomy in novel environments (which is still too hard for the most general of our AIs).

One may ask: why aren't elephants making rockets and computers yet?

But one may ask the same question about any uncontacted human tribe.

Thus, it seems to me that the "elephants are not GI" part of the argument is incorrect. Elephants (and also chimps, dolphins etc) seem to possess a rather general but computationally capped intelligence.

Replies from: Eliezer_Yudkowsky, RobbBB, amc
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-11-25T21:14:57.682Z · LW(p) · GW(p)

Somebody tries to measure the human brain using instruments that can only detect numbers of neurons and energy expenditure, but not detect any difference of how the fine circuitry is wired; and concludes the human brain is remarkable only in its size and not in its algorithms.  You see the problem here?  The failure of large dinosaurs to quickly scale is a measuring instrument that detects how their algorithms scaled with more compute (namely: poorly), while measuring the number of neurons in a human brain tells you nothing about that at all.

Replies from: RomanS
comment by RomanS · 2021-11-25T22:11:34.549Z · LW(p) · GW(p)

Jeff Hawkins provided a rather interesting argument on the topic:

The scaling of the human brain has happened too fast to implement any deep changes in how the circuitry works. The entire scaling process was mostly done by the favorite trick of biological evolution: copy and paste existing units (in this case - cortical columns).

Jeff argues that there is no change in the basic algorithm between earlier primates and humans. It's the same reference-frames processing algo distributed across columns. The main difference is, humans have many more columns.

I've found his arguments convincing for two reasons:

• his neurobiological arguments are surprisingly good (to the point of seeming obvious in hindsight)
• it's the same "just add more layers" trick we reinvented in ML

The failure of large dinosaurs to quickly scale is a measuring instrument that detects how their algorithms scaled with more compute

Are we sure about the low intelligence of dinosaurs?

Judging by the living dinos (e.g. crows), they are able to pack a chimp-like intelligence into a 0.016 kg brain.

And some of the dinos have had x60 more of it (e.g. the brain of Tyrannosaurus rex weighed about 1 kg, which is comparable to Homo erectus).

And some of the dinos have had a surprisingly large encephalization quotient, combined with bipedalism, gripping hands, forward-facing eyes, omnivorism, nest building, parental care, and living in groups (e.g. troodontids).

Maybe it was not an asteroid after all...

(Very unlikely, of course. But I find the idea rather amusing)

comment by Rob Bensinger (RobbBB) · 2021-11-25T23:37:11.387Z · LW(p) · GW(p)

One may ask: why aren't elephants making rockets and computers yet?

But one may ask the same question about any uncontacted human tribe.

Seems more surprising for elephants, by default: elephants have apparently had similarly large brains for about 20 million years, which is far more time than uncontacted human tribes have had to build rockets. (~100x as long as anatomically modern humans have existed at all, for example.)

Replies from: RomanS
comment by RomanS · 2021-11-26T07:33:47.631Z · LW(p) · GW(p)

I agree. Additionally, the life expectancy of elephants is significantly higher than of paleolithic humans (1, 2). Thus, individual elephants have much more time to learn stuff.

In humans, technological progress is not a given. Across different populations, it seems to be determined by the local culture, and not by neurobiological differences. For example, the ancestors of Wernher von Braun left their technological local minimum thousands of years later than the Egyptians or the Chinese. And the ancestors of Sergei Korolev lived their primitive lives well into the 8th century C.E. If a Han dynasty scholar had visited the Germanic and Slavic tribes, he would've described them as hopeless barbarians, perhaps even as inherently predisposed to barbarism.

Maybe if we give elephants more time, they will overcome their biological limitations (limited speech, limited "hand", fewer neurons in neocortex etc), and will escape the local minimum. But maybe not.

comment by amc · 2021-11-27T23:27:34.034Z · LW(p) · GW(p)

I think Herculano-Houzel would want to mention that humans have 3x (iirc) more neurons in their cerebral cortex than even the elephant species with the biggest brains. Those elephants have more total neurons because their cerebellar cortices have like 200 billion neurons. Humans have more cortical neurons than any animal, including blue whales, because neuron sizes scale differently for different Orders and primates specifically scale well.

Crucially, people have thought human brains were special among primates but she makes the point that it's the other great apes that are special in having smaller brains according to primate brain scaling laws. This is because humans either had a unique incentive to keep up with the costs of scaling or because they had a unique ability to keep up with the costs (due to e.g. cooking).

Having better algorithms that could take advantage of scale fits with her views, I think.

comment by Zach Stein-Perlman · 2021-11-25T19:00:04.625Z · LW(p) · GW(p)

since you disagree with them eventually, e.g. >2/3 doom by 2030

This apparently refers to Yudkowsky's credences, and I notice I am surprised — has Yudkowsky said this somewhere? (Edit: the answer is no, thanks for responses.)

Replies from: RobbBB
comment by Rob Bensinger (RobbBB) · 2021-11-25T19:38:44.338Z · LW(p) · GW(p)

I think Ajeya is inferring this from Eliezer's 2017 bet with Bryan Caplan. The bet was jokey and therefore (IMO) doesn't deserve much weight, though Eliezer comments that it's maybe not totally unrelated to timelines he'd reflectively endorse:

[T]he generator of this bet does not necessarily represent a strong epistemic stance on my part, which seems important to emphasize. But I suppose one might draw conclusions from the fact that, when I was humorously imagining what sort of benefit I could get from exploiting this amazing phenomenon, my System 1 thought that having the world not end before 2030 seemed like the most I could reasonably ask.

In general, my (maybe-partly-mistaken) Eliezer-model...

• thinks he knows very little about timelines (per the qualitative reasoning in There's No Fire Alarm For AGI and in Nate's recent post [LW(p) · GW(p)] -- though not necessarily endorsing Nate's quantitative probabilities);
• and is wary of trying to turn 'I don't know' into a solid, stable number for this kind of question (cf. When (Not) To Use Probabilities [LW · GW]);
• but recognizes that his behavior at any given time, insofar as it is coherent, must reflect some implicit probabilities. Quoting Eliezer back in 2016:

[... T]imelines are the hardest part of AGI issues to forecast, by which I mean that if you ask me for a specific year, I throw up my hands and say “Not only do I not know, I make the much stronger statement that nobody else has good knowledge either.” Fermi said that positive-net-energy from nuclear power wouldn’t be possible for 50 years, two years before he oversaw the construction of the first pile of uranium bricks to go critical. The way these things work is that they look fifty years off to the slightly skeptical, and ten years later, they still look fifty years off, and then suddenly there’s a breakthrough and they look five years off, at which point they’re actually 2 to 20 years off.

If you hold a gun to my head and say “Infer your probability distribution from your own actions, you self-proclaimed Bayesian” then I think I seem to be planning for a time horizon between 8 and 40 years, but some of that because there’s very little I think I can do in less than 8 years, and, you know, if it takes longer than 40 years there’ll probably be some replanning to do anyway over that time period.

And then how *long* takeoff takes past that point is a separate issue, one that doesn’t correlate all that much to how long it took to start takeoff. [...]

Replies from: Eliezer_Yudkowsky, sil-ver
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-11-25T20:40:24.735Z · LW(p) · GW(p)

Furthermore, 2/3 doom is straightforwardly the wrong thing to infer from the 1:1 betting odds, even taking those at face value and even before taking interest rates into account; Bryan gave me $100 which gets returned as $200 later.

(I do consider this a noteworthy example of 'People seem systematically to make the mistake in the direction that interprets Eliezer's stuff as more weird and extreme' because it's a clear arithmetical error and because I saw a recorded transcript of it apparently passing the notice of several people I considered usually epistemically strong.)

(Though it's also easier than people expect to just not notice things; I didn't realize at the time that Ajeya was talking about a misinterpretation of the implied odds from the Caplan bet, and thought she was just guessing my own odds at 2/3, and I didn't want to argue about that because I don't think it valuable to the world or maybe even to myself to go about arguing those exact numbers.)
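[Editor's note: the arithmetic behind the 1:1 vs. 2/3 point can be sketched as follows. This is a minimal illustration; the `breakeven_probability` helper is ours, not part of the bet's terms.]

```python
def breakeven_probability(net_loss_if_wrong, net_gain_if_right):
    """Probability at which a bettor's expected value is zero,
    i.e. the credence implied by accepting these stakes."""
    return net_loss_if_wrong / (net_loss_if_wrong + net_gain_if_right)

# The actual bet: Bryan pays $100 up front; if the world survives to 2030,
# Eliezer returns $200. Net: Eliezer is +$100 if doom, -$100 if not.
print(breakeven_probability(100, 100))  # 0.5 — i.e. 1:1 odds

# The misreading: treating it as risking $200 to win $100
# (2:1 odds) yields the 2/3 figure.
print(breakeven_probability(200, 100))  # ~0.667
```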

Replies from: ajeya-cotra, RobbBB
comment by Ajeya Cotra (ajeya-cotra) · 2021-11-25T21:44:53.649Z · LW(p) · GW(p)

Yes, Rob is right about the inference coming from the bet and Eliezer is right that the bet was actually 1:1 odds but due to the somewhat unusual bet format I misread it as 2:1 odds.

comment by Rob Bensinger (RobbBB) · 2021-11-25T21:03:20.108Z · LW(p) · GW(p)

Maybe I'm wrong about her deriving this from the Caplan bet? Ajeya hasn't actually confirmed that, it was just an inference I drew. I'll poke her to double-check.

comment by Rafael Harth (sil-ver) · 2021-11-25T22:48:34.720Z · LW(p) · GW(p)

I think the bet is a bad idea if you think in terms of Many Worlds. Say 55% of all worlds end by 2030. Then, even assuming that value-of-$-in-2017 = value-of-$-in-2030, Eliezer personally benefited from the bet. However, the epistemic result is Bryan getting prestige points in 45% of worlds and Eliezer getting prestige points in 0% of worlds (no one is left to award them in the worlds that end).

The other problem with the bet is that, if we adjust for inflation and returns of money, the bet is positive EV for Eliezer even given P(world-ends-by-2030) << .
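To illustrate the second point, here's a rough EV sketch. The 5% annual return and the 2017–2030 window are assumptions for illustration, not figures from the thread:

```python
# Receiving $100 in 2017 and repaying $200 in 2030 can be positive EV
# for the receiver even at a small P(doom), because the $100 compounds
# over the 13 intervening years.
annual_return = 0.05   # assumed, not from the thread
years = 13             # 2017 -> 2030
future_value = 100 * (1 + annual_return) ** years

def expected_value(p_doom):
    # Keep the grown $100 in doom worlds; repay $200 otherwise.
    return p_doom * future_value + (1 - p_doom) * (future_value - 200)

# Break-even P(doom) under these assumptions:
p_breakeven = (200 - future_value) / 200
print(round(future_value, 2))  # ~188.56
print(round(p_breakeven, 3))   # ~0.057 -- far below 1/2
```

Under these assumptions the bet is positive EV for the receiver at any P(doom) above roughly 6%, well below the 50% that the nominal 1:1 odds would suggest.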

comment by paulfchristiano · 2021-11-26T07:57:11.897Z · LW(p) · GW(p)

(ETA: this wasn't actually in this log but in a future part of the discussion.)

I found the elephants part of this discussion surprising. It looks to me like human brains are better than elephant brains at most things, and it's interesting to me that Eliezer thought otherwise. This is one of the main places where I couldn't predict what he would say.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2021-11-26T09:01:31.822Z · LW(p) · GW(p)

I also think human brains are better than elephant brains at most things - what did I say that sounded otherwise?

Replies from: paulfchristiano
comment by paulfchristiano · 2021-11-26T16:45:25.007Z · LW(p) · GW(p)

Oops, this was in reference to the later part of the discussion where you disagreed with "a human in a big animal body, with brain adapted to operate that body instead of our own, would beat a big animal [without using tools]".

comment by bfinn · 2021-12-01T16:05:26.204Z · LW(p) · GW(p)

On a detail:

> what would the chess graph look like if it was measuring pawn handicaps?

I figured out from a paper a while back (sorry, can't recall where!) that 1 pawn = 100 Elo points, at least at high levels of play. Grandmaster Larry Kaufman suggests that the Elo value of e.g. a knight handicap varies with the playing level:

https://en.wikipedia.org/wiki/Handicap_(chess)#Rating_equivalent

comment by ADifferentAnonymous · 2021-11-28T19:51:28.754Z · LW(p) · GW(p)

An interesting analogy might be a parallel Earth making nanotechnology breakthroughs instead of AI breakthroughs, such that it's apparent they'll be capable of creating gray goo but not apparent they'll be able to avoid creating gray goo.

I guess a slow takeoff could be if the first self-replicators took a day to double (so if somebody accidentally made a gram of gray goo you'd have weeks to figure it out and nuke the lab or whatever), but self-replication speed went down as technology improved, so that accidental unconstrained replicators happened periodically but could be contained until one couldn't be.

Whereas a hard takeoff could be if you had nanobots that built stuff in seconds but couldn't self-replicate using random environmental mass, and then the first nanobot that can do that, can do it in seconds and eats the planet.

Should we consider the second scenario less likely because of smooth trend lines? Does Paul think we should? (I'm pretty sure Eliezer thinks that Paul thinks we should)

comment by tailcalled · 2021-11-26T08:59:30.374Z · LW(p) · GW(p)

I don't know much about chess, so maybe this is wrong, but I would tend to think of Elo ratings as being more like a logarithmic scale of ability than like a linear scale of ability, in the sense that e.g. the odds of winning change exponentially with Elo difference, so a linear trend on an Elo graph translates to an exponential trend in competitiveness. "The chances of an AI solving the tasks better than a human are increasing exponentially" sounds more like fast takeoff than slow takeoff to me.
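For concreteness, the standard Elo model maps a rating difference to an expected score via a logistic curve on a 400-point scale (this is the usual formula, not anything specific to the chess graphs under discussion):

```python
def elo_win_prob(delta):
    """Expected score for the higher-rated player under the standard
    Elo logistic model with a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))

# Equal ratings -> 50/50. Each fixed Elo gap multiplies the odds
# ratio by a constant factor (x10 per 400 points), which is why a
# linear Elo trend is an exponential trend in head-to-head odds.
print(round(elo_win_prob(0), 3))    # 0.5
print(round(elo_win_prob(100), 3))  # 0.64
print(round(elo_win_prob(400), 3))  # 0.909
```

So a steady linear climb in Elo corresponds to the stronger player's odds against a fixed opponent growing by a constant multiplicative factor per unit time.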

Replies from: Lanrian
comment by Lanrian · 2021-11-26T09:58:34.428Z · LW(p) · GW(p)

I think everyone in the discussion expects AI progress to be at least exponential. See all of Paul's mentions of hyperbolic growth — that's faster than an exponential.

The discussion is more about continuous vs discontinuous takeoff, or centralised vs decentralised takeoff. (The slow/fast terminology isn't great.)

comment by Greg C (greg-colbourn) · 2021-12-03T12:26:36.972Z · LW(p) · GW(p)

Curious about Eliezer's and Paul's takes on the Netflix series neXt as a plausible future scenario. My guess:

too Eliezer-ish for Paul; too Paul-ish for Eliezer.

comment by ifalpha · 2021-11-27T00:53:36.383Z · LW(p) · GW(p)

Eliezer should have taken Cotra up on that bet about "will someone train a 10T param model before end days" considering one already exists.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-11-27T20:18:56.968Z · LW(p) · GW(p)

Is that one dense or sparse/MoE? How many data points was it trained for? Does it set SOTA on anything? (I'm skeptical; I'm wondering if they only trained it for a tiny amount, for example.)