Comments

Comment by deep on ryan_greenblatt's Shortform · 2025-01-25T07:01:56.226Z · LW · GW

Neat, thanks a ton for the algorithmic-vs-labor update -- I appreciated that you'd distinguished those in your post, but I forgot to carry that through in mine! :) 

And oops, I really don't know how I got to 1.6 instead of 1.5 there. Thanks for the flag, have updated my comment accordingly! 

The square relationship idea is interesting -- that factor of 2 is a huge deal. Would be neat to see a Guesstimate or Squiggle version of this calculation that tries to account for the various nuances Tom mentions, and has error bars on each of the terms, so we both get a distribution of r and a sensitivity analysis. (Maybe @Tom Davidson already has this somewhere? If not, I might try to make a crappy version myself, or poke talented folks I know to do a good version.) :) 

Comment by deep on ryan_greenblatt's Shortform · 2025-01-25T06:57:52.832Z · LW · GW

Really appreciate you covering all these nuances, thanks Tom!

Can you give a pointer to the studies you mentioned here?

There are various sources of evidence on how much capabilities improve every time training efficiency doubles: toy ML experiments suggest the answer is ~1.7; human productivity studies suggest the answer is ~2.5. I put more weight on the former, so I’ll estimate 2. This doubles my median estimate to r = ~2.8 (= 1.4 * 2).

Comment by deep on ryan_greenblatt's Shortform · 2025-01-22T20:20:35.435Z · LW · GW

Hey Ryan! Thanks for writing this up -- I think this whole topic is important and interesting.

I was confused about how your analysis related to the Epoch paper, so I spent a while with Claude analyzing it. I did a re-analysis that finds similar results, but also finds (I think) some flaws in your rough estimate. (Keep in mind I'm not an expert myself, and I haven't closely read the Epoch paper, so I might well be making conceptual errors. I think the math is right though!)

I'll walk through my understanding of this stuff first, then compare to your post. I'll be going a little slowly (A) so I can refresh my memory by referencing this later, (B) to make it easy to call out mistakes, and (C) to hopefully make this legible to others who want to follow along.

Using Ryan's empirical estimates in the Epoch model

The Epoch model

The Epoch paper models growth with the following equation:
1. $\frac{\dot{A}}{A} \propto E^{\lambda} A^{-\beta}$

where $A$ = efficiency and $E$ = research input. We want to consider worlds with a potential software takeoff, meaning that increases in AI efficiency directly feed into research input, which we model as $E \propto A$. So the key consideration seems to be the ratio $\lambda/\beta$. If it's 1, we get steady exponential growth from scaling inputs; greater, superexponential; smaller, subexponential.[1]
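To build intuition, here's a minimal toy sketch (mine, not from the Epoch paper) of why this ratio controls the regime: setting $E \propto A$ and dropping constants, equation 1 becomes $\dot{A} \propto A^{1 + \lambda - \beta}$, and integrating that numerically shows the three behaviors.

```python
# Toy sketch (my own, not from the Epoch paper): with E proportional to A and constants dropped,
# equation 1 becomes dA/dt = A^(1 + lambda - beta), so the regime depends only on lambda/beta.

def simulate(lam, beta, A0=1.0, dt=1e-4, t_max=3.0):
    """Euler-integrate dA/dt = A^(1 + lam - beta); stop early if A explodes."""
    A, t = A0, 0.0
    while t < t_max and A < 1e12:
        A += dt * A ** (1 + lam - beta)
        t += dt
    return t, A

for lam, beta in [(0.5, 1.0), (1.0, 1.0), (1.5, 1.0)]:  # lambda/beta < 1, = 1, > 1
    t, A = simulate(lam, beta)
    print(f"lambda/beta = {lam / beta:.1f}: A = {A:.3g} at t = {t:.2f}")
# ratio < 1: sub-exponential; ratio = 1: exponential; ratio > 1: super-exponential (explodes in finite time)
```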
 

Fitting the model 
How can we learn about this ratio from historical data? 

Let's pretend history has been convenient and we've seen steady exponential growth in both variables, so $A = e^{rt}$ and $E = e^{qt}$. Then $\dot{A}/A = r$ has been constant over time, so by equation 1, $E^{\lambda} A^{-\beta}$ has been constant as well. Substituting in for $A$ and $E$, we find that $e^{(\lambda q - \beta r)t}$ is constant over time, which is only possible if $\lambda q = \beta r$ and the exponent is always zero. Thus if we've seen steady exponential growth, the historical value of our key ratio is: 

2. $\frac{\lambda}{\beta} = \frac{r}{q}$

Intuitively, if we've seen steady exponential growth while research input has increased more slowly than research output (AI efficiency), there are superlinear returns to scaling inputs. 
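As a sanity check on that algebra, here's a quick symbolic verification (a sketch using sympy; it contains nothing beyond the derivation above):

```python
# Sketch: with A = e^{r t} and E = e^{q t}, the right-hand side of equation 1 is constant in t
# exactly when lambda*q = beta*r, i.e. lambda/beta = r/q.
import sympy as sp

t, r, q, lam, beta = sp.symbols('t r q lambda beta', positive=True)
A, E = sp.exp(r * t), sp.exp(q * t)
rhs = E**lam * A**(-beta)                        # growth rate of A per equation 1 (up to a constant)
print(sp.simplify(rhs))                          # exp(t*(lambda*q - beta*r))
print(sp.simplify(rhs.subs(lam, beta * r / q)))  # 1, i.e. constant, once lambda/beta = r/q
```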

Introducing the Cobb-Douglas function

But wait! $E$, research input, is an abstraction that we can't directly measure. Really there are both compute and labor inputs. Those have indeed been growing roughly exponentially, but at different rates. 

Intuitively, it makes sense to say that "effective research input" has grown as some kind of weighted average of the rates of compute and labor input growth. This is my take on why a Cobb-Douglas function of form (3) $E = C^{p} L^{1-p}$, with a weight parameter $p$, is useful here: it's a weighted geometric average of the two inputs, so its growth rate is a weighted average of their growth rates. 

Writing that out: in general, say both inputs have grown exponentially, so $C = e^{ct}$ and $L = e^{lt}$. Then $E$ has grown as $e^{(pc + (1-p)l)t}$, so $q$ is the weighted average (4) $q = pc + (1-p)l$ of the growth rates of compute and labor. 

Then, using Equation 2, we can estimate our key ratio $\frac{\lambda}{\beta}$ as $\frac{r}{q} = \frac{r}{pc + (1-p)l}$.
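A tiny numeric check of the weighted-average claim (a sketch; the 4x and 1.6x rates are just the example values used below):

```python
# Sketch: if C and L grow exponentially at rates c and l, the Cobb-Douglas aggregate
# E = C^p * L^(1-p) grows at the weighted-average rate q = p*c + (1-p)*l.
import numpy as np

p, c, l = 0.4, np.log(4), np.log(1.6)       # weight and per-year growth rates (example values)
t = np.arange(6)                            # years 0..5
E = np.exp(c * t) ** p * np.exp(l * t) ** (1 - p)
print(np.diff(np.log(E))[0])                # measured growth rate of E, ~0.84
print(p * c + (1 - p) * l)                  # weighted average of the input rates, also ~0.84
```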

Let's get empirical! 

Plugging in your estimates (a short calculation reproducing these numbers follows the list): 

  • Historical compute scaling of 4x/year gives $c = \ln 4 \approx 1.39$;
  • Historical labor scaling of 1.6x gives $l = \ln 1.6 \approx 0.47$;
  • Historical compute elasticity on research outputs of 0.4 gives $p = 0.4$;
  • Adding these together, $q = 0.4 \cdot 1.39 + 0.6 \cdot 0.47 \approx 0.84$, i.e. effective inputs growing $e^{0.84} \approx 2.3$x/year.[2] 
  • Historical efficiency improvement of 3.5x/year gives $r = \ln 3.5 \approx 1.25$.
  • So $\frac{\lambda}{\beta} = \frac{r}{q} \approx \frac{1.25}{0.84} \approx 1.5$. [3]
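Here's the short calculation mentioned above; it just redoes the arithmetic in these bullets (a sketch, nothing new):

```python
# Sketch: reproduce the bullet-point numbers from Ryan's estimates.
import numpy as np

c = np.log(4)       # compute scaling: 4x/year
l = np.log(1.6)     # labor scaling: 1.6x/year
p = 0.4             # compute elasticity (Cobb-Douglas weight)
r = np.log(3.5)     # efficiency improvement: 3.5x/year

q = p * c + (1 - p) * l
print(round(q, 2))                  # ~0.84; note exp(0.84) ~ 2.3x/year effective input growth
print(round(r / q, 2))              # lambda/beta = r/q ~ 1.5
```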

Adjusting for labor-only scaling

But wait: we're not done yet! Under our Cobb-Douglas assumption, scaling labor by a factor of 2 isn't as good as scaling all research inputs by a factor of 2; it only scales effective research input $E$ by a factor of $2^{1-p}$. 

Plugging Equation 3 (which describes research input $E$ in terms of compute and labor) into Equation 1 (which estimates AI progress $\dot{A}/A$ based on research input), our adjusted form of the Epoch model is $\frac{\dot{A}}{A} \propto (C^{p} L^{1-p})^{\lambda} A^{-\beta}$.

Under a software-only singularity, we hold compute constant while scaling labor with AI efficiency, so $\frac{\dot{A}}{A} \propto L^{(1-p)\lambda} A^{-\beta}$ multiplied by a fixed compute term. Since labor scales as $A$, we have $\frac{\dot{A}}{A} \propto A^{(1-p)\lambda - \beta}$. By the same analysis as in our first section, we can see $A$ grows exponentially if $(1-p)\lambda/\beta = 1$, and grows superexponentially if this ratio is > 1. So our key ratio $\lambda/\beta$ just gets multiplied by $(1-p)$, and it wasn't a waste to find it, phew! 

Now we get the true form of our equation: we get a software-only foom iff $(1-p)\frac{\lambda}{\beta} > 1$, or (via equation 2) iff we see empirically that $(1-p)\frac{r}{q} > 1$. Call this the takeoff ratio: it corresponds to (a) how much AI progress scales with inputs and (b) how much of a penalty we take for not scaling compute.

Result: Above, we got $\frac{\lambda}{\beta} \approx 1.5$, so our takeoff ratio is $(1-p) \cdot \frac{\lambda}{\beta} = 0.6 \times 1.5 = 0.9$. That's quite close! If we think it's more reasonable to assume a historical efficiency growth rate of 4x instead of 3.5x, we'd increase our takeoff ratio by a factor of $\ln 4 / \ln 3.5 \approx 1.1$, to a ratio of $\approx 1.0$, right on the knife edge of FOOM.[4] [note: I previously had the wrong numbers here: I had $\lambda/\beta = 1.6$, which would mean the 4x/year case has a takeoff ratio of 1.05, putting it into FOOM land] 
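The same arithmetic for the takeoff ratio (again just a sketch of the numbers above):

```python
# Sketch: takeoff ratio = (1 - p) * r / q, for 3.5x/year and 4x/year efficiency growth.
import numpy as np

p, c, l = 0.4, np.log(4), np.log(1.6)
q = p * c + (1 - p) * l
for eff in (3.5, 4.0):
    r = np.log(eff)
    print(eff, round((1 - p) * r / q, 2))   # ~0.9 for 3.5x, ~0.99 for 4x -- right at the knife edge
```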

So this isn't too far off from your results in terms of implications, but it is somewhat different (no FOOM for 3.5x, less sensitivity to the exact historical growth rate).

 

Analyzing your approach:

Tweaking alpha:

Your estimate of $\alpha$ is in fact similar in form to my ratio $r/q$ -- but what you're calculating instead is (I think) $e^{r}/e^{q}$.

One indicator that something's wrong is that your result involves checking whether $\alpha > 2$, or equivalently whether $e^{r}/e^{q} > 2$, or equivalently whether $r - q > \ln 2$. But the choice of 2 is arbitrary -- conceptually, you just want to check whether scaling software by a factor $n$ increases outputs by a factor $n$ or more. Yet the value of $\alpha$ you'd compute for a factor-$n$ scaling clearly varies with $n$. 


One way of parsing the problem is that alpha is (implicitly) time-dependent: it is equal to exp(r * 1 year) / exp(q * 1 year), a ratio of progress vs. inputs over a time period of one year. If you calculated alpha based on a different amount of time, you'd get a different value. By contrast, r/q is a ratio of rates, so it stays the same regardless of what timeframe you use to measure it.[5]
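To make the timescale point concrete, here's a small sketch (using the rates estimated above) showing that alpha shifts with the measurement window while r/q doesn't:

```python
# Sketch: alpha measured over a window of T years is exp(r*T)/exp(q*T), which depends on T;
# the ratio of rates r/q does not.
import numpy as np

r = np.log(3.5)
q = 0.4 * np.log(4) + 0.6 * np.log(1.6)
for T in (0.5, 1.0, 2.0):
    print(T, round(np.exp(r * T) / np.exp(q * T), 2))   # ~1.23, ~1.52, ~2.3
print(round(r / q, 2))                                  # ~1.5 regardless of window
```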

Maybe I'm confused about what your Cobb-Douglas function is meant to be calculating - is it E within an Epoch-style takeoff model, or something else?


Nuances: 

Does Cobb-Douglas make sense?

The geometric-average-of-rates thing makes sense, but it feels weird that this simple intuitive approach also commits us to a functional form (Cobb-Douglas) that has other implications.

Wikipedia says Cobb-Douglas functions can have the exponents not add to 1 (while both being between 0 and 1). Maybe this makes sense here? Not an expert. 

How seriously should we take all this?

This whole thing relies on...

  • Assuming smooth historical trends
  • Assuming those trends continue in the future
  • And those trends themselves are based on functional fits to rough / unclear data.

It feels like this sort of thing is better than nothing, but I wish we had something better. 

I really like the various nuances you're adjusting for, like parallel vs serial scaling, and especially distinguishing algorithmic improvement from labor efficiency. [6] Thinking those things through makes this stuff feel less insubstantial and approximate...though the error bars still feel quite large. 

 

  1. ^

    Actually there's a complexity here, which is that scaling labor alone may be less efficient than scaling "research inputs" which include both labor and compute. We'll come to this in a few paragraphs.

  2. ^

    This is only coincidentally similar to your figure of 2.3 :)

  3. ^

    I originally had 1.6 here, but as Ryan points out in a reply it's actually 1.5. I've tried to reconstruct what I could have put into a calculator to get 1.6 instead, and I'm at a loss!

  4. ^

    I was curious how aggressive the superexponential growth curve would be with a takeoff ratio of a mere 1.05. A couple of Claude queries gave me different answers (maybe because the growth is so extreme that different solvers give meaningfully different approximations?), but they agreed that growth is fairly slow in the first year (~5x) and then hits infinity by the end of the second year. I wrote this comment with the wrong numbers (0.96 instead of 0.9), so it doesn't accurately represent what you get if you plug in 4x capability growth per year. Still cool to get a sense of what these curves look like, though.

  5. ^

    I think this can be understood in terms of the alpha-being-implicitly-a-timescale-function thing -- if you compare an alpha value with the ratio of growth you're likely to see during the same time period, e.g. alpha(1 year) and n = one doubling, you probably get reasonable-looking results.

  6. ^

    I find it annoying that people conflate "increased efficiency of doing known tasks" with "increased ability to do new useful tasks". It seems to me that these could be importantly different, although it's hard to even settle on a reasonable formalization of the latter. Some reasons this might be okay:
     

    • There's a fuzzy conceptual boundary between the two: if GPT-n can do the task at a 0.01% success rate, does that count as a "known task"? What about if it can do each of 10 components at 0.01% success, so in practice we'll never see it succeed if run without human guidance, but we know it's technically possible?
    • Under a software singularity situation, maybe the working hypothesis is that the model can do everything necessary to improve itself a bunch, maybe just not very efficiently yet. So we only need efficiency growth, not to increase the task set. That seems like a stronger assumption than most make, but maybe a reasonable weaker assumption is that the model will 'unlock' the necessary new tasks over time, after which point they become subject to rapid efficiency growth.
    • And empirically, we have in fact seen rapid unlocking of new capabilities, so it's not crazy to approximate "being able to do new things" as a minor but manageable slowdown to the process of AI replacing human AI R&D labor.