Is AI Hitting a Wall or Moving Faster Than Ever?

post by garrison · 2025-01-09T22:18:51.497Z · LW · GW · 5 comments

This is a link post for https://garrisonlovely.substack.com/p/is-ai-hitting-a-wall-or-moving-faster

5 comments

Comments sorted by top scores.

comment by Vladimir_Nesov · 2025-01-09T23:07:07.640Z · LW(p) · GW(p)

Noticing progress in long reasoning models like o3 creates a different blind spot from the one in popular reporting on pretraining scaling stalling out. It can appear that long reasoning models reconcile the claim that pretraining is stalling out with AI progress moving fast. But the plausible success of reasoning models instead suggests that pretraining will continue scaling even further[1] than could have been expected before.

Training systems were already on track to go from 50 MW, training current models for up to 1e26 FLOPs, to 150 MW in late 2024, and then 1 GW by the end of 2025, training models for up to 5e27 FLOPs in 2026, 250x the compute of the original GPT-4. But with o3, it now seems more plausible that $150bn training systems will be built in 2026-2027 [LW · GW], training models for up to 5e28 FLOPs in 2027-2028, which is 500x the compute of the currently deployed 1e26 FLOPs models, or 2500x the compute of the original GPT-4.
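
As a sanity check, here is the arithmetic behind those multipliers in a few lines of Python. The ~2e25 FLOPs figure for the original GPT-4 is not stated above; it is implied by the quoted ratios (5e27 / 250), so treat it as an inference, not a source number.

```python
# Back-of-the-envelope check of the compute multipliers quoted above.
# The ~2e25 FLOPs figure for original GPT-4 is inferred from the stated
# ratios (5e27 / 250 = 2e25), not given directly in the comment.

gpt4_flops = 2e25          # implied original GPT-4 training compute
current_flops = 1e26       # current models, trained on ~50 MW systems
flops_2026 = 5e27          # models trained on 1 GW systems built by end of 2025
flops_2027_28 = 5e28       # hypothesized $150bn training systems

print(f"2026 models vs GPT-4:      {flops_2026 / gpt4_flops:.0f}x")       # ~250x
print(f"2027-28 models vs current: {flops_2027_28 / current_flops:.0f}x")  # ~500x
print(f"2027-28 models vs GPT-4:   {flops_2027_28 / gpt4_flops:.0f}x")     # ~2500x
```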

Scaling of pretraining is not stalling out, even without the new long reasoning paradigm. It might begin stalling out in 2026 at the earliest, but now more likely only in 2028. The issue is that the scale of training systems is not directly visible; there is a 1-2 year lag between the decision to build them and the resulting, observable AI progress.


  1. Reporting on scaling stalling out might have a point about returns on scale getting worse than expected. But if scale keeps increasing despite that, additional scale will still produce new capabilities. Scaling compute by 10x might do very little, and that is compatible with scaling by 500x bringing a qualitative change. ↩︎

Replies from: admohanraj
comment by _liminaldrift (admohanraj) · 2025-01-14T13:39:27.490Z · LW(p) · GW(p)

What about the reports that GPT-5's performance isn't as strong as expected on many tasks due to a lack of high-quality pretraining data? Isn't that a blocker for scaling to 5e28 FLOPs by 2028?

My understanding, though, was that if we could hypothetically generate enough training data, these models would continue improving according to the scaling laws. Are you arguing that the synthetic data generated by these long reasoning models will allow us to keep scaling them?
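
For readers unfamiliar with the reference, here is a minimal sketch of the kind of scaling law being invoked, using the published Chinchilla-style fit (Hoffmann et al. 2022). The constants are illustrative only and nothing the comment itself commits to.

```python
# Illustration of a compute-optimal (Chinchilla-style) scaling law.
# Constants are the Hoffmann et al. (2022) fit; they are indicative only.

def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted pretraining loss for N parameters and D training tokens."""
    return E + A / N**alpha + B / D**beta

# Holding data fixed while growing the model: the data term stops improving,
# which is the sense in which limited data can block further gains from scale.
for N, D in [(7e10, 1.5e12), (7e11, 1.5e12), (7e11, 1.5e13)]:
    print(f"N={N:.0e}, D={D:.0e} -> loss ~{chinchilla_loss(N, D):.3f}")
```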

Replies from: Vladimir_Nesov, garrison
comment by Vladimir_Nesov · 2025-01-16T18:28:35.397Z · LW(p) · GW(p)

There is enough natural text data until 2026-2028, as I describe in the Peak Data [LW · GW] section of the linked post. It's not very good data, but with 2,500x the raw compute of the original GPT-4 (and possibly 10,000x-25,000x in effective compute [LW(p) · GW(p)] due to algorithmic improvements in pretraining), that's a lot of headroom that doesn't depend on inventing new things (such as synthetic data suitable for improving general intelligence through pretraining the way natural text data is).
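
A small sketch of how the raw and effective compute figures relate: the implied 4x-10x algorithmic-efficiency factor below is just the quoted effective range divided by the raw multiplier, not an independent estimate.

```python
# Decomposing the "effective compute" range quoted above:
# effective = raw hardware scale-up x algorithmic-efficiency gain in pretraining.

raw_multiplier = 2500                  # raw compute vs original GPT-4
effective_range = (10_000, 25_000)     # effective compute range quoted above

algorithmic_gain = tuple(e / raw_multiplier for e in effective_range)
print(f"implied algorithmic improvement: "
      f"{algorithmic_gain[0]:.0f}x-{algorithmic_gain[1]:.0f}x")
# -> implied algorithmic improvement: 4x-10x
```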

Insufficient data could in principle be an issue for making good use of 5e28 FLOPs, but actually getting 5e28 FLOPs by 2028 (from a single training system) only requires funding. The decisions about this don't need to be made based on the AIs that exist today; they'll be made based on the AIs that exist in 2026-2027, trained on the 1 GW training systems being built this year. With o3-like post-training, the utility and impressiveness of an LLM improve, so the chances of getting such a project funded improve (compared to the absence of such techniques).

comment by garrison · 2025-01-16T17:49:25.864Z · LW(p) · GW(p)

I think that is a problem for the industry, but probably not the insurmountable barrier some commentators make it out to be, for a couple of reasons:

  1. The o-series of models may be able to produce new high-quality training data
  2. Sufficiently good reasoning approaches + existing base models + scaffolding may be enough to get you to automating ML research

One other thought: there's probably an upper limit on how good an LLM can get even with unlimited high-quality data, and I'd guess that models would asymptotically approach it for a while. Based on the reporting around GPT-5 and other next-gen models, I'd guess the issue is a lack of data rather than an approach to some fundamental limit.
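
As a toy illustration of that asymptotic intuition, here is a saturating curve; the ceiling, scale, and exponent are made-up placeholders, purely for shape, not estimates of anything.

```python
# Toy illustration of the "asymptotic ceiling" intuition above.
# All constants are placeholders chosen only to show the shape of the curve.

def hypothetical_score(data_tokens, ceiling=100.0, scale=50.0, exponent=0.1):
    """A benchmark score that saturates toward `ceiling` as training data grows."""
    return ceiling - scale / data_tokens**exponent

for tokens in (1e9, 1e12, 1e14):
    print(f"{tokens:.0e} tokens -> score ~{hypothetical_score(tokens):.1f}")
# Gains shrink as the curve nears the ceiling: more data keeps helping,
# but by less and less.
```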

comment by Hzn · 2025-01-10T10:18:11.592Z · LW(p) · GW(p)

The reasons why superhuman AI is very low-hanging fruit are pretty obvious.

1) The human brain is meager in terms of energy consumption & matter.

2) Humans did not evolve to do calculus, computer programming & things like that.

3) Evolution is not efficient.