What are the strongest arguments for very short timelines?
post by Kaj_Sotala · 2024-12-23T09:38:56.905Z · LW · GW · 5 comments · This is a question post.
Contents
- Answers: elifland (18), Nathan Helm-Burger (14), Vladimir_Nesov (7), Pwlot (0)
- 5 comments
I'm seeing a lot of people on LW saying that they have very short timelines (say, five years or less) until AGI. However, the arguments that I've seen often seem to be just one of the following:
- "I'm not going to explain but I've thought about this a lot"
- "People at companies like OpenAI, Anthropic etc. seem to believe this"
- "Feels intuitive based on the progress we've made so far"
At the same time, it seems like this is not the majority view among ML researchers. The most recent representative expert survey that I'm aware of is the 2023 Expert Survey on Progress in AI. It surveyed 2,778 AI researchers who had published peer-reviewed research in the prior year in six top AI venues (NeurIPS, ICML, ICLR, AAAI, IJCAI, JMLR); the median time for a 50% chance of AGI was either in 23 or 92 years, depending on how the question was phrased.
While a year has passed since this survey was conducted in fall 2023, my anecdotal impression is that many researchers outside the rationalist sphere still have significantly longer timelines, or do not believe that current methods would scale to AGI.
A more recent, though less broadly representative, survey is reported in Feng et al. 2024. At the ICLR 2024 "How Far Are We From AGI" workshop, 138 researchers were polled on their views. "5 years or less" was again a clear minority position, held by 16.6% of respondents, while "20+ years" was the view of 37% of respondents.
Most recently, there were a number of "oh AGI does really seem close" comments with the release of o3. I mostly haven't seen these give very much of an actual model for their view either; they seem to mostly be of the "feels intuitive" type. There have been some posts discussing the extent [LW · GW] to which we can continue to harness compute and data for training bigger models, but that says little about the ultimate limits of the current models.
The one argument that I did see that felt somewhat convincing was the "data wall" and "unhobbling" sections of the "From GPT-4 to AGI" chapter of Leopold Aschenbrenner's "Situational Awareness", which outlined ways in which we could build on top of the current paradigm. However, this too was limited to just "here are more things that we could do".
So, what are the strongest arguments for AGI being very close? I would be particularly interested in any discussions that explicitly look at the limitations of the current models and discuss how exactly people expect those to be overcome.
Answers
Here's the structure of the argument that I find most compelling (I call it the benchmarks + gaps argument); I'm uncertain about the details.
- Focus on the endpoint of substantially speeding up AI R&D / automating research engineering. Let's define our timelines endpoint as something that ~5xs the rate of AI R&D algorithmic progress (compared to a counterfactual world with no post-2024 AIs). Then make an argument that ~fully automating research engineering (experiment implementation/monitoring) would do this, along with research taste of at least the 50th percentile AGI company researcher (experiment ideation/selection).
- Focus on REBench since it's the most relevant benchmark. For simplicity I'll focus on it alone, though for robustness more benchmarks should be considered.
- Based on trend extrapolation and benchmark base rates, roughly 50% we'll saturate REBench by end of 2025.
- Identify the most important gaps between saturating REBench and the endpoint defined in (1). These are: (a) time horizon as measured by human time spent, (b) tasks with worse feedback loops, (c) tasks with large codebases, and (d) becoming significantly cheaper and/or faster than humans. There are some more, but they probably aren't as important; unknown gaps should also be taken into account.
- When forecasting the time to cross the gaps, it seems quite plausible that we get to the substantial AI R&D speedup within a few years after saturating REBench, so by end of 2028 (and significantly earlier doesn't seem crazy).
- This is the most important part of the argument, and one that I have lots of uncertainty over. We have some data regarding the "crossing speed" of some of the gaps but the data are quite limited at the moment. So there are a lot of judgment calls needed and people with very different intuitions might think the remaining gaps will take a long time to cross without this being close to falsified by our data.
- This is broken down into estimating the time to cross the gaps at the 2024 pace of progress, then adjusting based on compute forecasts and intermediate AI R&D speedups before reaching 5x.
- From substantial AI R&D speedup to AGI. Once we have the 5xing AIs, that's potentially already AGI by some definitions; if you have a stronger definition, the possibility of a somewhat fast takeoff means you might get it within a year or so after.
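To make the shape of the argument concrete, here is a toy Monte Carlo sketch of steps 3-6. All distributions and parameters here are my own illustrative assumptions for the sketch, not part of the argument itself:

```python
import math
import random

random.seed(0)

def sample_agi_year() -> float:
    # Step 3: ~50% chance REBench saturates by end of 2025. Model the
    # saturation date as start-of-2025 plus an exponential with a one-year
    # median (a modeling assumption, not a claim from the argument).
    saturation = 2025.0 + random.expovariate(math.log(2))
    # Step 5: "a few years" to cross the remaining gaps (assumed 1-4 years).
    gap_years = random.uniform(1.0, 4.0)
    # Step 6: ~1 more year from the 5x AI R&D speedup to a strong AGI definition.
    return saturation + gap_years + 1.0

samples = [sample_agi_year() for _ in range(100_000)]
p_by_end_2028 = sum(y <= 2029.0 for y in samples) / len(samples)
print(f"P(AGI by end of 2028) under these toy assumptions: {p_by_end_2028:.2f}")
```

The point is less the number it prints than that each judgment call (saturation date, gap-crossing speed, takeoff lag) becomes an explicit distribution you can argue about separately.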
One reason I like this argument is that it will get much stronger over time as we get more difficult benchmarks and otherwise get more data about how quickly the gaps are being crossed.
I have a longer draft which makes this argument but it's quite messy and incomplete and doesn't add that much on top of the above summary for now. Unfortunately I'm prioritizing other workstreams over finishing this at the moment. DM me if you'd really like a link to the messy draft.
I've been arguing for 2027-ish AGI for several years now. I do somewhat fall into the annoying category of refusing to give my full details for believing this (publicly). I've had some more in-depth discussions about this privately.
One argument I have been making publicly is that I think Ajeya's Bioanchors report greatly overestimated human brain compute. I think a more careful reading of Joe Carlsmith's report, which hers was based on, supports my own estimate of around 1e15 FLOP/s.
Connor Leahy makes some points I agree with in his recent Future of Life interview. https://futureoflife.org/podcast/connor-leahy-on-why-humanity-risks-extinction-from-agi/
Another very relevant point is that recent research on the human connectome shows that long-range connections (particularly between regions of the cortex) have lower bandwidth than was previously thought. Examining this bandwidth in detail leads me to believe that efficient decentralized training should be possible. Even considering that training a human-brain-equivalent model would require 10,000x parallel brain equivalents to have a reasonable training time, the current levels of internet bandwidth between datacenters worldwide should be more than sufficient.
Thus, my beliefs point strongly towards: "with the right algorithms we will have more than good enough hardware and more than sufficient data. Also, those algorithms are available to be found, and are hinted at by existing neuroscience data." With AI-accelerated R&D on algorithms, we should therefore expect rapid progress on peak capabilities and efficiency which doesn't plateau at human peak capability or human operation speed: super-fast and super-smart AGI within a few months of full AGI, and rapidly increasing speeds of progress leading up to AGI.
If I'm correct, then the period of time from 2026 to 2027 will contain as much progress on generally intelligent systems as all of history leading up to 2026. ASI will thus be possible before 2028.
Only social factors (e.g. massively destructive war or unprecedented international collaboration on enforcing an AI pause) will change these timelines.
Further thoughts here: A path to human autonomy [LW · GW]
An AGI broadly useful for humans needs to be good at general tasks for which currently there is no way of finding legible problem statements (where System 2 reasoning is useful) with verifiable solutions. Currently LLMs are slightly capable at such tasks, and there are two main ways in which they become more capable, scaling and RL.
Scaling is going to continue rapidly showing new results at least until 2026-2027, probably also 2028-2029 [LW · GW]. If there's no AGI or something like a $10 trillion AI company by then, there won't be a trillion dollar training system and the scaling experiments will fall back to the rate of semiconductor improvement.
Then there's RL, which as o3 demonstrates applies to LLMs as a way of making them stronger and not merely eliciting capabilities formed in pretraining. But it only works directly around problem statements with verifiable solutions, and it's unclear how to generate them for more general tasks or how far the capabilities will generalize from the training problems that are possible to construct in bulk. (Arguably self-supervised learning is good at instilling general capabilities because the task of token prediction is very general, it subsumes all sorts of things. But it's not legible.) Here too scale might help with generalization stretching further from the training problems, and with building verifiable problem statements for more general tasks, and we won't know how much it will help until the experiments are done.
So my timelines are concentrated on 2025-2029, after that the rate of change in capabilities goes down. Probably 10 more years of semiconductor and algorithmic progress after that are sufficient to wrap it up though, so 2040 without AGI seems unlikely.
It's worthy of a (long) post, but I'll try to summarize. For what it's worth, I'll die on this hill.
General intelligence = Broad, cross-domain ability and skills.
Narrow intelligence = Domain-specific or task-specific skills.
The first subsumes the second at some capability threshold.
My bare-bones definition of intelligence: prediction. It must be able to consistently predict itself and the environment. To that end it necessarily develops/evolves abilities like learning, environment/self sensing, modeling, memory, salience, planning, heuristics, skills, etc. Roughly what Ilya says about token prediction necessitating good-enough models to actually be able to predict the next token (although we'd really differ on various details).
Firstly, it's based on my practical and theoretical knowledge of AI and on insights I believe I've had into the nature of intelligence and generality over a long time. It also draws on systems theory, cybernetics, physics, etc.; I believe a holistic view best informs AGI timelines. These are supported by many cutting-edge AI/robotics results of the last 5-9 years (some old work can be seen in new light) and especially, obviously, the last 2 or so.
Here are some points/beliefs/convictions behind my thinking that AGI, even for the most creative goalpost-movers, is basically 100% likely before 2030, and very likely much sooner. I also expect a fast takeoff, understood as the idea that beyond a certain capability threshold for self-improvement, AI will develop faster than natural, unaugmented humans can keep up with.
It would be quite a lot of work to make this very formal, so here are some key points put informally:
- Weak generalization has been already achieved. This is something we are piggybacking off of already, and there is meaningful utility since GPT-3 or so. This is an accelerating factor.
- Underlying techniques (transformers, etc.) generalize and scale.
- Generalization and performance across unseen tasks improves with multi-modality.
- Generalist models outdo specialist ones in all sorts of scenarios and cases.
- Synthetic data doesn't necessarily lead to model collapse and can even be better than real world data.
- It looks like intelligence can basically be brute-forced, so one should take Kurzweil *very* seriously (he tightly couples his predictions to increases in computation).
- Timelines shrunk massively across the board for virtually all top AI names/experts in the last 2 years; top experts were surprised by the last 2 years.
- Bitter Lesson 2.0: there are more bitter lessons than Sutton's, e.g. that all sorts of old techniques can be combined for great increases in results. See the evidence in papers linked below.
- "AGI" went from a taboo "bullshit pursuit for crackpots", to a serious target of all major labs, publicly discussed. This means a massive increase in collective effort, talent, thought, etc. No more suppression of cross-pollination of ideas, collaboration, effort, funding, etc.
- The spending on AI only bolsters, extremely so, the previous point. Even if we can't speak of a Manhattan Project analogue, that's pretty much what's going on: insane concentrations of talent hyper-focused on AGI, unprecedented human cycles dedicated to AGI.
- Regular software engineers can achieve better results or utility by orchestrating current models and augmenting them with simple techniques (RAG, etc.). Meaning? Trivial augmentations to current models increase capabilities; this low-hanging fruit implies medium- and high-hanging fruit (which we know is there, see other points).
I'd also like to add that I think intelligence is multi-realizable, and generality will be considered much less remarkable soon after we hit it and realize this than some still think it is.
Anywhere you look: the spending, the cognitive effort, the (very recent) results, the utility, the techniques...it all points to short timelines.
In terms of AI papers, I have 50 references or so I think support the above as well. Here are a few:
SDS: See it. Do it. Sorted. Quadruped Skill Synthesis from Single Video Demonstration, Jeffrey L., Maria S., et al. (2024).
DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning, Zhenyu J., Yuqi X., et al. (2024).
One-Shot Imitation Learning, Duan, Andrychowicz, et al. (2017).
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al. (2017).
Unsupervised Learning of Semantic Representations, Mikolov et al. (2013).
A Survey on Transfer Learning, Pan and Yang (2009).
Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly, Xian et al. (2018).
Learning Transferable Visual Models From Natural Language Supervision, Radford et al. (2021).
Multimodal Machine Learning: A Survey and Taxonomy, Baltrušaitis et al. (2018).
Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine, Harsha N., Yin Tat Lee, et al. (2023).
A Vision-Language-Action Flow Model for General Robot Control, Kevin B., Noah B., et al. (2024).
Open X-Embodiment: Robotic Learning Datasets and RT-X Models, Open X-Embodiment Collaboration, Abby O., et al. (2023).
5 comments
comment by Lukas_Gloor · 2024-12-23T11:50:42.055Z · LW(p) · GW(p)
It surveyed 2,778 AI researchers who had published peer-reviewed research in the prior year in six top AI venues (NeurIPS, ICML, ICLR, AAAI, IJCAI, JMLR); the median time for a 50% chance of AGI was either in 23 or 92 years, depending on how the question was phrased.
Doesn't that discrepancy (how much answers vary between different ways of asking the question) tell you that the median AI researcher who published at these conferences hasn't thought about this question sufficiently and/or sanely?
It seems irresponsible to me to update even a small bit toward the specific reference class of which your above statement is true.
If you take people who follow progress closely and have thought more and longer about AGI as a research target specifically, my sense is that the ones who have longer timeline medians tend to say more like 10-20y rather than 23y+. (At the same time, there's probably a bubble effect in who I follow or talk to, so I can get behind maybe lengthening that range a bit.)
Doing my own reasoning, here are the considerations that I weigh heavily:
- we're within the human range of most skill types already (which is where many of us would have predicted in the past that progress speeds up, and I don't see any evidence of anything that should change our minds on that past prediction – deep learning visibly hitting a wall would have been one conceivable way, but it hasn't happened yet)
- that time for "how long does it take to cross and overshoot the human range at a given skill?" has historically gotten a lot smaller and is maybe even decreasing(?) (e.g., it admittedly took a long time to cross the human expert range in chess, but it took less long in Go, less long at various academic tests or essays, etc., to the point that chess certainly doesn't constitute a typical baseline anymore)
- that progress has been quite fast lately, so that it's not intuitive to me that there's a lot of room left to go (sure, agency and reliability and "get even better at reasoning")
- that we're pushing through compute milestones rather quickly because scaling is still strong with some more room to go, so on priors, the chance that we cross AGI compute thresholds during this scale-up is higher than that we'd cross it once compute increases slow down
- that o3 seems to me like significant progress in reliability, one of the things people thought would be hard to make progress on
Given all that, it seems obvious that we should have quite a lot of probability of getting to AGI in a short time (e.g., 3 years). Placing the 50% forecast feels less obvious because I have some sympathy for the view that says these things are notoriously hard to forecast and we should smear out uncertainty more than we'd intuitively think (that said, lately the trend has been that people consistently underpredict progress, and maybe we should just hard-update on that.) Still, even on that "it's prudent to smear out the uncertainty" view, let's say that implies that the median would be like 10-20 years away. Even then, if we spread out the earlier half of probability mass uniformly over those 10-20 years, with an added probability bump in the near-term because of the compute scaling arguments (we're increasing training and runtime compute now but this will have to slow down eventually if AGI isn't reached in the next 3-6 years or whatever), that IMO very much implies at least 10% for the next 3 years. Which feels practically enormously significant. (And I don't agree with smearing things out too much anyway, so my own probability is closer to 50%.)
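The "smear out the uncertainty" arithmetic above can be written out as a toy calculation. This is only the uniform-spread baseline, before the near-term compute-scaling bump I mention; the function and medians are illustrative:

```python
# If the median is 10-20 years out and the first half of the probability
# mass is spread uniformly up to the median, what lands in the next 3 years?

def p_next_3_years(median_years: float) -> float:
    # 50% of the mass spread uniformly over [0, median_years]
    return 0.5 * min(3.0, median_years) / median_years

for median in (10, 15, 20):
    print(f"median {median}y -> P(AGI within 3y) = {p_next_3_years(median):.1%}")
```

Even the longest median here leaves several percent in the next 3 years before any near-term bump is added, which is the force of the argument.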
↑ comment by Kaj_Sotala · 2024-12-23T15:23:34.236Z · LW(p) · GW(p)
Doesn't that discrepancy (how much answers vary between different ways of asking the question) tell you that the median AI researcher who published at these conferences hasn't thought about this question sufficiently and/or sanely?
We know that AI expertise and AI forecasting are separate skills and that we shouldn't expect AI researchers to be skilled at the latter. So even if researchers have thought sufficiently and sanely about the question of "what kinds of capabilities are we still missing that would be required for AGI", they would still be lacking the additional skill of "how to translate those missing pieces into a timeline estimate".
Suppose that a researcher's conception of current missing pieces is a mental object M, their timeline estimate is a probability function P, and their forecasting expertise F is a function that maps M to P. In this model, F can be pretty crazy, creating vast differences in P depending on how you ask, while M is still solid.
I think the implication is that these kinds of surveys cannot tell us anything very precise such as "is 15 years more likely than 23", but we can use what we know about the nature of F in untrained individuals to try to get a sense of what M might be like. My sense is that answers like "20-93 years" often translate to "I think there are major pieces missing and I have no idea of how to even start approaching them, but if I say something that feels like a long time, maybe someone will figure it out in that time", "0-5 years" means "we have all the major components and only relatively straightforward engineering work is needed for them", and numbers in between correspond to Ms that are, well, somewhere in between those.
↑ comment by Lukas_Gloor · 2024-12-23T16:00:31.450Z · LW(p) · GW(p)
Suppose that a researcher's conception of current missing pieces is a mental object M, their timeline estimate is a probability function P, and their forecasting expertise F is a function that maps M to P. In this model, F can be pretty crazy, creating vast differences in P depending how you ask, while M is still solid.
Good point. This would be reasonable if you think someone can be super bad at F and still great at M.
Still, I think estimating "how big is this gap?" and "how long will it take to cross it?" might be quite related, so I expect the skills to be correlated or even strongly correlated.
comment by xpostah · 2024-12-23T09:55:49.779Z · LW(p) · GW(p)
+1
On lesswrong, everyone and their mother has an opinion on AI timelines. People just stating their views without any arguments doesn't add a lot of value to the conversation. It would be good if there was a single (monthly? quarterly?) thread that collates all the opinions that are stated without proof. And outside of this thread only posts with some argumentation are allowed.
P.P.S. Sorry for the wrong link, it's fixed now