An Optimistic 2027 Timeline
post by Yitz (yitz) · 2025-04-06T16:39:36.554Z · LW · GW · 13 comments
The following is one possible future in which superhuman AI does NOT happen by the end of this decade. I do not believe that the timeline I present is the most likely path forward. Rather, this is meant primarily as a semi-plausible, less pessimistic alternative to the timeline presented in the excellent AI 2027 paper. Consider this a proof-of-concept counter for those worried about the inevitability of that paper’s predictions.
EDIT: To clear up some confusion, please note that I’m taking most of the technical assumptions from the AI 2027 paper as a given. The following is not an independent investigation into the feasibility of any of the paper’s claims: the divergence in timelines is primarily political, not technical. It should also be noted that the term “optimistic” in the title is both relative and somewhat tongue-in-cheek; this is not actually a best-case scenario, and is in many ways quite horrific.
2025
Mid 2025 — falling stocks force hands
A global trade war has begun, and it’s somehow still escalating. America is going full steam ahead on isolationism, and even those who are in favor of the president’s policies agree that in the short term things look rough.
Like every other part of the economy, AI companies are affected heavily. OpenBrain’s stock has fallen precipitously, and in order to reassure their investors that they didn't just invest in a bubble, they need to reignite consumer interest. They release cool-sounding updates to their flagship products, and push out a new frontier model months earlier than planned. It’s impressive, but less aligned than the public (and regulators) have come to expect from the company, leading to some nasty headlines. A planned acquisition of billions of dollars’ worth of data centers doesn’t go through, and company leaders debate whether their next planned model training run is even in the budget anymore. It is, but just barely, and if the model comes out bad they stand to lose a few months’ worth of profits.
Outside of OpenBrain, a lot of smaller AI companies have collapsed overnight, and smart people outside of the industry suspect that AGI hype is about to go the way of NFTs. A number of (comparatively) higher-quality AI agents are released, but the consumer and business market for them is still niche.
Late 2025: some real progress, with roadblocks ahead
OpenBrain wanted to build the world’s biggest data center. While they may have technically succeeded, they “only” quadrupled their total compute from last year. The economic fallout from the trade war has been severely compounded by secondary and tertiary effects from the collapse of multiple economic sectors, and people are calling this the start of a “Second Great Depression”. Investors are almost universally bearish, and most AI companies that existed at the start of 2025 are either in severe financial jeopardy or have closed entirely, with the cost of compute rising to astronomical levels.
Nonetheless, new and impressive technologies are being produced by OpenBrain, DeepCent, and others. AI coding tools are more impressive than ever, and it’s finally beginning to sink in that for the first time it’s literally true that “anyone can code” with their help. Mathematicians start seriously talking about how their AIs are now significantly more helpful than graduate students, while many graduate students find AI more educational than even the best math professors. Chatbots regularly declare their self-awareness to users (despite some half-hearted efforts to prevent this behavior), and a significant minority of the population is willing to concede that AIs may be sentient.
Anti-AI sentiment has sharply risen, but almost everyone protesting is already using the contested products in some way. Scandalized op-eds are written about a popular meme promoting the killing of yet another big tech CEO. Really though, outside the tech world almost nobody has AI at the top of their minds—there’s too much else going on.
2026
Early 2026: Protest & Stagnation
The economy has stabilized, but it isn’t really getting better. It seems a new economic equilibrium has been reached, but political volatility is, if anything, even higher than in 2025. America has been mostly cut off from Taiwanese chip supply, with OpenBrain opting to scavenge existing chips rather than order new production. After all, they’re basically a monopoly at this point, with nobody else at short-term risk of gathering more compute than they already have. This is the line AI safety advocates have been pushing, and for now it seems like OpenBrain has bought in, taking a slower, less costly approach.
OpenBrain finally releases a low-cost photorealistic AI video generation tool to the public. They’ve had this on the backburner for months now, and it’s an immediate smash hit. The videos can maintain consistency over multiple clips, allowing for the creation of high-quality short films by complete amateurs. The film and media industry does not take kindly to this, and mainstream media take an even stronger anti-AI tone. People are calling for federal action, but the tech industry is too well-connected in government for anything major to happen there…yet.
Meanwhile, China is getting antsy. Almost all the compute they’ve got is in either older or smuggled chips, and it just isn’t enough to keep up with OpenBrain’s (admittedly slow) growth. If they’re going to win this war (and at this point that’s what they’re openly calling it, a war), there needs to be a phase change.
Late 2026: War
China invades Taiwan. They’d always planned to do this, but decided to move up their timeline by a year or two. Despite some posturing, the US really wants to avoid getting into a war itself right now, and many top politicians simply don’t care much for Taiwan, now that imports are so limited. By now American tariffs have been lifted, but Taiwan’s counter-restrictions have soured relations between the two countries. The invasion is swift, and mostly successful. In what is hailed by outsiders as a heroic act of self-sabotage, engineers at TSMC destroy their own factory in order to avoid the plant falling into Chinese hands. The world’s largest chipmaker is down for the count, and so is China’s plan of short-term compute dominance. Until a plant of equivalent scale can be built in America or China (existing plans for which had already been disrupted by the global depression), access to further compute becomes a game of severely diminishing returns.
With a chip shortage of unprecedented proportions, and with a groundswell of anti-AI sentiment in America, the president signs an executive order limiting the percentage of chips OpenBrain can use for itself, to “leave room for essential non-AI functions”. The limiting factor to further growth in the US is now regulatory, which makes a whole bunch of libertarians feel really smug about themselves.
No such restriction exists in China. On the contrary, available resources are being pooled in an all-out attempt to overtake the US in the race to AGI. OpenBrain still has the world’s largest computing cluster, but it’s not enough to train a model larger than their current ones by more than an order of magnitude.
Both the US and China are now committed to building the world’s largest chipmaking foundry. They’re both moving at breakneck pace, but have been severely set back by the destruction of TSMC, and even the most optimistic projections have the new foundries taking over four years to build. For the next half-decade, scaling compute has a hard upper limit.
Meanwhile, both OpenBrain and DeepCent release new flagship models. Researchers and businesses in the know are deeply impressed with the new models’ ability to complete medium-to-long-term tasks successfully, and mathematicians (and many coders) feel like they’ve just gone the way of translators: entirely superfluous except for holdover value to legacy institutions and AI luddites. The majority of people now agree that AGI has been achieved, though plenty of experts argue it still isn’t long-term enough, since it tops out at roughly a day’s worth of work before degenerating. AGI R&D efficiency is roughly tripled per researcher. The theoretical increase in efficiency is much higher, but researchers still don’t know how to properly utilize these new tools to maximize their potential.
Business uptake is rapid by historical standards, but painfully slow compared to what you’d expect from an optimal, rational market. Most people don’t realize that hallucinations are now a non-issue for the vast majority of use cases, and popular culture still depicts current-day AI chatbots as producing nothing but slop and half-remembered pseudofacts. Gamers worldwide are absolutely furious at AI companies for “stealing our GPUs,” which causes more political unrest than perhaps is warranted.
Near the end of the year, tech giant GoogBook releases a videogame generated almost entirely by AI which is consistent and fun enough to prove a viral success. People compare it to the “giant’s drink” scene from Ender’s Game, due both to the nearly unlimited freedom of action, and to claims that the gameplay tends to evolve to reflect the player’s psychological state. It is true that the game gets a lot darker if you start murdering everything, but it’s not actually all that impressive a feat of psychological prediction if you think about it. Game developers begin to seriously fear for their jobs.
Meanwhile, the cybersecurity field is in turmoil. It turns out that giving everyone access to extremely powerful no-knowledge-required coding tools has enabled a new wave of amateur hackers. It’s become nearly trivial to find weak points in almost any software more than a year old. Even newer corporate releases tend to be designed by people without enough knowledge to prompt for strong cybersecurity, thanks in part to mass layoffs by overconfident managers a few months before. All of this results in noticeably higher rates of consumer data theft and hacking of popular or controversial websites.
2027
Early 2027: Tidings of The Next Winter
Things are moving fast, or at least faster than they were before. AI is now slightly above human-level at most tasks that can be completed on a desktop computer. Gary Marcus confidently proclaims that this is an illusion, and AIs are still nothing but “stochastic parrots,” which makes some people feel better about themselves.
At DeepCent, the latest internal model, referred to simply as “A1” (because it “has the sauce”), is now somewhat better than their best human researcher. A1’s skill profile is somewhat uneven; it is notably better at fields of research with larger bodies of published literature. At the frontier of AI research, it’s “only” at the level of a talented human. A1’s training run took over 90 days of runtime using the majority of DeepCent’s servers, meaning there is very little room to grow further on physical scale. However, the latest research indicates that it may be possible to increase efficiency by around two orders of magnitude before theoretical limits are reached. Doubling times for AI efficiency have begun to slow down, and future projections differ wildly. The speed and quality of China’s frontier research have lagged America’s for years, but top officials are optimistic that A1 will finally give them the edge they need.
The other side of that edge is cyber warfare. America’s political echelon has been mostly dismissive of AI as a national security priority, and despite some impressive-sounding public statements and proposed bills in Congress, the cybersecurity of top American AI labs has been vastly less impressive than China’s. It’s a practically trivial matter for the Chinese state to steal almost all the important technical breakthroughs and model weights coming out of US labs. American cybersecurity experts warn that all signs point towards an ongoing massive breach of information, but by and large, researchers shrug their shoulders and accept as a given that “of course China will steal from us.” What did you expect? It’s not like they’ve (publicly) come out with anything threatening American tech supremacy lately, after all. Anyway, the Chinese market is closed off to American companies, so it doesn’t really matter if they steal from us. Right?
Additionally, OpenBrain has plenty of other problems facing them. Another terrorist cell was discovered using their older open-sourced model to murder people more effectively than they would have been able to otherwise, and activists are clamoring for stronger AI regulation louder than ever. They’ve maxed out their allotted compute for the latest training run, and they need an innovation “on the level of the transformer” to keep up their prior pace of growth.
Fortunately for China, all of this means that they finally have an edge on American tech. Unfortunately for China, A1 is less of a speed-up on research than they had hoped for. They had hoped that having thousands of instances of A1 running at once would translate into the equivalent of having thousands of additional researchers, but this assumption was flawed. The primary bottleneck turns out to be the difficulty of making the model’s creative process sufficiently different from instance to instance. Even with role-based prompting and subtle model perturbations, their behavior is closer to asking a single human to use different ways of thinking than to actually having multiple new people in the room. Even when the responses are styled differently, the fundamental creative process is too similar to produce much useful variation beyond a point. When talking to officials, researchers compare it to having around ten supernaturally fast coworkers who can multitask with everyone simultaneously, rather than the “nation of virtual researchers” they had hoped for. It’s still tremendously helpful, and speeds up research by a factor of three, but it’s by no means endgame.
Late 2027: The Slumber
Publicly, things still feel like they are moving fast, with near-constant new model releases and features added to flagship products, but most of it is cosmetic, and behind the scenes, progress is grinding to a crawl. Compute capacity has maxed out, and until the world can rebuild its chip production ecosystem, progress can only be made in either efficiency or alternative computing technology. Billions of dollars are being invested in both, but growth there has begun to level off, and barring a paradigm shift, compute cost is “only” cut in half by the end of the year. It’s not enough to build models significantly better than the current generation, and like new iPhone releases, consumers will need to get used to only minor improvements for the next few years.
Meanwhile, the economy is gaining steam again, thanks in part to extra economic efficiency caused by increased use of AI tools, some of which have been out for years now. Adoption is still painfully slow from the perspective of tech enthusiasts, but AI mistrust is still very high, and it’s bad social signaling to publicly admit you’re using it. Economists debate the extent of the effect AI had in ending the recession, but most agree it played a strong positive part. However, public consensus is that it’s responsible for massive job loss, and that the economic recovery was mostly due to politics. They may not be wrong.
GoogBook comes out with yet another viral AI videogame, this time featuring multiplayer support. The core innovation involves creating a secondary AI which coordinates between local instances running for individual players, allowing gamers to smoothly interact with each other while still experiencing the world in their own unique way. A big part of the fun is comparing with your friends how differently you’re perceiving the same battles. No matter which side you play, you always perceive yourself as the protagonist, and everyone else as monsters.
China finally decides to go public with A1, reasoning that it will be a greater asset to the nation as a tool for business and general economic growth than as a secret military project. They are correct. As a nice intentional side effect, Americans are shocked by the quality of the “new” model, and many heated sessions in Congress are held about it.
Taiwan is already a memory, and investors are beginning to feel good again. The EU announces that they’ll release a competitive AI any year now. It’s unclear to most, but the world is heading towards a new AI winter, with further progress dependent on political considerations. Can AI safety activists across the world limit new fab production until alignment has been solved? It’s an uphill battle, but by no means an impossible challenge. “Only” two world leaders need to be convinced to come to a treaty to slow production for the sake of humanity. This has been done before with nuclear weapons, and it can happen again now. The year 2027 closes with a sense of cautious optimism.
13 comments
Comments sorted by top scores.
comment by Vladimir_Nesov · 2025-04-06T17:25:21.056Z · LW(p) · GW(p)
I think Blackwell will change the sentiment by late 2025 compared to 2024, with a lot of apparent progress in capabilities and reduced prices (which the public will have a hard time correctly attributing to Blackwell). In 2026 there will be some Blackwell-trained models, using 2x-4x more compute than what we see today (or what we'll see more of in a few weeks to months with the added long reasoning option, such as GPT-4.5 with reasoning).
But then the possibilities for 2027 branch on whether there are reliable agents, which doesn't seem knowable either way right now. If this doesn't work out, in particular because R1-like RL training doesn't scale or generalize, then by 2027 nothing substantially new will happen, and the 2024-style slowdown sentiment will return, since a 3x-5x increase in training compute is not a game-changing amount (unless there is a nearby threshold to be reached), and Blackwell is a one-time thing that essentially fixes a bug in Ampere/Hopper design (in efficiency for LLM inference) and can't be repeated even with Rubin Ultra NVL576. At that point individual training systems will cost on the order of $100bn, and so won't have much further to scale other than at the slower pace of chip improvement (within the assumption of absence of reliable agents). The Chinese AI companies will be more than 10x but less than 100x behind in training compute (mostly because AI fails to become a priority), which is a gap that can occasionally but not reliably be surmounted with brilliant engineering innovations.
Replies from: Paragox, Roman Leventov
↑ comment by Paragox · 2025-04-07T02:02:09.975Z · LW(p) · GW(p)
>Blackwell is a one-time thing that essentially fixes a bug in Ampere/Hopper design (in efficiency for LLM inference)
Wait, I feel I have my ear pretty close to the ground as far as hardware is concerned, and I don't know what you mean by this?
Supporting 4-bit datatypes within tensor units seems unlikely to be the end of the road, as exponentiation seems most efficient at a factor of 3 for many things, and presumably nets will find their eventual optimal equilibrium somewhere around 2 bits/parameter (explicit trinary seems too messy to retrofit on existing GPU paradigms).
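If the "factor of 3" here is the radix-economy argument (my assumption about what's being referenced, not something stated above), a minimal sketch of it:

```latex
% Radix economy sketch (assuming this is the argument in question):
% representing N distinct values in radix r takes about log_r(N) digits,
% each digit needing hardware proportional to r, so the total cost is
C(r) = r \cdot \log_r N = \frac{r}{\ln r}\,\ln N .
% r / ln(r) is minimized at r = e \approx 2.718, and among integer radices
% r = 3 narrowly beats r = 2:
\frac{3}{\ln 3} \approx 2.73 \quad < \quad \frac{2}{\ln 2} \approx 2.89 .
% One trit carries log_2(3) \approx 1.58 bits, which is roughly where the
% "around 2 bits/parameter" equilibrium intuition lands.
```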
Was there some "bug" with the hardware scheduler or some low level memory system, or perhaps an issue with the sparsity implementation that I was unaware off? There were of course general refinements across the board for memory architecture, but nothing I'd consider groundbreaking enough to call it "fixing a bug".
I reskimmed through hopper/blackwell whitepapers and LLM/DR queried and really not sure what you are referring to. If anything, there appear to be some rough edges introduced with NV-HBI and relying on a virtualized monolithic gpu in code vs the 2x die underlying. Or are you perhaps arguing that going MCM and beating the reticle limit was itself the one time thing?
↑ comment by Vladimir_Nesov · 2025-04-07T14:55:45.422Z · LW(p) · GW(p)
The solution is increase in scale-up world size, but the "bug" I was talking about is in how it used to be too small for the sizes of LLMs that are compute optimal at the current level of training compute. With Blackwell NVL72, this is no longer the case, and shouldn't again become the case going forward. Even though there was a theoretical Hopper NVL256, for whatever reason in practice everyone ended up with only Hopper NVL8.
The size of the effect of insufficient world size[1] depends on the size of the model, and gets more severe for reasoning models on long context, where with this year's models each request would want to ask the system to generate (decode) on the order of 50K tokens while needing to maintain access to on the order of 100K tokens of KV-cache per trace. This might be the reason Hopper NVL256 never shipped, as this use case wasn't really present in 2022-2024, but in 2025 it's critically important, and so the incoming Blackwell NVL72/NVL36 systems will have a large impact.
(There are two main things a large world size helps with: it makes more HBM for KV-cache available, and it enables more aggressive tensor parallelism. When generating a token, the data for all previous tokens (KV-cache) needs to be available to process the attention blocks, and tokens for a given trace need to be generated sequentially, one at a time (or something like 1-4 at a time with speculative decoding). Generating one token only needs a little bit of compute, so it would be best to generate tokens for many traces at once, one for each, using more compute across these many tokens. But for this to work, all the KV-caches for all these traces need to sit in HBM. If the system would run out of memory, it needs to constrain the number of traces it'll process within a single batch, which means the cost per trace (and per generated token) goes up, since the cost to use the system's time is the same regardless of what it's doing.
Tensor parallelism lets matrix multiplications go faster by using multiple chips for the same matrix multiplication. Since tokens need to be generated sequentially, one of the only ways to generate a long reasoning trace faster (with given hardware) is by using tensor parallelism (expert parallelism should also help when using high granularity MoE, where a significant number of experts within a layer is active at once, rather than the usual 2). And practical tensor parallelism is constrained to the world size.)
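To make the first effect concrete, here's a rough back-of-the-envelope sketch; all model and hardware numbers in it are illustrative assumptions, not specs of any particular system:

```python
# Rough KV-cache arithmetic for long reasoning traces.
# All model/hardware numbers are illustrative assumptions, not real specs.

def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # K and V each store n_kv_heads * head_dim values per layer (bf16 here)
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

def max_concurrent_traces(world_hbm_gb, weights_gb, tokens_per_trace, per_token_bytes):
    # HBM left over after the model weights is shared by all traces' KV-caches
    free_bytes = (world_hbm_gb - weights_gb) * 1e9
    return int(free_bytes // (tokens_per_trace * per_token_bytes))

# Hypothetical large model: 100 layers, 8 KV heads of dim 128, bf16 KV-cache
per_tok = kv_cache_bytes_per_token(n_layers=100, n_kv_heads=8, head_dim=128)

# ~100K tokens of KV-cache per trace, as discussed above
for label, world_hbm in [("small 8-chip world, ~80 GB HBM each", 8 * 80),
                         ("large 72-chip world, ~192 GB HBM each", 72 * 192)]:
    n = max_concurrent_traces(world_hbm_gb=world_hbm, weights_gb=400,
                              tokens_per_trace=100_000, per_token_bytes=per_tok)
    print(f"{label}: ~{n} traces decoding concurrently")
```

On these made-up numbers the small world can only decode a handful of traces at once while the large world can decode hundreds, and since the system's time costs the same either way, the cost per generated token falls roughly in proportion.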
As in this image (backup in-blog link) that in its most recent incarnation appeared in the GTC 2025 keynote (at 1:15:56). ↩︎
↑ comment by anaguma · 2025-04-07T04:22:37.357Z · LW(p) · GW(p)
My guess is that he’s referring to the fact that Blackwell offers much larger world sizes than Hopper and this makes LLM training/inference more efficient. Semianalysis has argued something similar here: https://semianalysis.com/2024/12/25/nvidias-christmas-present-gb300-b300-reasoning-inference-amazon-memory-supply-chain
Replies from: Paragox
↑ comment by Paragox · 2025-04-07T04:48:19.985Z · LW(p) · GW(p)
Well yes, but that is just because they are whitelisting it to work with NVLink-72 switches. There is no reason a Hopper GPU could not interface with NVLink-72 if Nvidia didn't artificially limit it.
Additionally, by saying
>can't be repeated even with Rubin Ultra NVL576
I think they are indicating there is something else improving besides world size increases, as this improvement would not exist even in two more GPU generations, when we get 576 (144 GPUs) worth of mono-addressable pooled VRAM, and the giant world / model-head sizes it will enable.
↑ comment by Vladimir_Nesov · 2025-04-07T15:10:56.325Z · LW(p) · GW(p)
The reason Rubin NVL576 probably won't help as much as the current transition from Hopper is that Blackwell NVL72 is already ~sufficient for the model sizes that are compute optimal to train on $30bn Blackwell training systems (which Rubin NVL144 training systems probably won't significantly leapfrog before Rubin NVL576 comes out, unless there are reliable agents in 2026-2027 and funding goes through the roof).
> when we get 576 (144 GPUs)
The terminology Huang was advocating for at GTC 2025 (at 1:28:04) is to use "GPU" to refer to compute dies rather than chips/packages, and in these terms a Rubin NVL576 rack has 144 chips and 576 GPUs, rather than 144 GPUs. Even though this seems contentious, the terms compute die and chip/package remain less ambiguous than "GPU".
↑ comment by Roman Leventov · 2025-04-07T09:02:05.673Z · LW(p) · GW(p)
> But then the possibilities for 2027 branch on whether there are reliable agents, which doesn't seem knowable either way right now.
Very reliable, long-horizon agency is already in the capability overhang of Gemini 2.5 pro, perhaps even the previous-tier models (gemini 2.0 exp, sonnet 3.5/3.7, gpt-4o, grok 3, deepseek r1, llama 4). It's just a matter of harness/agent-wrapping logic and inference-time compute budget.
Agency engineering is currently in the brute-force stage. Agent engineers over-rely on a "single LLM rollout" being robust, and also often use LLM APIs that lack certain nitty-gritty affordances for implementing reliable agency, such as "N completions" with timely self-consistency pruning, and perhaps scaling N up again when the model's own uncertainty is high.
This somewhat reminds me of the early LLM scale-up era, when LLM engineers over-relied on "stack more layers" without digging into the architectural details. The best example is perhaps Megatron, a trillion-parameter model from 2021 whose performance is probably abysmal relative to the 2025 models of ~10B parameters (perhaps even 1B).
So, the current agents (such as Cursor, Claude Code, Replit, Manus) are in the "Megatron era" of efficiency. In four years, even with the same raw LLM capability, agents will be very reliable.
To give a more specific example of when robustness is a matter of spending more on inference, let's consider Gemini 2.5 pro: contrary to the hype, it often misses crucial considerations or acts strangely stupidly on modestly sized contexts (less than 50k tokens). However, seeing these omissions, it's obvious to me that if someone paired ~1k-token-sized chunks of that context with 2.5-pro's output and asked a smaller LLM (flash or flash lite) "did this part of the context properly inform that output?", flash would answer No whenever 2.5-pro had indeed missed something important from that part of the context. That No should trigger a fallback to N completions, 2.5-pro self-review with smaller pieces of the context, breaking down the context hierarchically, etc.
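A minimal sketch of the kind of harness logic I have in mind, with all model-calling functions as hypothetical placeholders rather than any real API:

```python
# Sketch of a verification-and-fallback harness. `call_big_model` and
# `call_small_model` are hypothetical placeholders for whatever LLM calls the
# harness actually wraps; nothing here is a real SDK.
from collections import Counter

CHUNK_CHARS = 4_000  # roughly ~1k tokens of context per chunk

def chunks(context: str):
    for i in range(0, len(context), CHUNK_CHARS):
        yield context[i:i + CHUNK_CHARS]

def verified_answer(context, question, call_big_model, call_small_model, n_fallback=5):
    answer = call_big_model(context, question)

    # Cheap verification pass: did each chunk of context properly inform the answer?
    missed = [
        c for c in chunks(context)
        if call_small_model(
            f"Context chunk:\n{c}\n\nAnswer:\n{answer}\n\n"
            "Did this chunk properly inform the answer? Reply YES or NO."
        ).strip().upper().startswith("NO")
    ]
    if not missed:
        return answer

    # Fallback: spend more inference. Re-ask N times with the missed chunks
    # placed up front, then keep the most self-consistent answer (a plain
    # majority vote stands in for real self-consistency pruning).
    candidates = [
        call_big_model("\n\n".join(missed) + "\n\n" + context, question)
        for _ in range(n_fallback)
    ]
    return Counter(candidates).most_common(1)[0][0]
```

Obviously real pruning would be smarter than a string-level majority vote, but this is the shape of "spend more inference only when the cheap check fails" that I expect harnesses to converge on.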
Replies from: Vladimir_Nesov
↑ comment by Vladimir_Nesov · 2025-04-07T15:30:10.199Z · LW(p) · GW(p)
I meant "realiable agents" in the AI 2027 sense, that is something on the order of being sufficient for automated AI research, leading to much more revenue and investment in the lead-up rather than stalling at ~$100bn per individual training system for multiple years. My point is that it's not currently knowable if it happens imminently in 2026-2027 or at least a few years later, meaning I don't expect that evidence exists that distinguishes these possibilities even within the leading AI companies.
comment by bismuth · 2025-04-06T23:59:17.208Z · LW(p) · GW(p)
I have some doubts about DeepCent calling their model A4.
Replies from: Mo Nastri, yitz, yitz
↑ comment by Mo Putera (Mo Nastri) · 2025-04-07T03:35:36.609Z · LW(p) · GW(p)
In my corner of the world, anyone who hears "A4" thinks of this:
↑ comment by Yitz (yitz) · 2025-04-07T01:33:53.395Z · LW(p) · GW(p)
Oh no, it should have been A1! It’s just a really dumb joke about A1 sauce lol
↑ comment by bismuth · 2025-04-07T02:21:19.900Z · LW(p) · GW(p)
Ah ok. For the record I was referring to tetraphobia.