Forecasting AGI: Insights from Prediction Markets and Metaculus

post by Alvin Ånestrand (alvin-anestrand) · 2025-02-04T13:03:45.927Z · LW · GW · 0 comments

This is a link post for https://forecastingaifutures.substack.com/p/forecasting-agi-insights-from-prediction-markets

I have tried to find all prediction market and Metaculus questions related to AGI timelines. Here I examine how they compare to each other, and what they actually say about when AGI might arrive.

If you know of a market that I have missed, please tell me in the comment section! It would also be helpful if you told me which questions you think are relevant but missing from this analysis. This is a linkpost, and I would prefer that you comment on the original post on my new blog, Forecasting AI Futures, but feel free to comment here as well. Subscribe to the blog for updates on my future forecasting posts related to AI safety.

Whenever possible, please check the more recent probability estimates in the embedded sites, instead of looking at my At The Time Of Writing (ATTOW) numbers.

So, what do prediction markets and Metaculus have to say about AGI?

Metaculus has this question for the arrival date of AGI:

The AI system needs to be able to:

Pass a really hard Turing test.
Have general robotic capabilities (being able to assemble a “circa-2021 Ferrari 312 T4 1:8 scale automobile model” or equivalent).
Achieve “at least 75% accuracy in every task and 90% mean accuracy across all tasks” on the MMLU benchmark, which measures expertise in a wide range of academic subjects.
Achieve at least 90% accuracy with a single attempt for each question on the APPS benchmark, which measures coding skills.
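The MMLU criterion is conjunctive: a high average is not enough if a single subject falls below the floor. Here is a minimal sketch of that check. It assumes that "mean accuracy across all tasks" means an unweighted mean over tasks, which is my reading rather than something the question spells out. The function name and the example scores are hypothetical and invented purely for illustration.

```python
def meets_mmlu_criterion(task_accuracies, per_task_floor=0.75, mean_floor=0.90):
    """Check the two-part MMLU condition on a dict of {task name: accuracy}.

    Assumption: the mean is an unweighted average over tasks.
    """
    if not task_accuracies:
        raise ValueError("need at least one task accuracy")
    scores = list(task_accuracies.values())
    mean_accuracy = sum(scores) / len(scores)
    # Both conditions must hold: every task clears the floor AND the mean clears 90%.
    return min(scores) >= per_task_floor and mean_accuracy >= mean_floor


# Hypothetical scores, not real benchmark results.
strong_mean_one_weak_task = {"task_a": 0.99, "task_b": 0.98, "task_c": 0.74}
all_tasks_clear = {"task_a": 0.99, "task_b": 0.98, "task_c": 0.80}

print(meets_mmlu_criterion(strong_mean_one_weak_task))  # False: task_c is below 75%
print(meets_mmlu_criterion(all_tasks_clear))            # True
```

Under this reading, a model's headline MMLU score alone cannot settle whether the criterion is met; per-subject results are needed as well.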

Metaculus thinks such a system will probably arrive around the middle of 2030, though with high uncertainty. The interval between the lower and upper quartiles of the individual predictions on this question runs from 2026-12-28 to 2039-03-27 ATTOW.

GPT-4o achieves an accuracy of 88.7% on MMLU, as seen in the leaderboard here. GPT-4 was used to get 22% accuracy on APPS. Unfortunately, most of the best models have not been tested on either MMLU or APPS.

OpenAI’s o3 has been reported to achieve 71.7% on SWE-bench Verified. We can compare that to GPT-4, which managed to achieve 22.4% on SWE-bench Verified and 22% accuracy on APPS. Based on this, I think o3 would manage to achieve above 50% accuracy on APPS.

The two criteria that AI currently seems furthest from fulfilling are the robotics capabilities and APPS accuracy, though the current best performance on the APPS benchmark is uncertain. Coding capabilities are improving very fast, as indicated by the rapid improvements in accuracy on SWE-bench Verified, while robotics capabilities are lagging behind. If there are not too many errors in the APPS benchmark dataset, robotic skills are probably the limiting factor.

But even though this is an interesting thing to forecast, the first AI system to fulfill these criteria is not necessarily a true AGI. It could resemble the current SOTA general-purpose AIs, which are getting really good at things like answering graduate-level questions (o3 achieves 87.7% accuracy on GPQA) and coding, but are not yet agentic enough to, for example, replace all remote workers.

On the other hand, an AI system that can perform all purely intellectual tasks that a human can do might be developed before the robotics criteria are fulfilled. Depending on what definition of AGI you prefer, AGI might arrive before the resolution of the question above. These considerations imply that the question only provides a rough proxy for estimating actual AGI arrival time.

Manifold is of course also attempting to forecast AGI. A user called RemNi helpfully posted a question for the probability that AGI arrives before each of the coming years. The resolution criteria are relatively subjective but seem good enough to me: “AGI can theoretically perform any intellectual task that a human being can. It involves the capability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly, and learn from experience.” Here are the links to the markets for the next 8 years, and their respective predictions ATTOW:

We can also follow how the probability has changed over time:

The difference between the Manifold and Metaculus predictions could be due to the different resolution criteria. Even if the Manifold version is a bit more subjective, to me it seems closer to an actual definition of AGI than the Metaculus question. I don’t think robotics capabilities are necessary for an AI system to be transformative, which an AGI would arguably be.

There are also markets for how probable it is that OpenAI announces AGI this year. Polymarket and Kalshi estimate 27% probability and 24% probability respectively for this ATTOW.

Of course, an announcement of AGI is not equivalent to AGI arrival. It seems that OpenAI and Microsoft have made a deal under which Microsoft will lose access to OpenAI technology once AGI is achieved, and they have agreed on a monetary definition of AGI as an “AI system that can generate at least $100 billion in profits”. AGI could plausibly arrive before then, since AGI could be really expensive to run. But the resolution criteria for these markets are that “OpenAI or an official representative of the company announces that it has created an artificial general intelligence (AGI)”, and OpenAI could make such an announcement to the public before the monetary definition is met. It wouldn’t really surprise me if they announced AGI as a marketing strategy to hype their technology, so even if these markets are correctly priced according to their criteria, I think they are likely to overestimate how likely OpenAI is to actually achieve AGI this year. And since OpenAI seems to be at the frontier, they probably overestimate the probability of AGI this year in general.

Conclusion

To summarize some of the most important details:

Manifold estimates a 14% probability of AGI this year, a bit lower than Kalshi’s 24% probability and Polymarket’s 27% probability for OpenAI announcing AGI this year.

Metaculus expects AGI around the middle of 2030 but requires robotics capabilities that some would argue are not necessary for AGI.

Manifold estimates a 47% probability of AGI before 2028. It also implies a 43% probability that AGI arrives after 2025 and before 2029:
57% before 2029 - 14% before 2026 = 43%.
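The subtraction above treats the Manifold numbers as points on a single cumulative distribution, so the probability of an interval is the difference between two cumulative estimates. A minimal sketch of that arithmetic follows, using only the three ATTOW figures quoted in this post. The function name is hypothetical, and the approach assumes the separate per-year markets are mutually consistent, which nothing guarantees.

```python
# Manifold estimates quoted in this post (ATTOW):
# probability that AGI arrives before the start of the given year.
p_before = {2026: 0.14, 2028: 0.47, 2029: 0.57}


def interval_probability(p_before, start_year, end_year):
    """Probability that AGI arrives in [start_year, end_year).

    Assumption: the per-year markets form one coherent cumulative distribution.
    """
    if start_year >= end_year:
        raise ValueError("start_year must be earlier than end_year")
    return p_before[end_year] - p_before[start_year]


# After 2025 and before 2029: 57% - 14%
print(round(interval_probability(p_before, 2026, 2029), 2))  # 0.43
# After 2025 and before 2028: 47% - 14%
print(round(interval_probability(p_before, 2026, 2028), 2))  # 0.33
```

If the markets were priced inconsistently, for example with a later year trading below an earlier one, this difference would come out negative, which would be a sign of mispricing rather than a meaningful probability.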

What should you make of this? First, although several sources predict that AGI is likely to arrive within the next 10 years, there is high uncertainty. The predictions differ between Metaculus and Manifold, as do their respective resolution criteria. Manifold estimates about an even chance of AGI arriving before 2028, while Metaculus estimates an even chance of AGI arriving before mid-2030.

What do I think myself?

My best guess is that, with above 50% probability, AGI arrives between mid-2026 and the end of 2027. This approximately matches Manifold’s predictions, but with a lower probability of AGI this year.

Also, the Manifold predictions can provide an estimate of the cumulative distribution function for AGI arrival. I think they get the shape of the function right, with the probability of AGI rising fast in the beginning and more slowly after 2029. If AGI takes that long, it might mean that it was harder to achieve than expected. Additionally, humanity will have had more time to coordinate around regulation and international treaties, perhaps even pulling off an international pause, which would lengthen AGI timelines.

I will write more about international coordination in a separate post; prediction markets have many interesting things to say about that as well.
