Literature review of TAI timelines

post by Jsevillamol, keith_wynroe, David Atkinson · 2023-01-27T20:07:38.186Z · LW · GW · 6 comments


We summarize and compare several models and forecasts predicting when transformative AI will be developed.



Over the last few years, we have seen many attempts to quantitatively forecast the arrival of transformative and/or general Artificial Intelligence (TAI/AGI) using very different methodologies and assumptions. Keeping track of and assessing these models’ relative strengths can be daunting for a reader unfamiliar with the field. As such, the purpose of this review is to:

  1. Provide a relatively comprehensive source of influential timeline estimates, along with brief overviews of each model's methodology, so readers can make an informed judgment about which seem most compelling to them.
  2. Provide a concise summary of each model's/forecast's distribution over arrival dates.
  3. Provide an aggregation of internal Epoch subjective weights over these models/forecasts. These weightings do not necessarily reflect team members' "all-things-considered" timelines; rather, they are meant to provide a sense of our views on the relative trustworthiness of the models.

For aggregating internal weights, we split the timelines into "model-based" and "judgment-based" timelines. Model-based timelines are given by the output of an explicit model. In contrast, judgment-based timelines are either aggregates of group predictions (e.g., on prediction markets) or the timelines of notable individuals. We decompose the timelines in this way because these two categories roughly correspond to "prior-forming" and "posterior-forming" predictions, respectively.

In both cases, we elicit subjective probabilities from each Epoch team member, reflecting:

  1. how likely they believe a model's assumptions and methodology are to be essentially accurate (for model-based timelines), and
  2. how likely it is that a given forecaster or aggregate of forecasters is well-calibrated on this problem (for judgment-based timelines).

Weights are normalized and linearly aggregated across the team to arrive at a summary probability. These numbers should not be interpreted too literally as exact credences, but rather as a rough approximation of how the team views the "relative trustworthiness" of each model/forecast.
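As an illustration of this aggregation step (a minimal sketch, not Epoch's actual code; the member count, weights, and column assignments below are purely illustrative):

```python
import numpy as np

# Illustrative raw subjective weights; rows are team members,
# columns are models/forecasts (e.g. bioanchors, semi-informative
# priors, Metaculus).
raw_weights = np.array([
    [0.4, 0.3, 0.3],
    [0.5, 0.1, 0.4],
    [0.2, 0.2, 0.6],
])

# Normalize each member's weights so they sum to 1 ...
normalized = raw_weights / raw_weights.sum(axis=1, keepdims=True)

# ... then aggregate linearly (a simple mean across members).
summary_weights = normalized.mean(axis=0)
print(summary_weights)  # -> [0.3667, 0.2, 0.4333] (approximately)
```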



[Table: summary of the timeline models/forecasts and Epoch's aggregated weights over them. Italicized values are interpolated from a gamma distribution fitted to known values. See the appendix for the individual weightings from respondents and the rationale behind their aggregation.]

[Figure: visualization of the different forecasts and their aggregates.]
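As an aside on the interpolation: a minimal sketch, assuming the known values are (year, cumulative probability) pairs, of how fitting a gamma CDF and reading off the missing years could work. The datapoints below are made up, not the table's actual values:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import gamma

# Known (year, cumulative probability) pairs -- purely illustrative.
known_years = np.array([2040.0, 2070.0])
known_probs = np.array([0.25, 0.60])

def gamma_cdf(year, shape, scale):
    """Gamma CDF over 'years from 2023'."""
    return gamma.cdf(year - 2023.0, a=shape, scale=scale)

# Fit the two gamma parameters to the known points.
(shape, scale), _ = curve_fit(gamma_cdf, known_years, known_probs, p0=(2.0, 20.0))

# Interpolate (italicized) probabilities at the years missing from the table.
for year in (2030, 2050, 2100):
    print(year, round(float(gamma_cdf(year, shape, scale)), 3))
```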


Read the rest of the review here


Comments

comment by Lukas Finnveden (Lanrian) · 2023-01-29T06:39:15.975Z · LW(p) · GW(p)

The numbers you use from Holden say that he thinks AGI by 2036 is more than 10% likely. But when fitting the curves, you put that at exactly 10%, which will predictably be an underestimate. It seems better to fit the curves without that number and just check that the result is higher than 10%.

comment by David Atkinson · 2023-01-31T02:35:12.480Z · LW(p) · GW(p)

Thanks very much for catching this. We've updated the extrapolation to only consider the two datapoints that are precisely specified. With so few points, the extrapolation isn't all that trustworthy, so we've also added some language to (hopefully) make that clear.
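For readers following along, a minimal sketch of the corrected procedure: fit only the precisely specified points, then check the fitted curve against the ">10% by 2036" lower bound instead of forcing it to equal 10%. The two exact points used here (50% by 2060, two-thirds by 2100) are illustrative stand-ins, and the gamma CDF matches the fitting approach described in the post:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import gamma

# The two precisely specified points (illustrative stand-ins):
# P(AGI by 2060) = 50%, P(AGI by 2100) = 2/3.
exact_years = np.array([2060.0, 2100.0])
exact_probs = np.array([0.50, 2.0 / 3.0])

def gamma_cdf(year, shape, scale):
    """Gamma CDF over 'years from 2023'."""
    return gamma.cdf(year - 2023.0, a=shape, scale=scale)

# Fit only to the exact points; do NOT force P(2036) to equal 0.10.
(shape, scale), _ = curve_fit(gamma_cdf, exact_years, exact_probs, p0=(1.0, 50.0))

# Sanity-check the fitted curve against the stated lower bound.
p_2036 = float(gamma_cdf(2036.0, shape, scale))
print(f"Fitted P(AGI by 2036) = {p_2036:.3f}")
assert p_2036 > 0.10, "fit is inconsistent with the '>10% by 2036' claim"
```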

comment by konstantin · 2023-02-01T14:56:25.849Z · LW(p) · GW(p)

Great work; it helped me get clarity on which models I find useful and which ones I don't.
The tool on the page doesn't seem to work for me, though; I tried Chrome and Safari.

comment by David Atkinson · 2023-02-01T15:43:29.232Z · LW(p) · GW(p)

Should be fixed now! Thanks for noticing this.

comment by konstantin · 2023-02-01T16:29:29.188Z · LW(p) · GW(p)

Cheers! Works for me.

comment by konstantin · 2023-02-01T16:36:41.852Z · LW(p) · GW(p)

To improve the review, an important addition would be to account for the degree to which the different methods influence one another.
E.g., Holden and Ajeya influence one another heavily through conversations. And Metaculus and Samotsvety already incorporate the other models, most notably the bioanchors framework. Maybe you are already correcting for this in the weighted average?

Also note that Ajeya, for example, uses her own judgment to set the weights for the different models within the bioanchors framework.

Overall, I think there is currently a severe echo chamber effect within most of the forecasts, which leads me to weight fully outside-view approaches, such as the semi-informative priors model, much higher.