When reporting AI timelines, be clear who you're deferring to

post by Sam Clarke · 2022-10-10T14:24:14.504Z · LW · GW · 6 comments

Comments sorted by top scores.

comment by jacob_cannell · 2022-10-10T18:59:37.031Z · LW(p) · GW(p)

I was somewhat surprised by how seriously people here took the BioAnchors model predictions, given that the model is not really Bayesian: it is not a simple model which postdicts the past. If you take any of its main subanchors and apply them to predict the arrival of ANNs that match primate vision, or ANNs that match linguistic cortex, etc., they are clearly wildly miscalibrated and predict compute requirements OOMs higher than were actually required. So I spent a couple of days gathering the data and wrote up a simple model [LW · GW] that does seem to reasonably postdict the past; I now defer to that (completely shameless plug).

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2022-10-11T00:12:21.282Z · LW(p) · GW(p)

I don't think that's fair. BioAnchors is the best model publicly available by many legible metrics; it does postdict the past reasonably well and while I agree that your model postdicts the past better, you didn't really spend much effort arguing for this in your post, so you shouldn't be satisfied with the fact that I agree with your conclusion. Also, I think you are using the word "Bayesian" in an unusual way here, I normally hear the phrase "bayesian model" used to mean something else, something which Bio Anchors definitely counts as.

That said, for those watching along, I do think the compute requirements Ajeya estimates are OOMs too high and have argued as much myself, and I do encourage people to think about Jacob's argument here & go read his post. I'm especially excited to see him elaborate on the (imo plausible) claim that if you apply the Bio Anchors framework to predicting e.g. human-level vision or audio or whatever, it'd be very surprised by recent progress in those areas, and therefore that it overestimates compute requirements. I'd be curious to hear Ajeya's response to that argument.

Replies from: jacob_cannell
comment by jacob_cannell · 2022-10-11T06:34:35.257Z · LW(p) · GW(p)

BioAnchors is the best model publicly available by many legible metrics; it does postdict the past reasonably well and while I agree that your model postdicts the past better, you didn't really spend much effort arguing for this in your post, so you shouldn't be satisfied with the fact that I agree with your conclusion.

I did spend effort arguing for a model that postdicts the past; that's much of the point of my post. So perhaps you mean I didn't spend effort comparing said postdiction ability to that of the BioAnchors model. I sketched a computable technique over the relevant dataset to a sufficient level of detail that the reader hopefully can simulate and predict the general outcome. I could go further and actually evaluate said model on a larger dataset, but it's somewhat time-consuming and the utility of that is mostly constrained by how one evaluates the intelligence or salient equivalent capabilities of various systems.

Also, I think you are using the word "Bayesian" in an unusual way here, I normally hear the phrase "bayesian model" used to mean something else, something which Bio Anchors definitely counts as.

A Bayesian model has a specific form - it is a model that computes a posterior as the product of a likelihood (which computes evidence fit) and a prior, where the prior minimally must weight against bit complexity, à la Occam/Solomonoff. The likelihood component, p(E|H), measures the postdiction fit over the relevant evidence - and is required for any Bayesian model.
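
To make that form concrete, here is a minimal sketch, assuming a toy setup that is not taken from BioAnchors or from my post: each hypothesis gets a description length in bits, the prior is 2^-bits, and the likelihood is the product of p(e|H) over the observations. The function name `posterior`, the hypothesis names, and all the numbers are hypothetical and purely illustrative.

```python
from math import prod


def posterior(hypotheses):
    """Return normalized posteriors for a dict of toy hypotheses.

    Each entry maps a name to (bits, likelihoods), where `bits` is an
    assumed description length and `likelihoods` holds p(e|H) for each
    piece of evidence e.
    """
    # Occam/Solomonoff-style prior: every extra bit halves the prior weight.
    # The likelihood is the product of the per-observation fits p(e|H).
    unnormalized = {
        name: 2.0 ** -bits * prod(likelihoods)
        for name, (bits, likelihoods) in hypotheses.items()
    }
    total = sum(unnormalized.values())
    return {name: weight / total for name, weight in unnormalized.items()}


# Illustrative numbers only; they do not describe any real forecasting model.
toy = {
    "simple_good_fit": (10, [0.5, 0.5, 0.5]),
    "complex_good_fit": (20, [0.5, 0.5, 0.5]),
    "simple_poor_fit": (10, [0.01, 0.01, 0.01]),
}
result = posterior(toy)
for name, p in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.6f}")
```

In this toy case the simple hypothesis that fits the evidence dominates: the complex one is penalized by the prior, and the poorly fitting one by the likelihood. Dropping either factor would lose one of those two penalties.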

So when I say the BioAnchors model isn't attempting to be Bayesian, I believe that is just straightforwardly true in the sense that (from what I recall at least) it doesn't even really attempt to postdict the relevant historical evidence. Now sure, you could argue it's doing that implicitly, but Bayesian models are usually very explicit about their likelihood (postdiction) fit.

I'm especially excited to see him elaborate on the (imo plausible) claim that if you apply the Bio Anchors framework to predicting e.g. human-level vision or audio or whatever, it'd be very surprised by recent progress in those areas, and therefore that it overestimates compute requirements.

Yeah, so I encourage readers to try this on their own ... I may get to it myself and write it up, but I would need to actually study BioAnchors more closely. I didn't even look at it when writing my post; this comparison only (admittedly naturally) came up later in comments.

comment by Quadratic Reciprocity · 2022-12-25T17:52:27.332Z · LW(p) · GW(p)

Did you publish the results from the timelines deference survey / are you intending to at some point in the future? Would be very cool to look at.

Replies from: Sam Clarke, Sam Clarke
comment by Sam Clarke · 2023-01-09T11:10:41.132Z · LW(p) · GW(p)

Sorry for the late reply, will be out this month!