↑ comment by Sammy Martin (SDM) ·
2022-05-16T12:29:07.330Z
Essentially, the problem is that 'evidence that shifts Bio Anchors weightings' is quite different from, more restricted than, and much harder to define than the straightforward 'evidence of impressive capabilities'. However, the reason I think it's worth checking whether new results are updates is that some impressive capabilities might be ones that shift Bio Anchors weightings. But impressiveness by itself tells you very little.
I think a lot of people with very short timelines imagine that the only possible alternative view is 'another AI winter, scaling laws bend, and we don't get excellent human-level performance on short-term language-specified tasks anytime soon', and they don't consider the further question of what exactly human-level performance on, e.g., MMLU would imply.
This is because the alternative to very short timelines from (your weightings on) Bio Anchors isn't another AI winter. Rather, it's that we do get all those short-term capabilities soon, but have to wait a while longer to crack long-term agentic planning, because that doesn't come "for free" from competence on short-term tasks if you're as sample-inefficient as current ML is.
So what we're really looking for isn't systems getting progressively better at short-horizon language tasks. That's something both the lifetime-anchor Bio Anchors view and the original Bio Anchors view predict, and we need something that discriminates between the two.
We have some (indirect) evidence that original Bio Anchors is right. If it were wrong, evolution would have missed an obvious open goal to make bees and mice generally intelligent long-term planners. Moreover, human beings generally aren't vastly better than evolution at designing things anyway, so the lifetime anchor would imply that AGI is a glaring exception to this general trend.
As evidence, this has the advantage of being about something that really happened: human beings are the only human-level general intelligence that exists so far, so we have very good reasons to think matching the human brain is sufficient. However, it has the disadvantage of all the usual disanalogies between evolution and its requirements on the one hand, and human designers and our requirements on the other. Maybe this is just one of those situations where we can outdo evolution; that's not especially unlikely.
What's the evidence on the other side (i.e. against original bio anchors and for the lifetime anchor)?
There are two kinds that I tend to hear. One is that short-horizon competence is enough for dangerous or transformative capabilities. For example, there is the claim that if you can build something that's "human level/superhuman at charisma/persuasion/propaganda/manipulation, at least on short timescales", that represents a gigantic existential risk factor that condemns us to disaster further down the line (the AI point-of-no-return, or PONR, idea). Or there is the claim that by this point, actors with bad incentives will be far too influential, too wealthy, or too far ahead in advancing the SOTA in AI.
However, I'd consider this changing the subject: essentially, it's not an argument for AGI takeover soon; rather, it's an argument for 'certain narrow AIs are far more dangerous than you realize'. That means you have to go all the way back to the start and argue for why such things would be catastrophic in the first place. We can't rely on the simple "it'll be superintelligent and seize a decisive strategic advantage (DSA)" story.
Suppose we get such narrow AIs, which can do most short-term tasks for which there's data but don't consistently generalize to long horizons. Ten years from now, this scenario looks something like this: AI automates away lots of jobs, can do certain kinds of short-term persuasion and manipulation, and can speed up capabilities and alignment research, but cannot fully replace human researchers. Some of these AIs are agentic and possibly also misaligned (in ways that are detectable and fall far short of the ability to take over, since by assumption they aren't competitive with humans at long-term planning). This certainly seems wild and full of potential danger, and slowing down progress could be much harder. It also looks like a scenario with far more attention on AI alignment than today, where the current funders of alignment research are much wealthier than now, and with plenty of obvious examples of the problem to catch people's attention. Overall, it doesn't seem like a scenario where (current AI alignment researchers + whoever else is working on it in 10 years) have considerably less leverage over the future than now: they could easily have more.
The other reason for favouring the lifetime anchor is that you get long-horizon competence for free once you're excellent at (a given list of) short-horizon tasks. This argues, more or less, that for the tasks that matter, current architectures are brainlike in their efficiency, so the lifetime anchor makes more sense. Many of the arguments in favour of this have a structure roughly like: look at a wide-ranging comprehension benchmark like MMLU; when an AI is human level on all of this, it'll be able to keep a train of thought running continuously, keep a working memory, and plan over very long timescales the same way humans do.
As evidence, this has the significant advantage of being relevant and not having to deal with the vagaries of which tradeoffs evolution may have made differently from human engineers. It has the disadvantage of being fiction, or at least of being evidence that hasn't yet been observed. You see AIs getting more and more impressive at a wider range of short-horizon tasks, which is roughly compatible with either view, but you don't observe the described outcome of them generalizing to much longer-horizon tasks.
So, to return to the original question, what would count as (additional) evidence in favour of the lifetime anchor? The answer clearly can't be "nothing", since if we build AGI in 5 years, that counts.
I think the answer is anything that looks like unexpectedly cheap, easy, 'for free' generalization from relatively shorter- to relatively longer-horizon tasks (e.g. from single reasoning steps to many reasoning steps) without much fine-tuning.
This is different from many of the other signs of impressiveness we've seen recently: learning lots of shorter-horizon tasks without much transfer between them, being able to point models successfully at particular short-horizon tasks with good prompting, and getting much better at a wider range of tasks that can only be done over short horizons. All of these are expected on either view.
This unexpected evidence is very tricky to operationalize. Default Bio Anchors assumes we'll see a certain degree of generalization from shorter- to longer-horizon tasks, and that AI will achieve better and better sample efficiency on few-shot tasks, since it assumes that in 20 or so years we'll get enough such generalization to reach AGI. I guess we just need to look for 'more of it than we expected to see'?
That seems very hard to judge, since you can't read off predictions about subhuman capabilities from Bio Anchors like that.
↑ comment by Rohin Shah (rohinmshah) ·
2022-05-16T13:12:51.231Z
Yeah, this all seems right to me.
> when an AI is human level on all of this, it'll be able to keep a train of thought running continuously.
It does not seem to me that "can keep a train of thought running" implies "can take over the world" (or even "is comparable to a human"). I guess the idea is that with a train of thought you can do amplification? I'd be pretty surprised if train-of-thought amplification on today's models (or those of 5 years from now) led to novel, high-quality scientific papers, even in fields that don't require real-world experimentation.
↑ comment by Not Relevant (not-relevant) ·
2022-05-16T13:07:05.316Z
I think this is the best writeup about this I’ve seen, and I agree with the main points, so kudos!
I do think that evidence of increasing returns to scale for multi-step chain-of-thought prompting is another weak data point in favor of the human lifetime anchor.
I also think there are pretty reasonable arguments that NNs may be more efficient than the human brain at converting FLOPs to capabilities, e.g. if SGD is a better version of the best algorithm that can be implemented on biological hardware. Similarly, humans are exposed to a much smaller diversity of data than LMs (the internet is big and weird), so LMs may get more “novelty” per FLOP and thus generalize better from less data. My main point here is just that “biology is optimal” isn’t as strong a rejoinder when we’re comparing against a process so different from what biology did.