Survey on the acceleration risks of our new RFPs to study LLM capabilities

post by Ajeya Cotra (ajeya-cotra) · 2023-11-10T23:59:52.515Z · 1 comment


comment by Akash (akash-wasil) · 2023-11-11T15:01:29.064Z

Thanks for sharing this! I'm curious if you have any takes on Nate's comment or Oliver's comment:

Nate:

I don't think we have any workable plan for reacting to the realization that dangerous capabilities are upon us. I think that when we get there, we'll predictably either (a) optimize against our transparency tools or otherwise walk right off the cliff-edge anyway, or (b) realize that we're in deep trouble, and slow way down and take some other route to the glorious transhumanist future (we might need to go all the way to WBE, or at least dramatically switch optimization paradigms).

Insofar as this is true, I'd much rather see efforts go _now_ into putting hard limits on capabilities in this paradigm, and booting up alternative paradigms (that aren't supposed to be competitive with scaling, but that are hopefully competitive with what individuals can do on home computers). I could see evals playing a role in that policy (of helping people create sane capability limits and measure whether they're being enforced), but that's not how I expect evals to be used on the mainline.

Oliver:

I have a generally more confident take that slowing things down is good, i.e. I don't find arguments that "current humanity is better suited to handle the singularity" very compelling.

I think I am also more confident that it's good for people to openly and straightforwardly talk about existential risk from AI. 

I am less confident in my answer to the question of "is generic interpretability research cost-effective or even net-positive?". My guess is still yes, but I really feel very uncertain, and I feel a bit more robust in my answer to your question than in my answer to that one.