Posts

Scenario planning for AI x-risk 2024-02-10T00:14:11.934Z
AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes 2024-01-24T19:38:33.461Z
AISN #29: Progress on the EU AI Act Plus, the NY Times sues OpenAI for Copyright Infringement, and Congressional Questions about Research Standards in AI Safety 2024-01-04T16:09:31.336Z
AISN #27: Defensive Accelerationism, A Retrospective On The OpenAI Board Saga, And A New AI Bill From Senators Thune And Klobuchar 2023-12-07T15:59:11.622Z
AISN #26: National Institutions for AI Safety, Results From the UK Summit, and New Releases From OpenAI and xAI 2023-11-15T16:07:37.216Z
AISN #24: Kissinger Urges US-China Cooperation on AI, China's New AI Law, US Export Controls, International Institutions, and Open Source AI 2023-10-18T17:06:54.364Z
AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer 2023-08-01T15:39:47.841Z
AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer 2023-07-25T16:58:44.528Z
AISN#15: China and the US take action to regulate AI, results from a tournament forecasting AI risk, updates on xAI’s plan, and Meta releases its open-source and commercially available Llama 2 2023-07-19T13:01:00.939Z

Comments

Comment by Corin Katzke (corin-katzke) on Scenario planning for AI x-risk · 2024-02-13T17:50:36.502Z · LW · GW

Thank you for reading and responding to it! For what it's worth, some of these ideas got rolling during your "AI safety under uncertainty" workshop at EAG Boston.

Comment by Corin Katzke (corin-katzke) on Scenario planning for AI x-risk · 2024-02-13T17:47:39.052Z · LW · GW

Yep, another good point, and in principle I agree. A couple of caveats, though:

First, it's not clear to me that experts would agree on enough dynamics to make these clusters predictively reliable. There might be agreement on the dynamics between scaling laws and timelines (and that's a nice insight!), but the Killian et al. paper considered 14 variables, which (for example) would be 91 pairwise dynamics to agree on. I'd at least like some data on whether conditional forecasts converge. I think FRI is doing some work on that.
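
For what it's worth, the 91 figure is just the number of unordered pairs among 14 variables (14 choose 2). A quick sanity check, where the function name is purely illustrative and the only input taken from the paper is the variable count:

```python
from math import comb


def pairwise_count(n_variables: int) -> int:
    """Number of unordered pairs among n_variables items: n * (n - 1) / 2."""
    return comb(n_variables, 2)


# 14 variables, as in the Killian et al. paper mentioned above
print(pairwise_count(14))  # 91
```

The count grows quadratically, so each added variable adds as many new pairwise dynamics as there were variables before it.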

Second, the Grace et al. paper suggested that expert forecasts exhibited framing effects. So, even if experts did agree on underlying dynamics, it might not be possible to elicit those agreements reliably. But maybe conditional forecasts are less susceptible to framing effects.

Comment by Corin Katzke (corin-katzke) on Scenario planning for AI x-risk · 2024-02-13T17:30:22.681Z · LW · GW

Thanks for the clarification! I didn't mean to imply that Anthropic hasn't been thinking about the full spectrum of risk — only that "misuse" and "autonomy and replication" are the two categories of catastrophic risk explicitly listed in the RSP.

If I do think of a good way to evaluate accident risks before deployment, I'll definitely let you know. (I might actually pitch my team to work on this.)

Comment by Corin Katzke (corin-katzke) on Scenario planning for AI x-risk · 2024-02-13T17:22:47.938Z · LW · GW

Yep, fair enough. I agree that an MTBF of millions of years is an alternative sustainable theory of victory. 

Could you expand on "the challenge is almost entirely in getting to an acceptably low rate"? It's not clear to me that that's true. For example, it seems plausible that at some point nuclear risk was at an acceptably low rate (maybe post-fall of the USSR? I'm neither an expert nor old enough to remember) conditional on a further downward trend, but we didn't get a further downward trend.

Comment by Corin Katzke (corin-katzke) on Responsible Scaling Policies Are Risk Management Done Wrong · 2023-12-03T00:19:08.613Z · LW · GW

> It’s called “responsible scaling”. In its own name, it conveys the idea that not further scaling those systems as a risk mitigation measure is not an option.

That seems like an uncharitable reading of "responsible scaling." Strictly speaking, the only thing that name implies is that it is possible to scale responsibly. It could be more charitably interpreted as "we will only scale when it is responsible to do so." Regardless of whether Anthropic is getting the criteria for "responsibility" right, it does seem like their RSP leaves open the possibility of not scaling.