Posts

AISN #44: The Trump Circle on AI Safety Plus, Chinese researchers used Llama to create a military tool for the PLA, a Google AI system discovered a zero-day cybersecurity vulnerability, and Complex Systems 2024-11-19T16:36:40.501Z
AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels 2024-10-28T16:03:39.258Z
AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary 2024-10-01T20:35:32.399Z
AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics 2024-09-11T19:14:08.274Z
Soft Nationalization: how the USG will control AI labs 2024-08-27T15:11:14.601Z
AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety? 2024-08-21T18:09:33.284Z
AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy Plus, Safety Engineering 2024-07-29T17:50:52.454Z
AISN #38: Supreme Court Decision Could Limit Federal Ability to Regulate AI Plus, “Circuit Breakers” for AI systems, and updates on China’s AI industry 2024-07-09T19:28:29.338Z
AI governance needs a theory of victory 2024-06-21T16:15:46.560Z
AI Safety Newsletter #37: US Launches Antitrust Investigations Plus, recent criticisms of OpenAI and Anthropic, and a summary of Situational Awareness 2024-06-18T18:07:45.904Z
AISN #36: Voluntary Commitments are Insufficient Plus, a Senate AI Policy Roadmap, and Chapter 1: An Overview of Catastrophic Risks 2024-06-05T17:45:25.261Z
AISN #35: Lobbying on AI Regulation Plus, New Models from OpenAI and Google, and Legal Regimes for Training on Copyrighted Data 2024-05-16T14:29:21.683Z
AI Clarity: An Initial Research Agenda 2024-05-03T13:54:22.894Z
AISN #34: New Military AI Systems Plus, AI Labs Fail to Uphold Voluntary Commitments to UK AI Safety Institute, and New AI Policy Proposals in the US Senate 2024-05-02T16:12:47.783Z
AISN #33: Reassessing AI and Biorisk Plus, Consolidation in the Corporate AI Landscape, and National Investments in AI 2024-04-12T16:10:57.837Z
Investigating the role of agency in AI x-risk 2024-04-08T15:12:50.791Z
Thousands of malicious actors on the future of AI misuse 2024-04-01T10:08:42.357Z
AISN #32: Measuring and Reducing Hazardous Knowledge in LLMs Plus, Forecasting the Future with LLMs, and Regulatory Markets 2024-03-07T16:39:56.027Z
Scenario planning for AI x-risk 2024-02-10T00:14:11.934Z
AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes 2024-01-24T19:38:33.461Z
AISN #29: Progress on the EU AI Act Plus, the NY Times sues OpenAI for Copyright Infringement, and Congressional Questions about Research Standards in AI Safety 2024-01-04T16:09:31.336Z
AISN #27: Defensive Accelerationism, A Retrospective On The OpenAI Board Saga, And A New AI Bill From Senators Thune And Klobuchar 2023-12-07T15:59:11.622Z
AISN #26: National Institutions for AI Safety, Results From the UK Summit, and New Releases From OpenAI and xAI 2023-11-15T16:07:37.216Z
AISN #24: Kissinger Urges US-China Cooperation on AI, China's New AI Law, US Export Controls, International Institutions, and Open Source AI 2023-10-18T17:06:54.364Z
AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer 2023-08-01T15:39:47.841Z
AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer 2023-07-25T16:58:44.528Z
AISN#15: China and the US take action to regulate AI, results from a tournament forecasting AI risk, updates on xAI’s plan, and Meta releases its open-source and commercially available Llama 2 2023-07-19T13:01:00.939Z

Comments

Comment by Corin Katzke (corin-katzke) on Scenario planning for AI x-risk · 2024-02-13T17:50:36.502Z · LW · GW

Thank you for reading and responding to it! For what it's worth, some of these ideas got rolling during your "AI safety under uncertainty" workshop at EAG Boston.

Comment by Corin Katzke (corin-katzke) on Scenario planning for AI x-risk · 2024-02-13T17:47:39.052Z · LW · GW

Yep, another good point, and in principle I agree. A couple of caveats, though:

First, it's not clear to me that experts would agree on enough dynamics to make these clusters predicatively reliable. There might be agreement on the dynamics between scaling laws and timelines (and that's a nice insight!) — but the Killian et al. paper considered 14 variables, which (for example) would be 91 pairwise dynamics to agree on. I'd at least like some data on whether conditional forecasts converge. I think FRI is doing some work on that.

Second, the Grace et al. paper suggested that expert forecasts exhibited framing effects. So, even if experts did agree on underlying dynamics, those agreements might not be able to be reliably elicited. But maybe conditional forecasts are less susceptible to framing effects.

Comment by Corin Katzke (corin-katzke) on Scenario planning for AI x-risk · 2024-02-13T17:30:22.681Z · LW · GW

Thanks for the clarification! I didn't mean to imply that Anthropic hasn't been thinking about the full spectrum of risk — only that "misuse" and "autonomy and replication" are the two categories of catastrophic risk explicitly listed in the RSP.

If I do think of a good way to evaluate accident risks before deployment, I'll definitely let you know. (I might actually pitch my team to work on this.)

Comment by Corin Katzke (corin-katzke) on Scenario planning for AI x-risk · 2024-02-13T17:22:47.938Z · LW · GW

Yep, fair enough. I agree that an MTBF of millions of years is an alternative sustainable theory of victory. 

Could you expand on "the challenge is almost entirely in getting to an acceptably low rate"? It's not clear to me that that's true. For example, it seems plausible that at some point nuclear risk was at an acceptably low rate (maybe post-fall of the USSR? I'm niether an expert nor old enough to remember) conditional on a further downward trend — but we didn't get a further downward trend.

Comment by Corin Katzke (corin-katzke) on Responsible Scaling Policies Are Risk Management Done Wrong · 2023-12-03T00:19:08.613Z · LW · GW

It’s called “responsible scaling”. In its own name, it conveys the idea that not further scaling those systems as a risk mitigation measure is not an option.

That seems like an uncharitable reading of "responsible scaling." Strictly speaking, the only thing that name implies is that it is possible to scale responsibly. It could be more charitably interpreted as "we will only scale when it is responsible to do so." Regardless of whether Anthropic is getting the criteria for "responsibility" right, it does seem like their RSP leaves open the possibility of not scaling.