corin-katzke

Posts

AISN #51: AI Frontiers 2025-04-15T16:01:56.701Z

AISN #50: AI Action Plan Responses 2025-03-31T20:13:31.533Z

AISN #49: Superintelligence Strategy 2025-03-06T17:46:50.965Z

AISN #48: Utility Engineering and EnigmaEval 2025-02-18T19:15:16.751Z

AISN #47: Reasoning Models 2025-02-06T18:52:29.843Z

AISN #46: The Transition 2025-01-23T18:09:36.858Z

The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating 2025-01-21T16:57:00.998Z

AISN #45: Center for AI Safety 2024 Year in Review 2024-12-19T18:15:56.416Z

Analysis of Global AI Governance Strategies 2024-12-04T10:45:25.311Z

AISN #44: The Trump Circle on AI Safety Plus, Chinese researchers used Llama to create a military tool for the PLA, a Google AI system discovered a zero-day cybersecurity vulnerability, and Complex Systems 2024-11-19T16:36:40.501Z

AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels 2024-10-28T16:03:39.258Z

AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary 2024-10-01T20:35:32.399Z

AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics 2024-09-11T19:14:08.274Z

Soft Nationalization: how the USG will control AI labs 2024-08-27T15:11:14.601Z

AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety? 2024-08-21T18:09:33.284Z

AI Safety Newsletter #39: Implications of a Trump Administration for AI Policy Plus, Safety Engineering 2024-07-29T17:50:52.454Z

AISN #38: Supreme Court Decision Could Limit Federal Ability to Regulate AI Plus, “Circuit Breakers” for AI systems, and updates on China’s AI industry 2024-07-09T19:28:29.338Z

AI governance needs a theory of victory 2024-06-21T16:15:46.560Z

AI Safety Newsletter #37: US Launches Antitrust Investigations Plus, recent criticisms of OpenAI and Anthropic, and a summary of Situational Awareness 2024-06-18T18:07:45.904Z

AISN #36: Voluntary Commitments are Insufficient Plus, a Senate AI Policy Roadmap, and Chapter 1: An Overview of Catastrophic Risks 2024-06-05T17:45:25.261Z

AISN #35: Lobbying on AI Regulation Plus, New Models from OpenAI and Google, and Legal Regimes for Training on Copyrighted Data 2024-05-16T14:29:21.683Z

AI Clarity: An Initial Research Agenda 2024-05-03T13:54:22.894Z

AISN #34: New Military AI Systems Plus, AI Labs Fail to Uphold Voluntary Commitments to UK AI Safety Institute, and New AI Policy Proposals in the US Senate 2024-05-02T16:12:47.783Z

AISN #33: Reassessing AI and Biorisk Plus, Consolidation in the Corporate AI Landscape, and National Investments in AI 2024-04-12T16:10:57.837Z

Investigating the role of agency in AI x-risk 2024-04-08T15:12:50.791Z

Thousands of malicious actors on the future of AI misuse 2024-04-01T10:08:42.357Z

AISN #32: Measuring and Reducing Hazardous Knowledge in LLMs Plus, Forecasting the Future with LLMs, and Regulatory Markets 2024-03-07T16:39:56.027Z

Scenario planning for AI x-risk 2024-02-10T00:14:11.934Z

AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes 2024-01-24T19:38:33.461Z

AISN #29: Progress on the EU AI Act Plus, the NY Times sues OpenAI for Copyright Infringement, and Congressional Questions about Research Standards in AI Safety 2024-01-04T16:09:31.336Z

AISN #27: Defensive Accelerationism, A Retrospective On The OpenAI Board Saga, And A New AI Bill From Senators Thune And Klobuchar 2023-12-07T15:59:11.622Z

AISN #26: National Institutions for AI Safety, Results From the UK Summit, and New Releases From OpenAI and xAI 2023-11-15T16:07:37.216Z

AISN #24: Kissinger Urges US-China Cooperation on AI, China's New AI Law, US Export Controls, International Institutions, and Open Source AI 2023-10-18T17:06:54.364Z

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer 2023-08-01T15:39:47.841Z

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer 2023-07-25T16:58:44.528Z

AISN#15: China and the US take action to regulate AI, results from a tournament forecasting AI risk, updates on xAI’s plan, and Meta releases its open-source and commercially available Llama 2 2023-07-19T13:01:00.939Z

Comments

Comment by Corin Katzke (corin-katzke) on The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating · 2025-01-21T17:34:36.144Z · LW · GW

our

Note: coauthored by Gideon Futerman.

Comment by Corin Katzke (corin-katzke) on Scenario planning for AI x-risk · 2024-02-13T17:50:36.502Z · LW · GW

Thank you for reading and responding to it! For what it's worth, some of these ideas got rolling during your "AI safety under uncertainty" workshop at EAG Boston.

Comment by Corin Katzke (corin-katzke) on Scenario planning for AI x-risk · 2024-02-13T17:47:39.052Z · LW · GW

Yep, another good point, and in principle I agree. A couple of caveats, though:

First, it's not clear to me that experts would agree on enough dynamics to make these clusters predictively reliable. There might be agreement on the dynamics between scaling laws and timelines (and that's a nice insight!), but the Killian et al. paper considered 14 variables, which (for example) would be 91 pairwise dynamics to agree on. I'd at least like some data on whether conditional forecasts converge. I think FRI is doing some work on that.

Second, the Grace et al. paper suggested that expert forecasts exhibited framing effects. So, even if experts did agree on underlying dynamics, it might not be possible to elicit those agreements reliably. But maybe conditional forecasts are less susceptible to framing effects.
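
For reference, the figure of 91 in the first caveat is the number of unordered pairs among 14 variables, i.e. "14 choose 2". A minimal sketch of that arithmetic follows; the function name `pairwise_count` is purely illustrative and does not come from the Killian et al. paper.

```python
import math


def pairwise_count(n_variables: int) -> int:
    """Number of unordered pairs among n_variables items: n * (n - 1) / 2."""
    return math.comb(n_variables, 2)


# 14 variables give 14 * 13 / 2 pairwise dynamics.
print(pairwise_count(14))
```

The count grows quadratically, so each additional variable adds as many new pairs as there were variables before it.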

Comment by Corin Katzke (corin-katzke) on Scenario planning for AI x-risk · 2024-02-13T17:30:22.681Z · LW · GW

Thanks for the clarification! I didn't mean to imply that Anthropic hasn't been thinking about the full spectrum of risk — only that "misuse" and "autonomy and replication" are the two categories of catastrophic risk explicitly listed in the RSP.

If I do think of a good way to evaluate accident risks before deployment, I'll definitely let you know. (I might actually pitch my team to work on this.)

Comment by Corin Katzke (corin-katzke) on Scenario planning for AI x-risk · 2024-02-13T17:22:47.938Z · LW · GW

Yep, fair enough. I agree that an MTBF of millions of years is an alternative sustainable theory of victory.

Could you expand on "the challenge is almost entirely in getting to an acceptably low rate"? It's not clear to me that that's true. For example, it seems plausible that at some point nuclear risk was at an acceptably low rate (maybe post-fall of the USSR? I'm neither an expert nor old enough to remember) conditional on a further downward trend, but we didn't get a further downward trend.

Comment by Corin Katzke (corin-katzke) on Responsible Scaling Policies Are Risk Management Done Wrong · 2023-12-03T00:19:08.613Z · LW · GW

It’s called “responsible scaling”. In its own name, it conveys the idea that not further scaling those systems as a risk mitigation measure is not an option.

That seems like an uncharitable reading of "responsible scaling." Strictly speaking, the only thing that name implies is that it is possible to scale responsibly. It could be more charitably interpreted as "we will only scale when it is responsible to do so." Regardless of whether Anthropic is getting the criteria for "responsibility" right, it does seem like their RSP leaves open the possibility of not scaling.