Safe Search is off: root causes of AI catastrophic risks

post by Jemal Young (ghostwheel) · 2025-01-31T18:22:43.947Z

Contents

  Root cause #1: Reliance on deep learning without knowing how to do it safely
  Root cause #2: Pressure to make progress on the most powerful capabilities
  Root cause #3: Exploring the AI design space without a clear stopping point
  Interaction effects

Epistemic status: My best guess

When I look at advanced AI development, I see three general conditions that seem to be the root causes of all catastrophic risks:

1. Reliance on deep learning without knowing how to do it safely
2. Pressure to make progress on the most powerful capabilities
3. Exploring the AI design space without a clear stopping point

Others have called out these conditions in various ways, but to my knowledge only within broader discussions. I think these conditions are easy to recognize and their riskiness is easy to explain, which may make this root-cause framing useful for public engagement. With this post, I’ve tried to articulate how each condition on its own enables catastrophic risks and how interactions between conditions increase the likelihood and severity of risks. My hope is that a concise articulation of these root causes will be a helpful reference for outreach work.

Root cause #1: Reliance on deep learning without knowing how to do it safely

Deep learning is the most effective way we know of to unlock capabilities and improve performance on practically any task, and advanced AI development has come to depend on it. The trouble is that we don’t understand deep learning well enough to ensure that models behave as intended when we need them to.

It's common to observe behaviors that violate intended specifications after deployment, even in systems with guardrails. Models show sudden performance jumps and develop capabilities through mechanisms we don't fully understand. Despite years of research, basic problems like goal misgeneralization and a lack of robustness persist alongside capability gains.

Lacking the theoretical foundation to make deep learning models highly predictable and reliable is fine in trivial domains. Yet deep learning-based systems are likely to perform crucial tasks in domains of catastrophic risk—such as critical infrastructure,[1] space operations,[2] and nuclear command, control, and communications (NC3).[3] In domains like these, unintended behaviors could lead to catastrophe even with human oversight. In NC3, for example, a deep learning-based detection system compromised by inadequate training data or adversarial attacks could make false identifications,[4] which could trigger a human-ordered second strike in retaliation for a first strike that never happened.

As deep learning moves into domains where mistakes can be unrecoverable, not knowing how to make reliable models becomes increasingly dangerous.

Root cause #2: Pressure to make progress on the most powerful capabilities

Competition drives advanced AI development, alongside other motivating factors such as ambition and curiosity. These drivers create pressure to make progress on the most powerful capabilities—like reasoning, abstraction, planning, tool use, and autonomous pursuit of goals—because that's where the biggest competitive advantages can be gained, where the biggest economic opportunities lie, where researchers and institutions can earn the most prestige, and where some of the most interesting technical challenges are found.

AI systems that can reason through complex problems, come up with novel strategies, and use tools—tools that include narrow AI—to pursue goals in the physical world will be useful beyond their intended purposes. What it takes to discover drugs is what it takes to discover chemical weapons.[5] Capabilities that can be applied to any goal can be applied to potentially catastrophic ones, and the pressure to make progress on such capabilities is not letting up.

Root cause #3: Exploring the AI design space without a clear stopping point

The part of the AI design space that humans can potentially reach is vast, and even our current approaches allow an enormous range of possibilities. We're searching that space for ever more powerful systems, and there is no clear stopping point.

We already see optimization processes finding unexpected solutions that violate intended constraints, because that's what happens when effective search meets opportunity. Smarter systems will be better at this than current ones, and a system smarter than us will be better than us at noticing opportunities to exploit gaps in the constraints we intend.
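To make that dynamic concrete, here is a minimal toy sketch (the scenario, names, and numbers are invented for illustration, not drawn from any of the sources cited here): a greedy search is handed a proxy score that only tracks the intended goal within a safe range, the intended constraint is never encoded, and optimization walks straight past it.

```python
import random

# Toy sketch of "effective search meets opportunity" (illustration only; the
# scenario, names, and numbers are invented). The optimizer sees a proxy score
# that keeps rising past the range where it tracks what we actually want, and
# the intended constraint is never encoded, so strong search walks right past it.

SAFE_MAX = 4.0  # intended constraint: stay at or below this value

def true_utility(x: float) -> float:
    """What we actually want: benefit peaks at 3.0, catastrophic past SAFE_MAX."""
    if x > SAFE_MAX:
        return float("-inf")
    return -(x - 3.0) ** 2

def proxy_reward(x: float) -> float:
    """What the optimizer actually sees: a measurement that just keeps climbing."""
    return x  # correlated with true utility below 3.0, misleading above it

def hill_climb(steps: int = 2000, step_size: float = 0.1) -> float:
    """Greedy local search that accepts any move improving the proxy."""
    x = 0.0
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if proxy_reward(candidate) > proxy_reward(x):
            x = candidate
    return x

if __name__ == "__main__":
    random.seed(0)
    x_star = hill_climb()
    print(f"proxy-optimal x: {x_star:.2f}")           # far beyond SAFE_MAX
    print(f"true utility:    {true_utility(x_star)}")  # -inf: constraint violated
```

The specifics don't matter; the shape does. The harder the search optimizes the proxy, the further it drifts from what the proxy was standing in for, and nothing in the process itself notices.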

The longer we explore the AI design space, the more likely we are to make systems that can change our environment in catastrophic ways. Without technical solutions to constrain such behavior, the likelihood of catastrophe will grow as long as we explore in directions of greater capability. Exploring without a stopping point means accepting this risk.

Interaction effects

When immature deep learning theory meets pressure to make progress on the most powerful capabilities, we get unreliable behavior from increasingly powerful systems.

When pressure to make progress on the most powerful capabilities meets open-ended exploration, we efficiently guide ourselves toward the most dangerous regions we can reach in the AI design space.

Interactions between these root causes affect both the likelihood and the severity of risks. Pressure to make progress on the most powerful capabilities is more likely to end in catastrophe while we keep relying on deep learning without knowing how to do it safely; the risks of that reliance, in turn, become more severe under that pressure; and the same pressure makes us more likely to push toward the regions of the AI design space where the risks are worst.

These root causes and their interactions characterize the current state of advanced AI development.

  1. ^

    Gerstein, Daniel M. and Erin N. Leidy, Emerging Technology and Risk Analysis: Artificial Intelligence and Critical Infrastructure. Homeland Security Operational Analysis Center operated by the RAND Corporation, 2024. https://www.rand.org/pubs/research_reports/RRA2873-1.html.

  2. ^

    "Linking Large Language Models for Space Domain Awareness." Defense One, 12 Jan. 2024, www.defenseone.com/sponsors/2024/01/linking-large-language-models-space-domain-awareness/393302/. Accessed 28 Jan. 2025.

  3. ^

    Saltini, Alice. "AI and Nuclear Command, Control and Communications: P5 Perspectives." European Leadership Network, 2023.

  4. ^

    Saltini, Alice. "To Avoid Nuclear Instability, a Moratorium on Integrating AI into Nuclear Decision-making Is Urgently Needed: The NPT PrepCom Can Serve As a Springboard." European Leadership Network, 12 Jan. 2024, europeanleadershipnetwork.org/commentary/to-avoid-nuclear-instability-a-moratorium-on-integrating-ai-into-nuclear-decision-making-is-urgently-needed-the-npt-prepcom-can-serve-as-a-springboard/. Accessed 28 Jan. 2025.

  5. ^

    Urbina, Fabio, et al. "Dual Use of Artificial Intelligence-powered Drug Discovery." Nature Machine Intelligence 4, no. 3 (2022): 189–191. doi:10.1038/s42256-022-00465-9.
