On the Rationality of Deterring ASI

post by Dan H (dan-hendrycks) · 2025-03-05T16:11:37.855Z · LW · GW · 9 comments

This is a link post for https://www.nationalsecurity.ai/

Contents

  Executive Summary
    Deterrence
    Nonproliferation
    Competitiveness
  Additional Commentary
      Emphasize terrorist-proof security over superpower-proof security.
      Robust compute security is plausible and incentive-compatible.
      “Superweapons” as a motivating concern for state competition in AI.
      Against An “AI Manhattan Project”

I’m releasing a new paper, “Superintelligence Strategy,” alongside Eric Schmidt (formerly Google) and Alexandr Wang (Scale AI). Below is the executive summary, followed by additional commentary highlighting portions of the paper that may be most relevant to readers here.

Executive Summary

Rapid advances in AI are poised to reshape nearly every aspect of society. Governments see in these dual-use AI systems a means to military dominance, stoking a bitter race to maximize AI capabilities. Voluntary industry pauses or attempts to exclude government involvement cannot change this reality. These systems that can streamline research and bolster economic output can also be turned to destructive ends, enabling rogue actors to engineer bioweapons and hack critical infrastructure. “Superintelligent” AI surpassing humans in nearly every domain would amount to the most precarious technological development since the nuclear bomb. Given the stakes, superintelligence is inescapably a matter of national security, and an effective superintelligence strategy should draw from a long history of national security policy.

Deterrence

A race for AI-enabled dominance endangers all states. If, in a hurried bid for superiority, one state inadvertently loses control of its AI, it jeopardizes the security of all states. Alternatively, if the same state succeeds in producing and controlling a highly capable AI, it likewise poses a direct threat to the survival of its peers. In either event, states seeking to secure their own survival may preventively sabotage competing AI projects. A state could try to disrupt such an AI project with interventions ranging from covert operations that degrade training runs to physical damage that disables AI infrastructure. Thus, we are already approaching a dynamic similar to nuclear Mutual Assured Destruction (MAD), in which no power dares attempt an outright grab for strategic monopoly, as any such effort would invite a debilitating response. This strategic condition, which we refer to as Mutual Assured AI Malfunction (MAIM), represents a potentially stable deterrence regime, but maintaining it could require care. We outline measures to maintain the conditions for MAIM, including clearly communicated escalation ladders, placement of AI infrastructure far from population centers, transparency into datacenters, and more.

Nonproliferation

While deterrence through MAIM constrains the intent of superpowers, all nations have an interest in limiting the AI capabilities of terrorists. Drawing on nonproliferation precedents for weapons of mass destruction (WMDs), we outline three levers for achieving this. Mirroring measures to restrict key inputs to WMDs such as fissile material and chemical weapons precursors, compute security involves knowing reliably where high-end AI chips are and stemming smuggling to rogue actors. Monitoring shipments, tracking chip inventories, and employing security features like geolocation can help states account for them. States must prioritize information security to protect the model weights underlying the most advanced AI systems from falling into the hands of rogue actors, similar to controls on other sensitive information. Finally, akin to screening protocols for DNA synthesis services to detect and refuse orders for known pathogens, AI companies can be incentivized to implement technical AI security measures that detect and prevent malicious use.

Competitiveness

Beyond securing their survival, states will have an interest in harnessing AI to bolster their competitiveness, as successful AI adoption will be a determining factor in national strength. Adopting AI-enabled weapons and carefully integrating AI into command and control is increasingly essential for military strength. Recognizing that economic security is crucial for national security, domestic capacity for manufacturing high-end AI chips will ensure a resilient supply and sidestep geopolitical risks in Taiwan. Robust legal frameworks governing AI agents can set basic constraints on their behavior that follow the spirit of existing law. Finally, governments can maintain political stability through measures that improve the quality of decision-making and combat the disruptive effects of rapid automation.

By detecting and deterring destabilizing AI projects through intelligence operations and targeted disruption, restricting access to AI chips and capabilities for malicious actors through strict controls, and guaranteeing a stable AI supply chain by investing in domestic chip manufacturing, states can safeguard their security while opening the door to unprecedented prosperity.

Additional Commentary

There are several arguments from the paper worth highlighting.

Emphasize terrorist-proof security over superpower-proof security.

Though there are benefits to state-proof security (SL5), achieving it is a remarkably daunting task, and it is arguably much less crucial than achieving security against non-state actors and insider threats (SL3 or SL4).

Robust compute security is plausible and incentive-compatible.

Treating high-end AI compute like fissile material or chemical weapons precursors appears politically and technically feasible, and we can draw from humanity’s prior experience managing WMD inputs for an effective playbook. Compute security interventions we recommend in the paper include measures such as monitoring shipments, tracking chip inventories, and employing security features like geolocation.
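
To give a rough sense of what chip-inventory accounting might look like in software, below is a minimal illustrative sketch. The record fields, site names, and audit rule are hypothetical assumptions for illustration only, not mechanisms specified in the paper.

```python
# Hypothetical sketch of chip-inventory accounting: flag high-end chips whose
# reported location (e.g. from a geolocation attestation) is unlicensed or
# inconsistent with the site on their export license. Illustrative only.
from dataclasses import dataclass


@dataclass
class ChipRecord:
    chip_id: str
    reported_site: str   # location the chip last attested to (assumed feature)
    licensed_site: str   # site the export license ties the chip to


LICENSED_SITES = {"datacenter-a", "datacenter-b"}  # assumed licensed locations


def flag_for_audit(inventory: list[ChipRecord]) -> list[ChipRecord]:
    """Return chips that are candidates for diversion to rogue actors:
    reported at an unlicensed site, or at a site other than the one licensed."""
    return [
        chip for chip in inventory
        if chip.reported_site not in LICENSED_SITES
        or chip.reported_site != chip.licensed_site
    ]


if __name__ == "__main__":
    inventory = [
        ChipRecord("h100-0001", "datacenter-a", "datacenter-a"),
        ChipRecord("h100-0002", "unknown-site", "datacenter-b"),
    ]
    for chip in flag_for_audit(inventory):
        print(f"audit: {chip.chip_id} last reported at {chip.reported_site}")
```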

Additionally, states may demand certain transparency measures from each other’s AI projects, using their ability to maim those projects as leverage. AI-assisted verification, which might involve AIs inspecting code and outputting single-bit compliance signals, could make states much more likely to agree to such measures. We believe technical work on these sorts of verification measures is worth pursuing aggressively as it becomes technologically feasible.
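
To make the single-bit idea concrete, here is a minimal sketch under assumed inputs: the compute threshold and report fields are invented for illustration, and a real verifier would be an AI system inspecting code and logs inside the inspected party’s environment rather than a hand-written rule.

```python
# Illustrative sketch of a "single-bit compliance signal": the inspector sees
# confidential training-run details but reveals only one bit to the other
# state. Threshold and field names are assumptions, not from the paper.

COMPUTE_THRESHOLD_FLOP = 1e26  # hypothetical agreed training-compute cap


def compliance_bit(confidential_report: dict) -> bool:
    """Return a single bit: True if the run appears compliant, False otherwise.
    The detailed report itself never leaves the inspected environment."""
    within_cap = confidential_report["total_training_flop"] <= COMPUTE_THRESHOLD_FLOP
    declared = confidential_report["declared_to_counterparty"]
    return within_cap and declared


if __name__ == "__main__":
    report = {"total_training_flop": 3e25, "declared_to_counterparty": True}
    print(1 if compliance_bit(report) else 0)  # only this bit is shared
```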

We draw a distinction between compute security efforts that deny compute to terrorists and efforts to prevent powerful nation-states from acquiring or using compute. The latter is worth considering, but our focus in the paper is on interventions that would prevent rogue states or non-state actors from acquiring large amounts of compute. Security of this type is incentive-compatible: powerful nations will want states to know where their high-end chips are, for the same reason that the US has an interest in Russia knowing where its fissile material is. Powerful nations can deter each other in various ways, but non-state actors cannot be subject to robust deterrence.

“Superweapons” as a motivating concern for state competition in AI.

A controlled superintelligence could grant its wielder a “strategic monopoly on power” over the world: complete power to shape its fate. Many readers here will already find this plausible, but it’s worth noting that achieving it would probably require undermining mutual assured destruction (MAD), a high bar. Nonetheless, there are several ways a nation wielding superintelligence might circumvent MAD. Mirroring a recent paper, we mention several “superweapons”: feasible technological advances that would call into question nuclear deterrence between states. The prospect of AI-enabled superweapons helps convey why powerful states will not accept a large disadvantage in AI capabilities.

Against an “AI Manhattan Project”

A US “AI Manhattan Project” to build superintelligence is ill-advised because it would be destructively sabotaged by rival states. Its datacenters would be easy to detect and target. Many researchers at American labs have backgrounds and family in rival nations, and many others would fail to get a security clearance. The time and expense to secure sensitive information against dedicated superpowers would trade off heavily with American AI competitiveness, to say nothing of what it would cost to harden a frontier datacenter against physical attack. If they aren’t already, rival states will soon be fully aware of the existential threat that US achievement of superintelligence would pose for them (regardless of whether it is controlled), and they will not sit idly by if an actor is transparently aiming for a decisive strategic advantage, as discussed in [12].

9 comments

Comments sorted by top scores.

comment by Vladimir_Nesov · 2025-03-05T22:38:38.552Z · LW(p) · GW(p)

Cyberattacks can't disable anything with any reliability or for more than days to weeks, though, and there are dozens of major datacenter campuses from multiple somewhat independent vendors. Hypothetical AI-developed attacks might change that, but then there will also be AI-developed information security, adapting to any known kinds of attacks and stopping them from being effective shortly after. So the MAD analogy seems tenuous; the effect size (of this particular kind of intervention) is much smaller, to the extent that it seems misleading to even mention cyberattacks in this role/context.

comment by Julian Bradshaw · 2025-03-05T20:49:50.313Z · LW(p) · GW(p)

This is creative.

TL;DR: To mitigate race dynamics, China and the US should deliberately leave themselves open to the sabotage ("MAIMing") of their frontier AI systems. This gives both countries an option other than "nuke the enemy"/"rush to build superintelligence first" if superintelligence appears imminent: MAIM the opponent's AI. The deliberately unmitigated risk of being MAIMed also encourages both sides to pursue carefully-planned and communicated AI development, with international observation and cooperation, reducing AINotKillEveryone-ism risks.

The problem with this plan is obvious: with MAD, you know for sure that if you nuke the other guy, you're gonna get nuked in return. You can't hit all the silos, all the nuclear submarines. With MAIM, you can't be so confident: maybe the enemy's cybersecurity has gotten too good, maybe efficiency has improved and they don't need all their datacenters, maybe their light AGI has compromised your missile command.

So the paper argues for at least getting as close as possible to assurance that you'll get MAIMed in return: banning underground datacenters, instituting chip control regimes to block rogue actors, enforcing confidentiality-preserving inspections of frontier AI development.

Definitely worth considering. Appreciate the writeup.

comment by Ben Livengood (ben-livengood) · 2025-03-05T23:30:31.422Z · LW(p) · GW(p)

I have significant misgivings about the comparison with MAD, which relies on an overwhelming destructive response being available and thus renders a debilitating first strike unavailable.

With AGI, a first strike seems both likely to succeed and predicted in advance by several folks in several ways (full takeover, pivotal act, singleton outcome), whereas only a few (von Neumann) argued for a first strike before the USSR obtained nuclear weapons, with no such arguments I am aware of after they did.

If an AGI takeover is likely to trigger MAD itself then that is a separate and potentially interesting line of reasoning, but I don't see the inherent teeth in MAIM.  If countries are in a cold war rush to AGI then the most well-funded and covert attempt will achieve AGI first and likely initiate a first strike that circumvents MAD itself through new technological capabilities.

Replies from: Julian Bradshaw
comment by Julian Bradshaw · 2025-03-06T00:17:11.332Z · LW(p) · GW(p)

I think the idea behind MAIM is to make it so neither China nor the US can build superintelligence without at least implicit consent from the other. This is before we get to the possibility of first strikes.

If you suspect an enemy state is about to build a superintelligence which they will then use to destroy you (or that will destroy everyone), you MAIM it. You succeed in MAIMing it because everyone agreed to measures making it really easy to MAIM it. Therefore, for either side to build superintelligence, there must be a general agreement to do so. If there's a general agreement that's trusted by all sides, then it's substantially more likely superintelligence isn't used to perform first strikes (and that it doesn't kill everyone), because who would agree without strong guarantees against that?

(Unfortunately, while Humanity does have experience with control of dual-use nuclear technology, the dual uses of superintelligence are way more tightly intertwined - you can't as easily prove "hey, this is just a civilian nuclear reactor, we're not making weapons-grade stuff here". But an attempt is perhaps worthwhile.)

Replies from: ben-livengood
comment by Ben Livengood (ben-livengood) · 2025-03-06T01:02:24.923Z · LW(p) · GW(p)

I think MAIM might only convince people who have p(doom) < 1%.

If we're at the point that we can convincingly say to each other "this AGI we're building together can not be used to harm you" we are way closer to p(doom) == 0 than we are right now, IMHO.

Otherwise, why would the U.S. or China promising to do AGI research in a MAIMable way be any more convincing than the alignment strategies that would first be necessary to trust AGI at all? The risk is "anyone gets AGI" until p(doom) is low, and at that point I am unsure whether any particular country would choose to forgo AGI just because it didn't perfectly align politically, since, again, if one random blob of humanness can convince an alien-minded AGI to preserve the aspects of the blob it cares about, that is likely to encompass 99.9% of what other human blobs care about.

Where that leaves us is that if the U.S. and China have very different estimates of p(doom), they are unlikely to cooperate at all in making AGI progress legible to each other. And if they have similar p(doom), they will either cooperate strongly to prevent all AGI or cooperate to build the same thing, very roughly.

comment by niplav · 2025-03-05T18:18:32.943Z · LW(p) · GW(p)

Link in the first line of the post probably should also be https://www.nationalsecurity.ai/.

Replies from: oliver-zhang
comment by ozhang (oliver-zhang) · 2025-03-05T18:35:17.491Z · LW(p) · GW(p)

Thank you! This has been updated.

comment by mikko (morrel) · 2025-03-05T17:48:12.512Z · LW(p) · GW(p)

If we find ourselves in a world where ASI seems imminent and nations understand its implications, I'd predict that time will be characterized more by mutually assured cooperation than by sabotage. One key reason is that if one nation is seen as leading the race and trying to grab a strategic monopoly via AI, both its allies and its enemies will have similar incentives to pursue safety, whether via safety assurances or military action. There are quite a lot of agreeable safety assurances we can develop and negotiate (some of which you discuss in the paper), and they will very likely be attempted before direct military escalation. A surprisingly likely end result and stable equilibrium, then, seems to be one where ASI is developed and tightly monitored as an international effort.

This equilibrium of cooperation seems like a plausible outcome the more it's understood that:

  • ASI can be hugely beneficial
  • The alignment problem and loss of control pose an even larger risk than national conflicts
  • Trying to develop ASI for a strategic advantage over other nations carries higher risk of both national conflict and loss of control, but does not much impact its benefits over the alternative

While sabotage and military power are the deterrent, it seems unlikely they will be the action taken; there will likely be no clear points at which to initiate a military conflict, no "fire alarm [LW · GW]" — while at the same time nations will feel pressured to act before it is too late. This is an unstable equilibrium that all parties will be incentivized to de-escalate, resulting in "mutually assured cooperation".

That said, I recognize this cooperation-focused perspective may appear optimistic. The path to "mutually assured cooperation" is far from guaranteed. Historical precedents for international cooperation on security matters are mixed[1]. Differences in how nations perceive AI risks and benefits, varying technological capabilities, domestic political pressures, and the unpredictable nature of AI progress itself could all dramatically alter this dynamic. The paper's MAIM framework may indeed prove more accurate if trust breaks down or if one actor believes they can achieve decisive strategic advantage before others can respond. I'm curious how others view the balance of incentives between competition and cooperation in this context.

 

  1. ^

    I like the anecdote that the Cuban missile crisis was at its peak defused because the nations found a deal that was plainly rational and fair, with Kennedy saying he would be in an insupportable position to refuse the deal because "it’s gonna — to any man at the United Nations or any other rational man, it will look like a very fair trade”.

comment by O O (o-o) · 2025-03-05T23:38:52.783Z · LW(p) · GW(p)

I've always wondered, why didn't superpowers apply MAIM to nuclear capabilities in the past?

> Speculative but increasingly plausible, confidentiality-preserving AI verifiers

Such as?