Verification methods for international AI agreements

post by Akash (akash-wasil) · 2024-08-31T14:58:10.986Z · LW · GW · 1 comment

This is a link post for https://arxiv.org/abs/2408.16074

Contents

    Overview
  Abstract
  Executive summary
    Verification methods
      National technical means
      Access-dependent methods
      Hardware-dependent methods
    Limitations and considerations
    Future directions
1 comment

TLDR: A new paper summarizes some verification methods for international AI agreements. See also the summaries on LinkedIn and Twitter.

Several co-authors and I are currently planning some follow-up projects about verification methods. There are also at least 2 other groups planning to release reports on verification methods. If you have feedback or are interested in getting involved, please feel free to reach out.

Overview

There have been many calls for potential international agreements around the development or deployment of advanced AI. If governments become more concerned about AI risks, there might be a short window of time in which ambitious international proposals are seriously considered. If this happens, I expect many questions will be raised, such as:

Our paper attempts to get readers thinking about these questions and considering the kinds of verification methods that nations could deploy. The paper is not conclusive; its main goal is to provide framings, concepts, descriptions, and examples that can help readers orient to this space and inspire future research.

I'd be especially interested in feedback on the following questions:

Abstract

What techniques can be used to verify compliance with international agreements about advanced AI development? In this paper, we examine 10 verification methods that could detect two types of potential violations: unauthorized AI training (e.g., training runs above a certain FLOP threshold) and unauthorized data centers. We divide the verification methods into three categories: (a) national technical means (methods requiring minimal or no access from suspected non-compliant nations), (b) access-dependent methods (methods that require approval from the nation suspected of unauthorized activities), and (c) hardware-dependent methods (methods that require rules around advanced hardware). For each verification method, we provide a description, historical precedents, and possible evasion techniques. We conclude by offering recommendations for future work related to the verification and enforcement of international AI governance agreements.
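As a rough illustration of what a FLOP threshold for training runs could mean in practice, here is a minimal sketch. It assumes the common rule of thumb that training a dense model costs about 6 FLOP per parameter per training token; that approximation, the function names, and the threshold value are illustrative assumptions and are not taken from the paper.

```python
# Assumption: training compute ~= 6 * parameters * tokens (a common rule of
# thumb for dense models). This is not the paper's method.
FLOP_PER_PARAM_PER_TOKEN = 6.0

# Hypothetical placeholder; a real agreement would negotiate its own value.
EXAMPLE_THRESHOLD_FLOP = 1e26


def estimate_training_flop(n_parameters: float, n_tokens: float) -> float:
    """Approximate total training compute in FLOP."""
    if n_parameters <= 0 or n_tokens <= 0:
        raise ValueError("parameters and tokens must be positive")
    return FLOP_PER_PARAM_PER_TOKEN * n_parameters * n_tokens


def exceeds_threshold(n_parameters: float, n_tokens: float,
                      threshold_flop: float = EXAMPLE_THRESHOLD_FLOP) -> bool:
    """Return True if the estimated run is above the agreed threshold."""
    return estimate_training_flop(n_parameters, n_tokens) > threshold_flop


if __name__ == "__main__":
    # A 70-billion-parameter model trained on 15 trillion tokens.
    print(exceeds_threshold(70e9, 15e12))
    # A 1-trillion-parameter model trained on 20 trillion tokens.
    print(exceeds_threshold(1e12, 20e12))
```

An estimate like this only says whether a declared run would cross a threshold. Detecting undeclared runs is the job of the verification methods discussed in the rest of the post.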

Executive summary

Efforts to maximize the benefits and minimize the global security risks of advanced AI may lead to international agreements. This paper outlines methods that could be used to verify compliance with such agreements. The verification methods we cover are focused on detecting two potential violations: unauthorized AI training (e.g., training runs above a certain FLOP threshold) and unauthorized data centers.

Verification methods

We identify 10 verification methods and divide them into three categories: 

  1. National technical means. Methods that can be used by nations unilaterally. 
  2. Access-dependent methods. Methods that require a nation to grant access to national or international inspectors.
  3. Hardware-dependent methods. Methods that require agreements pertaining to advanced hardware.

National technical means

  1. Remote sensing: Detect unauthorized data centers and semiconductor manufacturing via visual and thermal signatures. 
  2. Whistleblowers: Incentivize insiders to report non-compliance. 
  3. Energy monitoring: Detect power consumption patterns that suggest the potential presence of large GPU clusters. 
  4. Customs data analysis: Track the movement of critical AI hardware and raw materials. 
  5. Financial intelligence: Monitor large financial transactions related to AI development.
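To give a feel for the arithmetic behind energy monitoring, the following sketch converts an observed facility power draw into a rough upper bound on the number of accelerators it could support. The function name and the default figures (about 1,000 W per accelerator including its share of the host server, and a power usage effectiveness of 1.25) are illustrative assumptions, not values from the paper.

```python
def estimate_accelerator_count(facility_power_mw: float,
                               watts_per_accelerator: float = 1000.0,
                               pue: float = 1.25) -> int:
    """Rough upper bound on accelerators a facility's power draw could run.

    facility_power_mw: total observed power draw in megawatts.
    watts_per_accelerator: assumed all-in draw per accelerator, including
        its share of the host server (hypothetical default).
    pue: assumed power usage effectiveness, i.e. total facility power
        divided by IT equipment power (hypothetical default).
    """
    if facility_power_mw <= 0 or watts_per_accelerator <= 0 or pue < 1.0:
        raise ValueError("power must be positive and PUE at least 1.0")
    # Strip out cooling and other overhead to get power available to IT gear.
    it_power_watts = facility_power_mw * 1_000_000 / pue
    return int(it_power_watts // watts_per_accelerator)


if __name__ == "__main__":
    # Under these assumptions, a 100 MW facility could host up to 80,000
    # accelerators.
    print(estimate_accelerator_count(100.0))
```

A high power draw alone does not show that a site is a data center, so an estimate like this would at most flag sites for follow-up with other methods.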

Access-dependent methods

  1. Data center inspections: Conduct inspections of sites to assess the size of a data center, verify compliance with hardware agreements, and verify compliance with other safety and security agreements. 
  2. Semiconductor manufacturing facility inspections: Conduct inspections of sites to determine the quantity of chip production and verify that chip production conforms to any agreements around advanced hardware. 
  3. AI developer inspections: Conduct inspections of AI development facilities via interviews, document and training transcript audits, and potential code reviews.

Hardware-dependent methods

  1. Chip location tracking: Automatic location tracking of advanced AI chips. 
  2. Chip-based reporting: Automatic notification if chips are used for unauthorized purposes.

Limitations and considerations

The verification methods we propose have some limitations, and there are many complicated national and international considerations that would influence if and how they are implemented. Some of these include: 

Future directions

Our work provides a foundation for discussions on AI governance verification, but several key areas require further research: 

1 comment

comment by davekasten · 2024-08-31T18:43:02.621Z

I would suggest that the set of means available to nation-states to unilaterally surveil another nation state is far more expansive than the list you have.  For example, the good-old-fashioned "Paying two hundred and eighty-two thousand dollars in a Grand Cayman banking account to a Chinese bureaucrat"* appears nowhere in your list.  


*If you get that this is a reference to the movie Spy Game, you are cool.  If you don't, go watch Spy Game.  It has a worldview on power that is extremely relevant to rationalists.