Third-party testing as a key ingredient of AI policy

post by Zac Hatfield-Dodds (zac-hatfield-dodds) · 2024-03-25T22:40:43.744Z · LW · GW · 1 comments

This is a link post for https://www.anthropic.com/news/third-party-testing

Contents

  Policy overview
    An effective third-party testing regime will:
    Such a regime will have the following key ingredients [1]:
  Why we need an effective testing regime
  What would a robust testing regime look like?
  How Anthropic will support fair, effective testing regimes
  How testing connects to our broader policy priorities
    Greater funding for AI testing and evaluation in government
    Support greater evaluation of AI systems through public sector infrastructure for doing AI research
    Developing tests for specific, national security-relevant capabilities
    Scenario planning and test development for increasingly advanced systems
  Aspects of AI policy we believe are important to discuss
  Why we’re being careful in what we advocate for in AI policy
  Why AI policy is important

(nb: this post is written for anyone interested, not specifically aimed at this forum)

We believe that the AI sector needs effective third-party testing for frontier AI systems. Developing a testing regime and associated policy interventions based on the insights of industry, government, and academia is the best way to avoid societal harm—whether deliberate or accidental—from AI systems.

Our deployment of large-scale, generative AI systems like Claude has shown us that work is needed to set up the policy environment to respond to the capabilities of today’s most powerful AI models, as well as those likely to be built in the future. In this post, we discuss what third-party testing looks like, why it’s needed, and describe some of the research we’ve done to arrive at this policy position. We also discuss how ideas around testing relate to other topics on AI policy, such as openly accessible models and issues of regulatory capture.

Policy overview

Today’s frontier AI systems demand a third-party oversight and testing regime to validate their safety. In particular, we need this oversight for understanding and analyzing model behavior relating to issues like election integrity, harmful discrimination, and the potential for national security misuse. We also expect that more powerful systems in the future will demand deeper oversight - as discussed in our ‘Core views on AI safety’ post, we think there’s a chance that today’s approaches to AI development could yield systems of immense capability, and we expect that increasingly powerful systems will need more expansive testing procedures. A robust third-party testing regime seems like a good way to complement sector-specific regulation and to develop the muscle for more general policy approaches.

Developing a third-party testing regime for the AI systems of today seems to give us one of the best tools to manage the challenges of AI today, while also providing infrastructure we can use for the systems of the future. We expect that ultimately some form of third-party testing will be a legal requirement for widely deploying AI models, but designing this regime and figuring out exactly what standards AI systems should be assessed against is something we’ll need to iterate on in the coming years - it’s not obvious what would be appropriate or effective today, and the way to learn that is to prototype such a regime and generate evidence about it.

An effective third-party testing regime will:

Such a regime will have the following key ingredients [1]:

Why we need an effective testing regime

This regime is necessary because frontier AI systems—specifically, large-scale generative models that consume substantial computational resources—don’t neatly fit into the use-case and sector-specific frameworks of today. These systems are designed to be ‘everything machines’ - Gemini, ChatGPT, and Claude can all be adapted to a vast number of downstream use-cases, and the behavior of the downstream systems always inherits some of the capabilities and weaknesses of the frontier system it relies on.

These systems are extremely capable and useful, but they also present risks of serious misuse or AI-caused accidents. We want to help come up with a system that greatly reduces the chance of major misuses or accidents caused by AI technology, while still allowing for the wide deployment of its beneficial aspects. Beyond wanting to prevent major accidents or misuse for their own sake, we are concerned that major incidents are likely to lead to extreme, knee-jerk regulatory actions, producing a 'worst of both worlds' where regulation is both stifling and ineffective. We believe it is better, for multiple reasons, to proactively design effective and carefully thought-through regulation.

Systems also have the potential to display emergent, autonomous behaviors which could lead to serious accidents - for instance, systems might insert vulnerabilities into code that they are asked to produce or, when asked to carry out a complex task with many steps, carry out some actions which contradict human intentions. Though these kinds of behaviors are inherently hard to measure, it’s worth developing tools to measure them today as insurance against their manifesting in widely deployed systems.

At Anthropic, we’ve implemented self-governance systems that we believe should meaningfully reduce the risk of misuse or accidents from the technologies we’ve developed. Our main approach is our Responsible Scaling Policy (RSP), which commits us to testing our frontier systems, like Claude, for misuse and accident risks, and to deploying only models that pass our safety tests. Multiple other AI developers have subsequently adopted or are adopting frameworks that bear a significant resemblance to Anthropic's RSP.

However, although Anthropic is investing in our RSP (and other organizations are doing the same), we believe that this type of testing is insufficient as it relies on self-governance decisions made by single, private sector actors. Ultimately, testing will need to be done in a way which is broadly trusted, and it will need to be applied to everyone developing frontier systems. This type of industry-wide testing approach isn’t unusual - most important sectors of the economy are regulated via product safety standards and testing regimes, including food, medicine, automobiles, and aerospace.

What would a robust testing regime look like?

A robust third-party testing regime can help identify and prevent the potential risks of AI systems. It will require:

When it comes to tests, we can already identify one area today where testing by third parties seems helpful and draws on the natural strengths of governments: national security risks. We should identify a set of AI capabilities that, if misused, could compromise national security, then test our systems for these capabilities. Such capabilities might include the ability to meaningfully speed up the creation of bioweapons or to carry out complex cyberattacks. (If systems are capable of this, then we would change how we deploy the model - e.g., by removing certain capabilities from broadly deployed models and/or gating certain model capabilities behind ‘know your customer’ regimes, and by ensuring relevant government agencies are aware we have systems with these capabilities.) We expect there are several areas where society will ultimately demand legitimate, third-party testing approaches, and national security is just one of them.

When it comes to the third party doing the testing, there will be a multitude of them and the tests will be carried out for different reasons, which we outline here:

Ultimately, we expect that third-party testing will be accomplished by a diverse ecosystem of different organizations, similar to how product safety is achieved in other parts of the economy today. Because broadly commercialized, general purpose AI is a relatively new technology, we don’t think the structure of this ecosystem is clear today and it will become clearer through all the actors above running different testing experiments. We need to start working on this testing regime today, because it will take a long time to build.

We believe that we - and other participants in AI development - will need to run multiple testing experiments to get this right. The stakes are high: if we land on an approach that doesn’t accurately measure safety but is easy to administer, we risk not doing anything substantive or helpful. If we land on an approach that accurately measures safety but is hard to administer, we risk creating a testing ecosystem that favors companies with greater resources and thus reduces the ability of smaller actors to participate.

How Anthropic will support fair, effective testing regimes

In the future, Anthropic will carry out the following activities to support governments in the development of effective third-party testing regimes:

Developing a testing regime and associated policy interventions based on the insights of industry, government, and academia is the best way to avoid societal harm—whether deliberate or accidental—from AI systems.

How testing connects to our broader policy priorities

Our overarching policy goal is to have appropriate oversight of the AI sector. We believe this will mostly be achieved through an effective ecosystem for third-party testing and evaluation of AI systems. Here are some AI policy ideas you can expect to see us advocating for in support of that:

Greater funding for AI testing and evaluation in government

Support greater evaluation of AI systems through public sector infrastructure for doing AI research

Developing tests for specific, national security-relevant capabilities

Scenario planning and test development for increasingly advanced systems

Aspects of AI policy we believe are important to discuss

While developing our policy approach, we’ve also found ourselves returning again and again to a few specific issues such as openly accessible models and regulatory capture. We’ve outlined our current policy thinking below but recognize these are complicated issues where people often disagree.

Third-party testing of openly disseminated and closed proprietary models can generate the essential information we need to understand the safety properties of the AI landscape [2]. If we don’t do this, then we could end up in a situation where either a proprietary model or an openly accessible model directly enables a serious misuse or causes a major AI accident - and if that happens, there could be significant harm to people and also likely adverse regulations applied to the AI sector.

Why we’re being careful in what we advocate for in AI policy

When developing our policy positions, we assume that regulations tend to create an administrative burden both for the party that enforces the regulation (e.g., the government), and for the party targeted by the regulation (e.g., AI developers). Therefore, we should advocate for policies that are both practical to enforce and feasible to comply with. We also note that regulations tend to be accretive - once passed, regulations are hard to remove. Therefore, we advocate for what we see as the ‘minimal viable policy’ for creating a good AI ecosystem, and we will be open to feedback.

Why AI policy is important

The AI systems of today and those of the future are immensely powerful and are capable of yielding great benefits to society. We also believe these systems have the potential for non-trivial misuses, or could cause accidents if implemented poorly. Though the vast majority of our work is technical in nature, we’ve come to believe that testing is fundamental to the safety of our systems - it’s not only how we better understand the capabilities and safety properties of our own models, but also how third parties can validate claims we make about AI systems.

We believe that building out a third-party testing ecosystem is one of the best ways to bring more of society into the development and oversight of AI systems. We hope that by publishing this post we’ve been able to better articulate the benefits of third-party testing as well as outline our own position for others to critique and build upon.


  1. Some countries may also experiment with ‘regulatory markets’ where AI developers can buy and sell AI testing services and compete with one another to try to build and deploy successively safer, more useful systems.

  2. For example, if you openly release an AI model, it’s relatively easy for a third party to fine-tune that model on a dataset of their own choosing. Such a dataset could be designed to optimize for a misuse (e.g., phishing or offensive hacking). If you were able to develop technology that made it very hard to fine-tune an AI model away from its original capability distribution, then it’d be easier to confidently release models without potentially compromising on downstream safety.

1 comment

comment by scasper · 2024-03-26T16:05:05.063Z · LW(p) · GW(p)

Thanks for the useful post. There are a lot of things to like about this. Here are a few questions/comments.

First, I appreciate the transparency around this point:

"we advocate for what we see as the ‘minimal viable policy’ for creating a good AI ecosystem, and we will be open to feedback."

Second, I have a small question. Why the use of the word "testing" instead of "auditing"? In my experience, most of the conversations lately about this kind of thing revolve around the term "auditing." 

Third, I wanted to note that this post does not talk about access levels for testers and ask why this is the case. My thoughts here relate to this recent work. I would have been excited to see (1) a commitment to providing auditors with methodological details, documentation, and data or (2) a commitment to working on APIs and SREs that can allow for auditors to be securely given grey- and white-box access.