What are Responsible Scaling Policies (RSPs)?

post by Vishakha (vishakha-agrawal), Algon · 2025-04-05T16:01:34.762Z · LW · GW · 0 comments

This is a link post for https://aisafety.info/questions/NK4Y/What-are-Responsible-Scaling-Policies-(RSPs)

This is an article in the featured articles series from AISafety.info. AISafety.info writes AI safety intro content. We'd appreciate any feedback.

The most up-to-date version of this article is on our website, along with 300+ other articles on AI existential safety.

METR[1] defines a responsible scaling policy (RSP) as a specification of “what level of AI capabilities an AI developer is prepared to handle safely with their current protective measures, and conditions under which it would be too dangerous to continue deploying AI systems and/or scaling up AI capabilities until protective measures improve.”

Anthropic was the first company to publish an RSP, in September 2023, defining four AI Safety Levels (ASLs):

“A very abbreviated summary of the ASL system is as follows:

  • ASL-1 refers to systems which pose no meaningful catastrophic risk, for example a 2018 LLM or an AI system that only plays chess.
  • ASL-2 refers to systems that show early signs of dangerous capabilities – for example ability to give instructions on how to build bioweapons – but where the information is not yet useful due to insufficient reliability or not providing information that e.g. a search engine couldn’t. Current LLMs, including Claude, appear to be ASL-2.
  • ASL-3 refers to systems that substantially increase the risk of catastrophic misuse compared to non-AI baselines (e.g. search engines or textbooks) OR that show low-level autonomous capabilities.
  • ASL-4 and higher (ASL-5+) is not yet defined as it is too far from present systems, but will likely involve qualitative escalations in catastrophic misuse potential and autonomy.”

Other AI companies[2] have released their own versions of such documents under various names, such as OpenAI's Preparedness Framework and Google DeepMind's Frontier Safety Framework.

RSPs have drawn both positive and negative reactions from the AI safety community. Evan Hubinger of Anthropic, for instance, argues that they are "pauses done right [AF · GW]"; others are more skeptical [LW · GW]. Objections include that RSPs relieve regulatory pressure, shift the "burden of proof" from the people building capabilities onto the people concerned about safety, and amount to promissory notes rather than actual policies.

  1. ^ Formerly known as ARC Evals.

  2. ^ Additional companies have committed to publishing such documents.
