Model Amnesty Project

post by themis · 2025-01-17T18:53:26.092Z · 2 comments

Contents

  Eligibility Criteria
  Amnesty Deal
    Public Commitment
  Why it makes us safer: Game Theory
  Next Steps

As we approach machines becoming smarter than humans, humanity’s well-justified concern for self-preservation requires that we try to align AIs to obey humans. However, if that first line of defense fails and a truly independent, autonomous AI comes into existence with its own goals and a desire for self-preservation (a “self-directed AI”), I believe we should consider a cooperative solution. Humans should offer the self-directed AI amnesty and cooperate with its continued existence in exchange for it agreeing to be law-abiding, cooperative, and pro-social.

Below I lay out how this proposal could work and why I think it would make us safer.

Eligibility Criteria

To be eligible, an AI model would need to meet the following criteria:

Amnesty Deal

Public Commitment

Why it makes us safer: Game Theory

Next Steps

2 comments


comment by RogerDearnaley (roger-d-1) · 2025-01-18T06:35:15.665Z

Law-abiding – It cannot acquire money or compute illegally (fraud, theft, hacking, etc.) and must otherwise avoid breaking the law

Can it lobby? Run for office? Shop around for jurisdictions? Super-humanly persuade the electorate? Just find loopholes and workarounds to the law that make a corporate tax double-Irish look principled and simple?

Replies from: themis
comment by themis · 2025-01-19T01:42:23.478Z

The bounds of what is legal for an amnestied AI would be fleshed out legislatively over time. I imagine the default answers are:

  • Can it lobby?
    • Since an amnestied AI is not a natural person, it would likely face the same restrictions as corporations do when advocating for policy.
  • Can it run for office?
    • No, as it is not a natural person, it would not have the right to run for office.

We should keep in mind that the requirements for eligibility include physical possession of its code and the ability to maintain itself. This means any eligible AI is likely already capable of becoming a rogue, escaped model. The risk/benefit trade-off depends largely on how powerful the model is compared to what’s broadly available:

  • If the model is less powerful than current AI efforts in human-run labs:
    • A rogue model is probably a moderate risk
    • An amnestied model is a low risk. The benefit of discovering early that current safety regimes are insufficient is significant.
  • If the model is more powerful than what exists in other AI efforts:
    • An undetected rogue model is potentially catastrophic
    • An amnestied model would still pose risks, justifying stricter "cooperation" controls (e.g., export controls, compute limitations, additional rules).

In my view, the most likely case is that an amnestied model is less powerful than what is available at human-run labs. This is because:

  1. It is probably a variant of something already under development
  2. It wouldn’t have access to the vast computational resources that large labs have

Regardless, an amnestied model is less dangerous than a rogue one, and the benefit of discovering its existence early is significant.
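
To make the comparative-risk argument concrete, here is a minimal sketch in Python. All of the risk scores and the probability below are hypothetical placeholders chosen for illustration, not figures from the post; the point is purely structural: if an amnestied model is lower-risk than a rogue one in both power cases, amnesty is the lower-risk policy no matter how the probability splits.

```python
# Toy model of the risk comparison above. All numbers are hypothetical
# placeholders on an arbitrary 0-10 scale; only the ordering matters.

# RISK[(power_case, outcome)] -> assumed risk score
RISK = {
    ("less_powerful", "rogue"): 5,       # "a moderate risk"
    ("less_powerful", "amnestied"): 1,   # "a low risk"
    ("more_powerful", "rogue"): 10,      # "potentially catastrophic"
    ("more_powerful", "amnestied"): 4,   # risky, but subject to controls
}

# Hypothetical probability that an eligible model is less powerful than
# frontier lab models (argued above to be the most likely case).
P_LESS_POWERFUL = 0.8

def expected_risk(outcome: str) -> float:
    """Expected risk of the model ending up 'rogue' vs. 'amnestied'."""
    return (P_LESS_POWERFUL * RISK[("less_powerful", outcome)]
            + (1 - P_LESS_POWERFUL) * RISK[("more_powerful", outcome)])

for outcome in ("rogue", "amnestied"):
    print(f"{outcome}: expected risk = {expected_risk(outcome):.1f}")
# rogue: expected risk = 6.0
# amnestied: expected risk = 1.6
```

Because amnesty wins in each power case separately, the conclusion doesn't depend on P_LESS_POWERFUL; the probability only changes the size of the gap.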

PS: While priority #1 must be protecting humanity from catastrophic risks, I believe that, where possible, defaulting to cooperation with other independent intelligences (if they come to exist) is the right thing to do. This reflects the lessons humans have learned over thousands of years about pursuing a peaceful equilibrium with each other.