Proposing the Conditional AI Safety Treaty (linkpost TIME)

otto-barten

Proposing the Conditional AI Safety Treaty (linkpost TIME)

post by otto.barten (otto-barten) · 2024-11-15T13:59:01.050Z · LW · GW · 8 comments

This is a link post for https://time.com/7171432/conditional-ai-safety-treaty-trump/

8 comments

Technological progress can excite us, politics can infuriate us, and wars can mobilize us. But faced with the risk of human extinction that the rise of artificial intelligence is causing, we have remained surprisingly passive. In part, perhaps this was because there did not seem to be a solution. This is an idea I would like to challenge.

AI’s capabilities are ever-improving. Since the release of ChatGPT two years ago, hundreds of billions of dollars have poured into AI. These combined efforts will likely lead to Artificial General Intelligence (AGI), where machines have human-like cognition, perhaps within just a few years.

Hundreds of AI scientists think we might lose control over AI once it gets too capable, which could result in human extinction. So what can we do?

The existential risk of AI has often been presented as extremely complex. A 2018 paper, for example, called the development of safe human-level AI a “super wicked problem.” This perceived difficulty had much to do with the proposed solution of AI alignment, which entails making superhuman AI act according to humanity’s values. AI alignment, however, was a problematic solution from the start.

First, scientific progress in alignment has been much slower than progress in AI itself. Second, the philosophical question of which values to align a superintelligence to is incredibly fraught. Third, it is not at all obvious that alignment, even if successful, would be a solution to AI’s existential risk. Having one friendly AI does not necessarily stop other unfriendly ones.

Because of these issues, many have urged technology companies not to build any AI that humanity could lose control over. Some have gone further; activist groups such as PauseAI have indeed proposed an international treaty that would pause development globally.

That is not seen as politically palatable by many, since it may still take a long time before the missing pieces to AGI are filled in. And do we have to pause already, when this technology can also do a lot of good? Yann Lecun, AI chief at Meta and prominent existential risk skeptic, says that the existential risk debate is like “worrying about turbojet safety in 1920.”

On the other hand, technology can leapfrog. If we get another breakthrough such as the transformer, a 2017 innovation which helped launch modern Large Language Models, perhaps we could reach AGI in a few months’ training time. That’s why a regulatory framework needs to be in place before then.

Fortunately, Nobel Laureate Geoffrey Hinton, Turing Award winner Yoshua Bengio, and many others have provided a piece of the solution. In a policy paper published in Science earlier this year, they recommended “if-then commitments”: commitments to be activated if and when red-line capabilities are found in frontier AI systems.

Building upon their work, we at the nonprofit Existential Risk Observatory propose a Conditional AI Safety Treaty. Signatory countries of this treaty, which should include at least the U.S. and China, would agree that once we get too close to loss of control they will halt any potentially unsafe training within their borders. Once the most powerful nations have signed this treaty, it is in their interest to verify each others’ compliance, and to make sure uncontrollable AI is not built elsewhere, either.

One outstanding question is at what point AI capabilities are too close to loss of control. We propose to delegate this question to the AI Safety Institutes set up in the U.K., U.S., China, and other countries. They have specialized model evaluation know-how, which can be developed further to answer this crucial question. Also, these institutes are public, making them independent from the mostly private AI development labs. The question of how close is too close to losing control will remain difficult, but someone will need to answer it, and the AI Safety Institutes are best positioned to do so.

We can mostly still get the benefits of AI under the Conditional AI Safety Treaty. All current AI is far below loss of control level, and will therefore be unaffected. Narrow AIs in the future that are suitable for a single task—such as climate modeling or finding new medicines—will be unaffected as well. Even more general AIs can still be developed, if labs can demonstrate to a regulator that their model has loss of control risk less than, say, 0.002% per year (the safety threshold we accept for nuclear reactors). Other AI thinkers, such as MIT professor Max Tegmark [LW · GW], Conjecture CEO Connor Leahy, and ControlAI director Andrea Miotti, are thinking in similar directions.

Fortunately, the existential risks posed by AI are recognized by many close to President-elect Donald Trump. His daughter Ivanka seems to see the urgency of the problem. Elon Musk, a critical Trump backer, has been outspoken about the civilizational risks for many years, and recently supported California’s legislative push to safety-test AI. Even the right-wing Tucker Carlson provided common-sense commentary when he said: “So I don’t know why we’re sitting back and allowing this to happen, if we really believe it will extinguish the human race or enslave the human race. Like, how can that be good?” For his part, Trump has expressed concern about the risks posed by AI, too.

The Conditional AI Safety Treaty could provide a solution to AI’s existential risk, while not unnecessarily obstructing AI development right now. Getting China and other countries to accept and enforce the treaty will no doubt be a major geopolitical challenge, but perhaps a Trump government is exactly what is needed to overcome it.

A solution to one of the toughest problems we face—the existential risk of AI—does exist. It is up to us whether we make it happen, or continue to go down the path toward possible human extinction.

The title of this piece has been adapted to increase clarity for a different audience

8 comments

Comments sorted by top scores.

comment by Matrice Jacobine · 2024-11-15T20:30:19.326Z · LW(p) · GW(p)

Fortunately, the existential risks posed by AI are recognized by many close to President-elect Donald Trump. His daughter Ivanka seems to see the urgency of the problem. Elon Musk, a critical Trump backer, has been outspoken about the civilizational risks for many years, and recently supported California’s legislative push to safety-test AI. Even the right-wing Tucker Carlson provided common-sense commentary when he said: “So I don’t know why we’re sitting back and allowing this to happen, if we really believe it will extinguish the human race or enslave the human race. Like, how can that be good?” For his part, Trump has expressed concern about the risks posed by AI, too.

This is a strange contrast from the rest of the article, considering both Donald and Ivanka Trump's positions are largely informed by the "situational awareness" position arguing that the US should develop AGI before China to ensure US victory over China – which is explicitly the position Tegmark [LW · GW] and Leahy argue against (and consider existentially harmful) when they call to stop work on AGI and work on international co-operation to restrict it and develop tool AI instead.

I still see this kind of confusion between the two positions a fair bit and it is extremely strange. It's like if back in the original Cold War people couldn't tell the difference between anti-communist hawks and the Bulletin of the Atomic Scientists (let alone anti-war hippies) because technically they both considered nuclear arms race to be very important for the future of humanity.

Replies from: otto-barten

↑ comment by otto.barten (otto-barten) · 2024-11-15T22:50:06.491Z · LW(p) · GW(p)

I'm aware and I don't disagree. However, in xrisk, many (not all) of those who are most worried are also most bullish about capabilities. Reversely, many (not all) who are not worried are unimpressed with capabilities. Being aware of the concept of AGI, that it may be coming soon, and of how impactful it could be, is in practice often a first step towards becoming concerned about the risks, too. This is not true for everyone unfortunately. Still, I would say that at least for our chances to get an international treaty passed, it is perhaps hopeful that the power of AGI is on the radar of leading politicians (although this may also increase risk through other paths).

Replies from: Matrice Jacobine

↑ comment by Matrice Jacobine · 2024-11-16T00:01:48.443Z · LW(p) · GW(p)

(Discussion continues here on EA Forum. [EA(p) · GW(p)])

comment by davekasten · 2024-11-18T13:38:20.925Z · LW(p) · GW(p)

Is there a longer-form version with draft treaty langugage (even an outline)? I'd be curious to read it.

Replies from: otto-barten

↑ comment by otto.barten (otto-barten) · 2024-11-18T21:08:46.883Z · LW(p) · GW(p)

Not publicly, yet. We're working on a paper providing more details about the conditional AI safety treaty. We'll probably also write a post about it on lesswrong when that's ready.

Replies from: davekasten

↑ comment by davekasten · 2024-11-18T21:35:49.234Z · LW(p) · GW(p)

Thanks, looking forward to it! Please do let us folks who worked on A Narrow Path (especially me, @Tolga [LW · GW] , and @Andrea_Miotti [LW · GW] ) know if we can be helpful in bouncing around ideas as you work on the treaty proposal!

Replies from: otto-barten

↑ comment by otto.barten (otto-barten) · 2024-11-19T15:03:14.525Z · LW(p) · GW(p)

Thanks for the offer, we'll do that!

comment by jbash · 2024-11-15T14:42:27.208Z · LW(p) · GW(p)

Fortunately, Nobel Laureate Geoffrey Hinton, Turing Award winner Yoshua Bengio, and many others have provided a piece of the solution. In a policy paper published in Science earlier this year, they recommended “if-then commitments”: commitments to be activated if and when red-line capabilities are found in frontier AI systems.

So race to the brink and hope you can actually stop when you get there?

Once the most powerful nations have signed this treaty, it is in their interest to verify each others’ compliance, and to make sure uncontrollable AI is not built elsewhere, either.

How, exactly?

Proposing the Conditional AI Safety Treaty (linkpost TIME)

Contents

8 comments