Proposal: we should start referring to the risk from unaligned AI as a type of *accident risk*
post by Christopher King (christopher-king) · 2023-05-16T15:18:55.427Z · LW · GW · 6 comments
In the wider political sphere, a lot of people are worried about AI misuse risk. Unaligned AI is not a type of misuse. I think the clearest way to describe it is as an accident risk, in the same sense as an industrial accident. In particular, AI existential risk is a type of accident from operating heavy machinery. Using this terminology immediately helps someone unfamiliar with AI understand the category of risk we are talking about, and in particular that it isn't misuse risk.
Note that this isn't intended to replace the term existential risk. Rather, it is meant to be used in addition to that term, and in particular it should be used when contrasting with theoretical misuse risks.
Current terminology: no good reference point
Alice: I hear that you are worried about AI existential risk. So in particular, you are worried about misuse.
Bob: No, the AI kills everyone on its own.
Alice: Is there anything else like this?
Bob: Um, nuclear explosions?
Alice: So a misuse risk?
Bob: No, I mean last century they were worried it would set the atmosphere on fire.
Alice: I'm not familiar with that either.
Bob: It's something called instrumental convergence where the AI kills everyone to achieve a goal.
Alice: So misuse risk?
Bob: Not quite, the creators didn't intend for that result.
Alice: I still have no reference point for what you are talking about. I guess I'll need to analyze your arguments in detail before I even understand the general category of risk you're afraid of. The probability of me actually doing this is maybe 10%-ish.
New terminology: tons of reference points!
Alice: I hear that you are worried about AI existential risk. So in particular, you are worried about misuse.
Bob: No, I am worried about accident risk.
Alice: Oh, so like a car crash or an industrial accident!
Bob: Yes! I'm worried that things will go wrong in ways the creator didn't intend.
Alice: Ah, so do you think we need more laboratory testing?
Bob: I think even this is risky, because the impact radius will be far larger than that of the lab itself.
Alice: Oh, like nuclear weapons testing or biohazards.
Bob: Yes! I think the impact radius may be even bigger than a nuclear explosion though.
Alice: Although I don't quite understand how this could work, I understand enough that I want to learn more. And I now understand that you are more worried about accident risk than misuse risk, in the same way that car manufacturers are more worried about car crashes than about cars being used as weapons. The probability of me actually looking further into this is 30%-ish.
Additional benefits
The way society typically deals with accidents from heavy machinery is much better than the way it is currently treating AI.
In particular, a domain expert having a brilliant safety plan does not suffice for heavy machinery. Neither do the good intentions of the creators. And the possibility that the Chinese might lose limbs to heavy machines is not sufficient either. Rather, you must also loop in measures from the field of risk management.
I also think the term accident risk avoids some of the mistakes that come from anthropomorphization. We often model AI as having something analogous to human motivation, due to instrumental convergence. However, from the point of view of the creators, this is still a heavy machinery accident.
I think it's better to start from this point of view and treat the analogies to human psychology as models. The most important reason is so we don't project human irrationality, or human morality and desires, onto the machine. It is just a machine, after all, and there's nothing specific to AI systems that is analogous to those two human qualities.
So in conclusion, if someone asks if you're talking about AI misuse risk, say no, you're talking about AI accident risk.
6 comments
comment by Robert Miles (robert-miles) · 2023-05-16T21:34:53.122Z · LW(p) · GW(p)
Are we not already doing this? I thought we were already doing this. See, for example, this talk I gave in 2018:
https://youtu.be/pYXy-A4siMw?t=35
I guess we can't be doing it very well though
↑ comment by Christopher King (christopher-king) · 2023-05-16T22:00:18.614Z · LW(p) · GW(p)
Oh wait, I think I might've come up with this idea based on vaguely remembering someone bringing up your chart.
(I think adding an OSHA poster is my own invention though.)
comment by Maximilian Kaufmann (max-kaufmann) · 2023-05-16T22:15:09.318Z · LW(p) · GW(p)
A point against that particular terminology which you might find interesting. https://www.lesswrong.com/posts/6bpW2kyeKaBtuJuEk/why-i-hate-the-accident-vs-misuse-ai-x-risk-dichotomy-quick [LW · GW]
comment by Gordon Seidoh Worley (gworley) · 2023-05-16T17:53:47.547Z · LW(p) · GW(p)
I really like this idea, since in an important sense these are accident risks: we don't intend for AI to cause existential catastrophe, but it might if we make mistakes (and we make mistakes by default). I get why some folks in the safety space might not like this framing, because accidents imply there's some safe default path that accidents deviate from, when in fact "accidents" are the default thing AIs do and we have to thread a narrow path to get good outcomes. But it seems like a reasonable way to move the conversation forward with the general public, even if the technical details are wrong. If the goal is to get people to care about AI doing bad things despite our best efforts to the contrary, framing that as an accident seems like the best conceptual handle most folks have readily available.
comment by Nathan Helm-Burger (nathan-helm-burger) · 2023-05-17T03:59:29.390Z · LW(p) · GW(p)
Um, I don't think accident is a great description. I mean, yeah, the process gets started with an accident or foolish greed which initiates a period of uncontrolled recursive self-improvement, and then humanity is doomed. But humanity is not doomed at that point because of an accident; humanity is doomed at that point because it has allowed an overwhelmingly powerful alien-minded entity to come into existence, one capable of killing or enslaving all of humanity at its whim. And because we have reason to suspect, due to reasoning about instrumental goals, that this alien-minded entity will indeed choose to destroy us.
To me, that sounds like war. The thing that makes it scary is the agency of our enemy. Yeah, there's an accident involved, where a portal to the Dark Realms was opened and a selfish alien god invaded our world through that portal, but 'accident' doesn't seem to quite cover it to me. Especially when the situation is such that the portal is likely to be very deliberately opened by a greedy overconfident human who thinks they will be able to control the alien god and get great personal power from it.
So actually, we're more like... at war with demon-summoning cultists who think they will get rich but will actually just get everyone killed?
Note: I'm actually pretty convinced that our best bet at survival is opening the portal very very carefully, and studying the aliens beyond, without letting them take actions in our world or perceive us at all [LW · GW]. In other words, running powerful AGI within censored simulations which don't have mention of humans, computers, or human cultural artifacts, or even the same physics as our universe has. In such conditions, I think we can safely study them, and this is our best hope of designing effective methods of alignment in time. The danger is that this is a very costly and unprofitable venture, and the same tools needed to do this allow one to instead undertake the profitable gamble of letting the AI know about and interact with our world and thus risk our lives.
I don't see the big AI labs as necessarily making the wrong choices here even. The way they see it (I hope) is that step 1 is to race to a powerful enough AI that it can recursively self-improve in secure containment for enough generations that we have something powerful enough to be worth studying in the expensive censored simulation. And in order to fund that venture, you need to use your not-quite-powerful-enough-to-doom-us AI to make a lot of money, and to experiment on, and to help get closer to the truly dangerous AI... It's certainly a risky gamble to be taking though.
comment by Luciana Fruin (luciana-fruin) · 2023-05-16T21:04:44.392Z · LW(p) · GW(p)
In Health and Safety terms, this would be a Hazard (hence biohazard). Risk refers to the likelihood of the hazard occurring, but a hazard can be severe even if the risk is minuscule.