How reasonable is taking extinction risk?

post by FVelde · 2024-07-23T18:05:16.225Z · LW · GW · 4 comments

The people who make general artificial intelligence models believe that these models could mean the end of mankind. As an example of how that could happen, a future version of ChatGPT might be smart enough to create enormous danger: it could disseminate information, such as how to easily build weapons of mass destruction, or even build those weapons itself. After ChatGPT was first released, many people used it as the basis for agents that could roam the internet by themselves. One, called ChaosGPT, was tasked with creating a plan for taking over the world. The idea was funny, but it would have been less funny if the machine had been advanced enough to actually carry it out. Once AI surpasses human intelligence, one of those smarter-than-human AIs could be given the task of wiping out humanity and be smart enough to complete it. The likelihood of this scenario can be assessed by looking at its three steps. Since (1) AI is likely to surpass human intelligence at some point, (2) some people might give the task of destroying humanity to a superintelligent machine (there is precedent) and (3) humanity has caused the extinction of many less intelligent species without even trying to, the scenario seems at least possible. And it is only one of many scenarios in which the arrival of superintelligent machines spells the end of humanity.

Should we let the people building ever-smarter AI models continue if they are thereby risking human extinction? And more broadly: is taking a risk of extinction ever reasonable?

If your answer is 'no,' then you can stop reading this article. All I have to tell you is that the leaders of OpenAI (the maker of ChatGPT), Anthropic (its main competitor) and Google DeepMind (Google's AI lab), as well as the top three most-cited AI scientists, have all publicly stated that AI models have a chance of causing human extinction.

If your answer is instead 'yes, we can let them continue building these models,' then you are open to arguments for risking human extinction for some benefit. You might have said that it was okay to test the first nuclear bomb when it was still unclear whether the bomb would set the atmosphere on fire. Or that it was okay to build the first particle accelerator when it was still unclear whether it would create a black hole. And in the future you might be convinced, time and time again, that a risk of human extinction is acceptable, because you are 'open to reason'.

But if we risk human extinction time and time again, the risk adds up and we end up extinct with near certainty. So at some point you have to say, 'No, from now on, human extinction cannot be risked,' or extinction is near guaranteed to happen. If you instead say, 'Okay, we can take a risk now, but at some point in the future we have to put a complete halt to this,' then you are open to postponing drawing a line in the sand. That means those who want to risk extinction can keep convincing you, probably not always but time and time again, that the line can be drawn in the future rather than now. But if we allow the line to be pushed further and further back indefinitely, humanity is again practically guaranteed to go extinct. So the point from which we no longer accept extinction risk cannot lie in the future either; it has to be now.

With estimates of the odds of human extinction in our lifetimes being non-trivial, this is no mere intellectual exercise. It concerns us and our loved ones. Benefits, like promises of economic growth or cures for diseases, are meaningless if the price is everyone on earth dying. To visualise this, imagine having to play Russian roulette a hundred times in a row, with a million dollars of prize money every time you survive. It could be a billion for each win and the outcome would still be death. If we allow risks of human extinction, through building general artificial intelligence models or anything else, we as good as guarantee human extinction at some point, and it might already happen in our lifetimes.
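To make the compounding explicit, here is a minimal sketch of the arithmetic. The one-in-six chance of death per round is the usual Russian roulette figure; the 10% per-decision extinction risk is purely an illustrative assumption, not an estimate from this post.

```python
# Back-of-the-envelope sketch of how repeated risks compound.
# The 1-in-6 chance of death per round is the usual roulette figure;
# the 10% per-decision extinction risk below is purely illustrative.

def survival_probability(per_round_risk: float, rounds: int) -> float:
    """Chance of surviving `rounds` independent risks of `per_round_risk` each."""
    return (1 - per_round_risk) ** rounds

# Russian roulette, 100 rounds: survival is essentially zero,
# so the size of the prize money per round hardly matters.
print(survival_probability(1 / 6, 100))       # ~1.2e-08

# Repeatedly accepting a hypothetical 10% extinction risk:
for n in (1, 10, 50):
    print(n, survival_probability(0.10, n))   # 0.9, ~0.35, ~0.005
```

However large the per-round prize, the product (1 - risk)^rounds drives the chance of survival toward zero as the decisions pile up.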

There are, however, two circumstances in which taking an extinction risk can be worth it. Both have to do with other extinction risks and the fact that there are fates worse than death.

The first is one where problems like climate change and nuclear war also pose an extinction risk, and AI might help mitigate those risks. It might do so by providing a blueprint for transitioning to clean energy, or by coming up with a way to let the great nuclear powers slowly draw down their nuclear arsenals. Should AI convincingly lower the overall risk of extinction, continuing to build greater AI models is warranted, even if they bring their own risks. The argument can then be restated as: 'If increasing extinction risk is reasonable once, it will be reasonable in the future, until the risk materialises at some point. But allowing near-certain extinction is not reasonable, so increasing extinction risk once is not reasonable either.' AI mitigating other extinction risks to a degree that cancels out its own may be a long shot, but it is worth looking into.

The second acceptable circumstance is one in which the alternative is a fate worse than human extinction. An idea that is commonplace in American AI companies is that China will continue to build AI and try to take over the world with it. One reaction to this possibility is to accept the extinction risk of building AI in order to stay ahead of China and prevent an oppressive government from gaining global power. Such a world could look like the one described in 1984 by George Orwell:

“There will be no curiosity, no enjoyment of the process of life. All competing pleasures will be destroyed. But always—do not forget this, Winston—always there will be the intoxication of power, constantly increasing and constantly growing subtler. Always, at every moment, there will be the thrill of victory, the sensation of trampling on an enemy who is helpless. If you want a picture of the future, imagine a boot stamping on a human face—forever.”

Figuring out whether AI mitigates or worsens extinction risk is an important question that needs work. So is figuring out whether a fate worse than extinction is likely enough to warrant taking extinction risk to prevent it. But if AI only increases the risk of human extinction, and a fate worse than extinction is not sufficiently likely, then we have to draw a line in the sand. Do we draw the line now or in the future? We need to draw it from the moment there is extinction risk, or rather before. We don't know when AI models will be smart enough to pose a risk of human extinction, and we cannot afford to wait and see. Because if waiting and seeing is reasonable once, it is reasonable every now and then, until humanity goes extinct. But allowing human extinction with near certainty is not reasonable, so neither is waiting and seeing.

4 comments

comment by Dagon · 2024-07-23T18:23:24.849Z · LW(p) · GW(p)

Your first alternative hypothesis (there's ALREADY a path to extinction) is clear to me, and it is unclear what sign or magnitude of change AI will bring to that risk. Which makes your title a bit suspect - AI doesn't bring a risk of extinction, it "merely" changes the likelihood, and perhaps the severity, of possible extinction paths.

comment by FVelde · 2024-07-24T05:03:13.247Z · LW(p) · GW(p)

The title, previously 'Is taking extinction risk reasonable?', has been changed to 'On extinction risk over time and AI'. I appreciate the correction.

comment by FVelde · 2024-07-23T22:18:56.355Z · LW(p) · GW(p)

I agree that AI changes the likelihood of extinction rather than bringing a risk where there was none before. In that sense, the right question might be 'Is increasing the probability of extinction reasonable?'.

Assuming that by the last sentence you mean that AI does not bring new extinction paths, I would like to counter that AI could well bring new paths to extinction; that is, there are probably paths to human extinction that open up once a machine intelligence surpasses human intelligence. Just as chess engines can apply strategies that humans have not thought of, some machine intelligence could find ways to wipe out humanity that have not yet been imagined. Furthermore, there might be ways to cause human extinction that can only be executed by a superior intelligence. An example could be a path that starts with hacking into many well-defended servers in short succession, at a speed that even the best group of human hackers could not match, in order to shut down a large part of the internet.

comment by Vladimir_Nesov · 2024-07-24T05:35:23.629Z · LW(p) · GW(p)

For a given AGI lab, the decision to keep working on the project despite believing there is at least a 10% risk of extinction depends on the character of the counterfactuals. Success is not just another draw out of the extinction urn, another step on the path to eventual doom; instead, it promises that the new equilibrium involves robust safety with no future draws. So it's all about the alternatives.

One issue for individual labs is that their alternative is likely that the other labs develop AGI instead; they personally have little power to pause AI globally unless they involve themselves in coordination with all other capable actors. Many arguments stop here, considering such coordination infeasible.

The risk of literal extinction for reasons other than AGI seems vanishingly small for the foreseeable future. There are many global catastrophic risks with moderate probability when added up over decades, some of which might disrupt the course of civilization for millennia, but not literal extinction. The closest risk of actual extinction that doesn't involve AGI that I can imagine is advanced biotechnology of the kind that's not even on the horizon yet. It's unclear how long it would take to get there without AI, while dodging the civilization-wrecking catastrophes that precede its development, but I would guess a lower bound of many decades before this becomes a near-term possibility. Even then it won't become a certainty of immediate doom, in a similar way to how large nuclear arsenals still haven't cashed out in a global nuclear conflict for many decades. So it makes sense to work towards global coordination to pause AI for at least this long, as long as there is vigorous effort to develop AI alignment theory and prepare in all ways that make sense during this time.