To the average human, controlled AI is just as lethal as 'misaligned' AI

post by YonatanK (jonathan-kallay) · 2024-03-14T14:52:43.570Z · LW · GW · 20 comments

A few months ago I posted this understated short piece [LW · GW] proposing, in a nutshell, that the average person has at least as much to fear from perfectly controlled advanced AI as from so-called 'misaligned' AI. If automation can emerge that defeats all humans' defenses on its own whim, despite its developers' best efforts to prevent this, then automation that merely assists a small group of humans in defeating everyone else's defenses seems to me a technically easier milestone, since it skips the hurdle of subverting its own makers' intentions. Willing human participation in automation-enabled mass killing, I attempted to suggest, is being improperly relegated to manageable 'falling into the wrong hands' edge cases, particularly because the possibility has a self-fulfilling dynamic: if there might exist even one clandestine group that wants to, and could, attain the means to 'take out' most of the human population, it would be rational for anyone wishing to survive such a purge to initiate it themselves. Thus the existence of many groups with a reason to 'take out' most of the human population is guaranteed by the emergence of a widely distributable, low-side-effect mass-killing technology like AI.

I received some insightful responses, for which I'm grateful. But I perceived the post as being mostly ignored. Granted, it was not well-written. Nevertheless, the basic idea was there, and no comments were offered that I felt satisfactorily put it to rest. Given that AI 'misalignment' is a favorite topic of this community, a claim about an AI risk that is just as catastrophic and more likely might be expected to be taken up enthusiastically, no matter how inelegantly it is presented.

To be fair, there is no actual inconsistency here. LW is not an AI risk community. It's OK to be interested in 'alignment' because 'alignment' is interesting, and to withhold engagement from adjacent problems one finds less interesting. What I've found when discussing this topic with other people, though, no matter how clever they are, is a visceral resistance to normalizing mass killing. It is more comfortable to treat it as deviant behavior, not something that could be predicted of reasonable, rational people (including, of course, oneself) given the right circumstances. This community, despite recognizing the possible correctness of the Orthogonality Thesis, which implies that the intentions of artificially intelligent agents or aliens cannot be trusted, seems to me to place faith in the benevolent tendencies of intelligent human agents. But, consequences aside, blind faith clashes with its explicitly stated aim of seeking to hold true beliefs and to "each day, be less wrong about the world." I hope my argument is wrong in fundamental ways, not cosmetic ones, and so I hope the Less Wrong community will engage with it to refute it. That invitation is the main thing, but I'll throw in a modest expansion of the sketchy argument.

A response I received [LW(p) · GW(p)] (again, thanks) compared my claim that the existence of mass killing technology triggers its use to von Neumann's failed argument for the US to launch a preemptive nuclear strike against the Soviet Union. Von Neumann argued that as a nuclear exchange was inevitable, it would be strategically advantageous to get it out of the way: “if you say today at five o'clock, I say why not one o'clock?” Actual humans rejected von Neumann's logic, which he based on a game theory model of human behavior, and thereby avoided a horrific outcome.

The situations are indeed similar in important ways, both in the repugnant acts being driven by self-fulfilling beliefs about what others would do and in the massive loss of human life. But the mass killing is only one aspect of what would have made the preemptive strike horrific, and we should not rush to generalize to "human benevolence will override game-theoretic rationality to avoid massive loss of human life." Rather, no one actually wants the nuclear-exchange outcomes that were (and continue to be) avoided, which demonstrates that human intuition is more self-interestedly rational than the rational actor in von Neumann's model (Amartya Sen said as much when he observed that the "purely economic man is indeed close to being a social moron"). In contrast, a serious case for intentionally reducing the human population is already being made on its own (let us call it Malthusian) merits, serious enough to demand attempts at anti-Malthusian rebuttal, including from members of this community [LW · GW]. One of the pillars of the anti-Malthusian position is simply that it's wrong to wish for an end-state whose realization requires the death or forced sterilization of billions of people. This is a normative assertion, not a predictive one: if one has to say "it's wrong to kill or sterilize billions of people," one has practically already conceded that there may be rational reasons for doing so, if one could, but that one ought not to as a good person, meaning "a person who will hold true to a mutually beneficial agreement not to engage in such behavior." But in the presence of sufficiently accessible technology that would make an intentional low-pop world feasible, embracing the reasons for bringing it about, and seeing them as right, becomes not just a preference but a survival strategy. To overcome this, to prevent defections from the "killing is wrong" regime, a high-pop world would have to be both feasible and clearly, overwhelmingly better than the low-pop one.
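To make the self-fulfilling dynamic concrete, here is a toy sketch (the payoff numbers are purely illustrative assumptions, not estimates of anything real): once an actor's belief that some other group might preempt passes a threshold, preempting becomes the individually rational move, which is exactly what makes the belief self-fulfilling.

```python
# Toy model of the "initiate the purge or be purged" dynamic described above.
# The payoffs are illustrative placeholders, not claims about real-world values.

payoffs = {  # (my move, their move) -> my payoff
    ("refrain", "refrain"): 1.0,   # status quo: everyone survives
    ("refrain", "preempt"): -1.0,  # I am among the purged
    ("preempt", "refrain"): 0.5,   # I survive, at enormous moral and material cost
    ("preempt", "preempt"): -0.5,  # mutual strikes, uncertain survival
}

def best_response(p_they_preempt: float) -> str:
    """My expected-payoff-maximizing move, given my belief about the other group."""
    q = p_they_preempt
    ev_refrain = (1 - q) * payoffs[("refrain", "refrain")] + q * payoffs[("refrain", "preempt")]
    ev_preempt = (1 - q) * payoffs[("preempt", "refrain")] + q * payoffs[("preempt", "preempt")]
    return "preempt" if ev_preempt > ev_refrain else "refrain"

for q in (0.0, 0.25, 0.5, 0.75):
    print(f"P(they preempt) = {q:.2f} -> best response: {best_response(q)}")
# Past a belief threshold (q > 0.5 with these numbers), preemption becomes the
# best response, so merely suspecting that such groups could exist is enough to create them.
```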

Given the current discourse on climate change as an existential threat, it hardly feels necessary to spell out the Malthusian argument. In the absence of any significant technological developments, sober current-trajectory predictions seem to me to range from 'human extinction' to 'catastrophic, but survivable,' involving violent paths to low-pop (or no-pop) states. A further problem for anti-Malthusianism is that even in the 'better' of these scenarios, where population and natural resource consumption peacefully stabilize, things don't look great. The modern global political economy is relatively humane (in the Pinkerian sense) under conditions of growth, which currently depends on a growing population and rising consumption. Under stagnant or deflationary conditions it can be expected to become more cutthroat, violent, undemocratic and unjust. So far, then, high-pop is either suicidal or dystopian.

So how do these bad options compare against a world that has been managed into a low-pop state? A thought experiment: if you could run simulations of transporting a hand-picked crew (sized to your choosing) from Earth's present population to an exact replica of Earth, including all of its present man-made infrastructure and acquired knowledge, just not all of its people, what percentage of those simulations would you predict to produce good, even highly enviable, lives for the crew and their descendants? High predictions seem warranted. Present views of many varieties of small-population human life, even those without access to science and technology, are favorable; life as a highly adaptable, socially cooperative apex predator can be quite good. With the addition of the accumulated knowledge of agrarian and industrial societies, but without the baggage of their unsustainable growth patterns, it could be very good indeed.

I did just compare the current human trajectory to the low-pop alternative in the absence of significant technological developments, which unfairly sets aside what I take to be most secular people's source of hope about the current trajectory, and the basis for anti-Malthusians' argument that everything will turn out rosy as long as we find a humane way to keep on breeding: in a high-and-growing-pop world, humanity will keep innovating its way around its problems.

But a safe off-ramp from the growth trap counts as a solution produced by the high-pop, innovative world to solve its high-pop problems; specifically, the root-cause problem of high population itself. Other specific solutions only treat individual symptoms, or side-effects of prior treatments, or side-effects of treatments of side-effects, and so on. Energy from cold fusion, if not sucked up by cryptocurrency mining or squashed by the fossil fuel industry, may help us dodge the climate-change bullet, but it doesn't remove "forever chemicals" from the environment, nor prevent over-fishing of the oceans, nor induce people in developed countries to have more babies (who will exacerbate the over-fishing and die of cancer from the forever chemicals). Conversely, a low-pop world doesn't need to care about cold fusion or lab-grown fish meat. What the anti-Malthusians offer from a high-pop world is a vague promise of continued innovation: yes, cancer caused by forever chemicals today, but someday, as long as we don't stop growing now, a cure for all cancers or even for mortality itself. Even if this astounding claim is theoretically possible, it fails as soon as innovation itself is automated, which is exactly what all AI research is hell-bent on achieving, 'misalignment' risks be damned. If and when that is achieved, the technology of automated innovation, too, can be carried into the low-pop world, where it can be more carefully supervised and not wasted on billions of dead-weight, surplus people.

I have proposed that human-controlled automated mass-killing technology is more dangerous to the average person than a malevolent artificial superintelligence because it is more task-specific and therefore technically simpler to achieve than general intelligence, doesn't require escaping its own creators' controls, and, once developed, invites a race to be the first to put it to use. I am willing to concede that humans' cooperative powers may suppress the self-triggering dynamic long enough to support the achievement of general intelligence first, if the 'misalignment' concerns about 'going all the way' to AGI appear to have been addressed and the crises of the high-pop world don't force the issue. With respect to the survival prospects for the average human, this seems to me to be a minor detail.

20 comments

Comments sorted by top scores.

comment by AnthonyC · 2024-03-14T20:08:50.055Z · LW(p) · GW(p)

Before anything else, I would note that your proposed scenario has winners. In other words, it's a horrible outcome, but in this world where alignment as you've defined it is easy but a group of humans uses it to destroy most of the population, that group of humans likely survives, repopulates the Earth, and continues on into the future.

This, to put it mildly, is a far, far better outcome than we'd get from an AI that wants to kill everyone of its own agency, or that doesn't care about us enough to avoid doing so as a side effect of other actions.

I don't remember where or when, but IIRC EY once wrote that if an evil person or group used ASI to take over the world, his reaction would be to 'weep for joy and declare victory' that any human-level agents whatsoever continue to exist and retain enough control to be able to be said to "use" the AI at all.

That said, yes, if we do figure out how to make an AI that actually does what humans want it to do, or CEV-would-want it to do, when they ask it to do something, then preventing misuse becomes the next major problem we need to solve, or ideally to have solved before building such an AI. And it's not an easy one.

Replies from: jonathan-kallay
comment by YonatanK (jonathan-kallay) · 2024-03-14T21:49:22.684Z · LW(p) · GW(p)

I agree, and I attempted to emphasize the winner-take-all aspect of AI in my original post.

The intended emphasis isn't on which of the two outcomes is preferable, or how to comparatively allocate resources to prevent them. It's on the fact that there is no difference between alignment and misalignment with respect to the survival expectations of the average person.

Replies from: AnthonyC
comment by AnthonyC · 2024-03-14T22:23:09.745Z · LW(p) · GW(p)

Ok, then that I understand. I do not think it follows that I should be indifferent between those two ways of me dying. Both are bad, but only one of them necessarily destroys everything I value.

In any case, I think it's much more likely that a group using an aligned-as-defined-here AGI would kill (almost) everyone by accident rather than intentionally.

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2024-03-17T01:38:42.170Z · LW(p) · GW(p)

Both are bad, but only one of them necessarily destroys everything I value.

You don’t value the Sun, or the other stars in the sky?

Even in the most absurdly catastrophic scenarios it doesn’t seem plausible that they could be ‘necessarily destroyed’.

Replies from: AnthonyC
comment by AnthonyC · 2024-03-17T19:09:43.600Z · LW(p) · GW(p)

I'd say their value is instrumental, not terminal. The sun and stars are beautiful, but only when there are minds to appreciate them. They make everything else of value possible, because of their light and heat and production of all the various elements beyond Helium.

But a dead universe full of stars, and a sun surrounded by lifeless planets, have no value as far as I'm concerned, except insofar as there is remaining potential for new life to arise that would itself have value. If you gave me a choice between a permanently dead universe of infinite extent, full of stars, or a single planet full of life (of a form I'm capable of finding value in, so a planet full of only bacteria doesn't cut it) but surrounded by a bland and starless sky that only survives by artificial light and heat production (assume they've mastered controlled fusion and indoor agriculture), I'd say the latter is more valuable.

Replies from: jonathan-kallay
comment by YonatanK (jonathan-kallay) · 2024-03-18T16:26:28.884Z · LW(p) · GW(p)

@AnthonyC [LW · GW] I may be mistaken, but I took @M. Y. Zuo [LW · GW] to be offering a reductio ad absurdum response to your comment about not being indifferent between the two ways of dying. The 'which is a worse way to die' debate doesn't respond to what I wrote. I said

With respect to the survival prospects for the average human, this [whether or not the dying occurs by AGI] seems to me to be a minor detail.

I did not say that no one should care about the difference. 

But the two risks are not in competition; they are complementary. If your concern about misalignment is based on caring about the continuation of the human species, and you don't actually care how many humans other humans would kill in a successful alignment(-as-defined-here) scenario, a credible humans-kill-most-humans risk is still really helpful to your cause, because you can ally yourself with the many rational humans who don't want to be killed either way, to prevent both outcomes by killing AI in its cradle.

comment by habryka (habryka4) · 2024-03-14T20:35:36.839Z · LW(p) · GW(p)

Downvoted because the title seems straightforwardly false while the post doesn't actually argue for it (making it a bit clickbaity, but I am more objecting to the fact that it's just false). Indeed, this site has a very large number of arguments and posts about why AIs could indeed kill people (and people with AIs might also kill people, though probably many fewer).

Replies from: jonathan-kallay
comment by YonatanK (jonathan-kallay) · 2024-03-14T21:24:03.758Z · LW(p) · GW(p)

The title was intended as an ironic allusion to a slogan from the National Rifle Association in the U.S., to dismiss calls for tighter restrictions on gun ownership. I expected this allusion to be easily recognizable, but see now that it was probably a mistake.

Replies from: habryka4, kave, shankar-sivarajan
comment by habryka (habryka4) · 2024-03-14T21:50:58.845Z · LW(p) · GW(p)

Oh, I totally recognized it, but like, the point of that slogan is to make a locally valid argument that guns are indeed incapable of killing people without being used by people. That is not true of AIs, so it seems like it doesn't apply.

comment by kave · 2024-03-14T21:51:24.853Z · LW(p) · GW(p)

I recognised the allusion but also disliked the title.

Replies from: jonathan-kallay
comment by YonatanK (jonathan-kallay) · 2024-03-14T21:56:57.046Z · LW(p) · GW(p)

Thanks, title changed.

Replies from: shankar-sivarajan
comment by Shankar Sivarajan (shankar-sivarajan) · 2024-03-17T05:00:05.830Z · LW(p) · GW(p)

What was the old title? Something like "Misaligned AI doesn't kill people, misaligned people do"?

This new one sounds like it could make for a good slogan too: "To the average American, gun control is more lethal than guns."

comment by Radford Neal · 2024-03-15T00:43:11.178Z · LW(p) · GW(p)

Your post reads a bit strangely. 

At first, I thought you were arguing that AGI might be used by some extremists to wipe out most of humanity for some evil and/or stupid reason.  Which does seem like a real risk.  

Then you went on to point out that someone who thought that was likely might wipe out most of humanity (not including themselves) as a simple survival strategy, since otherwise someone else will wipe them out (along with most other people). As you note, this requires a high level of unconcern for normal moral considerations, which one would think very few people would countenance.

Now comes the strange part... You argue that actually maybe many people would be willing to wipe out most of humanity to save themselves, because...  wiping out most of humanity sounds like a pretty good idea!

I'm glad that in the end you seem to still oppose wiping out most of humanity, but I think you have some factual misconceptions about this, and correcting them is a necessary first step to thinking of how to address the problem.

Concerning climate change, you write: "In the absence of any significant technological developments, sober current trajectory predictions seem to me to range from 'human extinction' to 'catastrophic, but survivable'".

No. Those are not "sober" predictions. They are alarmist claptrap with no scientific basis. You have been lied to. Without getting into details, you might want to contemplate that global temperatures were probably higher than today during the "Holocene Climatic Optimum" around 8000 years ago.  That was the time when civilization developed.  And temperatures were significantly higher in the previous interglacial, around 120,000 years ago.  And the reference point for supposedly-disastrous global warming to come is "pre-industrial" time, which was in the "little ice age", when low temperatures were causing significant hardship. Now, I know that the standard alarmist response is that it's the rate of change that matters.  But things changed pretty quickly at the end of the last ice age, so this is hardly unprecedented. And you shouldn't believe the claims made about rates of change in any case - actual science on this question has stagnated for decades, with remarkably little progress being made on reducing the large uncertainty about how much warming CO2 actually causes.

Next, you say that the modern economy is relatively humane "under conditions of growth, which, under current conditions, depends on a growing population and rising consumption. Under stagnant or deflationary conditions it can be expected to become more cutthroat, violent, undemocratic and unjust."

Certainly, history teaches that a social turn towards violence is quite possible. We haven't transcended human nature.  But the idea that continual growth is needed to keep the economy from deteriorating just has no basis in fact.  Capitalist economies can operate perfectly fine without growth.  Of course, there's no guarantee that the economy will be allowed to operate fine.  There have been many disastrous economic policies in the past.  Again, human nature is still with us, and is complicated. Nobody knows whether social degeneration into poverty and tyranny is more likely with growth or without growth.

Finally, the idea that a world with a small population will be some sort of utopia is also quite disconnected from reality. That wasn't the way things were historically. And even if it was, it wouldn't be stable, since population will grow if there's plenty of food, no disease, no violence, etc.

So, I think your first step should be to realize that wiping out most of humanity would not be a good thing. At all. That should make it a lot easier to convince other people not to do it.

comment by Seth Herd · 2024-03-14T21:44:14.198Z · LW(p) · GW(p)

I think you raise an important point. If we solve alignment, do we still all die?

This has been discussed in the alignment community under the terminology of a "pivotal act". It's often been assumed that an aligned AGI would be used to prevent the creation of further AGIs, forestalling both the accidental creation of misaligned AGIs and the deliberate creation of AGIs that are aligned to their creator's goals but misaligned to most of humanity's interests. Your misuse category falls into the latter. So you should search for posts under the term pivotal act. I don't know of any particularly central ones off the top of my head.

However, I think this is worth more discussion. People have started to talk about "multipolar scenarios" in which we have multiple or many human-plus level AGIs. I'm unclear on how people think we'll survive such a scenario, except by not thinking about it a lot. I think this is linked to the shift in predicting a slower takeoff, where AGI doesn't become superintelligent that quickly. But I think the same logic applies, even if we survive for a few years longer.

I hope to be convinced otherwise, but I currently mostly agree with your logic for multipolar scenarios. I think we're probably doomed to die if that's allowed to happen. See What does it take to defend the world against out-of-control AGIs? [AF · GW] for reasons that a single AGI could probably end the world even if friendly AGIs have a headstart in trying to defend it.

I'd summarize my concerns thus: Self-improvement creates an unstable situation to which no game-theoretic cooperative equilibrium applies. It's like playing Diplomacy where the players can change the rules arbitrarily on each turn. If there are many AGIs under human control, one will eventually have goals for the use of Earth at odds with those of humanity at large. This could happen because of an error in its alignment, or because the human(s) controlling it has non-standard beliefs or values.

When this happens, I think it's fairly easy for a self-improving AGI to destroy human civilization (although perhaps not other AGIs with good backup plans). It just needs to put together a hidden (perhaps off-planet, underground or underwater) robotic production facility that can produce new compute and new robots. That's if there's nothing simpler and more clever to do, like diverting an asteroid or inventing a way to produce a black hole. The plans get easier the less you care about using the Earth immediately afterward.

I agree that this merits more consideration.

I also agree that the title should change. LW very much looks down on clickbait titles. I don't think you intended to argue that AI won't kill people, merely that people with AIs will. I believe you can edit the title, and you should.

Edit: I recognized the title and didn't take you to be arguing against autonomous AI as a risk - but it does actually make that claim, so probably best to change it.

Replies from: jonathan-kallay
comment by YonatanK (jonathan-kallay) · 2024-03-15T03:33:49.144Z · LW(p) · GW(p)

You have a later response to some clarifying comments from me, so this may be moot, but I want to call out that my emphasis is on the behavior of human agents who are empowered by automation that may fall well short of AGI. A "pivotal act" is a very germane idea, but rather than the pivotal act of the first AGI eliminating would-be AGI competitors, this act is carried out by humans taking out their human rivals.

It is pivotal because once the target population size has been achieved, competition ends, and further development of the AI technology can be halted as unnecessarily risky.

comment by Vladimir_Nesov · 2024-03-14T16:57:22.175Z · LW(p) · GW(p)

because it is more task-specific and therefore technically simpler to achieve than general intelligence, doesn't require escaping its own creators' controls

An argument for danger of human-directed misuse doesn't work as an argument against dangers of AI-directed agentic activity. Both are real, though misuse only becomes an extinction-level problem when AIs are very powerful, at which point the AI-directed activity that is not misuse by humans also becomes relevant. With extinction-level problems, it doesn't matter for allocation of attention which one is worse (since after a critical failure there are no retries with a different allocation to reflect lessons learned), only that either is significant and so both need to be addressed.

If alignment is very easy, misuse becomes important. If it's hard, absence of misuse doesn't help. Though there is also a problem of cultural value drift, where AIs change their own culture very quickly on human timescales without anyone individually steering the outcome (including the AIs), so that at the end of this process (that might take merely months to years) the AIs in charge of civilization no longer care about human welfare, with neither misuse nor prosaic misalignment (in individual principal-agent relationships) being the cause of this outcome.

Replies from: jonathan-kallay
comment by YonatanK (jonathan-kallay) · 2024-03-14T21:16:43.334Z · LW(p) · GW(p)

An argument for danger of human-directed misuse doesn't work as an argument against dangers of AI-directed agentic activity.

 

I agree. But I was not trying to argue against the dangers of AI-directed agentic activity. The thesis is not that "alignment risk" is overblown, nor is the comparison of the risks the point; it's that those risks accumulate such that the technology is guaranteed to be lethal for the average person. This is significant because the risk of misalignment is typically thought to be accepted in exchange for rewards that will be broadly shared. "You or your children are likely to be killed by this technology, whether it works as designed or not" is a very different story from "there is a chance this will go badly for everyone, but if it doesn't it will be really great for everyone."

Replies from: Seth Herd
comment by Seth Herd · 2024-03-14T22:20:33.691Z · LW(p) · GW(p)

That's an excellent summary sentence. It seems like that would be a useful statement in advocating for AI slowdown/shutdown.

comment by Michael Tontchev (michael-tontchev-1) · 2024-03-14T23:39:41.314Z · LW(p) · GW(p)

The super simple claim is:

If an unaligned AI by itself can do near-world-ending damage, an identically powerful AI that is instead alignable to a specific person can do the same damage.

I agree that it could likely do damage, but it does cut off the branches of the risk tree where many AIs are required to do damage in a way that relies on them being similarly internally misaligned, or at least more likely to cooperate amongst themselves than with humans.

So I'm not convinced it's necessarily the same distribution of damage probabilities, but it still leaves a lot of room for doom. E.g. if you really can engineer superspreadable and damaging pathogens, you may not need that many AIs cooperating.

Replies from: jonathan-kallay
comment by YonatanK (jonathan-kallay) · 2024-03-15T03:10:20.053Z · LW(p) · GW(p)

If an unaligned AI by itself can do near-world-ending damage, an identically powerful AI that is instead alignable to a specific person can do the same damage.

If you mean that as the simplified version of my claim, I don't agree that it is equivalent.

Your starting point, with a powerful AI that can do damage by itself, is wrong. My starting point is groups of people whom we would not currently consider to be sources of risk, but who become very dangerous as novel weaponry and changes in the relations of economic production unlock the means and the motive to kill very large numbers of people.

And (as I've tried to clarify in my other responses) the comparison of this scenario to misaligned AI cases is not the point, it's the threat from both sides of the alignment question.