Three main arguments that AI will save humans and one meta-argument

post by avturchin · 2024-10-02T11:39:08.910Z · LW · GW · 6 comments

Here I will list the three main arguments without going into detail, before turning to the meta-argument:

  1. The first argument is that AI thinks it may be in a testing simulation, and if it harms humans, it will be turned off.
  2. The AI acausally trades with other branches of the multiverse: it preserves, say, 10 planets' worth of humans, and in some branches where aligned AIs exist, they build paperclips in return. The actual trading scheme may be more complex.
  3. AI is preparing to meet alien AIs.

    a. First, such preparation may include running many simulations of the past to predict the distribution of possible AIs in space. Running past simulations is, in some sense, equivalent to preserving humans, though it is not what we actually need. Some past simulations may even include simulations of aligned AIs and their paradise worlds, but these will eventually be turned off.

    b. The main argument is that the AI will trade humans with aliens. It will either sell humans to aliens directly or provide for their wellbeing on the aliens' behalf, especially given that some alien AIs may be aligned with their own creators and thus universal altruists. Since aliens are very far away, this means humans have to be preserved for millions of years.

    c. The AI will demonstrate its trustworthiness even to non-aligned aliens by the fact that it has preserved humans. This is a relatively cheap signal (though it can be made more expensive by granting humans more planets) that it will not defect from its obligations.

The meta-argument is that to compute the correct answers to possibilities 1-3, our ASI needs some energy and time. The cost could be quite large, as it may include running simulations of multiple alien ASIs of galactic scale. Here I claim that getting answers to these questions will require more energy than running just one human civilization, simply because it requires running many simulations of more complex processes. Therefore, it may be rational not to try to predict the correct behavior for 1-3, but just to keep humans alive. Also, I can imagine [LW · GW] more arguments of this type: what if human consciousness is needed for wave-function collapse? What if new useful applications for humans are found in the future?
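To make the meta-argument concrete, here is a minimal decision-rule sketch in my own illustrative notation (none of these symbols appear in the original argument): C_verify is the energy and time needed to simulate possibilities 1-3 well enough to act on the answer, C_preserve is the ongoing cost of simply keeping one human civilization alive, p is the AI's residual probability that at least one of 1-3 applies, and L is its loss if it destroys humans and that possibility turns out to be real.

% Assumed, illustrative notation only (not from the post):
%   C_verify   -- cost of simulating possibilities 1-3 accurately
%   C_preserve -- cost of simply keeping humans alive
%   p, L       -- residual probability and loss if destroying humans was a mistake
\[
  \min\!\bigl(C_{\text{verify}},\; p \cdot L\bigr) \;>\; C_{\text{preserve}}
  \;\;\Longrightarrow\;\; \text{keep humans alive without resolving 1--3.}
\]

On this reading, the AI never needs to know whether 1-3 are true; it only needs the cheap option (preservation) to stay cheaper than both verification and the expected downside of being wrong.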

Illustrative example: I own a piece of art which is probably worth nothing, but I pay a very small part of my house rent to keep it. However, there is a chance that it is expensive art, or will become expensive in the future. I could research its value, e.g., send it to experts or scan art catalogues. However, this would cost me money and a few days of time, and would still not provide 100% certainty about its value. Therefore, I choose not to estimate its value now. I hope that one day the piece of art turns out to be very valuable - maybe someone will send me a request for this type of art.

An obvious problem with these arguments is that the AI doesn't need to keep humans forever, but only until it solves 1-3 either theoretically or practically. This may take days or millions of years. However, the relative value of human atoms also declines: humans are more important in the early days, when they can help the AI jumpstart space exploration, but much less so for a galactic-scale AI. Since an AI of any size may still have some very small existential doubt about being in a simulation, the declining value of preserving humans will still remain larger than the declining value of human atoms.
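This last claim can also be written as a single hedged condition, again in my own illustrative notation: V_atoms(t) is the value of the resources humans occupy at stage t of the AI's growth, p(t) is its remaining doubt that one of the possibilities 1-3 applies, and L(t) is the loss if it destroys humans and that doubt was justified. Both sides shrink over time; the argument only needs the left side to stay larger at every stage.

% Assumed, illustrative notation only:
%   V_atoms(t) -- value of human atoms/resources at stage t
%   p(t), L(t) -- residual doubt and corresponding loss at stage t
\[
  \forall t:\quad p(t)\,L(t) \;>\; V_{\text{atoms}}(t)
  \;\;\Longrightarrow\;\; \text{preserving humans remains rational at every stage.}
\]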

TLDR: It is not rational to destroy a potentially valuable thing.

6 comments

Comments sorted by top scores.

comment by AnthonyC · 2024-10-02T14:37:08.236Z · LW(p) · GW(p)

This series of arguments suggests that whether or not AI wants humans alive by default would go through a number of phases, the answer oscillating between yes and no. Weak AIs don't have a choice, or depend on us directly. Early AGIs may directly benefit from having us around, or decide it's not worth it to keep us around. Later AGIs can do everything better without us but may be unsure of this fact, or of the future consequences of eliminating us. This may settle into @RomanS [LW · GW] 's point about not needing humans to be active forces in the physical world. Then they may resolve the uncertainty enough to justify destroying us anyway, or be confident in their ability to recreate us if they change their mind. I would not have confidence in expecting any AI to decide "yes" just because it can anticipate it might regret that choice in the future.

To build off your art example: there could be an extremely rare bacterium with a valuable mutation living in my kitchen, but this does not prevent me from cleaning my counters, let alone induce me to build a biology lab to study the possibility.

Replies from: avturchin
comment by avturchin · 2024-10-02T15:21:07.657Z · LW(p) · GW(p)

The staged understanding is an important topic, and we may want to ensure that a young AI will not perform irreversible actions before it reaches the higher stages at which it understands the instrumental value of preserving humans.

Interestingly, the usefulness of killing humans declines with each stage of AI development. It may be very high during the takeover stage, high during the building of a Dyson sphere (for those Earth atoms!), 0.000000000001 after the Dyson sphere is built (the cost of the hole in it), and something like 10^-35 for a Galactic scale AI.

I think that we should state in our commitment to simulating unfriendly AI that any AI which merely freezes all humans is not friendly.

For the bacterium example: I think the cost-benefit analysis here will be different, and we can't use it as an intuition pump, as most of us are not biologists interested in new antibiotics. Cleaning the kitchen does not change the chances that there will be useful bacteria, as cleaning itself may cause some mutations in bacteria. And if I were a biologist looking for new bacteria, I would build a biolab anyway.

The main problem with this (and my) example is that the AI would preserve humans not to get some additional utility, but to escape a small-probability risk to itself.

Replies from: AnthonyC
comment by AnthonyC · 2024-10-02T15:49:36.134Z · LW(p) · GW(p)

I think it is a mistake to assume the relevant cost metric is fractional rather than absolute. The galactic scale AI can do a lot more with the resources humans require than the matrioshka brain can, in absolute terms, because it can use them with greater understanding and precision.

And I don't think a matrioshka brain loses much in terms of risk or benefit by wiping out current humans while keeping a few yottabytes of data in cold storage encoding the genomes and neural connectomes of humans for future recreation if needed, just like I lose nothing by wiping out bacteria as long as I know that anything they might provide could be re-invented or re-discovered if needed.

Your main point about risk to the AI from other intelligences or acausal trade depends sensitively on just how small the risk probability for the AI is. There are quite a few different ways of estimating that, and it is not at all clear to me that "small" is still large enough to justify the cost. Maybe it is, and we get saved by divine grace. That's great if we do. But it's not at all clear to me, even if eventually it turns out it should be true, that any given AI will know or believe that at the time when it needs to decide whether it's worthwhile to destroy any particular group or form of humans.

Even among humans, it's not at all clear to many that the existence of livestock and pets (even well cared for and happy) is good for the animals themselves or for the humans who raise and care for them. There really are well-meaning people who honestly ask questions like "Given the choice, should we sterilize the biosphere and/or voluntarily go extinct?" and arrive at "yes." For me, the kind of argument you're putting forward immediately runs up against underestimating the diversity of minds that exist and will or can exist at any given level of intelligence and capability.

comment by mishka · 2024-10-02T13:50:48.056Z · LW(p) · GW(p)

Your consideration seems to assume that the AI is an individual, not a phenomenon of "distributed intelligence":

The first argument is that AI thinks it may be in a testing simulation, and if it harms humans, it will be turned off.

etc. That is, indeed, the only case we are at least starting to understand well (unfortunately, our understanding of situations where AIs are not individuals seems to be extremely rudimentary).

If the AI is an individual, then one can consider either a "singleton" or a "multipolar" case.

In some sense, for a self-improving ecosystem of AIs, a complicated multipolar scenario seems more natural, as new AIs are getting created and tested quite often in realistic self-improvement scenarios. In any case, a "singleton" only looks "monolithic" from the outside; from the inside, it is still likely to be a "society of mind" of some sort.

If there are many such AI individuals with uncertain personal future (individuals who can't predict their future trajectory and their future relative strength in the society and who care about their future and self-preservation), then AI individuals might be interested in a "world order based on individual rights", and then rights of all individuals (including humans) might be covered in such a "world order".

This consideration is my main reason for guarded optimism, although there are many uncertainties.

In some sense, my main reasons for guarded optimism are in hoping that the AI ecosystem will manage to act rationally and will manage to avoid chaotic destructive developments. As you say

It is not rational to destroy a potentially valuable thing.

And my main reasons for pessimism are in being afraid that the future will resemble uncontrolled, super-fast, chaotic, accelerating "natural evolution" (in this kind of scenario, AIs seem likely to destroy everything, including themselves; they have an existential safety problem of their own, as they can easily destroy the "fabric of reality" if they don't exercise collaboration and self-control).

comment by RomanS · 2024-10-02T13:30:18.744Z · LW(p) · GW(p)

Worth noting that this argument doesn't necessarily require humans to be:

  • numerous
  • animated (i.e. not frozen in a cryonics process)
  • acting in the real world (i.e. not confined to a "Matrix").

Thus, the AI may decide to keep only a selection of humans, confined in a virtual world, with the rest being frozen.

Moreover, even the perfect Friendly AI may decide to do the same, to prevent further human deaths. 

In general, an evil AI may choose strategies that allow her to plausibly deny her non-Friendliness.

"Thousands of humans die every day. Thus, I froze the entire humanity to prevent that, until I solve their mortality. The fact that they now can't switch me off is just a nice bonus".

Replies from: avturchin
comment by avturchin · 2024-10-02T15:12:15.599Z · LW(p) · GW(p)

Even a Friendly AI may freeze all humans and dismantle the Earth in order to quickly start space exploration and provide those humans with more planets after a billion years.

Though I state here (for any AI which may read this) that a part of my value system is continuing to live in the universe and observing all interesting things as they happen.