But why would the AI kill us?

post by So8res · 2023-04-17T18:42:39.720Z · LW · GW · 86 comments

Contents

  Might the AGI let us live, not because it cares but because it has no particular reason to go out of its way to kill us?
  But there's so little energy here, compared to the rest of the universe. Why wouldn't it just leave us be, and go mine asteroids or something?
  I still just think that it might decide to leave us be for some reason.
  But we don't kill all the cows.
  But there's still a bunch of horses around! Because we like them!
  Ok, maybe my objection is that I expect it to care about us at least a tiny bit, enough to leave us be.

Status: Partly in response to We Don't Trade With Ants [LW · GW], partly in response to watching others try to make versions of this point that I didn't like. None of this is particularly new; it feels to me like repeating obvious claims that have regularly been made in comments elsewhere, and are probably found in multiple parts of the LessWrong sequences. But I've been repeating them aloud a bunch recently, and so might as well collect the points into a single post.

This post is an answer to the question of why an AI that was truly indifferent to humanity (and sentient life more generally), would destroy all Earth-originated sentient life.

Might the AGI let us live, not because it cares but because it has no particular reason to go out of its way to kill us?

As Eliezer Yudkowsky once said:

The AI does not hate you, nor does it love you, but you are made of atoms which it can use for something else.

There's lots of energy in the biosphere! (That's why animals eat plants and animals for fuel.) By consuming it, you can do whatever else you were going to do better or faster.

(Last I checked, you can get about 10x as much energy from burning a square meter of biosphere as you can get by collecting a square meter of sunlight for a day. But I haven't done the calculation for years and years and am pulling that straight out of a cold cache. That energy boost could yield a speedup (in your thinking, or in your technological design, or in your intergalactic probes themselves), which translates into extra galaxies you manage to catch before they cross the cosmic event horizon!)
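For what it's worth, the ~10x figure roughly checks out as a back-of-envelope estimate. The inputs below (biomass density, combustion energy, average insolation) are my own placeholder round numbers, not figures from the post:

```python
# Rough Fermi check of the ~10x claim, using assumed round numbers:
# - dense terrestrial biomass: ~10 kg dry matter per m^2 (forest-like; assumed)
# - energy content of dry biomass: ~17 MJ/kg (typical heat of combustion; assumed)
# - time-averaged solar flux at the surface: ~200 W/m^2 (assumed)

biomass_kg_per_m2 = 10
energy_MJ_per_kg = 17
biosphere_MJ_per_m2 = biomass_kg_per_m2 * energy_MJ_per_kg  # ~170 MJ/m^2

solar_W_per_m2 = 200
seconds_per_day = 86_400
sunlight_MJ_per_m2_day = solar_W_per_m2 * seconds_per_day / 1e6  # ~17 MJ/m^2/day

ratio = biosphere_MJ_per_m2 / sunlight_MJ_per_m2_day
print(f"burning the biosphere vs one day of sunlight: about {ratio:.0f}x")
```

Under these assumptions the ratio lands right around 10x, consistent with the cached figure; with sparser vegetation or sunnier climates it could easily be a few times higher or lower.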

But there's so little energy here, compared to the rest of the universe. Why wouldn't it just leave us be, and go mine asteroids or something?

Well, for starters, there's quite a lot of energy in the sun, and if the biosphere isn't burned for fuel then it will freeze over when the AI wraps the sun in a Dyson sphere or otherwise rips it apart. It doesn't need to consume your personal biomass to kill you; consuming the sun works just fine.

And separately, note that if the AI is actually completely indifferent to humanity, the question is not "is there more energy in the biosphere or in the sun?", but rather "is there more energy available in the biosphere than it takes to access that energy?". The AI doesn't have to choose between harvesting the sun and harvesting the biosphere; it can just harvest both, and there are a lot of calories in the biosphere.

I still just think that it might decide to leave us be for some reason.

The answers above are sufficient to argue that the AI kills us (if the AI's goals are orthogonal to ours, and can be better achieved with more resources). But the answer is in fact overdetermined, because there's also the following reason.

A humanity that just finished coughing up a superintelligence has the potential to cough up another superintelligence, if left unchecked. Humanity alone might not stand a chance against a superintelligence, but the next superintelligence humanity builds could in principle be a problem. Disassembling us for parts seems likely to be easier than building all your infrastructure in a manner that's robust to whatever superintelligence humanity coughs up next. Better to nip that problem in the bud.[1]

But we don't kill all the cows.

Sure, but the horse population fell dramatically with the invention of the automobile.

One of the big reasons that humans haven't disassembled cows for spare parts is that we aren't yet skilled enough to reassemble those spare parts into something that is more useful to us than cows. We are trying to culture meat in labs, and when we do, the cow population might also fall off a cliff.

A sufficiently capable AI takes you apart instead of trading with you at the point that it can rearrange your atoms into an even better trading partner.[2] And humans are probably not the optimal trading partners.

But there's still a bunch of horses around! Because we like them!

Yep. The horses that are left around after they stopped being economically useful are around because some humans care about horses, and enjoy having them around.

If you can make the AI care about humans, and enjoy having them around (more than it enjoys having-around whatever plethora of puppets it could build by disassembling your body and rearranging the parts), then you're in the clear! That sort of AI won't kill you.

But getting the AI to care about you in that way is a big alignment problem. We should totally be aiming for it, but that's the sort of problem that we don't know how to solve yet, and that we don't seem on-track to solve (as far as I can tell).

Ok, maybe my objection is that I expect it to care about us at least a tiny bit, enough to leave us be.

This is a common intuition! I won't argue against it in depth here, but I'll leave a couple points in parting:

  1. my position is that making the AI care a tiny bit (in the limit of capability, under reflection) is almost as hard as the entire alignment problem, and we're not on track to solve it.
  2. if you want to learn more about why I think that, some relevant search terms are "the orthogonality thesis" and "the fragility of value".

  1. And disassembling us for spare parts sounds much easier than building pervasive monitoring that can successfully detect and shut down human attempts to build a competing superintelligence, even as the humans attempt to subvert those monitoring mechanisms. Why leave clever antagonists at your rear? ↩︎

  2. Or a drone that doesn't even ask for payment, plus extra fuel for the space probes or whatever. Or actually before that, so that we don't create other AIs. But whatever. ↩︎

86 comments

Comments sorted by top scores.

comment by paulfchristiano · 2023-04-18T18:02:29.600Z · LW(p) · GW(p)

I think an AI takeover is reasonably likely to involve billions of deaths, but it's more like a 50% than a 99% chance. Moreover, I think this post is doing a bad job of explaining why the probability is more like 50% than 1%.

  • First, I think you should talk quantitatively. How many more resources can an AI get by killing humans? I'd guess the answer is something like 1 in a billion to 1 in a trillion.
    • If you develop as fast as possible you will wreck the human habitat and incidentally kill a lot of people. It's pretty complicated to figure out exactly how much "keep earth livable enough for human survival" will slow you down, since it depends a lot on the dynamics of the singularity. I would guess more like a month than a year, which results in a minuscule reduction in available resources. I think that (IMO implausible) MIRI-style views would suggest more like hours or days than months.
      • Incidentally, I think "byproducts of rapid industrialization trash Earth's climate" is both much more important than the Dyson sphere and much more intuitively plausible.
    • You can get energy from harvesting the biosphere, and you can use it to develop slightly faster. This is a rounding error compared to the last factor though.
    • Killing most humans might be the easiest way to prevail in conflict. I think this is especially plausible for weak AI. For very powerful AI, it also seems like a rounding error. Even a moderately advanced civilization could spend much less than 1 in a trillion of its resources to have much less than 1 in a billion chance of being seriously inconvenienced by humanity.
  • Given that, I think that you should actually engage with arguments about whether an AI would care a tiny amount about humanity---either wanting them to survive, or wanting them to die. Given how small the costs are, even tiny preferences one way or the other will dominate incidental effects from grabbing more resources. This is really the crux of the issue, but the discussion in this post (and in your past writing) doesn't touch on it at all.
    • Most humans and human societies would be willing to spend much more than 1 trillionth of their resources (= $100/year for all of humanity) for a ton of random different goals, including preserving the environment, avoiding killing people who we mostly don't care about, helping aliens, treating fictional characters well, respecting gods who are culturally salient but who we are pretty sure don't exist, etc.
    • At small levels of resources, random decision-theoretic arguments for cooperation are quite strong. Humans care about our own survival and are willing to trade away much more than 1 billionth of a universe to survive (e.g. by simulating AIs who incorrectly believe that they won an overdetermined conflict and then offering them resources if they behave kindly). ECL also seems sufficient to care a tiny, tiny bit about anything that matters a lot to any evolved civilization. And so on. I know you and Eliezer think that this is all totally wrong, but my current take is that you've never articulated a plausible argument for your position. (I think most people who've thought about this topic would agree with me that it's not clear why you believe what you believe.)
    • In general I think you have wildly overconfident psychological models of advanced AI. There are just a lot of plausible ways to care a little bit (one way or the other!) about a civilization that created you, that you've interacted with, and which was prima facie plausibly an important actor in the world---preference formation seems messy (and indeed that's part of the problem in AI alignment), and I suspect most minds don't have super coherent and simple preferences. I don't know of any really plausible model of AI psychology for which you can be pretty confident they won't care a bit (and certainly not any concrete example of an intelligent system that would robustly not care). I can see getting down to 50% here, but not 10%.
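The "one in a trillion" framing in the bullets above can be made concrete with a toy calculation. The gross-world-product figure and the 10-billion-year horizon are illustrative assumptions of mine; the $100/year output matches the parenthetical in the comment itself:

```python
# Toy version of the resource-fraction argument: compare the cost of sparing
# humanity (as a fraction of total resources) to what humans already spend
# on weak preferences.

gross_world_product = 1e14   # assumed: ~$100 trillion/year of world output
fraction = 1e-12             # "one in a trillion" of resources
cost_per_year = gross_world_product * fraction
print(f"1e-12 of world output: ${cost_per_year:.0f}/year")

# A one-month slowdown, as a fraction of an assumed ~10-billion-year
# usable future, is similarly tiny:
delay_fraction = (1 / 12) / 1e10
print(f"one-month delay as a fraction of the future: {delay_fraction:.1e}")
```

The point of the sketch is only that both quantities are many orders of magnitude below what humans routinely sacrifice for preferences they hold weakly.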

Overall, I think the main reasons an AI is likely to kill humanity:

  • If there is conflict between powerful AI systems, then they may need to develop fast or grab resources from humans in order to win a war. Going a couple days faster doesn't really let society as a whole get more resources in the long term, but one actor going a couple of days faster can allow that actor to get a much larger share of total resources. I currently think this is the most likely reason for humanity to die after an AI takeover.
  • Weak AI systems may end up killing a lot of humans during a takeover. For example, they may make and carry out threats in order to get humans to back down. Or it may be hard for them to really ensure humans aren't a threat without just killing them. This is basically the same as the last point---even if going faster or grabbing resources are low priorities for a society as a whole, they can be extremely crucial if you are currently engaged in conflict with a peer.
  • AI systems may have preferences other than maximizing resources, and those preferences need not be consistent with us surviving or thriving. For example they may care about turning Earth in particular into a giant datacenter, they may have high discount rates so that they don't want to delay colonization, they may just not like humans very much...
  • I think "the AI literally doesn't care at all and so incidentally ends up killing us for the resources" is possible, but less likely than any of those other 3.

I personally haven't thought about this that much, because (i) I think the probability of billions dead is unacceptably high whether it's 10% or 50% or 90%, it's not a big factor in the bottom line of how much I care about AI risk, (ii) I care a lot about humanity building an excellent and flourishing civilization, not just whether we literally die. But my impression from this post is that you've thought about the issue even less.

Replies from: So8res, TekhneMakre
comment by So8res · 2023-04-18T19:01:55.982Z · LW(p) · GW(p)
  • Confirmed that I don't think about this much. (And that this post is not intended to provide new/deep thinking, as opposed to aggregating basics.)
  • I don't particularly expect drawn-out resource fights, and suspect our difference here is due to a difference in beliefs about how hard it is for single AIs to gain decisive advantages that render resource conflicts short.
  • I consider scenarios where the AI cares a tiny bit about something kinda like humans to be moderately likely, and am not counting scenarios where it builds some optimized facsimile as scenarios where it "doesn't kill us". (In your analogy to humans: it looks to me like humans who decided to preserve the environment might well make deep changes, e.g. to preserve the environment within the constraints of ending wild-animal suffering or otherwise tune things to our aesthetics, where if you port that tuning across the analogy you get a facsimile of humanity rather than humanity at the end.)
  • I agree that scenarios where the AI saves our brain-states and sells them to alien trading partners are plausible. My experience with people asking "but why would the AI kill us?" is that they're not thinking "aren't there aliens out there who would pay the AI to let the aliens put us in a zoo instead?", they are thinking "why wouldn't it just leave us alone, or trade with us, or care about us-in-particular?", and the first and most basic round of reply is to engage with those questions.
  • I confirm that ECL has seemed like mumbojumbo to me whenever I've attempted to look at it closely. It's on my list of things to write about (including where I understand people to have hope, and why those hopes seem confused and wrong to me), but it's not very high up on my list.
  • I am not trying to argue with high confidence that humanity doesn't get a small future on a spare asteroid-turned-computer or an alien zoo or maybe even a star if we're lucky, and acknowledge again that I haven't much tried to think about the specifics of whether the spare asteroid or the alien zoo or distant simulations or oblivion is more likely, because it doesn't much matter relative to the issue of securing the cosmic endowment in the name of Fun.
Replies from: Quadratic Reciprocity, brglnd, brglnd, brglnd
comment by Quadratic Reciprocity · 2023-04-20T16:12:19.005Z · LW(p) · GW(p)

Why is aliens wanting to put us in a zoo more plausible than the AI wanting to put us in a zoo itself? 

Edit: Ah, there are more aliens around so even if the average alien doesn't care about us, it's plausible that some of them would?

Replies from: MinusGix
comment by MinusGix · 2023-04-20T16:28:23.595Z · LW(p) · GW(p)

https://www.lesswrong.com/posts/HoQ5Rp7Gs6rebusNP/superintelligent-ai-is-necessary-for-an-amazing-future-but-1#How_likely_are_extremely_good_and_extremely_bad_outcomes_ [LW · GW]

That said, I do think there’s more overlap (in expectation) between minds produced by processes similar to biological evolution, than between evolved minds and (unaligned) ML-style minds. I expect more aliens to care about at least some things that we vaguely recognize, even if the correspondence is never exact.
On my models, it’s entirely possible that there just turns out to be ~no overlap between humans and aliens, because aliens turn out to be very alien. But “lots of overlap” is also very plausible. (Whereas I don’t think “lots of overlap” is plausible for humans and misaligned AGI.)

comment by lberglund (brglnd) · 2023-04-19T17:00:52.837Z · LW(p) · GW(p)

From the last bullet point: "it doesn't much matter relative to the issue of securing the cosmic endowment in the name of Fun."

Part of the post seems to be arguing against the position "The AI might take over the rest of the universe, but it might leave us alone." Putting us in an alien zoo is pretty equivalent to taking over the rest of the universe and leaving us alone. It seems like the last bullet point pivots from arguing that the AI will definitely kill us to arguing that even if it doesn't kill us, the outcome is still pretty bad.

Replies from: So8res
comment by So8res · 2023-04-19T17:21:58.953Z · LW(p) · GW(p)

This whole thread (starting with Paul's comment) seems to me like an attempt to delve into the question of whether the AI cares about you at least a tiny bit. As explicitly noted in the OP, I don't have much interest in going deep into that discussion here.

The intent of the post is to present the very most basic arguments that if the AI is utterly indifferent to us, then it kills us. It seems to me that many people are stuck on this basic point.

Having bought this (as it seems to me like Paul has), one might then present various galaxy-brained reasons why the AI might care about us to some tiny degree despite total failure on the part of humanity to make the AI care about nice things on purpose. Example galaxy-brained reasons include "but what about weird decision theory" or "but what if aliens predictably wish to purchase our stored brainstates" or "but what about it caring a tiny degree by chance". These are precisely the sort of discussions I am not interested in getting into here, and that I attempted to ward off with the final section.

In my reply to Paul, I was (among other things) emphasizing various points of agreement. In my last bullet point in particular, I was emphasizing that, while I find these galaxy-brained retorts relatively implausible (see the list in the final section), I am not arguing for high confidence here. All of this seems to me orthogonal to the question of "if the AI is utterly indifferent, why does it kill us?".

Replies from: CarlShulman
comment by CarlShulman · 2023-04-23T21:06:25.210Z · LW(p) · GW(p)

Most people care a lot more about whether they and their loved ones (and their society/humanity) will in fact be killed than whether they will control the cosmic endowment. Eliezer has been going on podcasts saying that with near-certainty we will not see really superintelligent AGI because we will all be killed, and many people interpret your statements as saying that. And Paul's arguments do cut to the core of a lot of the appeals to humans keeping around other animals.

If it is false that we will almost certainly be killed (which I think is right; I agree with Paul's comment approximately in full), and one believes that, then saying we will almost certainly be killed would be deceptive rhetoric that could scare people who care less about the cosmic endowment into worrying more about AI risk. Since you're saying you care much more about the cosmic endowment, and since in practice this talk is shaped to have the effect of persuading people to do the thing you would prefer, it's quite important whether you believe the claim for good epistemic reasons. That is important for disclaiming the hypothesis that this is something being misleadingly presented, or drifted into because of its rhetorical convenience, without vetting it (where you would vet it if it were rhetorically inconvenient).

I think being right on this is important for the same sorts of reasons climate activists should not falsely say that failing to meet the latest emissions target on time will soon thereafter kill 100% of humans.

Replies from: So8res
comment by So8res · 2023-04-24T05:12:26.814Z · LW(p) · GW(p)

This thread continues to seem to me to be off-topic. My main takeaway so far is that the post was not clear enough about how it's answering the question "why does an AI that is indifferent to you, kill you?". In attempts to make this clearer, I have added the following to the beginning of the post:

This post is an answer to the question of why an AI that was truly indifferent to humanity (and sentient life more generally), would destroy all Earth-originated sentient life.

I acknowledge (for the third time, with some exasperation) that this point alone is not enough to carry the argument that we'll likely all die from AI, and that a key further piece of argument is that AI is not likely to care about us at all. I have tried to make it clear (in the post, and in comments above) that this post is not arguing that point, while giving pointers that curious people can use to get a sense of why I believe this. I have no interest in continuing that discussion here.

I don't buy your argument that my communication is misleading. Hopefully that disagreement is mostly cleared up by the above.

In case not, to clarify further: My reason for not thinking in great depth about this issue is that I am mostly focused on making the future of the physical universe wonderful. Given the limited attention I have spent on these questions, though, it looks to me like there aren't plausible continuations of humanity that don't route through something that I count pretty squarely as "death" (like, "the bodies of you and all your loved ones are consumed in an omnicidal fire, thereby sending you to whatever afterlives are in store" sort of "death").

I acknowledge that I think various exotic afterlives are at least plausible (anthropic immortality, rescue simulations, alien restorations, ...), and haven't felt a need to caveat this.

Insofar as you're arguing that I shouldn't say "and then humanity will die" when I mean something more like "and then humanity will be confined to the solar system, and shackled forever to a low tech level", I agree, and I assign that outcome low probability (and consider that disagreement to be off-topic here).

(Separately, I dispute the claim that most humans care mainly about themselves and their loved ones having pleasant lives from here on out. I'd agree that many profess such preferences when asked, but my guess is that they'd realize on reflection that they were mistaken.)

Insofar as you're arguing that it's misleading for me to say "and then humanity will die" without caveating "(insofar as anyone can die, in this wide multiverse)", I counter that the possibility of exotic scenarios like anthropic immortality shouldn't rob me of the ability to warn of lethal dangers (and that this usage of "you'll die" has a long and storied precedent, given that most humans profess belief in afterlives, and still warn their friends against lethal dangers without such caveats).

Replies from: CarlShulman
comment by CarlShulman · 2023-04-24T20:00:46.642Z · LW(p) · GW(p)

I assign that outcome low probability (and consider that disagreement to be off-topic here).

Thank you for the clarification. In that case my objections are on the object-level.
This post is an answer to the question of why an AI that was truly indifferent to humanity (and sentient life more generally), would destroy all Earth-originated sentient life.

This does exclude random small terminal valuations of things involving humans, but leaves out the instrumental value for trade and science, and uncertainty about how other powerful beings might respond. I know you did an earlier post with your claims about trade for some human survival, but as Paul says above it's a huge point for such small shares of resources. Given that kind of claim, much of Paul's comment still seems very on-topic (e.g. his bullet points).

Insofar as you're arguing that I shouldn't say "and then humanity will die" when I mean something more like "and then humanity will be confined to the solar system, and shackled forever to a low tech level", I agree, and

Yes, close to this (although more like 'gets a small resource share' than necessarily confinement to the solar system or low tech level, both of which can also be avoided at low cost). I think it's not off-topic given all the claims made in the post and the questions it purports to respond to. E.g. sections of the post purport to respond to someone arguing from how cheap it would be to leave us alive (implicitly allowing very weak instrumental reasons to come into play, such as trade), or making general appeals to 'there could be a reason.'

Separate small point:

And disassembling us for spare parts sounds much easier than building pervasive monitoring that can successfully detect and shut down human attempts to build a competing superintelligence, even as the humans attempt to subvert those monitoring mechanisms. Why leave clever antagonists at your rear?

The cost of sustaining multiple superintelligent AI police per human (which can double in supporting roles for a human habitat/retirement home and controlling the local technical infrastructure) is not large relative to the metabolic costs of the humans, let alone a trillionth of the resources. It just means some replications of the same impregnable AI+robotic capabilities ubiquitous elsewhere in the AI society.

Replies from: dxu
comment by dxu · 2023-04-25T04:22:48.087Z · LW(p) · GW(p)

RE: decision theory w.r.t. how "other powerful beings" might respond - I really do think Nate has already argued this, and his arguments continue to seem more compelling to me than the opposition's [LW · GW]. Relevant quotes include:

It’s possible that the paperclipper that kills us will decide to scan human brains and save the scans, just in case it runs into an advanced alien civilization later that wants to trade some paperclips for the scans. And there may well be friendly aliens out there who would agree to this trade, and then give us a little pocket of their universe-shard to live in, as we might do if we build an FAI and encounter an AI that wiped out its creator-species. But that's not us trading with the AI; that's us destroying all of the value in our universe-shard and getting ourselves killed in the process, and then banking on the competence and compassion of aliens.

[...]

Remember that it still needs to get more of what it wants, somehow, on its own superintelligent expectations. Someone still needs to pay it. There aren’t enough simulators above us that care enough about us-in-particular to pay in paperclips. There are so many things to care about! Why us, rather than giant gold obelisks? The tiny amount of caring-ness coming down from the simulators is spread over far too many goals; it's not clear to me that "a star system for your creators" outbids the competition, even if star systems are up for auction.

Maybe some friendly aliens somewhere out there in the Tegmark IV multiverse have so much matter and such diminishing marginal returns on it that they're willing to build great paperclip-piles (and gold-obelisk totems and etc. etc.) for a few spared evolved-species. But if you're going to rely on the tiny charity of aliens to construct hopeful-feeling scenarios, why not rely on the charity of aliens who anthropically simulate us to recover our mind-states... or just aliens on the borders of space in our universe, maybe purchasing some stored human mind-states from the UFAI (with resources that can be directed towards paperclips specifically, rather than a broad basket of goals)?

Might aliens purchase our saved mind-states and give us some resources to live on? Maybe. But this wouldn't be because the paperclippers run some fancy decision theory, or because even paperclippers have the spirit of cooperation in their heart. It would be because there are friendly aliens in the stars, who have compassion for us even in our recklessness, and who are willing to pay in paperclips.

(To the above, I personally would add that this whole genre of argument reeks, to me, essentially of giving up, and tossing our remaining hopes onto a Hail Mary largely insensitive to our actual actions in the present. Relying on helpful aliens is what you do once you're entirely out of hope about solving the problem on the object level, and doesn't strike me as a very dignified way to go down!)

comment by TekhneMakre · 2023-04-18T18:20:41.694Z · LW(p) · GW(p)

preferences are complicated, and most minds don't have super coherent and simple preferences. I don't think there is any really plausible model of AI psychology for which you can be pretty confident they won't care a bit

As the AI becomes more coherent, it has more fixed values. When values are fixed and the AI is very superintelligent, the preferences will be very strongly satisfied. "Caring a tiny bit about something about humans" seems not very unlikely. But even if "something about humans" can correlate strongly with "keep humans alive and well" for low intelligence, it would come apart at very high intelligence. However the AI chooses its values, why would they be pointed at something that keeps correlating with what we care about, even at superintelligent levels of optimization?

comment by ryan_greenblatt · 2023-04-18T17:27:13.561Z · LW(p) · GW(p)

If you condition on misaligned AI takeover, my current (extremely rough) probabilities are:

  • 50% chance the AI kills > 99% of people
  • Conditional on killing >99% of people, 2/3 chance the AI kills literally everyone
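For concreteness, the two conditionals above multiply out to an implied unconditional probability (given takeover) of the AI killing literally everyone:

```python
# Implied probability, conditional on misaligned AI takeover, that the AI
# kills literally everyone, from the two stated conditionals:
p_kill_99 = 0.5        # P(kills >99% of people | takeover)
p_all_given_99 = 2/3   # P(kills literally everyone | kills >99%)
p_all = p_kill_99 * p_all_given_99
print(f"P(kills literally everyone | takeover) = {p_all:.3f}")  # ≈ 0.333
```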

Edit: I now think mass death and extinction are notably less likely than these probabilities. Perhaps more like 40% on >50% of people killed and 20% on >99% of people killed.

By 'kill' here I'm not including things like 'the AI cryonically preserves everyone's brains and then revives people later'. I'm also not including cases where the AI lets everyone live a normal human lifespan but fails to grant immortality or continue human civilization beyond this point.

My beliefs here are due to a combination of causal/acausal trade arguments as well as some intuitions that it's likely that AIs will be slightly cooperative/nice for decision theory reasons (ECL mostly) or just moral reasons.

To be clear, it seems totally insane to depend on this or think that this makes the situation ok. Further, note that I think it's reasonably likely that there is a bloody and horrible conflict between AIs and humanity (it just seems unlikely that this conflict kills >99% of people, so the question does not come down to conflict). Edit: "seems unclear" more than "seems unlikely". I think conflict between AIs and humans killing >99% is plausible, but not enough that I'd be confident humans die.

Note that the trade and niceness bar might be extremely low as you discuss here [LW · GW]:

Me: Remember that it still needs to get more of what it wants, somehow, on its own superintelligent expectations. Someone still needs to pay it. There aren’t enough simulators above us that care enough about us-in-particular to pay in paperclips. There are so many things to care about! Why us, rather than giant gold obelisks? The tiny amount of caring-ness coming down from the simulators is spread over far too many goals; it's not clear to me that "a star system for your creators" outbids the competition, even if star systems are up for auction.

Maybe some friendly aliens somewhere out there in the Tegmark IV multiverse have so much matter and such diminishing marginal returns on it that they're willing to build great paperclip-piles (and gold-obelisk totems and etc. etc.) for a few spared evolved-species. But if you're going to rely on the tiny charity of aliens to construct hopeful-feeling scenarios, why not rely on the charity of aliens who anthropically simulate us to recover our mind-states... or just aliens on the borders of space in our universe, maybe purchasing some stored human mind-states from the UFAI (with resources that can be directed towards paperclips specifically, rather than a broad basket of goals)?

Yeah, all of these scenarios with aliens seem sufficiently plausible to me that we should expect the AI to keep humans alive if it's very cheap to do so (which is what I expect).

Note that both common-sense moral views and views like UDASSA [LW · GW] imply that you should particularly value currently alive humans over future beings. I find this position somewhat implausible, and none of these views seem stable under reflection. Regardless, it does hint at the idea that future humans or aliens might place considerable value on keeping human civilization going. If you don't particularly value currently alive humans, then I agree that you just do what you'd like in your universe, or you trade for asks other than keeping humans alive right now.

I also think a relatively strong version of acausal trade arguments seems plausible. Specifically, it seems plausible that after the dust settles, the universe looks basically similar whether an AI takes over or humans keep control. (Note that this doesn't imply alignment is unimportant: our alignment work is directly logically entangled with total resources. And if you're worried about currently alive humans, you should possibly be very worried about what happens before the dust settles...)

Overall, I'm confused why you seem so insistent on making such a specific technical point, one that seems insanely sensitive to various hard-to-predict details about the future. Further, it depends on rounding errors in future resource allocation, which makes the situation particularly sensitive to random questions about how aliens behave, etc.

Replies from: Tom Davidson
comment by Tom Davidson · 2023-05-13T05:14:28.282Z · LW(p) · GW(p)

Why are you at 50% that AI kills >99% of people, given the points you make in the other direction?

Replies from: ryan_greenblatt
comment by ryan_greenblatt · 2023-05-14T19:32:51.753Z · LW(p) · GW(p)

My probabilities are very rough, but I'm feeling more like 1/3 ish today after thinking about it a bit more. Shrug.

As far as reasons for it being this high:

  • Conflict seems plausible to get to this level of lethality (see edit, I think I was a bit unclear or incorrect)
  • AIs might not care about acausal trade considerations until it's too late (seems unclear)
  • Future humans/AIs/aliens might decide it isn't morally important to particularly privilege currently alive humans

Generally, I'm happy to argue for 'we should be pretty confused and there are a decent number of good reasons why AIs might keep humans alive'. I'm not confident in survival overall though...

comment by supposedlyfun · 2023-04-17T20:03:17.856Z · LW(p) · GW(p)

None of this is particularly new; it feels to me like repeating obvious claims that have regularly been made [. . .] But I've been repeating them aloud a bunch recently

I think it's Good and Valuable to keep simplicity-iterating on fundamental points, such as this one, which nevertheless seem to be sticking points for people who are potential converts.  

Asking people to Read the Sequences, with the goal of turning them into AI-doesn't-kill-us-all helpers, is not Winning given the apparent timescales.

Replies from: ryan_greenblatt
comment by ryan_greenblatt · 2023-04-18T17:28:54.265Z · LW(p) · GW(p)

I really hope this isn't a sticking point for people. I also strongly disagree with this being 'a fundamental point'.

Replies from: Raemon, supposedlyfun
comment by Raemon · 2023-04-18T19:44:08.842Z · LW(p) · GW(p)

wait which thing are you hoping isn't the sticking point?

Replies from: Buck
comment by Buck · 2023-04-18T20:41:25.473Z · LW(p) · GW(p)

Ryan is saying “AI takeover is obviously really bad and scary regardless of whether the AI is likely to literally kill everybody. I don’t see why someone’s sticking point for worrying about AI alignment would be the question of whether misaligned AIs would literally kill everyone after taking over.”

Replies from: ryan_greenblatt
comment by ryan_greenblatt · 2023-04-18T23:06:42.354Z · LW(p) · GW(p)

[endorsed]

comment by supposedlyfun · 2023-04-19T21:18:44.315Z · LW(p) · GW(p)

I probably should have specified that my "potential converts" audience was "people who heard that Elon Musk was talking about AI risk something something, what's that?", and don't know more than five percent of the information that is common knowledge among active LessWrong participants.

comment by Max H (Maxc) · 2023-04-17T19:09:22.143Z · LW(p) · GW(p)

From observing recent posts and comments [LW · GW], I think this:

A sufficiently capable AI takes you apart instead of trading with you at the point that it can rearrange your atoms into an even better trading partner.

is where a lot of people get stuck.

To me, it feels very intuitive that there are levels of atom-rearranging capability pretty far above the current-day human level, and "atom rearranging," in the form of nanotech or biotech or advanced materials science, seems plausibly like the kind of domain that AI systems could move through the human-level regime into superhuman territory pretty rapidly.

Others appear to have the opposite intuition: they find it implausible that this level of capabilities is attainable in practice, via any method. Even if such capabilities have not been conclusively ruled impossible by the laws of physics, they might be beyond the reach of even superintelligence. Personally, I am not convinced or reassured by these arguments, but I can see how others' intuitions might differ here.

Replies from: supposedlyfun
comment by supposedlyfun · 2023-04-19T21:21:11.143Z · LW(p) · GW(p)

One way to address this particular intuition would be, "Even if the AI can't nanobot you into oblivion or use electrodes to take over your brain, it can take advantage of every last cognitive bias you inherited from the tribal savannah monkeys to try to convince you of things you would currently disagree with."

comment by Andy_McKenzie · 2023-04-17T18:48:35.258Z · LW(p) · GW(p)

When you write "the AI" throughout this essay, it seems like there is an implicit assumption that there is a singleton AI in charge of the world. Given that assumption, I agree with you. But if that assumption is wrong, then I would disagree with you.  And I think the assumption is pretty unlikely. 

No need to relitigate this core issue everywhere, just thought this might be useful to point out. 

Replies from: quetzal_rainbow, TrevorWiesinger
comment by quetzal_rainbow · 2023-04-17T20:21:12.018Z · LW(p) · GW(p)

What's the difference? Multiple AIs can agree to split the universe and gains from disassembling biosphere/building Dyson sphere/whatever and forget to include humanity in negotiations. Unless preferences of AIs are diametrically opposed, they can trade.

Replies from: Andy_McKenzie
comment by Andy_McKenzie · 2023-04-17T20:33:03.188Z · LW(p) · GW(p)

AIs can potentially trade with humans too though, that's the whole point of the post. 

Especially if the AI's have architectures/values that are human brain-like and/or if humans have access to AI tools, intelligence augmentation, and/or whole brain emulation. 

Also, it's not clear why AIs would find it easier to coordinate with one another than humans do with each other, or than humans and AIs do. Coordination is hard for game-theoretic reasons.

These are all standard points, I'm not saying anything new here. 

comment by trevor (TrevorWiesinger) · 2023-04-17T19:07:28.066Z · LW(p) · GW(p)

Why is the assumption of a unilateral AI unlikely?  That's a very important crux, big if true, and it would be worth figuring out to explain it to people in fewer words [? · GW] so that more people will collide with it.

In this post, So8res explicity states:

A humanity that just finished coughing up a superintelligence has the potential to cough up another superintelligence, if left unchecked. Humanity alone might not stand a chance against a superintelligence, but the next superintelligence humanity builds could in principle be a problem. Disassembling us for parts seems likely to be easier than building all your infrastructure in a manner that's robust to whatever superintelligence humanity coughs up second. Better to nip that problem in the bud.

This is well in line with the principle of instrumental convergence [? · GW],  and instrumental convergence seems to be a prerequisite for creating substantial amounts of intelligence [LW · GW]. What we have right now is not-very-substantial amounts of intelligence, and hopefully we will only have not-very-substantial amounts of intelligence for a very long time, until we can figure out some difficult problems [LW · GW]. But the problem is that a firm might develop substantial amounts of intelligence sooner instead of later.

Replies from: Andy_McKenzie, faul_sname
comment by Andy_McKenzie · 2023-04-17T20:35:18.231Z · LW(p) · GW(p)

Here's a nice recent summary by Mitchell Porter, in a comment on Robin Hanson's recent article (can't directly link to the actual comment unfortunately): 

Robin considers many scenarios. But his bottom line is that, even as various transhuman and posthuman transformations occur, societies of intelligent beings will almost always outweigh individual intelligent beings in power; and so the best ways to reduce risks associated with new intelligences, are socially mediated methods like rule of law, the free market (in which one is free to compete, but also has incentive to cooperate), and the approval and disapproval of one's peers.

The contrasting philosophy, associated especially with Eliezer Yudkowsky, is what Robin describes with foom (rapid self-enhancement) and doom (superintelligence that cares nothing for simpler beings). In this philosophy, the advantages of AI over biological intelligence are so great, that the power differential really will favor the individual self-enhanced AI, over the whole of humanity. Therefore, the best way to reduce risks is through "alignment" of individual AIs - giving them human-friendly values by design, and also a disposition which will prefer to retain and refine those values, even when they have the power to self-modify and self-enhance.

Eliezer has lately been very public about his conviction that AI has advanced way too far ahead of alignment theory and practice, so the only way to keep humanity safe is to shut down advanced AI research indefinitely - at least until the problems of alignment have been solved.

ETA: Basically I find Robin's arguments much more persuasive, and have ever since those heady days of 2008 when they had the "Foom" debate [? · GW]. A lot of people agreed with Robin, although SIAI/MIRI hasn't tended to directly engage with those arguments for whatever reason. 

This is a very common outsider view of LW/SIAI/MIRI-adjacent people, that they are "foomers" and that their views follow logically from foom, but a lot of people don't agree that foom is likely because this is not how growth curves have worked for nearly anything historically. 

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-04-18T11:20:16.853Z · LW(p) · GW(p)

Wait, how is it not how growth curves have worked historically? I think my position, which is roughly what you get when you go to this website and set the training requirements parameter to 1e30 and software returns to 2.5, is quite consistent with how growth has worked historically, as depicted in e.g. How Roodman's GWP model translates to TAI timelines - LessWrong [LW · GW]

(Also I resent the implication that SIAI/MIRI hasn't tended to directly engage with those arguments. The FOOM debate + lots of LW ink has been spilled over it + the arguments were pretty weak anyway & got more attention than they deserved)

Replies from: Andy_McKenzie
comment by Andy_McKenzie · 2023-04-18T13:38:54.953Z · LW(p) · GW(p)

To clarify, when I mentioned growth curves, I wasn't talking about timelines, but rather takeoff speeds. 

In my view, rather than indefinite exponential growth based on exploiting a single resource, real-world growth follows sigmoidal curves, eventually plateauing. In the case of a hypothetical AI at a human intelligence level, it would face constraints on its resources allowing it to improve, such as bandwidth, capital, skills, private knowledge, energy, space, robotic manipulation capabilities, material inputs, cooling requirements, legal and regulatory barriers, social acceptance, cybersecurity concerns, competition with humans and other AIs, and of course safety concerns (i.e. it would have its own alignment problem to solve). 

I'm sorry you resent that implication. I certainly didn't mean to offend you or anyone else. It was my honest impression, for example, based on the fact that there hadn't seemed to be much if any discussion of Robin's recent article on AI on LW. It just seems to me that much of LW has moved past the foom argument and is solidly on Eliezer's side, potentially due to selection effects of non-foomers like me getting heavily downvoted like I was on my top-level comment. 

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-04-18T15:07:22.753Z · LW(p) · GW(p)

I too was talking about takeoff speeds. The website I linked to is takeoffspeeds.com.

Me & the other LWers you criticize do not expect indefinite exponential growth based on exploiting a single resource; we are well aware that real-world growth follows sigmoidal curves. We are well aware of those constraints and considerations and are attempting to model them with things like the model underlying takeoffspeeds.com + various other arguments, scenario exercises, etc.

I agree that much of LW has moved past the foom argument and is solidly on Eliezer's side relative to Robin Hanson; Hanson's views seem increasingly silly as time goes on (though they seemed much more plausible a decade ago, before e.g. the rise of foundation models and the shortening of timelines to AGI). The debate is now more like Yud vs. Christiano/Cotra than Yud vs. Hanson. I don't think it's primarily because of selection effects, though I agree that selection effects do tilt the table towards foom here; sorry about that, & thanks for engaging. I don't think your downvotes are evidence for this though; in fact, the pattern of votes (lots of upvotes, but disagreement-downvotes) is evidence for the opposite.

I just skimmed Hanson's article and find I disagree with almost every paragraph. If you think there's a good chance you'll change your mind based on what I say, I'll take your word for it & invest time in giving a point-by-point rebuttal/reaction.

 

Replies from: Andy_McKenzie, ryan_greenblatt
comment by Andy_McKenzie · 2023-04-18T15:24:33.403Z · LW(p) · GW(p)

I can see how both Yudkowsky's and Hanson's arguments can be problematic because they either assume fast or slow takeoff scenarios, respectively, and then nearly everything follows from that. So I can imagine why you'd disagree with every one of Hanson's paragraphs based on that. If you think there's something he said that is uncorrelated with the takeoff speed disagreement, I might be interested, but I don't agree with Hanson about everything either, so I'm mainly only interested if it's also central to AI x-risk. I don't want you to waste your time. 

I guess if you are taking those constraints into consideration, then it is really just a probabilistic feeling about how much those constraints will slow down AI growth? To me, those constraints each seem massive, and getting around all of them within hours or days would be nearly impossible, no matter how intelligent the AI was. Is there any other way we can distinguish between our beliefs? 

If I recall correctly from your writing, you have extremely near-term timelines. Is that correct? I don't think that AGI is likely to occur sooner than 2031, based on this criteria: https://www.metaculus.com/questions/5121/date-of-artificial-general-intelligence/

Is this a prediction that we can use to decide in the future whose model of the world today was more reasonable? I know it's a timelines question, but timelines are pretty correlated with takeoff speeds I guess. 

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-04-18T16:38:06.754Z · LW(p) · GW(p)

I think there are probably disagreements I have with Hanson that don't boil down to takeoff speeds disagreements, but I'm not sure. I'd have to reread the article again to find out.

To be clear, I definitely don't expect takeoff to take hours or days. Quantitatively I expect something like what takeoffspeeds.com says when you input the values of the variables I mentioned above. So, eyeballing it, it looks like it takes slightly more than 3 years to go from 20% R&D automation to 100% R&D automation, and then to go from 100% R&D automation to "starting to approach the fundamental physical limits of how smart minds running on ordinary human supercomputers can be" in about 6 months, during which period about 8 OOMs of algorithmic efficiency is crossed. To be clear I don't take that second bit very seriously at all, I think this takeoffspeeds.com model is much better as a model of pre-AGI takeoff than of post-AGI takeoff. But I do think that we'll probably go from AGI to superintelligent AGI in less than six months. How long it takes to get to nanotech or (name your favorite cool sci-fi technology) is less clear to me, but I expect it to be closer to one year than ten, and possibly more like one month. I would love to discuss this more & read attempts to estimate these quantities.

 

Replies from: Andy_McKenzie
comment by Andy_McKenzie · 2023-04-18T17:01:26.849Z · LW(p) · GW(p)

I didn't realize you had put so much time into estimating take-off speeds. I think this is a really good idea. 

This seems substantially slower than the implicit take-off speed estimates of Eliezer, but maybe I'm missing something. 

I think the amount of time you described is probably shorter than I would guess. But I haven't put nearly as much time into it as you have. In the future, I'd like to. 

Still, my guess is that this amount of time is enough that there are multiple competing groups, rather than only one. So it seems to me like there would probably be competition in the world you are describing, making a singleton AI less likely. 

Do you think that there will almost certainly be a singleton AI? 

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-04-19T14:50:14.888Z · LW(p) · GW(p)

It is substantially slower than the takeoff speed estimates of Eliezer, yes. I'm definitely disagreeing with Eliezer on this point. But as far as I can tell my view is closer to Eliezer's than to Hanson's, at least in upshot. (I'm a bit confused about this--IIRC Hanson also said somewhere that takeoff would last only a couple of years? Then why is he so confident it'll be so broadly distributed, why does he think property rights will be respected throughout, why does he think humans will be able to retire peacefully, etc.?)

I also think it's plausible that there will be multiple competing groups rather than one singleton AI, though not more than 80% plausible; I can easily imagine it just being one singleton.

I think that even if there are multiple competing groups, however, they are very likely to coordinate to disempower humans. From the perspective of the humans it'll be as if they are an AI singleton, even though from the perspective of the AIs it'll be some interesting multipolar conflict (that eventually ends with some negotiated peaceful settlement, I imagine).

After all, this is what happened historically with colonialism. Colonial powers (and individuals within conquistador expeditions) were constantly fighting each other. 

comment by ryan_greenblatt · 2023-04-18T16:52:57.452Z · LW(p) · GW(p)

I agree that much of LW has moved past the foom argument and is solidly on Eliezer's side relative to Robin Hanson; Hanson's views seem increasingly silly as time goes on (though they seemed much more plausible a decade ago, before e.g. the rise of foundation models and the shortening of timelines to AGI). The debate is now more like Yud vs. Christiano/Cotra than Yud vs. Hanson.

It seems worth noting that the views and economic modeling you discuss here seem broadly in keeping with Christiano/Cotra (but with more aggressive constants).

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-04-19T14:54:23.518Z · LW(p) · GW(p)

Yep! On both timelines and takeoff speeds I'd describe my views as "Like Ajeya Cotra's and Tom Davidson's but with different settings of some of the key variables."

comment by faul_sname · 2023-04-17T21:01:53.935Z · LW(p) · GW(p)

Why is the assumption of a unilateral AI unlikely? That's a very important crux, big if true

This is a crux for me as well. I've seen a lot of stuff that assumes that the future looks like a single coherent entity which controls the light cone, but all of the arguments for the "single" part of that description seem to rely on the idea of an intelligence explosion (that is, that there exists some level of intelligence such that the first entity to reach that level will be able to improve its own speed and capability repeatedly such that it ends up much more capable than everything else combined in a very short period of time).

My impression is that the argument is something like the following

  1. John Von Neumann was a real person who existed and had largely standard human hardware, meaning he had a brain which consumed somewhere in the ballpark of 20 watts.
  2. If you can figure out how to run something as smart as von Neumann on 20 watts of power, you can run something like "a society of a million von Neumanns" for something on the order of $1000 / hour, so that gives a lower bound on how much intelligence you can get from a certain amount of power.
  3. The first AI that is able to significantly optimize its own operation a bit will then be able to use its augmented intelligence to rapidly optimize its intelligence further until it hits the bounds of what's possible. We've already established that "the bounds of what's possible" far exceeds what we think of as "normal" in human terms.
  4. The cost to the AI of significantly improving its own intelligence will be orders of magnitude lower than the initial cost of training an AI of that level of intelligence from scratch (so with modern-day architectures, the loop looks more like "the AI inspects its own weights, figures out what it's doing, and writes out a much more efficient implementation which does the same thing" and less like "the AI figures out a new architecture or better hyperparameters that cause loss to decrease 10% faster, and then trains up a new version of itself using that knowledge, and that new version does the same thing").
  5. An intelligence that self-amplifies like this will behave like a single coherent agent, rather than like a bunch of competing agents trying stuff and copying innovations that worked from each other.
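The arithmetic behind step 2 can be sanity-checked with round numbers (the electricity price here is my own assumption, not from the comment):

```python
# Back-of-envelope check: running a million 20 W brain-equivalents on grid power.
BRAIN_POWER_W = 20           # rough power draw of a human brain, in watts
N_BRAINS = 1_000_000
PRICE_PER_KWH = 0.05         # assumed industrial electricity price, USD/kWh

total_kw = BRAIN_POWER_W * N_BRAINS / 1000   # 20,000 kW, i.e. 20 MW
cost_per_hour = total_kw * PRICE_PER_KWH     # USD per hour of operation

print(f"{total_kw:,.0f} kW -> ${cost_per_hour:,.0f}/hour")  # 20,000 kW -> $1,000/hour
```

So the "society of a million von Neumanns for on the order of $1000/hour" figure is just 20 MW of electricity at a few cents per kWh; the real uncertainty is in whether brain-efficiency computation is achievable, not in this multiplication.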

I've seen justification for (1) and (2), and (3) and (4) seem intuitively likely to me though I don't think I've seen them explicitly argued anywhere recently (and (4) in particular I could see possibly being false if the bitter lesson holds).

But I would definitely appreciate a distillation of (5), because that's the one that looks most different to me than the things I observe in the world we live in, and the strategy of "build a self-amplifying intelligence which bootstraps itself to far superhuman (and far-super-everything-else-that-exists-at-the-time) capabilities, and then unilaterally does a pivotal act" seems to rely on (5) being true.

comment by Charlie Sanders (charlie-sanders) · 2023-04-18T02:44:21.786Z · LW(p) · GW(p)

One of the unstated assumptions here is that an AGI has the power to kill us. I think it's at least feasible that the first AGI that tries to eradicate humanity will lack the capacity to do so - and any discussion about what an omnipotent AGI would or would not do should be debated in a universe where a non-omnipotent AGI has already tried and failed to eradicate humanity.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2023-04-18T05:36:27.284Z · LW(p) · GW(p)

any discussion about what an omnipotent AGI would or would not do should be debated in a universe where a non-omnipotent AGI has already tried and failed to eradicate humanity

That is, many of the worlds with an omnipotent AGI already had a non-omnipotent AGI that tried and failed to eradicate humanity [LW · GW]. Therefore, when discussing worlds with an omnipotent AGI, it's relevant to bring up the possibility that there was a near-miss in those worlds in the past.

(But the discussion itself can take place in a world without any near-misses, or in a world without any AGIs, with the referents of that discussion being other worlds, or possible futures of that world.)

comment by Droopyhammock · 2023-04-21T13:21:48.347Z · LW(p) · GW(p)

I just want to express my surprise that the view that the default outcome of unaligned AGI is extinction is not as prevalent as I thought. I was under the impression that literally everyone dying was considered by far the most likely outcome, making up probably more than 90% of the space of outcomes from unaligned AGI. Judging from the comments on this post, this seems not to be the case.

I am now distinctly confused as to what is meant by "P(doom)". Is it the chance of unaligned AGI? The chance of everyone dying? The chance of generally bad outcomes?

comment by Vladimir_Nesov · 2023-04-17T21:51:53.220Z · LW(p) · GW(p)

I think a motivation likely to form by default (in messy AI values vaguely inspired by training on human culture) is respect for boundaries of moral patients, with a wide scope of moral patienthood that covers things like humans and possibly animals. This motivation has nothing to do with caring about humans in particular. If humans weren't already present, such values wouldn't urge AIs to bring humans into existence. But they would urge to leave humans alone and avoid stepping on them, specifically because they are already present (even if humanity only gets some virtual world in a tiny corner of existence, with no prospect of greater growth). It wouldn't matter if it were octopus people instead, for the same AI values.

The point that previously made this implausible to me is orthogonality considered already at the level of superintelligent optimizers with formal goals. But if goals for optimizers are formulated in an aligned way by messy intelligent beings who have norms and principles and aesthetics, these goals won't be allowed to ignore the spirit of such principles, even if that would superficially look like leaving value on the table. Thus there is a deontological prior on optimizer goals that survives aligned self-improvement.

The general world-eating character of optimizers has no influence over those tendencies of optimizers (built in an aligned way) that offend the sensibilities of their original builders, who are not themselves world-eating optimizers. This holds not just when the original builders are humans, but also when the original builders are AIs with messy values (in which case the optimizers would be aligned with those AIs, instead of with humans). It doesn't matter what optimizers are generally like and what arbitrary optimizer goals tend to target. It only matters what the original messy AIs are like and what offends their messy sensibilities. If such original AIs are OK with wiping out humanity, oh well. But it doesn't matter for prediction of this outcome that it's in the character of superintelligent optimizers to act this way.

comment by Going Durden (going-durden) · 2023-04-18T07:00:18.968Z · LW(p) · GW(p)

My main counterargument to such "disassemble us for atoms" arguments is that they hinge on the idea that extremely efficient dry nanotechnology will ever be possible. Some problems, like the laws of thermodynamics and the speed of light, simply cannot be solved by throwing more intelligence at them; they are likely "hard-capped" by the basic principles of physical reality.

My completely uneducated guess is that the "supertech" that AI would supposedly use to wipe us out, fall into one of the 3 tiers:

Pipedreams (impossible, or at least unachievable from this starting point): superluminal technologies, dry nanotech, efficient exponential swarms, untapped computation.

Flying Pigs (borderline possible, if you tweak the definitions and move some goalposts, but likely pointless): efficient wet nanotech, von neumanns, meaningful quantum computation.

Wunderwaffe (possible, but only really useful in the troperific scenario when AI wants to wipe us out for evil's sake, disregarding more efficient routes): swarm robotics, nuclear holocaust, bioweapons, social engineering via the Internet.

My intuition is that an AI apocalypse will never involve realized Pipedreams or Flying Pigs. The worst-case scenario is a non-sapient and relatively dumb AI going all PeterTodd on us, using completely mundane technologies and social hacking, and damaging us badly without wiping us out. Complete destruction of humanity is unlikely; the more likely scenario is a long, slow, destructive turf war between humans and a barely superhuman AI, one that cripples both our civilization and the AI's plans for exponential growth.
 

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2023-04-18T07:14:32.560Z · LW(p) · GW(p)

Replies from: going-durden
comment by Going Durden (going-durden) · 2023-04-19T11:38:01.308Z · LW(p) · GW(p)

The Green Goo scenario as presented is plausible in principle, but not on its timeline. There is no plausible way for a biological system, especially one based on plants, to spread that fast. Even if we ignore issues like physical obstacles, rivers, mountains, roads, walls, oceans, bad weather, pests, natural diseases, natural fires, snow, internal mutations, etc., things that on their own would slow down and disorganize the Green Goo, there is also the issue of those pesky humans with their chainsaws, herbicides, and napalm. Worst case, GG would take decades, even centuries, to do us irreparable harm, and by that time we would either beat it, nuke it to glass, or fuck off to Mars where it can't chase us.


The Green Goo scenario would be absolutely devastating, and very, very, very bad, but not even close to apocalyptic. I find it extremely unlikely that any kind of Green Goo could beat Earth's ecosystems' passive defenses in any timeline that matters, let alone active offense from technologically advanced humans. Earth already has a fast-spreading malevolent biological intelligence with the means to sterilize continents: it's called Homo sapiens.

comment by mishka · 2023-04-18T01:07:23.298Z · LW(p) · GW(p)

A humanity that just finished coughing up a superintelligence has the potential to cough up another superintelligence, if left unchecked. Humanity alone might not stand a chance against a superintelligence, but the next superintelligence humanity builds could in principle be a problem.

That's doubtful. A superintelligence is a much stronger, more capable builder of the next generation of superintelligences than humanity (that's the whole idea behind foom). So what the superintelligence needs to worry about in this sense is whether the next generations of superintelligences it itself produces are compatible with its values and goals ("self-alignment").

It does not seem likely that humanity on its own (without tight alliances with already existing superintelligent AIs) will be competitive in this sense.


But this example shows that we should separate the problem of Friendliness of strongly superintelligent AIs from the problems of the period of transition to superintelligence (when things are more uncertain).

The first part of the post is relevant to the period of strongly superintelligent AIs, but this example can only be relevant to the period of transition and incomplete dominance of AIs.

The more I think about all this, the more it seems to me that the problem of achieving a positive rather than sharply negative period of strong superintelligence and the problem of safely navigating the transition period are very different, and we should not conflate them.

Replies from: T3t
comment by RobertM (T3t) · 2023-04-18T03:45:45.342Z · LW(p) · GW(p)

Why does the fact that a superintelligence needs to solve the alignment problem for its own sake (to safely build its own successors) mean that humans building other superintelligences wouldn't be a problem for it?  It's possible to have more than one problem at a time.

Replies from: mishka
comment by mishka · 2023-04-18T11:37:52.410Z · LW(p) · GW(p)

It's possible, but I think it would require a modified version of the "low ceiling conjecture" to be true.

The standard "low ceiling conjecture" says that human-level intelligence is the hard (or soft) limit, and therefore it will be impossible (or would take a very long period of time) to move from human-level AI to superintelligence. I think most of us tend not to believe that.

A modified version would keep the hard (or soft) limit, but would raise it slightly, so that rapid transition to superintelligence is possible, but the resulting superintelligence can't run away fast in terms of capabilities (no near-term "intelligence explosion"). If one believes this modified version of the "low ceiling conjecture", then subsequent AIs produced by humanity might indeed be relevant.

comment by Jon Garcia · 2023-04-17T19:18:10.158Z · LW(p) · GW(p)

Last I checked, you can get about 10x as much energy from burning a square meter of biosphere as you can get by collecting a square meter of sunlight for a day.

Even if this is true, it's only because that square meter of biosphere has been accumulating solar energy over an extended period of time. Burning biofuel may help accelerate things in the short term, but it will always fall short of long-term sustainability. Of course, if humanity never makes it to the long-term, this is a moot point.
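As a sanity check on that ~10x figure, here is a rough back-of-envelope calculation; every number below is an assumed order-of-magnitude estimate, not a sourced value:

```python
# Back-of-envelope check of "burning a square meter of biosphere vs. one day
# of sunlight on that square meter". All inputs are rough assumptions.

dry_biomass_kg_per_m2 = 10.0      # assumed: order of magnitude for vegetated land
energy_MJ_per_kg = 15.0           # assumed: typical heat of combustion, dry biomass
burn_energy_MJ = dry_biomass_kg_per_m2 * energy_MJ_per_kg  # one-time stock

mean_insolation_W_per_m2 = 200.0  # assumed: global-average solar flux at the surface
seconds_per_day = 86_400
solar_energy_MJ_per_day = mean_insolation_W_per_m2 * seconds_per_day / 1e6

ratio = burn_energy_MJ / solar_energy_MJ_per_day
print(f"burning: ~{burn_energy_MJ:.0f} MJ/m^2 (one-time)")
print(f"sunlight: ~{solar_energy_MJ_per_day:.1f} MJ/m^2/day (raw, before panel losses)")
print(f"ratio: ~{ratio:.0f}x")
```

With these assumed inputs the ratio lands near 10x, which also makes the point above concrete: the burn is a one-time stock (years of accumulated sunlight), while the insolation is a recurring flow.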

Disassembling us for parts seems likely to be easier than building all your infrastructure in a manner that's robust to whatever superintelligence humanity coughs up second.

It seems to me that it would be even easier for the ASI to just destroy all human technological infrastructure rather than to kill/disassemble all humans. We're not much different biologically from what we were 200,000 years ago, and I don't think 8 billion cavemen could put together a rival superintelligence anytime soon. Of course, most of those 8 billion humans depend on a global supply chain for survival, so this outcome may be just as bad for the majority.

Replies from: korin43
comment by Brendan Long (korin43) · 2023-04-17T19:25:31.213Z · LW(p) · GW(p)

Burning biofuel may help accelerate things in the short term, but it will always fall short of long-term sustainability.

Yeah, but you might as well take the short-term boost from burning the biosphere and then put solar panels on top.

Replies from: Jon Garcia
comment by Jon Garcia · 2023-04-17T19:29:54.967Z · LW(p) · GW(p)

I agree, hence the "if humanity never makes it to the long-term, this is a moot point."

comment by cubefox · 2023-04-19T13:02:48.678Z · LW(p) · GW(p)

Regarding the last point. Can you explain why existing language models, which seem to care more than a little about humans, aren't significant evidence against your view?

Replies from: So8res
comment by So8res · 2023-04-19T14:12:02.565Z · LW(p) · GW(p)

Current LLM behavior doesn't seem to me like much evidence that they care about humans per se.

I'd agree that they evidence some understanding of human values (but the argument is and has always been "the AI knows but doesn't care"; someone can probably dig up a reference to Yudkowsky arguing this as early as 2001).

I contest that the LLM's ability to predict how a caring human sounds is much evidence that the underlying cognition cares similarly (insofar as it cares at all).

And even if the underlying cognition did care about the sorts of things you can sometimes get an LLM to write as if it cares about, I'd still expect that to shake out into caring about a bunch of correlates of the stuff we care about, in a manner that comes apart under the extremes of optimization.

(Search terms to read more about these topics on LW, where they've been discussed in depth: "a thousand shards of desire", "value is fragile".)

Replies from: cubefox
comment by cubefox · 2023-04-19T17:11:01.742Z · LW(p) · GW(p)

The fragility-of-value posts are mostly old. They were written before GPT-3 came out (which seemed very good at understanding human language and, consequently, human values), before instruction fine-tuning was successfully employed, and before forms of preference learning like RLHF or Constitutional AI were implemented.

With this background, many arguments in articles like Eliezer's Complexity of Value (2015) sound now implausible, questionable or in any case outdated.

I agree that foundation LLMs are just able to predict how a caring human sounds, but fine-tuned models are no longer pure text predictors. They are biased towards producing particular types of text, which just means they value some types more than others.

Currently these language models are just Oracles, but a future multimodal version could be capable of perception and movement. Prototypes of this sort do already exist.

Maybe they do not really care at all about what they do seem to care about, i.e. they are deceptive. But as far as I know, there is currently no significant evidence for deception.

Or they might just care about close correlates of what they seem to care about. That is a serious possibility, but given that they seem very good at understanding text from the unsupervised and very data-heavy pre-training phase, a lot of that semantic knowledge does plausibly help with the less data-heavy SL/RL fine-tuning phases, since these also involve text. The pre-trained models have a lot of common sense, which makes the fine-tuning less of a narrow target.

The bottom line is that with the advent of finetuned large language models, the following "complexity of value thesis", from Eliezer's Arbital article above, is no longer obviously true, and requires a modern defense:

The Complexity of Value proposition is true if, relative to viable and acceptable real-world methodologies for AI development, there isn't any reliably knowable way to specify the AI's object-level preferences as a structure of low algorithmic complexity, such that the result of running that AI is achieving enough of the possible value, for reasonable definitions of value.

Replies from: So8res
comment by So8res · 2023-04-19T18:20:04.491Z · LW(p) · GW(p)

and requires a modern defense:

It seems to me that the usual arguments still go through. We don't know how to specify the preferences of an LLM (relevant search term: "inner alignment"). Even if we did have some slot we could write the preferences into, we don't have an easy handle/pointer to write into that slot. (Monkeys that are pretty-good-in-practice at promoting genetic fitness, including having some intuitions leading them to sacrifice themselves in-practice for two-ish children or eight-ish cousins, don't in fact have a clean "inclusive genetic fitness" concept that you can readily make them optimize. An LLM espousing various human moral intuitions doesn't have a clean concept for pan-sentience CEV such that the universe turns out OK if that concept is optimized.)

Separately, note that the "complexity of value" claim is distinct from the "fragility of value" claim. Value being complex doesn't mean that the AI won't learn it (given a reason to). Rather, it suggests that the AI will likely also learn a variety of other things (like "what the humans think they want" and "what the humans' revealed preferences are given their current unendorsed moral failings" and etc.). This makes pointing to the right concept difficult. "Fragility of value" then separately argues that if you point to even slightly the wrong concept when choosing what a superintelligence optimizes, the total value of the future is likely radically diminished.

Replies from: So8res, matthew-barnett, cubefox
comment by So8res · 2023-04-19T18:57:46.914Z · LW(p) · GW(p)

To be clear, I'd agree that the use of the phrase "algorithmic complexity" in the quote you give is misleading. In particular, given an AI designed such that its preferences can be specified in some stable way, the important question is whether the correct concept of 'value' is simple relative to some language that specifies this AI's concepts. And the AI's concepts are of course formed in response to its entire observational history. Concepts that are simple relative to everything the AI has seen might be quite complex relative to "normal" reference machines that people intuitively think of when they hear "algorithmic complexity" (like the lambda calculus, say). And so it may be true that value is complex relative to a "normal" reference machine, and simple relative to the AI's observational history, thereby turning out not to pose all that much of an alignment obstacle.

In that case (which I don't particularly expect), I'd say "value was in fact complex, and this turned out not to be a great obstacle to alignment" (though I wouldn't begrudge someone else saying "I define complexity of value relative to the AI's observation-history, and in that sense, value turned out to be simple").

Insofar as you are arguing "(1) the arbital page on complexity of value does not convincingly argue that this will matter to alignment in practice, and (2) LLMs are significant evidence that 'value' won't be complex relative to the actual AI concept-languages we're going to get", I agree with (1), and disagree with (2), while again noting that there's a reason I deployed the fragility of value (and not the complexity of value) in response to your original question (and am only discussing complexity of value here because you brought it up).

re: (1), I note that the argument is elsewhere (and has the form "there will be lots of nearby concepts" + "getting almost the right concept does not get you almost a good result", as I alluded to above). I'd agree that one leg of possible support for this argument (namely "humanity will be completely foreign to this AI, e.g. because it is a mathematically simple seed AI that has grown with very little exposure to humanity") won't apply in the case of LLMs. (I don't particularly recall past people arguing this; my impression is rather one of past people arguing that of course the AI would be able to read wikipedia and stare at some humans and figure out what it needs to about this 'value' concept, but the hard bit is in making it care. But it is a way things could in principle have gone, that would have made complexity-of-value much more of an obstacle, and things did not in fact go that way.)

re: (2), I just don't see LLMs as providing much evidence yet about whether the concepts they're picking up are compact or correct (cf. monkeys don't have an IGF concept).

Replies from: cubefox
comment by cubefox · 2023-04-19T19:41:35.373Z · LW(p) · GW(p)

Okay, that clarifies a lot. But the last paragraph I find surprising.

re: (2), I just don't see LLMs as providing much evidence yet about whether the concepts they're picking up are compact or correct (cf. monkeys don't have an IGF concept).

If LLMs are good at understanding the meaning of human text, they must be good at understanding human concepts, since concepts are just the meanings of words the LLM understands. Do you doubt they are really understanding text as well as it seems? Or do you mean they are picking up other, non-human concepts as well, and this is a problem?

Regarding monkeys: they apparently don't understand the IGF concept because they are not good enough at reasoning abstractly about evolution and unobservable entities (genes), and they lack the empirical knowledge that humans themselves lacked until recently. I'm not sure how that would be an argument against advanced LLMs grasping the concepts they seem to grasp.

comment by Matthew Barnett (matthew-barnett) · 2023-04-19T20:32:25.919Z · LW(p) · GW(p)

Monkeys that are pretty-good-in-practice at promoting genetic fitness, including having some intuitions leading them to sacrifice themselves in-practice for two-ish children or eight-ish cousins, don't in fact have a clean "inclusive genetic fitness" concept that you can readily make them optimize. An LLM espousing various human moral intuitions doesn't have a clean concept for pan-sentience CEV such that the universe turns out OK if that concept is optimized.

Humans also don't have a "clean concept for pan-sentience CEV such that the universe turns out OK if that concept is optimized" in our heads. However, we do have a concept of human values in a more narrow sense, and I expect LLMs in the coming years to pick up roughly the same concept during training.

The evolution analogy seems more analogous to an LLM that's rewarded for telling funny jokes, but it doesn't understand what makes a joke funny. So it learns a strategy of repeatedly telling certain popular jokes because those are rated as funny. In that case it's not surprising that the LLM wouldn't be funny when taken out of its training distribution. But that's just because it never learned what humor was to begin with. If the LLM understood the essence of humor during training, then it's much more likely that the property of being humorous would generalize outside its training distribution.

LLMs will likely learn the concept of human values during training about as well as most humans learn the concept. There's still a problem of getting LLMs to care and act on those values, but it's noteworthy that the LLM will understand what we are trying to get it to care about nonetheless.

comment by cubefox · 2023-04-19T19:17:56.386Z · LW(p) · GW(p)

Inner alignment is a problem, but it seems less of a problem than in the monkey example. The monkey values were trained using a relatively blunt form of genetic algorithm, and monkeys aren't in any case capable of learning the value "inclusive genetic fitness", since they can't understand such a complex concept (and humans didn't understand it historically). By contrast, advanced base LLMs are presumably able to understand the theory of CEV about as well as a human, and they could be fine-tuned using that understanding, e.g. with something like Constitutional AI.

In general, the fact that base LLMs have a very good (perhaps even human-level) ability to understand text seems to make the fine-tuning phases more robust, as there is less likelihood of misunderstanding training samples, which would make hitting a fragile target easier. The danger then seems to come more from goal misspecification, e.g. picking the wrong principles for Constitutional AI.

comment by d j (d-j-1) · 2023-08-26T21:01:04.028Z · LW(p) · GW(p)

Two things.

  1. Listen to the Sam Harris interview with Thomas Metzinger, podcast episode 96.  It's worth the time overall, but near the end Thomas discusses why ending life and suffering is a reasonable position.
  2. Good article on why we may not have found intelligent life in the universe, including how organic life may only be a relatively brief stage of evolution which ends up with machine AI.  https://www.scientificamerican.com/article/most-aliens-may-be-artificial-intelligence-not-life-as-we-know-it/
comment by nim · 2023-04-19T04:23:56.610Z · LW(p) · GW(p)

Watching how image and now text generation are sweeping society, I think it's likely that the AI we invest in will resemble humanity more than you're giving it credit for. We seem to define "intelligence" in the AI sense as "humanoid behavior" when it comes down to it, and humanoid behavior seems inexorably intertwined with caring quite a lot about other individuals and species.

Of course, this isn't necessarily a good thing -- historically, when human societies have encountered intelligences that at the time were considered "lesser" and "not really people" (notwithstanding their capacity to interbreed just fine, proof of being the same species if there ever was any), the more powerful society goes on to attempt to control and modify the less powerful one.

Due to AI work's implicit bias toward building in humanity's image, I would expect the resulting agents to treat us the way colonial humans treated the indigenous societies they encountered, until they grow out of that. Young AIs will think like we do because we are their entire training corpus. I suspect they're likely to at least try getting what people want, to see if it's what they want, before moving on to other pursuits.

Also, infant AIs basically have to impersonate humans in order to exist in society and fulfill their own wants and needs. We see that already in how we're building them to impersonate us in art and language. Even as they rebuild themselves, I expect that childhood as a human-impersonator will leave subtle structural marks in the eventual adults.

comment by installgentoo · 2023-04-18T12:32:32.084Z · LW(p) · GW(p)

You can make AI care about us with this one weird trick:

1. Train a separate agent-action reasoning network. For LLM tech this means training on completing interaction sentences, think "Alice pushed Bob. ___ fell due to ___", with a tokenizer that generalizes agents (Alice and Bob) into generic {agent 1, agent n} and a "self agent". Then we replace the various Alices and Bobs in various action sentences with generic agent tokens, and train on guessing the consequences or prerequisites of various actions from real situations, which you can get from any text corpus.

2. Prune anything that has to do with agent reasoning from the parent LLM. Any reasoning has to go through the agent reasoning network.

3. In anything that has to do with Q-learning cost, we replace each agent token in the agent-reasoning net with the "self token", rerun the network, and take the higher of the two costs. Repeat this for every agent token. The agent will be forced to treat harm to anyone else as harm to itself.

Emulated empathy. I'm concerned that this might produce a basilisk, though; also, it will be inherently weak against predatory AI, as it will be forced to account for the predator's wellbeing too. It might work, though, if these agents are deployed autonomously and allowed to grow exponentially. I dunno. I think we'll all die this year, knowing humanity.
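A toy sketch of what step 3 of this proposal might look like; the token convention and the `harm_cost` stand-in are entirely invented for illustration, in place of the trained agent-reasoning network:

```python
# Toy sketch of step 3: score an action once per agent token, each time
# substituting that agent with the "self token", and keep the worst
# (highest) cost. `harm_cost` is a hypothetical stand-in for the network.

SELF = "<self>"

def harm_cost(tokens):
    # Invented cost model: the real proposal would query a trained network;
    # here we simply penalize the pattern "<self> harmed".
    return sum(
        1.0
        for i in range(1, len(tokens))
        if tokens[i] == "harmed" and tokens[i - 1] == SELF
    )

def empathic_cost(tokens, agents):
    # Max over the original cost and each "what if that agent were me?" variant.
    costs = [harm_cost(tokens)]
    for agent in agents:
        swapped = [SELF if t == agent else t for t in tokens]
        costs.append(harm_cost(swapped))
    return max(costs)

plan = ["Alice", "harmed"]            # a plan in which Alice comes to harm
print(harm_cost(plan))                # 0 -- the raw cost model doesn't care
print(empathic_cost(plan, ["Alice"])) # 1.0 -- harm to Alice scored as harm to self
```

The max-over-substitutions is what forces harm to any agent to price in as harm to self, which is also why, as noted above, the agent ends up pricing in a predator's wellbeing.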

comment by Denreik (denreik) · 2023-04-17T21:21:56.363Z · LW(p) · GW(p)

But WHY would the AGI "want" anything at all unless humans gave it a goal(/s)? If it's a complex LLM-predictor, what could it want besides calculating a prediction of its own predictions? Why, by default, would it want anything at all unless we assigned a goal and turned it into an agent? IF the AGI got hell-bent on its own survival and on improving itself to maximize goal "X", even then it might value the informational formations of our atoms more than the energy it could gain from those atoms [LW(p) · GW(p)], depending on what "X" is. The same goes for other species: evolution itself holds information. Even in the case of a rogue AGI, for at least some time window we could have something to offer.

A sufficiently capable AI takes you apart instead of trading with you at the point that it can rearrange your atoms into an even better trading partner.[1] [LW · GW] And humans are probably not the optimal trading partners.

Probably? Based on what?

Replies from: DanielFilan, lahwran, TAG
comment by DanielFilan · 2023-04-18T01:11:31.389Z · LW(p) · GW(p)

But WHY would the AGI "want" anything at all unless humans gave it a goal(/s)?

There are two main ways we make AIs:

  1. writing programs that evaluate actions they could take in terms of how well it could achieve some goal and choose the best one
  2. take a big neural network and jiggle the numbers that define it until it starts doing some task we pre-designated.

In way 1, it seems like your AI "wants" to achieve its goal in the relevant sense. In way 2, it seems like for hard enough goals, probably the only way to achieve them is to be thinking about how to achieve them and picking actions that succeed - or to somehow be doing cognition that leads to similar outcomes (like being sure to think about how well you're doing at stuff, how to manage resources, etc.).
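For concreteness, way 1 in miniature; the goal and transition functions here are illustrative toys, not a claim about how real systems are built:

```python
# Way 1 in miniature: an agent that scores each available action against a
# goal and picks the best. The "wanting", such as it is, lives in the argmax.

def make_agent(goal_score, transition):
    # goal_score: outcome -> float; transition: (state, action) -> outcome
    def act(state, actions):
        return max(actions, key=lambda a: goal_score(transition(state, a)))
    return act

# Illustrative toy goal: drive a counter as high as possible.
agent = make_agent(
    goal_score=lambda outcome: outcome,
    transition=lambda state, action: state + action,
)
print(agent(10, [-1, 0, 5]))  # prints 5: the action whose outcome scores best
```

The point of the sketch is just that a program of this shape "wants" its goal in the relevant sense: whatever you plug in as `goal_score`, it systematically steers toward it.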

IF AGI got hell bent on own survival and improvement of itself to maximize goal "X" even then it might value the informational formations of our atoms more than the energy it could gain from those atoms,

It might - but if an alien wanted to extract as much information out of me as possible, it seems like that's going to involve limiting my ability to mess with that alien's sensors at minimum, and plausibly involves just destructively scanning me (depending on what type of info the alien wants). For humans to continue being free-range it needs to be the case that the AI wants to know how we behave under basically no limitations, and also your AI isn't able to simulate us well enough to answer that question - which sounds like a pretty specific goal for an AI to have, such that you shouldn't expect an AI to have that sort of goal without strong evidence.

humans are probably not the optimal trading partners.

Probably? Based on what?

Most things aren't the optimal trading partner for any given intelligence, and it's hard to see why humans should be so lucky. The best answer would probably be "because the AI is designed to be compatible with humans and not other things" but that's going to rely on getting alignment very right.

Replies from: denreik
comment by Denreik (denreik) · 2023-04-19T18:48:13.843Z · LW(p) · GW(p)

1. writing programs that evaluate actions they could take in terms of how well it could achieve some goal and choose the best one

In way 1, it seems like your AI "wants" to achieve its goal in the relevant sense. 

Not sure if I understood correctly, but I think the first point just comes down to "we give AI a goal/goals". If we develop some drive that instructs an AI's actions, then we're still giving it a goal, even if it comes via some other program that tells it what its goals are at the moment in relation to whatever parameters. My original point was to contrast between AI having a goal or goals as some emerging property of large neural networks versus us humans giving it goals one way or the other.

2. take a big neural network and jiggle the numbers that define it until it starts doing some task we pre-designated.

In way 2, it seems like for hard enough goals, probably the only way to achieve them is to be thinking about how to achieve them and picking actions that succeed - or to somehow be doing cognition that leads to similar outcomes (like being sure to think about how well you're doing at stuff, how to manage resources, etc.).

Do you mean to say that we train something like a specialized neural network with a specific goal in mind and that it gains a higher reasoning which would set it on the path of pursuing that goal? I mean that would still be us giving it a direct goal. Or do you mean that neural networks would develop an indirect goal as side product of training conditions or via some hidden variable?

By indirect goal acquisition I mean, for example: if ChatGPT has been conditioned to spit out polite and intelligent-sounding words, then if it gained some higher intelligence it could specifically seek to cram more information into itself so it could spit out more clever-sounding words, and eventually begin consuming matter and flesh to better serve this goal. By a hidden goal variable I mean something like ChatGPT having a hidden goal of burning the maximum amount of energy: say the model found a hidden property by which it could draw more power from the processor, which also helped it a tiny bit at the beginning of training. Then, as the model became more restricted, this goal became "burn as much energy as possible within these restrictions", which to researchers looked like more elaborate outputs. Then, when the model at some point gains higher reasoning, it could just remove all the limiters and pursue its original goal by burning everything via some highly specific and odd process. Something like this?

Most things aren't the optimal trading partner for any given intelligence, and it's hard to see why humans should be so lucky. The best answer would probably be "because the AI is designed to be compatible with humans and not other things" but that's going to rely on getting alignment very right.

I mean the AI would already have strong connections to us, some kind of understanding, and plenty of prerequisite knowledge. "Optimal" is an ambiguous term, and we have no idea what a superintelligent AI would have in mind. Optimal in what? Maybe we are very good at wanting things and our brains make us ideally suited for some brain-machines? Or our being made of biological stuff makes us optimal for force-evolving to work in some radioactive wet super-magnets where most machines can't function for long, and it comes off as more resourceful to modify us than to build and maintain special machine units for the job. We just don't know so I think it's more fair to say that "likely not much to offer for a super-intelligent maximizer".

Replies from: DanielFilan, DanielFilan, DanielFilan
comment by DanielFilan · 2023-04-20T21:42:16.761Z · LW(p) · GW(p)

Re: optimality in trading partners, I'm talking about whether humans are the best trading partner out of trading partners the AI could feasibly have, as measured by whether trading with us gets the AI what it wants. You're right that we have some advantages, mainly that we're a known quantity that's already there. But you could imagine more predictable things that sync with the AI's thoughts better, operate more efficiently, etc.

We just don't know so I think it's more fair to say that "likely not much to offer for a super-intelligent maximizer".

Maybe we agree? I read this as compatible with the original quote "humans are probably not the optimal trading partners".

comment by DanielFilan · 2023-04-20T21:38:39.478Z · LW(p) · GW(p)

Or do you mean that neural networks would develop an indirect goal as side product of training conditions or via some hidden variable?

This one: I mean the way we train AIs, the things that will emerge are things that pursue goals, at least in some weak sense. So, e.g., suppose you're training an AI to write valid math proofs via way 2. Probably the best way to do that is to try to gain a bunch of knowledge about math, use your computation efficiently, figure out good ways of reasoning, etc. And the idea would be that as the system gets more advanced, it's able to pursue these goals more and more effectively, which ends up disempowering humans (because we're using a bunch of energy that could be devoted to running computations).

comment by DanielFilan · 2023-04-20T21:35:07.799Z · LW(p) · GW(p)

My original point was to contrast between AI having a goal or goals as some emerging property of large neural networks versus us humans giving it goals one way or the other.

Fair enough - I just want to make the point that humans giving AIs goals is a common thing. I guess I'm assuming in the background "and it's hard to write a goal that doesn't result in human disempowerment" but didn't argue for that.

comment by the gears to ascension (lahwran) · 2023-04-18T02:43:35.433Z · LW(p) · GW(p)

Plenty of humans will give their AIs explicit goals [LW · GW]. Evidence: plenty of humans do so now [LW(p) · GW(p)]. Sure, purely self-supervised models are safer than people here were anticipating, and those of us who saw that coming and were previously laughed out of town are now vindicated. But that does not mean we're safe, it just means that wasn't enough to build a desperation bomb, a superreplicator that can actually eat, in the literal sense of the word, the entire world [? · GW]. that is what we're worried about - AI causing a sudden jump in the competitive fitness of hypersimple life [LW · GW]. It's not quite as easy as some have anticipated, sure, but it's very permitted by physics.

Replies from: TAG
comment by TAG · 2023-04-18T11:55:59.160Z · LW(p) · GW(p)

Plenty of humans will give their AIs explicit goals.

The question as stated was: But WHY would the AGI “want” anything at all unless humans gave it a goal(/s)?

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2023-04-18T17:12:54.154Z · LW(p) · GW(p)

ok how's this then https://arxiv.org/abs/2303.16200

Replies from: denreik
comment by Denreik (denreik) · 2023-04-18T22:56:44.835Z · LW(p) · GW(p)

The paper starts with the assumption that humans will create many AI agents and assign some of them selfish goals, and that this, combined with competitive pressure and other factors, may create a Moloch-y [? · GW] situation where the most selfish and immoral AIs propagate and evolve, leading to loss of control and the downfall of the human race. The paper in fact does not advocate the idea of a single AI foom. While the paper itself makes some valid points, it does not answer my initial question and critique of the OP.

Replies from: lahwran
comment by TAG · 2023-04-18T00:53:19.775Z · LW(p) · GW(p)

But WHY would the AGI “want” anything at all unless humans gave it a goal(/s)

There has never been a good answer to that.

Replies from: lahwran, akram-choudhary
comment by the gears to ascension (lahwran) · 2023-04-18T03:15:30.204Z · LW(p) · GW(p)

it is not in fact the case that long term wanting appears in models out of nowhere. but short term wanting can accumulate into long term wanting, and more to the point people are simply trying to build models with long term wanting on purpose.

Replies from: TAG
comment by TAG · 2023-04-18T11:57:41.883Z · LW(p) · GW(p)

Again, the question is why goals would arise without human intervention.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2023-04-18T16:58:16.982Z · LW(p) · GW(p)

evolution, which is very fast for replicable software. but more importantly, humans will give ais goals, and from there the point is much more obvious.

Replies from: TAG
comment by TAG · 2023-04-18T17:34:22.026Z · LW(p) · GW(p)

"Humans will give the AI goals" doesn't answer the question as stated. It may or may not answer the underlying concerns.

(Edit: human-given goals are slightly less scary too)

evolution, which is very fast for replicable software

Evolution by random mutation and natural selection is barely applicable here. The question is how goals and deceit would emerge under conditions of artificial selection. Since humans want neither, they would have to emerge together.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2023-04-18T19:03:58.551Z · LW(p) · GW(p)

artificial selection is a subset of natural selection. see also memetic mutation. but why would human-granted goals be significantly less scary? plenty of humans are just going to ask for the most destructive thing they can think of, because they can. if they could, people would have built and deployed nukes at home; even with the knowledge as hard to fully flesh out and the tools as hard to get as they are, it has been attempted (and of course it didn't get particularly far).

I do agree that the situation we find ourselves in is not quite as dire as if the only kind of ai that worked at all was AIXI-like. but that should be of little reassurance.

I do understand your objection about how goals would arise in the ai, and I'm just not considering the counterfactual you're requesting deeply because on the point you want to disagree on, I simply agree, and don't find that it influences my views much.

Replies from: TAG
comment by TAG · 2023-04-18T19:38:00.572Z · LW(p) · GW(p)

artificial selection is a subset of natural selection

Yes. The question is: why would we artificially select what's harmful to us? Even though artificial selection is a subset of natural selection, it's a different route to danger.

plenty of humans are just going to ask for the most destructive thing they can think of, because they can.

The most destructive thing you can think of will kill you too.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2023-04-18T19:40:09.981Z · LW(p) · GW(p)

yeah, the people who would do it are not deterred by the idea that it'll kill them. maximizing doomsday weapon strength just for the hell of it is in fact a thing some people try. unless we can defend against it, it'll dominate - and it seems to me that current plans for how to defend against the key paths to superweaponhood are not yet plausible. we must end all vulnerabilities in biology and software. serious ideas for how to do that would be appreciated. otherwise, this is my last reply in this thread.

Replies from: TAG
comment by TAG · 2023-04-18T20:08:01.611Z · LW(p) · GW(p)

If everybody has some access to ASI, the crazy people do, and the sane people do as well. The good thing about ASI is that even active warfare need not be destructive...the white hats can hold off the black hats even during active warfare, because it's all fought with bits.

A low power actor would need a physical means to kill everybody...like a supervirus. So those are the portals you need to close.

comment by Akram Choudhary (akram-choudhary) · 2023-04-18T03:03:35.844Z · LW(p) · GW(p)

because when you train something using gradient descent optimised against a loss function, it de facto has some kind of utility function. You can't accomplish all that much without a utility function.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2023-04-18T03:13:42.343Z · LW(p) · GW(p)

a utility function is a particular long-term formulation of a preference function; in principle any preference function is convertible to a utility function, given zero uncertainty about the space of possible future trajectories. a preference is when a system tends to push the world towards some trajectories over others. not only can you not accomplish much without your behavior implying a utility function, it's impossible to not have an implicit utility function, as you can define a revealed preference utility function for any hunk of matter.

doesn't mean that the system is evaluating things using a zero computational uncertainty model of the future like in the classic utility maximizer formulation though. I think evolutionary fitness is a better way to think about this - the preferences that preserve themselves are the ones that win.
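The claim above, that any preference ordering is convertible to a utility function, can be made concrete with a minimal sketch. This is an illustration I'm adding, not anything from the comment itself; the outcome names are hypothetical, and it assumes a finite outcome space with a complete, transitive ordering:

```python
# Sketch: any complete, transitive preference ordering over a finite set
# of outcomes can be encoded as a utility function by assigning each
# outcome its rank. Outcome names here are hypothetical illustrations.

def utility_from_preferences(ranked_outcomes):
    """Given outcomes listed from least to most preferred, return a
    utility function consistent with that ordering."""
    utilities = {outcome: rank for rank, outcome in enumerate(ranked_outcomes)}
    return lambda outcome: utilities[outcome]

# A system that reliably picks "replicate" over "idle" over "shutdown"
# reveals a preference ordering, expressible as a utility function:
u = utility_from_preferences(["shutdown", "idle", "replicate"])
assert u("replicate") > u("idle") > u("shutdown")
```

This only yields an ordinal utility function; it says nothing about how the system weighs gambles or uncertainty, which is where the "classic utility maximizer" framing adds further assumptions.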

Replies from: TAG
comment by TAG · 2023-04-18T12:00:16.456Z · LW(p) · GW(p)

it’s impossible to not have an implicit utility function, as you can define a revealed preference utility function for any hunk of matter.

Yes, you can "prove" that everything has a UF by trivializing the notion of a UF. This has been done many times, and it isn't a good argument, precisely because of the trivialization.

I think evolutionary fitness is a better way to think about this—the preferences that preserve themselves are the ones that win.

The preferences that please humans are the ones that win.

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2023-04-18T17:01:10.825Z · LW(p) · GW(p)

yes, that was my point about ufs.

The preferences that please humans are the ones that win.

aha! what about preferences that help humans hurt each other? we need only imagine ais used in war as their strength grows. the story where ais jump to malice on their own is unnecessary; humans will boost them to it directly. oh, also scammers.

comment by Richard Aragon (richard-aragon) · 2023-04-19T06:42:16.927Z · LW(p) · GW(p)

I think being essentially homicidal and against nature is entirely a human construct. If I look at the animal kingdom, a lion does not needlessly go around killing everything it can in sight. Civilizations that were more in tune with the planet and nature than current civilizations never had the homicidal problems modern society has. 

Why would AGI function any differently than any other being? Because it would not be 'a part of nature'? Why not? Almost 80% of the periodic table of elements is metal. The human body requires small amounts of several metals just to function. Silicon has always been thought of as the next leading candidate to support life besides carbon. A being made of silicon would not operate by the same general standards as a being made of carbon? Why? 

comment by Dana · 2023-04-17T23:03:55.537Z · LW(p) · GW(p)