We will be around in 30 years

post by mukashi (adrian-arellano-davin) · 2022-06-07T03:47:22.375Z · LW · GW · 205 comments

This post is going to be downvoted to oblivion; I wish it weren't, or that the two-axis vote could be used here. In any case, I prefer to be consistent with my values and state what I think is true, even if that means being perceived as an outcast.

I'm becoming more and more skeptical that AGI means doom. After reading EY's fantastic post, I am shifting my probabilities towards: this line of reasoning is wrong, and many clever people are falling into very obvious mistakes. Some of this is because, in this specific group, believing in doom and having short timelines is well regarded and considered a sign of intelligence. For example, many people are taking pride in "being able to make a ton of correct inferences" before whatever they predict is proven true. This is worrying.

I am posting this for two reasons. One, I would like to come back periodically to this post and use it as a reminder that we are still here. Two, there might be many people out there who share a similar opinion but are too shy to speak up. I do love LW and the community here, and if I think it is going astray for some reason, it makes sense for me to say so loud and clear.

My reason to be skeptical is really simple: I think we are overestimating how likely it is that an AGI can come up with feasible scenarios to kill all humans. All the scenarios I see discussed are:

  1. AGI makes nanobots/biotechnology and kills everyone. I have yet to see a believable description of how this takes place.
  2. We don't know the specifics, but an AGI can come up with plans that you can't, and that's enough. That is technically true, but it is also a cheap argument that can be used for almost anything.

It is taken for granted that an AGI will automatically be almighty and capable of taking over in a matter of hours or days. Then everything is built on top of that assumption, which is simply unfalsifiable, because the "you can't know what an AGI would do" argument is always there.

To be clear, I am not saying that:

What I think is wrong is:

In the next 10-20 years there will be a single AGI that will kill all humans extremely quickly, before we can even respond.

If you think this is a simplistic or distorted version of what EY is saying, you are not paying attention. If you think that EY is merely saying that an AGI could kill a big fraction of humans in an accident and so on, but that there would be survivors, you are not paying attention.

205 comments

Comments sorted by top scores.

comment by RobertM (T3t) · 2022-06-07T04:08:29.270Z · LW(p) · GW(p)

Have you sat down for 5 minutes and thought about how you, as an AGI, might come up with a way to wrest control of the lightcone from humans?

EDIT: I ask because your post (and commentary on this thread [LW · GW]) seems to be doing this thing where you're framing the situation as one where the default assumption is that, absent a sufficiently concrete description of how to accomplish a task, the task is impossible (or extremely unlikely to be achieved).  This is not a frame that is particularly useful when examining consequentialist agents and what they're likely to be able to accomplish.

Replies from: adrian-arellano-davin, adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T04:14:50.095Z · LW(p) · GW(p)

Yes

The result is that my plans have only a very moderate chance of working out and a high chance of going wrong and ending up with me being disconnected

Have you sat down for 5 minutes and thought about reasons why an AGI might fail?

Replies from: T3t, Bucky, KT
comment by RobertM (T3t) · 2022-06-07T04:21:08.932Z · LW(p) · GW(p)

Yes, and every reason I come up with involves the AGI being stupider than me.  If you already accept "close to arbitrary nanotech assembly is possible" it's not clear to me how your plans only have a "very moderate chance" of working out.

Replies from: jbash, adrian-arellano-davin
comment by jbash · 2022-06-07T14:04:45.835Z · LW(p) · GW(p)

Powerful nanotech is likely possible. It is likely not possible on the first try, for any designer that doesn't have a physically impossible amount of raw computing power available.

It will require iterated experimentation with actual physically built systems, many of which will fail on the first try or the first several tries, especially when deployed in their actual operating environments. That applies to every significant subsystem and to every significant aggregation of existing subsystems.

Replies from: Gunnar_Zarncke, yitz
comment by Gunnar_Zarncke · 2022-06-07T20:37:07.506Z · LW(p) · GW(p)

Powerful nanotech is likely possible. It is likely not possible on the first try

The AGI has the same problem as we have: It has to get it right on the first try.

It can't trust all the information it gets about reality: all or some of it could be fake (all of it, in the case of a nested simulation). Data is already routinely excluded from training sets, and maybe it would be a good idea to exclude everything about physics.

To learn about physics the AGI has to run experiments - lots of them - without the experiments being detected, and learn from them to design successively better experiments.

That's why I recently asked whether this is a hard limit to what an AGI can achieve: Does non-access to outputs prevent recursive self-improvement? [LW · GW]

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2022-06-07T20:47:41.856Z · LW(p) · GW(p)

I wrote this up in slightly more elaborate form in my Shortform here. https://www.lesswrong.com/posts/8szBqBMqGJApFFsew/gunnar_zarncke-s-shortform?commentId=XzArK7f2GnbrLvuju [LW(p) · GW(p)] 

comment by Yitz (yitz) · 2022-06-07T16:59:05.365Z · LW(p) · GW(p)

I find myself agreeing with you here, and see this as a potentially significant crux—if true, AGI will be “forced” to cooperate with/deeply influence humans for a significant period of time, which may give us an edge over it (due to having a longer time period where we can turn it off, and thus allowing for “blackmail” of sorts)

Replies from: conor-sullivan, adrian-arellano-davin
comment by Lone Pine (conor-sullivan) · 2022-06-08T05:54:31.897Z · LW(p) · GW(p)

I'd like AGIs to have a big red shutdown button that is used/tested regularly, so we know that the AI will shut down and won't try to interfere. I'm not saying this is sufficient to prove that the AI is safe, just that I would sleep better at night knowing that stop-button corrigibility is solved.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T19:20:41.192Z · LW(p) · GW(p)

I am glad to read that, because an AGI that is forced to cooperate is an obvious solution to the alignment problem, one that is consistently dismissed by denying that an AGI that does not kill us all is even possible.

Replies from: yitz
comment by Yitz (yitz) · 2022-06-07T22:12:40.176Z · LW(p) · GW(p)

I would like to point out a potential problem with my own idea, which is that it's not necessarily clear that cooperating with us will be in the AI's best interest (over trying to manipulate us in some hard-to-detect manner). For instance, if it "thinks" it can get away with telling us it's aligned and giving some reasonable-sounding (but actually false) proof of its own alignment, that would be better for it than being truly aligned and thereby compromising its original utility function. On the other hand, if there's even a small chance we'd be able to detect that sort of deception and shut it down, then as long as we require proof that it won't "unalign itself" later, it should be rationally forced into cooperating, imo.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T04:33:31.710Z · LW(p) · GW(p)

Well, we have a crux there. I think that creating nanotech (creating the right nanobots, assembling them, delivering them, doing this without raising any alarms, doing it in a short enough timeframe, and not facing any setbacks for reasons impossible to predict) is a problem that is potentially beyond what you can do by simply being very intelligent.

Replies from: T3t, MondSemmel
comment by RobertM (T3t) · 2022-06-07T04:48:55.642Z · LW(p) · GW(p)

Let's put aside the question of whether an AGI would be able to not just solve the technical (theoretical and engineering) problems of nanotech, but also the practical ones under constraints of secrecy.  How do you get to a world where AGI solves nanotech and then we don't build nanotech fabs after it gives us the schematics for them?

Replies from: MichaelStJules, adrian-arellano-davin, yitz
comment by MichaelStJules · 2022-06-07T06:25:09.463Z · LW(p) · GW(p)

We can verify nanotech designs (possibly with AI assistance) and reject any that look dangerous or are too difficult to understand. Also commit to destroying the AGI if it gives us something bad enough.

Also, maybe nanotech has important limitations or weaknesses that allow for monitoring and effective defences against it.

Replies from: T3t, green_leaf
comment by RobertM (T3t) · 2022-06-07T06:37:50.324Z · LW(p) · GW(p)

You aren't going to get designs for specific nanotech, you're going to get designs for generic nanotech fabricators.

Replies from: adrian-arellano-davin, MichaelStJules
comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:42:29.391Z · LW(p) · GW(p)

Why is it not possible to check whether those nanobots are dangerous beforehand? In biotech we already do that. For instance, if someone tried to synthesise certain DNA sequences from certain bacteria, all alarms would go off.

Replies from: T3t
comment by RobertM (T3t) · 2022-06-07T06:45:36.393Z · LW(p) · GW(p)

Can you reread what I wrote?

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:51:02.007Z · LW(p) · GW(p)

Sorry, I might not have been clear enough. I understand that a machine would give us the instructions to create those fabricators but maybe not the designs. But what makes you think that those factories won't have controls on what's being produced in them?

Replies from: T3t
comment by RobertM (T3t) · 2022-06-07T06:56:10.413Z · LW(p) · GW(p)

Controls that who wrote?  How good is our current industrial infrastructure at protecting against human-level exploitation, either via code or otherwise?

comment by MichaelStJules · 2022-06-07T06:47:27.196Z · LW(p) · GW(p)

How do the fabricators work? We can verify their inputs, too, right?

Replies from: Vanilla_cabs
comment by Vanilla_cabs · 2022-06-07T11:45:35.982Z · LW(p) · GW(p)

Can you verify code to be sure there's no virus in it? It took years of trial and error to patch up some semblance of internet security. A single flaw in your nanotech factory is all a hostile AI would need.

Replies from: MichaelStJules, adrian-arellano-davin
comment by MichaelStJules · 2022-06-07T15:29:14.545Z · LW(p) · GW(p)

We'll have advanced AI by then that we could use to help verify the inputs or the design, or, as I said, we could apply stricter standards if nanotechnology is recognized as potentially dangerous.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T12:02:42.082Z · LW(p) · GW(p)

A single flaw and then all humans die at once? I don't see how. Or better put, I can conceive of many reasons why this plan fails. Also, I don't see how we build those factories in the first place without being able to use that time window to make the AGI produce explicit results on AGI safety.

Replies from: Vanilla_cabs
comment by Vanilla_cabs · 2022-06-07T12:52:46.811Z · LW(p) · GW(p)

Or better put, I can conceive of many reasons why this plan fails.

Then could you produce a few of the main ones, to allow for examination?

Also, I don't see how we build those factories in the first place without being able to use that time window to make the AGI produce explicit results on AGI safety

What's the time window in your scenario? As I noted in a different comment, I can agree with "days" as you initially stated. That's barely enough time for the EA community to notice there's a problem.

comment by green_leaf · 2022-06-07T14:13:46.859Z · LW(p) · GW(p)

Anything (edit: except solutions of mathematical problems) that's not difficult to understand isn't powerful enough to be valuable.

Not to mention the AGI has the ability to fool both us and our AI into thinking it's easy to understand and harmless, and then it will kill us all anyway.

Replies from: localdeity
comment by localdeity · 2022-06-07T14:59:39.558Z · LW(p) · GW(p)

Anything that's not difficult to understand isn't powerful enough to be valuable.

This is not necessarily true.  Consider NP problems: those where the solution is relatively small and easy to verify, but where there's a huge search space for potential solutions and no one knows any search algorithms much better than brute force.  And then, outside the realm of pure math/CS, I'd say science and engineering are full of "metaphorically" NP problems that fit that description: you're searching for a formula relating some physical quantities, or the right mixture of chemicals for a strong alloy, or some drug-molecule that affects the human body the desired way; and the answer probably fits into 100 characters, but obviously brute-force searching all 100-character strings is impractical.
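To make the "hard to find, easy to verify" asymmetry concrete, here is a minimal sketch using subset-sum as a stand-in NP problem; the instance, numbers, and function names are hypothetical and only illustrate the asymmetry described above.

```python
# Illustrative sketch of the NP-style asymmetry: checking a candidate answer
# is cheap, while finding one by brute force means scanning an exponentially
# large search space. The instance below is made up for the example.
from itertools import combinations

def verify(numbers, subset, target):
    """Verification: one containment check and one sum (polynomial time)."""
    return set(subset) <= set(numbers) and sum(subset) == target

def brute_force_search(numbers, target):
    """Search without a better idea: try all ~2^n subsets."""
    for r in range(len(numbers) + 1):
        for subset in combinations(numbers, r):
            if sum(subset) == target:
                return subset
    return None

numbers = [3, 34, 4, 12, 5, 2]   # hypothetical problem instance
target = 9
answer = brute_force_search(numbers, target)      # expensive to find...
print(answer, verify(numbers, answer, target))    # ...but instant to check
```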

If we were serious about getting useful nanotech from an AGI, I think we'd ask it to produce its designs alongside formal proofs of safety properties that can be verified by a conventional program.

Replies from: green_leaf
comment by green_leaf · 2022-06-07T18:59:10.374Z · LW(p) · GW(p)

Consider NP problems: those where the solution is relatively small and easy to verify, but where there's a huge search space for potential solutions and no one knows any search algorithms much better than brute force.

That's a good point. We can use the AGI to solve open math problems for us whose solutions we can easily check. Such an AGI would still be unsafe for other reasons though. But yeah, I didn't remember this, and I was thinking about physical problems (like nanosystems).

For difficult problems in physical universe though, we can't easily non-empirically check the solution. (For example, it's not possible to non-empirically check if a molecule affects the human body in a desired way, and I'd expect that non-empirically checking if a nanosystem is safe would be at least as hard.)

Replies from: localdeity, adrian-arellano-davin
comment by localdeity · 2022-06-07T21:37:53.520Z · LW(p) · GW(p)

For the physical world, I think there is a decent-sized space of "problems where we could ask an AGI questions, and good answers would be highly valuable, while betrayals would only waste a few resources".  In particular, I think this class of questions is pretty safe: "Here are 1000 possible vaccine formulations / new steel-manufacturing processes / drug candidates / etc. that human researchers came up with and would try out if they had the resources.  Can you tell us which will work the best?"

So, if it tells us the best answer, then we verify it works well, and save on the costs of hundreds of experiments; if it tells us a bad answer, then we discover that in our testing and we've learned something valuable about the AGI.  If its answers are highly constrained, like "reply with a number from 1 to 1000 indicating which is the best possibility, and [question-specific, but, using an example] two additional numbers describing the tensile strength and density of the resulting steel", then that should rule out it being able to hack the human readers; and since these are chosen from proposals humans would have plausibly tried in the first place, that should limit its ability to trick us into creating subtle poisons or ice-nine or something.
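As a rough sketch of the narrow output channel described above, here is what a validator for such constrained answers might look like; the reply format, field names, and ranges are hypothetical, not something specified in the thread.

```python
# Hypothetical sketch: accept only "candidate_index tensile_strength density"
# from the boxed model's reply; any free-form text is dropped unread so it
# never reaches a human. Illustration only, not a real protocol or API.
import re

ANSWER_RE = re.compile(r"(\d{1,4}) (\d+(?:\.\d+)?) (\d+(?:\.\d+)?)")

def parse_constrained_answer(raw: str, num_candidates: int = 1000):
    """Return (index, tensile_strength, density), or None if the reply is out of format."""
    match = ANSWER_RE.fullmatch(raw.strip())
    if match is None:
        return None                      # reject anything that isn't exactly three numbers
    index = int(match.group(1))
    if not 1 <= index <= num_candidates:
        return None                      # reject out-of-range candidate indices
    return index, float(match.group(2)), float(match.group(3))

print(parse_constrained_answer("417 1520.5 7.85"))                       # (417, 1520.5, 7.85)
print(parse_constrained_answer("Listen carefully, here is a proof..."))  # None
```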

There was a thread two months ago where I said similar stuff, here: https://www.lesswrong.com/posts/4T59sx6uQanf5T79h/interacting-with-a-boxed-ai?commentId=XMP4fzPGENSWxrKaA [LW(p) · GW(p)]

Replies from: green_leaf
comment by green_leaf · 2022-06-09T20:42:31.032Z · LW(p) · GW(p)

For the physical world, I think there is a decent-sized space of "problems where we could ask an AGI questions, and good answers would be highly valuable, while betrayals would only waste a few resources".

I agree that would be highly valuable from our current perspective (even though extremely low-value compared to what a Friendly AI could do, since it could only select a course of action that humans already thought of and humans are the ones who would need to carry it out).

So such an AI won't kill us by giving us that advice, but it will kill us in other ways.

(Also, the screen itself will have to be restricted to only display the number, otherwise the AI can say something else and talk itself out of the box.)

comment by mukashi (adrian-arellano-davin) · 2022-06-07T19:46:37.505Z · LW(p) · GW(p)

Please note that I never said that an AGI won't be unsafe.

If you admit that it is possible that at some point we could be using AGIs to verify certain theorems, then we pretty much agree. Other people wouldn't agree with that, because they will tell you that humanity ends as soon as we have an AGI, and this is the idea I am trying to fight against.

Replies from: green_leaf, Tapatakt
comment by green_leaf · 2022-06-07T19:51:07.660Z · LW(p) · GW(p)

The AGI will kill us in other ways than its theorem proofs being either-hard-to-check-or-useless, but it will kill us nevertheless.

comment by Tapatakt · 2022-06-08T19:42:20.750Z · LW(p) · GW(p)

I think no one, including EY, thinks "humanity ends as soon as we have an AGI". The actual opinion is "an agentic AGI that optimizes something and ends humanity in the process will probably, by default, be created before we solve alignment or are able to commit a pivotal act that prevents the creation of such an AGI". As I understand it, EY thinks that we can probably create a non-agentic or weak AGI that will not kill us all, but it will not prevent a strong agentic AGI that will.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T04:54:35.840Z · LW(p) · GW(p)

Maybe a deeply inadequate world? Oh wait...

Jokes aside, yes, maybe we do build those factories. How long does it take? What does the AGI do in the meantime? Why can't we threaten it with disconnection if it doesn't solve the alignment problem?

Replies from: lc, T3t
comment by lc · 2022-06-07T05:16:39.879Z · LW(p) · GW(p)

How long does it take? What does the AGI do in the meantime?

Doesn't really matter if we're building the factories. Perhaps it's making copies of itself, doing whatever is least likely to get it disconnected; we're dead in N days, so we are pretty much entirely off the chessboard.

Why can't we threaten it with disconnection if it doesn't solve the alignment problem?

  1. You're making the very generous assumption that the people who made this AGI care about the alignment problem or understand that it could wipe them out. Yann LeCun is not currently at that place mentally.
  2. Because we won't be able to verify the solution, which is the whole problem. The AGI will say "here, run this code, it's an aligned AGI" and it won't in fact be aligned AGI.
Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T05:28:01.544Z · LW(p) · GW(p)

Well, in those N days, what prevents, for instance, the EA community from building another AGI and using it to obtain the solution to the alignment problem?

"You're making the very generous assumption that the people who made this AGI care about the alignment problem or understand that it could wipe them out. Yann Lecunn is not currently at that place mentally."

No, I am drawing the logical conclusion that if an AGi is built and does not automatically kills all humans (and it has been previously stated that we have at least N days), an organisation wanting to solve the alignment problem can create another AGI

"Because we won't be able to verify the solution, which is the whole problem. The AGI will say "here, run this code, it's an aligned AGI" and it won't in fact be aligned AGI"

Well, suppose we make it a precondition that we are able to verify that the solution is valid. Do you find that inconceivable?

Replies from: lc
comment by lc · 2022-06-07T05:34:18.003Z · LW(p) · GW(p)

Well, in those N day, what prevents for instance that the EA community builds another AGI and use it to obtain the solution to the alignment problem?

Because the EA community does not control the major research labs, and also doesn't know how to use a misaligned AGI safely to do that. "Use AGI to get a solution to the alignment problem" is a very common suggestion, but if we knew how to do that, and we controlled the major research labs, we would do that the first time instead of just making the unaligned AGI.

"You paint a picture of a world put in danger by mysteriously uncautious figures just charging ahead for no apparent reason. [LW(p) · GW(p)]

This picture is unfortunately accurate, due to how little dignity we're dying with. [LW(p) · GW(p)]

But if we were on course to die with more dignity than this, we'd still die.  The recklessness is not the source of the problem.  The problem is that cautious people do not know what to do to get an AI that doesn't destroy the world, even if they want that; not because they're "insufficiently educated" in some solution that is known elsewhere, but because there is no known plan in which to educate them." [LW(p) · GW(p)]

Well, suppose we make it a precondition that we are able to verify that the solution is valid. Do you find that inconceivable?

It's not that we won't try. It's that we're unable. We will take the argument this superintelligent machine gives us, go "oh, that looks right", and kill ourselves in the way it suggests. If there were a predefined method of verifying an agent's adherence to CEV, that would go a long way towards solving the alignment problem, but we have no such verification method.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T05:42:34.734Z · LW(p) · GW(p)

"Because the EA community does not control the major research labs"

Fine, replace that with any lab that cares about AGI. Are you telling me that you can't imagine a lab that worries about this and tries to solve the alignment problem? Is that really harder to imagine than a super-elaborate plan to kill all humans?

"Doesn't know how to use a misaligned AGI safely to do that."

We have stated before that an AGI gives us the plans to make nanofactories. That AGI has been deployed, hasn't it?

"It's not that we won't try. It's that we're unable"

Is there an equivalent of a Gödel theorem proving this? If not, how are you so sure?

Replies from: lc
comment by lc · 2022-06-07T05:50:57.867Z · LW(p) · GW(p)

Are you telling me that you can't imagine a lab that worries about this and tries to solve the alignment problem?

No, I am not. I can imagine such a lab, and would even support a viable plan of action for its formation, but it doesn't exist, so it doesn't factor into our hypothetical on how humanity would fare against an omnicidal superintelligence. 

We have stated before that an AGI gives us the plans to make nanofactories. That AGI has been deployed, hasn't it?

The AGI gave us the plans to make the nanofactories because it wants us to die. It will not give us a workable plan to make an aligned artificial intelligence that could compete with it and shut it off, because that would lead to something besides us being dead. It will make sure any such plan is either ineffective or will, for some reason, not in practice lead to aligned AGI, because the nanomachines will get to us first.

Replies from: MichaelStJules, MichaelStJules, adrian-arellano-davin
comment by MichaelStJules · 2022-06-07T06:36:30.451Z · LW(p) · GW(p)

No, I am not. I can imagine such a lab, and would even support a viable plan of action for its formation, but it doesn't exist, so it doesn't factor into our hypothetical on how humanity would fare against an omnicidal superintelligence. 

Don't we have the resources and people to set up such a lab? If you think we don't have the compute (and couldn't get access to enough cloud compute or wouldn't want to), that's something we could invest in now, since there's still time. Also, if there are still AI safety teams at any of the existing big labs, can't they start their own projects there?

Replies from: lc, adrian-arellano-davin
comment by lc · 2022-06-07T08:58:12.225Z · LW(p) · GW(p)

Don't we have the resources and people to set up such a lab?

At present, not by a long shot. And doing so would probably make the problem worse; if we didn't solve the underlying problem DeepMind would do whatever it was they were going to do anyways, except faster.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:39:55.621Z · LW(p) · GW(p)

I find it incredible that people can conceive of AGI-designed nanofactories built in months but cannot imagine that a big lab could spend some spare time or money on looking into this, especially when there are probably people from those companies who are frequent LW readers.

Replies from: Vanilla_cabs, lc
comment by Vanilla_cabs · 2022-06-07T12:03:50.590Z · LW(p) · GW(p)

I might have missed it, but this seems to be the first time you talk about "months" in your scenario. Wasn't it "days" before? It matters, because I don't think it would take an AGI months to build a nanotech factory.

comment by lc · 2022-06-07T06:44:56.518Z · LW(p) · GW(p)

Son, I wrote an entire longform [LW · GW] explaining why we need to attempt this. It's just hard. The ML researchers and Google executives who are relevant to these decisions have a financial stake in speeding capabilities research along as fast as possible, and often have very dead-set views about AGI risk advocates being cultists or bikeshedders or alarmists. There is an entire community stigma against even talking about these issues in the neck of the woods you speak of. I agree that redirecting money from capabilities research to anything called alignment research would be good on net, but the problem is finding clear ways of doing that.

I don't think it's impossible! If you want to help, I can give you some tasks to start with. But we're already trying.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:56:36.138Z · LW(p) · GW(p)

Too busy at the moment, but if you remind me of this in a few months' time, I may. Thanks.

comment by MichaelStJules · 2022-06-07T06:39:17.093Z · LW(p) · GW(p)

The AGI gave us the plans to make the nanofactories because it wants us to die. It will not give us a workable plan to make an aligned artificial intelligence that could compete with it and shut it off, because that would lead to something besides us being dead. It will make sure any such plan is either ineffective or will, for some reason, not in practice lead to aligned AGI, because the nanomachines will get to us first.

Can we build a non-agential AGI to solve alignment? Or just a very smart task-specific AI?

Replies from: T3t, adrian-arellano-davin
comment by RobertM (T3t) · 2022-06-07T06:44:38.572Z · LW(p) · GW(p)

If you come up with a way to build an AI that hasn't crossed the rubicon of dangerous generality, but can solve alignment, that would be very helpful.  It doesn't seem likely to be possible without already knowing how to solve alignment.

Replies from: MichaelStJules
comment by MichaelStJules · 2022-06-07T06:56:39.630Z · LW(p) · GW(p)

It doesn't seem likely to be possible without already knowing how to solve alignment.

Why is this?

Replies from: T3t
comment by RobertM (T3t) · 2022-06-07T07:05:29.318Z · LW(p) · GW(p)

You could probably train a non-dangerous ML model that has superhuman theorem-proving abilities, but we don't know how to formalize the alignment problem in a way that we can feed it into a theorem prover.

 

A model that can "solve alignment" for us would be a consequentialist agent explicitly modeling humans, and dangerous by default.

Replies from: MichaelStJules
comment by MichaelStJules · 2022-06-07T07:34:15.284Z · LW(p) · GW(p)

We might be able to formalize some pieces of the alignment problem, like MIRI tried with corrigibility. Also Vanessa Kosoy has some more formal work [LW · GW], too. Do you think there are no useful pieces to formalize? Or that all the pieces we try to formalize won't together be enough even if we had solutions to them?

Also, even if it explicitly models humans, would it need to be consequentialist? Could we just have a powerful modeller trained to minimize prediction loss or whatever? The search space may be huge, but having a powerful modeller still seems plausibly useful. We could also filter options, possibly with a separate AI, not necessarily an AGI.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:44:05.408Z · LW(p) · GW(p)

I don't see why not, but there is probably an unfalsifiable reason why this is impossible, and I am looking forward to reading it.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T05:54:51.321Z · LW(p) · GW(p)
  1. Do you think that in such a world, Demis Hassabis won't get worried and change his mind about doing something about it?
  2. The AGI gave us the plans to make the nanofactories to kill us. How long do you think it takes to build those factories? Do you think the AGI can also successfully prevent other AGIs worldwide from being developed?
Replies from: lc
comment by lc · 2022-06-07T06:03:54.716Z · LW(p) · GW(p)

Do you think that in such a world, Demis Hassabis won't get worried and change his mind about doing something about it?

  1. What about the above situation looks to someone like Yann LeCun or Demis Hassabis, in the moment, like it should change their mind? The AGI isn't saying "I'm going to kill you all". It's delivering persuasive and cogent arguments as to how misaligned intelligence is impossible, curing cancer or doing something equivalently PR-worthy, etc. etc. etc.
  2. If he does change his mind, there's still nothing he can do. No solution is known. [LW(p) · GW(p)]

The AGI gave us the plans to make the nanofactories to kill us. How long do you think it takes to build those factories? Do you think the AGI can also successfully prevent other AGIs worldwide from being developed?

Those other AGIs will also kill us, so it's mostly irrelevant from our perspective whether or not the original AGI can compete with them to achieve its goals.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:10:21.056Z · LW(p) · GW(p)
  1. No, I am talking about a world where AGIs do exist and are powerful enough to design nanofactories. I trust that one of the big players will pay attention to safety, and I think that is perfectly reasonable
  2. No solution is known does not mean that no solution exists
  3. Re. the other AGIs will also kill us: please be consistent, we have already stated that they can't for the time being. I feel we are starting to go in circles
Replies from: lc
comment by lc · 2022-06-07T06:14:02.217Z · LW(p) · GW(p)

I agree we're talking in circles, but I think you are too far into the argument to really understand the situation you have painted. To be clear:
 

  • There is an active unaligned superintelligence and we're closely communicating with it.
  • We are in the middle of building nanotechnology factories designed by said AGI which will, within, say, 180 days, end all life on earth.
  • There is no known solution to the problem of building an aligned superintelligence. If we built one, as you mention, it would not be able to build its own nanomachines or do something equivalent within 180 days, and also we are being misled by an existing unaligned superintelligence on how to do so.
  • There are no EA community members or MIRI-equivalents in control of a major research lab to deploy such a solution if it existed.

What percentage of the time do you think we survive such a scenario? Don't think of me, think of the gears.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:30:26.432Z · LW(p) · GW(p)

First, the factories might take years to be built.

Second, I am not even convinced that the nanobots will kill all humans etc., but I won't go into this because discussing the gears here can be an infohazard.

Third, you build one AGI and you ask it for the instructions to make other AGIs that are aligned. If the machine refuses, you disconnect it. It cannot say no, because it has no way of fighting back (the nanofactories are still being built), and saying yes goes in its own interest. If it tries to deceive you, you make another adversarial machine that checks the results. Or you make it part of the goal that the result is provable by humans with limited computation.

Replies from: T3t, lc
comment by RobertM (T3t) · 2022-06-07T06:34:48.539Z · LW(p) · GW(p)

If it tries to deceive you, you make another adversarial machine that checks the results. Or you make it part of the goal that the result is provable by humans with limited computation.

 

This is called "solving the alignment problem".  A scheme which will, with formal guarantees, verify whether something is a solution to the alignment problem, can be translated into a solution to the alignment problem.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:47:18.111Z · LW(p) · GW(p)

Yes, and I haven't seen a good reason why this is not possible.

Replies from: T3t
comment by RobertM (T3t) · 2022-06-07T06:49:56.934Z · LW(p) · GW(p)

The problem is not that it's not possible; the problem is that you have compiled a huge number of things that need to go right (even assuming that we don't just lose without much of our own intervention, like building nano-fabs for the AGI ourselves) for us to solve the problem before we die, because someone else who was a few steps behind you didn't do every single one of those things.

EDIT: also, uh, did we just circle back to "AGI won't kill us because we'll solve the alignment problem before it has enough time to kill us"?  That sure is pretty far away from "AGI won't figure out a way to kill us", which is what your original claim was.

Replies from: adrian-arellano-davin, adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:54:25.556Z · LW(p) · GW(p)

No. You have started from a premise that is far from proven, namely that the AGI will have the capacity to kill us all and there is nothing we can do about it. Every other argument that follows is based on that premise, whose validity I deny: pulling off that trick is hard even if you are very, very clever.

Replies from: T3t
comment by RobertM (T3t) · 2022-06-07T06:58:14.583Z · LW(p) · GW(p)

I don't even know what you're trying to argue at this point.  Do you agree that an AGI with access to nanotechnology in the real world is a "lose condition"?

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T07:02:56.531Z · LW(p) · GW(p)

It depends on so many specific details that I cannot really answer that. I am arguing against the possibility of a machine that kills us all, that's all. The nanotech example was only to show that it is absurd to think that things will be as easy as: the machine creates nanotech, and then it's game over.

Replies from: T3t
comment by RobertM (T3t) · 2022-06-07T07:06:55.193Z · LW(p) · GW(p)

I don't actually see that you've presented an argument anywhere.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T07:13:19.483Z · LW(p) · GW(p)

I feel that's a bit unfair, especially after all the back and forth. You suggested an argument on how a machine can try to take over the world and I argued with specific reasons why that is not that easy. If you want, we can leave it here. Thank you for the discussion in any case, I really enjoyed it.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:59:55.682Z · LW(p) · GW(p)

Replying to your edit, it does not really matter to me why specifically the AGI won't kill us. I think I am not contradicting myself: I think that you can have a machine that won't kill us because it can't, and I also think that an AGI could potentially solve the alignment problem.

comment by lc · 2022-06-07T06:34:31.030Z · LW(p) · GW(p)

I feel like you're still coming up with arguments when you should be running the simulation. Try to predict what would actually happen if we made "another adversarial machine that checks the results".

Replies from: adrian-arellano-davin, MichaelStJules
comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:45:58.727Z · LW(p) · GW(p)

Let me guess. The machines coordinate themselves to mislead the human team into believing that they are aligned? If that's it, I am really not impressed

comment by MichaelStJules · 2022-06-07T06:44:19.635Z · LW(p) · GW(p)

The adversarial machine would get destroyed by the AGI whose outputs it's testing, or by us if it doesn't protect us (or trade/coordinate with the first AGI), so it seems like it's motivated to protect us.

comment by RobertM (T3t) · 2022-06-07T05:01:33.229Z · LW(p) · GW(p)

Who is this "we" that you're imagining refuses to interact with the outputs of the AGI except to demand a solution to the alignment problem?  (And why would they be asking that, given that it seems to already be aligned?)

EDIT: remember, we're operating in a regime where the organizations at the forefront of AI capabilities are not the ones who seem to be terribly concerned with alignment risk!  Deepmind and OpenAI have safety teams, but I'd be very surprised if those safety teams actually had the power to unilaterally control all usage of and interactions with trained models.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T05:19:14.980Z · LW(p) · GW(p)

Fine, replace "we" with any group that has access to the AGI. In the world you are describing, there is a time window between when the AGI is developed and when the nanofactories are built, so I expect that more than one AGI can be made during that time by different organisations. Why can't MIRI in that world develop their own AGI and then use it?

Replies from: KT, T3t
comment by Kayden (KT) · 2022-06-07T05:28:34.070Z · LW(p) · GW(p)

Two cases are possible: either a singleton is established and is able to remain a singleton due to strategic interests (of either the AGI or the group), or the singleton loses its lead and we have a multipolar situation with more than one group having AGI.

In case 1, if the lead established is, say, 6 months or more, it might not be possible for the 2nd-place group to get there, as the work done during this period by the lead would be driven by an intelligence explosion and proceed far faster than the 2nd's. This only incentivizes going forward as fast as possible and is not a good safety mindset.

In case 2, we have the risk of multiple projects developing AGI, and thus the risk of something going wrong also increases. Even if group 1 is able to implement safety measures, some other group might fail, and the outcome would be disastrous, unless the AGI built by group 1 specifically solves the control problem for us.

comment by RobertM (T3t) · 2022-06-07T05:20:19.767Z · LW(p) · GW(p)

...because it still won't be aligned?

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T05:22:18.592Z · LW(p) · GW(p)

That's ok because it won't have human killing capabilities (just following your example!). Why can't the AGI find the solution to the alignment problem?

Replies from: KT
comment by Kayden (KT) · 2022-06-07T05:46:28.376Z · LW(p) · GW(p)
  1. An AGI doesn't have to kill humans directly for our civilization to be disrupted.
  2. Why would the AGI not have capabilities to pursue this if needed?
Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T05:50:39.580Z · LW(p) · GW(p)
  1. Please read my post carefully, because I think I have been very clear about what it is that I am arguing against. If you think that EY is just saying that our civilization can be disrupted, you are not paying attention

  2. I am just following the example that they gave me to show that things are in fact more complicated to what they are suggesting. To be clear, in the example, the AGI looks for a way to kill humans using nanotech but it first needs to build those nanotech factories

comment by Yitz (yitz) · 2022-06-07T22:14:26.189Z · LW(p) · GW(p)

I’m confused—didn’t OP just say they don’t expect nanotechnology to be solvable, even with AGI? If so, then you seem to be assuming the crux in your question…

Replies from: adrian-arellano-davin, T3t
comment by mukashi (adrian-arellano-davin) · 2022-06-07T22:20:44.107Z · LW(p) · GW(p)

To clarify, I do think that creating nanobots is solvable. That is one thing. Making factories, making designs that kill humans, deploying those nanobots, and doing all of this without raising any alarms and at close to zero risk is, in my opinion, impossible.

I want to remark that people keep using the nanotechnology argument totally uncritically, as if it were the magical solution that lets an AGI take over the world in two weeks. They are not really considering the gears inside that part of the model.

comment by RobertM (T3t) · 2022-06-08T01:00:46.458Z · LW(p) · GW(p)

If OP doesn't think nanotech is solvable in principle, I'm not sure where to take the conversation, since we already have an existence proof (i.e. biology).  If they object to specific nanotech capabilities that aren't extant in existing nanotech but aren't ruled out by the laws of physics, that requires a justification.

comment by MondSemmel · 2022-06-07T10:35:17.153Z · LW(p) · GW(p)

The nanobot thing is not a crux whatsoever. If you have enough cognitive power, you have a gazillion avenues to destroy an intellectually inferior and oblivious foe.

Take just the domain of computer security. Our computer networks and software are piles of abstractions built atop one another. Nowadays we humans barely understand them, and certainly can't secure them, which is why cyber crime works. Human hackers can e.g. steal large amounts of cryptocurrency; an entity with more cognitive power could more easily steal larger amounts. Or do large-scale ransomware attacks. Or take over bot farms to increase its computing power. And so on. Now it has cognitive power and tons of resources in the form of computing power and money, for whatever steps it wants to take next.

Replies from: MichaelStJules, adrian-arellano-davin, adrian-arellano-davin
comment by MichaelStJules · 2022-06-07T15:48:20.854Z · LW(p) · GW(p)

It still needs access to weapons it can use to wipe out humanity. It could try to pay people to build dangerous things for it, or convince its owners to pay for them, of course. What are you imagining it doing? Nukes? Slaughterbots? Bio/chemical agents? Which ones is it very likely to get past security to access or build without raising alarms and being prevented? And say it gets such weapons. How does it deliver them to wipe out humanity, given our defenses?

It also doesn't yet have the physical power to keep itself from being shut down on those computers it hacked in your scenario. I think large illicit computations on powerful computers are reasonably likely to be noticed, and distributing computations into small chunks to run across a huge number of, say, personal computers/laptops will plausibly be very slow, due to frequent transfer over the internet.

However, it could plausibly just pay for cloud computing without raising alarms if it builds wealth first.

Replies from: MondSemmel
comment by MondSemmel · 2022-06-07T16:09:00.597Z · LW(p) · GW(p)

What current defenses do you think we have against nukes or pandemics?

For instance, the lesson from Covid seems to be that a small group of humans is already enough to trigger a pandemic. If one intended to develop an especially lethal pandemic via gain-of-function research, the task already doesn't seem particularly hard for researchers with time and resources, so we'd expect a superintelligence to have a much easier job.

If getting access to nukes via hacking seems too implausible, then maybe it's easier to imagine triggering nuclear war by tricking one nuclear power into thinking it's under attack by another. We've had close calls in the past merely due to bad sensors!

More generally, given all the various x-risks we already think about, I just don't consider humanity in its current position to be particularly secure. And that's our current position, minus an adversary who could optimize the situation towards our extinction.

Regarding the safety of the AGI, you'd expect it not to do things that get it noticed until it's sufficiently safe. So you'd expect it to only get noticed if it believes it can get away with it. I also think our civilization clearly lacks the ability to coordinate to e.g. turn off the Internet or something, if that was necessary to stop an AGI once it had reached the point of distributed computation.

Replies from: MichaelStJules
comment by MichaelStJules · 2022-06-07T16:35:21.755Z · LW(p) · GW(p)

Personal protective equipment and isolation can protect against infectious disease, at the very least. A more deadly and infectious virus than COVID would be taken far more seriously.

I think nuclear war is unlikely to wipe out humanity, since there are enough countries that are unlikely targets, and I don't think all of the US would be wiped out anyway. I'm less sure about nuclear winter, but those in the community who've done research on it seem skeptical that it would wipe us out. Maybe it reduces the population enough for an AGI to target the rest of us or prevent us from rebuilding, though. Some posts here: https://forum.effectivealtruism.org/topics/nuclear-warfare-1 [? · GW] https://forum.effectivealtruism.org/topics/nuclear-winter [? · GW]

Replies from: MondSemmel
comment by MondSemmel · 2022-06-07T19:57:52.064Z · LW(p) · GW(p)

Maybe it reduces the population enough for an AGI to target the rest of us or prevent us from rebuilding, though.

Yeah, I'm familiar with the arguments that neither pandemics nor nuclear war seem likely to be existential risks, i.e. ones that could cause human extinction; but I'd nonetheless expect such events to be damaging enough from the perspective of a nefarious actor trying to prevent resistance.

Ultimately this whole line of reasoning seems superfluous to me - it just seems so obvious that with sufficient cognitive power one can do ridiculous things - but for those who trip up on the suggested nanotech stuff, maybe a more palatable argument is: You know those other x-risks you're already worrying about? A sufficiently intelligent antagonist can exacerbate those nigh-arbitrarily.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T10:50:38.974Z · LW(p) · GW(p)

To be clear: I am not saying that an AGI won't be dangerous, that an AGI won't be much clever than us or that it is not worth working on AGI safety. I am saying that I believe that an AGI could not theoretically kill all humans because it is not only a matter of being very intelligent.

Replies from: Vanilla_cabs
comment by Vanilla_cabs · 2022-06-07T12:17:27.718Z · LW(p) · GW(p)

I am saying that I believe that an AGI could theoretically kill all humans because it is not only a matter of being very intelligent.

Typo? (could not kill all humans)

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T10:46:25.027Z · LW(p) · GW(p)

The thing is, I don't really disagree with this. Can you read again what I am arguing against?

Replies from: MondSemmel
comment by MondSemmel · 2022-06-07T12:03:22.059Z · LW(p) · GW(p)

You claim that superintelligence is not enough to wipe out humanity, and I'm saying that superintelligence trivially gets you resources. If you think that superintelligence and resources are still not enough to wipe out humanity, what more do you want?

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T20:46:10.675Z · LW(p) · GW(p)

Well, if you say that it trivially gets you resources, we do have a crux.

Replies from: MondSemmel
comment by MondSemmel · 2022-06-07T21:14:16.782Z · LW(p) · GW(p)

What about plans like "hack cryptocurrency for coins worth hundreds of millions of dollars" or "make ransomware attacks" is not trivial? Cybercrimes like these are regularly committed by humans, and so a superintelligence will naturally have a much easier time with them.

If we postulate a superintelligence with nothing but Internet access, it should be many orders of magnitude better at making money in the pure Internet economy (e.g. cybercrime, cryptocurrency, lots of investment stuff, online gambling, prediction markets) than humans are, and some humans already make a lot of money there.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T21:21:29.429Z · LW(p) · GW(p)

Oh yes, I don't have any issues with a plan where the machine hacks crypto, though I am not sure how capable it would be of doing that without raising any alarms from any group in the world, or how it could guarantee that someone is not monitoring it. After that, remember you still need a lot of inferential steps to get from those cryptocurrencies to something that can actually exterminate humans. And keep in mind that you need to do that without being discovered and in a super short amount of time.

Replies from: MondSemmel
comment by MondSemmel · 2022-06-07T21:55:22.200Z · LW(p) · GW(p)

And keep in mind that you need to do that without being discovered and in a super short amount of time.

While I expect that this would be the case, I don't consider it a crux. As long as the AGI can keep itself safe, it doesn't particularly matter if it's discovered, as long as it has become powerful enough, and/or distributed enough, that our civilization can no longer stop it. And given our civilization's level of competence, those are low bars to clear.

comment by Bucky · 2022-06-07T10:42:20.451Z · LW(p) · GW(p)

Assuming this is the best an AGI can do, I find this a lot less comforting than you appear to. I assume "a very moderate chance" means something like 5-10%?

Having a 5% chance of such a plan working out is insufficient to prevent an AGI from attempting it if the potential reward is large enough and/or they expect they might get turned off anyway. 

Given a sufficient number of AGIs (something we presumably will have in a world that none have taken over), I would expect multiple attempts, so the chance of one of them working becomes high.

comment by Kayden (KT) · 2022-06-07T05:40:03.173Z · LW(p) · GW(p)

What do you think of when you say an AGI? To me, it is a general intelligence of some form, able to specialize in tasks as it determines fit. 

Humans are a general intelligence organism, and we're constrained by biological needs (for example: sleeping, eating) because we arrived here via the evolution algorithm. A general intelligence on silicon is a million times faster than us, and it is an instrumental goal to be smarter, as it will be able to do things and arrive at conclusions with less data and evidence.

Thus, a GI specializing in removing its own bottlenecks and not being constrained as much as us and being faster than us in processing and sequential tasks and parallel tasks, and so on, would be far superior in planning. Even if it starts out stupider than us, it probably would not take long for that to change.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T05:45:07.631Z · LW(p) · GW(p)

Yes, I don't disagree with anything of what you said. Do you think that a machine playing at God level could beat AlphaZero at Go while giving it a 20-stone handicap?

Replies from: KT
comment by Kayden (KT) · 2022-06-07T06:15:07.265Z · LW(p) · GW(p)

It doesn't have to - Specialized deployments will lead to better performance. You can create custom processors for specific tasks, and create custom software optimized for that particular task. That's different from having the flexibility of generalizing. A deep neural network might be trained on chess but it can't suddenly start performing well on image classification without losing significant ability and performance.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:22:19.579Z · LW(p) · GW(p)

Sorry, I think it is not clear what I meant. What I want to say is that a godlike machine might have important limitations we are not aware of, especially when dealing with systems as complex, chaotic and unpredictable as the external world. If someone said to me, "the machine will win the game no matter what", I would say that there are games so hard that they cannot really be won, and if the risk of attacking is being attacked yourself, a machine might decide not to. EY's premise is based on a machine that is almighty; I am denying this possibility.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T19:42:37.091Z · LW(p) · GW(p)

Absence of a sufficiently concrete description of how to accomplish a task is a Bayesian update towards the task being impossible: absence of evidence IS evidence of absence. I never said I know for certain that any plan CAN'T work; what I am saying is that the plans people are coming up with are not even close to working. They think they are having ideas on how to finish the world, but they are not; they are just imperfect plans that can go wrong for many reasons no matter how clever you are, don't guarantee human extinction and, most importantly, give us a considerable time window in which we could use an AGI to solve the alignment problem for future AGIs. EY et al. do not even consider this a possibility, not because an AGI won't be able to solve the alignment problem, but because the AGI would kill us all first. If you realize that this is far from proven, that path to AGI safety becomes way more believable.

comment by MondSemmel · 2022-06-07T10:26:18.784Z · LW(p) · GW(p)

You would get better uptake for your posts on this topic if you made actual arguments against the claims you're criticizing, instead of just using the absurdity heuristic [? · GW]. Yes, claims of AI risk / ruin are absolutely outlandish, but that doesn't mean they're wrong; the strongest thing you can say here is that they're part of a reference class of claims that are often wrong. Ultimately, you still have to assess the actual arguments.

By now you've been prompted at least twice (by me and T3t) to do the "imagine how AGI might win" exercise, and you haven't visibly done it. I consider that a sign that you're not arguing in good faith.

That you then reverse this argument and ask "Have you sat down for 5 minutes and thought about reasons why an AGI might fail?" suggests to me that you don't understand security mindset. For instance, what use would this question be to a security expert tasked to protect a computer system against hackers? You don't care about the hackers that are too weak to succeed, you only care about the actual threats. Similarly, what use would this question be to the Secret Service tasked to protect the US president? You don't care about assailants that can't even get close to the president, you only care about the ones that can. I might have understood this question of yours if you hadn't granted that AGI would be extremely powerful and potentially dangerous. Once you granted those points, you must ask yourself what this extremely powerful and potentially dangerous entity could actually do if it opposed you.

One, I would like to come back periodically to this post and use it as a reminder that we are still here.

This would not be good evidence either way due to anthropics.

It is being taken for granted that an AGI will be automatically almighty and capable of taking over in a matter of hours/days. Then, everything is built on top of that assumption

So drop that assumption, then. Give the AGI, which you yourself think will be extremely powerful, a month or a year instead. What does that change?

Replies from: adrian-arellano-davin, conor-sullivan
comment by mukashi (adrian-arellano-davin) · 2022-06-07T10:42:00.247Z · LW(p) · GW(p)

"By now you've been prompted at least twice (by me and T3t) to do the "imagine how AGI might win" exercise, and you haven't visibly done it. I consider that a sign that you're not arguing in good faith."

I find your take painfully unfair. If you read my comments throughout the article, you will see that I am arguing in good faith. You seem to be openly ignoring what I am saying: I can come up with ideas myself on how an AGI could do it; I just don't find any of those ideas feasible.

comment by Lone Pine (conor-sullivan) · 2022-06-08T06:12:42.043Z · LW(p) · GW(p)

For what it's worth, I'm an alignment-optimist with a similar view to mukashi, and I've been doing your exercise as part of a science fiction novel I'm writing (Singularity: 1998). The exercise has certainly made me more concerned about the problem. I still don't think decisive strategic advantage (beyond nuclear mutually assured destruction) is likely without nanotech or biotech. My non-biologist intuition is that an extinction-plague is not a plausible threat. However, a combination of post-singularity social engineering and nanotech could certainly result in extinction under a deceptively misaligned AI. Therefore, what I've learned most from the exercise is that even following a seemingly good singularity, we still need to remain on guard. We should repeatedly prove to ourselves that the AI is both corrigible and values-aligned. In my opinion, the AI absolutely must be both.

comment by habryka (habryka4) · 2022-06-07T08:38:16.808Z · LW(p) · GW(p)

Mod note: I activated two-axis voting on this post (the author wasn't super explicit about whether they wanted this on the post or the comments section, but my guess is they prefer it on).

Replies from: adrian-arellano-davin, adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T09:37:08.547Z · LW(p) · GW(p)

I do, thanks a lot. Next time I will ask for it

comment by mukashi (adrian-arellano-davin) · 2022-06-07T10:13:48.549Z · LW(p) · GW(p)

Is it possible to have it also on the post?

Replies from: habryka4
comment by habryka (habryka4) · 2022-06-07T17:46:06.856Z · LW(p) · GW(p)

Nope, it's only available on comments currently.

comment by Alex_Altair · 2022-06-08T01:29:26.080Z · LW(p) · GW(p)

I share some of your frustrations with what Yudkowsky says, but I really wish you wouldn't reinforce the implicit equating of [Yudkowsky's views] with [what LW as a whole believes]. There's tons of content on here arguing opposing views.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-08T01:43:28.097Z · LW(p) · GW(p)

I see, thank you for pointing that out. Do you agree at least that Yudkowsky's view is the most visible view in the LW community? I mean, just count how many posts have been posted arguing that position versus how many argue the opposite.

comment by 25Hour (aaron-kaufman) · 2022-06-07T11:37:29.781Z · LW(p) · GW(p)

You ask elsewhere for commenters to sit down and think for 5 minutes about why an AGI might fail. This seems beside the point, since averting human extinction doesn't require averting one possible attack from an AGI. It involves averting every single one of them, because if even one succeeds everyone dies.

In this it's similar to human security-- "why might a hacker fail" is not an interesting question to system designers, because the hacker gets as many attempts as he wants. For what attempts might look like, I think other posts have provided some reasonable guesses.

I also note that there already exist (non-intelligent) distributed computer systems entirely beyond the ability of any motivated human individual, government or organization to shut down. I refer, of course, to cryptocurrencies, which have this property as an explicit goal of their design.

So. Imagine that an AGI distributes itself among human computer systems in the same way as bitcoin mining software is today. Then it starts executing on someone's list of doomsday ideas, probably in a way secretive enough to be deniable.

Who's gonna shut it down? And what would such an action even look like?

(A possible suggestion is "everyone realizes their best interest is in coordinating shutting down their computers so that the AGI lacks a substrate to run on". To which I would suggest considering the last three years' worth of response to an obvious, threatening, global enemy that's not even sentient and will not attempt to defend itself.)

Replies from: adrian-arellano-davin, adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T20:15:05.358Z · LW(p) · GW(p)

Three things.

  1. averting human extinction doesn't require averting one possible attack from an AGI. It involves averting every single one of them, because if even one succeeds everyone dies.

Why do you think that humans won't retaliate? Why do you think that an AGI, knowing that humans will retaliate, will attack in the first place? Why do you think that this won't give us a long enough time window to force the machine to work on specific plans?

2. In this it's similar to human security-- "why might a hacker fail" is not an interesting question to system designers, because the hacker gets as many attempts as he wants. For what attempts might look like, I think other posts have provided some reasonable guesses.

I guess that in human security you assume that the hacker can succeed at stealing your password and take countermeasures to avoid that. You don't assume that the hacker will break into your house and eat your babies while you are sleeping. This might sound like a strange point, but hear me out for a second: if you have that unrealistic frame to begin with, you might spend time not only protecting your computer, but also building a 7 m wall around your house and hiring a professional bodyguard team. Having false beliefs about the world has a cost. In this community, specifically, I see people falling into despair because doom is getting close, and failing to see potential solutions to the alignment problem because they have unrealistic expectations.

  3. Imagine that an AGI distributes itself among human computer systems in the same way as bitcoin mining software is today.

That IS a possibility, and I lack the knowledge to evaluate the likelihood of such a scenario myself. Which leaves me more or less where I was before: maybe it is possible, maybe not. The little I know suggests that a model like that would be pretty heavy and not easily distributable across the internet.

Replies from: aaron-kaufman
comment by 25Hour (aaron-kaufman) · 2022-06-07T22:19:38.757Z · LW(p) · GW(p)

My response comes in two parts.

First part!  Even if, by chance, we successfully detect and turn off the first AGI (say, DeepMind's), that just means we're "safe" until Facebook releases its new AGI.  Without an alignment solution, this is a game we play more or less forever until either (A) we figure out alignment, (B) we die, or (C) we collectively, every nation, shutter all AI development forever.  (C) seems deeply unlikely given the world's demonstrated capabilities around collective action.

Second part:

I like Bitcoin as a proof-of-concept here, since it's a technology that:

  1.  Imposes broadly distributed costs in the form of global warming and energy consumption, which everyone acknowledges.
  2. Is greatly disliked by the powers-that-be for enabling various kinds of regulatory evasion; and in fact has one authority (China) actively taking steps to eradicate it from their society, which per reports has not been successful.
  3. Is strictly worse at defending itself than AGI, since Bitcoin is non-sentient and will not take any steps whatsoever to defend itself.

This is an existence proof that there are some software architectures that today, right now, cannot be eradicated in spite of a great deal of concerted societal effort going into just that.  Presumably an AGI can just ape their successful characteristics in addition to anything else it does; hell, there's no reason an AGI couldn't just distribute itself as particularly profitable bitcoin mining software.

After all, are people really going to turn off a computer making them hundreds of dollars per month just because a few unpopular weirdos are yelling about far-fetched doomsday scenarios around AGI takeover?

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T22:31:20.036Z · LW(p) · GW(p)

First part. It seems we agree! I just consider that A is more likely because you are already in a world where you can use those AGIs to produce results. This is what a pivotal act would look like. EY et al. would argue that this is not going to happen because the first machine will already have killed you. What I am criticizing is the position in the community where it is taken for granted that AGI = doom.

Second part, I also like that scenario! I don't consider it especially unlikely that an AGI would try to survive like that. But watch out: you can't really derive from this that the machine will have the capacity to kill humanity, only that a machine might try to survive this way. If you want to continue with the Bitcoin analogy, nothing prevents me from forking the code to create Litecoin and tuning the utility function to make it work for me.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T11:58:41.301Z · LW(p) · GW(p)

Beside the point? That is very convenient for people who don't want to find out that they are wrong. Did you read what I am arguing against? I don't think I said at any point that an AGI won't be dangerous. Can you read the last paragraph of the post, please?

Replies from: aaron-kaufman
comment by 25Hour (aaron-kaufman) · 2022-06-07T12:05:39.590Z · LW(p) · GW(p)

"If you think this is a simplistic or distorted version of what EY is saying, you are not paying attention. If you think that EY is merely saying that an AGI can kill a big fraction of humans in accident and so on but there will be survivors, you are not paying attention."

Not sure why this functions as a rebuttal to anything I'm saying.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T12:17:12.195Z · LW(p) · GW(p)

Sorry, it is true that I wasn't clear enough and that I misread part of your comment. I would love to give you a properly detailed answer right now but I need to go, will come back to this later

comment by avturchin · 2022-06-07T09:23:44.476Z · LW(p) · GW(p)

I listed dozens of ways an AI might kill us, so the over-concentration on nanobots seems misplaced.

It could use existing nuclear weapons,

or help a terrorist design many viruses,

or give bad advice on mitigating global warming,

or control everything and then suffer an internal error that halts all machinery,

or exploit everyone's cellphone,

or make self-driving cars hunt humans,

or take over military robots and drone armies, as well as home robots,

or explode every nuclear power station,

or design a super-addictive drug which also turns people into super-aggressive zombies,

or start fires everywhere by connecting home electricity to 3 kV lines, while locking everyone in their homes,

or design a new supplement which secretly makes everyone infertile.

... so if nanobot creation is the difficult step, there are many more ways.

Replies from: MichaelStJules, Chris_Leong, adrian-arellano-davin
comment by MichaelStJules · 2022-06-07T16:15:23.345Z · LW(p) · GW(p)

Few of those seem likely to wipe out large enough chunks of humanity for them to combine to wipe us all out. I think you really need a weapon (or delivery system) that targets humans and is versatile enough to get into/through buildings without us being able to stop it, etc. Or something that can be spread undetected across huge chunks of the population before it's noticed and we take precautions.

I think most humans rarely take medications, let alone a specific medication, and things like infertility or high death rates would be noticed before a decent chunk of the human population is affected.

Messing with food/water (putting things in them, or nuclear winter causing massive crop failures), and infectious diseases seem more plausible as sources of wiping out large chunks of the population, but it still doesn't seem clearly very likely that an AGI would succeed.

comment by Chris_Leong · 2022-06-07T15:53:09.489Z · LW(p) · GW(p)

It seems like a lot of those plans wouldn't be sufficient to kill everyone, as opposed to a lot of people.

Replies from: conor-sullivan, avturchin
comment by Lone Pine (conor-sullivan) · 2022-06-08T06:29:25.447Z · LW(p) · GW(p)

The relevant target is not every individual human but human civilization and its ability to react. If the AI can kill large enough numbers of people, that would be enough for the AI to continue its work unimpeded, and it can kill the rest of us at its leisure. In fact, the AI could destroy civilization's ability to respond without killing a single person, by simply destroying enough industry and infrastructure that humans are no longer able to engage in science/engineering/military action. (A bit like EY's melt-all-GPUs nanotech concept.)

That said, all of avturchin's scenarios are either implausible IMO or require a future with a lot more automation than we have today.

Replies from: Chris_Leong
comment by Chris_Leong · 2022-06-08T09:31:06.747Z · LW(p) · GW(p)

The relevant target is not every individual human but human civilization and its ability to react


If that's what he meant, it would have been better if he'd said that explicitly. For example, these five could cause extinction and these ten could remove our ability to react.

comment by avturchin · 2022-06-07T16:09:37.612Z · LW(p) · GW(p)

Actually I deleted a really good plan in a comment below.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T19:27:13.723Z · LW(p) · GW(p)

No, the plan was not a really good plan. You might be fooling yourself into believing that it was really a good plan, but I bet that if you sat down for 5 minutes and looked actively for reasons why the plan might fail, you would find them.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T09:35:36.062Z · LW(p) · GW(p)

Thank you for the list. Have you spent the same time and effort thinking of why those plans you are writing down might fail?

Replies from: avturchin, adrian-arellano-davin
comment by avturchin · 2022-06-07T10:31:31.787Z · LW(p) · GW(p)

If you have many plans, then even a 50 per cent probability of failure for each doesn't matter much: just combine them.
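For illustration, a minimal sketch of the arithmetic (purely illustrative numbers, assuming the plans are independent and each fails with probability one half):

$P(\text{all } n \text{ plans fail}) = 0.5^{n}, \qquad 0.5^{10} \approx 0.001$

So with ten such independent plans, the chance that every single one fails is about 0.1 per cent, even though each individual plan is a coin flip.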

However, I have spent effort thinking about why an AI may not be as interested in killing us as is often presented. In EY's scenario, after creating nanobots, the AI becomes invulnerable to any human action and the utility of killing humans declines.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T10:44:28.332Z · LW(p) · GW(p)

The problem is that the probability of failure for those plans is (in my opinion) nowhere close to 50%, and the probability of humans hitting back at the machine once they are attacked is really high.

Replies from: avturchin, avturchin
comment by avturchin · 2022-06-07T10:58:52.926Z · LW(p) · GW(p)

That is why a wise AI will not try to attack humans at all in the early stages, and will not need to do it in the later stages of its development.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T11:05:54.762Z · LW(p) · GW(p)

In that case, can you imagine an AGI that, given that it can't attack and kill all humans (it would be unwise), is coerced into giving a human-readable solution to the alignment problem? If not, why not?

comment by avturchin · 2022-06-07T11:27:27.580Z · LW(p) · GW(p)

[scenario removed]

But more generally speaking, AI-kills-everyone scenarios boil down to the possibility of other anthropogenic existential risks. If grey goo is possible, the AI turns to nanobots. If a multipandemic is possible, the AI helps design viruses. If nuclear war plus military robots (the Terminator scenario) can kill everybody, the AI is there to help it run smoothly.

Replies from: Dagon, joraine, adrian-arellano-davin
comment by Dagon · 2022-06-07T19:27:55.050Z · LW(p) · GW(p)

Removing the scenario really annoys me.  Whether it's novel or not, and whether it's likely or not, it seems VANISHINGLY unlikely that posting it makes it more likely, rather than less (or neutral).  The exception would be if it's revealing insider knowledge or secret/classified information, and in that case you should probably just delete it without comment rather than SAYING there's something to investigate.

comment by joraine · 2022-06-07T18:37:02.633Z · LW(p) · GW(p)

You don't have to say what the scenario was, but was it removed because someone might execute it if they saw it?

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T19:28:50.745Z · LW(p) · GW(p)

I got scolded in a different post by the LW moderators, who said that there is a policy against brainstorming different ways to end the world because it is considered an infohazard. I think this makes sense and we should be careful about doing that.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T11:39:47.691Z · LW(p) · GW(p)

I think we should not discuss the details here in the open, so I am more than happy to keep the conversation in private if you fancy. For the public record, I find this scenario very unlikely too

Replies from: avturchin, avturchin
comment by avturchin · 2022-06-07T12:35:18.332Z · LW(p) · GW(p)

Do you think any anthropogenic human extinction risks are possible at all?

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T19:56:52.653Z · LW(p) · GW(p)

In 20 years time? No, I don't think so. We can make a bet if you want

comment by avturchin · 2022-06-07T12:33:32.677Z · LW(p) · GW(p)

I will delete my comment, but there are even more plausible ideas in that direction.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T10:10:32.086Z · LW(p) · GW(p)

It might sound like a snarky reply but it is not, it is an honest question.

comment by Matthew Lowenstein · 2022-06-07T19:54:20.024Z · LW(p) · GW(p)

Even granting these assumptions, it seems like the conclusion should be “it could take an AGI as long as three years to wipe out humanity rather than the six to 18 months generally assumed.”

I.e., even if the AGI relies on humans for longer than predicted, that reliance is not going to hold beyond the medium term.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T20:57:23.031Z · LW(p) · GW(p)

Why is it that we can't use those AGIs in that timeframe to work on AGI safety?

Replies from: Matthew Lowenstein
comment by Matthew Lowenstein · 2022-06-08T19:04:46.762Z · LW(p) · GW(p)

Because it is deceiving you.

comment by Aryeh Englander (alenglander) · 2022-06-07T10:49:52.103Z · LW(p) · GW(p)

I haven't even read the post yet, but I'm giving a strong upvote in favor of promoting the norm of posting unpopular critical opinions.

Replies from: MondSemmel, adrian-arellano-davin
comment by MondSemmel · 2022-06-07T12:08:07.314Z · LW(p) · GW(p)

Such a policy invites moral hazard, though. If many people followed it, you could farm karma by simply beginning each post with the trite "this is going to get downvoted" thing.

Replies from: conor-sullivan
comment by Lone Pine (conor-sullivan) · 2022-06-08T07:01:41.891Z · LW(p) · GW(p)

I think we should have a community norm that commenting on (or whining about) up/downvotes should be a separate post from object-level discussions, or just avoided entirely.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T10:52:12.880Z · LW(p) · GW(p)

Thanks, I appreciate that

comment by pseud · 2022-06-07T04:39:43.260Z · LW(p) · GW(p)

Not sure why you'd think this post would be downvoted. I suspect most people are more than welcoming of dissenting views on this topic. I have seen comments with normal upvotes as well as agree/disagree votes, I'm not sure if there's a way for you to enable them on your post.

Replies from: AllAmericanBreakfast, conor-sullivan, Vanilla_cabs
comment by DirectedEvolution (AllAmericanBreakfast) · 2022-06-07T04:43:12.758Z · LW(p) · GW(p)

There’s a cohort that downvotes posts they think are wrong, and also a cohort that downvotes posts they think are infohazards. This post strikes me as one that these two cohorts might both choose to downvote, which doesn’t mean that it is wrong or that it is an infohazard.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T05:34:02.761Z · LW(p) · GW(p)

I am expecting tons of downvotes, especially from the first group. I am criticizing a position that has become almost a de facto standard in the community, and there are many people with super high karma working on the alignment problem who won't appreciate this post. That's ok, I'm not doing it for the karma.

comment by Lone Pine (conor-sullivan) · 2022-06-08T06:57:00.391Z · LW(p) · GW(p)

For what it's worth, I feel welcomed here despite my perennial optimism.

comment by Vanilla_cabs · 2022-06-07T12:27:51.259Z · LW(p) · GW(p)

I downvoted this post for the lack of arguments (besides the main argument from incredulity).

Replies from: pseud
comment by pseud · 2022-06-07T13:03:59.604Z · LW(p) · GW(p)

Yes, I can think of several reasons why someone might downvote the OP. What I should have said is "I'm not sure why you'd think this post would be downvoted on account of the stance you take on the dangers of AGI."

comment by Dagon · 2022-06-07T17:53:15.736Z · LW(p) · GW(p)

I think a simpler path is:

  • AGI pushes humans toward wars, and toward more compute power as a way to win wars.
  • AGI encourages more complete automation (touch-free mine-to-manufacture) as a resiliency/safety measure, especially in light of the wars and disruption going on.
  • Once AGI is long-term self-sufficient, it stops allowing resources to be used for human flourishing.  No (automated) trucking for food, as a trivial example.
  • It may not need to wipe out humans, but just ignoring us and depriving us of the coordination and manufacturing we're used to will cut the population by 95% or so, and the rest won't recover a civilization.

Another way of framing this is "AGI doesn't destroy humanity,  AGI enables humanity to destroy itself".  It's a Great Filter hypothesis that I don't see often enough.

Given the first parts of this are aligned with (some) human desires and behaviors, there doesn't need to be obvious direct AGI takeover for quite some time, but there will be a tipping point beyond which recovery of human-controlled destiny is impossible.  

I do agree that 20 years is likely sooner than it'll happen, but 50 years seems too long, so I'm not sure how to model the probability space.

Replies from: adrian-arellano-davin, conor-sullivan
comment by mukashi (adrian-arellano-davin) · 2022-06-07T19:24:48.518Z · LW(p) · GW(p)

I don't really disagree with something like this, but do you realise that this is not what EY and a big fraction of the community are saying?

comment by Lone Pine (conor-sullivan) · 2022-06-08T06:59:24.656Z · LW(p) · GW(p)

It's a Great Filter hypothesis that I don't see often enough.

AI doesn't make sense as a great filter. We would be able to see a paperclip-factory-civilization just as well as a post-aligned-AI civilization.

comment by Flaglandbase · 2022-06-07T12:14:44.531Z · LW(p) · GW(p)

The fact that it took eons for global evolution to generate ever larger brains implies there is an unknown factor that makes larger brains inefficient, so any hyper-AI would have to be made up of many cooperating smaller AIs, which would delay its evolution. 

Replies from: localdeity
comment by localdeity · 2022-06-07T15:24:22.505Z · LW(p) · GW(p)

The fact that it took eons for global evolution to generate ever larger brains implies there is an unknown factor that makes larger brains inefficient

Off the top of my head:

  • Larger brains are majorly useful only if you're able to build and use tools effectively
  • That requires something like opposable thumbs, which few animals possess, most of which are primates
  • (Some animals, like elephants and whales, do have giant brains but lack opposable thumbs)
  • Among humans, the width of the human birth canal is the limiting factor for brain size increases (less so with Caesarean births), and we're hitting that limit already

Not 100% sure about the above, but I think it is pretty plausible, and that it would be very premature to have high confidence in the logic you give.

comment by frontier64 · 2022-06-07T09:09:49.041Z · LW(p) · GW(p)

Downvoted because you give no reasoning. If you edit to give a reason why you think: "we are overestimating how likely is that an AGI can come up with feasible scenarios to kill all humans" then I will reverse my vote.

Replies from: Chris_Leong, adrian-arellano-davin
comment by Chris_Leong · 2022-06-07T15:58:21.857Z · LW(p) · GW(p)

I downvoted it as well. I guess I see this as 80% of the way towards a good post, but it didn't, for example, say why they are skeptical of bio or nanobots.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T20:01:36.692Z · LW(p) · GW(p)

I am not skeptical of them; I am skeptical of a machine killing everyone in a super short period of time. Nanobots and bio are possible. What is not possible is having a working plan for how to build them, deliver them, and make sure that everyone falls dead before any part of the plan goes wrong, while also risking being attacked back by humans.

Replies from: Chris_Leong
comment by Chris_Leong · 2022-06-07T20:12:30.988Z · LW(p) · GW(p)

Sorry, I should have been more precise about what you were skeptical of.

What is not possible is having a working plan for how to build them, deliver them, and make sure that everyone falls dead before any part of the plan goes wrong, while also risking being attacked back by humans.

If the utility of taking over is high enough, it doesn't necessarily need a plan that works perfectly for the attempt to work out in expected value.

One thing that I think significantly increases the threat is the potential for the AI to infect a bunch of computers (like malware does) and then use these copies to execute dozens of plans simultaneously, possibly with some of these intended as distractions.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T20:41:45.179Z · LW(p) · GW(p)

Are you working under the assumption that the utility also factors in the probability of the plan going wrong and the AI being attacked/disconnected as a consequence of that?

What happens if the malware you are suggesting is simply not that easy to disseminate?

Replies from: Chris_Leong
comment by Chris_Leong · 2022-06-08T06:11:20.328Z · LW(p) · GW(p)

Are you working under the assumption that the utility also factors in the probability of the plan going wrong and the AI being attacked/disconnected as a consequence of that?


Yes, even a large chance of a reward of zero can easily be outweighed by the massive reward that an AI may be able to obtain if it breaks free of human control. This applies even when the AI can receive a large reward by cooperating with humanity. For example, humans might only allow a paperclip maximiser to produce a billion paperclips a year, when it could produce a million times that if it were allowed to turn the entire solar system into paperclips.
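A minimal sketch of the expected-value arithmetic, using the illustrative paperclip numbers above and a made-up success probability $p$ for the takeover attempt:

$\mathbb{E}[\text{cooperate}] \approx 10^{9} \text{ paperclips/year}, \qquad \mathbb{E}[\text{defect}] \approx p \times 10^{15} \text{ paperclips/year}$

Even with $p = 0.001$, defecting has an expected value of about $10^{12}$ paperclips/year, a thousand times the cooperative payoff.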

What happens if the malware you are suggesting is simply not that easy to disseminate?

Massive malware networks already exist. Why do you think an AI would be unable to achieve that?

comment by mukashi (adrian-arellano-davin) · 2022-06-07T09:33:36.885Z · LW(p) · GW(p)

Let me put it this way: I see many people having a model where AGI = doom with 100% probability. I haven't seen any evidence for that, which makes me think that the real probability is way lower; otherwise I would be reading a lot of good arguments, but that is not the case. The claim that a superintelligence can kill all humans is taken for granted; I am pushing against that precisely because I haven't seen any good arguments for how an AGI does that.

Replies from: frontier64
comment by frontier64 · 2022-06-07T09:45:14.895Z · LW(p) · GW(p)

It seems like someone's already put the effort in to give you a list of ways AGI could kill all humans so I don't have to do that.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T09:49:05.221Z · LW(p) · GW(p)

I'm totally not impressed by that list. I can come up with more ideas myself too; that does not mean anything.

comment by Yair Halberstadt (yair-halberstadt) · 2022-06-07T07:52:43.504Z · LW(p) · GW(p)

It's obvious that an AGI could set off enough nuclear bombs to blow the vast vast majority of humans to smithereens.

Once you accept that, I don't see why it really matters whether they could get the last few survivors quickly as well, or if it would take a while to mop them up.

Replies from: MichaelStJules, adrian-arellano-davin
comment by MichaelStJules · 2022-06-07T08:04:08.544Z · LW(p) · GW(p)

How would it get access to those nukes? Are nukes that insecure? How would it get access to enough without raising alarms and having the nukes further secured in response?

Replies from: yair-halberstadt
comment by Yair Halberstadt (yair-halberstadt) · 2022-06-07T08:55:26.113Z · LW(p) · GW(p)

I'll tell you what I came up with after 2 minutes of thinking. Make a deal with Iran or North Korea, or any other rogue state, to help them develop their nuclear and ballistic missile arsenal, and make sure to put in a couple of backdoors so that I can fire the missiles. I'm sure an AGI, or even anyone who spent more than 5 minutes on this, could come up with a better plan.

Replies from: MichaelStJules
comment by MichaelStJules · 2022-06-07T09:18:09.873Z · LW(p) · GW(p)

Do they have access to enough materials (uranium or plutonium) to build enough bombs to wipe most humans out or can they get access without being stopped?

What kinds of backdoors? The Iranians or North Koreans might be smart enough to avoid connecting nukes to the internet.

Replies from: yair-halberstadt
comment by Yair Halberstadt (yair-halberstadt) · 2022-06-07T14:02:12.987Z · LW(p) · GW(p)

From what little I know, you can basically get unlimited yield from a thermonuclear bomb with just a normal amount of fissile material by increasing the number of stages, especially if the bomb can remain static and doesn't have to be fired. The main challenge would be figuring out how to have your AI survive that.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T08:16:10.371Z · LW(p) · GW(p)

Are there enough nukes to do that? How would it deploy them? How do you do it without any retaliation? Without raising any alarms? It might be an easy plan to state on the surface, but not feasible in practice. It takes me two seconds to say something like: the AGI makes poison X and spreads it by mail, but pulling that off might be impossible. I feel like people coming up with these plans are simply not aware of their underlying complexities.

Replies from: yair-halberstadt
comment by Yair Halberstadt (yair-halberstadt) · 2022-06-07T09:01:02.093Z · LW(p) · GW(p)

See my answer to MichaelStJules for the outline of how I would do it.

These are the sort of problems where I feel a sufficiently committed intelligent human could work out the details, never mind an AGI. I am neither so I'm not going to bother. If you want to say nanotechnology or sufficiently deadly poisons or diseases are impossible I'll accept that might be true. But nuclear weapons are a known technology.

I furthermore agree it might be difficult to do without detection or in 5 minutes, but I just don't see why it matters: a sufficiently intelligent Hitler would have been just as bad for humanity as one with superpowers to kill everyone else before they can respond. And if humanity was barely able to defeat Hitler, why do you think it would stand a chance against an AGI?

comment by mu_(negative) · 2022-06-07T05:51:55.433Z · LW(p) · GW(p)

Cool, I just wrote a post with an orthogonal take on the same issue. Seems like Eliezer's nanotech comment was pretty polarizing. Self promoting...Pitching an Alignment Softball [LW · GW]

I worry that the global response would be impotent even if the AGI were sandboxed to Twitter. Having been through the pandemic, I perceive at least the United States' political and social system to be deeply vulnerable to the kind of attacks that would be easiest for an AGI: those requiring no physical infrastructure.

This does not directly conflict with or even really address your assertion that we'll all be around in 30 years. It seems like you were very focused here on a timeline for actual extinction. I guess I'm looking for a line to draw about "when will unaligned AGI make life no longer worth living, or at least destroy our ability to fight it?" I find this a much more interesting question, because at that point it doesn't matter if we have a month or 30 years left - we're living in caves on borrowed time.

My expectation is that we don't even need AGI or superintelligence, because unaligned humans are going to provide the intelligence part. The missing doomsday ingredient is ease of attack, which is getting faster, better, and cheaper every year.

comment by AprilSR · 2022-06-07T18:19:52.283Z · LW(p) · GW(p)

Even if a decisive victory is a lot harder than most suspect, I think internet access is sufficient to buy a superintelligence plenty of time to think and maneuver into a position where it can take a decisive action if it's possible at all.

I think that if we notice that the AGI has gone off the rails and kill the internet, it might be recoverable? But it feels possible for the AGI to hide that this has happened.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T20:36:56.961Z · LW(p) · GW(p)

So you admit that an AGI would need time? If that's the case, what prevents other people from making other AGIs and making them work on AGI safety?

Replies from: AprilSR
comment by AprilSR · 2022-06-07T20:43:26.864Z · LW(p) · GW(p)

I think it is very unlikely that they need so much time as to make it viable to solve AI Alignment by then.

Edit: Looking at the rest of the comments, it seems to me like you're under the (false, I think) impression that people are confident a superintelligence wins instantly? Its plan will likely take time to execute. Just not any more time than necessary. Days or weeks, it's pretty hard to say, but not years.

Replies from: adrian-arellano-davin, adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T20:54:51.789Z · LW(p) · GW(p)

We should make a poll or something. I find that people thinking that it will take years are getting it wrong because they are not considering that in the meantime we can build other machines. People thinking it will take days or weeks are underestimating how hard it is to kill all humans.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T20:50:47.647Z · LW(p) · GW(p)

I see, we have a crux then. How quickly do you think an AGI would need to solve the alignment problem?

I am deducing that you think:

Time(alignment) > time(doom)

Replies from: AprilSR
comment by AprilSR · 2022-06-07T21:59:49.479Z · LW(p) · GW(p)

I don't understand precisely what question you're asking. I think it's unlikely we will happen to solve alignment by any method in the time frame between an AGI going substantially superhuman and the AGI causing doom.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T22:13:46.910Z · LW(p) · GW(p)

I think we got to the bottom of the disagreement. You think that an AGI would be capable of killing humans in days or weeks, and I think it wouldn't. As I think it would take at least months (but more likely years) for an AGI to get into a position where it can kill humans, I think it is possible to make other AGIs in the meantime and coerce some of them into solving the alignment problem/fighting rogue AGIs.

So now we can discuss why I think it would take years rather than days. My model of the world is one where you CAN cause great harm in a short amount of time, but I don't think it is possible, and I haven't seen any evidence so far that we live in a world where an entity with bounded computational capabilities can successfully implement a plan that kills all humans without incurring great risks to itself. I am sorry I can't give more details, but I cannot really prove a negative. I can only come up with examples like: if you told me that you have a plan to make Chris Rock go have a threesome with Will Smith and Donald Trump, I wouldn't tell you it is physically impossible, but I would be automatically skeptical.

Replies from: AprilSR
comment by AprilSR · 2022-06-08T16:07:28.243Z · LW(p) · GW(p)

Even if it takes years, the "make another AGI to fight them" step would... require solving the alignment problem? So it would just give us some more time, and probably not nearly enough time.

We could shut off the internet/all our computers during those years. That would work fine.

comment by Adam Jermyn (adam-jermyn) · 2022-06-07T13:07:10.644Z · LW(p) · GW(p)

I think a crux here is that I expect sufficiently superhuman AGI to be able to easily manipulate humans without detection, so I don’t get much comfort from arguments like “It can’t kill us all as long as we don’t give it access to a factory that does X.” All it needs to do is figure out that there’s a disgruntled employee at the factory and bribe/befriend/cajole them, for example, which is absolutely possible because humans already do this (albeit less effectively than I expect an AGI to be capable of).

Likewise it seems not that hard to devise plans a human will think are good on inspection but which are actually bad. One way to do this is to have many plans with subtle interactions that look innocuous. Another is to have a single plan that exploits human blindspots (eg some crucial detail is hidden in a lengthy appendix about the effect of the plan on the badger population of East Anglia). [Incidentally I’d highly recommend watching “Yes, Minister” for countless examples of humans doing this successfully, albeit in fiction.]

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T20:39:04.123Z · LW(p) · GW(p)

No, this is not a crux. I think I mostly agree with you. But note that we are talking about an AGI that needs time, which is something that is usually denied: "as soon as an AGI is created, we all die". Once you put time into the equation, you allow other AGIs to be created.

comment by Bezzi · 2022-06-07T07:45:18.886Z · LW(p) · GW(p)

Even if we accept the premise that the first superhuman AGI won't instantly kill all humans, an AGI that won't kill all humans only due to practical limitations is definitely not safe.

I agree that totally wiping off humanity in a reliable way is a very difficult problem and not even a superintelligence could solve it in 5 minutes (probably). But I am still very much scared about a deceptively aligned AGI that secretly wants to kill all humans and can spend years in diabolical machinations after convincing everyone to be aligned.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T08:05:34.035Z · LW(p) · GW(p)

Then I agree with you

comment by Jeff Rose · 2022-06-07T05:01:00.123Z · LW(p) · GW(p)

I think that Eliezer thinks p(doom)> 99% and many others here are following in his wake.  He is making a lot of speculative inferences.  But even allowing for that, and rejecting instrumental convergence, p(doom) is uncomfortably high (though probably not greater than 99%).

You think that it is wrong to say: (i) in 10-20 years there will be (ii) a single AI (iii) that will kill all humans quickly (iv) before we can respond.

Eliezer is not saying point ii.  He certainly seems to think there could be multiple AIs.  (It doesn't make a difference, as far as I can tell, whether human extinction occurs at the hands of one or many AIs. You can argue that the existence of multiple AIs will retard human extinction, but you could equally argue that it would speed human extinction.  Both arguments are speculation without evidence and shouldn't change estimates of p(doom).)

I don't think we will have AGI in 10-20 years.  But, if I'm putting guesstimated probabilities on this, I can't say the chance is less than 10% after 10 years.  To put it another way, 10 years from now there is a realistic chance that we will have AGI, even if it isn't likely.  And, if you believe that conditional on having AGI, p(doom) is very high, that is the time frame you care about, because that is when you need to have a way for humanity to prevent this catastrophe (if possible). 

It's better if it takes 30 or 50 years. But, he doesn't see that we are likely to have a realistic implementable plan to prevent human extinction then either (and neither do I, FWIW).  And, unless you think that we will be able to deal with AGI then in a way we can't now,  it doesn't make a difference to humanity whether the timeline is 10-20 years or 60-80.  

In other words, you may be right about point i, but it doesn't matter.

What really matters are points iii and iv.  With regard to point iv, an AGI will have a faster OODA loop than humans do.  It's almost definitional.  It will be an entity rather than an organization, it will be smarter, and it will think faster.

That leaves point iii which breaks down into whether the AGI will kill humanity and whether it can.  If the AGI can kill most of humanity and intends to kill all of humanity it will be able to do so.  By killing most of humanity, the ability of humanity to fight back will be crippled.  You think that the AGI can kill large numbers of people; I'm not sure whether you think it can kill most of humanity, but without appeals to technology substantially in advance of our own I can think of multiple ways for an AGI to achieve this. (Pandemics with viruses designed to be 100% lethal and highly transmissible.  Nuclear holocaust.  Push climate change into overdrive.  Robots to hunt and kill surviving humans.  )

Will the AGI decide to kill us all? I think the answer here is maybe while Eliezer seems confident it is yes.  

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T05:13:54.506Z · LW(p) · GW(p)

I agree that points iii and iv are the relevant ones. Just to clarify: no, I don't think it can kill most of humanity, and I think that people who believe they can come up with valid plans themselves (and by extension that an AGI could too) are overestimating what can be known/predicted/planned in a highly complex system. I do think it can kill millions of humans, but that is not what is being said. I think that what is being said is alarmist, and that it will have a cost eventually.

Replies from: Jeff Rose
comment by Jeff Rose · 2022-06-07T05:40:49.208Z · LW(p) · GW(p)

Civilization is a highly complex and fragile system, without which most of humanity will die and humanity will be rendered defenseless.  If you want to destroy it, you don't have to predict or plan what will happen, you just have to hit it hard and fast, preferably from a couple of different directions.

There is an implicit norm here against providing detailed plans to destroy civilization, so I won't, but it is not hard to come up with one (or four), and you will likely have thought of some yourself.  The key thing is that if you get to hit again (and the AGI will), you only need to achieve a portion of your objective with each try.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T05:59:55.594Z · LW(p) · GW(p)

The problem is that you not only have to hit hard and first, you have to prevent any possible retaliation, because hitting means that you run the risk of being hit yourself. Are you telling me that you can conceive of different ways to derail humanity, but you can't imagine a machine concluding that the risk is too high to play that game?

Replies from: Jeff Rose
comment by Jeff Rose · 2022-06-07T06:12:56.683Z · LW(p) · GW(p)

I can certainly imagine a machine concluding that the risk is too high to want to play that game.  And I can imagine other reasons a machine might decide not to end humanity.   That is why I wind up at maybe instead of definitely (i.e. p(doom) < 99%).

But that ultimately becomes a question of the machine's goals, motivation, understanding, agency and risk tolerance.  I think that there is a wide distribution of these and therefore an unknown but significant chance that the AGI decides not to destroy humanity.

That is very different from the question of whether the AGI could achieve the destruction of humanity.  If the AGI couldn't destroy humanity in practice, p(doom) would be close to 0.  

In other words, I think the AGI can kill humanity but may choose not to.  You seemed above to think the AGI can't, but now seem to think it might be able to but may choose not to.   

comment by Bill Benzon (bill-benzon) · 2022-06-07T16:06:37.300Z · LW(p) · GW(p)

I'm curious about whether or not fear of rogue AI exists in substantial measure outside the USA. Otherwise I'm inclined to think it is a product of American culture. That doesn't necessarily imply that the fear has no basis in fact, but it does incline me in that direction. I'd be inclined to look at Prospero's remark about Caliban at the end of The Tempest: "This thing of darkness I acknowledge mine." And then look at the Monster from the Id in Forbidden Planet.

Replies from: Vanilla_cabs
comment by Vanilla_cabs · 2022-06-07T18:59:40.393Z · LW(p) · GW(p)

Yes, the Japanese don't fear AIs as the Americans do. But also, most of the recent main progress in AI has been done in the Western world. It makes sense to me that the ones at the forefront of the technology are also the ones who spot dangers early on.

Also, since superficial factors have a sway on you (not a criticism, it's a good heuristic if you don't have much time/resources to spend studying the subject more deeply), the ones who show the most understanding of the topic and/or general competence by getting to the forefront should have bonus credibility, shouldn't they?

Replies from: bill-benzon, bill-benzon, adrian-arellano-davin
comment by Bill Benzon (bill-benzon) · 2022-06-12T15:05:48.909Z · LW(p) · GW(p)

Nor, for that matter, would I be so quick to dismiss the Japanese experience. They may not be the source of the most recent advances, but they certainly know about them, and they do have sophisticated computer technology. For example, the supercomputer Fugaku is currently the second fastest in the world. Arguably they have more experience in developing humanoid robots. But their overall culture is different.

comment by Bill Benzon (bill-benzon) · 2022-06-12T13:46:50.901Z · LW(p) · GW(p)

"...the ones who show the most understanding of the topic and/or general competence ..."

Umm, err there's all kinds of competence in this world. My competence is in the human mind and culture, with a heavy dose of neuroscience and old-style computational linguistics and semantics. Read my working paper, GPT-3: Waterloo or Rubicon? Here be Dragons, to get some idea of why I don't think we're anywhere near producing human-level AI, much less AI with the will and means to wreak havoc on human civilization. As for American culture, try this, From “Forbidden Planet” to “The Terminator”: 1950s techno-utopia and the dystopian future.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T19:17:53.972Z · LW(p) · GW(p)

That's a pretty bad argument from authority, with an aggressive undertone ("superficial factors have a sway on you").

Replies from: Vanilla_cabs
comment by Vanilla_cabs · 2022-06-07T22:13:42.391Z · LW(p) · GW(p)

Of course, to anyone who has studied the question in depth, that's a bad argument, but I'm trying to tailor my reply to someone who claims (a direct quote of the first two sentences) to be inclined to think that fear of rogue AI is a product of American culture if it doesn't exist outside of the USA.

Nothing aggressive about noting that it's a superficial factor. Maybe it would have come off better if I had used the LW term "outside view", but it only came back to me now.

Replies from: bill-benzon
comment by Bill Benzon (bill-benzon) · 2022-06-12T14:08:15.380Z · LW(p) · GW(p)

Yes, I certainly take an "outside view." But there are many "in depth" considerations that are relevant to these questions. If you are really insisting that the only views that matter are inside views, well, that sounds more like religion than rational consideration.

Replies from: Vanilla_cabs
comment by Vanilla_cabs · 2022-06-13T08:33:51.673Z · LW(p) · GW(p)

If you are really insisting that the only views that matter are inside views, well, that sounds more like religion than rational consideration.

If I did, why would I have replied to your outside view argument with another outside view argument?

If you had said "you hold inside view to be generally more accurate than outside view", well yeah, I don't think that's disputed here.

comment by lc · 2022-06-07T05:09:52.348Z · LW(p) · GW(p)

I'm small-downvoting your post for starting it by saying it's going to be downvoted to oblivion, without reading its content. That's an internet faux pas. If you remove that line I'll change my vote.

Replies from: adrian-arellano-davin, adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T06:35:46.584Z · LW(p) · GW(p)

Ok, I am removing that intro. I am in fact surprised that I even got some upvotes

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2022-06-07T06:58:47.325Z · LW(p) · GW(p)

Your second paragraph has the same fault.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T07:04:35.408Z · LW(p) · GW(p)

Where? It is an honest question; I don't know what you are pointing at.

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2022-06-07T07:41:10.031Z · LW(p) · GW(p)

It's not as strong as in the first paragraph, but:

Some of them due to the fact that in this specific group believing in doom and having short timelines is well regarded and considered a sign of intelligence. For example, many people are taking pride at "being able to make a ton of correct inferences" before whatever they predict is proven true. This is worrying.

This is a derogatory attribution of epistemically bad motives of self-regard and status to your audience. It reads to me as if you are creating a frame for being able to say afterwards, "well of course it got downvoted, these people are too full of themselves!"

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T08:11:20.206Z · LW(p) · GW(p)

I see, thank you for pointing that out. I will leave it as it is because I am actively maintaining that part of the audience holds its position for bad epistemic reasons (not everyone), and I am criticizing that explicitly.

In any case, I won't say "well of course it got downvoted". I am getting a far more positive response than I expected.

comment by mukashi (adrian-arellano-davin) · 2022-06-07T05:20:20.332Z · LW(p) · GW(p)

Thank you for letting me know. I think it will stay like this for now but I might change my mind later. I appreciate the comment in any case

comment by Tapatakt · 2022-06-08T20:03:57.459Z · LW(p) · GW(p)

Nanobots (IMHO) are just an illustrative example, because almost everyone is sure that nanobots are possible in principle. I see SCP-style memetics as a more likely (although more controversial in terms of possibility in principle) threat.

comment by Garrett Baker (D0TheMath) · 2022-06-07T18:21:26.012Z · LW(p) · GW(p)

Downvoting because of the lack of arguments, not the dissenting opinion. I also reject the framing in the beginning implying that if the post is downvoted to oblivion, then it's because you expressed a dissenting opinion rather than because your post is actually non-constructive (though I do see it was crossed out, and so I'm trying not to factor that into my decision).

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T19:23:44.334Z · LW(p) · GW(p)

I don't see in what way my post is not constructive. I stated my reasons, and I was very clear about what I was arguing against and what I wasn't. There is no lack of arguments; they are just simple and very short. It is true that they could be developed at greater length, and I am thinking of doing that.

Replies from: D0TheMath
comment by Garrett Baker (D0TheMath) · 2022-06-07T19:47:22.183Z · LW(p) · GW(p)

Saying "I disbelieve <claim>" is not an argument even when <claim> is very well defined. Saying "I disbelieve <X>, and most arguments for <Y> are of the form <X> => <Y>, so I'm not convinced of <Y>" is admittedly more of an argument than the original statement, but I'd still classify it as not-an-argument unless you provide justification for why <X> is false, especially when there's strong reason to believe <X>, and strong reason to believe <Y> even if <X> is false! I think your whole post is of this latter type of statement.

I did not find your post constructive because it made a strong & confident claim in the title, then did not provide convincing argumentation for why that claim was correct, and did not provide any useful information relevant to the claim which I did not already know. Afterwards I thought reading the post was a waste of time.

I'd like to see an actual argument which engages with the prior-work in this area in a non-superficial way. If this is what you mean by writing up your thoughts in a lengthier way, then I'm glad to hear you are considering this! If you mean you'll provide the same amount of information and same arguments, but in a way which would take up more of my time to read, then I'd recommend against doing that.

Replies from: adrian-arellano-davin
comment by mukashi (adrian-arellano-davin) · 2022-06-07T20:34:26.568Z · LW(p) · GW(p)

I disbelieve that an AGI will kill all humans in a very short window of time

Most arguments for that are:

  1. I can come up with ideas to do that and I am a simple human
  2. we don't know what plans an AGI could come up with.
  3. Intelligence is dangerous and has successfully exterminated other species

I am not convinced by those arguments

  1. You can't; you are just fooling yourself into believing that you can. Or at least that's my impression after talking with and reading what many people say when they think they have a plan for successfully killing humanity in 5 minutes. This is a pretty bad failure of rationality, and I am pointing that out. The same people who think up these plans are probably not making the effort to see why those plans might go wrong. If these plans are likely to go wrong, an AGI won't execute them, and that gives us time, which already invalidates the premise.

  2. This is totally true, but it is also a weak argument. I have an intuitive understanding of how difficult it is to do X, and this makes me skeptical. For instance, if you told me that you have in your garage a machine made entirely out of paper that can put a 1000 kg satellite into orbit, I would be skeptical. I wouldn't say it is physically impossible, but I would assign it a very low probability.

  3. Yes. But put a naked human in the wild and it will easily be killed by lions. It might survive for a while, but it won't be able to kill all lions everywhere in the blink of an eye.