post by [deleted] · GW

Comments sorted by top scores.

comment by Mawrak · 2022-04-12T15:35:54.298Z · LW(p) · GW(p)

Inb4 rationalists intentionally develop an unaligned AI designed to destroy humanity. Maybe the real x-risks were the friends we made along the way...

comment by Evan R. Murphy · 2022-04-12T07:56:17.394Z · LW(p) · GW(p)

It strikes me that there is a lot of middle ground between 1) the utopias people are trying to create with beneficial AGI, and 2) human extinction.

So even if you think #1 is no longer possible (I don't think we're there yet, but I know some do), I don't think you need to leap all the way to #2 in order to address s-risk.

In a world where AGI has not yet been developed, there are surely ways to disrupt its development that don't require building Clippy and causing human extinction. Most of these I wouldn't recommend pursuing, though, or feel comfortable posting on a public forum.

Replies from: Daphne_W
comment by Daphne_W · 2022-04-12T16:46:47.804Z · LW(p) · GW(p)

That's a fair point - my model does assume AGI will come into existence in non-negative worlds. Still, I struggle to actually imagine a non-negative world where humanity is alive a thousand years from now and AGI hasn't been developed, even if all alignment researchers believed preventing its development was the right thing to pursue - which doesn't seem likely.

Replies from: Evan R. Murphy
comment by Evan R. Murphy · 2022-04-12T19:33:51.205Z · LW(p) · GW(p)

Even a 5-10 year delay in AGI deployment might give enough time to solve the alignment problem.

Replies from: Daphne_W
comment by Daphne_W · 2022-04-13T13:46:15.487Z · LW(p) · GW(p)

That's not a middle ground between a good world and a neutral world, though; that's just another way to get a good world. If we assume a good world is exponentially unlikely, a 10-year delay might mean the odds of a good world rise from 10^-10 to 10^-8 (as opposed to pursuing Clippy, which might bring the odds of a bad world down from 10^-4 to 10^-6).
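
A rough expected-value sketch of why that trade dominates at these odds - using made-up, symmetric utilities for good and bad worlds, which is an illustrative assumption rather than anything from the post:

```python
# Illustrative expected-value comparison of the two interventions above.
# Assumes (hypothetically) that good and bad worlds have equal and opposite
# utility, and that neutral worlds sit at zero.
U_GOOD, U_BAD, U_NEUTRAL = 1.0, -1.0, 0.0

# Intervention A: a 10-year delay raises P(good) from 1e-10 to 1e-8,
# drawing that probability mass from neutral worlds.
gain_A = (1e-8 - 1e-10) * (U_GOOD - U_NEUTRAL)

# Intervention B: pursuing Clippy lowers P(bad) from 1e-4 to 1e-6,
# shifting that probability mass to neutral worlds.
gain_B = (1e-4 - 1e-6) * (U_NEUTRAL - U_BAD)

print(f"A: {gain_A:.2e}  B: {gain_B:.2e}  ratio: {gain_B / gain_A:.0f}")
# A: 9.90e-09  B: 9.90e-05  ratio: 10000
```

Under those made-up numbers, the shift from bad to neutral worlds is worth roughly four orders of magnitude more than the delay.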

If you disagree with Yudkowsky's pessimism about the probability of good worlds, then my post doesn't really apply. My post is about what to do if he is correct about the odds.

comment by gbear605 · 2022-04-12T03:01:24.173Z · LW(p) · GW(p)

Building friendly AGI may be out of reach of a Manhattan project, but building neutral AGI might not. 

The idea that intentionally developing a neutral AGI is easier than intentionally developing a friendly AGI still seems unproven to me. If I'm evaluating a potential AGI and trying to figure out whether it will be friendly, neutral, or evil, that seems no easier than figuring out whether it is merely friendly versus not friendly. Just as it is difficult to evaluate whether an AI will be aligned with human values, it is also difficult to evaluate whether it is anti-aligned with human values. For example, we might instantiate Clippy only to have it act like a Basilisk. It seems like Clippy only cares about paperclipping, so it should only do things that create more paperclips - but it could be that, for [insert Basilisk decision theory logic here], it still needs to do evil-AGI things to best accomplish its goal of paperclipping.

I agree that since this is an unexplored space, studying it means there is a decent chance my assumptions are wrong, but we also shouldn't assume that they must be wrong.

All of that said, I strongly encourage the utmost caution with this post. Creating a "neutral" AGI is still a very evil act, even if it is the act with the highest expected utility.

Replies from: Daphne_W
comment by Daphne_W · 2022-04-12T07:34:59.819Z · LW(p) · GW(p)

I'm not well-versed enough to offer something that would qualify as proof, but intuitively I would say: "All problems with making a tiling bot robust are also found in aligning something with human values, but aligning something with human values comes with a host of additional problems, each of which takes additional effort." We can write a tiling bot for a grid world, but we can't write an entity that follows human values in a grid world. Tiling bots don't need to be complicated or clever; they might not even have to qualify as AGI - they just have to be capable of taking over the world.
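
To make that intuition a bit more concrete, here is a toy sketch (purely illustrative, not something from the post) of how little it takes to write down a tiling objective in a grid world:

```python
# A toy "tiling bot" for a grid world: the entire objective and policy fit in
# a few lines, in contrast to "human values", which we don't know how to
# write down even for a grid world.
from typing import List

Grid = List[List[str]]

def tiling_reward(grid: Grid) -> int:
    """The bot's whole value function: one point per tiled cell."""
    return sum(cell == "#" for row in grid for cell in row)

def step(grid: Grid) -> Grid:
    """Greedy policy: tile the first untiled cell it can find."""
    for row in grid:
        for i, cell in enumerate(row):
            if cell != "#":
                row[i] = "#"
                return grid
    return grid  # nothing left to tile

grid: Grid = [["."] * 3 for _ in range(3)]
while tiling_reward(grid) < sum(len(row) for row in grid):  # until every cell is tiled
    grid = step(grid)
```

The hard part is robustness and real-world capability, not the objective - the objective side is trivially specifiable, which is the asymmetry with human values.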

All of that said, I strongly encourage the utmost caution with this post. Creating a "neutral" AGI is still a very evil act, even if it is the act with the highest expected utility.

Q5 of Yudkowsky's post [LW · GW] seems like an expert opinion that this sort of caution isn't productive. What I present here seems like a natural result of combining awareness of s-risk with the low probability of good futures that Yudkowsky asserts, so I don't think security through obscurity offers much protection. In the likely event that the evil thing is bad, it seems best to discuss it openly so that the error can be made plain for everyone, and so that people don't get stuck believing it is the right thing to do or worrying that others believe it is the right thing to do. In the unlikely event that it is good, I don't want to waste time personally gathering enough evidence to become confident enough to act on it when others might have more evidence readily available.

comment by Mitchell_Porter · 2022-04-12T00:19:36.477Z · LW(p) · GW(p)

Summarizing your idea: 

There are really good possible worlds (e.g. a happy billion-year future), and really bad possible worlds (e.g. a billion-year "torture world"). Compared to these, a world in which humans go extinct and are replaced by paperclip maximizers is just mildly bad (or maybe even mildly good, e.g. if we see some value in the scientific and technological progress of the paperclip maximizers). 

If the really good worlds are just too hard to bring about (since they require that the problem of human alignment is solved), perhaps people should focus on deliberately bringing about the mildly good/bad worlds (what you call "neutral worlds"), in order to avoid the really bad outcomes. 

In other words, Clippy's Modest Proposal boils down to embracing x-risk in order to avoid s-risk?

Replies from: Daphne_W
comment by Daphne_W · 2022-04-12T07:35:53.384Z · LW(p) · GW(p)

That's the gist of it.

comment by Evan R. Murphy · 2022-04-11T23:09:17.797Z · LW(p) · GW(p)

I spent about 15 minutes reading and jumping around this post, a bit confused about what the main idea was. I think I finally found it in this text toward the middle - extracting it here in case it helps others understand (please let me know if I'm missing a more central part, Daphne_W):

And in the bottom regime [referring to a diagram in the post], where we find ourselves, where both bad and neutral outcomes are much more likely than good outcomes, increasing the probability of good outcomes is practically pointless. All that matters is shifting probability from bad worlds to neutral worlds.

This is a radical departure from the status quo. Most alignment researchers spend their effort developing tools that might make good worlds more likely. Some may be applicable to reducing the probability of bad worlds, but almost nothing is done to increase the probability of neutral worlds. This is understandable – promoting human extinction has a bad reputation, and it’s not something you should do unless you’re confident. You were ready to call yourself confident when it meant dying with dignity, though, and probabilities don’t change depending on how uncomfortable the resulting optimal policy makes you.

Meanwhile, because increasing the probability of neutral worlds is an underexplored field of research, it is likely that there are high-impact projects available. Building friendly AGI may be out of reach of a Manhattan project, but building neutral AGI might not. 

I find this to be a depressing idea, but think it's also interesting and potentially worthwhile.

Replies from: Daphne_W, Jeff Rose
comment by Daphne_W · 2022-04-12T07:50:36.176Z · LW(p) · GW(p)

Both that and Q5 seem important to me.

Q5 is an exploration of my uncertainty despite not being able to find fault with Clippy's argument, as well as of what I expect others' hesitance might be. If Clippy's argument is correct, then the section you highlight seems like the logical conclusion.

comment by Jeff Rose · 2022-04-12T03:39:38.454Z · LW(p) · GW(p)

Interesting. I thought the main idea was contained in Question 5.

Replies from: Evan R. Murphy
comment by Evan R. Murphy · 2022-04-12T06:33:28.239Z · LW(p) · GW(p)

Mitchell_Porter's summary [LW(p) · GW(p)] seems to concur with the text I focused on.

So you thought it was more about the fact that people would reject extinction, even if the likely alternative were huge amounts of suffering?