BOOK DRAFT: 'Ethics and Superintelligence' (part 1)

post by lukeprog · 2011-02-13T10:09:12.769Z · LW · GW · Legacy · 112 comments

Contents

  Chapter 1: The technological singularity is coming soon.

I'm researching and writing a book on meta-ethics and the technological singularity. I plan to post the first draft of the book, in tiny parts, to the Less Wrong discussion area. Your comments and constructive criticisms are much appreciated.

This is not a book for a mainstream audience. Its style is that of contemporary Anglophone philosophy. Compare to, for example, Chalmers' survey article on the singularity.

Bibliographic references are provided here.

Part 1 is below...

 

 

 

Chapter 1: The technological singularity is coming soon.

 

The Wright Brothers flew their spruce-wood plane for 200 feet in 1903. Only 66 years later, Neil Armstrong walked on the moon, more than 240,000 miles from Earth.

The rapid pace of progress in the physical sciences drives many philosophers to science envy. Philosophers have been working on the core problems of metaphysics, epistemology, and ethics for millennia and have not yet reached consensus on them, as scientists have for so many core problems in physics, chemistry, and biology.

I won’t argue about why this is so. Instead, I will argue that maintaining philosophy’s slow pace and not solving certain philosophical problems in the next two centuries may lead to the extinction of the human species.

This extinction would result from a “technological singularity” in which an artificial intelligence (AI) of human-level general intelligence uses its intelligence to improve its own intelligence, which would enable it to improve its intelligence even more, and so on in an “intelligence explosion” feedback loop that would give this AI inestimable power to accomplish its goals. If such an intelligence explosion occurs, then it is critically important to program the AI’s goal system wisely. This project could mean the difference between a utopian solar system of unprecedented harmony and happiness, and a solar system in which all available matter is converted into parts for a planet-sized computer built to solve difficult mathematical problems.
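To make the shape of this feedback loop concrete, consider a deliberately simple toy model; nothing in the argument depends on its details. Let I_t denote the machine's general problem-solving capability after t rounds of self-modification, and suppose that in each round it can convert some fraction k > 0 of its current capability into additional capability:

I_{t+1} = (1 + k) \, I_t

With k held constant, capability grows exponentially. If k itself rises with I_t, because more capable designers are better at designing, growth is faster than exponential, and the machine's self-driven progress soon dwarfs the human-driven progress that produced the first human-level AI.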

The technical challenges of designing the goal system of such a superintelligence are daunting.[1] But even if we can solve those problems, the question of which goal system to give the superintelligence remains. It is a question of philosophy; it is a question of ethics.

Philosophy has impacted billions of humans through religion, culture, and government. But now the stakes are even higher. When the technological singularity occurs, the philosophy behind the goal system of a superintelligent machine will determine the fate of the species, the solar system, and perhaps the galaxy.

***

Now that I have laid my positions on the table, I must argue for them. In this chapter I argue that the technological singularity is likely to occur within the next 200 years unless a worldwide catastrophe drastically impedes scientific progress. In chapter two I survey the philosophical problems involved in designing the goal system of a singular superintelligence, which I call the “singleton.”

In chapter three I show how the singleton will produce very different future worlds depending on which normative theory is used to design its goal system. In chapter four I describe what is perhaps the most developed plan for the design of the singleton’s goal system: Eliezer Yudkowsky’s “Coherent Extrapolated Volition.” In chapter five, I present some objections to Coherent Extrapolated Volition.

In chapter six I argue that we cannot decide how to design the singleton’s goal system without considering meta-ethics, because normative theory depends on meta-ethics. In chapter seven I argue that we should invest little effort in meta-ethical theories that do not fit well with our emerging reductionist picture of the world, just as we quickly abandon scientific theories that don’t fit the available scientific data. I also specify several meta-ethical positions that I think are good candidates for abandonment.

But the looming problem of the technological singularity requires us to have a positive theory, too. In chapter eight I propose some meta-ethical claims about which I think naturalists should come to agree. In chapter nine I consider the implications of these plausible meta-ethical claims for the design of the singleton’s goal system.

 ***

 




[1] These technical challenges are discussed in the literature on artificial agents in general and Artificial General Intelligence (AGI) in particular. Russell and Norvig (2009) provide a good overview of the challenges involved in the design of artificial agents. Goertzel and Pennachin (2010) provide a collection of recent papers on the challenges of AGI. Yudkowsky (2010) proposes a new extension of causal decision theory to suit the needs of a self-modifying AI. Yudkowsky (2001) discusses other technical (and philosophical) problems related to designing the goal system of a superintelligence.

 

112 comments

Comments sorted by top scores.

comment by XiXiDu · 2011-02-13T11:42:39.933Z · LW(p) · GW(p)

In chapter two I survey the philosophical problems involved in designing the goal system of a singular superintelligence, which I call the “singleton.”

That sounds a bit like you invented the term "singleton". I suggest clarifying that with a footnote.

Replies from: Perplexed, lukeprog
comment by Perplexed · 2011-02-13T14:11:59.376Z · LW(p) · GW(p)

You (Luke) also need to provide reasons for focusing on the 'singleton' case. To the typical person first thinking about AI singularities, the notion of AIs building better AIs will seem natural, but the idea of an AI enhancing itself will seem weird and even paradoxical.

Replies from: lukeprog
comment by lukeprog · 2011-02-13T16:25:32.290Z · LW(p) · GW(p)

Indeed I shall.

Replies from: gwern
comment by gwern · 2011-02-13T18:44:25.347Z · LW(p) · GW(p)

It's also worth noting that more than one person thinks the singleton wouldn't exist and alternative models are more likely. For example, Robin Hanson's em model ("Crack of a Future Dawn") is fairly likely given that we have a decent Whole Brain Emulation Roadmap, but nothing of the sort for a synthetic AI, and people like Nick Szabo emphatically disagree that a single agent could outperform a market of agents.

Replies from: Perplexed, lukeprog
comment by Perplexed · 2011-02-13T21:57:47.148Z · LW(p) · GW(p)

Of course, people can be crushed by impersonal markets as easily as they can by singletons. The case might be made that we would prefer a singleton because the task of controlling it would be less complex and error-prone.

Replies from: gwern
comment by gwern · 2011-02-13T22:16:11.821Z · LW(p) · GW(p)

A reasonable point, but I took Luke to be discussing the problems of designing a good singleton because a singleton seemed like the most likely outcome, not because he likes the singleton aesthetically or because a singleton would be easier to control.

Replies from: Perplexed
comment by Perplexed · 2011-02-13T22:57:06.390Z · LW(p) · GW(p)

In the context of CEV, Eliezer apparently thinks that a singleton is desirable, not just likely.

Only one superintelligent AMA (Artificial Moral Agent) is to be constructed, and it is to take control of the entire future light cone with whatever goal function is decided upon. Justification: a singleton is the likely default outcome for superintelligence, and stable co-existence of superintelligences, if achievable, would offer no inherent advantages for humans.

I'm not convinced, but since Luke is going to critique CEV in any case, this aspect should be addressed.

ETA: I have been corrected - the quotation was not from Eliezer. Also, the quote doesn't directly say that a singleton is a desirable outcome; it says that the assumption that we will be dealing with a singleton is a desirable feature of an FAI strategy.

Replies from: Nick_Tarleton, lukeprog
comment by Nick_Tarleton · 2011-02-14T23:59:09.189Z · LW(p) · GW(p)

I don't know how much you meant to suggest otherwise, but just for context, the linked paper was written by Roko and me, not Eliezer, and doesn't try to perfectly represent his opinions.

Replies from: Perplexed
comment by Perplexed · 2011-02-15T00:09:05.364Z · LW(p) · GW(p)

No, I didn't realize that. Thx for the correction, and sorry for the misattribution.

comment by lukeprog · 2011-02-14T06:21:48.454Z · LW(p) · GW(p)

I have different justifications in mind, and yes I will be explaining them in the book.

comment by lukeprog · 2011-02-13T19:25:43.258Z · LW(p) · GW(p)

Yup, thanks.

comment by lukeprog · 2011-02-13T16:33:19.105Z · LW(p) · GW(p)

Done. It's from Bostrom (2006).

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2011-02-13T17:14:34.989Z · LW(p) · GW(p)

You say you'll present some objections to CEV. Can you describe a concrete failure scenario of CEV, and state a computational procedure that does better?

Replies from: lukeprog, mwaser, Perplexed
comment by lukeprog · 2011-02-13T19:22:59.222Z · LW(p) · GW(p)

As for concrete failure scenarios, yes - that will be the point of that chapter.

As for a computational procedure that does better, probably not. That is beyond the scope of this book. The book will be too long merely covering the ground that it does. Detailed alternative proposals will have to come after I have laid this groundwork - for myself as much as for others. However, I'm not convinced at all that CEV is a failed project, and that an alternative is needed.

Replies from: Eliezer_Yudkowsky, XiXiDu
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2011-02-13T20:59:33.097Z · LW(p) · GW(p)

Can you give me one quick sentence on a concrete failure mode of CEV?

Replies from: cousin_it, lukeprog, Dorikka
comment by cousin_it · 2011-02-13T23:08:01.254Z · LW(p) · GW(p)

I'm confused by your asking such questions. Roko's basilisk is a failure mode of CEV. I'm not aware of any work by you or other SIAI people that addresses it, never mind work that would prove the absence of other, yet undiscovered "creative" flaws.

Replies from: Eliezer_Yudkowsky, XiXiDu
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2011-02-14T06:43:09.494Z · LW(p) · GW(p)

Roko's original proposed basilisk is not and never was the problem in Roko's post. I don't expect it to be part of CEV, and it would be caught by generic procedures meant to prevent CEV from running if 80% of humanity turns out to be selfish bastards, like the Last Jury procedure (as renamed by Bostrom) or extrapolating a weighted donor CEV with a binary veto over the whole procedure.

EDIT: I affirm all of Nesov's answers (that I've seen so far) in the threads below.

Replies from: cousin_it, ciphergoth, wedrifid, wedrifid
comment by cousin_it · 2011-02-14T13:56:17.094Z · LW(p) · GW(p)

wedrifid is right: if you're now counting on failsafes to stop CEV from doing the wrong thing, that means you could apply the same procedures to any other proposed AI, so the real value of your life's work is in the failsafe, not in CEV. What happened to all your clever arguments saying you can't put external chains on an AI? I just don't understand this at all.

Replies from: Vladimir_Nesov, wedrifid
comment by Vladimir_Nesov · 2011-02-14T14:53:58.635Z · LW(p) · GW(p)

Any given FAI design can turn out to be unable to do the right thing, which corresponds to tripping failsafes, but to be a FAI it must also be potentially capable (for all we know) of doing the right thing. Adequate failsafe should just turn off an ordinary AGI immediately, so it won't work as an AI-in-chains FAI solution. You can't make AI do the right thing just by adding failsafes, you also need to have a chance of winning.

Replies from: Eliezer_Yudkowsky
comment by wedrifid · 2011-02-15T08:53:11.537Z · LW(p) · GW(p)

wedrifid is right: if you're now counting on failsafes to stop CEV from doing the wrong thing, that means you could apply the same procedures to any other proposed AI, so the real value of your life's work is in the failsafe, not in CEV.

Since my name was mentioned I had better confirm that I generally agree with your point but would have left out this sentence:

What happened to all your clever arguments saying you can't put external chains on an AI?

I don't disagree with the principle of having a failsafe - and don't think it is incompatible with the aforementioned clever arguments. But I do agree that "but there is a failsafe" is an utterly abysmal argument in favour of preferring CEV over an alternative AI goal system.

I just don't understand this at all.

Tell me about it. With most people, if they kept asking the same question when the answer is staring them in the face, and then acted oblivious as it is told to them repeatedly, I would dismiss them as either disingenuous or (possibly selectively) stupid in short order. But, to borrow wisdom from HP:MoR:

... that just doesn't sound like Eliezer's style.

...but you can only think that thought so many times, before you start to wonder about the trustworthiness of that whole 'style' concept.

comment by Paul Crowley (ciphergoth) · 2011-02-14T08:14:27.046Z · LW(p) · GW(p)

Is the Last Jury written up anywhere? It's not in the draft manuscript I have.

Replies from: gwern
comment by gwern · 2011-07-18T03:35:49.831Z · LW(p) · GW(p)

I assume Last Jury is just the Last Judge from CEV but with majority voting among n Last Judges.

comment by wedrifid · 2011-02-14T08:16:00.437Z · LW(p) · GW(p)

it would be caught by generic procedures meant to prevent CEV from running if 80% of humanity turns out to be selfish bastards

I too am confused by your asking of such questions. Your own "80% of humanity turns out to be selfish bastards" gives a pretty good general answer to the question already.

"But we will not run it if is bad" seems like it could be used to reply to just about anything. Sure, it is good to have safety measures no matter what you are doing but not running it doesn't make CEV desirable.

Replies from: XiXiDu, Vladimir_Nesov
comment by XiXiDu · 2011-02-14T11:30:14.727Z · LW(p) · GW(p)

I'm completely confused now. I thought CEV was right by definition? If "80% of humanity turns out to be selfish bastards" then it will extrapolate on that. If we start to cherry pick certain outcomes according to our current perception, why run CEV at all?

Replies from: wedrifid, Vladimir_Nesov
comment by wedrifid · 2011-02-14T12:15:16.371Z · LW(p) · GW(p)

I'm completely confused now. I thought CEV was right by definition? If "80% of humanity turns out to be selfish bastards" then it will extrapolate on that.

No, CEV<wedrifid> is right by definition. When CEV is used as shorthand for "the coherent extrapolated volitions of all of humanity" as is the case there then it is quite probably not right at all. Because many humans, to put it extremely politely, have preferences that are distinctly different to what I would call 'right'.

If we start to cherry pick certain outcomes according to our current perception, why run CEV at all?

Yes, that would be pointless; it would be far better to compare the outcomes of CEV<all_of_humanity> to CEV<group_I_identify_with_sufficiently> (and then just use the latter!). The purpose of doing CEV at all is for signalling and cooperation.

Replies from: steven0461, XiXiDu
comment by steven0461 · 2011-02-14T19:44:23.464Z · LW(p) · GW(p)

Because many humans, to put it extremely politely, have preferences that are distinctly different to what I would call 'right'.

Before or after extrapolation? If the former then why does that matter, if the latter then how do you know?

Replies from: wedrifid
comment by wedrifid · 2011-02-15T02:22:09.148Z · LW(p) · GW(p)

Before or after extrapolation? If the former then why does that matter, if the latter then how do you know?

Former in as much as it allows inferences about the latter. I don't need to know with any particular confidence for the purposes of the point. The point was to illustrate possible (and overwhelmingly obvious) failure modes.

Hoping that CEV<all_of_humanity> is desirable rather than outright unfriendly isn't a particularly good reason to consider it. It is going to result in outcomes that are worse, from the perspective of whoever is running the GAI, than CEV<themselves> and CEV<group_they_identify_with>.

The purpose of doing CEV at all is for signalling and cooperation (or, possibly, outright confusion).

comment by XiXiDu · 2011-02-14T14:17:13.146Z · LW(p) · GW(p)

The purpose of doing CEV at all is for signalling and cooperation.

Do you mean it is simply an SIAI marketing strategy and that it is not what they are actually going to do?

Replies from: wedrifid
comment by wedrifid · 2011-02-14T14:44:17.918Z · LW(p) · GW(p)

Do you mean it is simply an SIAI marketing strategy and that it is not what they are actually going to do?

Signalling and cooperation can include actual behavior.

comment by Vladimir_Nesov · 2011-02-14T11:56:39.133Z · LW(p) · GW(p)

CEV is not right by definition, it's only well-defined given certain assumptions that can fail. It should be designed so that if it doesn't shut down, then it's probably right.

Replies from: Tyrrell_McAllister
comment by Tyrrell_McAllister · 2011-02-14T17:58:35.893Z · LW(p) · GW(p)

Sincere question: Why would "80% of humanity turns out to be selfish bastards" violate one of those assumptions? Is the problem the "selfish bastard" part? Or is it that the "80%" part implies less homogeneity among humans than CEV assumes?

Replies from: wedrifid
comment by wedrifid · 2011-02-15T02:34:17.757Z · LW(p) · GW(p)

Why would "80% of humanity turns out to be selfish bastards" violate one of those assumptions?

It would certainly seem that 80% of humanity turning out to be selfish bastards is compatible with CEV being well defined, but not with being 'right'. This does not technically contradict anything in the grandparent (which is why I didn't reply with the same question myself). It does, perhaps, go against the theme of Nesov's comments.

Basically, and as you suggest, either it must be acknowledged that 'not well defined' and 'possibly evil' are two entirely different problems or something that amounts to 'humans do not want things that suck' must be one of the assumptions.

Replies from: XiXiDu
comment by XiXiDu · 2011-02-15T09:52:51.524Z · LW(p) · GW(p)

It would certainly seem that 80% of humanity turning out to be selfish bastards is compatible with CEV being well defined, but not with being 'right'.

I suppose you have to comprehend Yudkowsky's metaethics to understand that sentence. I still don't get what kind of 'right' people are talking about.

Replies from: wedrifid
comment by wedrifid · 2011-02-15T10:06:46.532Z · LW(p) · GW(p)

I still don't get what kind of 'right' people are talking about.

Very similar to your right, for all practical purposes. A slight difference in how it is described, though. You describe (if I recall) 'right' as being "in accordance with XiXiDu's preferences". Using Eliezer's style of terminology you would instead describe 'right' as more like a photograph of what XiXiDu's preferences are, without them necessarily including any explicit reference to XiXiDu.

In most cases it doesn't really matter. It starts to matter once people start saying things like "But what if XiXiDu could take a pill that made him prefer that he eat babies? Would that mean it became right? Should XiXiDu take the pill?"

By the way, 'right' would also mean what the photo looks like after it has been airbrushed a bit in photoshop by an agent better at understanding what we actually want than we are at introspection and communication. So it's an abstract representation of what you would want if you were smarter and more rational but still had your preferences.

Also note that Eliezer sometimes blurs the line between 'right' meaning what he would want and what some abstract "all of humanity" would want.

comment by Vladimir_Nesov · 2011-02-14T11:16:41.829Z · LW(p) · GW(p)

"But we will not run it if is bad" seems like it could be used to reply to just about anything. Sure, it is good to have safety measures no matter what you are doing but not running it doesn't make CEV desirable.

In the case where the assumptions fail, and CEV ceases to be predictably good, safety measures shut it down, so nothing happens. In the case where they hold, it works. As a result, CEV has good expected utility, and gives us a chance to try again with a different design if it fails.

Replies from: wedrifid
comment by wedrifid · 2011-02-14T11:56:05.712Z · LW(p) · GW(p)

This does not seem to weaken the position you quoted in any way.

Failsafe measures are a great idea. They just don't do anything to privilege CEV + failsafe over anything_else + failsafe.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2011-02-14T12:09:40.456Z · LW(p) · GW(p)

Failsafe measures are a great idea. They just don't do anything to privilege CEV + failsafe over anything_else + failsafe.

Yes. They make sure that [CEV + failsafe] is not worse than not running any AIs. Uncertainty about whether CEV works makes expected [CEV + failsafe] significantly better than doing nothing. Presence of potential controlled shutdown scenarios doesn't argue for worthlessness of the attempt, even where detailed awareness of these scenarios could be used to improve the plan.

Replies from: wedrifid
comment by wedrifid · 2011-02-14T12:21:19.483Z · LW(p) · GW(p)

I'm actually not even sure whether you are trying to disagree with me or not but once again, in case you are, nothing here weakens my position.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2011-02-14T12:31:42.230Z · LW(p) · GW(p)

"Not running it" does make [CEV + failsafe] desirable, as compared to doing nothing, even in the face of problems with [CEV], and nobody is going to run just [CEV]. So most arguments for presence of problems in CEV, if they are met with adequate failsafe specifications (which is far from a template to reply to anything, failsafes are not easy), do indeed lose a lot of traction. Besides, what are they arguments for? One needs a suggestion for improvement, and failsafes are intended to make it so that doing nothing is not an improvement, even though improvements over any given state of the plan would be dandy.

Replies from: wedrifid
comment by wedrifid · 2011-02-14T13:01:46.457Z · LW(p) · GW(p)

"Not running it" does make [CEV + failsafe] desirable, as compared to doing nothing

Yes, this is trivially true and not currently disputed by anyone here. Nobody is suggesting doing nothing. Doing nothing is crazy.

comment by wedrifid · 2011-02-14T11:05:12.900Z · LW(p) · GW(p)

Roko's original proposed basilisk is not and never was the problem in Roko's post.

Of course, Roko did not originally propose a basilisk at all. Just a novel solution to an obscure game theory problem.

comment by XiXiDu · 2011-02-14T11:13:28.502Z · LW(p) · GW(p)

Roko's basilisk is a failure mode of CEV.

From your current perspective. But also given your extrapolated volition? If it is, then it won't happen.

ETA The above was confusing and unclear. I don't believe that one person can change the course of CEV. I rather meant to ask if he believes that it would be a failure mode even if it was the correct extrapolated volition of humanity.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2011-02-14T11:59:13.469Z · LW(p) · GW(p)

If CEV has a serious bug, it won't correctly implement anyone's volition, and so someone's volition saying that CEV shouldn't have that bug won't help.

Replies from: XiXiDu, XiXiDu
comment by XiXiDu · 2011-02-14T14:31:52.830Z · LW(p) · GW(p)

Never mind, upvoted your comment. I wrote "then it won't happen". That was wrong, I don't actually believe that. I meant to ask something different. Edited the comment to add a clarification.

comment by XiXiDu · 2011-02-14T14:10:06.999Z · LW(p) · GW(p)

If CEV has a serious bug, it won't correctly implement anyone's volition...

Obviously. A bug would be the inability to extrapolate volition correctly, not a certain outcome that is based on the correct extrapolated volition. So what did cousin_it mean by saying that outcome X is a failure mode? Does he mean that from his current perspective he doesn't like outcome X or that outcome X would imply a bug in the process of extrapolating volition? (ETA I'm talking about CEV-humanity and not CEV-cousin-it. There would be no difference in the latter case.)

comment by lukeprog · 2011-02-13T21:02:58.046Z · LW(p) · GW(p)

Not until I get to that part of the writing and research, no.

Replies from: lukeprog
comment by lukeprog · 2011-02-14T04:56:31.561Z · LW(p) · GW(p)

That is, I'm applying your advice to hold off on proposing solutions until the problem has been discussed as thoroughly as possible without suggesting any.

Replies from: Adele_L
comment by Adele_L · 2013-11-13T05:18:16.032Z · LW(p) · GW(p)

Has this been published anywhere yet?

Replies from: lukeprog
comment by lukeprog · 2013-11-13T16:58:30.346Z · LW(p) · GW(p)

A related thing that has since been published is Ideal Advisor Theories and Personal CEV.

I have no plans to write the book; see instead Bostrom's far superior Superintelligence, forthcoming.

comment by Dorikka · 2011-02-14T06:09:46.718Z · LW(p) · GW(p)

Extrapolated humanity decides that the best possible outcome is to become the Affront. Now, if the FAI put each person in a separate VR and tricked him into believing that he was acting all Affront-like, then everything would be great -- everyone would be content. However, people don't just want the experience of being the Affront -- everyone agrees that they want to be truly interacting with other sentiences which will often feel the brunt of each other's coercive action.

Replies from: Eliezer_Yudkowsky, nazgulnarsil, lukeprog
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2011-02-14T06:40:23.884Z · LW(p) · GW(p)

Original version of grandparent contained, before I deleted it, "Besides the usual 'Eating babies is wrong, what if CEV outputs eating babies, therefore a better solution is CEV plus code that outlaws eating babies.'"

comment by nazgulnarsil · 2011-02-16T01:21:39.344Z · LW(p) · GW(p)

I have never understood what is wrong with the amnesia-holodecking scenario. (Is there a proper name for this?)

Replies from: Dorikka, Sniffnoy
comment by Dorikka · 2011-02-16T02:20:56.976Z · LW(p) · GW(p)

If you want to, say, stop people from starving to death, would you be satisfied with being plopped on a holodeck with images of non-starving people? If so, then your stop-people-from-starving-to-death desire is not a desire to optimize reality into a smaller set of possible world-states, but simply a desire to have a set of sensations so that you believe starvation does not exist. The two are really different.

If you don't understand what I'm saying, the first two paragraphs of this comment might explain it better.

Replies from: nazgulnarsil
comment by nazgulnarsil · 2011-02-16T02:25:44.531Z · LW(p) · GW(p)

thanks for clarifying. I guess I'm evil. It's a good thing to know about oneself.

Replies from: Dorikka
comment by Dorikka · 2011-02-16T02:30:23.516Z · LW(p) · GW(p)

Uh, that was a joke, right?

Replies from: nazgulnarsil
comment by nazgulnarsil · 2011-02-16T06:19:09.888Z · LW(p) · GW(p)

no.

Replies from: Dorikka
comment by Dorikka · 2011-02-16T23:53:37.863Z · LW(p) · GW(p)

What definition of evil are you using? I'm having trouble understanding why (how?) you would declare yourself evil, especially evil_nazgulnarsil.

Replies from: nazgulnarsil
comment by nazgulnarsil · 2011-02-17T06:06:07.916Z · LW(p) · GW(p)

i don't care about suffering independent of my sensory perception of it causing me distress.

Replies from: Dorikka
comment by Dorikka · 2011-02-17T15:31:49.286Z · LW(p) · GW(p)

Oh. In that case, it might be more precise to say that your utility function does not assign positive or negative utility to the suffering of others (if I'm interpreting your statement correctly). However, I'm curious about whether this statement holds true for you at extremes, so here's a hypothetical.

I'm going to assume that you like ice cream. If you don't like any sort of ice cream, substitute in a certain quantity of your favorite cookie. If you could get a scoop of ice cream (or a cookie) for free at the cost of a million babies' thumbs being cut off, would you take the ice cream/cookie?

If not, then you assign a non-zero utility to others' suffering, so it might be true that you care very little, but it's not true that you don't care at all.

Replies from: nazgulnarsil
comment by nazgulnarsil · 2011-02-18T07:33:48.864Z · LW(p) · GW(p)

I think you misunderstand slightly. Sensory experience includes having the idea communicated to me that my action is causing suffering. I assign negative utility to others' suffering in real life because the thought of such suffering is unpleasant.

Replies from: Dorikka
comment by Dorikka · 2011-02-19T01:50:20.590Z · LW(p) · GW(p)

Alright. Would you take the offer if Omega promised to remove your memories of agreeing to have a million babies' thumbs cut off for a scoop of ice cream, right after you made the agreement, so you could enjoy your ice cream without guilt?

Replies from: nazgulnarsil
comment by nazgulnarsil · 2011-02-19T03:31:59.468Z · LW(p) · GW(p)

no, at the time of the decision i have sensory experience of having been the cause of suffering.

I don't feel responsibility to those who suffer in that I would choose to holodeck myself rather than stay in reality and try to fix problems. this does not mean that I will cause suffering on purpose.

a better hypothetical dilemma might be if I could ONLY get access to the holodeck if I cause others to suffer (Cypher from The Matrix).

Replies from: Dorikka
comment by Dorikka · 2011-02-20T01:47:01.205Z · LW(p) · GW(p)

I don't feel responsibility to those who suffer in that I would choose to holodeck myself rather than stay in reality and try to fix problems.

Okay, so you would feel worse if you had caused people the same amount of suffering than you would if someone else had done so?

Replies from: nazgulnarsil
comment by nazgulnarsil · 2011-02-20T01:55:25.701Z · LW(p) · GW(p)

yes

Replies from: Dorikka
comment by Dorikka · 2011-02-20T01:59:27.051Z · LW(p) · GW(p)

Mmkay. I would say that our utility functions are pretty different, in that case, since, with regard to suffering, I value world-states according to how much suffering they contain, not according to who causes the suffering.

comment by Sniffnoy · 2011-02-16T09:03:04.808Z · LW(p) · GW(p)

Well, it's essentially equivalent to wireheading.

Replies from: nazgulnarsil
comment by nazgulnarsil · 2011-02-16T10:16:45.058Z · LW(p) · GW(p)

which I also plan to do if everything goes tits-up.

comment by lukeprog · 2011-02-14T06:20:09.151Z · LW(p) · GW(p)

Dorikka,

I don't understand this. If the singleton's utility function was written such that its highest value was for humans to become the Affront, then making it the case that humans believed they were the Affront while not being the Affront would not satisfy the utility function. So why would the singleton do such a thing?

Replies from: Dorikka
comment by Dorikka · 2011-02-15T02:45:39.434Z · LW(p) · GW(p)

I don't think that my brain was working optimally at 1am last night.

My first point was that our CEV might decide to go Baby-Eater, and so the FAI should treat the caring-about-the-real-world-state part of its utility function as a mere preference (like chocolate ice cream), and pop humanity into a nicely designed VR (though I didn't have the precision of thought necessary to put it into such language). However, it's pretty absurd for us to be telling our CEV what to do, considering that they'll have much more information than we do and much more refined thinking processes. I actually don't think that our Last Judge should do anything more than watch for coding errors (as in, we forgot to remove known psychological biases when creating the CEV).

My second point was that the FAI should also slip us into a VR if we desire a world-state in which we defect from each other (with similar results as in the prisoner's dilemma). However, the counterargument from point 1 also applies to this point.

comment by XiXiDu · 2011-02-13T20:02:41.765Z · LW(p) · GW(p)

However, I'm not convinced at all that CEV is a failed project, and that an alternative is needed.

Maybe you should rephrase it then to say that you'll present some possible failure modes of CEV that will have to be taken care of rather than "objections".

Replies from: lukeprog
comment by lukeprog · 2011-02-13T20:18:34.756Z · LW(p) · GW(p)

No, I'm definitely presenting objections in that chapter.

comment by mwaser · 2011-02-16T13:03:28.420Z · LW(p) · GW(p)

MY "objection" to CEV is exactly the opposite of what you're expecting and asking for. CEV as described is not descriptive enough to allow the hypothesis "CEV is an acceptably good solution" to be falsified. Since it is "our wish if we knew more", etc., any failure scenrio that we could possibly put forth can immediately be answered by altering the potential "CEV space" to answer the objection.

I have radically different ideas about where CEV is going to converge than most people here. Yet, the lack of distinctions in the description of CEV causes my ideas to be included under any argument for CEV because CEV potentially is . . . ANYTHING! There are no concrete distinctions that clearly state that something is NOT part of the ultimate CEV.

Arguing against CEV is like arguing against science. Can you argue a concrete failure scenario of science? Now -- keeping Hume in mind, what does science tell the AI to do? It's precisely the same argument, except that CEV as a "computational procedure" is much less well-defined than the scientific method.

Don't get me wrong. I love the concept of CEV. It's a brilliant goal statement. But it's brilliant because it doesn't clearly exclude anything that we want -- and human biases lead us to believe that it will include everything we truly want and exclude everything we truly don't want.

My concept of CEV disallows AI slavery. Your answer to that is "If that is truly what a grown-up humanity wants/needs, then that is what CEV will be". CEV is the ultimate desire -- ever-changing and never real enough to be pinned down.

comment by Perplexed · 2011-02-13T17:42:28.813Z · LW(p) · GW(p)

What source would you recommend to someone who wants to understand CEV as a computational procedure?

comment by Kevin · 2011-02-13T21:36:34.168Z · LW(p) · GW(p)

Luke, as an intermediate step before writing a book you should write a book chapter for Springer's upcoming edited volume on the Singularity Hypothesis. http://singularityhypothesis.blogspot.com/p/about-singularity-hypothesis.html I'm not sure how biased they are against non-academics... probably depends on how many submissions they get.

Maybe email Louie and me and we can brainstorm about topics; meta-ethics might not be the best thing compared to something like making an argument about how we need to solve all of philosophy in order to safely build AI.

Replies from: mwaser, lukeprog
comment by mwaser · 2011-02-16T12:05:48.146Z · LW(p) · GW(p)

I know the individuals involved. They are not biased against non-academics and would welcome a well-thought-out contribution from anyone. You could easily have a suitable abstract ready by March 1st (two weeks early) if you believed that it was important enough -- and I would strongly urge you to do so.

Replies from: lukeprog
comment by lukeprog · 2011-02-19T17:53:16.008Z · LW(p) · GW(p)

Thanks for this input. I'm currently devoting all my spare time to research on a paper for this volume so that I can hopefully have an extended abstract ready by March 15th.

comment by lukeprog · 2011-02-13T22:18:21.608Z · LW(p) · GW(p)

I will probably write papers and articles in the course of developing the book. Whether or not I could have an abstract ready by March 15th is unknown; at the moment, I still work a full-time job. Thanks for bringing this to my attention.

comment by James_Miller · 2011-02-13T19:17:27.357Z · LW(p) · GW(p)

The first sentence is the most important of any book because if a reader doesn't like it, he will stop. Your first sentence contains four numbers, none of which are relevant to your core thesis. Forgive me for being cruel, but a publisher reading this sentence would conclude that you lack the ability to write a book people would want to read.

Look at successful non-fiction books to see how they get started.

Replies from: lukeprog
comment by lukeprog · 2011-02-13T19:41:24.616Z · LW(p) · GW(p)

This is not a book for a popular audience. Also, it's a first draft. That said, you needn't apologize for saying anything "cruel."

But, based on your comments, I've now revised my opening to the following...

Compared to science, philosophy moves at a slow pace. A few decades after the Wright Brothers flew their spruce-wood plane for half the length of a football field, Neil Armstrong walked on the moon. Meanwhile, philosophers are still debating the questions Plato raised more than two millennia ago.

But the world is about to change. Maintaining philosophy’s slow pace and not solving certain philosophical problems in the next two centuries may lead to the extinction of the human species.

This extinction would result from...

It's still not like the opening of a Richard Dawkins book, but it's not supposed to be like a Richard Dawkins book.

Replies from: James_Miller
comment by James_Miller · 2011-02-14T00:00:57.862Z · LW(p) · GW(p)

Better, but how about this:

"Philosophy's pathetic pace could kill us."

Replies from: NihilCredo
comment by NihilCredo · 2011-02-14T12:21:02.271Z · LW(p) · GW(p)

If his target audience is academia, then drastic claims (whether substantiated or not) are going to be an active turnoff, and should only be employed when absolutely necessary.

comment by Perplexed · 2011-02-13T13:44:52.080Z · LW(p) · GW(p)

Bibliographic references are provided here.

I notice some of the references you suggest are available as online resources. It would be a courtesy if you provided links.

Replies from: lukeprog
comment by lukeprog · 2011-02-13T19:24:59.240Z · LW(p) · GW(p)

Done.

comment by CharlesR · 2011-02-13T16:32:45.283Z · LW(p) · GW(p)

"This extinction would result from a “technological singularity” in which an artificial intelligence (AI) . . . "

By this point, you've talked about airplanes, Apollo, science good, philosophy bad. Then you introduce the concepts of existential risk, claim we are on the cusp of an extinction-level event, and say the end of the world is going to come from . . . Skynet.

And we're only to paragraph four.

These are complex ideas. Your readers need time to digest them. Slow down.

You may also want to think about coming at this from another direction. If the goal is to convince your readers AI is dangerous, maybe you should introduce the concept of AI first. Then explain why they're dangerous. Use an example that everyone knows about and build on that. You need to establish rapport with your readers before you try to get them to accept strange ideas. (For example, it is common knowledge computers are better at chess than humans.)

Finally, is your goal to get published? Nonfiction is usually written on spec. Some (many, all?) publishers are wary of buying anything that has already appeared on the internet. Just a few things to keep in mind.

Replies from: lukeprog
comment by lukeprog · 2011-02-13T19:24:47.973Z · LW(p) · GW(p)

This is a difference between popular writing and academic writing. Academic writing begins with an abstract - a summary of your position and what you argue, without any explanation of the concepts involved or arguments for your conclusions. Only then do you proceed to explanation and argument.

As for publishing, that is less important than getting it written, and getting it written well. That said, the final copy will be quite a bit different than the draft sections posted here. My copy of this opening is already quite a bit different than what you see above.

Replies from: CharlesR
comment by CharlesR · 2011-02-14T03:11:20.914Z · LW(p) · GW(p)

Clearly, I and others thought you were writing a popular book. No need to "school" us on the difference.

Replies from: lukeprog
comment by lukeprog · 2011-02-14T03:50:22.323Z · LW(p) · GW(p)

Okay.

It wasn't clear to me that you thought I was writing a popular book, since I denied that in my second paragraph (before the quoted passage from the book).

Replies from: CharlesR
comment by CharlesR · 2011-02-14T15:50:22.076Z · LW(p) · GW(p)

Your clarification wasn't in the original version of the preamble that I read. Or are you claiming that you haven't edited it? Because I clearly remember a different sentence structure.

However, I am willing to admit my memory is faulty on this.

Replies from: lukeprog
comment by lukeprog · 2011-02-14T20:36:45.041Z · LW(p) · GW(p)

CharlesR,

My original clarification said that it was a cross between academic writing and mainstream writing, the result being something like 'Epistemology and the Psychology of Human Judgment.' That apparently wasn't clear enough, so I did indeed change my preamble recently to be clearer in its denial of popular style. Sorry if that didn't come through in the first round.

Replies from: CharlesR
comment by CharlesR · 2011-02-14T22:36:09.830Z · LW(p) · GW(p)

And people wonder how wars get started . . .

Replies from: lukeprog
comment by lukeprog · 2011-02-14T23:31:34.600Z · LW(p) · GW(p)

Heh. Sorry; I didn't mean to offend. I thought it was clear from my original preamble that this wasn't a popular-level work, but apparently not!

comment by [deleted] · 2011-02-13T14:39:55.669Z · LW(p) · GW(p)

I'm glad you're writing a book!

comment by Unnamed · 2011-02-13T18:40:56.606Z · LW(p) · GW(p)

I tried reading this through the eyes of someone who wasn't familiar with the singularity & LW ideas, and you lost me with the fourth paragraph ("This extinction..."). Paragraph 3 makes the extremely bold claim that humanity could face its extinction soon unless we solve some longstanding philosophical problems. When someone says something outrageous-sounding like that, they have a short window to get me to see how their claim could be plausible and is worth at least considering as a hypothesis, otherwise it gets classified as ridiculous nonsense. You missed that chance, and instead went with a dense paragraph filled with jargon (which is too inferentially distant to add plausibility) and more far-fetched claims (which further activate my bullshit detector).

What I'd like to see instead is a few paragraphs sketching out the argument in a way that's as simple, understandable, and jargon-free as possible. First why to expect an intelligence explosion (computers getting better and more domain general, what happens when they can do computer science?), then why the superintelligences could determine the fate of the planet (humans took over the planet once we got smart enough, what happens when the computers are way smarter than us?), then what this has to do with philosophy (philosophical rules about how to behave aren't essential for humans to get along with each other since we have genes, socialization, and interdependence due to limited power, but these computers won't have that so the way to behave will need to be programmed in).

Replies from: lukeprog
comment by lukeprog · 2011-02-13T19:27:51.739Z · LW(p) · GW(p)

This is a difference between popular writing and academic writing. The opening is my abstract. See here.

Replies from: Unnamed
comment by Unnamed · 2011-02-13T21:02:33.373Z · LW(p) · GW(p)

The problem that I described in my first paragraph is there regardless of how popular or academic a style you're aiming for. The bold, attention-grabbing claims about extinction/utopia/the fate of the world are a turnoff, and they actually seem more out of place for academic writing than for popular writing.

If you don't want to spend more time elaborating on your argument in order to make the bold claims sound plausible, you could just get rid of those bold claims. Maybe you could include one mention of the high stakes in your abstract, as part of the teaser of the argument to come, rather than vividly describing the high stakes before and after the abstract as a way to shout out "hey this is really important!"

Replies from: lukeprog
comment by lukeprog · 2011-02-13T21:27:27.467Z · LW(p) · GW(p)

Thanks for your comment, but I'm going with a different style. This kind of opening is actually quite common in Anglophone philosophy, as the quickest route to tenure is to make really bold claims and then come up with ingenious ways of defending them.

I know that Less Wrong can be somewhat averse to the style of contemporary Anglophone philosophy, but that will not dissuade me from using it. To drive home the point that my style here is common in Anglophone philosophy (I'm avoiding calling it analytic philosophy), here a few examples...

The opening paragraphs of David Lewis' On the Plurality of Worlds, in which he defends a radical view known as modal realism, that all possible worlds actually exist:

This book defends modal realism: the thesis that the world we are part of is but one of a plurality of worlds, and that we who inhabit this world are only a few out of all the inhabitants of all the worlds.

I begin the first chapter by reviewing the many ways in which systematic philosophy goes more easily if we may presuppose modal realism...

In the second chapter, I reply to numerous objections...

In the third chapter, I consider the prospect that a more credible ontology might yield the same benefits...

Opening paragraph (abstract) of Neil Sinhababu's "Possible Girls" for the Pacific Philosophical Quarterly:

I argue that if David Lewis’ modal realism is true, modal realists from different possible worlds can fall in love with each other. I offer a method for uniquely picking out possible people who are in love with us and not with our counterparts. Impossible lovers and trans-world love letters are considered. Anticipating objections, I argue that we can stand in the right kinds of relations to merely possible people to be in love with them and that ending a transworld relationship to start a relationship with an actual person isn’t cruel to one’s otherworldly lover.

Opening paragraph of Peter Klein's "Human Knowledge and the Infinite Regress of Reasons" for Philosophical Perspectives:

The purpose of this paper is to ask you to consider an account of justification that has largely been ignored in epistemology. When it has been considered, it has usually been dismissed as so obviously wrong that arguments against it are not necessary. The view that I ask you to consider can be called "Infinitism." Its central thesis is that the structure of justificatory reasons is infinite and non-repeating. My primary reason for recommending infinitism is that it can provide an acceptable account of rational beliefs, i.e., beliefs held on the basis of adequate reasons, while the two alternative views, foundationalism and coherentism, cannot provide such an account.

And, the opening paragraph of Steven Maitzen's paper arguing that a classical theistic argument actually proves atheism:

Chapter 15 of Anselm's Proslogion contains the germ of an argument that confronts theology with a serious trilemma: atheism, utter mysticism, or radical anti-Anselmianism. The argument establishes a disjunction of claims that Anselmians in particular, but not only they, will find disturbing: (a) God does not exist, (b) no human being can have even the slightest conception of God, or (c) the Anselmian requirement of maximal greatness in God is wrong. Since, for reasons I give below, (b) and (c) are surely false, I regard the argument as establishing atheism.

And those are just the first four works that came to mind. This kind of abrupt opening is the style of Anglophone philosophy, and that's the style I'm using. Anyone who keeps up with Anglophone philosophy lives and breathes this style of writing every week.

Anglophone philosophy is not written for people who are casually browsing for interesting things to read. It is written for academics who have hundreds and hundreds of papers and books we might need to read, and we need to know right away in the opening lines whether or not a particular book or paper addresses the problems we are researching.

comment by JohnD · 2011-02-13T11:26:13.536Z · LW(p) · GW(p)

There's not much to critically engage with yet, but...

I find it odd that you claim to have "laid [your] positions on the table" in the first half of this piece. As far as I can make out, the first half only describes a set of problems and possibilities arising from the "intelligence explosion". It doesn't say anything about your response or proposed solution to those problems.

comment by XiXiDu · 2011-02-13T11:12:40.185Z · LW(p) · GW(p)

I haven't read all of the recent comments. Have you made progress yet on understanding Yudkowsky's meta-ethics sequence? I hope you let us know if you do (via a top-level post). It seems a bit weird to write a book on it if you don't either understand it yet or haven't disregarded understanding it for the purpose of your book.

Anyway, I appreciate your efforts very much and think that the book will be highly valuable either way.

Replies from: lukeprog
comment by lukeprog · 2011-02-13T16:26:58.969Z · LW(p) · GW(p)

For now, see here, though my presentation of Yudkowsky's views in the book will be longer and clearer.

comment by XiXiDu · 2011-02-13T11:37:15.642Z · LW(p) · GW(p)

But even if we can solve those problems, the question of which goal system to give the superintelligence remains. It is a question of philosophy; it is a question of ethics.

Isn't it an interdisciplinary question, also involving decision theory, game theory and evolutionary psychology etc.? Maybe it is mainly a question about philosophy of ethics, but not solely?

comment by XiXiDu · 2011-02-13T11:33:57.606Z · LW(p) · GW(p)

...and a solar system in which all available matter is converted into parts for a planet-sized computer built to solve difficult mathematical problems.

This sentence isn't very clear. People who don't know about the topic will think, "to create a utopia you also have to solve difficult mathematical problems."

This project could mean the difference between a utopian solar system of unprecedented harmony and happiness, and a solar system void of human values in which all available matter is being used to pursue a set of narrow goals.

comment by XiXiDu · 2011-02-13T11:22:38.126Z · LW(p) · GW(p)

The Wright Brothers flew their spruce-wood plane for 200 feet in 1903. Only 66 years later, Neil Armstrong walked on the moon, more than 240,000 miles from Earth.

I'm not sure if there is a real connection here? Has any research on "flight machines" converged with rocket science? They seem not to be correlated very much, or the correlation is not obvious. Do you think it might be good to expand on that point or rephrase it to show that there has been some kind of intellectual or economic speedup that caused the quick development of various technologies?

Replies from: timtyler
comment by timtyler · 2011-02-13T12:04:48.630Z · LW(p) · GW(p)

The connection is - presumably - powered flight.

comment by Daniel_Burfoot · 2011-02-13T17:00:26.164Z · LW(p) · GW(p)

I'll offer you a trade: an extensive and in-depth analysis of your book in return for an equivalent analysis of my book.

Quick note: I think explicit metadiscourse like "In Chapter 7 I argue that..." is ugly. Instead, try to fold those kinds of organizational notes into the flow of the text or argument. So write something like "But C.E.V. has some potential problems, as noted in Chapter 7, such as..." Or just throw away metadiscourse altogether.

Replies from: lukeprog
comment by lukeprog · 2011-02-13T17:07:42.284Z · LW(p) · GW(p)

What is your book?

Replies from: Daniel_Burfoot
comment by Daniel_Burfoot · 2011-02-13T18:01:53.434Z · LW(p) · GW(p)

It's about the philosophy of science, machine learning, computer vision, computational linguistics, and (indirectly) artificial intelligence. It should be interesting/relevant to you, even if you don't buy the argument.

Replies from: lukeprog
comment by lukeprog · 2011-02-13T19:04:21.336Z · LW(p) · GW(p)

Sorry, outside my expertise. In this book I'm staying away from technical implementation problems and sticking close to meta-ethics.

comment by lukeprog · 2011-02-13T16:32:49.120Z · LW(p) · GW(p)

Thanks, everyone. I agree with almost every point here and have updated my own copy accordingly. I especially look forward to your comments when I have something meaty to say.

comment by XiXiDu · 2011-02-13T11:41:18.499Z · LW(p) · GW(p)

In this chapter I argue that the technological singularity is likely to occur within the next 200 years...

If it takes 200 years, it could just as well take 2000. I'm skeptical that if it doesn't occur this century it will occur next century for sure. If it doesn't occur this century, that might just as well mean that it won't occur any time soon afterwards either.

Replies from: Normal_Anomaly
comment by Normal_Anomaly · 2011-02-13T16:44:00.704Z · LW(p) · GW(p)

I have a similar feeling. If it hasn't happened within a century, I'll probably think (assume for the sake of argument I'm still around) that it will be in millennia or never.

Replies from: lukeprog
comment by lukeprog · 2011-02-13T19:26:17.264Z · LW(p) · GW(p)

200 years is my 'outer bound.' It may very well happen much sooner, for example in 45 years.