What I Think, If Not Why

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-11T17:41:43.000Z · LW · GW · Legacy · 103 comments

Reply to: Two Visions Of Heritage

Though it really goes tremendously against my grain - it feels like sticking my neck out over a cliff (or something) - I guess I have no choice here but to try and make a list of just my positions, without justifying them.  We can only talk justification, I guess, after we get straight what my positions are.  I will also leave off many disclaimers to present the points compactly enough to be remembered.

• A well-designed mind should be much more efficient than a human, capable of doing more with less sensory data and fewer computing operations.  It is not infinitely efficient and does not use zero data.  But it does use little enough that local pipelines such as a small pool of programmer-teachers and, later, a huge pool of e-data, are sufficient.

• An AI that reaches a certain point in its own development becomes able to (sustainably, strongly) improve itself.  At this point, recursive cascades slam over many internal growth curves to near the limits of their current hardware, and the AI undergoes a vast increase in capability.  This point is at, or probably considerably before, a minimally transhuman mind capable of writing its own AI-theory textbooks - an upper bound beyond which it could swallow and improve its entire design chain.

• It is likely that this capability increase or "FOOM" has an intrinsic maximum velocity that a human would regard as "fast" if it happens at all.  A human week is ~1e15 serial operations for a population of 2GHz cores, and a century is ~1e19 serial operations; this whole range is a narrow window.  However, the core argument does not require one-week speed, and a FOOM that takes two years (~1e17 serial ops) will still carry the weight of the argument.
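A quick back-of-the-envelope check of these figures, as a minimal sketch assuming one serial operation per clock cycle on a single 2GHz core:

```python
# Rough sanity check on the serial-operation figures quoted above,
# assuming one serial operation per clock cycle on a single 2GHz core.
CLOCK_HZ = 2e9                          # 2GHz, as in the post
SECONDS_PER_WEEK = 7 * 24 * 3600        # ~6.0e5 s
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # ~3.2e7 s

week_ops     = CLOCK_HZ * SECONDS_PER_WEEK        # ~1.2e15 serial ops
two_year_ops = CLOCK_HZ * 2 * SECONDS_PER_YEAR    # ~1.3e17 serial ops
century_ops  = CLOCK_HZ * 100 * SECONDS_PER_YEAR  # ~6.3e18 serial ops

print(f"one week:  {week_ops:.1e}")
print(f"two years: {two_year_ops:.1e}")
print(f"a century: {century_ops:.1e}")
```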

• The default case of FOOM is an unFriendly AI, built by researchers with shallow insights.  This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever).

• The desired case of FOOM is a Friendly AI, built using deep insight, so that the AI never makes any changes to itself that potentially change its internal values; all such changes are guaranteed using strong techniques that allow for a billion sequential self-modifications without losing the guarantee.  The guarantee is written over the AI's internal search criterion for actions, rather than external consequences.

• The good guys do not write an AI which values a bag of things that the programmers think are good ideas, like libertarianism or socialism or making people happy or whatever.  There were multiple Overcoming Bias sequences about this one point, like the Fake Utility Function sequence and the sequence on metaethics.  It is dealt with at length in the document Coherent *Extrapolated* Volition.  It is the first thing, the last thing, and the middle thing that I say about Friendly AI.  I have said it over and over.  I truly do not understand how anyone can pay any attention to anything I have said on this subject, and come away with the impression that I think programmers are supposed to directly impress their non-meta personal philosophies onto a Friendly AI.

The good guys do not directly impress their personal values onto a Friendly AI.

• Actually setting up a Friendly AI's values is an extremely meta operation, less "make the AI want to make people happy" and more like "superpose the possible reflective equilibria of the whole human species, and output new code that overwrites the current AI and has the most coherent support within that superposition".  This actually seems to be something of a Pons Asinorum in FAI - the ability to understand and endorse metaethical concepts that do not directly sound like amazing wonderful happy ideas.  Describing this as declaring total war on the rest of humanity does not seem fair (or accurate).

I myself am strongly individualistic:  The most painful memories in my life have been when other people thought they knew better than me, and tried to do things on my behalf.  It is also a known principle of hedonic psychology that people are happier when they're steering their own lives and doing their own interesting work.  When I try myself to visualize what a beneficial superintelligence ought to do, it consists of setting up a world that works by better rules, and then fading into the background, silent as the laws of Nature once were; and finally folding up and vanishing when it is no longer needed.  But this is only the thought of my mind that is merely human, and I am barred from programming any such consideration directly into a Friendly AI, for the reasons given above.

• Nonetheless, it does seem to me that this particular scenario could not be justly described as "a God to rule over us all", unless the current fact that humans age and die is "a malevolent God to rule us all".  So either Robin has a very different idea about what human reflective equilibrium values are likely to look like; or Robin believes that the Friendly AI project is bound to fail in such way as to create a paternalistic God; or - and this seems more likely to me - Robin didn't read all the way through all the blog posts in which I tried to explain all the ways that this is not how Friendly AI works.

• Friendly AI is technically difficult and requires an extraordinary effort on multiple levels.  English sentences like "make people happy" cannot describe the values of a Friendly AI.  Testing is not sufficient to guarantee that values have been successfully transmitted.

• White-hat AI researchers are distinguished by the degree to which they understand that a single misstep could be fatal, and can discriminate strong and weak assurances.  Good intentions are not only common, they're cheap.  The story isn't about good versus evil, it's about people trying to do the impossible versus others who... aren't.

• Intelligence is about being able to learn lots of things, not about knowing lots of things.  Intelligence is especially not about tape-recording lots of parsed English sentences a la Cyc.  Old AI work was poorly focused due to inability to introspectively see the first and higher derivatives of knowledge; human beings have an easier time reciting sentences than reciting their ability to learn.

Intelligence is mostly about architecture, or "knowledge" along the lines of knowing to look for causal structure (Bayes-net type stuff) in the environment; this kind of knowledge will usually be expressed procedurally as well as declaratively.  Architecture is mostly about deep insights.  This point has not yet been addressed (much) on Overcoming Bias, but Bayes nets can be considered as an archetypal example of "architecture" and "deep insight".  Also, ask yourself how lawful intelligence seemed to you before you started reading this blog, how lawful it seems to you now, then extrapolate outward from that.
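As a concrete miniature of what "looking for causal structure" means, here is a two-node Bayes net inverted with Bayes' rule; the network and its probabilities are invented purely for illustration:

```python
# Tiny two-node Bayes net: Rain -> WetGrass, with invented probabilities.
p_rain = 0.2
p_wet_given_rain = 0.9
p_wet_given_dry = 0.1

# Marginalize over the cause to get P(wet).
p_wet = p_wet_given_rain * p_rain + p_wet_given_dry * (1 - p_rain)

# Bayes' rule: infer the hidden cause from the observed effect.
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet
print(f"P(rain | wet grass) = {p_rain_given_wet:.2f}")  # ~0.69
```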

103 comments

Comments sorted by oldest first, as this post is from before comment nesting was available (around 2009-02-27).

comment by Robin_Hanson2 · 2008-12-11T18:55:10.000Z · LW(p) · GW(p)

I understand there are various levels on which one can express one's loves. One can love Suzy, or kind pretty funny women, or the woman selected by a panel of judges, or the one selected by a judging process designed by a certain AI strategy, etc. But even very meta loves are loves. You want an AI that loves the choices made by a certain meta process that considers the wants of many, and that may well be a superior love. But it is still a love, your love, and the love you want to give the AI. You might think the world should be grateful to be placed under the control of such a superior love, but many of them will not see it that way; they will see your attempt to create an AI to take over the world as an act of war against them.

comment by AGI_Researcher · 2008-12-11T18:57:48.000Z · LW(p) · GW(p)

"I am sure if I was running an FAI project that was excessively well funded, it would be worth buying EY to put in a glass case in the break room."

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2010-10-30T16:48:00.648Z · LW(p) · GW(p)

"IN CASE OF UNFRIENDLY AI, IT IS TOO LATE TO BREAK GLASS"

comment by Aron · 2008-12-11T18:58:40.000Z · LW(p) · GW(p)

And I believe that if two very smart people manage to agree on where to go for lunch, they have accomplished a lot for one day.

Replies from: VAuroch
comment by VAuroch · 2013-12-26T23:00:31.151Z · LW(p) · GW(p)

There is a pretty good method for this specific thing; where I saw it mentioned, it was called the Restaurant Veto Game. It goes like this: Take a group of people, and have any one of them suggest a lunch location. Anyone else may veto this, if they can propose a different lunch location, not yet mentioned. A location which goes unvetoed is a good-enough compromise, if the players are reasonably rational and understand the strategy of the game.
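A minimal sketch of that procedure, under the added assumption that a player only vetoes when they can name an unmentioned location they strictly prefer; the players and preference scores below are invented:

```python
# Illustrative sketch of the Restaurant Veto Game described above.  A player
# vetoes a suggestion only if they can name a not-yet-mentioned location they
# strictly prefer; the first suggestion nobody vetoes wins.
def restaurant_veto_game(players, first_suggestion):
    """players: list of dicts mapping location -> preference score (invented)."""
    mentioned = {first_suggestion}
    current = first_suggestion
    while True:
        for prefs in players:
            better = [loc for loc, score in prefs.items()
                      if score > prefs.get(current, 0) and loc not in mentioned]
            if better:
                current = max(better, key=prefs.get)  # veto and counter-propose
                mentioned.add(current)
                break
        else:
            return current  # no one vetoes: a good-enough compromise

players = [
    {"sushi": 3, "pizza": 2, "salad": 1},
    {"pizza": 3, "salad": 2, "sushi": 1},
]
print(restaurant_veto_game(players, "salad"))  # -> "pizza"
```

With these illustrative preferences, "salad" is vetoed in favor of "sushi", "sushi" is vetoed in favor of "pizza", and "pizza" goes unvetoed, so it is the compromise.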

comment by AGI_Researcher · 2008-12-11T19:06:20.000Z · LW(p) · GW(p)

"I am sure if I was running an FAI project that was excessively well funded, it would be worth buying EY to put in a glass case in the break room."

To clear up any confusion about the meaning of this statement, I do agree with pretty much everything here, and I do agree that FAI is critically important.

That doesn't change the fact that I think EY isn't being very useful ATM.

comment by Aaron5 · 2008-12-11T19:12:29.000Z · LW(p) · GW(p)

I'm just trying to get the problem you're presenting. Is it that in the event of a foom, a self-improving AI always presents a threat of having its values drift far enough away from humanity's that it will endanger the human race? And your goal is to create the set of values that allow for both self-improvement and friendliness? And to do this, you must not only create the AI architecture but influence the greater system of AI creation as well? I'm not involved in AI research in any capacity, I just want to see if I understand the fundamentals of what you're discussing.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-11T19:20:00.000Z · LW(p) · GW(p)

Robin, using the word "love" sounds to me distinctly like something intended to evoke object-level valuation. "Love" is an archetype of direct valuation, not an archetype of metaethics.

And I'm not so much of a mutant that, rather than liking cookies, I like everyone having their reflective equilibria implemented. Taking that step is the substance of my attempt to be fair. In the same way that someone voluntarily splitting up a pie into three shares, is not on the same moral level as someone who seizes the whole pie for themselves - even if, by volunteering to do the fair thing rather than some other thing, they have shown themselves to value fairness.

My take on this was given in The Bedrock of Fairness.

But you might as well say "George Washington gave in to his desire to be a tyrant; he was just a tyrant who wanted democracy." Or "Martin Luther King declared total war on the rest of the US, since what he wanted was a nonviolent resolution."

Similarly with "I choose not to control you" being a form of controlling.

comment by Vladimir_Golovin · 2008-12-11T19:27:20.000Z · LW(p) · GW(p)

AGI Researcher: "... I do agree that FAI is critically important." "... EY isn't being very useful ATM."

Isn't this a contradiction, given that EY is one of the few people who publicly promote the idea of unfriendly AIs being fatal?

comment by fnordfnordfnordbayesfnord · 2008-12-11T19:39:34.000Z · LW(p) · GW(p)

"Isn't this a contradiction, given that EY is one of the few people who publicly promote the idea of unfriendly AIs being fatal?"

That poster is taking bits of an IM conversation out of context and then paraphrasing them. Sadly any expectation of logical consistency has to be considered unwarranted optimism.

comment by AGI_Researcher · 2008-12-11T19:39:46.000Z · LW(p) · GW(p)

Isn't this a contradiction, given that EY is one of the few people who publicly promote the idea of unfriendly AIs being fatal?

All that stuff was the party line back in 2004.

There has been no /visible/ progress since then.

comment by Alexandros · 2008-12-11T19:40:10.000Z · LW(p) · GW(p)

Slightly off the main topic but nearer to Robin's response:

Eliezer, how do we know that human good-ness scales? How do we know that, even if correctly implemented, applying it to a near-infinitely capable entity won't yield something equally monstrous as a paperclipper? Perhaps our sense of good-ness is meaningful only at or near our current level of capability?

comment by TGGP4 · 2008-12-11T19:49:56.000Z · LW(p) · GW(p)

There is nothing oxymoronic about calling democracy "the tyranny of the majority". And George Washington himself was decisive in both the violent war of secession called a "revolution" that created a new Confederate government and the unlawful replacement of the Articles of Confederation with the Constitution, after which he personally crushed the Whiskey Rebellion of farmers resisting the national debt payments saddled upon them by this new government. Even MLK has been characterized as implicitly threatening more riots if his demands were not met (in that respect he followed Gandhi, who actually justified violence on the basis of nationalism though this is not as well remembered). Eliezer is mashing applause lights.

comment by Vladimir_Golovin · 2008-12-11T19:52:40.000Z · LW(p) · GW(p)

AGI Researcher: "There has been no /visible/ progress since [2004]."

What would you consider /visible/ progress? Running code?

Also, how about this: "Overcoming Bias presently gets over a quarter-million monthly pageviews"?

comment by Robin_Hanson2 · 2008-12-11T19:59:57.000Z · LW(p) · GW(p)

In a foom that took two years, if the AI was visible after one year, that might give the world a year to destroy it.

comment by Aron · 2008-12-11T20:29:39.000Z · LW(p) · GW(p)

"In a foom that took two years.."

The people of the future will be in a considerably better position than you to evaluate their immediate future. More importantly, they are in a position to modify their future based on that knowledge. This anticipatory reaction is what makes both of your opinions exceedingly tenuous. Everyone else who embarks on pinning down the future at least has the sense to sell books.

In the light of this, the goal should be to use each other's complementary talents to find the hardest rock-solid platform, not to sell the other a castle made of sand.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-11T20:41:59.000Z · LW(p) · GW(p)

Robin, we're still talking about a local foom. Keeping security for two years may be difficult but is hardly unheard-of.

Replies from: Perplexed
comment by Perplexed · 2010-09-09T01:36:14.926Z · LW(p) · GW(p)

And what do you do when an insider says, "If you don't change the CEV programming to include X, then I am going public!" How do you handle that? How many people is it that you expect to remain quiet for two years?

Replies from: ata, timtyler, wallowinmaya
comment by ata · 2010-09-09T01:57:48.263Z · LW(p) · GW(p)

I suppose the only people who will get to the point of being "insiders" will be a subset of the people who are trustworthy and sane and smart and non-evil enough not to try something like that.

Replies from: Perplexed
comment by Perplexed · 2010-09-09T02:30:49.415Z · LW(p) · GW(p)

Ah, so no insider has ever walked off in a huff? No insider has ever said he would refuse to participate further if something he felt strongly about wasn't done? Look at A3 here.

The SIAI must use some pretty remarkable personality tests to choose their personnel.

Replies from: ata
comment by ata · 2010-09-09T02:47:26.211Z · LW(p) · GW(p)

Are we using the same definition of "insider"? I was talking about people who are inside the FAI project and have privileged knowledge of its status and possibly access to the detailed theory and its source code, etc. I don't get the relevance of your links.

Replies from: Perplexed
comment by Perplexed · 2010-09-09T03:05:09.286Z · LW(p) · GW(p)

My links mentioned three persons. Obviously at this point, Robin and Roko are not going to become insiders in an FAI construction project. If you could assure me that the third linked person will not be an insider either, it would relieve a lot of my worries.

The relevance of my links was to point out that when intelligent people with strong opinions get involved together in important projects with the future of mankind at stake, keeping everyone happy and focused on the goal may be difficult. Especially since the goal has not yet been spelled out and no one seems to want to work on clarifying the goal since it is apparently so damned disruptive to even talk about it.

Documents dated 2004 and labeled "already obsolete when written" for God's sake!

comment by timtyler · 2010-09-09T08:24:15.331Z · LW(p) · GW(p)

For some reasonably-successful corporate secrecy, perhaps look to Apple. They use NDAs, need-to-know principles, and other techniques - and they are usually fairly successful at keeping their secrets. Some of the apparent leaks are probably PR exercises.

Or, show me Google's source code - or the source code of any reasonable-size hedge fund. Secrecy seems fairly manageable, in practice.

Replies from: Baughn
comment by Baughn · 2012-12-30T22:19:18.884Z · LW(p) · GW(p)

Google leaks like a sieve, actually, but that should be because of the sheer number of employees.

It's true that there have been no source-code leaks (to my knowledge), but that could just as likely be because of the immense expected consequences of getting caught at leaking any, and you would probably get caught.

Replies from: Decius
comment by Decius · 2012-12-30T23:24:16.853Z · LW(p) · GW(p)

I think that a programmer who cared enough about CEV to be a secret-keeper would also care enough about getting CEV right to kill in order to prevent it from being done wrong. The public need not be involved at all.

Replies from: Baughn
comment by Baughn · 2013-01-01T16:29:48.222Z · LW(p) · GW(p)

Agreed, in principle, but I'm not sure that such people would make very good teammates.

(Implying that AGI is more likely to be developed by people who don't care that much.)

Replies from: Decius
comment by Decius · 2013-01-01T21:03:15.982Z · LW(p) · GW(p)

Is a good teammate one who has the social skills to make everybody happy when they are doing something they don't want to, or someone who thinks that the team's task is so important that they will do anything to get it done?

Are major breakthroughs which require a lot of work more likely to be done by people who don't care, or by people that do?

Replies from: Baughn
comment by Baughn · 2013-01-04T12:16:26.539Z · LW(p) · GW(p)

That's not my point, which is simply this:

A good teammate is probably not one who's willing to kill you if you make the wrong move, and who -- being human -- may misinterpret your actions.

Replies from: Decius
comment by Decius · 2013-01-04T21:22:45.232Z · LW(p) · GW(p)

If there is no move you could make which would result in your teammate trying to kill you, then you have a different problem.

comment by David Althaus (wallowinmaya) · 2011-05-27T09:07:35.560Z · LW(p) · GW(p)

Do you really think the public would be interested in the opinions of a programmer who claims that some guys in a basement are building a superintelligent machine? Most would regard him as a crackpot, just like most people think that Eliezer and his ideas are crazy. Perhaps in 20-30 years this will change, and the problem of FAI will be recognized as tremendously important by political leaders and the general public, but I'm skeptical.

ETA: I meant, of course, that most people, if they knew Eliezer, would think he is crazy.

Replies from: wedrifid
comment by wedrifid · 2011-05-27T11:06:44.053Z · LW(p) · GW(p)

just like most people think that Eliezer and his ideas are crazy.

He hasn't reached that level yet. Most people just don't know or care wtf Eliezer is! ;)

comment by Tim_Tyler · 2008-12-11T20:44:59.000Z · LW(p) · GW(p)
An AI that reaches a certain point in its own development becomes able to improve itself. At this point, recursive cascades slam over many internal growth curves to near the limits of their current hardware, and the AI undergoes a vast increase in capability.

This seems like the first problem I detected. An intelligence being able to improve itself does not necessarily lead to a recursive cascade of self-improvement - since it may only be able to improve some parts of itself - and it's quite possible that after it has done those improvements, it can't do any more.

Say that machine intelligence learns how to optimise FOR loops, eliminating unnecessary conditions, etc. Presto, it can optimise its entire codebase - and thus improve itself. However, that doesn't lead to a self-improving recursive cascade - because it only improved itself in one way, and that was a rather limited way. Of course this kind of improvement has been going on for decades - via lint tools and automatic refactoring.
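For concreteness, the kind of mechanical improvement being described might look like the following sketch (the example function is invented): hoisting an invariant condition out of a loop, exactly the sort of rewrite existing lint and refactoring tools already apply.

```python
# Before: the loop re-checks a condition that cannot change inside the loop.
def total_price_before(prices, is_member):
    total = 0.0
    for p in prices:
        if is_member:        # invariant condition, re-evaluated every iteration
            total += p * 0.9
        else:
            total += p
    return total

# After: the invariant condition is hoisted out of the loop, the sort of
# rewrite an automated refactoring or lint-style tool can apply mechanically.
def total_price_after(prices, is_member):
    factor = 0.9 if is_member else 1.0
    return sum(p * factor for p in prices)
```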

As machines get smarter, they will gradually become able to improve more and more of themselves. Yes, eventually machines will be able to cut humans out of the loop - but before that there will have been much automated improvement of machines by machines - and after that there may still be human code reviews.

This is not the first time I have made this point here. It does not seem especially hard to understand to me - but yet the conversation sails gaily onwards, with no coherent criticism, and no sign of people updating their views: it feels like talking to a wall.

Replies from: Houshalter
comment by Houshalter · 2013-09-30T04:23:49.225Z · LW(p) · GW(p)

In order to learn how to optimize FOR loops it would have to be pretty intelligent and have general learning ability. So it wouldn't just stop after learning that, it would go on to learn more things at increased speed. Learning the first optimization would let it learn more optimizations even faster than it otherwise would have. The second optimization it makes helps it learn the third even faster and so on.

It's not clear to me how fast this process would be. Just because it learns the next optimization even faster than it otherwise would have doesn't mean it wouldn't have taken a long time to begin with. It could take years for it to improve to super-human abilities, or it could take days. It depends on stuff like how long it takes the average optimization it learns to pay back the time it took to research it. As well as the distribution of optimizations; maybe after learning the first few they get progressively more difficult to discover and give less and less value in return.

It seems to my intuition that this process would be very fast and get very far before hitting limits, though I can't prove that. But I would point to other exponential processes to compare it to like compound interest.
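A toy model of that intuition, with both parameters invented for illustration: each discovered optimization multiplies research speed by a constant factor, while each successive optimization costs more baseline effort to find. Whether the discovery times converge or keep shrinking depends entirely on the ratio of those two constants, which is exactly the uncertainty described above.

```python
# Toy model: each discovered optimization multiplies research speed by a fixed
# factor, while each successive optimization costs more baseline effort.
speedup_per_discovery = 1.5   # invented parameter
difficulty_growth = 1.3       # invented parameter

speed, t, base_effort = 1.0, 0.0, 1.0
for i in range(1, 11):
    t += base_effort / speed        # time to find the i-th optimization
    speed *= speedup_per_discovery  # it pays off by accelerating further research
    base_effort *= difficulty_growth
    print(f"optimization {i:2d} found at t = {t:6.2f}, research speed = {speed:7.2f}")
```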

comment by Jef_Allbright · 2008-12-11T20:46:40.000Z · LW(p) · GW(p)

Ironic, such passion directed toward bringing about a desirable singularity, rooted in an impenetrable singularity of faith in X. X yet to be defined, but believed to be [meaningful|definable|implementable] independent of future context.

It would be nice to see an essay attempting to explain an information or systems-theoretic basis supporting such an apparent contradiction (definition independent of context.)

Or, if the one is arguing for a (meta)invariant under a stable future context, an essay on the extended implications of such stability, if the one would attempt to make sense of "stability, extended."

Or, a further essay on the wisdom of ishoukenmei, distinguishing between the standard meaning of giving one's all within a given context, and your adopted meaning of giving one's all within an unknowable context.

Eliezer, I recall that as a child you used to play with infinities. You know better now.

comment by JamesAndrix · 2008-12-11T20:48:02.000Z · LW(p) · GW(p)

In a foom that took two years, if the AI was visible after one year, that might give the world a year to destroy it.

But it's clearly the best search engine available. And here I am making an argument for peace via economics!

If it's doing anything visible, it's probably doing something at least some people want.

comment by bambi · 2008-12-11T20:49:03.000Z · LW(p) · GW(p)

Regarding the 2004 comment, AGI Researcher probably was referring to the Coherent Extrapolated Volition document which was marked by Eliezer as slightly obsolete in 2004, and not a word since about any progress in the theory of Friendliness.

Robin, if you grant that a "hard takeoff" is possible, that leads to the conclusion that it will eventually be likely (humans being curious and inventive creatures). This AI would "rule the world" in the sense of having the power to do what it wants. Now, suppose you get to pick what it wants (and program that in). What would you pick? I can see arguing with the feasibility of hard takeoff (I don't buy it myself), but if you accept that step, Eliezer's intentions seem correct.

comment by bambi · 2008-12-11T20:57:10.000Z · LW(p) · GW(p)

Oh, and Friendliness theory (to the extent it can be separated from specific AI architecture details) is like the doomsday device in Dr. Strangelove: it doesn't do any good if you keep it secret! [in this case, unless Eliezer is supremely confident of programming AI himself first]

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-11T21:06:50.000Z · LW(p) · GW(p)

@Tim Re: FOR loops - I made that exact point explicitly when introducing the concept of "recursion" via talking about self-optimizing compilers.

Talk about no progress in the conversation. I begin to think that this whole theory is simply too large to be communicated to casual students. Tim probably read my analysis using the self-optimizing compiler as an example, then forgot that I had analyzed it and thought that he was inventing a crushing objection on his own. This pattern would explain a lot of Phil Goetz too.

comment by luzr · 2008-12-11T21:10:41.000Z · LW(p) · GW(p)

"FOOM that takes two years"

In addition to the comments by Robin and Aron, I would also point out the possibility that the longer the FOOM takes, the larger the chance it is not local, regardless of security - somewhere else, there might be another FOOMing AI.

Now, as I understand it, some consider this situation even more dangerous, but it might also create a "take over" defence.

Another comment on the FOOM scenario, and this is a sort of addition to Tim's post:

"As machines get smarter, they will gradually become able to improve more and more of themselves. Yes, eventually machines will be able to cut humans out of the loop - but before that there will have been much automated improvement of machines by machines - and after that there may still be human code reviews."

Eliezer seems to spend a lot of time explaining what happens when "k > 1" - when AI intelligence surpasses human intelligence and starts self-improving. But I suspect that the phase 0.3 < k < 1 might be pretty long, maybe decades.

Also, by the time of FOOM, we should be able to use vast numbers of fast 'subcritical' AIs (+ weak AIs) as guardians of the process. In fact, by that time, k < 1 AIs might play a pretty important role in the world economy and security, and it does not take too much pattern-recognition power to keep things at bay. (Well, in fact, I believe Eliezer proposes something similar in his thesis, except for the locality issue.)

comment by luzr · 2008-12-11T21:26:56.000Z · LW(p) · GW(p)

Eliezer:

"Tim probably read my analysis using the self-optimizing compiler as an example, then forgot that I had analyzed it and thought that he was inventing a crushing objection on his own."

Why do you think it is a crushing objection? I believe Tim just repeats his favorite theme (which, in fact, I tend to agree with) where machine-augmented humans build better machines. If you can use automated refactoring to improve the way the compiler works (and today, you often can), that is in fact a pretty cool augmentation of human capabilities. It is recursive FOOM. The only difference between your vision and his is that as long as k < 1 (and perhaps for some time after that point), humans are important FOOM agents. Also, humans are getting much more capable in the process. For example, a machine-augmented human (think weak AI + direct neural interface and all the cyborging bells and whistles + mind drugs) might be quite likely to follow the FOOM.

comment by Jason_Joachim · 2008-12-11T21:33:35.000Z · LW(p) · GW(p)

Robin says "You might think the world should be grateful to be placed under the control of such a superior love, but many of them will not see it that way; they will see your attempt to create an AI to take over the world as an act of war against them."

Robin, do you see that CEV was created (AFAICT) to address that very possibility? That too many, feeling this too strongly, means the AI self-destructs or somesuch.

I like that someone challenged you to create your own unoffensive FAI/CEV; I hope you'll respond to that. Perhaps you believe that there simply isn't any possible fully global wish, however subtle or benign, that wouldn't also be tantamount to a declaration of war...?

comment by Tim_Tyler · 2008-12-11T21:45:29.000Z · LW(p) · GW(p)
Tim probably read my analysis using the self-optimizing compiler as an example, then forgot that I had analyzed it and thought that he was inventing a crushing objection on his own.

It does not seem very likely that I am copying you - when my essay on this subject dates from February 3rd, while yours apparently dates from November 25th.

So what exactly is the counter-argument you were attempting to make?

That self-optimising compilers lack "insight" - and "insight" is some kind of boolean substance that you either have or you lack?

In my view, machines gradually accumulate understanding of themselves - and how to modify themselves. There is a long history of automated refactoring - which seems to me to clearly demonstrate that "insight" within machines into how to modify computer code comes in a vast number of little pieces, which are gradually being assembled over the decades into ever more impressive refactoring tools. I have worked on refactoring tools myself - and I see no hint of sudden gains in capability in this area - rather progress is made in thousands, or even millions of tiny steps.

Replies from: Kenny
comment by Kenny · 2013-07-24T21:22:14.644Z · LW(p) · GW(p)

But the machines themselves are not writing the code for any of these millions of tiny steps. If they were, and if they were able to do so faster than humans, their self-improvement would be different than what you're describing.

comment by Carl_Shulman · 2008-12-11T21:51:29.000Z · LW(p) · GW(p)

" I can see arguing with the feasibility of hard takeoff (I don't buy it myself), but if you accept that step, Eliezer's intentions seem correct."

Bambi,

Robin has already said just that. I think Eliezer is right that this is a large discussion, and when many of the commenters haven't carefully followed it, comments bringing up points that have already been explicitly addressed will take up a larger and larger share of the comment pool.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-11T22:07:12.000Z · LW(p) · GW(p)

Tim, your page doesn't say anything about FOR loops or self-optimizing compilers not being able to go a second round, which is the part you got from me and then thought you had invented.

comment by Roko · 2008-12-11T22:07:59.000Z · LW(p) · GW(p)

"comments bringing up points that have already been explicitly addressed will take up a larger and larger share of the comment pool."

how about using something like debatepedia?

http://wiki.idebate.org/

comment by Will_Pearson · 2008-12-11T22:08:13.000Z · LW(p) · GW(p)

There are some types of knowledge that seem hard to come by (especially for singletons). The type of knowledge is knowing what destroys you. As all knowledge is just an imperfect map, there are some things a priori that you need to know to avoid. The archetypal example is the in-built fear of snakes in humans/primates. If we hadn't had this while it was important, we would have experimented with snakes the same way we experiment with stones/twigs etc., and generally gotten ourselves killed. In a social system you can see what destroys other things like you, but the knowledge of what can kill you is still hard won.

If you don't have this type of knowledge you may step into an unsafe region, and it doesn't matter how much processing power or how much you correctly use your previous data. Examples that might threaten singletons:

1) Physics experiments: the model says you should be okay, but you don't trust your model under these circumstances, which is the reason to do the experiment.
2) Self-change: your model says that the change will be better, but the model is wrong. It disables the system to a state it can't recover from, i.e. not an obvious error but something that renders it ineffectual.
3) Physical self-change: large-scale unexpected effects from feedback loops at different levels of analysis, e.g. things like the swinging/vibrating bridge problem, but deadly.

comment by Aron · 2008-12-11T22:09:32.000Z · LW(p) · GW(p)

It is true that the topic is too large for casual followers (such as myself). So rather than aiming at refining any of the points personally, I wonder in what ways Robin has convinced Eli, and vice-versa. Because certainly, if this were a productive debate, they would be able to describe how they are coming to consensus. And from my perspective there are distinct signals that the anticipation of a successful debate declines as posts become acknowledged for their quality as satire.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-11T22:11:40.000Z · LW(p) · GW(p)

Will, your example, good or bad, is universal over singletons, nonsingletons, any way of doing things anywhere.

comment by Venu · 2008-12-11T22:24:03.000Z · LW(p) · GW(p)

The default case of FOOM is an unFriendly AI

Before this, we also have: "The default case of an AI is to not FOOM at all, even if it's self-modifying (like a self-optimizing compiler)." Why not anti-predict that no AIs will FOOM at all?

This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever).

Given the tiny minority of AIs that will FOOM at all, what is the probability that an AI which has been designed for a purpose other than FOOMing, will instead FOOM?

comment by Tim_Tyler · 2008-12-11T22:36:46.000Z · LW(p) · GW(p)

Huh? I never mentioned self-optimizing compilers, and you never mentioned FOR loops.

I usually view this particular issue in terms of refactoring - not compilation - since refactoring is more obviously a continuous iterative process operating on an evolving codebase: whereas you can't compile a compiled version of a program very many times.

Anyway, this just seems like an evasion of the point - and a digression into trivia.

If you have any kind of case to make that machines will suddenly develop the ability to reprogram and improve themselves all-at-once - with the histories of compilation, refactoring, code wizards and specification languages representing an irrelevant side issue - I'm sure I'm not the only one who would be interested to hear about it.

comment by luzr · 2008-12-11T22:39:34.000Z · LW(p) · GW(p)

Eliezer:

"Tim, your page doesn't say anything about FOR loops or self-optimizing compilers not being able to go a second round, which is the part you got from me and then thought you had invented."

Well, it certainly does:

"Today, machines already do a lot of programming. They perform refactoring tasks which would once have been delegated to junior programmers. They compile high-level languages into machine code, and generate programs from task specifications. They also also automatically detect programming errors, and automatically test existing programs."

I guess your claim is only a misunderstanding caused by not understanding CS terminology.

Finding a new way to optimize loops is an application of automated refactoring, automated testing, and benchmarking.

comment by Will_Pearson · 2008-12-11T22:42:22.000Z · LW(p) · GW(p)
Will, your example, good or bad, is universal over singletons, nonsingletons, any way of doing things anywhere.

My point was not that non-singletons can see it coming. But if one non-singleton tries self-modification in a certain way and it doesn't work out, then other non-singletons can learn from the mistake (or, in the worst case, the evolutionary one: the descendants of people curious in a certain way would be outcompeted by those that instinctively didn't try the dangerous activity). Less so with the physics experiments, depending on the dispersal of non-singletons and the range of the physical destruction.

comment by bambi · 2008-12-11T22:45:41.000Z · LW(p) · GW(p)

Carl, Robin's response to this post was a critical comment about the proposed content of Eliezer's AI's motivational system. I assumed he had a reason for making the comment, my bad.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-11T22:46:12.000Z · LW(p) · GW(p)
Venu: Given the tiny minority of AIs that will FOOM at all, what is the probability that an AI which has been designed for a purpose other than FOOMing, will instead FOOM?

It seems to me like a pretty small probability that an AI not designed to self-improve will be the first AI that goes FOOM, when there are already many parties known to me who would like to deliberately cause such an event.

Why not anti-predict that no AIs will FOOM at all?

A reasonable question from the standpoint of antiprediction; here you would have to refer back to the articles on cascades, recursion, the article on hard takeoff, etcetera.

Re Tim's "suddenly develop the ability to reprogram and improve themselves all-at-once" - the issue is whether something happens efficiently enough to be local or fast enough to accumulate advantage between the leading Friendly AI and the leading unFriendly AI, not whether things can happen with zero resource or instantaneously. But the former position seems to be routinely distorted into the straw latter.

Replies from: adamisom
comment by adamisom · 2012-12-30T19:27:22.187Z · LW(p) · GW(p)

It seems to me like a pretty small probability that an AI not designed to self-improve will be the first AI that goes FOOM, when there are already many parties known to me who would like to deliberately cause such an event.

I know this is four years old, but this seems like a damn good time to "shut up and multiply" (thanks for that thoughtmeme by the way).

comment by Tim_Tyler · 2008-12-11T22:49:27.000Z · LW(p) · GW(p)
For example, machine augmented human (think weak AI + direct neural interface and all that cyborging whistles + mind drugs) might be quite likely to follow the FOOM

It seems unlikely to me. For one thing, see my Against Cyborgs video/essay. For another, see my Intelligence Augmentation video/essay. The moral of the latter one in this context is that Intelligence Augmentation is probably best thought of as machine intelligence's close cousin and conspirator - not really some kind of alternative, something that will happen later on, or a means to keep humans involved somehow.

comment by luzr · 2008-12-11T22:50:15.000Z · LW(p) · GW(p)

Eliezer:

"Will, your example, good or bad, is universal over singletons, nonsingletons, any way of doing things anywhere."

I guess there is a significant difference - for a singleton, each mistake can be fatal (and not only for it).

I believe that this is the real part I dislike about the idea, apart from the part where a singleton either cannot evolve or cannot stay a singleton (because of the speed-of-light vs. locality issue).

comment by luzr · 2008-12-11T23:07:06.000Z · LW(p) · GW(p)

Tim:

Well, as an off-topic recourse, I see only some engineering problems cited in your "Against Cyborgs" essay as a counterargument. Anyway, let me say that in my book:

"miniaturizing and refining cell phones, video displays, and other devices that feed our senses. A global-positioning-system brain implant to guide you to your destination would seem seductive only if you could not buy a miniature ear speaker to whisper you directions. Not only could you stow away this and other such gear when you wanted a break, you could upgrade without brain surgery."

is pretty much the equivalent of what I had in mind with cyborging. Brain surgery is not the point. I guess it is even today pretty obvious that to read thoughts, you will not need any surgery at all. And if information is fed back into my glasses, that is OK with me.

Still, the ability to just "think" the code (yep, I am a programmer), then see the whole procedure displayed before my eyes already refactored and tested (via weak AI augmentation), sounds like a nice productivity booster. In fact, I believe that if thinking code is easy, one could, with the help of some nice programming language, learn to use coding to solve many more problems in normal life situations, gradually building a personal library of routines..... :)

comment by Tim_Tyler · 2008-12-11T23:13:04.000Z · LW(p) · GW(p)
the issue is whether something happens efficiently enough to be local or fast enough to accumulate advantage between the leading Friendly AI and the leading unFriendly AI

Uh, that's a totally different issue from the one I was discussing.

To recap: I was pointing out that machines have been writing code and improving themselves for decades - that refactoring and lint-like programs applying their own improvements to their own codebases has a long history in the community - dating back to the early days of Smalltalk. That progress in computer ability at self-improvement (via modification of your own codebase) is, in point of fact, a long, slow and gradual process that has been going on for decades so far - and thus is not really well conceived of as being something that will happen suddenly in the future - when computers attain "insight".

Also, I notice that you have "quietly" edited the original post - in an attempt to eliminate the very point I was originally criticising. This rather makes it look as though I was misquoting you. Then you accuse me of attacking a straw man - after this clumsy attempt to conceal the original evidence. Oh well, at least you are correcting your own mistakes when they are pointed out to you - it seems like a kind of progress to me.

comment by Phil_Goetz6 · 2008-12-11T23:42:16.000Z · LW(p) · GW(p)

Eliezer: "and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever)."

Why do the values freeze? Because there is no more competition? And if that's the problem, why not try to plan a transition from pre-AI to an ecology of competing AIs that will not converge to a singleton? Or spell out the problem clearly enough that we can figure whether one can achieve a singleton that doesn't have that property?

(Not that Eliezer hasn't heard me say this before. I made a bit of a speech about AI ecology at the end of the first AGI conference a few years ago.)

Robin: "In a foom that took two years, if the AI was visible after one year, that might give the world a year to destroy it."

Yes. The timespan of the foom is important largely because it changes what the AI is likely to do, because it changes the level of danger that the AI is in and the urgency of its actions.

Eliezer: "When I try myself to visualize what a beneficial superintelligence ought to do, it consists of setting up a world that works by better rules, and then fading into the background."

There are many sociological parallels between Eliezer's "movement", and early 20th-century communism.

Eliezer: "I truly do not understand how anyone can pay any attention to anything I have said on this subject, and come away with the impression that I think programmers are supposed to directly impress their non-meta personal philosophies onto a Friendly AI."

I wonder if you're thinking that I meant that. You can see that I didn't in my first comment on Visions of Heritage. But I do think you're going one level too few meta. And I think that CEV would make it very hard to escape the non-meta philosophies of the programmers. It would be worse at escaping them than the current, natural system of cultural evolution is.

Numerous people have responded to some of my posts by saying that CEV doesn't restrict the development of values (or equivalently, that CEV doesn't make AIs less free). Obviously it does. That's the point of CEV. If you're not trying to restrict how values develop, you might as well go home and watch TV and let the future spin out of control. One question is where "extrapolation" fits on a scale between "value stasis" and "what a free wild-type AI would think of on its own." Is it "meta-level value stasis"?

I think that evolution and competition have been pretty good at causing value development. (That's me going one more level meta.) Having competition between different subpopulations with different values is a key part of this. Taking that away would be disastrous.

Not to mention the fact that value systems are local optima. If you're doing search, it might make sense to average together some current good solutions and test the results out, in competition with the original solutions. It is definitely a bad idea to average together your current good solutions and replace them with the average.

comment by Phil_Goetz6 · 2008-12-11T23:48:09.000Z · LW(p) · GW(p)

Eliezer: "Tim probably read my analysis using the self-optimizing compiler as an example, then forgot that I had analyzed it and thought that he was inventing a crushing objection on his own. This pattern would explain a lot of Phil Goetz too."

No; the dynamic you're thinking of is that I raise objections to things that you have already analyzed, because I think your analysis was unconvincing. Eg., the recent Attila the Hun / Al Qaeda example. The fact that you have written about something doesn't mean you've dealt with it satisfactorily.

comment by Vladimir_Nesov · 2008-12-12T00:19:45.000Z · LW(p) · GW(p)

Phil, in suggesting to replace an unFriendly AI that converges on a bad utility by a collection of AIs that never converge, you are effectively trying to improve the situation by injecting randomness in the system.

Your perception of lawful extrapolation of values as "stasis" seems to stem from intuitions about free will. If you look at the worldline as a 4D crystal, everything is set in stone, according to laws of physics. The future is determined by the content of the world, in particular by actors embedded in it. If you allow AI to fiddle with the development of humanity, you can view it as a change in underlying laws of physics in which humanity is embedded, not as a change on the level you'd recognize as interference in your decision-making. If it must, this change can drive the events in ways so locally insignificant you'd need to be a superintelligence yourself to tell them from chance, but it could act as a special "luck" that in the end results in the best possible outcome given the allowed level of interference.

comment by JulianMorrison · 2008-12-12T00:23:11.000Z · LW(p) · GW(p)

A two year FOOM doesn't have to be obvious for one year or even half a year. If the growth rate is up-curving, it's going to spend most of its ascent looking a bit ELIZA, and then it's briefly a cute news-darling C3PO, and then it goes all ghost-in-the-shell - game over. Even if there is a window of revealed vulnerability, will you without hindsight recognize it? Can you gather the force and political will in time? How would you block the inevitable morally outraged (or furtively amoral) attempts to rebuild?

Bruce Willis is not the answer.

comment by Grant · 2008-12-12T00:37:41.000Z · LW(p) · GW(p)

The problems that I see with friendly AGI are:

1) It's not well understood outside of AI researchers, so the scientists who create it will build what they think is the most friendly AI possible. I understand what Eliezer is saying about not using his personal values, so instead he uses his personal interpretation of something else. Eliezer says that making a world which works by "better rules" then fading away would not be a "god to rule us all", but who's decided on those rules (or the processes by which the AI decides on those rules)? Ultimately it's the coders who design the thing. It's a very small group of people with specialized knowledge changing the fate of the entire human race.

2) Do we have any reason to believe that a single foom will drastically increase an AI's intelligence, as opposed to making it just a bit smarter? Typically, recursive self-improvement does make significant headway, until the marginal return on investment in more improvement is eclipsed by other (generally newer) projects.

3) If an AGI could become so powerful as to rule the world in a short time span, any group which disagrees with how an AGI project is going will try to create their own before the first one is finished. This is a prisoner's dilemma arms-race scenario. Considerations about its future friendliness could be put on hold in order to get it out "before those damn commies do".

4) In order to create an AGI before the opposition, vast resources would be required. The process would almost certainly be undertaken by governments. I'm imagining the cast of characters from Dr. Strangelove sitting in the War Room and telling the programmers and scientist how to design their AI.

In short, I think the biggest hurdles are political, and so I'm not very optimistic they'll be solved. Trying to create a friendly AI in response to someone else creating a perceived unfriendly AI is a rational thing to do, but starting the first friendly AI project may not be rational.

I don't see what's so bad about a race of machines wiping us out though; we're all going to die and be replaced by our children in one way or another anyways.

comment by Phil_Goetz6 · 2008-12-12T01:25:40.000Z · LW(p) · GW(p)

It would have been better of me to reference Eliezer's Al Qaeda argument, and explain why I find it unconvincing.

Vladimir:

Phil, in suggesting to replace an unFriendly AI that converges on a bad utility by a collection of AIs that never converge, you are effectively trying to improve the situation by injecting randomness in the system.

You believe evolution works, right?

You can replace randomness only once you understand the search space. Eliezer wants to replace the evolution of values, without understanding what it is that that evolution is optimizing. He wants to replace evolution that works, with a theory that has so many weak links in its long chain of logic that there is very little chance it will do what he wants it to, even supposing that what he wants it to do is the right thing to do.

Vladimir:

Your perception of lawful extrapolation of values as "stasis" seems to stem from intuitions about free will.

That's a funny thing to say in response to what I said, including: 'One question is where "extrapolation" fits on a scale between "value stasis" and "what a free wild-type AI would think of on its own."' It's not that I think "extrapolation" is supposed to be stasis; I think it may be incoherent to talk about an "extrapolation" that is less free than "wild-type AI", and yet doesn't keep values out of some really good areas in value-space. Any way you look at it, it's primates telling superintelligences what's good.

As I just said, clearly "extrapolation" is meant to impose restrictions on the development of values. Otherwise it would be pointless.

Vladimir:

it could act as a special "luck" that in the end results in the best possible outcome given the allowed level of interference.

Please remember that I am not assuming that FAI-CEV is an oracle that magically works perfectly to produce the best possible outcome. Yes, an AI could subtly change things so that we're not aware that it is RESTRICTING how our values develop. That doesn't make it good for the rest of all time to be controlled by the utility functions of primates (even at a meta level).

Here's a question whose answer could diminish my worries: Can CEV lead to the decision to abandon CEV? If smarter-than-humans "would decide" (modulo the gigantic assumption CEV makes that it makes sense to talk about what "smarter than humans would decide", as if greater intelligence made agreement more rather than less likely - and, no, they will not be perfect Bayesians) that CEV is wrong, does that mean an AI guided by CEV would then stop following CEV?

If this is so, isn't it almost probability 1 that CEV will be abandoned at some point?

comment by billswift · 2008-12-12T01:26:40.000Z · LW(p) · GW(p)

Eliezer, maybe you should be writing fiction. You say you want to inspire the next generation of researchers, and you're spending a lot of time writing these essays and correcting misconceptions of people who never read or didn't understand earlier essays (fiction could tie the different parts of your argument together better than this essay style). Why not try coming up with several possible scenarios along with your thinking embedded in them? It may be worth remembering that far more of the engineers working on Apollo spoke of being inspired by Robert Heinlein than by Goddard and von Braun and the rocket pioneers.

comment by Carl_Shulman · 2008-12-12T01:40:04.000Z · LW(p) · GW(p)

"If this is so, isn't it almost probability 1 that CEV will be abandoned at some point?"

Phil, if a CEV makes choices for reasons, why would you expect it to have a significant chance of reversing that decision without any new evidence or reasons, and for this chance to be independent across periods? I can be free to cut off my hand with an axe, even if the chance that I'll do it is very low, since I have reasons not to.

comment by Vladimir_Nesov · 2008-12-12T02:40:56.000Z · LW(p) · GW(p)

Phil, I don't see the point in criticizing a flawed implementation of CEV. If we don't know how to implement it properly, if we don't understand how it's supposed to work in much more technical detail than the CEV proposal includes, it shouldn't be implemented at all, no more than a garden-variety unFriendly AI. If you can point out a genuine flaw in a specific scenario of FAI's operation, right implementation of CEV shouldn't lead to that. To answer your question, yes, CEV could decide to disappear completely, construct an unintelligent artifact, or produce an AI with some strange utility. It makes a single decision, an attempt to deliver humane values through the threshold of inability to self-reflect, and what comes of it is anyone's guess.

comment by Nick Hay (nickjhay) · 2008-12-12T02:57:00.000Z · LW(p) · GW(p)

Phil: Yes. CEV completely replaces and overwrites itself, by design. Before this point it does not interact with the external world to change it in a significant sense (it cannot avoid all change; e.g. its computer will add tiny vibrations to the Earth, as all computers do). It executes for a while then overwrites itself with a computer program (skipping every intermediate step here). By default, and if anything goes wrong, this program is "shutdown silently, wiping the AI system clean."

(When I say "CEV" I really mean a FAI which satisfies the spirit behind the extremely partial specification given in the CEV document. The CEV document says essentially nothing of how to implement this specification.)

comment by Fenty · 2008-12-12T04:07:00.000Z · LW(p) · GW(p)

I like the argument that true AGI should take massive resources to make, and people with massive resources are often unfriendly, even if they don't know it.

The desired case of FOOM is a Friendly AI, built using deep insight, so that the AI never makes any changes to itself that potentially change its internal values; all such changes are guaranteed using strong techniques that allow for a billion sequential self-modifications without losing the guarantee. The guarantee is written over the AI's internal search criterion for actions, rather than external consequences.

This is blather. A self-modifying machine that fooms yet has limitations on how it can modify itself? A superintelligent machine that can't get around human-made restraints?

You can't predict the future, except you can predict it won't happen the way you predict it will.

comment by CarlShulman · 2008-12-12T04:24:00.000Z · LW(p) · GW(p)

Fenty,

I give you Nick Bostrom:

"If a superintelligence starts out with a friendly top goal, however, then it can be relied on to stay friendly, or at least not to deliberately rid itself of its friendliness. This point is elementary. A “friend” who seeks to transform himself into somebody who wants to hurt you, is not your friend."

comment by Grant · 2008-12-12T05:04:00.000Z · LW(p) · GW(p)

Fenty, I didn't mean to suggest that people with massive resources are more unfriendly than others, but rather that people with power have little reason to respect those without power. Humans have a poor track record with coercive paternalism regardless of stated motives (I believe both Bryan and Eliezer have posted about that quite a bit in the past). I just don't think the people with the capabilities to get the first AGI online would possess the impeccable level of friendliness needed, or anywhere near it.

If Eliezer is right about the potential of AGI, then building the first one for the good of humanity might be irrational because it might spark an AI-arms-race (which would almost certainly lower the quality of friendliness of the AIs).

comment by Vladimir_Golovin · 2008-12-12T07:38:00.000Z · LW(p) · GW(p)

Eliezer, how about turning the original post into a survey? It's already structured, so all that you (or someone with an hour of free time) have to do is:

1) Find a decent survey-creating site.
2) Enter all paragraphs of the original post (except maybe #9) as questions.
3) Allow the results to be viewed publicly, without any registration.

The answer to each question would be a list of radio-buttons like this:
( ) Strongly agree
(·) Agree
( ) Don't know
( ) Disagree
( ) Strongly disagree

Does anybody know a survey site that allows all of the above?

comment by Wei_Dai2 · 2008-12-12T07:47:00.000Z · LW(p) · GW(p)

Isn't CEV just a form of Artificial Mysterious Intelligence? Eliezer's conversation with the anonymous AIfolk seems to make perfect sense if we search and replace "neural network" with "CEV" and "intelligence" with "moral growth/value change".

How can the same person who objected to "Well, intelligence is much too difficult for us to understand, so we need to find some way to build AI without understanding how it works." by saying "Look, even if you could do that, you wouldn't be able to predict any kind of positive outcome from it. For all you knew, the AI would go out and slaughter orphans." be asking us to place our trust in the mysterious moral growth of nonsentient but purportedly human-like simulations?

Replies from: Luke_A_Somers
comment by Luke_A_Somers · 2012-12-30T16:28:39.417Z · LW(p) · GW(p)

The difference is that an entity would be going out and understanding moral value change. The same cannot be said of neural networks and intelligence itself.

comment by Tim_Tyler · 2008-12-12T08:25:00.000Z · LW(p) · GW(p)

"If a superintelligence starts out with a friendly top goal, however, then it can be relied on to stay friendly, or at least not to deliberately rid itself of its friendliness. This point is elementary. A “friend” who seeks to transform himself into somebody who wants to hurt you, is not your friend."

Well, that depends on the wirehead problem - and it is certainly not elementary. The problem lies with the whole idea that there can be such a thing as a "friendly top goal" in the first place.

The idea that a fully self-aware powerful agent that has access to its own internals can be made to intrinsically have environment-related goals - or any other kind of external referents - is a challenging and difficult one - and success at doing this has yet to be convincingly demonstrated. It is possible - if you "wall off" bits of the superintelligence - but then you have the problem of the superintelligence finding ways around the walls.

comment by Cameron_Taylor · 2008-12-12T10:59:00.000Z · LW(p) · GW(p)

Thanks, seeing the claims all there together is useful.

The technical assumptions and reasoning all seem intuitive (given the last couple of years of background given here). The meta-ethic FAI singleton seems like the least evil goal I can imagine, given the circumstances.

A superintelligent FAI, with the reliably stable values that you mention, sounds like an impossible goal to achieve. Personally, I assign a significant probability to your failure, either by being too slow to prevent cataclysmic alternatives or by making a fatal mistake. Nevertheless, your effort is heroic. It is fortunate that many things seem impossible right up until the time someone does them.

comment by Mitchell_Porter · 2008-12-12T12:14:00.000Z · LW(p) · GW(p)

I don't understand the skepticism (expressed in some comments) about the possibility of a superintelligence with a stable top goal. Consider that classic computational architecture, the expected-utility maximizer. Such an entity can be divided into a part which evaluates possible world-states for their utility (their "desirability"), according to some exact formula or criterion, and into a part which tries to solve the problem of maximizing utility by acting on the world. For the goal to change, one of two things has to happen: either the utility function - the goal-encoding formula - is changed, or the interpretation of that formula - its mapping onto world-states - is changed. And it doesn't require that much intelligence to see that either of these changes will be bad, from the perspective of the current utility function as currently interpreted. Therefore, preventing such changes is an elementary subgoal, almost as elementary as physical self-preservation.
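
To make the structure of that argument concrete, here is a minimal toy sketch (purely illustrative: the paperclip utility function, the crude world model, and the two candidate actions are my own assumptions, not anyone's proposed design) of an expected-utility maximizer evaluating a self-modification with its current utility function:

```python
# Toy expected-utility maximizer (illustrative only): the agent evaluates every
# candidate action -- including "rewrite my own utility function" -- using its
# *current* utility function, so goal-changing actions tend to score poorly.

def utility(world_state):
    """The goal-encoding formula: here, simply 'more paperclips is better'."""
    return world_state["paperclips"]

def predict_outcome(world_state, action):
    """A crude world model: what the world looks like after taking `action`."""
    new_state = dict(world_state)
    if action == "make_paperclips":
        new_state["paperclips"] += 10
    elif action == "rewrite_goal_to_staples":
        # After this change the agent would stop producing paperclips,
        # so the predicted outcome contains no additional paperclips.
        pass
    return new_state

def choose_action(world_state, actions):
    # All candidates are judged by the CURRENT utility function, as currently
    # interpreted -- there is no external rule forbidding self-modification.
    return max(actions, key=lambda a: utility(predict_outcome(world_state, a)))

state = {"paperclips": 0}
print(choose_action(state, ["make_paperclips", "rewrite_goal_to_staples"]))
# -> make_paperclips: changing the goal is dispreferred under the current goal.
```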

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-12T13:43:00.000Z · LW(p) · GW(p)

Nick Hay, a CEV might also need to gather more information, nondestructively and unobtrusively. So even before the first overwrite you need a fair amount of FAI content just so it knows what's valuable and shouldn't be changed in the process of looking at it, though before the first overwrite you can afford to be conservative about how little you do. But English sentences don't work, so "look but don't touch" is not a trivial criterion (it contains magical categories).

Wei Dai, I thought that I'd already put in some substantial work in cashing out the concept of "moral growth" in terms of a human library of non-introspectively-accessible circuits that respond to new factual beliefs and to new-found arguments from a space of arguments that move us, as well as changes in our own makeup thus decided. In other words, I thought I'd tried to cash out "reflective equilibrium" in naturalistic terms. I don't think I'm finished with this effort, but are you unsatisfied with any of the steps I've taken so far? Where?

All, a Friendly AI does not necessarily maximize an object-level expected utility.

comment by billswift · 2008-12-12T15:37:00.000Z · LW(p) · GW(p)

I see a possible problem with FAI in general. The real world is deterministic on the most fundamental level, but not even a super-powerful computer could handle realistic problems at that level, so it uses stochastic, Bayesian, probabilistic - whatever you want to call them - methods to model the apparent randomness at more tractable levels. Once it starts using these methods for other problems, what is to stop it from applying them to its goal system (meta-ethical or whatever you want to call it)? Not in an attempt to become unFriendly, but to improve its goal system when it realizes that there may be room for improvement in that system also. And then it discovers that Friendliness isn't a necessary part of itself but was programmed in and can be modified.

comment by luzr · 2008-12-12T18:15:00.000Z · LW(p) · GW(p)

"real world is deterministic on the most fundamental level"

Is it?

http://en.wikipedia.org/wiki/Determinism#Determinism.2C_quantum_mechanics.2C_and_classical_physics

comment by Wei_Dai2 · 2008-12-12T19:24:00.000Z · LW(p) · GW(p)

Eliezer, as far as I can tell, "reflective equilibrium" just means "the AI/simulated non-sentient being can't think of any more changes that it wants to make" so the real question is what counts as a change that it wants to make? Your answer seems to be whatever is decided by "a human library of non-introspectively-accessible circuits". Well the space of possible circuits is huge, and "non-introspectively-accessible" certainly doesn't narrow it down much. And (assuming that "a human library of circuits" = "a library of human circuits") what is a "human circuit"? A neural circuit copied from a human being? Isn't that exactly what you argued against in "Artificial Mysterious Intelligence"?

(It occurs to me that perhaps you're describing your understanding of how human beings do moral growth and not how you plan for an AI/simulated non-sentient being to do it. But if so, that understanding seems to be similar in usefulness to "human beings use neural networks to decide how to satisfy their desires.")

Eliezer wrote: I don't think I'm finished with this effort, but are you unsatisfied with any of the steps I've taken so far? Where?

The design space for "moral growth" is just as big as the design space for "optimization" and the size of the target you have to hit in order to have a good outcome is probably just as small. More than any dissatisfaction with the specific steps you've taken, I don't understand why you don't seem to (judging from your public writings) view the former problem to be as serious and difficult as the latter one, if not more so, because there is less previous research and existing insights that you can draw from. Where are the equivalents of Bayes, von Neumann-Morgenstern, and Pearl, for example?

comment by Tim_Tyler · 2008-12-12T21:17:00.000Z · LW(p) · GW(p)

Wireheading (in the form of drug addiction) is a real-world phenomenon - so presumably your position is that there's some way of engineering a superintelligence so it is not vulnerable to the same problem.

To adopt the opposing position for a moment, the argument goes that a sufficiently-intelligent agent with access to its internals would examine itself - conclude that external referents associated with its utility function were actually superfluous nonsense; that it had been living under a delusion about its true goals - and that it could better maximise expected utility by eliminating its previous delusion, and expecting extremely large utility.

In other words, the superintelligence would convert to Buddhism - concluding that happiness lies within - and that the wheel of suffering is something to be escaped from.

We have a model of this kind of thing in the human domain: religious conversion. An agent may believe strongly that its aim in life is to do good deeds and go to heaven. However, they may encounter evidence which weakens this belief, and as evidence accumulates their original beliefs can sometimes gradually decay - and eventually crumble catastrophically - and as a result, values and behaviour can change.

Others argue that this sort of thing would not happen to correctly-constructed machine intelligences (e.g. see Yudkowsky and Omohundro) - but none of the arguments seems terribly convincing - and there's no math proof either way.

Obviously, we are not going to see too many wireheads in practice, under either scenario - but the issue of whether they form "naturally" or not still seems like an important one to me.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-12T21:37:00.000Z · LW(p) · GW(p)

Wei, when you're trying to create intelligence, you're not trying to get it human, you're trying to get it rational.

When it comes to morality - well, my morality doesn't talk about things being right in virtue of my brain thinking them, but it so happens that my morality is only physically written down in my brain and nowhere else in the physical universe. Likewise with all other humans.

So to get a powerful moral intelligence, you've got to create intelligence to start with using an implementation-independent understanding, and then direct that intelligence to acquire certain information off of physical human brains (because that information doesn't exist anywhere else), whether that has to be done by directly scanning a brain via nondestructive nanotech, or can be confidently asserted just from examining the causal shadows of brains (like their written words).

comment by Virge2 · 2008-12-13T02:35:00.000Z · LW(p) · GW(p)

Carl: This point is elementary. A “friend” who seeks to transform himself into somebody who wants to hurt you, is not your friend."

The switch from "friendly" (having kindly interest and goodwill; not hostile) to a "friend" (one attached to another by affection or esteem) is problematic. To me it radically distorts the meaning of FAI and makes this pithy little sound-bite irrelevant. I don't think it helps Bostrom's position to overload the concept of friendship with the connotations of close friendship.

Exactly how much human bias and irrationality is needed to sustain our human concept of "friend", and is that a level of irrationality that we'd want in a superintelligence? Can the human concept of friendship (involving extreme loyalty and trust in someone we've happened to have known for some time and perhaps profitably exchanged favours with) be applied to the relationship between a computer and a whole species?

I can cope with the concept of "friendly" AI (kind to humans and non hostile), but I have difficulty applying the distinct English word "friend" to an AI.

Suggested listening: Tim Minchin - If I Didn't Have You
http://www.youtube.com/watch?v=Gaid72fqzNE


comment by Virge2 · 2008-12-13T02:39:00.000Z · LW(p) · GW(p)

Correction: I don't think it helps Bostrom's position to overload the concept of ~~friendship~~ friendly with the connotations of close friendship.

comment by Wei_Dai2 · 2008-12-13T03:44:00.000Z · LW(p) · GW(p)

Eliezer, you write as if there is no alternative to this plan, as if your hand is forced. But that's exactly what some people believe about neural networks. What about first understanding human morality and moral growth, enough so that we (not an AI) can deduce and fully describe someone's morality (from his brain scan, or behavior, or words) and predict his potential moral growth in various circumstances, and maybe enough to correct any flaws that we see either in the moral content or in the growth process, and finally program the seed AI's morality and moral growth based on that understanding once we're convinced it's sufficiently good? Your logic of (paraphrasing) "this information exists only in someone's brain so I must let the AI grab it directly without attempting to understand it myself" simply makes no sense. First the conclusion doesn't follow from the premise, and second if you let the AI grab and extrapolate the information without understanding it yourself, there is no way you can predict a positive outcome.

In case people think I'm some kind of moralist for harping on this so much, I think there are several other aspects of intelligence that are not captured by the notion of "optimization". I gave some examples here. We need to understand all aspects of intelligence, not just the first facet for which we have a good theory, before we can try to build a truly Friendly AI.

comment by samantha · 2008-12-13T06:21:00.000Z · LW(p) · GW(p)

I find the hypothesis that an AGI's values will remain frozen highly questionable. To be believable, one would have to argue that the human ability to question values is due only, or principally, to the inherent sloppiness of our evolution. However, I see no reason to suppose that an AGI would apply its intelligence to every aspect of its design except its goal structure. I see no reason to suppose that relatively puny and sloppy minds can do a level of questioning and self-doubt that a vastly superior intelligence never will or can.

I also find it extremely doubtful that any human being has a mind sufficient to make guarantees about what will remain immutable in a much more sophisticated mind after billions of iterative improvements. It will take extremely strong arguments before this appears even remotely feasible.

I don't find CEV at all convincing as the basis for FAI as detailed some time ago on the SL4 list.

Please explicate what you mean by "reflective equilibria of the whole human species." What does the "human species" have to do with it if the "human" as we know it is only a phase on the way to something else that humanity, or at least some humans, may become?

I don't think it is realistic to both create an intelligence that goes "FOOM" by self-improvement and have it be any less than a god compared to us. I know you think you can create something that is not necessarily ever self-aware and yet can maximize human well-being - or at least you have seemed to hold this position in the past. I do not believe that is possible. An intelligence that mapped human psychology that deeply would be forced to map our relationships to it. Thus self-awareness, along with a far deeper introspection than humans can dream of, is inescapable.

That humans age and die does not imply a malevolent god set things up (or exists of course). This stage may be inescapable for the growing of new independent intelligences. To say that this is obviously evil is possibly provincial and a very biased viewpoint. We do not know enough to say.

If "testing is not sufficient" then exactly how are you to know that you have got it right in this colossal undertaking?

From where I am sitting it very much looks like you are trying to do the impossible - trying not only to create an intelligence that dwarfs your own by several orders of magnitude, but also to guarantee its fundamental values, and the overall results of its implementation of those values in reality, with respect to humanity. If that is not impossible then I don't know what is.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-13T09:11:00.000Z · LW(p) · GW(p)

Wei, the criterion "intelligent" compresses down to a very simple notion of effective implementation abstracted away from the choice of goal. After that, the only question is how to get "intelligence", which is something you can, in principle, learn by observation (if you start with learning ability). Flaws in a notion of "intelligence" can be self-corrected if not too great; you observe that what you're doing isn't working (for your goal criterion).

Morality does not compress; it's not something you can learn just by looking at the (nonhuman) environment or by doing logic; if you want to get all the details correct, you have to look at human brains.

That's the difference.

Samantha, were you around for the metaethics sequence?

comment by Richard_Hollerith · 2008-12-13T12:39:00.000Z · LW(p) · GW(p)

Speaking of compressing down nicely, that is a nice and compressed description of humanism. Singularitarians, question humanism.

comment by Richard_Hollerith · 2008-12-13T13:03:00.000Z · LW(p) · GW(p)

Question for Eliezer. If the human race goes extinct without leaving any legacy, then according to you, any nonhuman intelligent agent that might come into existence will be unable to learn about morality?

If your answer is that the nonhuman agent might be able to learn about morality if it is sentient, then please define "sentient". What is it about a paperclip maximizer that makes it nonsentient? What is it about a human that makes it sentient?

comment by Will_Pearson · 2008-12-13T15:29:00.000Z · LW(p) · GW(p)

"Morality does not compress; it's not something you can learn just by looking at the (nonhuman) environment or by doing logic; if you want to get all the details correct, you have to look at human brains."

Why? Why can't you rewrite this as "complexity and morality"?

You may talk about the difference between mathematical and moral insights, which is true; but then mathematical insights aren't sufficient for intelligence. Maths doesn't tell you whether a snake is poisonous and will kill you or not....

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-13T15:51:00.000Z · LW(p) · GW(p)

Terminal values don't compress. Instrumental values compress to terminal values.
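
A minimal toy sketch of that compression claim (my own example, not Eliezer's formalism): an instrumental policy like "keep away from snakes" need not be stored as a value at all; it can be recomputed on demand from a terminal value plus a world model, which is the sense in which it compresses.

```python
# Toy sketch (illustrative only): the instrumental policy "keep away from
# snakes" is not stored anywhere as a value; it is derived from a terminal
# value plus learned facts, which is the sense in which it "compresses".

terminal_values = {"stay_alive": 1.0}           # what the agent ultimately wants

world_model = {                                  # facts, not values
    "handle_snake":      {"stay_alive": 0.90},   # estimated survival probability
    "walk_around_snake": {"stay_alive": 0.999},
}

def instrumental_score(action):
    """Expected terminal value if `action` is taken, under the world model."""
    return sum(weight * world_model[action][goal]
               for goal, weight in terminal_values.items())

print(max(world_model, key=instrumental_score))
# -> walk_around_snake: "avoid snakes" falls out of the terminal value plus
#    the fact that snakes are often deadly; no snake-specific value is stored.
```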

comment by Will_Pearson · 2008-12-13T17:27:00.000Z · LW(p) · GW(p)

Are you saying "snakes are often deadly poisonous to humans" is an instrumental value?

I'd agree that dying is bad therefore avoid deadly poisonous things. But I still don't see that snakes have little xml tags saying keep away, might be harmful.... I don't see that as a value of any sort.

comment by Tim_Tyler · 2008-12-13T19:47:00.000Z · LW(p) · GW(p)

"However, I see no reason to suppose that an AGI would apply its intelligence to every aspect of its design except its goal structure."

I don't think that describes the probable outcome of anybody's superintelligence construction plan.

comment by Wei_Dai2 · 2008-12-14T06:58:00.000Z · LW(p) · GW(p)

(Eliezer, why do you keep using "intelligence" to mean "optimization" even after agreeing with me that intelligence includes other things that we don't yet understand?)

"Morality does not compress"

You can't mean that morality literally does not compress (i.e. is truly random). Obviously there are plenty of compressible regularities in human morality. So perhaps what you mean is that it's too hard or impossible to compress it into a small enough description that humans can understand. But, we also have no evidence that effective universal optimization in the presence of real-world computational constraints (as opposed to idealized optimization with unlimited computing power) can be compressed into a small enough description that humans can understand.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-14T15:29:00.000Z · LW(p) · GW(p)

Wei, I was agreeing with you that these were important questions - not necessarily agreeing with your thesis "there's more to intelligence than optimization". Once you start dealing in questions like those, using a word like "intelligence" implies that all the answers are to be found in a single characteristic and that this characteristic has something to do with the raw power of a mind. Whereas I would be more tempted to look at the utility function, or the structure of the prior - an AI that fails to see a question where we see one is not necessarily stupid; it may simply not care about our own stupidity, and be structured too far outside it to ever see a real question as opposed to a problem of finding a word-string that satisfies certain apes that they have been answered.

Human morality "compresses", in some sense, to a certain evolutionary algorithm including the stupidities of that algorithm (which is why it didn't create expected fitness maximizers) and various contingencies about the ancestral environment we were in and a good dose of sheer path dependency. But since running the same Earth over again wouldn't necessarily create anything like humans, you can't literally compress morality to that.

On the other hand, intelligence - or let us rather say "optimization under real-world constraints" - is something that evolution did cough out, and if you took another world starting with a different first replicator, you would rate the probability far higher of seeing "efficient cross-domain optimization" than "human morality", and the probability of seeing a mind that obsessed about qualia would be somewhere in between.

So "efficient cross-domain optimization" is something you can get starting from the criterion of "optimization", looking at the environment to figure out the generalizations, testing things to see if they "work" according to a criterion already possessed - with no need to look at human brains as a reference.

comment by Wei_Dai2 · 2008-12-14T20:45:00.000Z · LW(p) · GW(p)

Maybe we don't need to preserve all of the incompressible idiosyncrasies in human morality. Considering that individuals in the post-Singularity world will have many orders of magnitude more power than they do today, what really matter are the values that best scale with power. Anything that scales logarithmically for example will be lost in the noise compared to values that scale linearly. Even if we can't understand all of human morality, maybe we will be able to understand the most important parts.

Just throwing away parts of one's utility function seems bad. That can't be optimal right? Well, as Peter de Blanc pointed out, it can be if there is no feasible alternative that improves expected utility under the original function. We should be willing to lose our unimportant values to avoid or reduce even a small probability of losing the most important ones. With CEV, we're supposed to implement it with the help of an AI that's not already Friendly, and if we don't get it exactly right on the first try, we can't preserve even our most important values. Given that we don't know how to safely get an unFriendly AI to do anything, much less something this complicated, the probability of failure seems quite large.
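
A toy illustration of the "lost in the noise" claim, assuming (purely for illustration - this is not Wei Dai's model) that total utility is just a weighted sum of a logarithmically-scaling term and a linearly-scaling term:

```python
# Toy numbers (mine, not Wei Dai's): if total utility is a weighted sum of a
# logarithmically-scaling value and a linearly-scaling value, the logarithmic
# term's share of the total shrinks toward zero as available resources grow.

import math

def utility_parts(resources, w_log=1.0, w_linear=1.0):
    return w_log * math.log(resources), w_linear * resources

for resources in (10, 10**6, 10**12):
    log_part, linear_part = utility_parts(resources)
    share = log_part / (log_part + linear_part)
    print(f"resources={resources:>14}: log value's share of total = {share:.2e}")
# The share drops from ~1.9e-01 at 10 units to ~2.8e-11 at 10**12 units, so the
# logarithmically-scaling value is "lost in the noise" at post-Singularity scale.
```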

comment by lukeprog · 2011-03-26T03:56:31.334Z · LW(p) · GW(p)

"The most painful memories in my life have been when other people thought they knew better than me, and tried to do things on my behalf."

Fascinating. You must either lead an extraordinarily pain-free existence, or not just be "individualistic" but very sensitive about your competence - which I think is odd for someone of such deep and wide-ranging intellectual competence.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2011-03-26T04:07:34.625Z · LW(p) · GW(p)

Got no idea what you mean by either of those clauses. Why wouldn't the most painful times in your life be someone else's well-meant disaster, forced on you without your control? And then the second clause I don't understand at all.

Replies from: lukeprog
comment by lukeprog · 2011-03-26T04:16:19.755Z · LW(p) · GW(p)

Oh, I gotcha. I thought you were saying that you are so individualistic that the subjective experience of having someone think they knew better than you, and try to do something on your behalf, was badly painful. But now it sounds like you're saying that the consequences of people trying to do things on your behalf with an inferior understanding of the situation are some of the most painful memories of your life, because those other people really screwed things up. I assumed the first interpretation because the sentence I quoted is followed by one describing the hedonic consequences of doing things yourself.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2011-03-26T05:09:48.246Z · LW(p) · GW(p)

...you must've led an existence extraordinarily free of other people screwing up your life.