I think I've found the source of what's been bugging me about "Friendly AI"

post by ChrisHallquist · 2012-06-10T14:06:40.174Z · LW · GW · Legacy · 33 comments

In the comments on this post (which in retrospect I feel was not very clearly written), someone linked me to a post Eliezer wrote five years ago, "The Hidden Complexity of Wishes." After reading it, I think I've figured out why the term "Friendly AI" is used so inconsistently.

This post explicitly lays out a view that seems to be implicit in, but not entirely clear from, many of Eliezer's other writings. That view is this:

There are three kinds of genies:  Genies to whom you can safely say "I wish for you to do what I should wish for"; genies for which no wish is safe; and genies that aren't very powerful or intelligent.

Even if Eliezer is right about that, I think that view of his has led to confusing usage of the term "Friendly AI." If you accept Eliezer's view, it may seem to make sense not to worry too much about whether by "Friendly AI" you mean:

  1. A utopia-making machine (the AI "to whom you can safely say, 'I wish for you to do what I should wish for.'") Or:

  2. A non-doomsday machine (a doomsday machine being the AI "for which no wish is safe.")

And it would make sense not to worry too much about that distinction, if you were talking only to people who also believe those two concepts are very nearly co-extensive for powerful AI. But failing to make that distinction is obviously going to be confusing when you're talking to people who don't think that. It will make it harder to communicate both your ideas and your reasons for holding those ideas to them.

One solution would be to more frequently link people back to "The Hidden Complexity of Wishes" (or other writing by Eliezer that makes similar points--what else would be suitable?). But while it's a good post and Eliezer makes some very good points with the "Outcome Pump" thought-experiment, the argument isn't entirely convincing.

As Eliezer himself has argued at great length (see also section 6.1 of this paper), humans' own understanding of our values is far from perfect. None of us are, right now, qualified to design a utopia. But we do have some understanding of our own values; we can identify some things that would be improvements over our current situation while marking other scenarios as "this would be a disaster." It seems like there might be a point in the future where we can design an AI whose understanding of human values is similarly serviceable, but no better than our own.

Maybe I'm wrong about that. But if I am, until there's a better, easy-to-read explanation of why I'm wrong that everybody can link to, it would be helpful to have different terms for (1) and (2) above. Perhaps call them "utopia AI" and "safe AI," respectively?

33 comments

Comments sorted by top scores.

comment by Vladimir_Nesov · 2012-06-10T14:19:01.986Z · LW(p) · GW(p)

Edit: It's now fixed.

A non-doomsday machine (the AI "for which no wish is safe.")

In Eliezer's quote, "genies for which no wish is safe" are those that kill you irrespective of what wish you made, while here it's written as if you might be referring to AIs that are safe even if you make no wish, which is different. This should be paraphrased for clarity, whatever the intended meaning.

Replies from: vi21maobk9vp, private_messaging
comment by vi21maobk9vp · 2012-06-10T14:22:48.044Z · LW(p) · GW(p)

Or maybe the parenthesis refers only to "doomsday machine".

Replies from: evand, ChrisHallquist, Vladimir_Nesov
comment by evand · 2012-06-10T15:16:32.485Z · LW(p) · GW(p)

That's how I read it. The wording could be clearer.

comment by ChrisHallquist · 2012-06-11T07:02:38.523Z · LW(p) · GW(p)

This is the intended reading. Edited for clarity.

comment by Vladimir_Nesov · 2012-06-10T14:23:40.316Z · LW(p) · GW(p)

In any case it's confusing and should be paraphrased for clarity, whatever the intended meaning.

comment by private_messaging · 2012-06-11T19:21:40.298Z · LW(p) · GW(p)

Well, there are systems that simply can't process your wishes (AIXI, for instance) but which you can use to, e.g., cure cancer if you wish. You could train such a system to do what you tell it, but all it is looking for is the sequence that leads to the reward button being pressed, which is terminal for it; it places no value on the button staying held down. Just as a screwdriver is a system I can use to unscrew screws, if I wish, but it isn't a screw-unscrewing genie.

comment by reup · 2012-06-10T21:30:56.555Z · LW(p) · GW(p)

I think part of the issue is that while Eliezer's conception of these questions has continued to evolve, we continue both to point and to be pointed back to posts that he only partially agrees with. We might chart a more accurate position by winding through a thousand comments, but that's a difficult thing to do.

To pick one example from a recent thread, here he adjusts (or flags for adjustment) his thinking on Oracle AI, but someone who missed that would have no idea from reading older articles.

It seems like our local SI representatives recognize the need for an up-to-date summary document to point people to. Until then, our current refrain of "read the sequences" will grow increasingly misleading as more and more updates and revisions are spread across years of comments (that said, I still think people should read the sequences :) ).

Replies from: witzvo
comment by witzvo · 2012-06-10T22:55:08.929Z · LW(p) · GW(p)

It seems like our local SI representatives recognize the need for an up-to-date summary document to point people to.

Maybe this is what you're implying is already in progress, but if the main issue is that parts of the sequences are out of date, maybe Eliezer could commission a set of people who've been following the discussion all along to write review pieces, drawing on all the best comments, that describe how they themselves would "rediscover" the conclusions of the part of the sequences they are responsible for (with links back to the original discussion).

Ideally these reviewers would work out between themselves how to make a clean and succinct narrative without lots of repetition; e.g., how to collapse issues that get revisited later, or that cut across topics, into a clear narrative.

Then Eliezer and the rest of us could comment on those summaries, as a peer review.

Of course, it's fine if he wants to write the new material himself, but frankly I want to know what's going to happen in HPMOR. :)

Replies from: khafra
comment by khafra · 2012-06-11T13:14:15.589Z · LW(p) · GW(p)

I wonder if there's a way we could prevail upon the sufficiently informed people to make the relevant corrections as "re-running the sequences" posts come up.

comment by roll · 2012-06-10T17:52:27.697Z · LW(p) · GW(p)

I think the bigger issue is the collapsing of the notion of 'incredibly useful software that would be able to self-improve and solve engineering problems' with the philosophical notion of mind. The philosophical problem of how we make the artificial mind not think about killing mankind may not be solvable over the philosophical notion of the mind, and the solutions may be useless. Practically, however, it is a trivial part of the much bigger problem of 'how do we make the software not explore the useless parts of the solution space'; it's not the killing of mankind that is problematic, but the fact that even on a Jupiter-sized computer, brute-force solutions that explore such big and ill-defined solution spaces would be useless. Long before you have to worry about the software finding an unintended way to achieve the objective, you encounter the problem of software not finding any way to achieve the objective because it was looking in a space >10^1000 times larger than it could search. 'Artificial intelligence', as in useful software which does tasks we regard as intelligent, is a much broader and more diverse concept than the philosophical notion of mind.
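
To make the scale of that claim concrete, here is a minimal back-of-the-envelope sketch in Python. Everything in it is a hypothetical illustration (the evaluation budget, the `search_space_size` helper, the assumption of ten values per design variable), not anything proposed in this thread; it only shows that an exhaustive search budget covers a vanishing fraction of a high-dimensional design space.

```python
# A toy, back-of-the-envelope illustration (hypothetical numbers, not anyone's
# actual AI design): the size of a brute-force search space grows exponentially
# with the number of design variables, while any realistic evaluation budget
# does not, so a naive search can only ever touch a vanishing fraction of it.

SECONDS_PER_YEAR = 60 * 60 * 24 * 365
EVALS_PER_SECOND = 10 ** 18                               # absurdly generous machine
BUDGET = EVALS_PER_SECOND * SECONDS_PER_YEAR * 10 ** 9    # a billion machine-years

def search_space_size(num_variables: int, values_per_variable: int = 10) -> int:
    """Candidates an exhaustive search over the design variables would face."""
    return values_per_variable ** num_variables

for n in (10, 30, 100, 1000):
    space = search_space_size(n)
    if space <= BUDGET:
        print(f"{n:4d} variables: exhaustively searchable within the budget")
    else:
        # Order-of-magnitude estimate of how little of the space the budget covers.
        shortfall = len(str(space)) - len(str(BUDGET))
        print(f"{n:4d} variables: budget covers roughly 1 part in 10^{shortfall}")
```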

Replies from: Armarren
comment by Armarren · 2012-06-10T18:18:40.071Z · LW(p) · GW(p)

Long before you have to worry about the software finding an unintended way to achieve the objective, you encounter the problem of software not finding any way to achieve the objective

Well, obviously, since it is pretty much the problem we have now. The whole point of Friendly AI as formulated by SI is that you have to solve the former problem before the latter is solved, because once the software can achieve any serious objectives it will likely cause enormous damage on its way there.

Replies from: roll
comment by roll · 2012-06-11T08:07:37.597Z · LW(p) · GW(p)

Well, if that's the whole point, SI should dissolve today (it shouldn't even have formed in the first place). The software is not magic; "once the software can achieve any serious objectives" is when we know how to restrict the search space, and it won't happen via mere hardware improvement. We don't start with a philosophically ideal, infinitely smart, psychopathic 'mind' and carve a friendly mind out of it. We build our sculpture grain by grain, using glue.

Replies from: Armarren
comment by Armarren · 2012-06-11T08:48:02.257Z · LW(p) · GW(p)

Just because software is built line by line doesn't mean it automatically does exactly what you want. In addition to outright bugs, any complex system will have unpredictable behaviour, especially when exposed to real-world data. Just because the system can restrict the search space sufficiently to achieve an objective doesn't mean it will restrict itself only to the parts of the solution space the programmer wants. The basic purpose of the Friendly AI project is to formalize the human value system sufficiently that it can be included in the specification of such a restriction. The argument made by SI is that there is a significant risk that a self-improving AI can increase in power so rapidly that, unless such a restriction is included from the outset, it might destroy humanity.

Replies from: roll, private_messaging
comment by roll · 2012-06-20T17:49:16.889Z · LW(p) · GW(p)

Just because it doesn't do exactly what you want doesn't mean it is going to fail in some utterly spectacular way.

You aren't searching for solutions to a real-world problem, you are searching for solutions to a model (ultimately, for solutions to systems of equations), and not only do you have a limited solution space, you don't model anything irrelevant. Furthermore, the search space is not 2d, not 3d, and not even 100d; the volume increases extremely rapidly with dimension. The predictions of many systems are fundamentally limited by the Lyapunov exponent. I suggest you stop thinking in terms of concepts like 'improve'.
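
As a minimal illustration of the Lyapunov-exponent point, here is a standard textbook toy in Python: the chaotic logistic map. It is not a model of any AI; it only shows that two nearly identical states diverge exponentially, so added measurement precision buys only a little extra prediction horizon.

```python
# Toy demonstration of sensitive dependence on initial conditions (the chaotic
# logistic map with r = 4): two states that start 1e-12 apart become completely
# uncorrelated after a few dozen steps, no matter how exact the model itself is.

def logistic(x: float, r: float = 4.0) -> float:
    return r * x * (1.0 - x)

x, y = 0.4, 0.4 + 1e-12          # two nearly identical initial conditions
for step in range(1, 61):
    x, y = logistic(x), logistic(y)
    if step % 10 == 0:
        print(f"step {step:2d}: separation = {abs(x - y):.3e}")

# The separation grows roughly like 2**step (Lyapunov exponent ln 2), so it
# reaches order 1 by around step 40; beyond that the prediction is worthless.
```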

If something self-improves at the software level, that'll be a piece of software created with a very well-defined model of changes to itself, and the very self-improvement will be concerned with cutting down the solution space and cutting down the model. If something self-improves at the hardware level, likewise for the model of physics. Everyone wants an artificial Rain Man. Autism is what you get from all sorts of random variations to the baseline human brain; it looks like a general intelligence that expands its model and doesn't just focus intensely is a tiny spot in the design space. I don't see why we should expect general intelligence to suddenly overtake specialized intelligences; the specialized intelligences have better people working on them, have the funding, and the specialization massively improves efficiency; superhuman specialized intelligences require less hardware power.

Replies from: Armarren
comment by Armarren · 2012-06-20T19:33:33.938Z · LW(p) · GW(p)

Just because it doesn't do exactly what you want doesn't mean it is going to fail in some utterly spectacular way.

I certainly agree, and I am not even sure what the official SI position is on the probability of such failure. I know that Eliezer in his writing does give the impression that any mistake will mean certain doom, which I believe to be an exaggeration. But failure of this kind is fundamentally unpredictable, and if a low-probability event kills you, you are still dead, and I think that the probability is high enough that the Friendly AI type of effort would not be wasted.

(ultimately, for solutions to systems of equations)

That is true in the trivial sense that everything can be described as equations, but when thinking about how the computation process actually happens, this becomes almost meaningless. If the system is not constructed as a search problem over high dimensional spaces, then in particular its failure modes cannot be usefully thought about in such terms, even if it is fundamentally isomorphic to such a search.

that'll be a piece of software created with a very well-defined model of changes to itself

Or it will be created by intuitively assembling random components and seeing what happens, in which case there is no guarantee what it will actually do to its own model, or even what it is actually solving for. Convincing AI researchers to only allow an AI to self-modify when it is stable under self-modification is a significant part of the Friendly AI effort.

Everyone wants an artificial Rain Man.

There are very few statements that are true about "everyone" and I am very confident that this is not one of them. Even if most people with actual means to build one want specialized and/or tool AIs, you only need one unfriendly-successful AGI project to potentially cause a lot of damage. This is especially true as both hardware costs fall and more AI knowledge is developed and published, lowering the entry costs.

I don't see why we should expect general intelligence to suddenly overtake specialized intelligences;

To be dangerous, AGI doesn't have to overtake specialized intelligences; it has to overtake humans. The existence of specialized AIs is either irrelevant or increases the risks from AGI, since they would be available to both, and presumably AGIs would have lower interfacing costs.

Replies from: roll
comment by roll · 2012-06-20T20:23:30.301Z · LW(p) · GW(p)

I certainly agree, and I am not even sure what the official SI position is on the probability of such failure. I know that Eliezer in his writing does give the impression that any mistake will mean certain doom, which I believe to be an exaggeration. But failure of this kind is fundamentally unpredictable, and if a low-probability event kills you, you are still dead, and I think that the probability is high enough that the Friendly AI type of effort would not be wasted.

Unpredictable is a subjective quality. It'd look much better if the people speaking of unpredictability had demonstrable accomplishments. If there are a trillion equally probable unpredictable outcomes, out of which only a small number are the destruction of mankind, then even though the outcome is still technically fundamentally unpredictable, the probability is low. Unpredictability does not imply likelihood of the scenario; if anything, unpredictability implies lower risk. I am sensing either a bias or dark arts; 'unpredictable' is a negative word. Highly specific predictions should be lowered in their probability when updating on a statement like 'unpredictable'.

That is true in the trivial sense that everything can be described as equations, but when thinking about how the computation process actually happens, this becomes almost meaningless.

Not everything is equally easy to describe as equations. For example, we don't know how to describe the number of real-world paperclips with a mathematical equation. We can describe the performance of a design with an equation and then solve for the maximum, but that is not identical to 'maximizing the performance of a real-world chip'.

If the system is not constructed as a search problem over high dimensional spaces, then in particular its failure modes cannot be usefully thought about in such terms, even if it is fundamentally isomorphic to such a search.

The problem is that of finding a point in a high dimensional space.

Or it will be created by intuitively assembling random components and seeing what happens, in which case there is no guarantee what it will actually do to its own model, or even what it is actually solving for. Convincing AI researchers to only allow an AI to self-modify when it is stable under self-modification is a significant part of the Friendly AI effort.

I think you have a very narrow vision of 'unstable'.

Even if most people with actual means to build one want specialized and/or tool AIs, you only need one unfriendly-successful AGI project to potentially cause a lot of damage. This is especially true as both hardware costs fall and more AI knowledge is developed and published, lowering the entry costs.

To be dangerous, AGI has to win in a future ecosystem where the low-hanging fruit has already been taken. 'General' is a positive-sounding word; beware of the halo effect.

To be dangerous, AGI doesn't have to overtake specialized intelligences; it has to overtake humans. The existence of specialized AIs is either irrelevant or increases the risks from AGI, since they would be available to both, and presumably AGIs would have lower interfacing costs.

I believe that is substantially incorrect. Suppose that there was an AGI in your basement, connected to the internet, in an ecosystem of very powerful specialized AIs. The internet is secured by specialized network-security AIs and would have been taken over by a specialized botnet if it were not; you don't have a chip fabrication plant in your basement; the specialized AIs elsewhere are running on massive hardware, designing better computing substrates, better methods of solving, and so on. What exactly is this AGI going to do?

This is going nowhere. Too much anthropomorphization.

Replies from: Armarren
comment by Armarren · 2012-06-21T07:29:04.928Z · LW(p) · GW(p)

Highly specific predictions should be lowered in their probability when updating on a statement like 'unpredictable'.

That depends on what your initial probability is and why. If it is already low due to updates on predictions about the system, then updating on "unpredictable" will increase the probability by lowering the strength of those predictions. Since the destruction of humanity is rather important, even if the existential AI risk scenario is of low probability, it matters exactly how low.

This of course has the same shape as Pascal's mugging, but I do not believe that SI's claims are of low enough probability to be dismissed as effectively zero.

Not everything is equally easy to describe as equations.

That was in fact my point, which might indicate that we are talking past each other. What I tried to say is that an artificial intelligence system is not necessarily constructed as an explicit optimization process over an explicit model. If the model and the process are implicit in its cognitive architecture, then making predictions about what the system will do in terms of a search is of limited usefulness.

And even talking about models, getting back to this:

cutting down the solution space and cutting down the model

On further thought, this is not even necessarily true. The solution space and the model will have to be pre-cut by someone (presumably human engineers) who doesn't know where the solution actually is. A self-improving system will have to expand both if the solution is outside them in order to find it. A system that can reach a solution even when initially over-constrained is more useful than the one that can't, and so someone will build it.

I think you have a very narrow vision of 'unstable'.

I do not understand what you are saying here. If you mean that by unstable I mean a highly specific trajectory that a system which has lost stability will follow, then it is because all those trajectories where the system crashes and burns are unimportant. If you have a trillion optimization systems on a planet running at the same time, you have to be really sure that nothing can go wrong.

I just realized I derailed the discussion. The whole question of an AGI in a world of specialized AIs is irrelevant to what started this thread. In terms of the chronology of development, I cannot tell how likely it is that AGI could overtake specialized intelligences. It really depends on whether there is a critical insight missing for the construction of AI. If AI is just an extension of current software, then specialized intelligences will win for the reasons you state, although some of the caveats I wrote above still apply.

If there is a critical difference in architecture between current software and AI, then whoever hits on that insight will likely overtake everyone else. If they happen to be working on AGI, or even on any system entangled with the real world, I don't see how one can guarantee that the consequences will not be catastrophic.

Too much anthropomorphization.

Well, I in turn believe you are applying overzealous anti-anthropomorphization. That is normally a perfectly good heuristic when dealing with software, but the fact is that human intelligence is the only thing in the "intelligence" reference class we have, and although AIs will almost certainly be different, they will not necessarily be different in every possible way. This is especially so considering the possibility of AIs that are either directly based on a human-like architecture or are designed to directly interact with humans, which requires having at least some human-compatible models and behaviours.

Replies from: roll
comment by roll · 2012-06-21T08:51:10.847Z · LW(p) · GW(p)

That depends on what your initial probability is and why. If it is already low due to updates on predictions about the system, then updating on "unpredictable" will increase the probability by lowering the strength of those predictions. Since the destruction of humanity is rather important, even if the existential AI risk scenario is of low probability, it matters exactly how low.

The importance should not weigh upon our estimation, unless you proclaim that I should succumb to a bias. Furthermore, it is the destruction of mankind that is the prediction being made here, via a multitude of assumptions, the most dubious one being that the system will have a real-world, physical goal. The number of real-world paperclips is not easy to formalize.

On further thought, this is not even necessarily true. The solution space and the model will have to be pre-cut by someone (presumably human engineers) who doesn't know where the solution actually is. A self-improving system will have to expand both if the solution is outside them in order to find it. A system that can reach a solution even when initially over-constrained is more useful than the one that can't, and so someone will build it.

Sorry, you are factually wrong about how the design of automated tools works. The rest of your argument presses too hard to recruit a multitude of importance-related biases and cognitive fallacies that have been described on this very site.

If you have a trillion optimization systems on a planet running at the same time, you have to be really sure that nothing can go wrong.

No I don't, not if the systems that work right have already taken all the low-hanging fruit, leaving none for the one that goes wrong to pick.

Well, I in turn believe you are applying overzealous anti-anthropomorphization. That is normally a perfectly good heuristic when dealing with software, but the fact is that human intelligence is the only thing in the "intelligence" reference class we have, and although AIs will almost certainly be different, they will not necessarily be different in every possible way. This is especially so considering the possibility of AIs that are either directly based on a human-like architecture or are designed to directly interact with humans, which requires having at least some human-compatible models and behaviours.

You seem to keep forgetting all the software that is fundamentally different from the human mind but solves problems very well. The issue reads like a belief in the extreme superiority of man over machine, except that it is the superiority of anthropomorphized software over all other software.

comment by private_messaging · 2012-06-12T07:03:27.996Z · LW(p) · GW(p)

That sounds way less scary when you consider the actual software that is approaching recursive self-improvement and get more specific than a vague "increase in power". It's just generic, ignorant anti-technology talk that relies on vague concepts like "power" and dissipates once you get in any way specific.

The software also tends not to do what you want it to do for the sake of this argument. There's an enormous gap between 'not doing exactly what we want' and 'doing exactly what you want for this argument to work'. Automated engineering software simulates microscopic material interactions; vague self-improvement and increases in power only make it better at not doing unrelated stuff.

comment by ikrase · 2013-03-15T02:09:05.729Z · LW(p) · GW(p)

This is my distinction between Friendly AI and what I call Obedient AI (which is necessarily much less powerful than FAI, because it must act slowly enough for a human to tell whether orders are being obeyed).

comment by John_Maxwell (John_Maxwell_IV) · 2012-06-11T20:18:31.897Z · LW(p) · GW(p)

Humans have systems for predicting and understanding the desires of other humans baked in. The information-theoretic complexity of these systems is likely to be very high. I tend to think extracting all this complexity and building a cross-domain optimizer are separate problems.

comment by ZZZling · 2012-06-10T16:49:00.992Z · LW(p) · GW(p)

Why would AI care about our wishes at all? Do we, humans, care about the wishes of animals, who are our evolutionary predecessors? We use them for food (sad, sad :((( ). Hopefully, non-organic AI will not need us in such a frightening capacity. We also use animals for our amusement, as pets. Is that what we are going to be, pets? Well, in that case some of our wishes will be cared for. Not all of them, of course, and not in the way one might want. Foolish or dangerous wishes will not be heeded, otherwise we would simply destroy ourselves. Who knows, maybe saying "God have mercy on us" will take on a new, more specific meaning.

Replies from: jsalvatier, syzygy
comment by jsalvatier · 2012-06-10T17:29:45.192Z · LW(p) · GW(p)

The standard response to this is that it will care about our wishes if we build it to care about our wishes (see here).

comment by syzygy · 2012-06-10T21:17:40.043Z · LW(p) · GW(p)

In case you haven't realized it, you're being downvoted because your post reads like this is the first thing you've read on this site. Just FYI.

Replies from: ZZZling
comment by ZZZling · 2012-06-10T21:51:25.033Z · LW(p) · GW(p)

I'm not against other people having different points of view on AI. Everybody is entitled to his/her own opinions. However, in the recommended references I don't find answers to my questions. You can vote ME down without even trying to provide a logical argument, but those questions and alternative ideas about AI will not go away. Some other people will ask similar questions on different forums, or put forward similar ideas. And only the future will tell who is actually right!

Replies from: wedrifid, Slackson
comment by wedrifid · 2012-06-11T00:13:45.878Z · LW(p) · GW(p)

And only the future will tell who is actually right!

Either the future or catching up with the present research.

comment by Slackson · 2012-06-11T00:23:46.083Z · LW(p) · GW(p)

Okay, Eliezer will have worded this much better elsewhere, but I might as well give this a shot. The basic idea of friendly AI is this.

When you design an AI, part of the design that you make is what it is that the AI wants. It doesn't have any magical defaults that you don't code in; it is just the code, only what you've written into it. If you've written it to value something other than human values, it will likely destroy humanity, since we are a threat to its values. If you've written it to value human values, then it will keep humanity alive, protect us, and devote its resources to furthering human values.

It will not change its values, since doing so would not optimize its current values. This is practically a tautology, but people still seem to find it surprising.

Replies from: ZZZling
comment by ZZZling · 2012-06-11T07:57:26.937Z · LW(p) · GW(p)

Thanks for the short and clear explanation. Yes, I understand these ideas, even the last point. But with all due respect to Eliezer and others, I don't think there is a way for us to control a superior being. Some control may work at early stages when AI is not truly intelligent yet, but the idea of fully grown AI implies, by definition, that there is no control over it. Just think about it. This also sounds like a tautology. Of course we can try to always keep AI in an underdeveloped state, so that we can control it, but practically that is not possible. Somebody, somewhere, due to yet another crisis, ..., etc, will let it go. It will grow according to some natural informational laws that we don't know yet and will develop some natural values independent not only of our wishes, but of any other contingencies. That's how I see it. Now you can vote me down.

Replies from: TheOtherDave, khafra
comment by TheOtherDave · 2012-06-11T14:44:53.499Z · LW(p) · GW(p)

Pretty much everyone here agrees with you that we can't control a superintelligent system, most especially Eliezer, who has written many many words championing that position.

So if you're under the impression that this is a point that you dispute with this community, you have misunderstood the consensus of this community.

In particular, letting a system do what it wants is generally considered the opposite of controlling it.

Replies from: ZZZling
comment by ZZZling · 2012-06-12T04:39:19.991Z · LW(p) · GW(p)

"So if you're under the impression that this is a point..."

Yes, I'm under that impression. Because the whole idea of "Friendly AI" implies a subtle, indirect, but still real form of control. The idea here is not to control AI at its final stage, but rather to control what this final stage is going to be. But I don't think such indirect control is possible. Because in my view, the final shape of AI is invariant of any contingencies, including our attempts to make it "friendly" (or "non-friendly"). However, I can admit that in the early stages of AI evolution such control may be possible, and even necessary. Therefore, researching the "Friendly AI" topic is NOT a waste of time after all. It helps to figure out how to make the transition to fully grown AI in the least painful way.

Go ahead, guys, and vote me down. I'm not taking this personally. I understand this is just a quick way to express your disagreement with my viewpoints. I want to see the count. It'll give an idea of how strongly you disagree with me.

Replies from: Mitchell_Porter, TheOtherDave
comment by Mitchell_Porter · 2012-06-12T05:26:34.252Z · LW(p) · GW(p)

in my view, the final shape of AI is invariant of any contingencies, including our attempts to make it "friendly" (or "non-friendly")

This isn't true of human beings, what's different about AIs?

comment by TheOtherDave · 2012-06-12T13:58:25.560Z · LW(p) · GW(p)

the final shape of AI is invariant of any contingencies

Ah, cool. Yes, this is definitely a point of disagreement.

For my own part, I think real intelligence is necessarily contingent. That is, different minds will respond differently to the same inputs, and this is true regardless of "how intelligent" those minds are. There is no single ideal mind that every mind converges on as its "final" or "fully grown" stage.

comment by khafra · 2012-06-11T13:17:20.265Z · LW(p) · GW(p)

I don't think there is a way for us to control a superior being. Some control may work at early stages when AI is not truly intelligent yet, but the idea of fully grown AI implies, by definition, that there is no control over it.

Yes, this is why Friendly AI is difficult. Making an optimizing process that will care about what we want, in the way we want it to care, once we can no longer control it, is not something we know how to do yet.