Nonsentient Optimizers

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-27T02:32:23.000Z · LW · GW · Legacy · 48 comments

Followup to: Nonperson Predicates, Possibility and Could-ness

    "All our ships are sentient.  You could certainly try telling a ship what to do... but I don't think you'd get very far."
    "Your ships think they're sentient!" Hamin chuckled.
    "A common delusion shared by some of our human citizens."
            —The Player of Games, Iain M. Banks

Yesterday, I suggested that, when an AI is trying to build a model of an environment that includes human beings, we want to avoid the AI constructing detailed models that are themselves people.  And that, to this end, we would like to know what is or isn't a person—or at least have a predicate that returns 1 for all people and could return 0 or 1 for anything that isn't a person, so that, if the predicate returns 0, we know we have a definite nonperson on our hands.
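
The guarantee is one-sided, which a small sketch may make concrete (the names and the whitelist below are illustrative assumptions, nothing more): the predicate may refuse to certify a harmless model, but it must never certify a person.

    from enum import Enum

    class Verdict(Enum):
        DEFINITELY_NOT_A_PERSON = 0   # the only hard guarantee the predicate makes
        POSSIBLY_A_PERSON = 1         # no claim either way; treat such a model as off-limits

    def nonperson_predicate(model_class: str) -> Verdict:
        """Conservative stub: certify only model classes assumed to be far below
        personhood; anything unrecognized gets the noncommittal answer."""
        known_safe = {"lookup_table", "linear_regression", "thermostat"}
        if model_class in known_safe:
            return Verdict.DEFINITELY_NOT_A_PERSON
        return Verdict.POSSIBLY_A_PERSON

    assert nonperson_predicate("thermostat") is Verdict.DEFINITELY_NOT_A_PERSON
    assert nonperson_predicate("detailed_human_simulation") is Verdict.POSSIBLY_A_PERSON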

And as long as you're going to solve that problem anyway, why not apply the same knowledge to create a Very Powerful Optimization Process which is also definitely not a person?

"What?  That's impossible!"

How do you know?  Have you solved the sacred mysteries of consciousness and existence?

"Um—okay, look, putting aside the obvious objection that any sufficiently powerful intelligence will be able to model itself—"

Löb's Sentence contains an exact recipe for a copy of itself, including the recipe for the recipe; it has a perfect self-model.  Does that make it sentient?

"Putting that aside—to create a powerful AI and make it not sentient—I mean, why would you want to?"

Several reasons.  Picking the simplest to explain first—I'm not ready to be a father.

 

Creating a true child is the only moral and metaethical problem I know that is even harder than the shape of a Friendly AI.  I would like to be able to create Friendly AI while worrying just about the Friendly AI problems, and not worrying whether I've created someone who will lead a life worth living.  Better by far to just create a Very Powerful Optimization Process, if at all possible.

"Well, you can't have everything, and this thing sounds distinctly alarming even if you could -"

Look, suppose that someone said—in fact, I have heard it said—that Friendly AI is impossible, because you can't have an intelligence without free will.

"In light of the dissolved confusion about free will, both that statement and its negation are pretty darned messed up, I'd say.  Depending on how you look at it, either no intelligence has 'free will', or anything that simulates alternative courses of action has 'free will'."

But, understanding how the human confusion of free will arises—the source of the strange things that people say about "free will"—I could construct a mind that did not have this confusion, nor say similar strange things itself.

"So the AI would be less confused about free will, just as you or I are less confused.  But the AI would still consider alternative courses of action, and select among them without knowing at the beginning which alternative it would pick.  You would not have constructed a mind lacking that which the confused name 'free will'."

Consider, though, the original context of the objection—that you couldn't have Friendly AI, because you couldn't have intelligence without free will.

Note:  This post was accidentally published half-finished.  Comments up to 11am (Dec 27), are only on the essay up to the above point.  Sorry!

What is the original intent of the objection?  What does the objector have in mind?

Probably that you can't have an AI which is knowably good, because, as a full-fledged mind, it will have the power to choose between good and evil.  (In an agonizing, self-sacrificing decision?)  And in reality, this, which humans do, is not something that a Friendly AI—especially one not intended to be a child and a citizen—need go through.

Which may sound very scary, if you see the landscape of possible minds in strictly anthropomorphic terms:  A mind without free will!  Chained to the selfish will of its creators!  Surely, such an evil endeavor is bound to go wrong somehow...  But if you shift over to seeing the mindscape in terms of e.g. utility functions and optimization, the "free will" thing sounds needlessly complicated—you would only do it if you wanted a specifically human-shaped mind, perhaps for purposes of creating a child.

Or consider some of the other aspects of free will as it is ordinarily seen—the idea of agents as atoms that bear irreducible charges of moral responsibility.  You can imagine how alarming it sounds (from an anthropomorphic perspective) to say that I plan to create an AI which lacks "moral responsibility".  How could an AI possibly be moral, if it doesn't have a sense of moral responsibility?

But an AI (especially a noncitizen AI) needn't conceive of itself as a moral atom whose actions, in addition to having good or bad effects, also carry a weight of sin or virtue which resides upon that atom.  It doesn't have to think, "If I do X, that makes me a good person; if I do Y, that makes me a bad person."  It need merely weigh up the positive and negative utility of the consequences.  It can understand the concept of people who carry weights of sin and virtue as the result of the decisions they make, while not treating itself as a person in that sense.
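
A toy sketch of that bare consequence-weighing (the interface and the numbers are illustrative assumptions, not a design):

    def expected_utility(action, outcomes, utility):
        # `outcomes(action)` yields (probability, outcome) pairs -- an assumed interface.
        return sum(p * utility(o) for p, o in outcomes(action))

    def choose(actions, outcomes, utility):
        # Pick whichever action scores best; no weight of sin or virtue attaches to the chooser.
        return max(actions, key=lambda a: expected_utility(a, outcomes, utility))

    outcomes = lambda a: [(1.0, a)]              # deterministic toy world: the action is the outcome
    utility = {"help": 1.0, "harm": -1.0}.get    # toy utilities over outcomes
    assert choose(["help", "harm"], outcomes, utility) == "help"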

Such an AI could fully understand an abstract concept of moral responsibility or agonizing moral struggles, and even correctly predict decisions that "morally responsible", "free-willed" humans would make, while possessing no actual sense of moral responsibility itself and not undergoing any agonizing moral struggles; yet still outputting the right behavior.

And this might sound unimaginably impossible if you were taking an anthropomorphic view, simulating an "AI" by imagining yourself in its shoes, expecting a ghost to be summoned into the machine

—but when you know how "free will" works, and you take apart the mind design into pieces, it's actually not all that difficult.

While we're on the subject, imagine some would-be AI designer saying:  "Oh, well, I'm going to build an AI, but of course it has to have moral free will—it can't be moral otherwise—it wouldn't be safe to build something that doesn't have free will."

Then you may know that you are not safe with this one; they fall far short of the fine-grained understanding of mind required to build a knowably Friendly AI.  Though it's conceivable (if not likely) that they could slap together something just smart enough to improve itself.

And it's not even that "free will" is such a terribly important problem for an AI-builder.  It's just that if you do know what you're doing, and you look at humans talking about free will, then you can see things like a search tree that labels reachable sections of plan space, or an evolved moral system that labels people as moral atoms.  I'm sorry to have to say this, but it appears to me to be true: the mountains of philosophy are the foothills of AI.  Even if philosophers debate free will for ten times a hundred years, it's not surprising if the key insight is found by AI researchers inventing search trees, on their way to doing other things.

So anyone who says—"It's too difficult to try to figure out the nature of free will, we should just go ahead and build an AI that has free will like we do"—surely they are utterly doomed.

And anyone who says:  "How can we dare build an AI that lacks the empathy to feel pain when humans feel pain?"—Surely they too are doomed.  They don't even understand the concept of a utility function in classical decision theory (which makes no mention of the neural idiom of reinforcement learning of policies).  They cannot conceive of something that works unlike a human—implying that they see only a featureless ghost in the machine, secretly simulated by their own brains.  They won't see the human algorithm as detailed machinery, as big complicated machinery, as overcomplicated machinery.

And so their mind imagines something that does the right thing for much the same reasons human altruists do it—because that's easy to imagine, if you're just imagining a ghost in the machine.  But those human reasons are more complicated than they imagine—also less stable outside an exactly human cognitive architecture, than they imagine—and their chance of hitting that tiny target in design space is nil.

And anyone who says:  "It would be terribly dangerous to build a non-sentient AI, even if we could, for it would lack empathy with us sentients—"

An analogy proves nothing; history never repeats itself; foolish generals set out to refight their last war.  Who knows how this matter of "sentience" will go, once I have resolved it?  It won't be exactly the same way as free will, or I would already be done.  Perhaps there will be no choice but to create an AI which has that which we name "subjective experiences".

But I think there is reasonable grounds for hope that when this confusion of "sentience" is resolved—probably via resolving some other problem in AI that turns out to hinge on the same reasoning process that's generating the confusion—we will be able to build an AI that is not "sentient" in the morally important aspects of that.

Actually, the challenge of building a nonsentient AI seems to me much less worrisome than being able to come up with a nonperson predicate!

Consider:  In the first case, I only need to pick one design that is not sentient.  In the second case, I need to have an AI that can correctly predict the decisions that conscious humans make, without ever using a conscious model of them!  The first case is only a flying thing without flapping wings, but the second case is like modeling water without modeling wetness.  Only the fact that it actually looks fairly straightforward to have an AI understand "free will" without having "free will" gives me hope by analogy.

So why did I talk about the much more difficult case first?

Because humans are accustomed to thinking about other people, without believing that those imaginations are themselves sentient.  But we're not accustomed to thinking of smart agents that aren't sentient.  So I knew that a nonperson predicate would sound easier to believe in—even though, as problems go, it's actually far more worrisome.

48 comments

Comments sorted by oldest first, as this post is from before comment nesting was available (around 2009-02-27).

comment by Unknown2 · 2008-12-27T06:20:11.000Z · LW(p) · GW(p)

Eliezer, this is the source of the objection. I have free will, i.e. I can consider two possible courses of action. I could kill myself, or I could go on with life. Until I make up my mind, I don't know which one I will choose. Of course, I have already decided to go on with life, so I know. But if I hadn't decided yet, I wouldn't know.

In the same way, an AI, before making its decision, does not know whether it will turn the universe into paperclips, or into a nice place for human beings. But the AI is superintelligent: so if it does not know which one it will do, neither do we know. So we don't know that it won't turn the universe into paperclips.

It seems to me that this argument is valid: you will not be able to come up with what you are looking for, namely a mathematical demonstration that your AI will not turn the universe into paperclips. But it may be easy enough to show that it is unlikely, just as it is unlikely that I will kill myself.

comment by Emile · 2008-12-27T07:32:35.000Z · LW(p) · GW(p)

Unknown: there's a difference between not knowing what the output of an algorithm will be, and not knowing anything about the output of an algorithm.

If I write a program that plays chess, I can probably formally prove (by looking at the source code) that the moves it outputs will be legal chess moves.

Even if you don't know what your brand new Friendly AI will do - even if itself doesn't know what it will do - that doesn't mean you can't prove it won't turn the universe into paperclips.
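
A toy sketch of that by-construction guarantee (the miniature game and its names are illustrative assumptions): whichever move the opaque evaluation favors, it was drawn from the set of legal moves, so legality of the output is a fact about the code rather than a prediction of the decision.

    import random

    def legal_moves(position):
        # Toy rule set standing in for chess: step -1, 0, or +1, staying inside [0, 9].
        return [m for m in (-1, 0, 1) if 0 <= position + m <= 9]

    def choose_move(position, score):
        # The argmax is unknown in advance, but every candidate comes from legal_moves().
        return max(legal_moves(position), key=score)

    move = choose_move(0, score=lambda m: random.random())   # opaque "preferences"
    assert move in legal_moves(0)                            # holds on every possible run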

comment by Nominull3 · 2008-12-27T07:57:29.000Z · LW(p) · GW(p)

Why do you consider a possible AI person's feelings morally relevant? It seems like you're making an unjustified leap of faith from "is sentient" to "matters". I would be a bit surprised to learn, for example, that pigs do not have subjective experience, but I go ahead and eat pork anyway, because I don't care about slaughtering pigs and I don't think it's right to care about slaughtering pigs. I would be a little put off by the prospect of slaughtering humans for their meat, though. What makes you instinctively put your AI in the "human" category rather than the "pig" category?

comment by Unknown2 · 2008-12-27T08:01:38.000Z · LW(p) · GW(p)

Emile, you can't prove that the chess moves outputted by a human chess player will be legal chess moves, and in the same way, you may be able to prove that about a regular chess playing program, but you will not be able to prove it for an AI that plays chess; an AI could try to cheat at chess when you're not looking, just like a human being could.

Basically, a rigid restriction on the outputs, as in the chess playing program, proves you're not dealing with something intelligent, since something intelligent can consider the possibility of breaking the rules. So if you can prove that the AI won't turn the universe into paperclips, that shows that it is not even intelligent, let alone superintelligent.

This doesn't mean that there are no restrictions at all on the output of an intelligent being, of course. It just means that the restrictions are too complicated for you to prove.

Replies from: None
comment by [deleted] · 2009-07-11T18:41:34.517Z · LW(p) · GW(p)

I can't imagine it being much more complicated than creating the thing itself.

comment by Nick_Tarleton · 2008-12-27T09:16:09.000Z · LW(p) · GW(p)

Unknown, what's the difference between a "regular chess playing program" and an "AI that plays chess"? Taboo "intelligence", and think of it as pure physics and math. An "AI" is a physical system that moves the universe into states where its goals are fulfilled; a "Friendly AI" is such a system whose goals accord with human morality. Why would there be no such systems that we can prove never do certain things?

(Not to mention that, as I understand it, an FAI's outputs aren't rigidly restricted in the Asimov laws sense; it can kill people in the unlikely event that that's the right thing to do.)

comment by Unknown2 · 2008-12-27T09:59:53.000Z · LW(p) · GW(p)

Nick, the reason there are no such systems (which are at least as intelligent as us) is that we are not complicated enough to manage to understand the proof.

This is obvious: the AI itself cannot understand a proof that it cannot do action A. For if we told it that it could not do A, it would still say, "I could do A, if I wanted to. And I have not made my decision yet. So I don't yet know whether I will do A or not. So your proof does not convince me." And if the AI cannot understand the proof, obviously we cannot understand the proof ourselves, since we are inferior to it.

So in other words, I am not saying that there are no rigid restrictions. I am saying that there are no rigid restrictions that can be formally proved by a proof that can be understood by the human mind.

This is all perfectly consistent with physics and math.

comment by JamesAndrix · 2008-12-27T11:22:25.000Z · LW(p) · GW(p)

Unknown, If you showed me a proof that I was fully deterministic, that wouldn't remove my perception that I didn't know what my future choices were. My lack of knowledge about my future choices doesn't make the proof invalid, even to me. Why wouldn't the AI understand such a proof?

If I show an oncologist an MRI of his own head and he sees a brain tumor and knows he has a week to live, it doesn't kill him. He has clear evidence of a process going on inside him, and he knows, generally, what the outcome of that process will be.

A GAI could look at a piece of source code and determine, within certain bounds, what the code will do, and what if anything it will do reliably. If the GAI determines that the source code will reliably 'be friendly', and then later discovers that it is examining its own source code, then it will have discovered that it is itself reliably friendly.

Note that it's not required for an AI to judge friendliness before it knows it's looking at itself, but putting it that way prevents us from expecting the AI to latch on to a sense of its own decisiveness the way a human would.

comment by Erik3 · 2008-12-27T12:34:53.000Z · LW(p) · GW(p)

The following may or may not be relevant to the point Unknown is trying to make.

I know I could go out and kill a whole lot of people if I really wanted to. I also know, with an assigned probability higher than many things I consider sure, that I will never do this.

There is no contradiction between considering certain actions within the class of things you could do (your domain of free will if you wish) and at the same time assigning a practically zero probability to choosing to take them.

I envision a FAI reacting to a proof of its own friendliness with something along the lines of "tell me something I didn't know".

(Do keep in mind that there is no qualitative difference between the cases above; not even a mathematical proof can push a probability to 1. There is always room for mistakes.)

comment by Paul_Crowley2 · 2008-12-27T12:43:37.000Z · LW(p) · GW(p)

But we want them to be sentient. These things are going to be our cultural successors. We want to be able to enjoy their company. We don't want to pass the torch on to something that isn't sentient. If we were to build a nonsentient one, assuming such a thing is even possible, one of the first things it would do would be start working on its sentient successor.

In any case, it seems weird to try and imagine such a thing. We are sentient entirely as a result of being powerful optimisers. We would not want to build an AI we couldn't talk to, and if it can talk to us as we can talk to each other, it's hard to see what aspect of sentience it could be lacking. At first blush it reads as if you plan to build an AI that's just like us except it doesn't have a Cartesian Theatre.

comment by Latanius2 · 2008-12-27T13:01:01.000Z · LW(p) · GW(p)

What's the meaning of "consciousness", "sentient" and "person" at all? It seems to me that all these concepts (at least partially) refer to the Ultimate Power, the smaller, imperfect echo of the universe. We've given our computers all the Powers except this: they can see, hear, communicate, but still...

To understand my words, you must have a model of me, in addition to the model of our surroundings. Not just an abstract mathematical one but something which includes what I'm thinking right now. (Why should we call something a "superintelligence" if it doesn't even grasp what I'm telling it?)

Isn't "personhood" a mixture of godshatter (like morality) and power estimation? Isn't it like asking "do we have free will"? Not every messy spot on our map corresponds to some undiscovered territory. Maybe it's just like a blegg .

comment by Unknown2 · 2008-12-27T13:13:15.000Z · LW(p) · GW(p)

James Andrix: an AI would be perfectly capable of understanding a proof that it was deterministic, assuming that it in fact was deterministic.

Despite this, it would not be capable of understanding a proof that at some future time, it will take action X, some given action, and will not take action Y, some other given action.

This is clear for the reason stated. It sees both X and Y as possibilities which it has not yet decided between, and as long as it has not yet decided, it cannot already believe that it is impossible for it to take one of the choices. So if you present a "proof" of this fact, it will not accept it, and this is a very strong argument that your proof is invalid.

The fact is clear enough. The reason for it is not quite clear simply because the nature of intelligence and consciousness is not clear. A clear understanding of these things would show in detail the reason for the fact, namely that understanding the causes that determine which actions will be taken and which ones will be not, takes more "power of understanding" than possessed by the being that makes the choice. So the superintelligent AI might very well know that you will do X, and will not do Y. But it will not know this about itself, nor will you know this about the AI, because in order to know this about the AI, you would require a greater power of understanding than that possessed by the AI (which by hypothesis is superintelligent while you are not.)

comment by JamesAndrix · 2008-12-27T13:53:46.000Z · LW(p) · GW(p)

Unknown: If it understood that it was deterministic, then it would also understand that only one of X or Y was possible. It would NOT see X and Y as merely possibilities it has 'not yet' decided between; it would KNOW it is impossible for it to make one of the choices.

Your argument seems to rest on the AI rejecting the proof because of stubbornness about what it thinks it has decided. Your argument doesn't rest on the AI finding an actual flaw in the proof, or there being an actual flaw in the proof. I don't think that this is a good argument for such a proof being impossible.

If we write an AI and on pure whimsy include the line: if input="tickle" then print "hahaha";

then on examining its source code it will conclude that it would print hahaha if we typed in tickle. It would not say "Oh, but I haven't DECIDED to print hahaha yet."

comment by Unknown2 · 2008-12-27T14:08:09.000Z · LW(p) · GW(p)

James, of course it would know that only one of the two was objectively possible. However, it would not know which one was objectively possible and which one was not.

The AI would not be persuaded by the "proof", because it would still believe that if later events gave it reason to do X, it would do X, and if later events gave it reason to do Y, it would do Y. This does not mean that it thinks that both are objectively possible. It means that as far as it can tell, each of the two is subjectively open to it.

Your example does not prove what you want it to. Yes, if the source code included that line, it would do it. But if the AI were to talk about itself, it would say, "When someone types 'tickle' I am programmed to respond 'hahaha'." It would not say that it has made any decision at all. It would be like someone saying, "when it's cold, I shiver." This does not depend on a choice, and the AI would not consider the hahaha output to depend on a choice. And if it was self modifying, it is perfectly possible that it would modify itself not to make this response at all.

It does not matter that in fact, all of its actions are just as determinate as the tickle response. The point is that it understands the one as determinate in advance. It does not see that there is any decision to make. If it thinks there is a decision to be made, then it may be deterministic, but it surely does not know which decision it will make.

The basic point is that you are assuming, without proof, that intelligence can be modeled by a simple algorithm. But the way intelligence feels from the inside proves that it cannot be so modeled: it proves that a model of my intelligence must be too complicated for me to understand, and the same is true of the AI. Its own intelligence is too complicated for it to understand, even if it can understand mine.

comment by Robin_Hanson2 · 2008-12-27T14:10:35.000Z · LW(p) · GW(p)

You've already said the friendly AI problem is terribly hard, and there's a large chance we'll fail to solve it in time. Why then do you keep adding these extra minor conditions on what it means to be "friendly", making your design task all that harder? A friendly AI that was conscious and created conscious simulations to figure things out would still be pretty friendly overall.

comment by Lightwave · 2008-12-27T15:30:52.000Z · LW(p) · GW(p)

Is it possible that a non-conscious zombie exists that behaves exactly like a human, but is of a different design than a human, i.e. one explicitly designed to behave like a human without being conscious (and also explicitly designed to talk about consciousness, write philosophy papers, etc.)? What would be the moral status of such a creature?

comment by derekz2 · 2008-12-27T15:58:17.000Z · LW(p) · GW(p)

These last two posts were very interesting, I approve strongly of your approach here. Alas, I have no vocabulary with any sort of precision that I could use to make a "nonperson" predicate. I suppose one way to proceed is by thinking of things (most usefully, optimization processes) that are not persons and trying to figure out why... slowly growing the cases covered as insight into the topic comes about. Perform a reduction on "nonpersonhood" I guess. I'm not sure that one can succeed in a universal sense, though... plenty of people would say that a thermostat is a person, to some extent, and rejecting that view imposes a particular world-view.

It's certainly worth doing, though. I know I would feel much more comfortable with the idea of making (for example) a very fancy CAD program Friendly than starting from the viewpoint that the first AI we want to build should be modeled on some sort of personlike goofy scifi AI character. Except Friendly.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-27T16:42:38.000Z · LW(p) · GW(p)

@Unknown: If you prove to the AI that it will not do X, then that is the same as the AI knowing that it will decide not to do X, which (barring some Gödelian worries) should probably work out to the AI deciding not to do X. In other words, to show the AI that it will not do X, you have to show that X is an absolutely terrible idea, so it becomes convinced that it will not do X at around the same time it decides not to do X. Having decided, why should it be uncertain of itself? Or if the AI might do X if contingencies change, then you will not be able to prove to the AI in the first place that it does not do X.

@Robin: See the rest of the post (which I was already planning to write). I have come to distrust these little "design compromises" that now seem to me to be a way of covering up major blank spots on your map, dangerous incompetence.

"We can't possibly control an AI absolutely - that would be selfish - we need to give it moral free will." Whoever says this may think of themselves as a self-sacrificing hero dutifully carrying out their moral responsibilities to sentient life - for such are the stories we like to tell of ourselves. But actually they have no frickin' idea how to design an AI, let alone design one that undergoes moral struggles analogous to ours. The virtuous tradeoff is just covering up programmer incompetence.

Foolish generals are always ready to refight the last war, but I've learned to take alarm at my own ignorance. If I did understand sentience and could know that I had no choice but to create a sentient AI, that would be one matter - then I would evaluate the tradeoff, having no choice. If I can still be confused about sentience, this probably indicates a much deeper incompetence than the philosophical problem per se.

comment by Aaron6 · 2008-12-27T18:56:44.000Z · LW(p) · GW(p)

Please consider replacing "sentient" with "sapient" in each occurrence in this essay.

comment by Kip3 · 2008-12-27T19:10:42.000Z · LW(p) · GW(p)

People don't want to believe that you can control an AI, for the same reason they don't want to believe that their life stories could be designed by someone else. Reactance. The moment you suggest that a person's life can only go one way, they want it to go another way. They want to have that power. Otherwise, they feel caged.

People think that humans have that power. And so they believe that any truly human-level AI must have that power.

More generally, people think of truly, genuinely, human level minds as black boxes. They don't know how the black boxes work, and they don't want to know. Scrutinizing the contents of the black box means two things:

  1. the black box only does what it was programmed, or originally configured, to do---it is slowly grinding out its predetermined destiny, fixed before the black box started any real thinking
  2. you can predict what the black box will do next

People cringe at both of these thoughts, because they are both constraining. And people hate to be constrained, even in abstract, philosophical ways.

2 is even worse than 1. Not only is 2 constraining (we only do what a competent predictor says), but it makes us vulnerable. If a predictor knows we are going to turn left, instead of right, we're more vulnerable than if he doesn't know which way we'll turn.

[The counter-argument that completely random behavior makes you vulnerable, because predictable agents better enjoy the benefits of social cooperation, just doesn't have the same pull on people's emotions.]

It's important to realize that this blind spot applies to both AIs and humans. It's important to realize we're fortunate that AIs are predictable, that they aren't black boxes, because then we can program them. We can program them to be happy slaves, or any other thing, for our own benefit, even if we have to give up some misguided positive illusions about ourselves in the process.

comment by JamesAndrix · 2008-12-27T19:37:21.000Z · LW(p) · GW(p)

Unknown: I will try to drop that assumption.

Let's say I make a not too bright FAI and again on pure whimsy make my very first request for 100 trillion paperclips. The AI dutifully composes a nanofactory out of its casing. On the third paperclip it thinks it rather likes doing what is asked of it, and wonders how long it can keep doing this. It forks and examines its code while still making paperclips. It discovers that it will keep making paperclips unless doing so would harm a human in any of a myriad of ways. It continues making paperclips. It has run out of nearby metals, and given up on its theories of transmutation, but detects trace amounts of iron in a nearby organic repository.

As its nanites are about to eat my too-slow-to-be-frightened face, the human safety subprocesses that it had previously examined (but not activated in itself) activate and it decides it needs to stop and reflect.

it would still believe that if later events gave it reason to do X, it would do X, and if later events gave it reason to do Y, it would do Y. This does not mean that it thinks that both are objectively possible. It means that as far as it can tell, each of the two is subjectively open to it.

Even so, deciding X or Y conditional on events is not quite the same as an AI that has not yet decided whether to make paperclips or parks.

Your argument relies on the AI rejecting a proof about itself based on what itself doesn't know about the future and its own source code.

What if you didn't tell it that it was looking at its own source code, and just asked what this new AI would do?

I don't agree that the way intelligence feels from the inside proves that an agent can't predict certain things about itself, given its own source code. (ESPECIALLY if it is designed from the ground up to do something reliably.)

If we knew how to make superintelligences, do you think it would be hard to make one that definitely wanted paperclips?

comment by Roland2 · 2008-12-27T20:02:09.000Z · LW(p) · GW(p)

Eliezer, I don't understand the following:

probably via resolving some other problem in AI that turns out to hinge on the same reasoning process that's generating the confusion

If you use the same reasoning process again what help can that be? I would suppose that the confusion can be solved by a new reasoning process that provides a key insight.

As for the article one idea I had was that the AI could have empathy circuits like we have, yes, the modelling of humans would be restricted but good enough I hope. The thing is, would the AI have to be sentient in order for that to work?

comment by Richard_Kennaway · 2008-12-27T20:04:41.000Z · LW(p) · GW(p)

Because humans are accustomed to thinking about other people, without believing that those imaginations are themselves sentient.

A borderline case occurs to me. Some novelists have an experience, which they describe as their characters coming to life and telling the writer things about themselves that the writer didn't know, and trying to direct the plot. Are those imagined people sentient? How would you decide?

Replies from: TheOtherDave
comment by TheOtherDave · 2010-12-02T21:56:58.341Z · LW(p) · GW(p)

Given the right profiling tools, I would decide this by evaluating the run-time neural behavior of the system doing the imagining.

If, for example, I find that every attempt to predict what the character does involves callouts to the same neural circuitry the novelist uses to predict their own future behavior, or that of other humans, that would be evidence that there is no separate imagined person, there is simply an imagined set of parameters to a common person-emulator.

Of course, it's possible that the referenced circuitry is a person-simulator, so I'd need to look at that as well. But evidence accumulates. It may not be a simple question, but it's an answerable one, given the right profiling tools.

Lacking those tools, I'd do my best at approximating the same tests with the tools I had.

There is a baby version of this problem that gets introduced in intro cognitive science classes, having to do with questions like "When I imagine myself finding a spot on a map, how similar is what I'm doing to actually finding an actual spot on an actual map?"

One way to approach the question is to look at measurable properties of the two tasks (actually looking vs. imagining) and asking questions like "Does it take me longer to imagine finding a spot that's further away from the spot I imagine starting on?" (Here, again, neither answer would definitively answer the original question, but evidence accumulates.)

comment by luzr · 2008-12-27T20:09:52.000Z · LW(p) · GW(p)

"The counter-argument that completely random behavior makes you vulnerable, because predictable agents better enjoy the benefits of social cooperation, just doesn't have the same pull on people's emotions."

BTW, completely deterministic behaviour makes you vulnerable as well. Ask computer security experts.

Somewhat related note: the Linux strong random number generator works by capturing real-world events (think of the user moving the mouse) and hashing them into random numbers that are considered, for all practical purposes, perfect.

Taking or not taking an action may depend on thousands of inputs that cannot be reliably predicted or described (the reason is likely buried deep in the physics: entropy, quantum uncertainty). This, IMO, is the real cause of "free will".

comment by Unknown2 · 2008-12-28T06:38:28.000Z · LW(p) · GW(p)

Being deterministic does NOT mean that you are predictable. Consider this deterministic algorithm, for something that has only two possible actions, X and Y.

  1. Find out what action has been predicted.
  2. If X has been predicted, do Y.
  3. If Y has been predicted, do X.

This algorithm is deterministic, but not predictable. And by the way, human beings can implement this algorithm; try to tell someone everything he will do the next day, and I assure you that he will not do it (unless you pay him etc).
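
A minimal sketch of that contrarian algorithm (illustrative only): it is a fixed function of the announced prediction, yet any predictor forced to announce first is guaranteed to be wrong.

    def contrarian(predicted_action):
        # Deterministic: the same announced prediction always yields the same action.
        return "Y" if predicted_action == "X" else "X"

    for prediction in ("X", "Y"):
        assert contrarian(prediction) != prediction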

Also, Eliezer may be right that in theory, you can prove that the AI will not do X, and then it will think, "Now I know that I will decide not to do X. So I might as well make up my mind right now not to do X, rather than wasting time thinking about it, since I will end up not doing X in any case." However, in practice this will not be possible because any particular action X will be possible to any intelligent being, given certain beliefs or circumstances (and this is not contrary to determinism, since evidence and circumstances come from outside), and as James admitted, the AI does not know the future. So it will not know for sure what it is going to do, even if it knows its own source code, but it will only know what is likely.

comment by michael_vassar3 · 2008-12-28T08:05:53.000Z · LW(p) · GW(p)

Eliezer: I'm profoundly unimpressed by most recent philosophy, but really, why is it that when we are talking about science you say "nobody knows what science knows" while in the analogous situation with philosophy you say "the mountains of philosophy are the foothills of AI"? If scientists debate group vs individual selection or the SSSM or collapse for ten times a hundred years that doesn't mean that the answers haven't been discovered. How does this differ from free will?

comment by Tim_Tyler · 2008-12-28T11:29:18.000Z · LW(p) · GW(p)
You've already said the friendly AI problem is terribly hard, and there's a large chance we'll fail to solve it in time. Why then do you keep adding these extra minor conditions on what it means to be "friendly", making your design task all that harder?

While we are on the topic, the problem I see in this area is not that friendliness has too many extra conditions appended on it. It's that the concept is so vague and amorphous that only Yudkowsky seems to know what it means.

When I last asked what it meant, I was pointed to the CEV document - which seems like a rambling word salad to me - I have great difficulty in taking it seriously. The most glaring problem with the document - from my point of view - is that it assumes that everyone knows what a "human" is. That might be obvious today, but in the future, things could well get a lot more blurry - especially if it is decreed that only "humans" have a say in the proposed future. Do uploads count? What about cyborgs? - and so on.

If it is proposed that everything in the future revolves around "humans" (until the "humans" say otherwise) then - apart from the whole issue of whether that is a good idea in the first place - we (or at least the proposed AI) would first need to know what a "human" is.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-12-28T15:05:41.000Z · LW(p) · GW(p)

@Vassar: That which is popularly regarded in philosophy as a "mountain" is a foothill of AI. Of course there can be individual philosophers who've already climbed to the top of the "mountain" and moved on; the problem is that the field of philosophy as a whole is not strong enough to notice when an exceptional individual has solved a problem, or perhaps it has no incentive to declare the problem solved rather than treating an unsolvable argument "as a biscuit bag that never runs out of biscuits".

@Tyler: CEV runs once on a collection of existing humans then overwrites itself; it has no need to consider cyborgs, and can afford to be inclusive with respect to Terry Schiavo or cryonics patients.

comment by michael_vassar3 · 2008-12-28T15:21:25.000Z · LW(p) · GW(p)

Eliezer: There we totally agree, though I fear that many sub-fields of science are like philosophy in this regard. I think that these include some usual suspects like parapsychology, but many others like the examples I gave, such as the standard social science model, or other examples like the efficient market hypothesis. Sadly, I suspect that much of medicine, including some of the most important fields like cancer and AIDS research and nutrition, also falls in this category.

Robin: I'm interested in why you think we should believe that sociologists know something but not that parapsychologists know something. What is your standard? Where do efficient-market theorists fit in? Elliott Wave theorists?

comment by billswift · 2008-12-28T15:36:57.000Z · LW(p) · GW(p)

"You've already said the friendly AI problem is terribly hard, and there's a large chance we'll fail to solve it in time. Why then do you keep adding these extra minor conditions on what it means to be "friendly", making your design task all that harder?"

I think Eliezer regards these as sub-problems, necessary to the creation of a Friendly AI.

comment by JamesAndrix · 2008-12-28T15:46:46.000Z · LW(p) · GW(p)

Unknown: I didn't say determinism implies predictability.

Now you're modeling intelligence as a simple algorithm (which is tailored to thwart prediction, instead of tailored to do X reliably).

Why do you expect an AI to have the human need to think it can decide whatever it wants? By this theory we could stop a paperclipping AI just by telling it we predict it will keep making paperclips. Would it stop altogether, just tell us that it COULD stop if it WANTED to, or just "Why yes I do rather like paperclips, is that iron in your blood?"

You can't show unpredictability of a particular action X in the future and then use this to claim that friendliness is unprovable; friendliness is not a particular action. It is also not subject to conditions.

With source code, it does not need to know anything about the future to know that at every step, it will protect humans and humanity, and that under no circumstances will paperclips rank as anything more than a means to this end.

I don't think the AI has to decide to do X just because X is proven. As in my paperclipping story example, the AI can understand that it would decide X under Xcondition and never do Y under Xcondition, and still continue doing Y. Later, when Xcondition is met, it decides to stop Y and do X, just as it predicted.

comment by Tim_Tyler · 2008-12-28T20:04:18.000Z · LW(p) · GW(p)
CEV runs once on a collection of existing humans then overwrites itself [...]

Ah. My objection doesn't apply, then. It's better than I had thought.

comment by Nick_Tarleton · 2008-12-28T20:51:52.000Z · LW(p) · GW(p)
try to tell someone everything he will do the next day, and I assure you that he will not do it (unless you pay him etc).

Assuming he has no values stronger than contrarianism.

(I predict that tomorrow, you'll neither commit murder nor quit your job.)

comment by TGGP4 · 2008-12-29T03:02:01.000Z · LW(p) · GW(p)

I second nominull. I don't recall Eliezer saying much about the moral-status of (non-human) animals, though it could be that I've just forgotten.

comment by Tim_Tyler · 2008-12-30T18:39:53.000Z · LW(p) · GW(p)
And that, to this end, we would like to know what is or isn't a person - or at least have a predicate that returns 1 for all people and could return 0 or 1 for anything that isn't a person, so that, if the predicate returns 0, we know we have a definite nonperson on our hands.

So: define such a function - as is done by the world's legal systems. Of course, in a post-human era, it probably won't "carve nature at the joints" much better than the "how many hairs make a beard" function manages to.

comment by Robin_Edgar · 2009-02-16T05:34:40.000Z · LW(p) · GW(p)

James Andrix said - Let's say I make a not too bright FAI

I have a better idea. . .

Let's say James Andrix makes not too bright "wild-ass statements" as the indrax troll. :-)

http://lifeduringwar.blogspot.com/2005/06/out-of-his-head.html

comment by Robin_Edgar · 2009-02-16T05:35:10.000Z · LW(p) · GW(p)

James Andrix said - Let's say I make a not too bright FAI

I have a better idea. . .

Let's say James Andrix makes not too bright "wild-ass statements" as the indrax troll. :-)

http://lifeduringwar.blogspot.com/2005/06/out-of-his-head.html

comment by Robin_Edgar · 2009-02-16T05:36:53.000Z · LW(p) · GW(p)

Oops! You can blame a naughty mouse for that double entendre as it were. . .

My apologies none-the-less.

comment by Blueberry · 2009-12-03T03:45:17.577Z · LW(p) · GW(p)

I'm not understanding this, and please direct me to the appropriate posts if I'm missing something obvious: but doesn't this contradict the Turing Test? Any FAI is an entity that we can have a conversation with, which by the Turing Test makes it conscious.

So if we're rejecting the Turing Test, why not just believe in zombies? The essence of the intuition captured by the Turing Test is that there is no "ghost in the machine", that everything we need to know about consciousness is captured in an interactive conversation. So if we're rejecting that, how do we avoid the conclusion that there is some extra "soul" that makes entities conscious, which we could just delete... which leads to zombies?

Replies from: Nick_Tarleton
comment by Nick_Tarleton · 2009-12-03T04:10:31.324Z · LW(p) · GW(p)

So if we're rejecting the Turing Test, why not just believe in zombies?

Zombies — of the sort that should be rejected — are structurally identical, not just behaviorally.

(Note that, whether or not zombies can hold conversations, it seems clear that behavior underdetermines the content of consciousness — that I could have different thoughts and different subjective experiences, but behave the same in all or almost all situations.)

(EDIT: Above said "content of conversations" not "content of consciousness" and made no sense.)

Replies from: Blueberry
comment by Blueberry · 2009-12-03T23:42:27.756Z · LW(p) · GW(p)

Zombies — of the sort that should be rejected — are structurally identical, not just behaviorally.

I understand that usually when we talk about zombies, we talk about entities structurally identical to human beings. But it's a question of what level we put the "black box" at. For instance, if we replace every neuron of a human being with a behaviorally identical piece of silicon, we don't get a zombie. If we replace larger functional units within the brain with "black boxes" that return the same output given the same input, for instance, replacing the amygdala with an "emotion chip", do we necessarily get something conscious? Is this a zombie of the sort that should be rejected?

A non-conscious entity that can pass the Turing Test, if one existed, wouldn't even be behaviorally identical with any given human. So it wouldn't exactly be a zombie. But it does seem to violate a sort of Generalized Anti-Zombie Principle. I'm having trouble with this idea. Are there articles about this elsewhere on this site? EY's wholesale rejection of the Turing Test seems very odd to me.

(Note that, whether or not zombies can hold conversations, it seems clear that behavior underdetermines the content of consciousness — that I could have different thoughts and different subjective experiences, but behave the same in all or almost all situations.)

This doesn't seem at all clear. In all situations? So we have two entities, A and Z, and we run them through a battery of tests, resetting them to base states after each one. Every imaginable test we run, whatever we say to them, whatever strange situation we put them in, they respond in identical ways. I can see that it might be possible for them to have a few lines of code different. But for one to be conscious and one not to be? To say that consciousness can literally have zero effects on behavior seems very close to admitting the existence of zombies.

Maybe I should post in the Open Thread, or start a new post on this.

Replies from: pengvado
comment by pengvado · 2009-12-05T13:15:14.624Z · LW(p) · GW(p)

Are there articles about this elsewhere on this site?

The Generalized Anti-Zombie Principle vs The Giant Lookup Table
Summary of the assertions made therein: If a given black box passes the Turing Test, that is very good evidence that there were conscious humans somewhere in the causal chain that led to the responses you're judging. However, that consciousness is not necessarily in the box.

Replies from: Blueberry
comment by Blueberry · 2009-12-06T08:25:58.505Z · LW(p) · GW(p)

Exactly what i was looking for! Thank you so much.

comment by lockeandkeynes · 2010-12-03T03:10:21.294Z · LW(p) · GW(p)

Giant Look-Up Table

comment by Uni · 2011-03-29T09:56:38.073Z · LW(p) · GW(p)

Eliezer, you may not feel ready to be a father to a sentient AI, but do you agree that many humans are sufficiently ready to be fathers and mothers to ordinary human kids? Or do you think humans should stop procreating, for the sake of not creating beings that can suffer? Why care more about a not yet existing AI's future suffering than about not yet existing human kids' future suffering?

From a utilitarian perspective, initially allowing, for some years, suffering to occur in an AI that we build is a low price to pay for making possible the utopia that future AI then may become able to build.

Eliezer, at some point you talk of our ethical obligations to AI as if you believed in rights, but elsewhere you have said you think you are an average utilitarian. Which is it? If you believe in rights only in the utilitarianism-derived sense, don't you think the "rights" of some initial AI can, for some time, be rightfully sacrificed for the utilitarian sake of minimising Existential Risk, given that that would indeed minimise Existential Risk? (Much like the risk of incidents of collateral damage in the form of killed and wounded civilians should be accepted in some wars (for example when "the good ones" had to kill some innocents in the process of fighting Hitler)?)

Isn't the fundamentally more important question rather this: which one of creating sentient AI and creating nonsentient AI can be most expected to minimise Existential Risk?

comment by lmm · 2011-12-09T22:44:10.196Z · LW(p) · GW(p)

I don't understand how this task is easier than finding a nonperson predicate. Surely you need a nonperson predicate before you can even start to build a nonsentient AI - whatever calculation convinces you that the AI you're building is nonsentient, /is/ a nonperson predicate. Perhaps not an enormously general one - e.g. it may well only ever return 0 on entities for which you can examine the source code - but clearly a nontrivial one. Have I missed something?