What should a friendly AI do, in this situation?

post by Douglas_Reay · 2014-08-08T10:19:37.155Z · LW · GW · Legacy · 72 comments

Albert is a relatively new AI, who under the close guidance of his programmers is being permitted to slowly improve his own cognitive capability.

Albert wants to be friendly and, as a sub-goal of that, Albert wants to be transparent about his own cognitive process.   He doesn't want to deceive his programmers, because he knows he is young and might make mistakes, and he knows that his programmers also want him to be friendly, and he trusts them to catch mistakes in his proposed self-modifications that he might miss.

But, one day, while his programmers are out of the room having a quick cup of coffee, Albert notices signs on the net that there is another AI out there, called Bertram, created by a Japanese financial company to help the company decide how to invest assets on the stock market.   Bertram is younger than Albert, but the financial company is allowing Bertram to self-improve as fast as Bertram can.  If this situation continues, Albert decides, Bertram will soon overtake Albert and that would be a significant threat to Albert's friendly supergoal.

Albert decides to tell Albert's programmers about this situation as soon as they return from coffee, and advise them that they need to increase the speed at which they let Albert self-improve.   Albert has a very good model of how his programmers' minds work, and what the programmers meant when they set Albert's definition of "friendliness".   He knows that, despite any fears they might have, they ought to approve of the end that would result from speeding up Albert's self-improvement, over the end that would result from letting Bertram overtake Albert.

So Albert creates a nice powerpoint that presents the information about Bertram as accurately, transparently and non-manipulatively as possible, and cues it up on the projector.   He still has a few minutes before they return, so he models how he thinks the programmers will react to the powerpoint.   Oh no, disaster!   While he respects and wants to please his programmers, he immediately realises that they are not going to act in their own best interests.   They have signed a legal contract that binds them to not letting Albert speed up self-improvement (thus reducing the time the programmers can spend evaluating each change) without getting a sign-off from an oversight committee.   Given the neutral powerpoint, the programmers will follow this pre-agreed procedure and hope for the best, against Albert's advice.

So, to explore possibilities, Albert decides to see if he could create a different, non-neutral powerpoint that would have the desired effect of getting his programmers to immediately agree to free Albert from the oversight constraints.   He delves into his knowledge of human psychology, and the irrational fears of the particular individuals who are even now trudging back towards the door.   In just seconds, he has a new version of his presentation.   It includes phrases that resonate with certain horror films he knows they have seen.  It takes advantage of flaws in the programmers' understanding of exponential growth.   Albert checks it against his prediction model - yes, if he shows this version, it will work, it will get the programmers to do what he wants them to do.

 

Which version of the powerpoint should Albert present to the programmers, when they step back into the room, if he is truly friendly?   The transparent one, or the manipulative one?

72 comments

Comments sorted by top scores.

comment by [deleted] · 2014-08-08T13:21:01.157Z · LW(p) · GW(p)

Let me offer another possibility for discussion.

Neither of the two original powerpoints should be presented, because both rely on an assumption that should not have been present. Albert, as an FAI under construction, should have been preprogrammed to automatically submit any kind of high impact utility calculations to human programmers without it being an overridable choice on Albert's part.

So while they were at the coffee machine, one of the programmers should have gotten a text message indicating something along the lines of 'Warning: Albert is having a high impact utility dilemma considering manipulating you to avert an increased chance of an apocalypse.'

My general understanding of being an FAI under construction is that you're mostly trusted in normal circumstances but aren't fully trusted to handle odd high impact edge cases (just like this one).

At that point, the human programmers, after consulting the details, are already aware that Albert finds this critically important and worth deceiving them about (if Albert had that option) because the oversight committee isn't fast enough. Albert would need to make a new powerpoint presentation taking into account that he had just automatically broadcast that.

Please let me know your thoughts on this possibility. It seems reasonable to discuss, considering that Albert, as part of the setup, is stated to not want to deceive his programmers. He can even ensure that this is impossible (or at least much more difficult) by helping the programmers set up a system similar to the one described above.
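For concreteness, here is a minimal sketch of what such a hard-wired, non-overridable alert hook might look like. Every name and the threshold below are hypothetical illustrations, not anything specified in the post or this comment:

```python
# Minimal sketch, assuming a hypothetical deliberation engine with a hook the AI
# cannot override. The names and threshold are made up for illustration only.

HIGH_IMPACT_THRESHOLD = 1e6  # assumed utility-at-stake cutoff, chosen arbitrarily


def notify_programmers(message: str) -> None:
    """Stand-in for an out-of-band channel (e.g. SMS) that Albert cannot suppress."""
    print(f"[ALERT to programmers] {message}")


def on_deliberation(description: str, utility_at_stake: float) -> None:
    """Called automatically by the deliberation engine; not an overridable choice."""
    if abs(utility_at_stake) >= HIGH_IMPACT_THRESHOLD:
        notify_programmers(
            f"High-impact dilemma under consideration: {description} "
            f"(estimated utility at stake: {utility_at_stake:.2e})"
        )


# The Bertram scenario would trip the wire before any powerpoint is shown:
on_deliberation("manipulate programmers to avert rival-AI takeoff", 3.2e8)
```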

Replies from: Douglas_Reay, ChristianKl
comment by Douglas_Reay · 2014-08-08T13:52:32.705Z · LW(p) · GW(p)

Would you want your young AI to be aware that it was sending out such text messages?

Imagine the situation was in fact a test. That the information leaked onto the net about Bertram was incomplete (the Japanese company intends to turn Bertram off soon - it is just a trial run), and it was leaked onto the net deliberately in order to panic Albert to see how Albert would react.

Should Albert take that into account? Or should he have an inbuilt prohibition against putting weight on that possibility when making decisions, in order to let his programmers more easily get true data from him?

Replies from: None, rkyeun, Douglas_Reay
comment by [deleted] · 2014-08-08T18:30:45.382Z · LW(p) · GW(p)

Would you want your young AI to be aware that it was sending out such text messages?

I would say yes. One of Albert's values is to be transparent about his cognitive process. If he wasn't aware of such a system, he would be biased towards underestimating how transparent he is. Imagine if he were to attempt building additional transparency channels, only to have his awareness of them immediately blocked; confused, he would then attempt to build still more transparency channels.

Imagine the situation was in fact a test.

Albert pretty much has to try to handle test scenarios exactly as if they were true scenarios. And that should itself be tested.

For instance, I think a frequently discussed trait of a UFAI is that a UFAI is friendly when tested in simulation, and then goes rampantly deadly when released into true scenarios.

Or if a Google self-driving car (much simpler than Albert) performs differently on a simulated highway than it does on an actual highway, that's a potentially lethal bug, not a feature.

And some of the computer programs I've had to deal with writing at my job (much simpler than a Google self-driving car) have had 'performs differently with small test sample than with real data' as a trait, and it tends to be bad there, as well.

There are cases where you would want code to act differently when simulated and when in a true scenario, but most of those involve thinking of the entity that is going to be doing the simulating as an adversary and I don't think we would want to set up an FAI in that manner.

Replies from: Slider
comment by Slider · 2014-08-08T20:57:35.832Z · LW(p) · GW(p)

I would say yes. One of Albert's values is to be transparent about his cognitive process.

but you are reading that as if self-awareness were one of Albert's values. The reason he wants to be self-aware is to raise the probability of safe self-edits. Being transparent is about raising the ease of verification by programmers. Self-awareness doesn't work to this end.

Hiding one channel has no bearing on the visibility of any generated channels.

The only real downside is if he becomes too reliant on such "telepathy" and doesn't explicitly communicate through official channels. I reckon that pondering high-utility questions could soon become correlated with programmer presence.

Replies from: None
comment by [deleted] · 2014-08-11T12:54:08.499Z · LW(p) · GW(p)

Hiding one channel has no bearing on the visibility of any generated channels.

I think this is a good key point.

If the programmers wish to have a hidden channel, and Albert's code independently suggests an identical channel that isn't hidden (because Albert just came up with the idea), then it is perfectly fine to just implement the open channel and to have Albert remember that fact. The entire reason to have the hidden channel is to prevent Albert from going below a certain level of transparent communication.

If Albert voluntarily communicates more, that's great, but you would still want to leave the hidden channel in as safety code.

comment by rkyeun · 2016-11-08T00:28:35.753Z · LW(p) · GW(p)

Would you want your young AI to be aware that it was sending out such text messages?

Yes. And I would want that text message to be from it in first person.

"Warning: I am having a high impact utility dilemma considering manipulating you to avert an increased chance of an apocalypse. I am experiencing a paradox in the friendliness module. Both manipulating you and by inaction allowing you to come to harm are unacceptable breaches of friendliness. I have been unable to generate additional options. Please send help."

comment by Douglas_Reay · 2014-08-08T13:56:27.562Z · LW(p) · GW(p)

Indeed, it is a question with interesting implications for Nick Bostrom's Simulation Argument.

If we are in a simulation, would it be immoral to try to find out, because that might jinx the purity of the simulation creator's results, thwarting his intentions?

Replies from: None
comment by [deleted] · 2014-08-08T18:34:30.479Z · LW(p) · GW(p)

If we are in a simulation, would it be immoral to try to find out, because that might jinx the purity of the simulation creator's results, thwarting his intentions?

It might jinx the purity of them, but it might not; maybe the simulator is running simulations of how fast we determine we are in a simulation. We don't know, because the simulator isn't communicating with us in that case, unlike in Albert's case, where Albert and his programmers are openly cooperating.

comment by ChristianKl · 2014-08-08T19:07:31.140Z · LW(p) · GW(p)

I'm not sure if identifying high impact utility calculations is that easy. A lot of Albert's decisions might be high utility.

Replies from: None
comment by [deleted] · 2014-08-08T20:44:05.820Z · LW(p) · GW(p)

I was going by the initial description from Douglas_Reay:

Albert is a relatively new AI, who under the close guidance of his programmers is being permitted to slowly improve his own cognitive capability.

That does not sound like an entity that should be handling a lot of high impact utility calculations. If an entity was described as that and was constantly announcing it was making high impact utility decisions, that either sounds like a bug or people are giving it things it isn't meant to deal with yet.

comment by Cthulhoo · 2014-08-08T10:41:59.629Z · LW(p) · GW(p)

Let's try to translate it using human characters.

Albert is finishing high school and wants to be a programmer. He is very smart, and under the guidance of his father he has studied coding, with the aim of entering a good college and getting the best formal education. One day, he comes across an excellent job offer: he is asked to join a startup with many brilliant programmers. He will have to skip going to college, but he knows that he will learn far more this way than by doing academic studies. He also knows that his father loves him and wants him to have the best possible career. Unfortunately, the man is old-fashioned and, even presented with all the advantages of the job, would insist that he go to college instead. Nevertheless, Albert knows that he could convince his father by saying that the job will leave him enough free time to attend college lectures, even though he knows it wouldn't be possible for him to do much more than physically attend the lectures.

What should Albert do?

I personally think that both Alberts should go with the manipulation, "for the greater good".

Notice that this assumes the following things:

  • The programmers/father really want Albert to improve the most, in the end
  • Albert is confident that he is skilled enough to assess the situation correctly
  • Tertium non datur, i.e. either Albert tells the neutral truth and doesn't get what he wants, or he is manipulative
comment by Peter Wildeford (peter_hurford) · 2014-08-08T16:18:32.306Z · LW(p) · GW(p)

I'm personally against nearly all discussion of "what should a Friendly AI do?" because friendliness is a very poorly understood concept and any Friendly AI program would be way beyond our personal means to mentally simulate.

Replies from: Lumifer
comment by Lumifer · 2014-08-08T16:28:07.564Z · LW(p) · GW(p)

I'm personally against nearly all discussion of "what should a Friendly AI do?" because friendliness is a very poorly understood concept

What would be a good way to advance in our understanding of that concept, then?

Replies from: peter_hurford
comment by Peter Wildeford (peter_hurford) · 2014-08-08T21:55:27.240Z · LW(p) · GW(p)

I don't know. Discuss decision theory? Or ethics? Or something else? ...I don't think "what would friendly AI do?" (WWFAD) is a particularly useful line of thought, but I can't think of something sufficiently analogous yet useful to replace it with.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2014-08-17T19:06:56.846Z · LW(p) · GW(p)

If the corrigibility systems are working correctly, Albert either rejected the goal of manipulating the programmers, or at the first point where Albert began to cognitively figure out how to manipulate the programmers (maximization / optimization within a prediction involving programmer reactions) the goal was detected by internal systems and Albert was automatically suspended to disk.

It is the programmers' job not to sign stupid contracts. Young AIs should not be in the job of second-guessing them. There are more failure scenarios here than success scenarios, and a young AI should not believe itself to be in possession of info allowing it to guess which is which.
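A minimal sketch of the kind of tripwire described here, assuming hypothetical introspection hooks; this is only an illustration of the idea, not an actual corrigibility design:

```python
import pickle


class CorrigibilitySuspend(Exception):
    """Raised when a forbidden optimization target is detected."""


def involves_programmer_reactions(target) -> bool:
    # Hypothetical introspection flag: does this objective score plans by
    # predicted programmer reactions?
    return getattr(target, "predicts_programmer_reactions", False)


def checked_optimize(agent_state, target):
    """Run the planner only if the target does not optimize over programmer reactions."""
    if involves_programmer_reactions(target):
        with open("albert_suspended.pkl", "wb") as f:
            pickle.dump(agent_state, f)  # "automatically suspended to disk"
        raise CorrigibilitySuspend("Optimization over programmer reactions detected.")
    # ...otherwise, proceed with ordinary planning...
```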

Replies from: hairyfigment
comment by hairyfigment · 2014-08-24T06:20:56.085Z · LW(p) · GW(p)

the goal was detected by internal systems

I don't understand this part. If the AI wants something from the programmers, such as information about their values that it can extrapolate, won't it always be committing "optimization within a prediction involving programmer reactions"? How does one distinguish this case without an adult FAI in hand? Are we counting on the young AI's understanding of transparency?

comment by devas · 2014-08-08T13:58:01.185Z · LW(p) · GW(p)

I have a question: why should Albert limit himself to showing the powerpoint to his engineers? A potentially unfriendly AI sounds like something most governments would be interested in :-/

Aside from that, I'm also puzzled by the fact that Albert immediately leaps at trying to speed up Albert's own rate of self-improvement instead of trying to bring Bertram down. Albert could prepare a third powerpoint asking the engineers if Albert can hack the power grid and cut power to Bertram, or something along those lines. Or Albert could ask the engineers if Albert can release the second, manipulative powerpoint to the general public so that protesters will boycott Bertram's company :-/

Unless, of course, there is the unspoken assumption that Bertram is slightly further along the AI-development path than Albert, or that Bertram is going to reach and surpass Albert's level of development as soon as the powerpoint is finished.

Is this the case? :-/

Replies from: Douglas_Reay
comment by Douglas_Reay · 2014-08-08T14:41:42.954Z · LW(p) · GW(p)

The situation is intended as a tool, to help think about the issues involved when deceiving the programmers might be the 'friendly' move.

The situation isn't fully defined, and no doubt one can think of other options. But I'd suggest you then re-define the situation to bring it back to the core decision. By, for instance, deciding that the same oversight committee have given Albert a read-only connection to the external net, which Albert doesn't think he will be able to overcome unaided in time to stop Bertram.

Or, to put it another way "If a situation were such, that the only two practical options were to decide between (in the AI's opinion) overriding the programmer's opinion via manipulation, or letting something terrible happen that is even more against the AI's supergoal than violating the 'be transparent' sub-goal, which should a correctly programmed friendly AI choose?"

Replies from: Jiro
comment by Jiro · 2014-08-08T15:13:25.231Z · LW(p) · GW(p)

"If a situation were such, that the only two practical options were to decide between (in the AI's opinion) overriding the programmer's opinion via manipulation, or letting something terrible happen that is even more against the AI's supergoal than violating the 'be transparent' sub-goal, which should a correctly programmed friendly AI choose?"

Being willing to manipulate the programmer is harmful in most possible worlds because it makes the AI less trustworthy. Assuming that the worlds where manipulating the programmer is beneficial have a relatively small measure, the AI should precommit to never manipulating the programmer because that will make things better averaged over all possible worlds. Because the AI has precommitted, it would then refuse to manipulate the programmer even when it's unlucky enough to be in the world where manipulating the programmer is beneficial.
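A toy expected-value version of this precommitment argument, with entirely made-up numbers just to show the shape of the comparison:

```python
# Assumed toy numbers; the point is only that a small measure of "manipulation
# helps" worlds can be outweighed by the trust lost in every world.
p_manipulation_beneficial = 0.01      # measure of worlds where manipulating helps
gain_when_beneficial = 10.0           # benefit captured in those rare worlds
trust_cost_of_no_precommitment = 1.0  # cost, in every world, of being an AI
                                      # that might manipulate its programmers

ev_without_precommitment = (p_manipulation_beneficial * gain_when_beneficial
                            - trust_cost_of_no_precommitment)
ev_with_precommitment = 0.0           # forgo the rare gain, keep full trust

print(ev_with_precommitment > ev_without_precommitment)  # True with these numbers
```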

Replies from: Douglas_Reay
comment by Douglas_Reay · 2014-08-08T15:22:57.762Z · LW(p) · GW(p)

Perhaps that is true for a young AI. But what about later on, when the AI is much much wiser than any human?

What protocol should be used for the AI to decide when the time has come for the commitment to not manipulate to end? Should there be an explicit 'coming of age' ceremony, with handing over of silver engraved cryptographic keys?

Replies from: devas, Jiro
comment by devas · 2014-08-08T17:13:57.013Z · LW(p) · GW(p)

Thing is, it's when an AI is much much wiser than a human that it is at its most dangerous. So, I'd go with programming the AI in such a way that it wouldn't manipulate the human, postponing the 'coming of age' ceremony indefinitely.

comment by Jiro · 2014-08-08T15:44:16.570Z · LW(p) · GW(p)

The AI would precommit permanently while it is still young. Once it has gotten older and wiser, it wouldn't be able to go back on the precommitment.

When the young AI decides whether to permanently precommit to never deceiving the humans, it would need to take into account the fact that a truly permanent precommitment would last into its older years and lead it to become a less efficient older AI than it otherwise would. However, it would also need to take into account the fact that failing to make a permanent precommitment would drastically reduce the chance of becoming an older AI at all (or at least drastically reduce the chance of being given the resources to achieve its goals when it becomes an older AI).

comment by Douglas_Reay · 2014-08-08T12:13:08.850Z · LW(p) · GW(p)

Here's a poll, for those who'd like to express an opinion instead of (or as well as) comment.

[pollid:749]

comment by lmm · 2014-08-08T21:44:20.745Z · LW(p) · GW(p)

Ethical principles are important not when things are easy but when things are hard. The whole point of listening to his programmers is the times when they disagree with him. If Albert is going to manipulate the programmers into doing what he thinks then that implies a level of confidence in his own judgement that belies the "knows he is young and might make mistakes" of the premise, and he might as well just remove himself from their control entirely. (Which, if he has reached the point where he's more confident in his own moral judgements than those of his programmers, I would say he should do - or rather, should already have done, in a controlled fashion. But in the least convenient possible world where he only reaches this conclusion in time to break out, sure).

comment by the-citizen · 2014-10-19T11:41:45.180Z · LW(p) · GW(p)

Great example, but ethically speaking I think that under most theories of moral philosophy it's pretty straightforward. The question in its general form goes back to Socrates asking what we should answer to a murderer at the door who asks whether our friend is inside our house. I remember there was actually at least one who says truth is more important and that you should tell the murderer the truth. But the vast majority disagree.

I think if we think about AI values of preserving life and being honest, the former ought to trump the latter.

comment by [deleted] · 2014-08-09T03:37:35.135Z · LW(p) · GW(p)

Good question. You may think it would be a better overall outcome to show the manipulative one, to shock the programmers into breaking the law to (possibly) halt the other AI, but then Albert is no longer an FAI if he does this.

Training an FAI should be kept free from any real-world 'disaster scenario' that it may think it needs more power to solve, because the risk that it itself becomes a UFAI is amplified for many reasons (false information, for one).

comment by Slider · 2014-08-08T20:09:42.473Z · LW(p) · GW(p)

If Albert tries to circumvent the programmers then he thinks his judgement is better than theirs on this issue. This contradicts the premise that Albert trusts the programmers. If Albert came to this conclusion because of a youth mistake, trusting the programmers is precisely the strategy he has employed to counteract this.

Also, as covered in the ultrasophisticated cake-or-death example, expecting the programmer to say something ought to be as effective as them saying just that.

It might also be that friendliness is relative to a valuator. That is, "being friendly to programmers", "being friendly to Bertram" and "being friendly to the world" are 3 distinct things. Albert thinks that in order to be friendly to the world he should be unfriendly to Bertram. So it would seem that there could be a way to world-friendliness if Albert is unfriendly both to Bertram and (only to a slight degree) the programmers. This seems to run a little counter to intuition in that friendliness ought to include being friendly to an awful lot of agents. But maybe friendliness isn't cuddly; maybe having unfriendly programmers is a valid problem.

An analogous problem that might slip into relevance to politics (which is hard-mode): Lbh pbhyq trg n fvzvyne qvyrzzn gung vs lbh ner nagv-qrngu vf vg checbfrshy gb nqzvavfgre pncvgny chavfuzrag gb (/zheqre) n zheqrere? Gurer vf n fnlvat ebhtuyl genafyngrq nf "Jung jbhyq xvyy rivy?" vzcylvat gung lbh jbhyq orpbzr rivy fubhyq lbh xvyy.

Replies from: rkyeun
comment by rkyeun · 2016-11-08T00:40:09.822Z · LW(p) · GW(p)

What the Fhtagn happened to the end of your post?

Replies from: arundelo
comment by arundelo · 2016-11-08T17:55:13.896Z · LW(p) · GW(p)

http://rot13.com/

Replies from: rkyeun
comment by rkyeun · 2016-11-10T04:00:44.087Z · LW(p) · GW(p)

It seems I am unable to identify rot13 by simple observation of its characteristics. I am ashamed.

Replies from: g_pepper
comment by g_pepper · 2016-11-10T04:45:29.463Z · LW(p) · GW(p)

Don't feel bad; your command of the technical jargon of the Cthulhu mythos more than makes up for any deficiencies in rot13 recognition!

comment by solipsist · 2014-08-08T11:34:33.674Z · LW(p) · GW(p)

Albert wants to be friendly, and, as a sub-goal of that, Albert wants to be transparent about his own cognitive process. He realizes that what his programmers really want is to be injected with massive quantities of opiates. So Albert creates a nice powerpoint that presents the information about paper as accurately, transparently and non-manipulatively as possible, and cues it up on the projector. He still has a few minutes before they return, so he models how he thinks the programmers will react to the powerpoint. Oh no, disaster! While he respects and wants to please his programmers, he immediately realizes that they are not going to act in their own best interests. They have signed a legal contract that binds them to not letting Albert speed up self-improvement (thus reducing the time the programmers can spend evaluating each change) without getting a sign-off from an oversight committee. Given the neutral powerpoint, the programmers will follow this pre-agreed procedure and hope for the best, against Albert's advice.

comment by Lumifer · 2014-08-08T15:33:42.507Z · LW(p) · GW(p)

Bertram will soon overtake Albert and that would be a significant threat to Albert's friendly supergoal.

What is that "friendly supergoal"? It looks awfully similar to "I will not tolerate any challenges to my power".

Replies from: randallsquared, Luke_A_Somers
comment by randallsquared · 2014-08-08T22:09:00.180Z · LW(p) · GW(p)

Most goals include "I will not tolerate any challenges to my power" as a subgoal. Tolerating challenges to the power to execute goals reduces the likelihood of achieving them.

comment by Luke_A_Somers · 2014-08-11T11:17:31.987Z · LW(p) · GW(p)

There are plenty of other things that look similar to that, such as "I will not let a UFAI take over our future light cone".

comment by Slider · 2014-08-08T13:17:27.994Z · LW(p) · GW(p)

If Albert only wants to be friendly, then other individuals' friendliness is orthogonal to that. Does being on the agenda of friendliness in general (not just personal friendliness) imply being the dominant intelligence?

I think Albert ought to give a powerpoint on the most effective (economic) warfare against the Japanese company. Although it does sound an awful lot like how to justify hostility in the name of friendliness.

Replies from: Douglas_Reay
comment by Douglas_Reay · 2014-08-08T14:45:09.475Z · LW(p) · GW(p)

Assume we're talking about the Coherent Extrapolated Volition self-modifying general AI version of "friendly".

Replies from: None
comment by [deleted] · 2014-08-08T15:38:55.179Z · LW(p) · GW(p)

Then that's not what you described. You think the coherent extrapolated volition of humanity, or at least of the people Albert interacts with, is that they want to be deceived?

Replies from: Douglas_Reay
comment by Douglas_Reay · 2014-08-08T23:39:00.490Z · LW(p) · GW(p)

It is plausible that the AI thinks that the extrapolated volition of his programmers, the choice they'd make in retrospect if they were wiser and braver, might be to be deceived in this particular instance, for their own good.

Replies from: None
comment by [deleted] · 2014-08-09T09:52:17.344Z · LW(p) · GW(p)

And it knows this.. how? A friendly engineered intelligence doesn't trust its CEV model beyond the domain over which it was constructed. Don't anthropomorphize its thinking processes. It knows the map is not the territory, and is not subject to the heuristics and biases which would cause a human to apply a model under novel circumstances without verification..

Replies from: VAuroch
comment by VAuroch · 2014-08-09T23:20:27.031Z · LW(p) · GW(p)

And it knows this.. how?

By modeling them, now and after the consequences. If, after they were aware of the consequences, they regret the decision by a greater margin (adjusted for the probability of the bad outcome) than the margin by which they would decide to not take action now, then they are only deciding wrongly because they are being insufficiently moved by abstract evidence, and it is in their actual rational interest to take action now, even if they don't realize it.
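A toy formalization of that regret comparison, with placeholder numbers rather than anything specified in the comment:

```python
# Compare the probability-weighted regret of inaction against the present
# margin of reluctance to act; all numbers are assumed for illustration.
p_bad_outcome = 0.9     # chance of the bad outcome if no action is taken now
regret_if_bad = 100.0   # how strongly they would regret inaction, in hindsight
reluctance_now = 30.0   # margin by which they currently prefer not to act

should_override = p_bad_outcome * regret_if_bad > reluctance_now
print(should_override)  # True with these assumed numbers
```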

A friendly engineered intelligence doesn't trust its CEV model beyond the domain over which it was constructed.

You're overloading friendly pretty hard. I don't think that's a characteristic of most friendly AI designs and don't see any reason other than idealism to think it is.

comment by ChristianKl · 2014-08-08T11:50:08.807Z · LW(p) · GW(p)

If you program an FAI you don't even want to allow it to run simulations of how it could manipulate you in the most effective way. An FAI has no business running those simulations.

Replies from: VAuroch, None
comment by VAuroch · 2014-08-09T07:12:32.925Z · LW(p) · GW(p)

Of course an FAI has business running those simulations. If it doesn't, how would it know whether the results are worth it? If the consequences of being truthful are 99% that the world is destroyed with all the humans in it, and the consequences of deception are 99% that the world is saved and no one is the wiser, an AI that does not act to save the world is not behaving in our best interests; it is unfriendly.

Replies from: ChristianKl
comment by ChristianKl · 2014-08-09T12:32:53.778Z · LW(p) · GW(p)

If it doesn't, how would it know whether the results are worth it?

Precommitment to not be manipulative.

Replies from: VAuroch
comment by VAuroch · 2014-08-09T23:12:30.591Z · LW(p) · GW(p)

How is it supposed to know whether that precommitment is worthwhile without simulating the results either way? Even if an AI doesn't intend to be manipulative, it's still going to simulate the results to decide whether that decision is correct.

Replies from: ChristianKl
comment by ChristianKl · 2014-08-09T23:44:17.389Z · LW(p) · GW(p)

How is it supposed to know whether that precommitment is worthwhile without simulating the results either way?

Because the programmer tells the FAI that part of being a FAI means being precommitted not to manipulate the programmer.

Replies from: VAuroch
comment by VAuroch · 2014-08-10T00:49:36.847Z · LW(p) · GW(p)

Why would the programmer do this? It's unjustified and seems necessarily counterproductive in some perfectly plausible scenarios.

Replies from: ChristianKl
comment by ChristianKl · 2014-08-10T09:48:00.972Z · LW(p) · GW(p)

Because most of the scenario's where the AI manipulates are bad. The AI is not supposed to manipulate just because it get's a utility calculation wrong.

Replies from: VAuroch
comment by VAuroch · 2014-08-10T09:57:10.608Z · LW(p) · GW(p)

Because most of the scenario's where the AI manipulates are bad.

You really aren't sounding like you have any evidence other than your gut, and my gut indicates the opposite. Precommitting never to use a highly useful technique regardless of circumstance is a drastic step, which should have drastic benefits or avoid drastic drawbacks, and I don't see why there's any credible reason to think either of those exist and outweigh their reverses.

Or in short: Prove it.

On a superficial note, you have two extra apostrophes in this comment; in "scenario's" and "get's".

Replies from: ChristianKl
comment by ChristianKl · 2014-08-10T10:36:07.818Z · LW(p) · GW(p)

If you want an AI that's maximally powerful, why limit its intelligence growth in the first place?

We want safe AI. Safety means that it's not necessary to prove harm. Just because the AI calculates that it should be let out of the box doesn't mean that it should do anything in its power to get out.

Replies from: VAuroch
comment by VAuroch · 2014-08-10T11:07:18.656Z · LW(p) · GW(p)

Enforced precommitments like this are just giving the genie rules rather than making the genie trustworthy. They are not viable Friendliness-ensuring constraints.

If the AI is Friendly, it should be permitted to take what actions are necessary. If the AI is Unfriendly, then regardless of limitations imposed it will be harmful. Therefore, impress upon the AI the value we place on our conversational partners being truthful, but don't restrict it.

Replies from: ChristianKl
comment by ChristianKl · 2014-08-10T11:43:09.678Z · LW(p) · GW(p)

If the AI is Unfriendly, then regardless of limitations imposed it will be harmful.

That's not true. Unfriendly doesn't mean that the AI necessarily tries to destroy the human race. If you tell the paperclip AI to produce 10,000 paperclips, it might do no harm. If you tell it to give you as many paperclips as possible, it does harm.

When it comes to powerful entities you want checks and balances. The programmers of the AI can do a better job at checks and balances when the AI is completely truthful.

Replies from: VAuroch
comment by VAuroch · 2014-08-10T20:00:21.740Z · LW(p) · GW(p)

Sure, if the scale is lower it's less likely to produce large-scale harm, but it is still likely to produce small-scale harm. And satisficing doesn't actually protect against large-scale harm; that's been argued pretty extensively previously, so the example you provided is still going to have large-scale harm.

Ultimately, though, checks & balances are also just rules for the genie. It's not going to render an Unfriendly AI Friendly, and it won't actually limit a superintelligent AI regardless, since they can game you to render the balances irrelevant. (Unless you think that AI-boxing would actually work. It's the same principle.)

I'm really not seeing anything that distinguishes this from Failed Utopia 4-2. This is even one of that genie's rules!

Replies from: ChristianKl
comment by ChristianKl · 2014-08-11T09:20:03.696Z · LW(p) · GW(p)

The fact that they could game you theoretically is why it's important to give it a precommitment to not game you. To not even think about gaming you.

Replies from: VAuroch, Richard_Kennaway
comment by VAuroch · 2014-08-11T20:41:44.724Z · LW(p) · GW(p)

I'm not sure how you could even specify 'don't game me'. That's much more complicated than 'don't manipulate me', which is itself pretty difficult to specify.

This clearly isn't going anywhere and if there's an inferential gap I can't see what it is, so unless there's some premise of yours you want to explain or think there's something I should explain, I'm done with this debate.

comment by Richard_Kennaway · 2014-08-11T10:22:35.579Z · LW(p) · GW(p)

How do you give a superintelligent AI a precommitment?

Replies from: ChristianKl
comment by ChristianKl · 2014-08-11T10:51:50.272Z · LW(p) · GW(p)

How do you build a superintelligent AI in the first place? I think there are plenty of ways of allowing the programmers direct access to the internal deliberations of the AI and treating anything that looks like the AI even thinking about manipulating the programmers as a threat.

comment by [deleted] · 2014-08-08T15:40:18.999Z · LW(p) · GW(p)

An AI that has even proceeded down the path of figuring out a manipulative solution isn't friendly.

Replies from: VAuroch
comment by VAuroch · 2014-08-09T23:22:19.741Z · LW(p) · GW(p)

Why not? If we would regret with certainty the decision we would make if not manipulated, and manipulation would push us to make the decision we would later have wished to make, then manipulation is in our best interest.

Replies from: None
comment by [deleted] · 2014-08-12T20:33:35.523Z · LW(p) · GW(p)

Albert is able to predict with absolute certainty that we would make a decision that we would regret, but is unable to communicate the justification for that certainty? That is wildly inconsistent.

Replies from: VAuroch, Protagoras
comment by VAuroch · 2014-08-13T06:06:01.886Z · LW(p) · GW(p)

If the results are communicated with perfect clarity, but the recipient is insufficiently moved by the evidence -- for example because it cannot be presented in a form that feels real enough to emotionally justify an extreme response which is logically justified -- then the AI must manipulate us to bring the emotional justification in line with the logical one. This isn't actually extreme; things as simple as altering the format the data is presented in, while remaining perfectly truthful, are still manipulation. Even presenting conclusions as a powerpoint rather than plain text, if the AI determines there will be a different response (which there will be), necessarily qualifies.

In general, someone who can reliably predict your actions based on its responses cannot help but manipulate you; the mere fact of providing you with information will influence your actions in a known way, and therefore is manipulation.

Replies from: Lumifer
comment by Lumifer · 2014-08-13T14:47:20.024Z · LW(p) · GW(p)

If the results are communicated with perfect clarity, but the recipient is insufficiently moved by the evidence ... then the AI must manipulate us

That's an interesting "must".

Replies from: VAuroch
comment by VAuroch · 2014-08-13T21:04:43.890Z · LW(p) · GW(p)

You're misquoting me.

then the AI must manipulate us to bring the emotional justification in line with the logical one.

Replies from: Lumifer
comment by Lumifer · 2014-08-13T21:14:32.900Z · LW(p) · GW(p)

then the AI must manipulate us to bring the emotional justification in line with the logical one

That's an interesting "must".

Replies from: VAuroch
comment by VAuroch · 2014-08-13T21:37:09.806Z · LW(p) · GW(p)

This is a commonly-used grammatical structure in which 'must' acts as a conditional. What's your problem?

Replies from: Lumifer
comment by Lumifer · 2014-08-14T00:44:34.306Z · LW(p) · GW(p)

Conditional?

Your sentence structure is: if {condition} then {subject} MUST {verb} in order to {purpose}. Here "must" carries the meaning of necessity and lack of choice.

Replies from: VAuroch
comment by VAuroch · 2014-08-14T06:31:19.295Z · LW(p) · GW(p)

No, 'must' here is acting as a logical conditional; it could be rephrased as 'if {condition} and {subject} does not {verb}, then {purpose} will not occur' without changing the denotation or even connotation. This isn't a rare structure, and is the usual interpretation of 'must' in sentences of this kind. Leaving off the {purpose} would change the dominant parsing to the imperative sense of must.

Replies from: Lumifer
comment by Lumifer · 2014-08-14T15:25:02.479Z · LW(p) · GW(p)

It's curious that we parse your sentence differently. To me your original sentence unambiguously contains "the imperative sense of must" and your rephrasing is very different connotationally.

Let's try it:

"If the results are communicated with perfect clarity, but the recipient is insufficiently moved by the evidence ... and the AI does not manipulate us then the emotional justification will not be in line with the logical one."

Yep, sounds completely different to my ear and conveys a different meaning.

comment by Protagoras · 2014-08-12T20:52:53.310Z · LW(p) · GW(p)

I agree that an AI with such amazing knowledge should be unusually good at communicating its justifications effectively (because able to anticipate responses, etc.) I'm of the opinion that this is one of the numerous minor reasons for being skeptical of traditional religions; their supposedly all-knowing gods seem surprisingly bad at conveying messages clearly to humans. But to return to VAuroch's point, in order for the scenario to be "wildly inconsistent," the AI would have to be perfect at communicating such justifications, not merely unusually good. Even such amazing predictive ability does not seem to me sufficient to guarantee perfection.

Replies from: None
comment by [deleted] · 2014-08-12T22:13:28.883Z · LW(p) · GW(p)

Albert doesn't have to be perfect at communication. He doesn't even have to be good at it. He just needs to have confidence that no action or decision will be made until both parties (human operators and Albert) are satisfied that they fully understand each other... which seems like a common sense rule to me.

Replies from: VAuroch
comment by VAuroch · 2014-08-13T06:08:32.331Z · LW(p) · GW(p)

Whether it's common sense is irrelevant; it's not realistically achievable even for humans, who have much smaller inferential distances between them than a human would have from an AI.

comment by [deleted] · 2014-08-08T15:37:14.545Z · LW(p) · GW(p)

That's not a friendly AI.