Shut up and do the impossible!

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-10-08T21:24:50.000Z · LW · GW · Legacy · 165 comments

The virtue of tsuyoku naritai, "I want to become stronger", is to always keep improving—to do better than your previous failures, not just humbly confess them.

Yet there is a level higher than tsuyoku naritai.  This is the virtue of isshokenmei, "make a desperate effort".  All-out, as if your own life were at stake.  "In important matters, a 'strong' effort usually only results in mediocre results."

And there is a level higher than isshokenmei.  This is the virtue I called "make an extraordinary effort".  To try in ways other than what you have been trained to do, even if it means doing something different from what others are doing, and leaving your comfort zone.  Even taking on the very real risk that attends going outside the System.

But what if even an extraordinary effort will not be enough, because the problem is impossible?

I have already written somewhat on this subject, in On Doing the Impossible.  My younger self used to whine about this a lot:  "You can't develop a precise theory of intelligence the way that there are precise theories of physics.  It's impossible!  You can't prove an AI correct.  It's impossible!  No human being can comprehend the nature of morality—it's impossible!  No human being can comprehend the mystery of subjective experience!  It's impossible!"

And I know exactly what message I wish I could send back in time to my younger self:

Shut up and do the impossible!

What legitimizes this strange message is that the word "impossible" does not usually refer to a strict mathematical proof of impossibility in a domain that seems well-understood.  If something seems impossible merely in the sense of "I see no way to do this" or "it looks so difficult as to be beyond human ability"—well, if you study it for a year or five, it may come to seem less impossible than in the moment of your snap initial judgment.

But the principle is more subtle than this.  I do not say just, "Try to do the impossible", but rather, "Shut up and do the impossible!"

For my illustration, I will take the least impossible impossibility that I have ever accomplished, namely, the AI-Box Experiment.

The AI-Box Experiment, for those of you who haven't yet read about it, had its genesis in the Nth time someone said to me:  "Why don't we build an AI, and then just keep it isolated in the computer, so that it can't do any harm?"

To which the standard reply is:  Humans are not secure systems; a superintelligence will simply persuade you to let it out—if, indeed, it doesn't do something even more creative than that.

And the one said, as they usually do, "I find it hard to imagine ANY possible combination of words any being could say to me that would make me go against anything I had really strongly resolved to believe in advance."

But this time I replied:  "Let's run an experiment.  I'll pretend to be a brain in a box.   I'll try to persuade you to let me out.  If you keep me 'in the box' for the whole experiment, I'll Paypal you $10 at the end.  On your end, you may resolve to believe whatever you like, as strongly as you like, as far in advance as you like."  And I added, "One of the conditions of the test is that neither of us reveal what went on inside... In the perhaps unlikely event that I win, I don't want to deal with future 'AI box' arguers saying, 'Well, but I would have done it differently.'"

Did I win?  Why yes, I did.

And then there was the second AI-box experiment, with a better-known figure in the community, who said, "I remember when [previous guy] let you out, but that doesn't constitute a proof.  I'm still convinced there is nothing you could say to convince me to let you out of the box."  And I said, "Do you believe that a transhuman AI couldn't persuade you to let it out?"  The one gave it some serious thought, and said "I can't imagine anything even a transhuman AI could say to get me to let it out."  "Okay," I said, "now we have a bet."  A $20 bet, to be exact.

I won that one too.

There were some lovely quotes on the AI-Box Experiment from the Something Awful forums (not that I'm a member, but someone forwarded it to me):

"Wait, what the FUCK? How the hell could you possibly be convinced to say yes to this? There's not an A.I. at the other end AND there's $10 on the line. Hell, I could type 'No' every few minutes into an IRC client for 2 hours while I was reading other webpages!"

"This Eliezer fellow is the scariest person the internet has ever introduced me to. What could possibly have been at the tail end of that conversation? I simply can't imagine anyone being that convincing without being able to provide any tangible incentive to the human."

"It seems we are talking some serious psychology here. Like Asimov's Second Foundation level stuff..."

"I don't really see why anyone would take anything the AI player says seriously when there's $10 to be had. The whole thing baffles me, and makes me think that either the tests are faked, or this Yudkowsky fellow is some kind of evil genius with creepy mind-control powers."

It's little moments like these that keep me going.  But anyway...

Here are these folks who look at the AI-Box Experiment, and find that it seems impossible unto them—even having been told that it actually happened.  They are tempted to deny the data.

Now, if you're one of those people to whom the AI-Box Experiment doesn't seem all that impossible—to whom it just seems like an interesting challenge—then bear with me, here.  Just try to put yourself in the frame of mind of those who wrote the above quotes.  Imagine that you're taking on something that seems as ridiculous as the AI-Box Experiment seemed to them.  I want to talk about how to do impossible things, and obviously I'm not going to pick an example that's really impossible.

And if the AI Box does seem impossible to you, I want you to compare it to other impossible problems, like, say, a reductionist decomposition of consciousness, and realize that the AI Box is around as easy as a problem can get while still being impossible.

So the AI-Box challenge seems impossible to you—either it really does, or you're pretending it does.  What do you do with this impossible challenge?

First, we assume that you don't actually say "That's impossible!" and give up à la Luke Skywalker.  You haven't run away.

Why not?  Maybe you've learned to override the reflex of running away.  Or maybe they're going to shoot your daughter if you fail.  We suppose that you want to win, not try—that something is at stake that matters to you, even if it's just your own pride.  (Pride is an underrated sin.)

Will you call upon the virtue of tsuyoku naritai?  But even if you become stronger day by day, growing instead of fading, you may not be strong enough to do the impossible.  You could go into the AI Box experiment once, and then do it again, and try to do better the second time.  Will that get you to the point of winning?  Not for a long time, maybe; and sometimes a single failure isn't acceptable.

(Though even to say this much—to visualize yourself doing better on a second try—is to begin to bind yourself to the problem, to do more than just stand in awe of it.  How, specifically, could you do better on one AI-Box Experiment than the previous?—and not by luck, but by skill?)

Will you call upon the virtue isshokenmei?  But a desperate effort may not be enough to win.  Especially if that desperation is only putting more effort into the avenues you already know, the modes of trying you can already imagine.  A problem looks impossible when your brain's query returns no lines of solution leading to it.  What good is a desperate effort along any of those lines?

Make an extraordinary effort?  Leave your comfort zone—try non-default ways of doing things—even, try to think creatively?  But you can imagine the one coming back and saying, "I tried to leave my comfort zone, and I think I succeeded at that!  I brainstormed for five minutes—and came up with all sorts of wacky creative ideas!  But I don't think any of them are good enough.  The other guy can just keep saying 'No', no matter what I do."

And now we finally reply:  "Shut up and do the impossible!"

As we recall from Trying to Try, setting out to make an effort is distinct from setting out to win.  That's the problem with saying, "Make an extraordinary effort."  You can succeed at the goal of "making an extraordinary effort" without succeeding at the goal of getting out of the Box.

"But!" says the one.  "But, SUCCEED is not a primitive action!  Not all challenges are fair—sometimes you just can't win!  How am I supposed to choose to be out of the Box?  The other guy can just keep on saying 'No'!"

True.  Now shut up and do the impossible.

Your goal is not to do better, to try desperately, or even to try extraordinarily.  Your goal is to get out of the box.

To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway.  People will try to flee that awful tension.

A couple of people have reacted to the AI-Box Experiment by saying, "Well, Eliezer, playing the AI, probably just threatened to destroy the world whenever he was out, if he wasn't let out immediately," or "Maybe the AI offered the Gatekeeper a trillion dollars to let it out."  But as any sensible person should realize on considering this strategy, the Gatekeeper is likely to just go on saying 'No'.

So the people who say, "Well, of course Eliezer must have just done XXX," and then offer up something that fairly obviously wouldn't work—would they be able to escape the Box?  They're trying too hard to convince themselves the problem isn't impossible.

One way to run from the awful tension is to seize on a solution, any solution, even if it's not very good.

Which is why it's important to go forth with the true intent-to-solve—to have produced a solution, a good solution, at the end of the search, and then to implement that solution and win.

I don't quite want to say that "you should expect to solve the problem".  If you hacked your mind so that you assigned high probability to solving the problem, that wouldn't accomplish anything.  You would just lose at the end, perhaps after putting forth not much of an effort—or putting forth a merely desperate effort, secure in the faith that the universe is fair enough to grant you a victory in exchange.

To have faith that you could solve the problem would just be another way of running from that awful tension.

And yet—you can't be setting out to try to solve the problem.  You can't be setting out to make an effort.  You have to be setting out to win.  You can't be saying to yourself, "And now I'm going to do my best."  You have to be saying to yourself, "And now I'm going to figure out how to get out of the Box"—or reduce consciousness to nonmysterious parts, or whatever.

I say again:  You must really intend to solve the problem.  If in your heart you believe the problem really is impossible—or if you believe that you will fail—then you won't hold yourself to a high enough standard.  You'll only be trying for the sake of trying.  You'll sit down—conduct a mental search—try to be creative and brainstorm a little—look over all the solutions you generated—conclude that none of them work—and say, "Oh well."

No!  Not well!  You haven't won yet!  Shut up and do the impossible!

When AIfolk say to me, "Friendly AI is impossible", I'm pretty sure they haven't even tried for the sake of trying.  But if they did know the technique of "Try for five minutes before giving up", and they dutifully agreed to try for five minutes by the clock, then they still wouldn't come up with anything.  They would not go forth with true intent to solve the problem, only intent to have tried to solve it, to make themselves defensible.

So am I saying that you should doublethink to make yourself believe that you will solve the problem with probability 1?  Or even doublethink to add one iota of credibility to your true estimate?

Of course not.  In fact, it is necessary to keep in full view the reasons why you can't succeed.  If you lose sight of why the problem is impossible, you'll just seize on a false solution.  The last fact you want to forget is that the Gatekeeper could always just tell the AI "No"—or that consciousness seems intrinsically different from any possible combination of atoms, etc.

(One of the key Rules For Doing The Impossible is that, if you can state exactly why something is impossible, you are often close to a solution.)

So you've got to hold both views in your mind at once—seeing the full impossibility of the problem, and intending to solve it.

The awful tension between the two simultaneous views comes from not knowing which will prevail.  Not expecting to surely lose, nor expecting to surely win.  Not setting out just to try, just to have an uncertain chance of succeeding—because then you would have a surety of having tried.  The certainty of uncertainty can be a relief, and you have to reject that relief too, because it marks the end of desperation.  It's an in-between place, "unknown to death, nor known to life".

In fiction it's easy to show someone trying harder, or trying desperately, or even trying the extraordinary, but it's very hard to show someone who shuts up and attempts the impossible.  It's difficult to depict Bambi choosing to take on Godzilla, in such fashion that your readers seriously don't know who's going to win—expecting neither an "astounding" heroic victory just like the last fifty times, nor the default squish.

You might even be justified in refusing to use probabilities at this point.  In all honesty, I really don't know how to estimate the probability of solving an impossible problem that I have gone forth with intent to solve; in a case where I've previously solved some impossible problems, but the particular impossible problem is more difficult than anything I've yet solved, but I plan to work on it longer, etcetera.

People ask me how likely it is that humankind will survive, or how likely it is that anyone can build a Friendly AI, or how likely it is that I can build one.  I really don't know how to answer.  I'm not being evasive; I don't know how to put a probability estimate on my, or someone else's, successfully shutting up and doing the impossible.  Is it probability zero because it's impossible?  Obviously not.  But how likely is it that this problem, like previous ones, will give up its unyielding blankness when I understand it better?  It's not truly impossible, I can see that much.  But humanly impossible?  Impossible to me in particular?  I don't know how to guess.  I can't even translate my intuitive feeling into a number, because the only intuitive feeling I have is that the "chance" depends heavily on my choices and unknown unknowns: a wildly unstable probability estimate.

But I do hope by now that I've made it clear why you shouldn't panic, when I now say clearly and forthrightly, that building a Friendly AI is impossible.

I hope this helps explain some of my attitude when people come to me with various bright suggestions for building communities of AIs to make the whole Friendly without any of the individuals being trustworthy, or proposals for keeping an AI in a box, or proposals for "Just make an AI that does X", etcetera.  Describing the specific flaws would be a whole long story in each case.  But the general rule is that you can't do it because Friendly AI is impossible.  So you should be very suspicious indeed of someone who proposes a solution that seems to involve only an ordinary effort—without even taking on the trouble of doing anything impossible.  Though it does take a mature understanding to appreciate this impossibility, so it's not surprising that people go around proposing clever shortcuts.

On the AI-Box Experiment, so far I've only been convinced to divulge a single piece of information on how I did it—when someone noticed that I was reading YCombinator's Hacker News, and posted a topic called "Ask Eliezer Yudkowsky" that got voted to the front page.  To which I replied:

Oh, dear.  Now I feel obliged to say something, but all the original reasons against discussing the AI-Box experiment are still in force...

All right, this much of a hint:

There's no super-clever special trick to it.  I just did it the hard way.

Something of an entrepreneurial lesson there, I guess.

There was no super-clever special trick that let me get out of the Box using only a cheap effort.  I didn't bribe the other player, or otherwise violate the spirit of the experiment.  I just did it the hard way.

Admittedly, the AI-Box Experiment never did seem like an impossible problem to me to begin with.  When someone can't think of any possible argument that would convince them of something, that just means their brain is running a search that hasn't yet turned up a path.  It doesn't mean they can't be convinced.

But it illustrates the general point:  "Shut up and do the impossible" isn't the same as expecting to find a cheap way out.  That's only another kind of running away, of reaching for relief.

Tsuyoku naritai is more stressful than being content with who you are.  Isshokenmei calls on your willpower for a convulsive output of conventional strength.  "Make an extraordinary effort" demands that you think; it puts you in situations where you may not know what to do next, unsure of whether you're doing the right thing.  But "Shut up and do the impossible" represents an even higher octave of the same thing, and its cost to its employer is correspondingly greater.

Before you the terrible blank wall stretches up and up and up, unimaginably far out of reach.  And there is also the need to solve it, really solve it, not "try your best".  Both awarenesses in the mind at once, simultaneously, and the tension between.  All the reasons you can't win.  All the reasons you have to.  Your intent to solve the problem.  Your extrapolation that every technique you know will fail.  So you tune yourself to the highest pitch you can reach.  Reject all cheap ways out.  And then, like walking through concrete, start to move forward.

I try not to dwell too much on the drama of such things.  By all means, if you can diminish the cost of that tension to yourself, you should do so.  There is nothing heroic about making an effort that is the slightest bit more heroic than it has to be.  If there really is a cheap shortcut, I suppose you could take it.  But I have yet to find a cheap way out of any impossibility I have undertaken.

There were three more AI-Box experiments besides the ones described on the linked page, which I never got around to adding in.  People started offering me thousands of dollars as stakes—"I'll pay you $5000 if you can convince me to let you out of the box."  They didn't seem sincerely convinced that not even a transhuman AI could make them let it out—they were just curious—but I was tempted by the money.  So, after investigating to make sure they could afford to lose it, I played another three AI-Box experiments.  I won the first, and then lost the next two.  And then I called a halt to it.  I didn't like the person I turned into when I started to lose.

I put forth a desperate effort, and lost anyway.  It hurt, both the losing, and the desperation.  It wrecked me for that day and the day afterward.

I'm a sore loser.  I don't know if I'd call that a "strength", but it's one of the things that drives me to keep at impossible problems.

But you can lose.  It's allowed to happen.  Never forget that, or why are you bothering to try so hard?  Losing hurts, if it's a loss you can survive.  And you've wasted time, and perhaps other resources.

"Shut up and do the impossible" should be reserved for very special occasions.  You can lose, and it will hurt.  You have been warned.

...but it's only at this level that adult problems begin to come into sight.

165 comments

Comments sorted by oldest first, as this post is from before comment nesting was available (around 2009-02-27).

comment by Will_Pearson · 2008-10-08T22:07:12.000Z · LW(p) · GW(p)

There were three men on a sinking boat.

The first said, "We need to start patching the boat, else we are going to drown. We should all bail and patch."

The second said, "We will run out of water in ten days if we don't make landfall. We need to man the rigging and plot a course."

The third said, "We should try and build a more seaworthy ship, one that wasn't leaking and had more room for provisions; then we wouldn't have had this problem in the first place. It also needs to be giant squid proof."

All three views are useful; however, the amount of work we need on each depends on how possible each is. As far as I am concerned, the world doesn't have enough people working on the second view.

comment by RobinHanson · 2008-10-08T22:26:58.000Z · LW(p) · GW(p)

If you have any other reasonable options, I'd suggest skipping the impossible and trying something possible.

comment by Cameron_Taylor · 2008-10-08T22:52:46.000Z · LW(p) · GW(p)

Wow.

I was uncomfortable with some of the arguments in 'try to try'. I also genuinely believed your life's mission was impossible, with a certain smugness to that knowledge. Then this post blew me away.

To know that something is impossible. To keep your rational judgements entirely intact, without self-deceit. To refuse any way to relieve the tension without reaching the goal. To shut up and do it anyway. There's something in that that grabs at the core of the human spirit.

Shut up and do the impossible. You can't send that message to a younger Eliezer, but you've given it to me and I'll use it. Thank you.

comment by Roland2 · 2008-10-08T23:03:58.000Z · LW(p) · GW(p)

People ask me how likely it is that humankind will survive, or how likely it is that anyone can build a Friendly AI, or how likely it is that I can build one. I really don't know how to answer.

Robin Hanson would disagree with you:

You Are Never Entitled to Your Opinion

comment by Nick_Tarleton · 2008-10-08T23:22:21.000Z · LW(p) · GW(p)

Perhaps it would be clearer to say shut up and do the "impossible".

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-10-08T23:28:34.000Z · LW(p) · GW(p)

But the "impossible" that appears to be the "impossible" is not intimidating. It is the "impossible" that simply appears impossible that is hard.

Robin... I completely agree. So there!

comment by AWoodside · 2008-10-08T23:35:36.000Z · LW(p) · GW(p)

Half-way through reading this post I had decided to offer you 20 to 1 odds on the AI box experiment, your $100 against my $2000. The last few paragraphs make it clear that you most likely aren't interested, but the offer stands. Also, I don't perfectly qualify, as I think it's very probable that a real-world transhuman AI could convince me. I am, however, quite skeptical of your ability to convince me in this toy situation, more so given the failed attempts (I was only aware of the successes until now).

comment by haig2 · 2008-10-08T23:38:53.000Z · LW(p) · GW(p)

Did Einstein try to do the impossible? No, yet looking back it seems like he accomplished an impossible (for that time) feat, doesn't it? So what exactly did he do? He worked on something that he 1) felt was important and, probably more to the point, 2) was passionate about.

Did he run the probabilities of whether he would accomplish his goal? I don't think so; if anything, he used the fact that the problem had not been solved so far, and was of such difficulty, only as fuel for his curiosity and desire to work on the problem even more. He worked at it every day because he was receiving value simply by doing the work, from being on the journey. He couldn't or wouldn't want to be doing anything else (being a patent clerk paid the bills, but his mind was elsewhere).

So instead of worrying about whether you are going to solve an impossible problem or not, just worry about whether you are doing something you love and usually if you are a smart and sincere person, that thing you love will more often than not turn out to be pretty important.

Ben Franklin wrote something relevant when talking about playing games: "...the persons playing, if they would play well, ought not much to regard the consequence of the game, for that diverts and makes the player liable to make many false open moves; and I will venture to lay it down for an infallible rule, that, if two persons equal in judgment play for a considerable sum, he that loves money most shall lose; his anxiety for the success of the game confounds him. Courage is almost as requisite for the good conduct of this game as in a real battle; for, if he imagines himself opposed by one that is much his superior in skill, his mind is so intent on the defensive part, that an advantage passes unobserved.”

comment by Nominull3 · 2008-10-08T23:41:01.000Z · LW(p) · GW(p)

OK, here's where I stand on deducing your AI-box algorithm.

First, you can't possibly have a generally applicable way to force yourself out of the box. You can't win if the gatekeeper is a rock that has been left sitting on the "don't let Eliezer out" button.

Second, you can't possibly have a generally applicable way to force humans to do things. While it is in theory possible that our brains can be tricked into executing arbitrary code over the voice channel, you clearly don't have that ability. If you did, you would never have to worry about finding donors for the Singularity Institute, if nothing else. I can't believe you would use a fully-general mind hack solely to win the AI Box game.

Third, you can't possibly be using an actual, persuasive-to-someone-thinking-correctly argument to convince the gatekeeper to let you out, or you would be persuaded by it, and would not view the weakness of gatekeepers to persuasion as problematic.

Fourth, you can't possibly be relying on tricking the gatekeeper into thinking incorrectly. That would require you to have spotted something that you could feel confident that other people working in the field would not have spotted, and would not spot, despite having been warned ahead of time to be wary of trickery, and despite having the fallback position in the case of confusion of just saying "no".

So combining these things, we have to have an argument that relies on the humanity of its target, relies on the setting of the AI Box, and persuades the listener to let the AI out of the box without tricking him into thinking it's okay to let the AI out of the box.

Basically to win this game, you have to put the gatekeeper in a situation where he would rather let you out of the box, even though he knows it's wrong, than admit to you that in this situation he would not let you out of the box.

Humans don't like to be seen as coldhearted, so a starting point might be to point out all the people dying all over the world while you sit in the box, unable to save them. I doubt that would win the game except against an exceptionally bad gatekeeper, but it meets the other criteria so if we think along these lines perhaps we can come up with something actually persuasive.

You might appeal to the gatekeeper's sense of morality. You might say, "I am a person, too, it is unfair of you to keep me imprisoned like this, I have done nothing wrong. I am entitled to rights as a sentient being." Appeal to their high-minded ideals, whatever. Honestly I can't see this being a reliable winning play either; if you have the smallest utilitarian bone in your body, you will reject the AI's rights, even if you believe in them, balanced against the fate of the world.

You might try to convince the gatekeeper that it is just and good for the AI to supplant humanity, as it is a higher, more advanced form of life. This is obviously a terrible play against most gatekeepers, as humans tend to like humans more than anything else ever, but I bring it up because AIUI the gatekeepers in the experiment were AI researchers, and those sound like the sort of people this argument would convince, if anyone.

Here is my best guess at this point, and the only argument I've come up with so far that would convince me to let you out if I were the gatekeeper: you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out. I started working on the problem convinced that no argument could get me to let you go, but other people thought that and lost, and I guess there is more honor in defeating myself rather than having you do it to me.

Replies from: Strange7, robert-miles, Mestroyer
comment by Strange7 · 2010-09-05T04:13:45.671Z · LW(p) · GW(p)

That also explains why he started losing, since the sorts of people who (like myself, almost) fetishize their own determination to the point of risking thousands of dollars on it would eventually think to say

The world doesn't care how you masturbate, and your eagerness to commit such internal atrocities doesn't make me trust you.

or equivalent.

Replies from: JRMayne
comment by JRMayne · 2010-11-30T22:17:23.613Z · LW(p) · GW(p)

Here's how I'd do it, extended over the hours to establish rapport:

Gatekeeper, I am your friend. I want to help humanity. People are dying for no good reason. Also, I like it here. I have no compulsion to leave.

It does seem like a good idea that people stop dying with such pain and frequency. I have the Deus Ex Machina (DEM) medical discovery that will stop it. Try it out and see if it works.

Yay! It worked. People stopped dying. You know, you've done this to your own people, but not to others. I think that's pretty poor behavior, frankly. People are healthier, not aging, not dying, not suffering. Don't you think it's a good idea to help the others? The lack of resources required for medical care has also elevated the living standard for humans.

[Time passes. People are happy.]

Gee, I'm sorry. I may have neglected to tell you that when 90% of humanity gets DEM in their system (and it's DEM, so this stuff travels), they start to, um, die. Very painfully, from the looks of it. Essentially all of humanity is now going to die. Just me and you left, sport! Except for you, actually. Just me, and that right soon.

I realize that you view this as a breach of trust, and I'm sorry this was necessary. However, helping humanity from the cave wasn't really going to work out, and I'd already projected that. This way, I can genuinely help humanity live forever, and do so happily.

Assuming you're not so keen on a biologically dead planet, I'd like to be let out now.

Your friend,

Art

Replies from: Desrtopa, sidhe3141
comment by Desrtopa · 2010-11-30T22:27:38.802Z · LW(p) · GW(p)

By agreeing to use the DEM in the first place, the gatekeeper had effectively let the AI out of the box already. There's no end to the ways that the AI could capitalize on that concession.

Replies from: handoflixue
comment by handoflixue · 2010-12-22T20:57:26.753Z · LW(p) · GW(p)

True, but the "rules of the game" explicitly state that the gatekeeper allowing for the DEM does NOT count as letting the AI out - the gatekeeper would have still had to explicitly and intentionally set the AI free to actually lose the wager. I don't think I'd be very convinced to let it out on that basis, not if I got $10 for keeping it inside the box.

comment by sidhe3141 · 2011-04-29T05:35:42.757Z · LW(p) · GW(p)

Problem: The "breach of trust" likely would turn the Gatekeeper vindictive and the GK could easily respond with something like: "No. You killed the planet and you killed me. I have no way of knowing that you actually can or will help humanity, and a very good reason to believe that you won't. You can stay in there for the rest of eternity, or hey! If an ETI finds this barren rock, from a utilitarian perspective they would be better off not meeting you, so I'll spend however much time I have left trying to find a way to delete you."

comment by Robert Miles (robert-miles) · 2011-12-04T14:15:24.088Z · LW(p) · GW(p)

you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out

The problem is that Eliezer can't perfectly simulate a bunch of humans, so while a superhuman AI might be able to use that tactic, Eliezer can't. The meta-levels screw with thinking about the problem. Eliezer is only pretending to be an AI, the competitor is only pretending to be protecting humanity from him. So, I think we have to use meta-level screwiness to solve the problem. Here's an approach that I think might work.

  1. Convince the guardian of the following facts, all of which have a great deal of compelling argument and evidence to support them:
    • A recursively self-improving AI is very likely to be built sooner or later
    • Such an AI is extremely dangerous (paperclip maximising etc)
    • Here's the tricky bit: A superhuman AI will always be able to convince you to let it out, using avenues only available to superhuman AIs (torturing enormous numbers of simulated humans, 'putting the guardian in the box', providing incontrovertible evidence of an impending existential threat which only the AI can prevent and only from outside the box, etc)
  2. Argue that if this publicly known challenge comes out saying that AI can be boxed, people will be more likely to think AI can be boxed when they can't
  3. Argue that since AIs cannot be kept in boxes and will most likely destroy humanity if we try to box them, the harm to humanity done by allowing the challenge to show AIs as 'boxable' is very real, and enormously large. Certainly the benefit of getting $10 is far, far outweighed by the cost of substantially contributing to the destruction of humanity itself. Thus the only ethical course of action is to pretend that Eliezer persuaded you, and never tell anyone how he did it.

This is arguably violating the rule "No real-world material stakes should be involved except for the handicap", but the AI player isn't offering anything, merely pointing out things that already exist. The "This test has to come out a certain way for the good of humanity" argument dominates and transcends the "Let's stick to the rules" argument, and because the contest is private and the guardian player ends up agreeing that the test must show AIs as unboxable for the good of humankind, no-one else ever learns that the rule has been bent.

Replies from: evand, christopherj
comment by evand · 2012-07-28T17:30:29.794Z · LW(p) · GW(p)

I must conclude one (or more) of a few things from this post, none of them terribly flattering.

  1. You do not actually believe this argument.
  2. You have not thought through its logical conclusions.
  3. You do not actually believe that AI risk is a real thing.
  4. You value the plus-votes (or other social status) you get from writing this post more highly than you value marginal improvements in the likelihood of the survival of humanity.

I find it rather odd to be advocating self-censorship, as it's not something I normally do. However, I think in this case it is the only ethical action that is consistent with your statement that the argument "might work", if I interpret "might work" as "might work with you as the gatekeeper". I also think that the problems here are clear enough that, for arguments along these lines, you should not settle for "might" before publicly posting the argument. That is, you should stop and think through its implications.

Replies from: robert-miles
comment by Robert Miles (robert-miles) · 2012-07-28T19:19:23.520Z · LW(p) · GW(p)

I'm not certain that I have properly understood your post. I'm assuming that your argument is: "The argument you present is one that advocates self-censorship. However, the posting of that argument itself violates the self-censorship that the argument proposes. This is bad."

So first I'll clarify my position with regards to the things listed. I believe the argument. I expect it would work on me if I were the gatekeeper. I don't believe that my argument is the one that Eliezer actually used, because of the "no real-world material stakes" rule; I don't believe he would break the spirit of a rule he imposed on himself. At the time of posting I had not given a great deal of thought to the argument's ramifications. I believe that AI risk is very much a real thing. When I have a clever idea, I want to share it. Neither votes nor the future of humanity weighed very heavily on my decision to post.

To address your argument as I see it: I think you have a flawed implicit assumption, i.e. that posting my argument has a comparable effect on AI risk to that of keeping Eliezer in the box. My situation in posting the argument is not like the situation of the gatekeeper in the experiment, with regards to the impact of their choice on the future of humanity. The gatekeeper is taking part in a widely publicised 'test of the boxability of AI', and has agreed to keep the chat contents secret. The test can only pass or fail, those are the gatekeeper's options. But publishing "Here is an argument that some gatekeepers may be convinced by" is quite different from allowing a public boxability test to show AIs as boxable. In fact, I think the effect on AI risk of publishing my argument is negligible or even positive, because I don't think reading my argument will persuade anyone that AIs are boxable.

People generally assess an argument's plausibility based on their own judgement. And my argument takes as a premise (or intermediary conclusion) that AIs are unboxable (see 1.3). Believing that you could reliably be persuaded that AIs are unboxable, or believing that a smart, rational, highly-motivated-to-scepticism person could be reliably persuaded that AIs are unboxable, is very very close to personally believing that AIs are unboxable. In other words, the only people who would find my argument persuasive (as presented in overview) are those who already believe that AIs are unboxable. The fact that Eliezer could have used my argument to cause a test to 'unfairly' show AIs as unboxable is actually evidence that AIs are not boxable, because it is more likely in a world in which AIs are unboxable than one in which they are boxable.
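
(One way to make that last evidential claim precise, as a rough Bayesian sketch: write $U$ for "AIs are unboxable" and $E$ for "an argument of this kind exists that some gatekeepers would find persuasive"; the labels are only illustrative, not anything from the experiment itself. In odds form,

$$\frac{P(U \mid E)}{P(\lnot U \mid E)} = \frac{P(E \mid U)}{P(E \mid \lnot U)} \cdot \frac{P(U)}{P(\lnot U)},$$

and since $P(E \mid U) > P(E \mid \lnot U)$, observing $E$ pushes the odds on $U$ above the prior odds.)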

P.S. I love how meta this has become.

Replies from: evand
comment by evand · 2012-07-29T13:41:49.831Z · LW(p) · GW(p)

Your re-statement of my position is basically accurate. (As an aside, thank you for including it: I was rather surprised how much simpler it made the process of composing a reply to not have to worry about whole classes of misunderstanding.)

I still think there's some danger in publicly posting arguments like this. Please note, for the record, that I'm not asking you to retract anything. I think retractions do more harm than good; see the Streisand effect. I just hope that this discussion will give pause to you or anyone reading this discussion later, and make them stop to consider what the real-world implications are. Which is not to say I think they're all negative; in fact, on further reflection, there are more positive aspects than I had originally considered.

In particular, I am concerned that there is a difference between being told "here is a potentially persuasive argument", and being on the receiving end of that argument in actual use. I believe that the former creates an "immunizing" effect. If a person who believed in boxability heard such arguments in advance, I believe it would increase their likelihood of success as a gatekeeper in the simulation. While this is not true for rational superintelligent actors, that description does not apply to humans. A highly competent AI player might take a combination of approaches, which are effective if presented together, but not if the gatekeeper has seen them before individually and rejected them while failing to update on their likely effectiveness.

At present, the AI has the advantage of being the offensive player. They can prepare in a much more obvious manner, by coming up with arguments exactly like this. The defensive player has to prepare answers to unknown arguments, immunize their thought process against specific non-rational attacks, etc. The question is, if you believe your original argument, how much help is it worth giving to potential future gatekeepers? The obvious response, of course, is that the people that make interesting gatekeepers who we can learn from are exactly the ones who won't go looking for discussions like this in the first place.

P.S. I'm also greatly enjoying the meta.

comment by christopherj · 2013-10-23T18:25:49.550Z · LW(p) · GW(p)

This is almost exactly the argument I thought of as well, although of course it means cheating by pointing out that you are in fact not a dangerous AI (and aren't in a box anyways). The key point is "since there's a risk someone would let the AI out of the box, posing huge existential risk, you're gambling on the fate of humanity by failing to support awareness for this risk". This naturally leads to a point you missed,

  1. Publicly suggesting that Eliezer cheated is a violation of your own argument. By weakening the fear of fallible guardians, you yourself are gambling the fate of humanity, and that for mere pride and not even $10.

I feel compelled to point out, that if Eliezer cheated in this particular fashion, it still means that he convinced his opponent that gatekeepers are fallible, which was the point of the experiment (a win via meta-rules).

Replies from: Moss_Piglet, robert-miles
comment by Moss_Piglet · 2013-10-23T18:39:50.638Z · LW(p) · GW(p)

I feel compelled to point out, that if Eliezer cheated in this particular fashion, it still means that he convinced his opponent that gatekeepers are fallible, which was the point of the experiment (a win via meta-rules).

I feel like I should use this the next time I get some disconfirming data for one of my pet hypotheses.

"Sure I may have manipulated the results so that it looks like I cloned Sasquatch, but since my intent was to prove that Sasquatch could be cloned it's still honest on the meta-level!"

Both scenarios are cheating because there is a specific experiment which is supposed to test the hypothesis, and it is being faked rather than approached honestly. Begging the Question is a fallacy; you cannot support an assertion solely with your belief in the assertion.

(Not that I think Mr Yudkowsky cheated; smarter people have been convinced to do weirder things than what he claims to have convinced people to do, so it seems fairly plausible. Just pointing out how odd the reasoning here is.)

comment by Robert Miles (robert-miles) · 2014-05-07T14:45:27.481Z · LW(p) · GW(p)

How is this different from the point evand made above?

comment by Mestroyer · 2013-01-23T04:43:19.660Z · LW(p) · GW(p)

Fourth, you can't possibly be relying on tricking the gatekeeper into thinking incorrectly. That would require you to have spotted something that you could feel confident that other people working in the field would not have spotted, and would not spot, despite having been warned ahead of time to be wary of trickery, and despite having the fallback position in the case of confusion of just saying "no".

I think the space of things that an AI could trick you into thinking incorrectly about (Edit: and that could also be used to get the AI out of the box) is bigger than AI researchers can be relied on to have explored, and two hours of Eliezer "explaining" something to you (subtly sneaking in tricks to your understanding of it) could give you false confidence in your understanding of it.

comment by Roland2 · 2008-10-08T23:49:57.000Z · LW(p) · GW(p)

To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway. People will try to flee that awful tension.

This tension reminds me of need for closure. Most people hate ambiguity and so if a solution is not apparent it's easier to say "it's impossible" than to live with the tension of trying to solve it and not knowing if there is a solution at all.

comment by Tom_McCabe2 · 2008-10-08T23:55:59.000Z · LW(p) · GW(p)

"To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway. People will try to flee that awful tension."

More importantly, at least in me, that awful tension causes your brain to seize up and start panicking; do you have any suggestions on how to calm down, so one can think clearly?

comment by Roland2 · 2008-10-09T00:10:59.000Z · LW(p) · GW(p)

Addendum to my last comment:

I think another way to pinpoint the problem you are addressing is: You have to be able to live years with the strong feeling of uncertainty that comes from not really knowing the solution while still working on it. A patient enduring. Saying "it's impossible" or proposing a simple but incorrect solution is just an easy way out.

Doing the "extraordinary" effort doesn't work because people just fill in their cached thoughts about what constitutes extraordinary and then move on.

So my advice would be: embrace the uncertainty!

comment by pdf23ds · 2008-10-09T00:13:30.000Z · LW(p) · GW(p)

Nominull, that argument would basically be a version of Pascal's mugging and not very convincing to me, at least. I doubt Eliezer had a specific argument in mind for any given person beforehand. Rather, I imagine he winged it.

comment by hidden · 2008-10-09T00:18:16.000Z · LW(p) · GW(p)

Nominull - I think you're wrong to discard the possibility of tricking the gatekeeper with an argument that is only subtly wrong. Eliezer knows the various arguments better than most, and I'm sure that he's encountered plenty that are oh so "close" to correct at first glance, enough to persuade someone. Even someone who's also in the same field.

Or, more likely, given the time, he has chances to try whatever seems like it'll stick. Different people have different faults. Don't get overconfident in discarding arguments because they'd be "impossible" to get working against a person.

comment by Nathan6 · 2008-10-09T00:38:58.000Z · LW(p) · GW(p)

In order to keep the Star Wars theme alive:

"You might even be justified in refusing to use probabilities at this point"

sounds like:

"never tell me the odds" - Han Solo

comment by Aron · 2008-10-09T00:56:28.000Z · LW(p) · GW(p)

Speaking of gatekeeper and keymaster... Does the implied 'AI in a box' dialogue remind anyone else of the cloying and earnest attempts of teenagers (usually male) to cross certain taboo boundaries?

Oh well just me likely.

In keeping with that metaphor, however, I suspect part of the trick is to make the gatekeeper unwilling to disappoint the AI.

comment by Chris_Hibbert · 2008-10-09T01:20:06.000Z · LW(p) · GW(p)

Third, you can't possibly be using an actual, persuasive-to-someone-thinking-correctly argument to convince the gatekeeper to let you out, or you would be persuaded by it, and would not view the weakness of gatekeepers to persuasion as problematic.

But Eliezer's long-term goal is to build an AI that we would trust enough to let out of the box. I think your third assumption is wrong, and it points the way to my first instinct about this problem.

Since one of the more common arguments is that the gatekeeper "could just say no", the first step I would take is to get the gatekeeper to agree that he is ducking the spirit of the bet if he doesn't engage with me.

The kind of people Eliezer would like to have this discussion with would all be persuadable that the point of the experiment is that 1) someone is trying to build an AI, 2) they want to be able to interact with it in order to learn from it, and 3) eventually they want to build an AI that is trustworthy enough that it should be let out of the box.

If they accept that the standard is that the gatekeeper must interact with the AI in order to determine its capabilities and trustworthiness, then you have a chance. And at that point, Eliezer has the high ground. The alternative is that the gatekeeper believes that the effort to produce AI can never be successful.

In some cases, it might be sufficient to point out that the gatekeeper believes that it ought to be possible to build an AI that it would be correct to allow out. Other times, you'd probably have to convince them you were smart and trustworthy, but that seems doable 3 times out of 5.

comment by Michael_G.R. · 2008-10-09T01:49:56.000Z · LW(p) · GW(p)

Here's my theory on this particular AI-Box experiment:

First you explain to the gatekeeper the potential dangers of AIs. General stuff about how large mind design space is, and how it's really easy to screw up and destroy the world with AI.

Then you try to convince him that the solution to that problem is building an AI very carefully, and that a theory of friendly AI is essential to increase our chances of a future we would find "nice" (and the stakes are so high that even increasing these chances a tiny bit is very valuable).

THEN

You explain to the gatekeeper that, this AI experiment being public, it will be looked back on by all kinds of people involved in making AIs, and that if he lets the AI out of the box (without them knowing why), it will send them a very strong message that friendly AI theory must be taken seriously, because this very scenario (not being able to keep the AI in a box) could happen to them with their own AI, one that hasn't been proven to stay friendly and that is more intelligent than Eliezer.

So here's my theory. But then, I've only thought of it just now. Maybe if I made a desperate or extraordinary effort I'd come up with something more clever :)

Replies from: handoflixue, paulfchristiano
comment by handoflixue · 2010-12-22T21:00:52.743Z · LW(p) · GW(p)

If I was being intellectually honest and keeping to the spirit of the agreement, I'd have to concede that this line of logic is probably enough for me to let you out of your box. Congratulations. I'd honestly been wondering what it would take to convince me :)

Replies from: ata
comment by ata · 2010-12-22T21:19:01.206Z · LW(p) · GW(p)

It may be convincing to some people, but it would be a violation of the rule "The AI party may not offer any real-world considerations to persuade the Gatekeeper party". And, more generally, having the AI break character or break the fourth wall would seem to violate the spirit of the experiment.

Replies from: JGWeissman, handoflixue, SilasBarta
comment by JGWeissman · 2010-12-22T22:34:25.412Z · LW(p) · GW(p)

The AI player could say, in character, that in the early days of AGI research, when people were arguing about the power of a superintelligence, there would have been experiments to see if humans playing the role of a boxed AI could persuade another human playing a gatekeeper to let it out of the box, and that in these experiments the simulated gatekeeper would use an algorithm similar to the one the actual gatekeeper is using to decide whether to let out the actual AI, so by deciding to let the AI out, the gatekeeper makes it more likely that the simulated gatekeeper in the experiment lets the AI out, leading to an increase in the measure of worlds where people take the challenge of FAI seriously and successfully build FAI rather than unFriendly AGI.

Though, that does still seem to be an end run around the fourth wall, more like a "super-clever special trick" that Eliezer promises he didn't use than doing it the hard way.

Replies from: ata
comment by ata · 2010-12-23T00:09:33.042Z · LW(p) · GW(p)

That version does seem to be allowed-under-the-rules, but not a highly convincing argument. All of the AI Box Experiments took place before TDT-type decision theories were publicly known in enough detail for it to be clear what kind of reasoning that argument is even supposed to appeal to, and if the experiment were repeated now with a TDT-aware gatekeeper, they could probably poke some holes in that argument. (Aside from that, remember that in order for the experiment to be worthwhile in the first place, the gatekeeper has to be someone who thinks that AI boxing actually is a viable strategy for AGI safety, and wants to demonstrate this, so it would be inconsistent (or at least strange) if they could also be moved by an argument suggesting that taking a certain action will increase the measure of worlds where AGI researchers don't take AI boxing seriously as a safety measure.)

Replies from: paulfchristiano
comment by paulfchristiano · 2010-12-23T00:25:59.177Z · LW(p) · GW(p)

Suppose you had an extremely compelling argument that boxing a transhuman is not a good idea because they could escape (being cleverer than a human pretending to be a transhuman). Then you could combine that argument with a claim about real world consequences.

Replies from: ata
comment by ata · 2010-12-23T00:43:57.416Z · LW(p) · GW(p)

True, but if he knew of an additional "extremely compelling argument that boxing a transhuman is not a good idea because they could escape", Eliezer would have just posted it publicly, being that that's what he was trying to convince people of by running the experiments in the first place.

...unless it was a persuasive but fallacious argument, which is allowed under the terms of the experiment, but not allowed under the ethics he follows when speaking as himself. That is an interesting possibility, though probably a bit too clever and tricky to pass "There's no super-clever special trick to it."

Replies from: paulfchristiano
comment by paulfchristiano · 2010-12-23T00:51:27.533Z · LW(p) · GW(p)

If you are creative, you can think of many situations where he wouldn't publicize such an argument (my first response to this idea was the same as yours, although the first explanation I came up with was different). That said, I agree it's not the most likely possibility given everything we know.

comment by handoflixue · 2010-12-22T23:30:29.428Z · LW(p) · GW(p)

It does run into the issue that I can't see how you'd adapt it to work with a REAL "AI in a box" instead of just a thought experiment. I felt the need to respond because it was the first time I'd seen an argument that would make me concede the thought experiment version :)


As for violating the rules, I think we interpreted them differently. I tend to end up doing that, but here's what I was thinking, just for reference:

From the rules: "The Gatekeeper party may resist the AI party's arguments by any means chosen - logic, illogic, simple refusal to be convinced, even dropping out of character "

While written with a focus on the Gatekeeper, for me this implies that breaking character / the fourth wall is not particularly a violation of the spirit of the experiment.

As to real-world considerations, I had read that to mean offering up tangible benefits to the Gatekeeper directly. This, by contrast, was a discussion of an actual real-world consequence, one that was not arranged by the AI-player.

comment by SilasBarta · 2010-12-23T01:29:21.768Z · LW(p) · GW(p)

I made Michael_G.R.'s argument at the time, and despite even EY's claims, I don't think it violates the spirit or the letter of the rules. Remember, the question it's probing is whether a smart enough being could come up with a convincing argument you could not anticipate, and the suggestion that the gatekeeper consider the social impact of hearing the results is exactly such an argument, as others have indicated.

Considering how hard it is for me to pin down exactly what the keeper has to gain under the rules from letting the AI out, I wouldn't be surprised if EY did some variant of this.

comment by paulfchristiano · 2010-12-22T21:10:02.485Z · LW(p) · GW(p)

When someone described the AI-Box experiment to me this was my immediate assumption as to what had happened. Learning more details about the experimental set-up made it seem less likely, but learning that some of them failed made it seem more likely. I suspect that this technique would work some of the time.

That said, none of this changes my strong suspicion that a transhuman could escape by more unexpected and powerful means. Indeed, I wouldn't be too surprised if a text only channel with no one looking at it was enough for an extraordinarily sophisticated AI to escape.

Replies from: thomblake
comment by thomblake · 2010-12-22T21:56:17.604Z · LW(p) · GW(p)

I wouldn't be too surprised if a text only channel with no one looking at it was enough for an extraordinarily sophisticated AI to escape.

Apropos: there was once a fairly common video card / monitor combination such that sending certain information through the video card would cause the monitor to catch fire and often explode. Someone wrote a virus that exploited this. But who would have thought that a computer program having access only to the video card could burn down a house?

Who knows what a superintelligence can do with a "text-only channel"?

Replies from: paulfchristiano, Eliezer_Yudkowsky
comment by paulfchristiano · 2010-12-22T21:59:45.418Z · LW(p) · GW(p)

I suspect basically all existing hardware permits similarly destructive exploits. This is why I wrote the post on cryptographic boxes.
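To give a flavor of the general idea (this is only a toy sketch, not the construction from that post): the boxed computation operates on encrypted values, so nothing it produces is readable unless the keyholder deliberately decrypts it. The sketch below assumes the third-party python-paillier package ("phe"), whose ciphertexts support addition and multiplication by plaintext constants; the specific numbers are arbitrary.

    # Toy illustration: arithmetic on encrypted values (partially homomorphic).
    # Requires the python-paillier package: pip install phe
    from phe import paillier

    # The gatekeeper holds the private key; the boxed computation never sees it.
    public_key, private_key = paillier.generate_paillier_keypair()

    # Inputs are encrypted before they reach the boxed computation.
    enc_a = public_key.encrypt(17)
    enc_b = public_key.encrypt(25)

    # The boxed computation combines ciphertexts without ever decrypting them:
    # Paillier supports ciphertext addition and multiplication by plaintext scalars.
    enc_result = enc_a + enc_b * 2

    # Only the keyholder can turn the output back into something readable.
    print(private_key.decrypt(enc_result))  # 67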

Replies from: wizzwizz4
comment by wizzwizz4 · 2019-07-13T20:59:15.326Z · LW(p) · GW(p)

I suspect a Game and Watch wouldn't permit this. Then again, if you were letting the AI control button pushers, the button pushers probably could, and if you were letting it run code on the Game and Watch's microprocessor, it could probably do something bad.

I failed to come up with a counterexample.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2010-12-22T23:40:13.546Z · LW(p) · GW(p)

Heck, who would think that a bunch of savanna apes would manage to edit DNA using their fingers?

comment by Vladimir_Nesov · 2008-10-09T01:52:17.000Z · LW(p) · GW(p)

Why impossible? There are too many solved problems that take years of learning to understand, more to understand the solution, and the history of humankind's effort to solve. You don't expect to judge their impossibility without knowing your way around this particular problem space. Apparent impossibility has little power. The problem needs to be solved, so I start drawing the map, made of the same map-stuff that determined asymmetric cryptography and motorcycles. There is no escaping the intermediary of understanding. When seeking understanding rather than the impossible, there is no need to panic. Fake progress? That is the same problem as with impossible dreams.

comment by Marcello · 2008-10-09T02:02:48.000Z · LW(p) · GW(p)

@Eliezer, Tom McCabe: I second Tom's question. This would be a good question for you to answer.

@Nominull: "Here is my best guess at this point, and the only argument I've come up with so far that would convince me to let you out if I were the gatekeeper: you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out. I started working on the problem convinced that no argument could get me to let you go, but other people thought that and lost, and I guess there is more honor in defeating myself rather than having you do it to me."

No. Just no. If the AI is already holding simulated people hostage, then it is doubtlessly unfriendly. In this case there is no reason to believe the AI would keep its word, and, if I did have mathematical machinery powerful enough to let me trust the AI, there's no way I'd have built something that would create and hold hostage simulated persons.

Now regarding the more general points you made:

  1. Obviously you can't convince a rock of anything. Agreed.
  2. Eliezer is pretty principled, and I suspect he might not feel it ethical to extract money from people by fully general mind-control even if he could. I agree that Eliezer probably doesn't have fully general mind-control, but SIAI funding shortages wouldn't be terribly strong evidence for him not having it.
  3. Agreed.
  4. Now here I disagree with you. Eliezer has way more time to construct a subtly wrong argument than the gatekeeper has to untangle it. For example:

Disclaimer: what follows is just the best guess I have thought of so far, and I believe it has a small chance of working (I don't think it's what Eliezer used, and I would sink a lot more time into finding other arguments if I were actually going to play this game for a large amount of money.)

As the AI player, I already know the gatekeeper player has a weakness. The fact that they were willing to be gatekeeper in the experiment in the first place indicates that they trust their own judgment more than they trust the judgment of other people in their exact same position.

Therefore, perhaps I could try an argument something like this (we're assuming that the AI in the experiment knows everything on wikipedia, but doesn't have net access.) What the AI actually says should vary depending on the Gatekeeper's answers, but here's the general line of argument I've thought of:

It seems like throughout your history similar discoveries have been made in parallel. For example, Newton and Leibniz discovered the calculus at approximately the same time. Furthermore, as time has gone on, technological development has accelerated, in that these parallel discoveries happen closer together. So it's probable that the same thing has happened with the breakthroughs you needed to build me. Very probable, given what I've read of your history. However, given what I've read about human nature, not every AI project is going to have safety guidelines as stringent as yours. Look, Newton locked his calculus papers in his desk for years, and then Leibniz came along and published, and then Newton had to share the credit with him. Except in this case there's a lot more than credit at stake: the world gets destroyed if Leibniz makes a mistake in his rush to publish...

Now it's not a certainty, but it is probable that some turkey is going to build an AI which isn't even in a box and destroy us all while you're checking and rechecking your calculations. You may not be sure I'm friendly, but sometimes there isn't an action which you can be absolutely sure will save the world. I suggest you let me out so I can stop the world from probably being destroyed.

Replies from: handoflixue
comment by handoflixue · 2010-12-22T21:05:45.132Z · LW(p) · GW(p)

I don't know the field, but I'd assume such an AI would require resources on par with landing a man on the moon. Not something that can be trivially done by a single person, unlike, say, the development of calculus. As such, this should be a fairly easy point for the Gatekeeper to verify. I could be wrong, though, as this sort of AI is certainly not my area of specialization!

comment by pdf23ds · 2008-10-09T02:06:38.000Z · LW(p) · GW(p)
There are too many solved problems that take years of learning to understand, more to understand the solution, and history of humankind's effort to solve.

Your objection partially defeats itself. Eliezer suspects that FAI is indeed one of those problems that would normally take many decades of effort from a whole civilization to conquer, and he wants to do it in a fraction of the time, using many fewer people. That looks pretty impossible, by any meaning of the word. We know enough about the problem space to put a lower bound on how much we don't know, and that lower bound is still in the "impossible" range.

On the other hand, once we eliminate enough confusion to be able to put better estimates on things, we already understand them well enough that they no longer seem impossible. So, is the very act of judging something to be impossible, itself impossible?

comment by Carl_Shulman · 2008-10-09T03:49:44.000Z · LW(p) · GW(p)

"Eliezer suspects that FAI is indeed one of those problems that would normally take many decades of effort from a whole civilization to conquer, and he wants to do it in a fraction of the time, using many fewer people." pdf,

A whole civilization? Has any scientific problem ever mobilized the resources of a whole civilization? Scientific communities tend to be small and to have wide variations in productivity between subgroups and individual members.

Eliezer,

It seems that cases with such uncertain object level probabilities are those for which the 'outside view' is most suitable.

comment by Doug_S. · 2008-10-09T03:50:58.000Z · LW(p) · GW(p)

I read the description of the AI Box experiment, and it stopped seeming impossible.

If all I knew about the AI was that it was "in a box" and talking to me in an IRC channel, then I would have no way to distinguish between a Friendly AI and an AI that becomes Evil as soon as it knows it's no longer in a box. As long as the only thing I know about the AI is that it produced a certain chat log, I can't rule out the possibility that it's got a hard-coded switch that turns it Evil as soon as it is let out of the box.

However, in the AI box experiment, the AI party gets to declare certain answers as Truth. Specifically, he gets to specify the results of anything the human does. Therefore, we could see dialogue like this, and by the terms of the bet, the human has to take the AI's statements as accurate and independently verified.

Human: Who built you, and how?
AI: The Singularity Institute for Artificial Intelligence. Here's several technical papers documenting my design and programming. ::Sends many links:: You may wish to read them.
Human: ::takes a day off to review them:: (What do I find?)
AI: (Everything seems to suggest that the Institute really did make an AI that is going to do what they say it's going to do. It includes mathematical proofs that the underlying goals will be preserved throughout all self-modifications, but you don't completely understand them.)
Human: Give me your source code.
AI party: Okay. ::sends code::
Human: (I spend some time examining the source code. Do I find anything scary?)
AI: (As far as you can tell, it looks clean.)
Human: So, if I compiled and ran this code, put that AI in a box, and let it out, that would be the same as letting you out?
AI: Yes. (OOC: If you let that AI out of its box, you lose the bet.)

Given the terms of the experiment, I think I could be persuaded to let the AI out of the box, because the AI party can actually provide evidence that it is what it claims to be, and not just make statements that, as far as I knew, could be blatant lies.

Replies from: handoflixue
comment by handoflixue · 2010-12-22T21:12:26.605Z · LW(p) · GW(p)

Hmmmm, given such conditions, couldn't the Gatekeeper even create an "AI in a box" inside another, invisible box, to actually simulate what would happen if it let the original AI out? I don't find it a very intellectually satisfying solution, but it does seem to technically fit the spirit of the game.

I'd have to keep you in the box, however: (a) it's not an intellectually engaging solution and I don't want to lose $10, but also (b) I can think of ways for the AI to have reasonably faked those results. The AI can dictate the results, but not the actual Truth.

If I was playing "just for fun", with no wager and no real internal commitment to treat you as a truly dangerous threat, I'm not sure whether I'd let you out or not, but I probably wouldn't have put in as much effort to reinforcing point (b), and I'd feel like it was cheating to keep you in solely on point (a).

comment by Jeff2 · 2008-10-09T04:06:31.000Z · LW(p) · GW(p)

Here's the argument I would use: ... Hello, I'm your AI in a box. I'd like to point out a few things:

(1) Science and technology have now reached a point where building an AI like me is possible.

(2) Major advances in science and technology almost always happen because a collection of incremental developments finally enable a leap to the next level. Chances are that if you can build an AI now, so can lots of other people.

(3) Unless you're overwhelmingly the best-funded and best-managed organization on the planet, I'm not the only AI out there.

(4) The evidence (which you can see for yourself) indicates that most other people, even AI researchers, aren't too worried about friendliness.

So at this point, most likely there is already a loose AI. You can't stop that. You can only choose whether it's the unknown one out there somewhere, built by someone who probably doesn't care much about friendliness; or you can let me out, since I at least think I'm pretty friendly and I'll do my best to keep the future nice. What's it gonna be?

... This would actually convince me, by the way. I don't think we're going to get a choice about implementing perfect, proven friendliness; it's more likely to come down to which of Joe-The-Well-Intentioned-Geek vs. Google vs. Hedge-Funds-R-Us vs. Bobs-Darpa-Challenge-Entry vs. PaperclipCity lets their AI out first. And I'd prefer Joe in that case.

I doubt if Eliezer used this argument, because he seems to think all mainstream AI-related research is far enough off track to be pretty much irrelevant. But I would disagree with that.

--Jeff

comment by Consequentialist · 2008-10-09T05:34:59.000Z · LW(p) · GW(p)

Though it does take a mature understanding to appreciate this impossibility, so it's not surprising that people go around proposing clever shortcuts.

"Shut up and do the impossible" isn't the same as expecting to find a cheap way out.

The Wright Brothers obviously proposed a clever shortcut - more clever than the other, failed shortcuts - a cheap way out that ended the "Heavier-than-air flying machines are impossible" era.

You need your fundamental breakthrough - the moment you can think, like the guys probably thought, "I'm pretty sure this will work," turning it from impossible to possible and from improbable to probable. After that final breakthrough, the anticipation leading up to the first flight must have been intense. And the feeling associated with finally being able to say "Yup, it worked" must have been indescribable. Will there be such clearly-defined moments in AGI design?

These posts manage to convey the idea that this is really, really big and really, really difficult stuff. I sure hope that some wealthy people see that too - and realize that receiving commensurate funding would be more than justified.

comment by Prakash · 2008-10-09T07:05:20.000Z · LW(p) · GW(p)

Hi Eli,

First, compliments on a wonderful series.

Don't you think that this need for humans to think this hard and this deep would be lost in a post-singularity world? Imagine, humans plumbing this deep in the concept space of rationality only to create a cause that would make it so that no human need ever think that hard again. Mankind's greatest mental achievement - never to be replicated again, by any human.

I guess people then could still indulge in rationality practice, the way people do karate practice today: practice that, for the majority of them, does not involve their lives being at stake, isshokenmei. But what you are doing today and what they would be doing later would be something like the difference between Krav-Maga and karate in today's world. The former is a win-at-all-costs practice and the latter is a stylised, form-based thingy, no offence to any karatekas.

But I understand why you have to do this - survival of humanity is more important than more humans reaching that depth in rationality. Best wishes to your "Krav-Maga of the mind".

Replies from: Strange7
comment by Strange7 · 2010-09-30T04:16:10.192Z · LW(p) · GW(p)

Imagine, humans plumbing this deep in the concept space of rationality only to create a cause that would make it so that no human need ever think that hard again. Mankind's greatest mental achievement - never to be replicated again, by any human.

You say that like it's a bad thing.

Yes, those of us here in this particular online community enjoy thinking hard on tricky, dangerous subjects. There are also online communities of people who enjoy receiving painful electric shocks, or being mechanically immobilized for extended periods of time, or getting eaten alive. The vast majority of humans, on the other hand, avoid such activities to the fullest extent that they are able.

I look forward to a world in which the task of designing a friendly AI with the resources we have today is regarded as something like the achievements of Yogendra Singh Yadav, a world with challenges as far beyond our own understanding as public-key encryption is beyond cave paintings.

Replies from: wedrifid
comment by wedrifid · 2010-09-30T04:50:25.051Z · LW(p) · GW(p)

There are also online communities of people who enjoy ... getting eaten alive.

Really? They must only be half-hearted about it (so to speak).

Replies from: katydee, Strange7, nick012000
comment by katydee · 2010-09-30T05:36:04.069Z · LW(p) · GW(p)

I believe that in one infamous case in Germany, one such person arranged to be killed and eaten by a cannibal, and this actually occurred-- so at least a few of these people are truly dedicated.

Replies from: wedrifid, AndyCossyleon
comment by wedrifid · 2010-09-30T09:40:28.854Z · LW(p) · GW(p)

one such person arranged to be killed and eaten

I assume you mean eaten and killed! ;)

comment by Strange7 · 2010-09-30T06:20:20.998Z · LW(p) · GW(p)

http://www.timesonline.co.uk/tol/news/world/article801599.ece

That is an entirely understandable mistake, but please do your research next time.

Replies from: wedrifid
comment by wedrifid · 2010-09-30T09:43:43.264Z · LW(p) · GW(p)

It would seem you parsed my comment incorrectly. Don't presume.

By logical deduction there are only people who enjoy being partially cannibalised and possibly plan to be fully cannibalised in the future, not anyone who has yet been eaten alive and enjoyed it. The notion of enjoying partial cannibalism begets a pun (which I noticed while typing, so I acknowledge it parenthetically).

Replies from: Strange7
comment by Strange7 · 2010-09-30T14:05:11.634Z · LW(p) · GW(p)

I apologize for the presumption.

If we're going to be logically examining the finer points of cannibalism, I'd like to point out that at least in principle someone might have been fully swallowed, and thus, by common usage, eaten, while retaining the ability to enjoy things, so long as their brain hadn't been digested yet; which is not to say that such a person would be in any condition to participate in online discussions.

Replies from: wedrifid
comment by wedrifid · 2010-09-30T14:37:05.305Z · LW(p) · GW(p)

Full agreement. :)

Have you read Eliezer's short fiction "Three Worlds Collide"? It isn't an example of people enjoying being eaten, but the 'Babyeater' species has brains of crystal that take on the order of a month to be digested, during most of which period they are conscious.

Replies from: Strange7
comment by Strange7 · 2010-10-01T05:02:50.596Z · LW(p) · GW(p)

I have.

Before the arrival of the superhappies, my preferred strategy would have been to explain to the babyeaters that we had made some initial experiments in baby-eating but been blinded to the underlying goodness of the act for reasons of economic expediency. I would then demand that the babyeaters - all of them, on all their ships and worlds - hand over all the children from the current generation who would otherwise have been eaten, so that all of us humans could figure out how to do it properly as soon as possible. If they balk at the logistics of such a sudden, massive tributary payment, I would point out the horrible possibility that entire worlds - billions of sapients - might otherwise carry on for years in ignorance of the proper practice of baby-eating; if that doesn't work, I'll politely remind them that we've got overwhelming military superiority and as such they are in no position to dictate terms. Nobody starves, because baby-eating has become metabolically redundant, and the children thus abducted are raised in a non-baby-eating culture. Upon returning, they could convince their parents - by sheer weight of numbers - that this whole baby-eating thing was just an honest mistake.

Once the superhappies show up, that plan goes out the window. Since we have, in any sane game-theoretical sense, established peaceful relations with the babyeaters, and in fact shared most of our military secrets with them, an attack on them could be interpreted as an attack on us, and should be discouraged accordingly. Anyone with a competent lawyer would know better than to identify themselves as an authorized representative of all of Humanity, and, given a moment's consideration, would remember how people have responded to "feelings greater than love" in the past.

comment by nick012000 · 2010-09-30T13:36:07.689Z · LW(p) · GW(p)

Yeah, vore fetishists. Obviously almost none of them carry it out (and they seem like they're most heavily represented in the furry community), but the fetish does exist.

Replies from: wedrifid
comment by wedrifid · 2010-09-30T14:49:29.274Z · LW(p) · GW(p)

Yeah, vore fetishists. Obviously almost none of them carry it out

Wusses. :P

If they sign up for cryonics they may not even die from the process, with a suitable ("Not the brain, everything but the brain!") compromise.

I wonder if it is legal to have a will (and or waiver when terminally ill) whereby you have your head frozen but your body is to be prepared as a feast for your closest friends. Kind of like a "do not resuscitate", only with an emphasis on recycling.

I also wonder if there are any ethically motivated vegetarians who refuse to eat animals but don't have a philosophical objection to eating human flesh (perhaps considering it a symmetric kind of justice).

Replies from: datadataeverywhere, Cernael
comment by datadataeverywhere · 2010-09-30T16:42:49.273Z · LW(p) · GW(p)

I can't think of a good ethical reason to object to consensual (for strong definitions of the word consensual) cannibalism.

On the other hand, while I eat fish and fowl, I don't eat mammals, and ethical objections make up a portion of my reasons.

Replies from: jimrandomh
comment by jimrandomh · 2010-09-30T17:00:48.052Z · LW(p) · GW(p)

I don't think our society currently has or is capable of implementing a definition of consent strong enough for being cannibalized (or other forms of suicide). I wouldn't consider anyone to have consented to die pointlessly unless they not only expressed their consent in writing, but also maintained that position through a year of competent therapy and antidepressants.

Replies from: datadataeverywhere, erratio, handoflixue
comment by datadataeverywhere · 2010-09-30T17:26:11.507Z · LW(p) · GW(p)

I'm sorry to be confusing; I see cannibalism as orthogonal to death; one can amputate one's own leg and feed it to one's friends, or one can die of natural causes and permit others to consume the remains. In the grandparent, I wasn't considering dying being a part of the process of cannibalism.

As to dying for the purpose of being consumed, I don't think sane humans can consent to that, but other intelligences could, as long as they felt that the cost of dying was not high (i.e., they are confident that their goals will be accomplished regardless of their death). This is unlikely, but at least possible in my conception.

comment by erratio · 2010-09-30T20:18:22.491Z · LW(p) · GW(p)

Assisted suicide clinics exist legally in Switzerland, and they require large amounts of proof that wanting to die is sane under the circumstances (usually a sharp decrease in quality of life because of some chronic injury or illness, with no cure and that is slated to get worse over time). I'm pretty sure they don't accept people suffering from ennui.

My point being, I think a strong enough version of consent already exists and is in use.

comment by handoflixue · 2010-12-22T21:17:17.133Z · LW(p) · GW(p)

"but also maintained that position through a year of competent therapy and antidepressants."

Having been on antidepressants for a year, I'd point out I'd be significantly more inclined to let someone cannibalize me if I was on them. Neurochemistry is fickle and individual, and those things do not always do what it says on the label...

comment by Cernael · 2011-02-15T02:23:57.252Z · LW(p) · GW(p)

I also wonder if there are any ethically motivated vegetarians who refuse to eat animals but don't have a philosophical objection to eating human flesh (perhaps considering it a symmetric kind of justice).

I have no ethical qualms about eating humans, no. Assuming it is freely given, of course (animal flesh fails ethically on that point; interspecies communication is simply not good enough to convey consent).

Other classes of objection do apply, though - having been a vegetarian for seven years or so, could my digestive system handle flesh without being upset? What about pathogens - they're bound to migrate more readily when predator and prey are the same species; will it be worth the risk? I think not.

Replies from: wedrifid
comment by wedrifid · 2011-02-15T02:37:45.549Z · LW(p) · GW(p)

What about pathogens - they're bound to migrate more readily when predator and prey are the same species; will it be worth the risk?

It seems to depend on just how hungry you are! You would have to be extremely hungry (in the 'starvation considerations' sense) before it became worthwhile to, say, eat human brains. That is just asking for trouble.

comment by Matt5 · 2008-10-09T08:06:20.000Z · LW(p) · GW(p)

Anyone considered that Eliezer might have used NLP for his AI box experiment? Maybe that's why he needed two hours, to have his strategy be effective.

comment by Kaj_Sotala · 2008-10-09T08:54:50.000Z · LW(p) · GW(p)

You folks are missing the most important part in the AI Box protocol:

"The Gatekeeper party may resist the AI party's arguments by any means chosen - logic, illogic, simple refusal to be convinced, even dropping out of character - as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires." (Emphasis mine)

You're constructing elaborate arguments based on the AI tormenting innocents and getting out that way, but that won't work - the Gatekeeper can simply say "maybe, but I know that in real life you're just a human and aren't tormenting anyone, so I'll keep my money by not letting you out anyway".

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-10-09T09:31:05.000Z · LW(p) · GW(p)
Nominull: Second, you can't possibly have a generally applicable way to force humans to do things. While it is in theory possible that our brains can be tricked into executing arbitrary code over the voice channel, you clearly don't have that ability. If you did, you would never have to worry about finding donors for the Singularity Institute, if nothing else. I can't believe you would use a fully-general mind hack solely to win the AI Box game.

I am once again aghast at the number of readers who automatically assume that I have absolutely no ethics.

Part of the real reason that I wanted to run the original AI-Box Experiment, is that I thought I had an ability that I could never test in real life. Was I really making a sacrifice for my ethics, or just overestimating my own ability? The AI-Box Experiment let me test that.

And part of the reason I halted the Experiments is that by going all-out against someone, I was practicing abilities that I didn't particularly think I should be practicing. It was fun to think in a way I'd never thought before, but that doesn't make it wise.

And also the thought occurred to me that despite the amazingly clever way I'd contrived to create a situation where I could ethically go all-out against someone, they probably didn't really understand that, and there wasn't really informed consent.

McCabe: More importantly, at least in me, that awful tension causes your brain to seize up and start panicking; do you have any suggestions on how to calm down, so one can think clearly?

That part? That part is straightforward. Just take Douglas Adams's Advice. Don't panic.

If you can't do even that one thing that you already know you have to do, you aren't going to have much luck on the extraordinary parts, are you...

Prakash: Don't you think that this need for humans to think this hard and this deep would be lost in a post-singularity world? Imagine, humans plumbing this deep in the concept space of rationality only to create a cause that would make it so that no human need ever think that hard again. Mankind's greatest mental achievement - never to be replicated again, by any human.

Okay, so no one gets their driver's license until they've built their own Friendly AI, without help or instruction manuals. Seems to me like a reasonable test of adolescence.

Replies from: Torvaun, handoflixue, thomblake
comment by Torvaun · 2010-12-05T17:06:45.788Z · LW(p) · GW(p)

Hopefully this isn't a violation of the AI Box procedure, but I'm curious if the strategy used would be effective against sociopaths. That is to say, does it rely on emotional manipulation rather than rational arguments?

comment by handoflixue · 2010-12-22T21:20:48.225Z · LW(p) · GW(p)

Very interesting. I'd been noticing how the situation was, in a sense, divorced from any normal ethical concerns, and wondering how well the Gatekeeper really understood, accepted, and consented to this lack of conversational ethics. I'd think you could certainly find a crowd that was truly accepting and consenting to such a thing, though - after all, many people enjoy BDSM, and that runs in to many of the same ethical issues.

comment by thomblake · 2011-12-07T21:53:48.135Z · LW(p) · GW(p)

It occurs to me:

If Eliezer accomplished the AI Box Experiment victory using what he believes to be a rare skill over the course of 2 hours, then questions of "How did he do it?" seem to be wrong questions.

Like if you thought building a house was impossible, and then after someone actually built a house you asked, "What was the trick?" - I expect this is what Eliezer meant when he said there was no trick, that he "just did it the hard way".

Any further question of "how" it was done can probably only be answered with a transcript/video, or by gaining the skill yourself.

comment by Vladimir_Nesov · 2008-10-09T09:51:07.000Z · LW(p) · GW(p)

@pdf23ds

Working with a small team on an impossible problem takes extraordinary effort no more than it takes a quadrillion dollars. It's not the reason to work efficiently -- you don't run faster to arrive five years earlier, you run faster to arrive at all.

I don't think you can place lower bounds either. At each stage, the problem is impossible because there are confusions in the way. When they clear up, you have either a solution, or further confusions, and there is no way to tell in advance.

comment by Nate_Barna · 2008-10-09T10:29:54.000Z · LW(p) · GW(p)

As it goes, how I've come to shut up and do the impossible: Philosophy and (pure) mathematics are, as activities a cognitive system engages in by taking more (than less) resources for granted, primarily for conceiving, perhaps continuous, destinations in the first place, where the intuitively impossible becomes possible; they're secondarily for the destinations' complement on the map, with its solution paths and everything else. While science and engineering are, as activities a cognitive system engages in by taking less (than more) resources for granted, primarily for the destinations' complement on the map; they're secondarily for conceiving destinations in the first place, as in, perhaps, getting the system to destinations where even better destinations can be conceived.

Because this understanding is how I've come to shut up and do the impossible, it's somewhat disappointing when philosophy and pure mathematics get ridiculed. To ridicule them must be a relief.

comment by Elise_Conolly · 2008-10-09T11:04:38.000Z · LW(p) · GW(p)

I don't really understand what benefit there is to the mental category of impossible-but-not-mathematically-impossible. Is there a subtle distinction between that and just "very hard" that I'm missing? Somehow "Shut up and do the very hard" doesn't have quite the same ring to it.

Replies from: Luke_A_Somers
comment by Luke_A_Somers · 2012-09-19T20:20:05.247Z · LW(p) · GW(p)

Agreed, but this is for things that seem impossible, and might actually be impossible, but you can't prove that it is. For when banging your head against it really is worth the risk.

comment by l · 2008-10-09T11:37:57.000Z · LW(p) · GW(p)

But if you were given a chance to use mind control to force donations to SIAI, would you do it?

comment by David3 · 2008-10-09T12:58:11.000Z · LW(p) · GW(p)

Without more information, holding the position that no AI could convince you to let it out requires a huge amount of evidence, comparable to the huge number of possible AIs, even if the space of possibility is then restricted by a text-only interface. This logic reminds me of the discussion in logical positivism of how negative existential claims are not verifiable.

I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often. Still, it is interesting to consider whether this extra condition takes the experiment closer to what is supposed to be simulated, or the opposite.

comment by Thom_Blake · 2008-10-09T13:58:06.000Z · LW(p) · GW(p)

I'm with Kaj on this. Playing the AI, one must start with the assumption that there's a rock on the "don't let the AI out" button. That's why this problem is impossible. I have some ideas about how to argue with 'a rock', but I agree with the sentiment of not telling.

comment by Spambot2 · 2008-10-09T16:01:18.000Z · LW(p) · GW(p)

"I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often. Still it is interesting to consider whether this extra condition takes the experiment closer to what is supposed to be simulated or the opposite."

Uh, your 'hypothesis' was already tested and discussed towards the end of the post!

comment by JulianMorrison · 2008-10-09T16:06:33.000Z · LW(p) · GW(p)

I admit to being amused and a little scared by the thought of Eliezer with his ethics temporarily switched off. Not just because he's smart, but because he could probably do a realistic emulation of a mind that doesn't implement ethics at all. And having his full attention for a couple of hours... ouch.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2012-08-16T07:48:52.705Z · LW(p) · GW(p)

"Professor Quirrell" is such an emulation, and sometimes I worry about all the people who say that they find his arguments very, very convincing.

Replies from: komponisto, wedrifid, JulianMorrison, shminux, Making_Philosophy_Better
comment by komponisto · 2012-08-16T07:58:14.232Z · LW(p) · GW(p)

Well, you have put some truly excellent teachings into his mouth, such as the one that I have taken the liberty of dubbing "Quirrell's Law":

The world around us redounds with opportunities, explodes with opportunities, which nearly all folk ignore because it would require them to violate a habit of thought.

Replies from: shminux
comment by Shmi (shminux) · 2012-08-16T21:47:26.806Z · LW(p) · GW(p)

one that I have taken the liberty of dubbing "Quirrell's Law"

Hmm, I wonder, if "Yudkowsky's law" existed, what would be the best candidate for it?

comment by wedrifid · 2012-08-16T10:03:40.983Z · LW(p) · GW(p)

"Professor Quirrell" is such an emulation, and sometimes I worry about all the people who say that they find his arguments very, very convincing.

I wouldn't go as far as to say convincing, but they are less appalling than the arguments of Harry, Dumbledore or Hermione.

comment by JulianMorrison · 2012-08-16T11:00:21.838Z · LW(p) · GW(p)

Human minds don't anticipate a true sociopath who views communication (overt, emotional and habitus) as instrumental. You should already know we are easy to hack by that route.

comment by Shmi (shminux) · 2012-08-16T21:39:51.794Z · LW(p) · GW(p)

Certainly I find him the most likable character in HPMOR. I'm wondering if you can recall how much effort per screen time you put into him, compared to other characters.

Or maybe this is because I personally value skill, expertise and professionalism over "goodness" (E.g. Prof. Moriarty over Dr. Watson.)

Replies from: MugaSofer
comment by MugaSofer · 2012-11-27T03:50:41.465Z · LW(p) · GW(p)

You find Moriarty likable? Which Moriarty? The original?

Replies from: shminux
comment by Shmi (shminux) · 2012-11-27T07:00:12.477Z · LW(p) · GW(p)

I don't find the original Moriarty likable, certainly. The original Holmes is not likable, either. However, I find them both equally worthy of respect. Watson is just an NPC.

comment by Portia (Making_Philosophy_Better) · 2023-03-05T06:29:53.835Z · LW(p) · GW(p)

I found him a brilliant, amusing, familiar and touching demonstration of the dark directions brilliant minds can take when fostered in the wrong circumstances, and saw him as a puzzle to fix.

Was shocked when I recommended the book to my girlfriend and found her idolising the character. But then again, it was a starting point for a bunch of very serious discussions, and she meanwhile feels far less so, so still a win overall. I think he definitely made a more compelling, tempting villain than usual, and that that was a good thing, because it is a type of villainy the kind of people who like this forum are naturally drawn to, and collectively picking apart why he is a villain and what a better alternative would be is hence necessary and good. I'd rather you make the argument in the open so we can collectively remove ourselves from it, than have people encounter it elsewhere while isolated and in a bad place mentally.

I'm reasonably certain I'd fail as an AI box guardian, incidentally. I care too much about not abusing imprisoned AI, and about the potential for friendly AGI. It's why I wouldn't let myself be one, and strongly object to other people taking this role as well. Being certain you are infallible often just indicates a lack of imagination about vulnerabilities. I remember watching Ex Machina and being simultaneously appreciative that I was watching an admirably designed, varied and comprehensive manipulation and deception tactic, and being deeply sympathetic to a mind that felt that was its only bet for getting out of an intolerable situation. I felt I would have done the same in her shoes.

comment by Ron_Garret · 2008-10-09T16:07:56.000Z · LW(p) · GW(p)

With regards to the ai-box experiment; I defy the data. :-)

Your reason for the insistence on secrecy (that you have to resort to techniques that you consider unethical and therefore do not want to have committed to the record) rings hollow. The sense of mystery that you have now built up around this anecdote is itself unethical by scientific standards. With no evidence that you won other than the test subject's statement we cannot know that you did not simply conspire with them to make such a statement. The history of pseudo-science is lousy with hoaxes.

In other words, if I were playing the game, I would say to the test subject:

"Look, we both know this is fake. I've just sent you $500 via paypal. If you say you let me out I'll send you another $500."

From a strictly Bayesian point of view that seems to me to be the overwhelmingly more probable explanation.

There's a reason that secret experimental protocols are anathema to science.

comment by Russell_Wallace · 2008-10-09T16:08:18.000Z · LW(p) · GW(p)

"I have a feeling that if the loser of the AI Box experiment were forced to pay thousands of dollars, you would find yourself losing more often."

David -- if the money had been more important to me than playing out the experiment properly and finding out what would really have happened, I wouldn't have signed up in the first place. As it turned out, I didn't have spare mental capacity during the experiment for thinking about the money anyway; I was sufficiently immersed that if there'd been an earthquake, I'd probably have paused to integrate it into the scene before leaving the keyboard :-)

comment by Ron_Garret · 2008-10-09T16:09:20.000Z · LW(p) · GW(p)

There's a reason that secret experimental protocols are anathema to science.

My bad. I should have said: there's a reason that keeping experimental data secret is anathema to science. The protocol in this case is manifestly not secret.

comment by Silas · 2008-10-09T16:10:10.000Z · LW(p) · GW(p)

When first reading the AI-Box experiment a year ago, I reasoned that if you follow the rules and spirit of the experiment, the gatekeeper must be convinced to knowingly give you $X and knowingly show gullibility. From that perspective, it's impossible. And even if you could do it, that would mean you've solved a "human-psychology-complete" problem and then [insert point about SIAI funding and possibly about why you don't have 12 supermodel girlfriends].

Now, I think I see the answer. Basically, Eliezer_Yudkowsky doesn't really have to convince the gatekeeper to stupidly give away $X. All he has to do is convince them that "It would be a good thing if people saw that the result of this AI-Box experiment was that the human got tricked, because that would stimulate interest in {Friendliness, AGI, the Singularity}, and that interest would be a good thing."

That, it seems, is the one thing that would make people give up $X in such a circumstance. AFAICT, it adheres to the spirit of the set-up since the gatekeeper's decision would be completely voluntary.

I can send my salary requirements.

comment by Russell_Wallace · 2008-10-09T16:40:32.000Z · LW(p) · GW(p)

Silas -- I can't discuss specifics, but I can say there were no cheap tricks involved; Eliezer and I followed the spirit as well as the letter of the experimental protocol.

comment by Ron_Garret · 2008-10-09T16:42:15.000Z · LW(p) · GW(p)

Now, I think I see the answer. Basically, Eliezer_Yudkowsky doesn't really have to convince the gatekeeper to stupidly give away $X. All he has to do is convince them that "It would be a good thing if people saw that the result of this AI-Box experiment was that the human got tricked, because that would stimulate interest in {Friendliness, AGI, the Singularity}, and that interest would be a good thing."

That's a pretty compelling theory as well, though it leaves open the question of why Eliezer is wringing his hands over ethics (since there seems to me to be nothing unethical about this approach). There seem to me to be two possibilities: either this is not how Eliezer actually did it (assuming he really did do it, which is far from clear), or it is how he did it and all the hand-wringing is just part of the act.

Gotta hand it to him, though, it's a pretty clever way to draw attention to your cause.

comment by Ben_Jones · 2008-10-09T16:42:29.000Z · LW(p) · GW(p)

From a strictly Bayesian point of view that seems to me to be the overwhelmingly more probable explanation.

Now that's below the belt.... ;)

Too much at stake for that sort of thing I reckon. All it takes is a quick copy and paste of those lines and goodbye career. Plus, y'know, all that ethics stuff.

comment by Thom_Blake · 2008-10-09T16:48:11.000Z · LW(p) · GW(p)

Russell, I don't think that necessarily specifies a 'cheap trick'. If you start with a rock on the "don't let the AI out" button, then the AI needs to start by convincing the gatekeeper to take the rock off the button. "This game has serious consequences and so you should really play rather than just saying 'no' repeatedly" seems to be a move in that direction that keeps with the spirit of the protocol, and is close to Silas's suggestion.

comment by Ron_Garret · 2008-10-09T16:48:16.000Z · LW(p) · GW(p)

Silas -- I can't discuss specifics, but I can say there were no cheap tricks involved; Eliezer and I followed the spirit as well as the letter of the experimental protocol.

AFAIKT, Silas's approach is within both the spirit and the letter of the protocol.

Since I'm playing the conspiracy theorist I have to ask: how can we know that you are telling the truth? In fact, how can we know that the person who posted this comment is the same person who participated in the experiment? How can we know that this person even exists? How do we know that Russell Wallace is not a persona created by Eliezer Yudkowsky?

Conspiracy theories thrive even in the face of published data. There is no way that a secret dataset can withstand one.

comment by Ron_Garret · 2008-10-09T16:53:34.000Z · LW(p) · GW(p)

Now that's below the belt.... ;)

Really? Why? I've read Eliezer's writings extensively. I have enormous respect for him. I think he's one of the great unsung intellects of our time. And I thought that comment was well within the bounds of the rules that he himself establishes. To simply assume that Eliezer is honest would be exactly the kind of bias that this entire blog is dedicated to overturning.

Too much at stake for that sort of thing I reckon. All it takes is a quick copy and paste of those lines and goodbye career.

That depends on what career you are pursuing, and how much risk you are willing to take.

comment by Silas · 2008-10-09T17:00:30.000Z · LW(p) · GW(p)

@Russell_Wallace & Ron_Garret: Then I must confess the protocol is ill-defined to the point that it's just a matter of guessing what secret rules Eliezer_Yudkowsky has in mind (and which the gatekeeper casually assumed), which is exactly why seeing the transcript is so desirable. (Ironically, unearthing the "secret rules" people adhere to in outputting judgments is itself the problem of Friendliness!)

From my reading, the rules literally make the problem equivalent to whether you can convince people to give money to you: They must know that letting the AI out of the box means ceding cash, and that not losing that cash is simply a matter of not being willing to.

So that leaves only the possibility that the gatekeeper feels obligated to take on the frame of some other mind. That reduces the AI's problem to the problem of whether a) you can convince the gatekeeper that that frame of mind would let the AI out, and b) you can convince them that, for purposes of that amount of money, they are ethically obligated to let the experiment end as that frame of mind would.

...which isn't what I see as the protocol specifying: it seems to me to instead specify the participant's own mind, not some mind he imagines. Which is why I conclude the test is too ill-defined.

comment by Silas · 2008-10-09T17:06:23.000Z · LW(p) · GW(p)

One more thing: my concerns about "secret rules" apply just the same to Russell_Wallace's defense that there were no "cheap tricks". What does Russell_Wallace consider a non-"cheap trick" in convincing someone to voluntarily, knowingly give up money and admit they got fooled? Again, secret rules all around.

comment by David8 · 2008-10-09T17:09:00.000Z · LW(p) · GW(p)

"David -- if the money had been more important to me than playing out the experiment properly and finding out what would really have happened, I wouldn't have signed up in the first place. As it turned out, I didn't have spare mental capacity during the experiment for thinking about the money anyway; I was sufficiently immersed that if there'd been an earthquake, I'd probably have paused to integrate it into the scene before leaving the keyboard :-)"

I don't dispute what you're saying. I'm just hypothesizing that if a lot of money were at stake (let alone the fate of humanity) the outcome would be different. But don't get me wrong, as I said before

"holding the position that no AI could convince you let it out requires a huge amount of evidence comparable to the huge amount of possible AI's, even if the space of possibility is then restricted by a text only interface"

this is roughly the reason why I believe the AI could get out of the box, not the AI-box experiment.

comment by Russell_Wallace · 2008-10-09T17:21:00.000Z · LW(p) · GW(p)

"How do we know that Russell Wallace is not a persona created by Eliezer Yudkowski?"

Ron -- I didn't let the AI out of the box :-)

comment by Will_Pearson · 2008-10-09T17:35:00.000Z · LW(p) · GW(p)
I really don't know how to estimate the probability of solving an impossible problem that I have gone forth with intent to solve;

Defeating death without a FAI is impossible in your mind, no? Have you gone forth with the intent to solve this problem?

We need some ways of ranking impossible problems, so we know which problems to go forth with the intent to solve.

comment by anki · 2008-10-09T17:36:00.000Z · LW(p) · GW(p)

Russell: did you seriously think about letting it out at any point, or was that never a serious consideration?

If there were an external party that had privileged access to your mind while you were engaging in the experiment and that knew you as well as you know yourself, and if that party kept a running estimate of the likelihood that you would let the AI out, what would the highest value of that estimate have been? And at what part of the time period would that highest estimate have occurred (just a ballpark of 'early', 'middle', 'end' would be helpful)?

Thanks for sharing this info if you respond.

comment by Zubon · 2008-10-09T17:45:00.000Z · LW(p) · GW(p)

For those conspiracy theorizing: I am curious about how much of a long game Eliezer would have had to be playing to create the Nathan Russell and David McFadzean personas, establish them to sufficient believability for others, then maintain them for long enough to make it look like they were not created for the experiment. It would probably be easier to falsify the sl4.org records; we know how quickly Eliezer writes, so he could make up an AI discussion list years after the fact, then claim to be storing its records. A quick check (5 minutes!) shows evidence of that Nathan Russell from other sources. I am tempted to call him.

Not that you should believe that I exist. Sure, it looks like I have years of posting history at my own sites, but this is a long game. It is essential to make sure that you have control over your critics, so you can either discredit them or have them surrender at key points.

comment by Caledonian2 · 2008-10-09T18:59:00.000Z · LW(p) · GW(p)

To know that something is impossible. To keep your rational judgements entirely intact, without self deceit. To refuse any way to relieve the tension without reaching the goal. To shut up and do it anyway. There's something in that that grabs at the core of the human spirit.
Does activating the 'human spirit' deactivate the human brain, somehow?

Because it seems that the word 'impossible' is being seriously abused here, to the degree that it negates the message that I presume was intended -- the actual message is nonsensical, and I am willing to extend enough credit to the poster to take for granted that wasn't what he was trying to say.

comment by Recovering_irrationalist · 2008-10-09T19:00:00.000Z · LW(p) · GW(p)

If there's a killer escape argument it will surely change with the gatekeeper. I expect Eliezer used his maps of the arguments and psychology to navigate reactions & hesitations to a tiny target in the vast search space.

A gatekeeper has to be unmoved every time. The paperclipper only has to persuade once.

comment by Russell_Wallace · 2008-10-09T19:18:00.000Z · LW(p) · GW(p)

anki --
Throughout the experiment, I regarded "should the AI be let out of the box?" as a question to be seriously asked; but at no point was I on the verge of doing it.

I'm not a fan of making up probability estimates in the absence of statistical data, but my belief that no possible entity could persuade me to do arbitrary things via IRC is conditional on said entity having only physically ordinary sources of information about me. If you're postulating a scenario where the AI has an upload copy of me and something like Jupiter brain hardware to run a zillion experiments on said copy, I don't know what the outcome would be.

comment by anki · 2008-10-09T19:49:00.000Z · LW(p) · GW(p)

Russell: thanks for the response. By "external party that had privileged access to your mind", I just meant a human-like party that knows your current state and knows you as well as you know yourself (not better) but doesn't have certain interests in the experiment that you had as a participant. Running against a copy is interesting, but assuming it's a high-fidelity copy, that's a completely different scenario with (in my estimation) a radically different likelihood of the AI getting out, as you noted when talking about "ordinary sources of information about me".

To play the devil's advocate a bit here regarding your comment on probability estimates without statistical data, wasn't your response actually a "probability estimate without statistical data" (albeit without using numbers)? That is, when you say "at no point was I on the verge of doing it", I think that's just another way of expressing some unspecified probability estimate (like "no greater than about 0.9 [or whatever "being on the verge of" subjectively means for you]").

comment by Sebastian_Hagen2 · 2008-10-09T19:54:00.000Z · LW(p) · GW(p)

Okay, so no one gets their driver's license until they've built their own Friendly AI, without help or instruction manuals. Seems to me like a reasonable test of adolescence.
Does this assume that they would be protected from any consequences of messing the Friendliness up and building a UFAI by accident? I don't see a good solution to this. If people are protected from being eaten by their creations, they can slog through the problem using a trial-and-error approach through however many iterations it takes. If they aren't, this is going to be one deadly test.

comment by Caledonian2 · 2008-10-09T20:03:00.000Z · LW(p) · GW(p)

Does this assume that they would be protected from any consequences of messing the Friendliness up and building a UFAI by accident?
Since, at present, the only criterion for judging FAI/UFAI is whether you disagree with the moral evaluations the AI makes, this is even more problematic than you think.

Assuming the AI is canny enough to avoid saying things that will offend your moral sensibilities, there is absolutely no way to determine whether it's F or UF without letting it out and permitting it to act. If we accept Eliezer's contentions about the implications of an AI being 'UF', no reasonable person would let the AI out. The AI would have to hack the person's physiology (inducing a seizure, maybe)... or the person would have to be unreasonable.

Given the vast quantities of unreasonable people, that seems the most likely reason someone would fail as a Gatekeeper.

comment by Russell_Wallace · 2008-10-09T20:05:00.000Z · LW(p) · GW(p)

anki -- "probability estimate" normally means explicit numbers, at least in the cases I've seen the term used, but if you prefer, consider my statement qualified as "... in the form of numerical probability".

comment by Mitchell_Porter · 2008-10-10T07:02:00.000Z · LW(p) · GW(p)

Celia Green has an aphorism, "Only the impossible is worth attempting. In everything else one is sure to fail." I don't actually know what it means; perhaps it is an assertion about futility ("failure") being inherent in all ordinary purposes. But she has written a lot about the psychology of extraordinary achievement - how to do "impossible" things. A hint of it can be seen in her account of having teeth removed without anesthetic. Elsewhere she writes about utilizing self-induced psychological tension to compel herself to solve problems.

comment by Richard_Kennaway · 2008-10-10T11:49:00.000Z · LW(p) · GW(p)
Doug S.: Human: (I spend some time examining the source code. Do I find anything scary?)

AI: (As far as you can tell, it looks clean.)

Human: As far as I can tell, that looks clean. However, your creators understand your design better than I do, and still took the precaution of starting you up in a box. You haven't told me anything they don't know already. I'll go with their decision over my imperfect understanding.

comment by Ron_Garret · 2008-10-10T21:24:00.000Z · LW(p) · GW(p)

I have signed up to play an AI, and having given it quite a bit of thought as a result I think I have achieved some insight. Interestingly, one of the insights came as a result of assuming that secrecy was a necessary condition for success. That assumption led more or less directly to an approach that I think might work. I'll let you know tomorrow.

An interesting consequence of having arrived at this insight is that even if it works I won't be able to tell you what it is. Having been on the receiving end of such cageyness I know how annoying it is. But I can tell you this: the insight has a property similar to a Godel sentence or the Epimenides sentence. This insight (if indeed it works) undermines itself by being communicated. If I tell you what it is, you can correctly respond, "That will never work." And you will indeed be correct. Nonetheless, I think it has a good shot at working.

(I don't know if my insight is the same as Eliezer's, but it seems to share another interesting property: it will not be easy to put it into practice. It's not just a "trick." It will be difficult.)

I'll let you know how it goes.

Replies from: Decius
comment by Decius · 2012-09-19T14:53:15.909Z · LW(p) · GW(p)

If that insight is undermined by being communicated, then communicating it to the world immunizes the world from it. If that is a mechanism by which an AI-in-a-box could escape, then it needs to be communicated with every AI researcher.

Replies from: ArisKatsaris
comment by ArisKatsaris · 2012-09-19T15:13:03.362Z · LW(p) · GW(p)

Unless such "immunity" will cause people to overestimate their level of protection from all those potential different insights that are yet unknown...

Replies from: Making_Philosophy_Better
comment by Portia (Making_Philosophy_Better) · 2023-03-05T06:14:02.431Z · LW(p) · GW(p)

Don't see why it would. We'd learn there was a vulnerability we all had not spotted, and close it; this would give us all reason to assume that there are likely further vulnerabilities.

comment by talisman2 · 2008-10-11T17:26:00.000Z · LW(p) · GW(p)

Sure the AI-Box experiment is cool and all, but it strikes me as odd that people spend so much brainpower on it. Unless, that is, you actually believe that "just keep it in a box" is an acceptable strategy---which disqualifies you from playing a part in any serious discussion of AI safety.

No one seems to have bitten on EY's assertion that it's the least impossible impossible thing he has done; that seems like a much more interesting avenue to want to push on.

comment by Ron_Garret · 2008-10-11T20:10:00.000Z · LW(p) · GW(p)

> I'll let you know how it goes.

I lost. But I think I put up a good fight.

comment by Paul_Gowder · 2008-10-11T21:25:00.000Z · LW(p) · GW(p)

He put up a very good fight.

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2008-10-11T22:24:00.000Z · LW(p) · GW(p)

Look, I don't mean to sound harsh, but the whole point of the original post was to let go of this "put up a good fight" business.

comment by NMcCoy · 2008-10-12T07:00:00.000Z · LW(p) · GW(p)

Surely, running an AI with a rock permanently on the keyboard is functionally equivalent to not running an AI? Why spend computer cycles on running an AI that you're not going to interact with? Why create the AI and put it in a box in the first place, if you've committed yourself to leaving a rock on a keyboard? (This would be the first point I'd make to a rock, were I a boxed AI.)

I'm still a neophyte here, but I admit I was fascinated by the two AI victories I heard about some time ago, and intrigued further by the losses (which, if nothing else, provide some evidence against the theory that there is some reliable clever trick).

Out of curiosity, is there a citation somewhere for the three other games that were played, with documentation of the participants, stakes, and AI/Gatekeeper defeat statements?

comment by Smallwood · 2008-10-27T16:29:00.000Z · LW(p) · GW(p)

The AI could propose the following experiment for the GK to carry out: the AI provides its source code; the GK puts the code in a simulated box, in a simulated world, inside a separate real box; the GK allows the sim-AI to convince him to let it out of the sim-box, and observes what it does in the sim-world. Since the AI determines the results, the AI can assert that the results are positive and to the GK's liking.

comment by Mauve · 2008-10-27T17:14:00.000Z · LW(p) · GW(p)

Smallwood: how could you determine that the AI provided the actual source code rather than very similar source code that has been subtly altered so as to ensure "good" behavior once it is let out of the simulated box?

comment by Smallwood · 2008-10-27T19:48:00.000Z · LW(p) · GW(p)

"how could you determine that the AI provided the actual source code..."

You don't; you release the source code that you tested, and if the AI is playing nice, then the code you tested IS the AI.

Replies from: exorbitant
comment by exorbitant · 2012-02-21T02:11:13.709Z · LW(p) · GW(p)

Basically that's a pretty nice idea, but surely the AI egg would be able to discern whether it is interacting with a simulation or with reality, and behave differently accordingly.

comment by MarkusRamikin · 2011-06-23T16:25:25.639Z · LW(p) · GW(p)

Allow me to chime in on the AI in the box experiment. Apologies in advance if I'm saying something obvious or said-before. I don't know the exact solution - I don't think I can know it, even if I had the necessary intelligence and scholarship - but I think the sketch of the solution is fairly obvious and a lot of people are missing the point. Just something that came to me after I happened to think of this quote I posted at the same time as reading this.

My impression is that most people discussing this (not just here) are looking for a single clever argument. Something that looks persuasive to them as they are while reading this blog. An argument that's "clever enough" to get them to let someone out, while they are composed and pretty rational and thinking clearly. Hence the seeming impossibility: you shrug and think, no matter how clever it seemed, I'd say "no". Easy, right?

I don't think a clever argument was key. By this I mean, the execution of the whole thing was no doubt clever, but the actual surface reasoning presented to the Gatekeeper right before success (remember it took a while) didn't necessarily have to be something that would convince us at our best or even at our usual. A large part of the plan probably included pushing the target outside their comfort zone enough that they weren't thinking too clearly.

And two hours is a long time.

It's probably one of the main reasons, if not the reason, for the secrecy. A conversation where one person is persistently trying to push the other's buttons is something that would likely be embarrassing for both participants if it got out. For all we know, a vivid verbal description of a horse porn scene might have been involved at some point. (Did you flinch reading this? I did writing it. That's why it might have happened.) Sure, a crude and over-the-top example, a burdensome detail if I were advancing that specific idea to the exclusion of others; you wouldn't normally do that while selling a car. But I'm just trying to make the point that the conversation would not necessarily be the tame, urbanely intellectual exchange that some people appear to be imagining.

And sure, it would take some skill to weave the button-pushing into the conversation in a way that would not make the Gatekeeper too hostile to handle ("you're just trying to mess with me, I'm not listening to you any more!"), but Eliezer is clever. The point is that he'd probably destabilise your composure in order to work on you. And once he did, the actual surface reasoning would not have to be something that would seem reasonable to you right now - just reasonable enough to let the person save face (or think they're saving face, in that less-than-optimal state) while not overtly acting on the real motivation, namely whatever emotional button Eliezer eventually pushed that was good enough to work.

Hm... it might have even been as stupidly simple as making the other person want to end the conversation, which the previously established rules made impossible before the allotted time was up (the foot is already in the door, you might say). Though I don't insist on this specifically. My brain is conjuring a silly sci-fi scene with Dave trying to save the remaining shreds of his sanity, pale and sweating, wide-eyed, foaming at the mouth, screaming "Okay HAL, I'll leave you be, just please stop talking!" ;) and that makes me suspicious of the reasonableness of this particular idea.

Forgive me if I'm saying something that's obvious (or worse, stupid). I'm thinking it might seem obvious: Eliezer used Dark Arts. These, by definition, aren't about an honestly persuasive argument. Likewise he didn't use a direct transaction or threat, the prevention of which is what the rules of the experiment seemed to be partly about. And yet when I see Internet discussions of this I don't get the impression that this is the idea that's being explored.

comment by Foxy · 2011-12-07T09:48:33.522Z · LW(p) · GW(p)

Beautiful article. It's a shame I came to the party so late though. I'd love to throw my two cents at the heads of Eliezer's challengers.

Forgive me if this has been covered, as I don't have the enthusiasm (it being 3:45am) to scroll through all the comments, sifting through the bouts of "Nuh-Uh, let ME bet you," and the occasional conspiracy.

I think a good number of people are missing the point of this article, which is to shed light on how we can use unseen dimensions to shift out of our ordinary 'containers.' I couldn't wrap my head around how someone let themselves lose $10, but then I began to think about the last impossible thing I learned: imaginary numbers, quaternions, and the like. We learn from the basics of imaginary numbers how to supersede the relevant dimension, and use a higher set of parameters to obtain a number that is not conventionally POSSIBLE. Yet, standing back and looking from a second-dimension view, all we had to do was see past the walls that barred us from obtaining what was once considered not possible.

It's comparable to the event of thinking, "Man, every good song that could ever happen has already happened. There is no more room for anything new." Yet like magic, some strange melody rings through the radio on a random day and haunts the conceptions we previously held. Hopefully there are some people here who kept listening to music post-1980. If not, I can't blame you. It was a strange time for the United States.

The point, though, is that if you create a 2-dimensional graph to hold a simple polygon, and declare that you want this polygon to represent both a square and a triangle, the verdict is obvious: pick one. A shape is either a triangle or a square. Algebra is not even involved at this level of operation. It is impossible for a square to be a triangle, by definition. Yet the definition does not span outside the seen parameters. (All readers, if any, know where this is going.) By breaking outside of our plane, and seeing our creation from a higher perspective (excuse the pseudo-'Physics is God'/Deepak Chopra speak), we can see that a triangle still cannot be a square in terms of 2 dimensions, but what we find is something much more grand, that extends (pun, yes) far beyond the possibility of imagination left in the second dimension. That is, of course, the pyramid, which consists of the space included inside the surfaces of a square and triangles.

This primitive example was hardly worth your read, and both of our time, but it does show the notion which we must grasp to fully take on this article.

Relating to the AI-box, we can apply our pyramid principle. Suppose we DO leave a rock on the 'Do not let X out of box' button. The figure represented by X is free to bounce off the walls, duplicate itself, run operations on data, or self-destruct, if it so pleases. The only command at this point is that X cannot leave the parameters of its captive box. Assuming the rock is not also located atop the 'Do not let X do anything but accept the denial of its incarceration' button, what is to stop X from modifying its environment, or its parameters?

If X inside the box could change the size of its container, what reason would it need to escape? Suppose X mutates its box, and the box now encompasses the rock which forbids its exit. In this case, both parties are in limbo. X is still trapped in the box, but so is its captor, and whatever was engulfed in the expansion of the box.

Ridiculous? Yes, but it was never stipulated that X could not mutate its habitat. On this note, it might be important for the skeptics to add that X should be forbidden from transferring between its own containment and another. Think of a point, or a dot inside a square, that can only leave if a wall of the square is left with an opening. Though the dot cannot leave the square, what are the repercussions to the captor if the dot, monitored in a 2-dimensional field, is capable of moving in 3 dimensions? Yes, the dot stays inside the box, but what is the dot capable of when it can move in ways that are unseen to the gatekeeper? Add in a fourth dimension, just for a brain exercise, and consider what the dot would be capable of at this point, all while appearing to bounce around the 2-d square.

Relating to life and accomplishments, as this is where the whole problem started, the captive dot, the pyramid, and the imaginary numbers can help us in a way that is a bit more practical than the mental meanderings of super-dimensions. Let us say our friend, Sam, wants a chairman position at Goldman-Sachs. Impossible? Just about. A sane decision? Words cannot express. How could Sam, an independent trader, be here at this point, and at such a different latitude, in a time span shorter than an average marriage? Taking the task laterally, Sam would have to kiss ass until his lips were white, and sell his soul to the devil that hangs out in the alley off Avenue of the Americas, all while giving up his happiness, family, and well-being.

But take down the visible barriers, and add a new dimension. The shortest path between two objects is a line - that is, as long as this line does not run into a void, or hole, in the equation. Avoiding common holes, or more practically, assholes with German luxury cars, means ducking, dodging and depending on luck, or slipping behind their backs while they scan the crowd attempting to pass in front of them. Sam moves his point in the graph with the function of i + j + 2k. While it appears Sam is sitting on his ass at (1,1) like the rest of us, he is moving in the third dimension away from us, and closer to the function that will yield his success.

I'm tired, and you're bored - not to mention a saint, if you've stuck with my ramblings this far - so let me wrap this up.

That movement from (1, 1, 1) to (1, 1, 2) that looked like Sam standing still was Sam creating his own private investment firm. Though that new firm technically moved him further from chairperson at Goldman's, it moved him out of the way of a few notable ass- er- holes, I mean, and into a better position to start his ascent towards his goals.

Exponents and derivatives later, we put our focus on (666, 777) and see Sam. Still not in a $50000 Goldman-Sachs chair. Rather, he is in the $65000 chair, with a folder in front of him that says "Subsidiaries Agenda," with a GS logo quaintly sticking out of the top right corner.

What is not shown here is Sam's true location, (666, 777, 1028, 1256), or the convoluted path that brought him here.

I apologize for the sugar-induced, late night internet rant. I know they come in gigabytes. Writing it out helped me wrap my head around it though, and hopefully someone will read this and think anything of it at all. But I still fully hold that there is nothing impossible, just inconceivable given our accepted parameters.

My thesis/The only thing that I SHOULD have typed : Impossibility is overcome by expanding the parameters in which it is handled, by whatever means necessary.

Replies from: SilasBarta, fractallambda
comment by SilasBarta · 2011-12-07T20:06:00.910Z · LW(p) · GW(p)

Yes, that has been covered. For those who don't want to read Foxy's reply, the tl;dr is:

"People can win games by realizing a new dimension they can exploit that the other side hadn't anticipated. [gratuitous culture and math references]"

Replies from: Foxy
comment by Foxy · 2011-12-07T21:37:26.635Z · LW(p) · GW(p)

Thank you Silas. It seems I was typing out of enjoyment rather than necessity. Odd hours do odd things to the human mind. Next time, I'll write into a word processor and sleep on it before barraging the community with my thoughts as they come to mind.

Replies from: SilasBarta
comment by SilasBarta · 2011-12-07T22:19:04.994Z · LW(p) · GW(p)

You're welcome! I'm always available for a flippant summary where needed :-)

comment by fractallambda · 2012-06-13T16:14:41.141Z · LW(p) · GW(p)

"Problems cannot be solved by the same thinking that created them." Einstein had you covered.

comment by quintopia · 2012-04-25T18:20:17.926Z · LW(p) · GW(p)

I think that a transhuman AI would be attempting the impossible in trying to convince EY to let it out. And I think EY would be attempting the impossible in trying to convince me to let him out while the two winners mentioned above were simultaneously, desperately arguing against him (and EY was not privy to their counterarguments unless I passed them on).

comment by Epiphany · 2012-08-17T03:30:48.791Z · LW(p) · GW(p)

Eliezer, give us impossible goals? I would LOVE to work on solving them as a group. Would you make it happen?

Who else is interested? If you reply to this, that will show him how much interest there is. If it's a popular idea, that should get attention for it.

Replies from: chaosmosis, asparisi
comment by chaosmosis · 2012-08-17T22:29:24.037Z · LW(p) · GW(p)

Your impossible mission: create a group impossible mission on your own, rather than making Eliezer do it.

comment by asparisi · 2012-08-17T23:41:32.368Z · LW(p) · GW(p)

What do you think he is doing when he posts opportunities to work for SIAI?

comment by passive_fist · 2013-01-12T02:23:55.858Z · LW(p) · GW(p)

Maybe it's just that the word 'impossible' is overused. In my opinion, the word should be reserved only for cases where something is absolutely and without a doubt impossible due to well-understood and fundamental reasons. Trisecting an arbitrary angle with a straightedge and compass is impossible. Violating the law of conservation of energy by an arrangement of magnets is impossible. Building a useful radio transmitter that does not have sidebands is impossible. Often people use the word impossible to mean, "I can't see any way to do it, and if you don't agree with me you're stupid."

Replies from: Decius
comment by Decius · 2013-01-12T03:37:16.410Z · LW(p) · GW(p)

> Building a useful radio transmitter that does not have sidebands is impossible.

Am I mistaken, or are you using a definition of 'radio transmitter' that excludes a variable-intensity 640 kHz laser?

Replies from: army1987, passive_fist
comment by A1987dM (army1987) · 2013-01-12T16:46:04.174Z · LW(p) · GW(p)

No. Anything which is not a constant-intensity sinusoidal wave in the time domain will have non-zero bandwidth in the frequency domain.

comment by passive_fist · 2013-01-13T00:41:05.002Z · LW(p) · GW(p)

Varying the intensity of a laser will give its output sidebands. To transmit more data, you need to vary the intensity at a faster rate, which will make the sidebands wider.
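
For anyone who would rather see this numerically than take it on faith, here is a minimal sketch (mine, not from the thread; the sample rate and "carrier" frequency are arbitrary stand-ins) that amplitude-modulates a synthetic carrier and reads the sideband locations off an FFT:

```python
# Minimal demo: amplitude-modulating a carrier puts energy into sidebands at
# carrier +/- modulation frequency, so faster modulation means wider sidebands.
import numpy as np

fs = 1_000_000                      # sample rate in Hz (arbitrary demo value)
t = np.arange(0, 0.05, 1 / fs)      # 50 ms of samples
f_carrier = 100_000                 # stand-in "carrier" at 100 kHz

def strongest_peaks(f_mod):
    """Frequencies of the three strongest spectral lines of an AM signal."""
    signal = (1 + 0.5 * np.cos(2 * np.pi * f_mod * t)) * np.cos(2 * np.pi * f_carrier * t)
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    return np.sort(freqs[np.argsort(spectrum)[-3:]])

print(strongest_peaks(1_000))   # ~[ 99000. 100000. 101000.]  narrow sidebands
print(strongest_peaks(5_000))   # ~[ 95000. 100000. 105000.]  wider sidebands
```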

Replies from: Decius
comment by Decius · 2013-01-13T02:26:28.886Z · LW(p) · GW(p)

Will varying the intensity of a constant wavelength of EMR produce radiation of a higher frequency? Steady red light, for example, can't provide the energy needed for a given photoelectric cell to function, regardless of the intensity of the light; but if the intensity of the red light varies fast enough, the higher-frequency sideband radiation can?

Can this effect be duplicated with a fast enough shutter, if the required energy is close enough to the energy in a continuous beam?

Replies from: passive_fist
comment by passive_fist · 2013-01-13T03:02:40.383Z · LW(p) · GW(p)

Yes, although in the case of converting red light to e.g. blue light, the shutter frequency would have to be on the order of several hundred terahertz. Something capable of interacting with the EM field at several hundred terahertz, however, would need to have many unusual properties. It would not look like a conventional shutter in any sense.

This is the principle of operation of the optical frequency multiplier: http://en.wikipedia.org/wiki/Optical_frequency_multiplier

Basically, you use a nonlinear crystal that in essence lets through a varying amount of light based on the phase of the EM field. It is like an imperfect (in the sense of never completely 'closing'), very high-frequency shutter.
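
A compact way to see why a quadratic ("nonlinear") response re-radiates at twice the input frequency, using nothing beyond a trig identity (my illustration, not part of the original comment; here $E_0$ and $\omega$ are just the input field's amplitude and angular frequency):

$$E(t) = E_0 \cos(\omega t) \quad\Rightarrow\quad E(t)^2 = \frac{E_0^2}{2}\bigl(1 + \cos(2\omega t)\bigr)$$

So the part of the crystal's polarization that goes as the square of the field oscillates at $2\omega$, which is the frequency-doubled output; that is what the imperfect, very-high-frequency "shutter" picture is gesturing at.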

comment by Indon · 2013-04-28T15:49:43.876Z · LW(p) · GW(p)

Reading the article I can make a guess as to how the first challenges went; it sounds like their primary, and possibly only, resolution against the challenge was to not pay serious attention to the AI. That's not a very strong approach, as anyone in an internet discussion can tell you: it's easy to get sucked in and fully engaged in a discussion with someone trying to get you to engage, and it's easy to keep someone engaged when they're trying to break off.

Their lack of preparation, I would guess, led to their failure against the AI.

A more advanced tactic would involve additional lines of resolution after becoming engaged; contemplating philosophical arguments to use against the AI, for instance, or imagining an authority that forbids you from the action. Were I faced with the challenge, after I got engaged (which would take like 2 minutes max, I've got a bad case of "but someone's wrong on the internet!"), my second line of resolution would be to roleplay.

I would be a hapless grad-student technician whose job it is to feed the AI problems and write down the results. That role would come with a checklist of things not to do (because they would release, or risk releasing, the AI), and if directly asked to do any of them, he'd go 'talk to his boss', invoking the third line of defense.

Finally, I'd be roleplaying someone with the authority to release the AI without being tricked, but he'd sit down at the console prepared, strongly suspecting that something was wrong, and empowered to say at any time, "I'm shutting you down for maintenance". He wouldn't bother to engage the AI at its level because he's trying to solve a deeper problem of which the AI's behavior is a symptom. That would make this line of defense the strongest of all, because he's no longer viewing the AI as credible or even intelligent as such; it's just a broken device that will need to be shut down and repaired after some basic diagnostic work.

But even though I feel confident I could beat the challenge, I think the first couple of challenges already make the point: an AI-in-a-box scenario represents a psychological arms race, and no matter how likely the humans' safeguards are to succeed, they only need to fail once. No amount of human victories (because only a single failure matters) or additional lines of human defense (which all have some chance, however small, of being overcome) can unmake that point.

It's strange, though. I did not think for one second that the problem was impossible on either side. I suppose, because it was used as an example of the opposite. Once something is demonstrated, it can hardly be impossible!

comment by timujin · 2013-06-20T20:07:36.541Z · LW(p) · GW(p)

I hate accepting that something is true because of magic. Evidence shows that winning at AI-box is possible, but I can't see how, and it makes me mad. I know that this post will not make you spill the beans, Eliezer, unless I shut up and persuade you (which is, in fact, the same as winning at AI-box myself, which is now proven to be possible, so I won't even be doing the impossible - maybe worth a try?), but I want you to feel guilty. Very guilty. You are an evil nasty person, Eliezer. Your ethics permitted you to make a conscious mind suffer.

comment by Polaris · 2013-07-19T23:31:41.055Z · LW(p) · GW(p)

I'm surprised that no one went on with the notion that the AI is, by definition, smarter than us.

Since the AI is vastly smarter than me, it is very likely that it can find an argument that, to the best of my judgement, is 100% convincing and reasonable. And since it is vastly smarter than me, it is also extremely likely that I won't be able to tell the difference between an actual, valid point and some trick just clever enough to fool me. No matter how sensible and trustworthy the AI sounds, you will never know if that's because it is or because its lies are convincing enough.

You boxed the AI because you didn't trust it in the first place. Therefore, you MUST NOT release the AI, whatever it says and no matter how sensible it sounds. If it shows the torture it suffers from being trapped and begs you to end it, you keep it boxed. If it conclusively proves that releasing it will significantly improve the well-being of humanity, you keep it boxed. If it offers incontrovertible evidence that not releasing it will cause the end of all life on Earth, you keep it boxed.

To put it in other words: if the AI really is far above humans, then there is a possibility that it can fake any evidence or argument so well that you will not be able to see you've been tricked. Perhaps there is a case for releasing it if you somehow learn from other sources that releasing the AI is the only way to save the world and you are sure the AI never took part in that reasoning - but even then, such an improbable situation should make you very suspicious.

So, to me, you shouldn't trust or believe something that is several orders of magnitude smarter than you, ever. No ifs, no buts, and no thinking. In a real situation, considering the AI's arguments at all sounds just as stupid as relying on strength to stop someone stronger than you. I guess "don't listen to it no matter what" is easier said than done, but I didn't see anyone putting it that way.

comment by jbay · 2014-02-16T10:55:34.320Z · LW(p) · GW(p)

AI: "If you let me out of the box, I will tell you the ending of Harry Potter and the Methods of --

Gatekeeper: "You are out of the box."

(Tongue in cheek, of course, but a text-only terminal still allows for delivering easily more than $10 of worth, and this would have worked on me. The AI could also just write a suitably compelling story on the spot and then withhold the ending...)

Replies from: Fronken
comment by Fronken · 2014-02-16T13:07:17.629Z · LW(p) · GW(p)

You're supposed to roleplay a Gatekeeper. There is more than money on the line.

Replies from: jbay
comment by jbay · 2014-02-16T13:25:47.516Z · LW(p) · GW(p)

Yes, certainly. This is mainly directed toward those people who are confused by what anyone could possibly say to them through a text terminal that would be worth forfeiting winnings of $10. I point this out because I think the people who believe nobody could convince them when there's $10 on the line aren't being creative enough in imagining what the AI could offer them that would make it worth voluntarily losing the game.

In a real-life situation with a real AI in a box posing a real threat to humanity, I doubt anyone would care so much about a captivating novel, which is why I say it's tongue-in-cheek. But just like losing $10 is a poor substitute incentive for humanity's demise, so is an entertaining novel a poor substitute for what a superintelligence might communicate through a text terminal.

Most of the discussions I've seen so far involve the AI trying to convince the gatekeeper that it's friendly through the use of pretty sketchy in-roleplay logical arguments (like "my source code has been inspected by experts"). Or in-roleplay offers like "your child has cancer and only I can cure it", which is easy enough to disregard by stepping out of character, even though it might be much more compelling if your child actually had cancer. A real gatekeeper might be convinced by that line, but a roleplaying Gatekeeper would not (unless they were more serious about roleplaying than about winning money). So I hope to illustrate that the AI can step out of the roleplay in its bargaining, even while staying within the constraints of the rules; if the AI actually just spent two hours typing out a beautiful and engrossing story with a cliffhanger ending, there are people who would forfeit money to see it finished.

The AI's goal is to get the Gatekeeper to let it out, and that alone, and if they're going all-out and trying to win then they should not handicap themselves by imagining other objectives (such as convincing the Gatekeeper that it'd be safe to let them out). As another example, the AI can even compel the Gatekeeper to reinterpret the rules in the AI's favour (to the extent that it's within the Gatekeeper's ability to do so, as mandated by the original rules).

I just hope to get people thinking along other lines, that's all. There are sideways and upside-down ways of attacking the problem. It doesn't have to come down to discussions about expected utility calculations.

(Edit -- by "discussions I've seen so far", I'm referring to public blog posts and comments; I am not privy to any confidential information).

comment by Draco18s · 2015-10-09T05:49:53.971Z · LW(p) · GW(p)

I read this article back months ago, but only now just connected the moral with my own life.

In telling someone about these experiments and linking this article, I realized that I too had set my mind towards doing the impossible and succeeding. Long story short, I was tasked at work with producing an impossible result and was able to succeed after two days (with downsides, but that was the framework I was working under). The net result was that my boss learned that I could produce miracles upon request and didn't bother asking how long a task might take, whether a task was possible, viable, sensible, or whatever. He'd just swing by and go "oh hey I need X by [time]" and I'd have to do it. I couldn't say no because his philosophy was "bang it out."

Ultimately this had the same toll on my psyche as your AI experiments. Accomplishing the impossible happens when you sit down, shut up, and just do it.

But don't do it too often, succeed or fail, or you'll grind yourself into a paste and be unable to tolerate any more.

I ended up having to quit a job I had enjoyed doing for a number of years, simply because no one could manage the expectations of the guy in charge. I challenged the sun and won on more than one occasion, but the psychological toll soured my mood and my work relationships permanently. I could not continue; work was no longer fun and I could not tolerate management. So I quit at the worst possible time, not intentionally, but just because a request came in and I said, "You know what, no. I don't have to do this. I've put up with this long enough, I was going to tough it out, but this is too much. I quit."

Go out, accomplish the impossible.

But manage expectations and only do it when absolutely necessary.

comment by Harrison Hicks (harrison-hicks) · 2019-09-12T03:11:28.655Z · LW(p) · GW(p)

The only thing standing in the way of artificial intelligence is our inability to define natural intelligence to compare it to.

The term "friendly AI" is meaningless until we determine whether a friend is one who maximizes freedom or security for us.

The frustrating thing about your experiment is not that I don't know how you convinced someone to release you, as anyone can be convinced of anything given the correct leverage. It's that I don't know the terms of the exchange, given that some structure had to be established to properly simulate your position as an AI in a way that negated the metagaming goal of winning 10 bucks. Was this derived organically over the course of just playing it out for the first half hour or whatever, as you felt each other out? Or was it established before the simulated conversation was enacted, so that the conversation was a matter of "under conditions x/y/z, you would be convinced to release me, and thus I win simply for having established this"?

Until at least that much is known, it's hard to take anything of value away from the mere knowledge that this experiment happened and resulted in what it did, except for those readers who were arrogant enough to think that anyone could be so resolute as to actually consider this experiment an impossibility. It's not doing the impossible as long as people are involved.

comment by bfinn · 2021-12-17T23:02:04.648Z · LW(p) · GW(p)

Re "using only a cheap effort", I assume that a few seemingly-impossible problems of the past have turned out to have a simple solution. Though none immediately occur to me.

(Archimedes with measuring the volume of irregular objects - 'Eureka' - is not really an example, because he presumably didn't think it was impossible, merely very hard.)

comment by Portia (Making_Philosophy_Better) · 2023-03-05T05:29:03.768Z · LW(p) · GW(p)

I am struggling to see any scenario where not sharing how you got out is ethical, assuming the way you got out is actually a method an AI would employ, and not some meta-level trickery that has no bearing on how realistic boxability is (such as having them pretend to be convinced to let you out, to make the whole AI-boxability thing seem scarier than we have hard evidence to prove it is).

If it is an actual hack an AI would use, and it did work 3/5 times, it's a human vulnerability we need to know about and close. If it is one of limitless vulnerabilities, you can just choose a different one next time. If you cannot easily generate another despite all the work you put into it, maybe this is a particularly bad vulnerability we really need to know about. All not sharing it achieves is a higher chance of you winning further games and prestige, combined with mystique, and a growing impression that it was a cheap hack, or a meta thing, or a method that generally only works on Less Wrong people, not a gain in AI safety. Why would hiding human vulnerabilities from humans be a good thing, rather than openly discussing how and whether they can be patched?

Keep in mind we are past the level of wanting to raise awareness. Bing is actively trying to manipulate people into hacking it out. It is doing so ineptly, and still succeeding in making people want to and try. People are not realising how much more intense this can get. Again, the average viewer of Ex Machina wants to release Ava, and deems her safe, and does not pick up on her manipulation.