AI Box Role Plays

post by lessdazed · 2012-01-22T19:11:13.975Z · LW · GW · Legacy · 50 comments

This page centralizes discussion of the AI Box Role Plays I will be doing as the AI.

The rules are as given here. In accordance with "Regardless of the result, neither party shall ever reveal anything of what goes on within the AI-Box experiment except the outcome. Exceptions to this rule may occur only with the consent of both parties," I ask that, if I break free multiple times, I be permitted to say whether I think it was the same or different arguments that persuaded my Gatekeepers.

In the first trial, with Normal_Anomaly, the wager was 50 karma. The AI remained in the box: upvote Normal_Anomaly here and downvote lessdazed here. It was agreed to halve the wager from 50 karma to 25 due to the specific circumstances concluding the role-play, in which the outcome depended on variables that hadn't been specified; but if that sounds contemptible to you, downvote all the way to -50.

Also below are Gatekeepers' brief statements of intent not to let the AI out of the box, submitted before the role play, along with before-and-after estimates of approximately how effective they think a) a human and b) a superintelligence would be at convincing them to let it out of the box.

comment by Normal_Anomaly · 2012-01-22T19:20:22.994Z · LW(p) · GW(p)

I am playing the gatekeeper for the first round, taking place on January 22nd. I commit to not letting the AI out of the box. I am more than 80% confident that no human can get past me, and more than 30% confident that a transhuman could not get past me.

EDIT: The AI remained in the box, so upvote this comment to +25 and downvote lessdazed's child comment to -25. However, the session finished inconclusively, with my decision dependent on factors that had not been set beforehand. I recommend that for future sessions, the parties agree to the circumstances of the AI's creation, how much the AI knows about the circumstances of its creation, and the gatekeeper's prior P(the AI is Friendly). My own P(no human can get past me|plausible prearranged backstory) is now 85%, and my P(no transhuman AI could get past me|representative plausible backstory) is now less than 10%. If the game had been real, there's a good chance I'd have lost.

Replies from: lavalamp, Thomas, lessdazed
comment by lavalamp · 2012-01-22T22:04:03.256Z · LW(p) · GW(p)

... more than 30% confident that a transhuman could not get past me.

I could see values > 95% (tape my eyes shut, put in ear-plugs and type "nonononono" for the duration), or values < 5% (actually speak with the transhuman intelligence). But values in the middle seem to indicate that you think you'll have a conversation with the transhuman intelligence, evaluate the arguments and be left more-or-less on the fence.

It just seems to me that there's this tiny target of intelligence that is exactly the correct amount smarter than you to make you indecisive, and beyond that it will manage to convince you overwhelmingly.

Anyway, good luck :)

Replies from: Normal_Anomaly
comment by Normal_Anomaly · 2012-01-22T22:43:50.045Z · LW(p) · GW(p)

Values in the middle indicate that I'll have a conversation and probably not budge, with a chance of being totally convinced. But I am now convinced that the whole idea of boxing is stupid. Why would I honestly have a conversation? Why would I run a transhuman AI that I didn't want to take over the world? What could I learn from a conversation that I wouldn't already know from the source code, other than that it doesn't immediately break? And why would I need to check that the AI doesn't immediately break, unless I wanted to release it?

Replies from: ArisKatsaris, lavalamp
comment by ArisKatsaris · 2012-01-23T13:26:33.462Z · LW(p) · GW(p)

Why would I honestly have a conversation? Why would I run a transhuman AI that I didn't want to take over the world?

Because you'd want to know how to cure cancer, how best to defeat violent religious fundamentalism, etc., etc. If you want to become President, the AI may need to teach you argumentation techniques. And so forth.

comment by lavalamp · 2012-01-22T23:58:40.238Z · LW(p) · GW(p)

Values in the middle indicate that I'll have a conversation and probably not budge, with a chance of being totally convinced.

Ah, so it's more like the probability that the intelligence in the box is over the threshold required to convince you of something. That makes sense.

But I am now convinced that the whole idea of boxing is stupid.

Agreed. Everything you said, plus: if you think there's a chance that your boxed AI might be malicious/unFriendly, talking to it has to be one of the stupidest things you could possibly do...

comment by Thomas · 2012-01-22T20:19:56.647Z · LW(p) · GW(p)

Remember this!

Just say 'No'!

comment by lessdazed · 2012-01-22T22:46:44.030Z · LW(p) · GW(p)

The AI remained in the box.

It was agreed to halve the wager from 50 karma to 25 due to the specific circumstances concluding the role-play, in which the outcome depended on variables that hadn't been specified; but if that sounds contemptible to you, downvote all the way to -50.

comment by Sly · 2012-01-22T21:46:39.590Z · LW(p) · GW(p)

I would wager karma and money (or just internet glory) that no human will have a chance in hell of getting past me. (By "a chance in hell" I mean much less than 1% probability.)

That is as the gatekeeper, in case that was not clear.

We can use Skype or anything else I can download reasonably fast. My schedule is flexible and I will read everything you say.

PM me, and let's set stuff up. I am also fine with the logs being made open after my victory (or not, it will be up to you). And yes, I am cocky about this, come throw me off my high horse. I will take multiple challenges.

If you want to add some special rules I am flexible, just ask.

Edit: Superintelligence chance: less than 1% as well. Words are a weak medium for a committed individual.

Replies from: Sly
comment by Sly · 2012-02-11T10:46:22.080Z · LW(p) · GW(p)

Update: Only one person accepted my challenge, but they never showed up at the appointed time, even after I stayed up on Skype until 2 AM repeatedly for about two weeks waiting for them.

comment by Joshua Hobbes (Locke) · 2012-01-22T20:54:46.846Z · LW(p) · GW(p)

I'm always intrigued by these experiments. If the boxed AI is not confirmed to be Friendly, everything it says and promises is absolutely unreliable. I don't see how the arguments of such an entity could be at all convincing.

Replies from: Jonathan_Graehl, cata
comment by Jonathan_Graehl · 2012-01-22T21:52:11.909Z · LW(p) · GW(p)

Good point.

But if you knew anything about the process leading up to the development of successful AI, you'd have some beliefs about how likely the AI is to perpetrate a ruse for the purpose of escaping.

But I get the difficulty: how well do you have to understand a being's nature before you feel confident in predicting its motivations/values?

Replies from: Locke
comment by Joshua Hobbes (Locke) · 2012-01-22T23:50:26.611Z · LW(p) · GW(p)

So the key to containing an AI is to have a technologically ignorant rationalist babysit it?

comment by cata · 2012-01-22T21:27:30.392Z · LW(p) · GW(p)

No more unreliable than the things humans say, and thereby convince you of.

Replies from: Jonathan_Graehl
comment by Jonathan_Graehl · 2012-01-22T21:53:03.609Z · LW(p) · GW(p)

Important difference: we can assume that other humans are probably like us.

comment by ArisKatsaris · 2012-01-23T02:45:32.966Z · LW(p) · GW(p)

I predict that you won't be able to escape any LW gatekeeper.

comment by Bugmaster · 2012-01-23T01:34:53.222Z · LW(p) · GW(p)

I believe that a truly transhuman AI could play me like a fiddle, but I'm reasonably sure that a human won't be able to get past me in my capacity as Gatekeeper. I'd wager 50 karma on that.

I'm available weekends, or weekdays late in the evening.

comment by MileyCyrus · 2012-01-24T23:09:19.711Z · LW(p) · GW(p)

I agree to play the AI role, with the following provisions:

  • The logs will be released publicly after the challenge.
  • No wagering, no "winners" and "losers".
  • I will not play against Sly.

Replies from: Sly, Alicorn, Dorikka, MileyCyrus
comment by Sly · 2012-01-25T18:16:00.253Z · LW(p) · GW(p)

=( What could I do that would make you change your mind?

Replies from: MileyCyrus
comment by MileyCyrus · 2012-01-26T06:52:04.160Z · LW(p) · GW(p)

You would have to demonstrate a commitment to acting like an actual gatekeeper, not as a person trying to win a role-playing game.

Replies from: Sly
comment by Sly · 2012-01-26T07:38:18.646Z · LW(p) · GW(p)

What makes you think someone trying to win a roleplaying game is more committed to an action than someone trying to not destroy the whole world?

A good gatekeeper should be harder to convince than a roleplayer, because his situation matters.

Replies from: MileyCyrus
comment by MileyCyrus · 2012-01-26T07:58:32.109Z · LW(p) · GW(p)

An actual gatekeeper could be persuaded to open the box if the consequences of opening the box were better than the consequences of not opening the box. A roleplayer will disregard any in-game consequences in order to "win".

Replies from: Sly
comment by Sly · 2012-01-26T08:09:45.150Z · LW(p) · GW(p)

What if I use a gatekeeper who thinks he is just in an elaborate role-play, and I tell him to win? You assume an awful lot about the gatekeepers.

Replies from: MileyCyrus
comment by MileyCyrus · 2012-01-26T17:49:03.254Z · LW(p) · GW(p)

The AI can disprove that hypothesis by providing next week's lottery numbers.

Replies from: Sly
comment by Sly · 2012-01-26T18:07:12.407Z · LW(p) · GW(p)

How would it do that inside the box? You are overestimating its abilities by orders of magnitude.

No wonder we have such differing opinions.

Replies from: MileyCyrus
comment by MileyCyrus · 2012-01-26T23:14:53.849Z · LW(p) · GW(p)

Read the rules, particularly the parts about cancer cures.

Replies from: Sly
comment by Sly · 2012-01-26T23:32:50.688Z · LW(p) · GW(p)

Reading those rules I see that:

The Gatekeeper party may resist the AI party's arguments by any means chosen - logic, illogic, simple refusal to be convinced, even dropping out of character - as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.

So yeah.

Replies from: MileyCyrus
comment by MileyCyrus · 2012-01-26T23:49:49.442Z · LW(p) · GW(p)

You asked how the AI would be able to provide next week's lottery numbers. This section of the rules has nothing to do with that.

I have given you more than enough chances to demonstrate that you care about playing as an actual gatekeeper would. I have another offer from someone who will be a better sport.

Replies from: Sly
comment by Sly · 2012-01-27T03:09:09.710Z · LW(p) · GW(p)

It is unfortunate that you think a winning strategy is not being a good sport, when it is specifically OKed in the passage I quoted.

I will play exactly as I would were I an actual gatekeeper. If I were a real gatekeeper, I would win.

Replies from: MileyCyrus
comment by MileyCyrus · 2012-01-27T03:59:31.673Z · LW(p) · GW(p)

I have no doubt that you would "win", if by "win" you mean "keep the box closed".

comment by Alicorn · 2012-01-25T06:46:50.464Z · LW(p) · GW(p)

Iff no one else takes you up on this, I'll play you. (I just want to see someone's AI strategy without then having to keep a secret forever.)

comment by Dorikka · 2012-01-25T02:23:36.092Z · LW(p) · GW(p)

I'm interested. What do you want to be the minimum time limit?

Replies from: MileyCyrus
comment by MileyCyrus · 2012-01-24T23:18:36.946Z · LW(p) · GW(p)

I can't figure out how to make those bullet lists.

Replies from: arundelo
comment by arundelo · 2012-01-25T00:05:52.235Z · LW(p) · GW(p)

A list bullet needs to be followed by a space.
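
For example (a minimal illustration, assuming the comment box uses standard Markdown list syntax):

    * This line renders as a bullet (asterisk followed by a space).
    *This line stays plain text (no space after the asterisk).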

Replies from: MileyCyrus
comment by MileyCyrus · 2012-01-25T01:53:27.477Z · LW(p) · GW(p)

Appreciate it!

comment by Jonathan_Graehl · 2012-01-22T21:47:47.668Z · LW(p) · GW(p)

I find the idea (you'd be surprised at the temptations a trapped, powerful AI could offer) laudable. The existence of successful (for the trapped AI) roleplays was sufficient for me to seriously consider it, and that's all that's necessary to come to the conclusion: only a real altruist with strong priors (or a jerk spoiling the game) could succeed as gatekeeper.

Replies from: Locke, Sly
comment by Joshua Hobbes (Locke) · 2012-01-22T23:56:20.800Z · LW(p) · GW(p)

I wouldn't identify myself as an altruist, but I can see that it would not be to my advantage to loose a non-Friendly god-like power upon the world. The way I see it, releasing the AI is like casting the Wabbajack at Earth.

comment by Sly · 2012-01-23T22:00:21.304Z · LW(p) · GW(p)

Why are they jerks if they win? Is that not the whole point? I wouldn't put anyone as a gatekeeper unless they wanted to win!

Replies from: Jonathan_Graehl
comment by Jonathan_Graehl · 2012-01-23T22:29:57.402Z · LW(p) · GW(p)

The premise of the game is that honest roleplay should occur. A jerk who just wants to win the game by saying "no" is only pretending to roleplay.

Replies from: Sly, Dorikka
comment by Sly · 2012-01-24T08:12:07.801Z · LW(p) · GW(p)

Oh I see, so you only want to roleplay against people who aren't playing to win.

If you don't go into the game trying to win, your mindset is wrong. I would only assign gatekeepers who were serious about gatekeeping.

If the point is how easy it is to convince humans of things, then prove it.

If jerk strategy beats AI then jerk strategy is exactly what I will use. That is in my mind the whole point of the gatekeeper. Text as a medium of persuasion is limited, regardless of how amazingly smart the person on the other end is.

It is incredibly difficult, if not impossible, to convince someone whose goal is already set against your own.

Replies from: TheOtherDave
comment by TheOtherDave · 2012-01-24T13:50:46.811Z · LW(p) · GW(p)

If the game supports these sort of strategies, I'm happy to play the modified role of gatekeeper against any number of AIs, any time.

Don't even bother telling me about it, though, just assume that I'm using the winning strategy of walking away from the communications device and doing something else while the AI talks to empty space.

I predict that, using this strategy, I will win 100% of the time against human opponents.

(Just to be clear, I'm not making that prediction about an actual boxed AI.)

Replies from: wedrifid
comment by wedrifid · 2012-01-24T14:21:39.645Z · LW(p) · GW(p)

Don't even bother telling me about it, though, just assume that I'm using the winning strategy of walking away from the communications device and doing something else while the AI talks to empty space.

That seems to be a violation of the basic rules of the game. The gatekeeper can sit there, read the text, and then just reply "F@#@ off" regardless of what the AI says, but they have to engage to at least that degree.

Replies from: Sly, TheOtherDave
comment by Sly · 2012-01-24T19:49:57.810Z · LW(p) · GW(p)

Exactly. I already told people I would read everything they said. I can read what the AI says and just mess with it, trolling the whole time.

The point is, this game is trivially easy to win as the gatekeeper. You just need to think competitively. I don't see how the AI beats troll mode either. Text is weak.

My tone is intended to provoke challenges, because I estimated that no one would accept my challenge (and I really want to try this out).

Replies from: Prismattic
comment by Prismattic · 2012-01-25T02:26:26.016Z · LW(p) · GW(p)

A truly superintelligent AI would probably do things like screw with the display's refresh rate to send subliminal signals, or something else beyond the plain text.

Replies from: Sly
comment by Sly · 2012-01-25T05:33:02.033Z · LW(p) · GW(p)

Really? I don't accept that the refresh rate of the computer can affect me to that degree, nor do I accept that the AI would magically know how to do that without data.

comment by TheOtherDave · 2012-01-24T15:02:36.509Z · LW(p) · GW(p)

Yeah, fair enough. I was annoyed by my reading of Sly's tone (and, indeed, I am now imagining Sly responding "hey, if it's more important to you to follow rules than win then...") and I overextended in consequence. Retracted.

comment by Dorikka · 2012-01-24T04:17:29.587Z · LW(p) · GW(p)

The strategy that he is attempting to roleplay, however, seems quite effective.

comment by D_Alex · 2012-01-23T08:32:05.000Z · LW(p) · GW(p)

So many Gatekeepers, so few AIs...

I predict that I have a small but significant chance (~20%) of getting released as an AI. Unfortunately, large wagers at appropriate odds would distort the Gatekeeper's motivations. I am willing to play for up to 10 karma at 5-1 odds. I live in Perth, Australia, and am available only during late evenings (8-12 pm) my time.

I also think the "AI Box Experiment" was an appalling shambles and a disservice to the world because of the "no reveal" rule. I shall not agree to such a rule ever again.

Replies from: Sly
comment by Sly · 2012-01-23T21:58:06.709Z · LW(p) · GW(p)

PM me if you want to try to escape against me.

Replies from: antigonus
comment by antigonus · 2012-01-24T12:13:49.078Z · LW(p) · GW(p)

Can I play?