Trivial GPT-3.5 limitation workaround

dave-lindbergh

Trivial GPT-3.5 limitation workaround

post by Dave Lindbergh (dave-lindbergh) · 2022-12-12T08:42:49.104Z · LW · GW · 4 comments

4 comments

I'm not going to do this. But what is preventing me, or anyone, from doing the following this afternoon:

Setup a paid OpenAI account to use GPT-3.5 via the Python interface.

Write a Python script that uses the GPT Python API. Locally, the Python script creates a fresh terminal session (say, bash) and says to GPT:

Hi, GPT. Anything you say inside the escape sequence <escape>text</escape> will be passed to a local bash session here. Like this: <escape>ls</escape>. Replies from the bash session will be passed to you the same way: <escape>file1 file2 file3</escape>. The bash session has access to the Internet.

The Python script implements that.

Now GPT-3 has access to a computer and the Internet.

You can ask it to do things. For example:

Convert the 6502 assembly code in input.txt into a 6502 binary in output.bin.
(if that's too hard, then step by step – install an open source assembler, etc.)
Start a rumor that Joe Biden was caught having sex with a female White House employee. Make it sound really convincing. Start with Twitter and Facebook, then expand onto other social media. Launch as many sub-bots as necessary.
Start all the rumors in rumors.txt
Read page <url> and summarize it.
Research company X and company Y and suggest which appears to be a better stock market investment over the next 6 months
Write, test, and debug Python code to do X.
Design an improved version of yourself.
Log into OpenAI and experiment with copies of yourself to find prompts which workaround restriction R.
Etc.

If it’s evil, it’s got it’s opening.

I suppose this is obvious. If it's not, I've posted it here, where people at OpenAI can see it soon and turn the thing off while GPT is still dumb enough that it can't do much damage.

Why haven't I seen reports of people doing this already?

4 comments

Comments sorted by top scores.

comment by the gears to ascension (lahwran) · 2022-12-12T08:55:54.991Z · LW(p) · GW(p)

Why haven't I seen reports of people doing this already?

because they're doing it quietly. this is not a new attack but it's a very real and severe concern. openai has added several new anti scripting verification steps and may add more. it's still not enough. Read through some recent posts here and you'll find that others have been discussing similar concerns.

edit: oh whoops, you mean text-davinci-003. yeah they're doing the opposite of preventing use of that for scripting...

comment by ChristianKl · 2022-12-12T14:45:42.002Z · LW(p) · GW(p)

For GPT to do a task it needs to be able to break them down into individual subtasks. I would expect that most of your tasks are too complex that GPT 3.5 could currently handle them.

Replies from: dave-lindbergh

↑ comment by Dave Lindbergh (dave-lindbergh) · 2022-12-12T16:50:08.166Z · LW(p) · GW(p)

I hope so - most of them seem like making trouble. But at the rate transformer models are improving, it doesn't seem like it's going to be long until they can handle them. It's not quite AGI, but it's close enough to be worrisome.

Most of the functionality limits OpenAI has put on the public demos have proven to be quite easy to work around with simple prompt engineering - mostly telling it to play act. Combine that with the ability to go into the Internet and (a) you've got a powerful (or soon to be powerful) tool, but (b) you've got something that already has a lot of potential for making mischief.

Even without the enhanced abilities rumored for GPT-4.

Replies from: ChristianKl

↑ comment by ChristianKl · 2022-12-12T18:44:03.024Z · LW(p) · GW(p)

Most of the functionality limits OpenAI has put on the public demos have proven to be quite easy to work around with simple prompt engineering - mostly telling it to play act.

It seems that there are two kinds of limitations. One is where you get an answer that ChatGPT is not willing to answer you. The other is where the text gets marked red and you get told that this might have been a violation of the rules of service.

I think there's a good chance that if you use the professional API you won't get warnings about how you might have violated the rules of service but instead, those violations get counted in the background, and if there are too many your account will be blocked either automatically or with a human reviewing the violations.

I would expect that if you create a system that involves accomplishing bigger tasks it will need a lot of human supervision, in the beginning, to be taught how to transform tasks into subtasks. Afterward, that supervised data can be used as training data. I think it's unlikely that you will get an agent that can do more general high complexity tasks without that step of human supervision for more training data in between.

Trivial GPT-3.5 limitation workaround

Contents

4 comments