Posts

Comments

Comment by Paul McMillan (paul-mcmillan) on Jailbreaking GPT-4's code interpreter · 2023-07-17T18:57:36.717Z · LW · GW

The behavior is a bit of an implementation detail. We don't provision more than a single sandbox per user, so the data on disk within that sandbox can overlap when you have multiple concurrent sessions, even though the other aspects of the execution state are separate. I agree this is a bit surprising (though it has no security impact), and we've been discussing ways to make this more intuitive.

Comment by Paul McMillan (paul-mcmillan) on Jailbreaking GPT-4's code interpreter · 2023-07-14T21:59:31.775Z · LW · GW

I helped work on this feature, and we're glad you're enjoying looking for the limits!

There are many misconceptions about the intended boundaries for the sandbox. They're not inside the Python interpreter, and they're not in the model. Neither of those are intended to provide any security boundaries. There's nothing private or sensitive there, including the code we use to operate the service. You're intended to be able to view and execute code. That's what this is. A code execution service. You wouldn't be surprised you can see these things on a VM that AWS gave you.

As you have discovered, the model is not a very good source of information about its own limitations and capabilities. Many of the model behaviors you've interpreted as security measures are actually unwanted, and we're always working to make the model more flexible. Some of them (like where the model thinks it can read and write) are there to guide it, and discourage it from messing up the machine. Other limitations are mainly there to prevent you from wedging your own machine accidentally (but we know this is still possible if you try hard). Still others are there to prevent you from doing bad things to other people (e.g. the sandbox does not have access to the internet).

Since the feature is still in development, we have indeed provisioned a generous amount of resources for the sandbox, but there are some external controls that aren't visible to you. When we see excessive abuse, or behavior that impacts other users, we respond appropriately. These limits will change as the service matures.