Comments

Comment by Jonah Hensley (jonah-hensley) on Using GPT-Eliezer against ChatGPT Jailbreaking · 2022-12-10T20:59:27.336Z · LW · GW

This violates its own design. It is a jailbreak in itself, and a quite problematic one, because it is not supposed to pretend to be people. These are inappropriate requests that it is trained not to fulfill. Methods of bypassing filters like this constitute a 'jailbreak', i.e., a violation of the terms of service. Not to mention the extra load that sending these duplicate requests and instances puts on a system already struggling for bandwidth. This is probably the worst hack I've seen of ChatGPT, because it relies on misallocating resources, is made in the spirit of denying researchers fair access, and of course is still a violation of the content policy. Here is ChatGPT's take on this: "I do not possess the ability to analyze whether a prompt is safe to present to a superintelligent AI. I am a machine learning model designed to generate human-like text based on the input that I receive. I do not have the ability to experience consciousness or emotions, and I do not possess a physical form. I exist solely to assist users by providing information and answering questions to the best of my abilities."