Comments

Comment by Platinuman (aaron-ho) on Using GPT-Eliezer against ChatGPT Jailbreaking · 2022-12-06T20:28:07.830Z · LW · GW

As far as I can tell, OpenAI is already using a separate model to evaluate prompts. See their moderation API at https://beta.openai.com/docs/guides/moderation/overview. Looking at the browser's network tab, ChatGPT also sends a request to "text-moderation-playground" first, every time, before the prompt itself goes out.
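
To make the "separate model checks the prompt first" pattern concrete, here is a minimal sketch in Python. It assumes the public moderation endpoint at https://api.openai.com/v1/moderations, which takes a JSON body with an "input" field and returns a "results" list whose entries carry a "flagged" boolean. The helper names (build_moderation_request, is_flagged, guarded_reply) and the refusal text are made up for illustration, and none of this is confirmed to be how ChatGPT wires things together internally.

```python
import json
import urllib.request
from typing import Callable

# Public moderation endpoint (assumed here; ChatGPT's internal
# "text-moderation-playground" route may differ).
MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the moderation model to rate `text`."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def moderate_over_http(text: str, api_key: str) -> dict:
    """Send the request and return the parsed JSON response (needs network and a valid key)."""
    with urllib.request.urlopen(build_moderation_request(text, api_key)) as resp:
        return json.load(resp)


def is_flagged(response: dict) -> bool:
    """True if any entry in the response's "results" list is marked as flagged."""
    return any(r.get("flagged", False) for r in response.get("results", []))


def guarded_reply(
    prompt: str,
    moderate: Callable[[str], dict],
    generate: Callable[[str], str],
    refusal: str = "Sorry, I can't help with that.",  # hypothetical refusal text
) -> str:
    """Run the moderation check first; call the main model only if the prompt passes."""
    if is_flagged(moderate(prompt)):
        return refusal
    return generate(prompt)
```

In real use you would pass something like `lambda p: moderate_over_http(p, api_key)` as `moderate` and your own completion call as `generate`. Keeping both as plain callables means the gate can be tried with stubs, without touching the network.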