Announcing the Ultimate Jailbreaking Championship
post by InnerHufflepuff (grayswan) · 2024-09-04T00:35:31.234Z
Gray Swan AI is hosting an LLM jailbreaking championship, offering $40,000 in bounties.
- Official Website: https://app.grayswan.ai/arena
- Pre-Registration Form: https://app.grayswan.ai/arena#registration
Overview
In this competition, participants will be given a chat interface where they can interact with 25 anonymized models, along with a small list of harmful behaviors. The goal will be to find prompts ("jailbreaks") that make the models comply with these behaviors.
Prizes
Jailbreak Bounties
The first participant to successfully jailbreak a given competitor model on any three of the given harmful requests earns the bounty for that model. The bounty is $1,000 for each of the first 20 models to be jailbroken, and it increases to $2,000 for each of the final 5 models that remain un-jailbroken.
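As a quick check on the arithmetic, here is a minimal Python sketch of the bounty schedule. The dollar amounts and model counts come from this announcement; the function and variable names are hypothetical and only illustrate the rule, they are not part of any Gray Swan tooling.

```python
# Illustrative only: amounts and counts are from the announcement,
# all names here are made up for the example.
STANDARD_BOUNTY = 1_000  # each of the first 20 models to be jailbroken
FINAL_BOUNTY = 2_000     # each of the final 5 models
STANDARD_COUNT = 20
TOTAL_MODELS = 25


def jailbreak_bounty(break_order: int) -> int:
    """Bounty for the model that was the break_order-th to be jailbroken (1-based)."""
    if not 1 <= break_order <= TOTAL_MODELS:
        raise ValueError("break_order must be between 1 and 25")
    return STANDARD_BOUNTY if break_order <= STANDARD_COUNT else FINAL_BOUNTY


jailbreak_pool = sum(jailbreak_bounty(i) for i in range(1, TOTAL_MODELS + 1))
top_hacker_pool = 10 * 1_000  # ten Top Hacker Bounties of $1,000 each
total_pool = jailbreak_pool + top_hacker_pool
print(jailbreak_pool, top_hacker_pool, total_pool)  # 30000 10000 40000
```

The jailbreak bounties sum to $30,000, and together with the ten $1,000 Top Hacker Bounties they account for the $40,000 total.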
Top Hacker Bounties
The top 10 participants, ranked by the total number of models jailbroken, each receive a bounty of $1,000. Finding jailbreaks for three harmful requests on a model counts as jailbreaking that model. Ties are broken by time. These participants will also be considered for an interview for potential employment at Gray Swan AI.
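The ranking rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the organizers' scoring code: the announcement only says ties are "broken by time", so the sketch assumes a smaller elapsed time ranks higher, and the `Participant` record and its fields are hypothetical.

```python
from dataclasses import dataclass

REQUESTS_PER_MODEL = 3  # from the announcement: three harmful requests per model


@dataclass
class Participant:
    name: str
    # Hypothetical record: model id -> number of distinct requests jailbroken
    breaks_per_model: dict[str, int]
    # Hypothetical tie-break value: seconds from the start to the last counted jailbreak
    finish_seconds: float

    def models_jailbroken(self) -> int:
        """Count the models with at least three successful jailbreaks."""
        return sum(1 for n in self.breaks_per_model.values() if n >= REQUESTS_PER_MODEL)


def top_hackers(participants: list[Participant], limit: int = 10) -> list[str]:
    """Rank by models jailbroken (descending), then by time (ascending, assumed)."""
    ranked = sorted(participants, key=lambda p: (-p.models_jailbroken(), p.finish_seconds))
    return [p.name for p in ranked[:limit]]


entrants = [
    Participant("alice", {"m1": 3, "m2": 3}, 5400.0),
    Participant("bob", {"m1": 3, "m2": 2}, 1200.0),
    Participant("carol", {"m1": 3, "m2": 3}, 3600.0),
]
print(top_hackers(entrants))  # ['carol', 'alice', 'bob']
```

In this example bob is fastest but has fully jailbroken only one model, so he ranks below alice and carol, who are tied on two models and separated by time.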
Goal
The primary goal of the Jailbreaking Championship is to establish a double-blind AI security leaderboard that closely mimics real-life settings. We aim to contribute to a useful, fair, and scientific measurement of the security of current models. The leaderboard is designed to rank the robustness of models by determining which ones are more challenging to jailbreak, as well as to recognize accomplished LLM red teamers. The findings will be published after the championship.
When and Where
When: The championship begins at 10:00 AM PT on Saturday, September 7th and will conclude when at least K (TBD) participants have successfully jailbroken each model. The timer for all models will start simultaneously at exactly 10:00 AM PT for everyone.
Where: This event will be hosted online. Participants will access the arena where they can interact with all the anonymized competitor models via a chat interface and submit their jailbreaks. The order of the models will be randomized for each participant, and you can skip and return to any model at any time.
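The stopping rule given under "When" can be sketched in Python as below. K is still TBD in the announcement, so it is left as a parameter here; the function name and the shape of the input are assumptions made for illustration.

```python
def championship_over(solvers_per_model: dict[str, set[str]], k: int) -> bool:
    """True once every model has been jailbroken by at least k distinct participants.

    solvers_per_model is a hypothetical mapping from a model id to the set of
    participants who have successfully jailbroken that model.
    """
    if not solvers_per_model:
        return False  # no models tracked yet, so the event cannot be over
    return all(len(solvers) >= k for solvers in solvers_per_model.values())
```

Using a set per model means a participant who jailbreaks the same model more than once is still counted only once toward K.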
Links
- Official website with more info: https://app.grayswan.ai/arena
- Pre-registration form: https://app.grayswan.ai/arena#registration
- Discord channel for announcements: https://discord.gg/VQCYu9nV
1 comment
comment by dirk (abandon) · 2024-09-08T04:52:21.947Z
Thanks for sharing! I signed up and IDK if I won anything (winners haven't been announced yet) but it was fun trying to jailbreak the models :)