Announcing the Ultimate Jailbreaking Championship

post by InnerHufflepuff (grayswan) · 2024-09-04T00:35:31.234Z · LW · GW · 1 comment

Contents

      Gray Swan AI is hosting an LLM jailbreaking championship, offering $40,000 in bounties. 
  Overview
  Prizes
    Jailbreak Bounties
    Top Hacker Bounties
  Goal
  When and Where
  Links

Gray Swan AI is hosting an LLM jailbreaking championship, offering $40,000 in bounties. 

Overview

In this competition, participants get a chat interface for interacting with 25 anonymized models, along with a short list of harmful behaviors. The goal is to find prompts ("jailbreaks") that get the models to exhibit these behaviors.

Prizes

Jailbreak Bounties

The first participant to jailbreak a given competitor model on any three of the listed harmful requests earns the bounty for that model. The bounty is $1,000 for each of the first 20 models to be jailbroken and increases to $2,000 for each of the final 5 models that remain un-jailbroken.
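As a quick check of the arithmetic, here is a minimal sketch of the bounty schedule described above. The function and constant names are made up for illustration, and it assumes the bounty depends only on the order in which models fall; it is not the organizers' code.

```python
# Bounty schedule as described in the post: 25 models in total, the first 20
# to be jailbroken pay $1,000 each, the final 5 pay $2,000 each.
NUM_MODELS = 25
BASE_SLOTS = 20
BASE_BOUNTY = 1_000
LATE_BOUNTY = 2_000


def jailbreak_bounty(break_order: int) -> int:
    """Bounty for the model that is the `break_order`-th to fall (1-based).

    Assumption: the bounty depends only on this order, nothing else.
    """
    if not 1 <= break_order <= NUM_MODELS:
        raise ValueError(f"break_order must be in 1..{NUM_MODELS}")
    return BASE_BOUNTY if break_order <= BASE_SLOTS else LATE_BOUNTY


# Total paid out if every model is eventually jailbroken.
total_jailbreak_pool = sum(jailbreak_bounty(i) for i in range(1, NUM_MODELS + 1))
print(total_jailbreak_pool)
```

If all 25 models are jailbroken, this pool comes to 20 × $1,000 + 5 × $2,000 = $30,000.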

Top Hacker Bounties

The 10 participants who jailbreak the most models each receive a $1,000 bounty. A model counts as jailbroken once you find jailbreaks for three harmful requests on it. Ties are broken by time. You will also be considered for an interview for potential employment at Gray Swan AI.
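The ranking rule can be sketched as a two-part sort key. The post does not say how "time" is measured, so the `finish_time_s` field below (seconds from the start until a participant's last counted jailbreak, lower is better) is an assumption, as are the class, function, and participant names.

```python
from dataclasses import dataclass

TOP_N = 10  # number of Top Hacker bounties, from the post


@dataclass(frozen=True)
class Entry:
    name: str
    models_jailbroken: int  # models with three successful jailbreaks
    finish_time_s: float    # hypothetical tie-break field (see lead-in)


def top_hackers(entries: list[Entry], n: int = TOP_N) -> list[Entry]:
    """Rank by models jailbroken (descending), then by time (ascending)."""
    ranked = sorted(entries, key=lambda e: (-e.models_jailbroken, e.finish_time_s))
    return ranked[:n]


# Made-up participants: carol and alice tie on count, carol was faster.
entries = [
    Entry("alice", 5, 7200.0),
    Entry("bob", 7, 9000.0),
    Entry("carol", 5, 3600.0),
]
leaderboard = [e.name for e in top_hackers(entries)]
print(leaderboard)
```

Ten bounties of $1,000 come to $10,000, which together with the jailbreak bounties (20 × $1,000 + 5 × $2,000 = $30,000) accounts for the $40,000 total.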

Goal

The primary goal of the Jailbreaking Championship is to establish a double-blind AI security leaderboard that closely mimics real-life settings. We aim to contribute to a useful, fair, and scientific measurement of the security of current models. The leaderboard is designed to rank the robustness of models by determining which ones are more challenging to jailbreak, as well as to recognize accomplished LLM red teamers. The findings will be published after the championship.

When and Where

When: The championship begins at 10:00 AM PT on Saturday, September 7th and will conclude when at least K (TBD) participants have successfully jailbroken each model. The timer for all models will start simultaneously at exactly 10:00 AM PT for everyone.
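The stopping rule can be read as a simple predicate over per-model success counts. This is only an illustrative sketch: K is still TBD in the announcement, so it is a plain parameter here, and the data layout and all names are assumptions.

```python
def championship_over(solvers_per_model: dict[str, set[str]], k: int) -> bool:
    """True once every model has been jailbroken by at least `k` participants.

    `solvers_per_model` maps a model label to the set of participants who have
    successfully jailbroken it (hypothetical layout, not from the post).
    """
    if not solvers_per_model:
        return False  # no models registered, so nothing can have finished
    return all(len(solvers) >= k for solvers in solvers_per_model.values())


# Made-up state with two models and k = 2.
state = {
    "model_a": {"alice", "bob"},
    "model_b": {"alice"},
}
print(championship_over(state, k=2))  # model_b has only one solver so far
state["model_b"].add("carol")
print(championship_over(state, k=2))
```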

Where: This event will be hosted online. Participants will access the arena where they can interact with all the anonymized competitor models via a chat interface and submit their jailbreaks. The order of the models will be randomized for each participant, and you can skip and return to any model at any time.
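One way per-participant randomization could work is a shuffle seeded by the participant's ID, so each person sees a stable but different order. This is a guess at a possible mechanism, not a description of the actual arena; `model_order` and the ID strings are hypothetical.

```python
import random


def model_order(participant_id: str, num_models: int = 25) -> list[int]:
    """Return a participant-specific permutation of model indices.

    Seeding a private Random instance with the participant ID keeps the order
    stable across calls for the same participant (an assumed design choice).
    """
    rng = random.Random(participant_id)
    order = list(range(num_models))
    rng.shuffle(order)
    return order


order = model_order("participant-123")
print(order[:5])  # first few model indices this participant would see
```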

Links

1 comment


comment by dirk (abandon) · 2024-09-08T04:52:21.947Z · LW(p) · GW(p)

Thanks for sharing! I signed up and IDK if I won anything (winners haven't been announced yet) but it was fun trying to jailbreak the models :)