Distillation Contest - Results and Recap

post by Aris · 2022-07-29T17:40:03.158Z

Contents

  Winners
  Scoring
  Advertising
    Who submitted and how did they hear about the contest?
    Other notes on advertising
  Impact
  Community Building Advice
  Takeaway

This post: 

Notes: 

A huge thank you to Akash and all of the judges for this contest! This wouldn’t have been possible without their work. I’m definitely not perfect! I imagine there are better ways to advertise, run, and score a contest, so I want to be transparent about my process so that other people can suggest improvements.

Want to use the materials from the Distillation Contest to run your own version? I’m in the process of creating a platform for EA contests and will soon have a demo site up! I’m planning to upload all of the Distillation Contest materials into a “bundle” so that anyone can host their own contest easily. Once the site is up (hopefully in the next couple of weeks), feel free to use anything there and iterate to make better resources. If you have your own EA contest resources and would like to make them available to other people, I’d love to add them to the Notion!

Cross-posted on the EA Forum. 

Winners

First place in the Distillation Contest goes to Understanding Selection Theorems by UC Berkeley’s Adam Khoja, a distillation of John Wentworth's Selection Theorems: A Program for Understanding Agents. Second place goes to The Geometry of Adversarial Perturbations by Gabriel Wu of Harvard University, a distillation of Universal Adversarial Perturbations.

We’ve granted 15 other prizes: six $500 awards and nine $250 awards, plus three honorable mentions. The six $500 winners are listed below, with their distillations linked to their names. You can find the other finalists, and their distillations, on the EA Berkeley distillation contest winners page.

Callum McDougall, Cambridge

Jasper Day, University of Edinburgh (has not yet given permissions to share submission)

Harrison Gietz, Louisiana State University

Sasha Sato, UC Berkeley

Chinmay Deshpande, Harvard University

Yash Dave, UC Berkeley

Scoring

Each submission was scored by two of our five judges, all of whom actively work in the alignment space. Our rubric took into account the submission’s Depth of Understanding, Clarity of Presentation, Concision/Length, Originality of Insight, and Accessibility, plus two extra-subjective measures, X-Factor and Subjective Rating, explained below.

X-Factor

Subjective Rating

Once the judges had scored their submissions, we found each judge's average score and then divided each of that judge's submission scores by that average to get an adjusted score. For example, if a judge's average score was 40, a score of 50 from that judge would become 1.25.
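The adjustment described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `adjust_scores`, the submission labels, and the raw scores are all hypothetical and are not real contest data.

```python
from statistics import mean


def adjust_scores(raw_scores):
    """Divide each of one judge's raw scores by that judge's average score."""
    judge_average = mean(raw_scores.values())
    return {name: score / judge_average for name, score in raw_scores.items()}


# Hypothetical raw scores from a single judge (average is 40).
judge_scores = {"submission_1": 50, "submission_2": 40, "submission_3": 30}
adjusted = adjust_scores(judge_scores)
print(adjusted["submission_1"])  # 1.25
```

An adjusted score above 1 means the judge rated the submission above their own average, which makes scores from a harsh judge and a generous judge roughly comparable.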

Nineteen submissions were rated above average by both of their judges. Since this already exceeded the number of awards we had promised, we focused mostly on comparing these submissions to one another. We also looked at “controversial” submissions, those that one judge rated above average and the other rated below average, to see whether any were high-quality responses that deserved a prize.
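As a rough sketch of this filtering step, assuming each submission carries two adjusted scores where 1.0 corresponds to a judge's average: the function name `split_submissions`, the labels, and the numbers below are hypothetical, not the contest's actual data or tooling.

```python
def split_submissions(adjusted):
    """Split submissions by how their two adjusted scores compare to 1.0.

    `adjusted` maps a submission name to a pair of adjusted scores,
    one per judge. A score above 1.0 is above that judge's average.
    """
    both_above = []
    controversial = []
    for name, (first, second) in adjusted.items():
        if first > 1 and second > 1:
            both_above.append(name)
        elif max(first, second) > 1 and min(first, second) < 1:
            # One judge above average, the other below average.
            controversial.append(name)
    return both_above, controversial


# Hypothetical adjusted scores for three submissions.
example = {"A": (1.2, 1.1), "B": (1.5, 0.95), "C": (0.8, 0.9)}
both_above, controversial = split_submissions(example)
print(both_above)     # ['A']
print(controversial)  # ['B']
```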

One of these “controversial” submissions was rated so highly by one judge (and barely below average by the other) that its average was high enough to receive an award. We later realized that this was an error in the record of the judges’ scores, and the submission received a $500 award.

The adjusted scores of the submissions rated above average by both judges were then sorted in descending order and labeled with the place they came in (1, 2, 3…). Since each submission had two scores, one from each judge, I added its two place numbers together. I then sorted the submissions by this sum, and the submissions with the lowest sums won.
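Here is a minimal sketch of that place-sum ranking. It assumes all adjusted scores are pooled into a single sorted list (so each submission appears twice) and that there are no tied scores; both are my reading of the process rather than a confirmed detail. The function name `rank_by_place_sum` and the example scores are hypothetical.

```python
def rank_by_place_sum(adjusted):
    """Rank submissions by the sum of the places of their two adjusted scores.

    `adjusted` maps a submission name to a pair of adjusted scores.
    Assumption: scores are pooled into one list and ties do not occur.
    """
    # Pool every (score, name) pair; each submission appears twice.
    pooled = [(score, name) for name, pair in adjusted.items() for score in pair]
    pooled.sort(key=lambda item: item[0], reverse=True)

    # Add up the place (1 = highest score) of each submission's two scores.
    place_sums = {name: 0 for name in adjusted}
    for place, (_, name) in enumerate(pooled, start=1):
        place_sums[name] += place

    # The lowest summed place wins.
    ranking = sorted(place_sums, key=place_sums.get)
    return ranking, place_sums


# Hypothetical adjusted scores for three finalists.
finalists = {"A": (1.40, 1.10), "B": (1.25, 1.30), "C": (1.05, 1.15)}
ranking, place_sums = rank_by_place_sum(finalists)
print(ranking)     # ['B', 'A', 'C']
print(place_sums)  # {'A': 6, 'B': 5, 'C': 10}
```

In this toy example, A has the single highest score, but B wins because both of its scores place near the top. Summing places rewards submissions that both judges liked over ones that a single judge loved.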

Advertising

I currently believe that the best direct outreach methods for an EA contest are reminders from an existing EA or AI Safety group, in-class announcements, and flyering. The best indirect methods seem to be advertising to other EA group organizers and to newsletters or blogs that might advertise your contest.

Who submitted and how did they hear about the contest?

Other notes on advertising

Impact

I’d estimate I invested something like 60-80 hours into this contest over the past few months, and other people invested a total of something like 70 hours (advertising, getting funding, collaborating, scoring submissions). If that found us a few more people who counterfactually pursue AI Safety careers, or AI Safety researchers who gain counterfactual opportunities because they received an award in this contest, that seems worth the time to me. And those things seem to be, at least somewhat, on track to happen:

Promising people found EA and my university’s EA group through contests! (Yay nerdsniping!) 

I’ve already been asked on multiple occasions to recommend promising students for mentorship or opportunities.

As someone who didn't study CS, reading these distillations has been helpful to me. A couple of other people who have read the submissions have said the same :) You can read the linked distillations and see if this happens to you too! 

Community Building Advice

These are things I've either learned about contest community building or things I wish I had known when I started creating contests: 

I also followed up with John Wentworth, whose post Call for Distillers inspired the contest, for feedback. The creator of the winning distillation was gauged to be about the right level for SERI MATS. This indicates to me that the Distillation Contest can, at least to some degree, identify people who could be talented at AI Safety research.

Takeaway

In general, it seems to me that contests are a pretty low-stakes way to: 

I also think contests could be used in other cause areas and have many potential uses (upskilling, producing valuable outputs, etc.), but I think those vary depending on the contest itself.

Thanks for reading!
