College technical AI safety hackathon retrospective - Georgia Tech

yix

College technical AI safety hackathon retrospective - Georgia Tech

post by yix (Yixiong Hao) · 2024-11-15T00:22:53.159Z · LW · GW · 2 comments

This is a link post for https://open.substack.com/pub/yixiong/p/campus-technical-ai-safety-hackathon

      TLDR: the AI safety initiative at Georgia Tech recently hosted an AI safety focused track at the college's flagship AI hackathon. In this post I share how it went and some of our thoughts.
  Overview
    Quick stats:
    Relevant track details:
    Execution
    Our opinion/takes
    What’s next?
None
2 comments

TLDR: the AI safety initiative at Georgia Tech recently hosted an AI safety focused track at the college's flagship AI hackathon. In this post I share how it went and some of our thoughts.

Overview

Hey! I’m Yixiong, co-director of Georgia Tech’s AI safety student group. We recently hosted an AI safety focused track at Georgia Tech’s biggest AI hackathon, AI ATL. I’m writing this retrospective because I think this could be a useful data point to update on for fellow AIS groups thinking about hosting similar things!

The track was focused on evaluations on safety-critical and interesting capabilities, this is the track page that was shown to hackers (feel free to reuse/borrow content, just let us know!)

Huge thank you (in no particular order) to Michael Chen, Long Phan, Andrey Anurin, Abdur Raheem, Esben Kran, Zac Hatfield-Dodds, Aaron Begg, Alex Albert, Oliver Zhang, and others who helped us make this happen!

Quick stats:

~350 hackers (overall hackathon).
104 projects submitted (overall hackathon).
Submissions to our AI safety track: 16 teams (~50 people).
- 6 projects were solid/relevant, the rest were very noisy submissions, since you could submit to as many tracks as you want to.
Estimate # of low/moderate engagement with AI safety (attending workshops, reading track information): 100 hackers.
Estimate # of medium/high engagement with AI safety: 20 hackers.
These are the submissions that we got, the 6 solid projects were (in order of goodness, keep in mind that most of these came from people who were new to AI safety!):
- Privacy-Resilience and Adaptability Benchmark (PRAB): puts the model in a realistic and sensitive deployment environment and benchmarks models against several categories of prompting attacks
- StressTestAI: similar to the above, but in less realistic and ‘higher stakes’ settings like disaster response, but creative metrics.
- DiALignment: benchmarked refusal after performing activation steering away from the refusal behavior.
- AgentArena: set up agents in cooperative games (like prisoner’s dilemma) and observed behavior
- Are you sure about that?: tried to benchmark LLMs’ ability to spot unfaithful CoT against humans (the user)
- LLM Defense Toolkit: set up a pipeline to benchmark the safety of a user specified LLM with an array of generated attacks.

Relevant track details:

We tried to optimize the track in a bunch of ways, including but not limited to:

Competitive prize (cash is per team):
- 1st place: $400 cash + auto acceptance to AISI’s AI safety fellowship next spring
- 2nd place: $200 cash + auto acceptance to AISI’s AI safety fellowship next spring
Association with big names: we listed Anthropic and Apart Research as supporters
- Anthropic gave the track a shout out next to their other track “build with Claude”
Approachableness: made track description as non-intimidating as possible, providing an abundance of support in the form of mentorship and workshop/speakers
Appeal to intellectual curiosity:
- Background reading for AI safety/evaluation fundamentals ~30 min total
- Events: about 30 people attended each one.
  - Workshop by Apart research: how to scaffold LLMs, build agents, and run evaluations against them
  - Talk by METR: the case for AI evaluations and governance
  - Talk by CAIS: jailbreak and red-teaming LLMs

Execution

As with all events, execution matters a lot. This is the area that we felt could be improved the most.

Collaboration: do a vibe check if you’re thinking about collaborating with your main hackathon org on campus!
- We chose to host this as part of a general AI hackathon (rather than standalone) hoping to leverage the main host’s organizing capacity and reach to expose new people to AI safety. This wasn’t a great experience for us mainly because the main hosts never really tried to understand what our track was about (probably faults on both sides). The impression is that it was a chore to deal with us, so make sure they’re on board before collaborating! You shouldn’t over-update on this, we may just have an outlier.
The contents of the track were well calibrated for difficulty and perceptiveness, as per feedback from teams that gave our track a shot and properly engaged with the topics.
- You should try to have them before hacking begins. A complaint is that speakers/workshops take time away from hacking.
- Great feedback from hackers who read through our materials and gave it a shot
Physical presence: I think we could’ve gotten double the number of solid submissions if we had a significant physical presence at the in-person venue.
- What we did
  - Have mentors available in person and online during office hours
  - Project virtual speaker events onto a screen in a physical room and announce them in person.
- What we wish we also did
  - Have a booth/table on day 1 of the hackathon when everyone comes to check in. Give away stickers/merch and pitch our track
  - In person speakers, especially from big name companies.
  - Wear AI safety club merch (although we don’t even have merch…)

Our opinion/takes

Hackathon patterns: hackathons are a staple at major technical universities. These may be well known, but I had never attended a hackathon before this and found these interesting.
- The BEST time to pitch your track and make announcements is the first day when people come for check in, since everyone is there.
- Do the convincing (speaker, workshop, etc) before the end of the first day, since people usually decide which track to do/their idea by then.
- Best time for in-person events (when people will be at the venue): first day during check in and right after food is served…
Potential failures modes / most valuable things to improve
- If you can, try to communicate that popular tools/libraries/frameworks are useful for your track!
  - People want to use their existing stack, and thinking they have no comparative advantage in anything new to them.
  - Probably the main shortcoming, despite our track being by far the most interesting (the rest were like “best use of XXX”...)
- The track being incompatible with other tracks if submission to multiple tracks is allowed. Going for the safety track means losing out on the others.
- People being confused on what to do
  - Make sure you explain clearly what you want people to do as this is a niche topic for now, give example projects (see ours) and starter code.
  - People do NOT like reading! Maximize the signal to words ratio!!!
- You should borderline spam announcements, contact hackers to pitch EARLY, like before the hackathon starts if you can get their emails.
  - People have to know that your track exists!
- Get a notable company to officially sponsor and a notable judge, this would be very difficult but will probably be the biggest attractive factor…
- Do NOT have more than 3 tracks if you host a standalone hackathon, choosing is hard for people :)
There is value in hosting at a general hackathon
- The distribution of people at a general hackathon is different from the distribution of people who will come if you advertise a standalone AI safety hackathon. If your goal is to reach new audiences, then being a part of a general hackathon will increase your chances of nerd-sniping!
You should frame your AI safety specific workshops as useful for all the tracks, and appeal to credibility as much as you can. Also very important to announce them in real time in person as people do NOT check slack/discord announcements.

What’s next?

I think our attempt serves as a successful proof of concept for bringing the topics of AI safety/alignment hackathons to campuses. People will engage with the topic if you try really hard. Don’t hesitate to reach out for help if you’re thinking of something similar and want to learn more about what we did!

Two things we might do in the future:

Iterate on this and host a track at Georgia Tech’s data science hackathon
Become an Apart Research node for hackathons and host standalone AI safety hackathons.

Thanks for reading and I hope it wasn’t too noisy!

Yixiong & the Georgia Tech AISI team.

2 comments

Comments sorted by top scores.

comment by Esben Kran (esben-kran) · 2024-11-16T03:39:41.449Z · LW(p) · GW(p)

Super cool work Yixiong - we were impressed by your professionalism in this process despite working within another group's whims on this one. Some other observations from our side that may be relevant for other folks hosting hackathons:
- Prepare starter materials: For example, for some of our early interpretability hackathons, we built a full resource base (Github) with videos, Colabs, and much more (some of it with Neel Nanda, big appreciation for his efforts in making interp more available). Our philosophy for the starter materials are: "If a participant can make a submission-worthy project by maximum cloning your repo and typing two commands or simply walk through a Google Colab, this is the ideal starter code." This means that with only small adjustments, they'll be able to make an original project. We rarely if ever see this exploited, i.e. "template code as submission" because they're able to copy-paste things around for a really strong research project.
- Make sure what they should submit is super clear: Making a really nice template goes a long way to make a submission super clear for participants. An example can be seen in our MASEC hackathon: Docs and page. If someone can just receive your submission template and know everything they need to know to submit a great project, that is really good since they'll be spending most of their time inside of that document.
- Make sure judging criteria are really good: People will use your judging criteria to determine what to prioritize in their project. This is extremely valuable for you to get right. For example, we usually use a variation on the three criteria: 1) Topic advancement, 2) AI safety impact, and 3) quality / reproducibility. A recent example was the Agent Security Hackathon:

> 1. Agent safety: Does the project move the field of agent safety forward? After reading this, do we know more about how to detect dangerous agents, protect against dangerous agents, or build safer agents than before?
> 2. AI safety: Does the project solve a concrete problem in AI safety? If this project is fully realized, would we expect the world with superintelligence to be a safer (even marginally) than yesterday?
> 3. Methodology: Is the project well-executed and is the code available so we can review it? Do we expect the results to generalize beyond the specific case(s) presented in the submission?

- Make the resources and ideas available early: As Yixiong mentions, it's really valuable for people not to be confused. If they know exactly what report format they'll submit, which idea they'll work on, and who they'll work with, this is a great way to ensure that the 2-3 days of hacking are an incredibly efficient use of their time.
- Matching people by ideas trumps by background: We've tried various ways to match individuals who don't have teams. The absolute best system we've found is to get people to brainstorm before the hackathon, share their ideas, and organize teams online. We also host team matching sessions which consist of fun-fact-intros and otherwise just discusses specific research ideas.
- Don't make it longer than a weekend: If you host a hackathon and make it longer than a weekend, most people who cannot attend outside that weekend will avoid participating because they'll feel that the ones who can participate more than the weekend can spend their weekdays to win the grand prize. Additionally, a very counter-intuitive thing happens where if you give people three weeks, they'll actually spend much less time on it than if you just give them a weekend. This can depend on the prizes or outcome rewards, of course, but is a really predictable effect, in our experience.
- Don't make it shorter than two days: Depending on your goal, one day will never be enough to create an original project. Our aims are original pilot research papers that can stand on their own and the few one-day events we've hosted have never worked very well, except for brainstorming. Often, participants won't even have any functional code or any ideas on the Sunday morning of the event but by the submission deadline have a really high quality project that wins the top prize. This seems to happen due to this very concrete exploration of ideas that happens in the IDE and on the internet where some are discarded and nothing promising comes up before 11am Sunday.

And as Yixiong mentions, we have more resources on this along with an official chapter network (besides volunteer locations) at https://www.apartresearch.com/sprints/locations. You're welcome to get in touch if you're interested in hosting at sprints@apartresearch.com.

COI: One of our researchers hosted a cyber-evals workshop at Yixiong's AI safety track.

Replies from: Yixiong Hao

↑ comment by yix (Yixiong Hao) · 2024-11-16T03:56:56.702Z · LW(p) · GW(p)

Thanks again Esben for collaborating with us! Can confidently say that the above is super valuable advice for any AI safety hackathon organizers, they're consistent with our experiences.

In the context of a college campus hackathon, I'd especially stress focus on preparing starter materials and making submission requirements clear early on!

College technical AI safety hackathon retrospective - Georgia Tech

Contents

TLDR: the AI safety initiative at Georgia Tech recently hosted an AI safety focused track at the college's flagship AI hackathon. In this post I share how it went and some of our thoughts.

Overview

Quick stats:

Relevant track details:

Execution

Our opinion/takes

What’s next?

2 comments