Supervised Program for Alignment Research (SPAR) at UC Berkeley: Spring 2023 summary

post by mic (michael-chen), dx26 (dylan-xu), adamk, Carolyn Qian (carolyn-qian) · 2023-08-19T02:27:30.153Z · LW · GW · 2 comments

Contents

  Motivation
  Research projects
  Operational logistics
    Room for improvement
  Conclusion

In Spring 2023, the Berkeley AI Safety Initiative for Students (BASIS) organized an alignment research program for students, drawing inspiration from similar programs by Stanford AI Alignment[1] and OxAI Safety Hub. We brought together 12 researchers from organizations like CHAI, FAR AI, Redwood Research, and Anthropic, and 38 research participants from UC Berkeley and beyond.

Here is the link to SPAR’s website, which includes all of the details about the program. We’ll be running the program again in the Fall 2023 semester as an intercollegiate program, coordinating with a number of local groups and researchers from across the globe.

If you are interested in supervising an AI safety project in Fall 2023, learn more here and fill out our project proposal form, ideally by August 25. Applications for participants will be released in the coming weeks.

Motivation

Since a primary goal of university alignment organizations is to produce counterfactual alignment researchers, there seems to be great value in encouraging university students to conduct research in AI safety, both for object-level contributions and as an opportunity to gain experience and test fit. Programs like AI Safety Fundamentals, which sit at the top of the "funnel" of engagement in the alignment community, have been widely adopted as a template for university groups' introductory outreach, but we do not think there are similarly ubiquitous options for engaged, technically impressive students interested in alignment to deepen their involvement productively. Research is not the only feasible way to do this, but it holds several advantages: many of the strongest students are more interested in research than in other types of programs that might introduce them to AI safety, projects have the potential to produce object-level results, and research project results provide a signal of participants' potential for future alignment research.

Many alignment university groups have run research programs on a smaller scale and have generally reported bottlenecks such as lack of organizer capacity and difficulty attaining mentorship and oversight on projects; we believe an intercollegiate and centralized-administration model can alleviate these problems.

Additionally, we believe that many talented potential mentors with "implementation-ready" project ideas would benefit from a streamlined opportunity to direct a team of students on such projects. If our application process can reliably select for capable students, and if program administrators are given the resources to aid mentors in project management, we think this program could represent a scalable model for making such projects happen counterfactually.

While programs like SERI MATS maintain a very high bar for mentors, with streams usually headed by well-established alignment researchers, we believe that graduate students and some SERI MATS scholars would be good fits as SPAR mentors if they have exciting project ideas and are willing to guide teams of undergraduates. Further, since SPAR gives mentors complete freedom over the number of mentees, the interview process, and the ultimate selectivity of their students, the program may also appeal to more senior mentors. An intercollegiate pool of applicants will hopefully raise the overall quality of applicants and allow mentors to set ambitious application criteria for potential mentees.

Research projects

Each project was advised by a researcher in the field of AI safety. In total, we had about a dozen research projects in Spring 2023:

| Supervisor | Project Title |
|---|---|
| Erdem Bıyık and Vivek Myers, UC Berkeley / CHAI | Inferring Objectives in Multi-Agent Simultaneous-Action Systems |
| Erik Jenner, UC Berkeley / CHAI | Literature Review on Abstractions of Computations |
| Joe Benton, Redwood Research | Disentangling representations of sparse features in neural networks |
| Nora Belrose, FAR AI (now at EleutherAI) | Exhaustively Eliciting Truthlike Features in Language Models |
| Juan Rocamonde, FAR AI | Using Natural Language Instructions to Safely Steer RL Agents |
| Kellin Pelrine, FAR AI | Detecting and Correcting for Misinformation in Large Datasets |
| Zac Hatfield-Dodds, Anthropic | Open-source software engineering projects (to help students develop skills for research engineering) |
| Walter Laurito, FZI / SERI MATS | Consistent Representations of Truth by Contrast-Consistent Search (CCS) |
| Leon Lang, University of Amsterdam / SERI MATS | RL Agents Evading Learned Shutdownability |
| Marius Hobbhahn, International Max Planck Research School / SERI MATS (now at Apollo Research) | Playing the auditing game on small toy models (trojans/backdoor detection) |
| Asa Cooper Stickland, University of Edinburgh / SERI MATS | Understanding to what extent language models "know what they don't know" |


You can learn more about the program on our website: https://berkeleyaisafety.com/spar

Here is an incomplete list of public writeups from the program:

Operational logistics

This section is likely most relevant to people interested in organizing similar programs; feel free to skip it otherwise.

Room for improvement

We note a few ways in which our program operations this past semester were suboptimal:

Conclusion

Overall, although we faced some challenges running this program for the first time, we are excited about the potential here and are looking to scale up in future semesters. We are also coordinating with the AI safety clubs at Georgia Tech and Stanford to organize our next round of SPAR.

If you would like to supervise a research project, learn more about the Fall 2023 program and complete our project proposal form by August 25.

Feel free to contact us at aisafetyberkeley@gmail.com if you have any questions.

  1. ^

    Special thanks to Gabe Mukobi and Aaron Scher for sharing a number of invaluable resources from Stanford AI Alignment’s Supervised Program in Alignment Research, which we drew heavily from, not least the program name.

2 comments


comment by Gabe M (gabe-mukobi) · 2023-08-19T07:33:17.793Z · LW(p) · GW(p)

research project results provide a very strong signal among participants of potential for future alignment research

Normal caveat that Evaluations (of new AI Safety researchers) can be noisy [AF · GW]. I'd be especially hesitant to take bad results to mean low potential for future research. Good results are maybe more robustly a good signal, though also one can get lucky sometimes or carried by their collaborators. 

Replies from: michael-chen
comment by mic (michael-chen) · 2023-08-19T12:14:14.160Z · LW(p) · GW(p)

Great point, I've now weakened the language here.