Creating a Discord server for Mechanistic Interpretability Projects
post by Victor Levoso (victor-levoso) · 2023-03-12T18:00:17.153Z · LW · GW · 6 comments
TL;DR: I think there are likely a lot of people who want to work on mechanistic interpretability projects but couldn’t get into AI Safety Camp, so I created a Discord server where people can organize themselves into such projects. I’m going to lead one of them.
Why?
Many people interested in mechanistic interpretability projects were unable to get into AI Safety Camp. According to Linda Linsefors, one of the main bottlenecks was a lack of project proposals.
The most popular project at AI Safety Camp was on understanding search in Transformers, with 46 people selecting it as their first choice. This suggests significant interest in mechanistic interpretability projects.
Moreover, I think that many mechanistic interpretability projects are low-hanging fruit that don’t require lots of experience or specific knowledge to execute. Anyone with basic CS skills can probably learn what they need to contribute along the way.
Mechanistic Interpretability Group
All in all, it would be a shame if people who are interested in working on mechanistic interpretability end up not doing so due to a lack of available projects and project leads.
With that in mind, I have created a Discord server to encourage people to propose and organize their own projects. You can join here: https://discord.gg/cMr5YqbU4y
This Discord server aims to help interested people form project teams and get guidance from other researchers in the area. You are welcome to join even if you don’t think you have what it takes to participate in such projects: it is also a place to learn and discover beginner-friendly resources. You can also check out Neel Nanda's guide to getting started in the field.
Algorithm distillation project
I’m also organizing a project on interpreting models that use the algorithm distillation setup. If you are interested, join the Discord server and go to the #algorithm-distillation-project channel to indicate your willingness to participate.
The project will start once I have a better idea of who wants to join and how I’m going to organize it, in about 1-2 weeks.
Final thoughts
How exactly things go from here depends on how popular the Discord server ends up being. If it barely gets enough people for the algorithm distillation project, then I will focus on that and the server will become the home of that project. But I think it is likely that there is enough interest in an MI server and in working on mechanistic interpretability, and in that case a lot of similar projects can be organized.
I'm not especially attached to being the owner of the server and I don’t have a lot of experience organizing communities, but I felt somebody had to create something like this, so I decided to take the initiative. People who are better at organizing communities are encouraged to help and eventually take over.
Acknowledgements
Thanks to firstuserhere, Alex Spies and Alejandro González for helping me write this and set up the Discord server.
6 comments
comment by Neel Nanda (neel-nanda-1) · 2023-03-12T18:08:04.843Z · LW(p) · GW(p)
If people want concrete mechanistic interpretability projects to work on, my 200 concrete open problems in mechanistic interpretability is hopefully helpful!
Replies from: victor-levoso
↑ comment by Victor Levoso (victor-levoso) · 2023-03-12T18:19:43.925Z · LW(p) · GW(p)
Exactly. It's already linked in the project ideas channel of the Discord server.
Part of the reason I wanted to do this is that it seems to me that there are a lot of things on that list that people could be working on, and apparently there are a lot of people who want to work on MI, going by the number of people who applied to the Understanding Search in Transformers project at AI Safety Camp. What's missing is some way of taking those people and getting them to actually work on those projects.
comment by Evan R. Murphy · 2023-03-13T18:00:04.688Z · LW(p) · GW(p)
Do you know about the EleutherAI Discord? There is a lot that happens on there, but there is a group of channels focused on interpretability that is pretty active.
I could be mistaken but I think this Discord is open to anyone to join. It's a very popular server, looks like it has over 22k members as of today.
So I'm curious if you may have missed the EleutherAI Discord, or if you knew about it but the channels on there were in some way not a good fit for the kind of interpretability discussions you wanted to have on Discord?
Replies from: Yoann Poupart
↑ comment by Yoann Poupart · 2023-03-13T19:17:40.723Z · LW(p) · GW(p)
The project and Discord links were actually posted in the alignment-general channel of the EleutherAI Discord. I think the EleutherAI Discord server is well suited to keeping up with most aspects of AI safety but not to running small projects. The primary purpose of this new (temporary?) Discord really is organizing small projects, and I think that requires a smaller but more dedicated community.
Replies from: victor-levoso, Evan R. Murphy
↑ comment by Victor Levoso (victor-levoso) · 2023-03-14T00:51:09.704Z · LW(p) · GW(p)
Yeah, basically this. The Eleuther Discord is nice, but it's also not intended for small mechanistic interpretability projects like this, or at least they weren't happening there.
Plus, Eleuther is also "not the place to ask for technical support or beginner questions", while for this server I think it would be nice if it became a place where people share learning resources and get advice and ideas for their MI projects and that kind of thing.
I'm not sure how well it's going to work out, but if at least some projects end up happening that wouldn't have happened otherwise, I'll consider it a success even if the server dies out and they go on to work in their own private Slacks or Discords or whatever.
↑ comment by Evan R. Murphy · 2023-03-13T19:54:43.960Z · LW(p) · GW(p)
Ok great, sounds like you all are already well aware and just have a different purpose in mind for this new Discord vs. the interpretability channels on the EleutherAI Discord. B-)