Book a Time to Chat about Interp Research
post by Logan Riggs (elriggs) · 2024-12-03T17:27:46.808Z · LW · GW · 2 comments
In the spirit of the season, you can book a call with me to help w/ your interp project (no large coding though)
Would you like someone to:
- Review your paper or code?
- Brainstorm ideas on next steps?
- Help you figure out how to best communicate your results?
- Discuss conceptual problems?
- Give obvious advice (e.g. you might be affected by SAD because it's winter, not exercising, or not getting enough sleep)?
- [anything else that would be useful!]
When we're chatting, you can interrupt me to better focus on what you specifically want. We can set [5] minute timers to try to solve whatever problem you're having. You can steer the conversation or I can!
Who is this for?
Anyone working on interp research. My niche is SAEs, so that's likely where I'm most helpful.
This is for both junior and senior researchers. If you're unsure, feel free to dm me here on LW, and I'll let you know!
I'm Useful, I Swear
I worked on one of the earliest SAE papers, applied SAEs to Preference Models, and co-authored Decomposing The Dark Matter of SAEs (in fact, Josh asked me to work w/ him on this project because I was useful chatting about it!).
I'd like to donate at least 20 hours of my time this month, so please have a low bar for booking a time w/ me! (Booking slots are 1 hour, but feel free to use just 15 minutes!)
Why are You Doing this?
I'm currently burnt out on coding for my edge-sparse SAE project (which is why I don't want to code atm); however, talking to folks for 8 hours a day about research isn't tiring for me!
So I'd still like to be useful to our research community, even if it's not through novel research for these next few weeks.
How to Book
Here's the Calendly link. Do send a quick blurb on the topic, or a link to your paper, document, or whatever you think will best help me help you!
2 comments
comment by jacquesthibs (jacques-thibodeau) · 2024-12-03T18:23:47.705Z · LW(p) · GW(p)
I sent an invite, Logan! :)
Shameless self-plug: Similarly, if anyone wants to discuss automating alignment research, I'm in the process of building an organization to make that happen. I'm reaching out to Logan because I have a project in mind regarding automating interpretability research (e.g. making AIs run experiments that try to make DL models more interpretable), and he's my friend! My goal for the org is to turn it into a three-year moonshot to solve alignment. I'd be happy to chat with anyone who would be interested in chatting further about this (I'm currently testing fit with potential co-founders and seeking a cracked basement CTO).
comment by Noa Nabeshima (noa-nabeshima) · 2024-12-03T21:37:08.910Z · LW(p) · GW(p)
I really enjoy chatting with Logan about interpretability research.