So you want to work on technical AI safety

post by gw · 2024-06-24T14:29:57.481Z · LW · GW · 3 comments

Contents

  Requisite skills
    What kind of general research skills do I need?
    What level of general programming skills do I need?
    What level of AI/ML experience do I need?
    Should I upskill?
    Should I do a PhD?
  Actually getting a job
    What are some concrete steps I can take?
    How can I find (and make!) job opportunities?
  Sensemaking about impact
    What is the most impactful work in AI safety?
    On how to update off of people you talk to
  Some encouragement

I’ve been to two EAGx events and one EAG, and the vast majority of my one-on-ones with junior people end up covering some subset of these questions. I’m happy to have such conversations, but hopefully this is more efficient and wide-reaching (and more than I could fit into a 30-minute conversation).

I am specifically aiming to cover advice on getting a job in empirically-leaning technical research (interp, evals, red-teaming, oversight, etc) for new or aspiring researchers without being overly specific about the field of research – I’ll try to be more agnostic than something like Neel Nanda’s mechinterp quickstart guide but more specific than the wealth of career advice [EA · GW] that already exists but that applies to ~any career. This also has some overlap with this excellent list of tips [LW · GW] from Ethan Perez but is aimed a bit earlier in the funnel.

This advice is of course only from my perspective and background, which is that I did a PhD in combinatorics, worked as a software engineer at startups for a couple of years, did the AI Futures Fellowship, and now work at Timaeus as the research lead for our language model track. In particular, my experience is limited to smaller organizations, so “researcher” means some blend of research engineer and research scientist rather than strictly one or the other.

Views are my own and don’t represent Timaeus and so on.

Requisite skills

What kind of general research skills do I need?

There’s a lot of tacit knowledge here, so most of what I can offer is more about the research process. Items on this list aren’t necessarily things you’re expected to just have all of or otherwise pick up immediately, but they’re much easier to describe than e.g. research taste. These items are in no particular order:

What level of general programming skills do I need?

There is a meaningful difference between the programming skills that you typically need to be effective at your job and the skills that will let you get a job. I’m sympathetic to the view that the job search is inefficient / unfair and that it doesn’t really test you on the skills that you actually use day to day. It’s still unlikely that things like LeetCode are going to go away [LW · GW]. A core argument in their favor is that there’s highly asymmetric information between the interviewer and interviewee and that the interviewee has to credibly signal their competence in a relatively low bandwidth way. False negatives are generally much less costly than false positives in the hiring process, and LeetCode style interview questions are skewed heavily towards false negatives.
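To make that cost asymmetry concrete, here’s a toy calculation (the numbers are purely illustrative, not drawn from any real hiring data):

```python
# Toy model of the hiring cost asymmetry, with purely illustrative numbers.
# A false positive (a bad hire) burns months of salary, onboarding time, and
# teammate attention before it gets corrected; a false negative just means
# interviewing one more candidate from the pool.
cost_false_positive = 100_000  # illustrative: ~6 months of fully-loaded cost
cost_false_negative = 2_000    # illustrative: sourcing and interviewing one more candidate

# An interview filter can wrongly reject this many strong candidates for each
# bad hire it screens out and still come out ahead for the employer:
break_even_rejections = cost_false_positive / cost_false_negative
print(break_even_rejections)  # 50.0
```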

Stepping down from the soapbox, the table stakes for passing technical screens are knowing basic data structures and algorithms and being able to answer interview-style coding questions. I personally used MIT’s free online lectures, but there’s an embarrassment of riches out there. I’ve heard Aditya Bhargava’s Grokking Algorithms independently recommended several times. Once you have the basic concepts, do LeetCode problems until you can reliably solve LeetCode mediums in under 30 minutes or so. It can be worth investing more time than this, but IME there are diminishing returns past this point.
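For calibration, a typical LeetCode medium looks something like the classic “longest substring without repeating characters” problem, which falls to a standard sliding-window pattern (a generic practice example, not taken from any particular company’s interview):

```python
def longest_unique_substring(s: str) -> int:
    """Length of the longest substring of s with no repeated characters.

    Classic sliding-window pattern: O(n) time, O(k) space for alphabet size k.
    """
    last_seen = {}  # char -> most recent index
    start = 0       # left edge of the current window
    best = 0
    for i, ch in enumerate(s):
        # If ch was last seen inside the current window, slide the window past it.
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best

assert longest_unique_substring("abcabcbb") == 3  # "abc"
assert longest_unique_substring("bbbbb") == 1
```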

You might also consider creating a small open source project that you can point to, which can be AI safety related or not. A simple example would be a weekend hackathon project that you put on your CV and on your personal GitHub page, which prospective employers can skim through (you should have a GitHub page, and you should put some minimal effort into making it look nice). If you don’t have a GitHub page with lots of past work on it (I don’t, all of my previous engineering work has been private IP, but do as I say, not as I do), at least try to have a personal website to help you stand out (mine is here, and I was later told that one of my blog posts was fairly influential in my hiring decision).

Once you’re on the job, there’s an enormous number of skills you need to eventually have. I won’t try to list all of them here, and I think many lessons here are better internalized by making the mistake that teaches them. One theme that I’ll emphasize though is to be fast if nothing else. If you’re stuck, figure out how to get moving again – read the documentation, read the source code, read the error messages. Don’t let your eyes just gloss over when you run into a roadblock that doesn’t have a quick solution on Stack Overflow. If you’re already moving, think about ways to move faster (for the same amount of effort). All else being equal, if you’re doing things 10% faster, you’re 10% more productive. It also means you’re making more mistakes, but that’s an opportunity to learn 10% faster too :)

What level of AI/ML experience do I need?

Most empirical work happens with LLMs these days, so this mostly means familiarity with them. AI Safety Fundamentals is a good starting point for getting a high level sense of what kinds of technical research are done. If you want to get your hands dirty, then the aforementioned mechinterp quickstart guide is probably as good a starting point as any, and for non-interp roles you probably don’t need to go through the whole thing. ARENA is also commonly recommended.
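If it helps to see what “getting your hands dirty” can look like on day one, here’s a minimal sketch of loading a small language model and pulling out its intermediate activations, which is roughly where a lot of interp-flavored tinkering starts. It assumes you have the Hugging Face transformers and torch packages installed, and uses GPT-2 purely as a convenient small stand-in:

```python
# Minimal sketch: load a small language model and inspect its hidden states.
# Assumes `pip install torch transformers`; GPT-2 is just a small stand-in,
# not a specific recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple: the embedding output plus one tensor per
# transformer layer, each of shape (batch, sequence_length, hidden_size).
for layer_idx, h in enumerate(out.hidden_states):
    print(layer_idx, tuple(h.shape))

# Greedy next-token prediction from the final position's logits:
next_token_id = out.logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))
```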

Beyond this, your area of interest probably has its own introductory materials (such as sequences [? · GW] or articles [LW · GW] on LessWrong) that you can read, and there might be lists of bite-sized open problems that you can start working on.

Should I upskill?

I feel like people generally overestimate how much they should upskill. Sometimes it’s necessary – if you don’t know how to program and you want to do technical research, you’d better spend some time fixing that. But I think more often than not, spending 3-6 months just “upskilling” isn’t too efficient.

If you want to do research, consider just taking the shortest path of actually working on a research project. There are tons of accessible problems out there that you can just start working on in like, the next 30 minutes. Of course you’ll run into things you don’t know, but then you’ll know what you need to learn instead of spending months over-studying, plus you have a project to point to when you’re asking someone for a job.

Should I do a PhD?

Getting a PhD seems to me like a special case of upskilling. I used to feel more strongly that it was generally a bad idea for impact unless you also want to do a PhD for other reasons, but currently I think it’s unclear and depends on many personal factors. Because the decision is so context-dependent, it’s a bit out of scope for this post to dive into, but there are some more focused posts [LW · GW] with good discussion elsewhere. I think my own experience was very positive for me (even if it wasn’t clear that was the case at the time), but it also had an unusual amount of slack for a PhD.

Actually getting a job

What are some concrete steps I can take?

Here’s a list of incremental steps you can take to go from no experience to having a job. Depending on your background and how comfortable you feel, you might skip some of these or re-order them. As a general note, I don't recommend that you try to do everything listed in depth. I'm trying not to leave huge gaps here, but you can and should try to skip forward aggressively, and you'll probably find that you're ready for later steps much sooner than you think you are (see also upskilling above).

The general idea here is to do a small thing to show that you’re a good candidate for a medium thing, then do a medium thing to show you can do a bigger thing, and so on. It’s often a good idea to apply for a variety of things, including things that seem out of your reach, but it’s also good to keep expectations in check when you don’t have any legible reasons that you’d be qualified for a role. Note that some technical research roles might involve some pretty unique work, and so there wouldn’t be an expectation that you have legible accomplishments in the same research area. In those cases, “qualified” means that you have transferable skills and general competency.

How can I find (and make!) job opportunities?

I used 80k Hours’ job board and LessWrong (I found Timaeus here). If you find your way into Slacks from conferences or local EA groups, there will often be job postings shared in those as well. My impression is that the majority of public job opportunities in AI safety can be found this way. I started working at Timaeus before I attended my first EAG(x), so I can’t comment on using those for job hunting.

Those are all ways of finding job opportunities that already exist. You can also be extremely proactive and make your own opportunities! The cheapest thing you can do here is just cold email people (but write good cold emails). If you really want to work with a specific person / org, you can pick the small open problems you work on to match their research interests, then reach out to discuss your results and ask for feedback. Doing this at all would put you well above the average job candidate, and if you’re particularly impressive, they might go out of their way to make a role for you (or at least have you in mind the next time a role opens up). At worst, you still have the project to take with you and talk about in the future.

Sensemaking about impact

When I first decided to start working in AI safety, I had very little idea of what was going on – who was working on what and why, which things seemed useful to do, what kinds of opportunities there were, and how to evaluate anything about anything. I didn’t already know anyone that I could ask. I think I filled out a career coaching or consultation form at one point and was rejected. I felt stressed, confused, and lonely. It sucked! For months! I think this is a common experience. It gets better, but it took a while for me to feel anywhere close to oriented. These are some answers to questions that would have helped me at the time.

What is the most impactful work in AI safety?

I spent a lot of time trying to figure this out, and now I think this is kind of the wrong way to frame the question. My first attempt at an answer was something like “it must be something in AI governance, because if we really screw that up then it’s already over.” I still think it’s true that if we screw up governance then we’re screwed in general, but I don’t think that it being a bottleneck is sufficient reason to work on it. I have doubts that an indefinite pause is possible – in my world model, we can plausibly buy some years and some funding if policy work “succeeds” (whatever that means), but there still has to be something on the other side to buy time and funding for. Even if you think an indefinite pause is possible, it seems wise to have insurance in case that plan falls through.

In my model, the next things to come after AI governance buys some time are things like evals and control. These further extend the time that we can train advanced AI systems without major catastrophes, but those alone won’t be enough either. So the time we gain with those can be used to make further advances in things like interpretability. Interpretability in turn might work long enough for other solutions to mature. This continues until hopefully, somewhere along the way, we’ve “solved alignment.”

Maybe it doesn’t look exactly like these pieces in exactly that order, but I don’t think there’s any one area of research that can be a complete solution and can also be done fast enough not to need to lean on progress from other agendas in the interim. If that’s the case, how can any single research agenda be the “most impactful”?

Most research areas have people with sensible world models that justify why that research area is good to work in. Most research areas also have people with sensible world models that justify why that research area is bad to work in! You don’t have to be able to divine The Truth within your first few months of thinking about AI safety.

What’s far more important to worry about, especially for a first job, is just personal fit. Personal fit is the comparative advantage that makes you better than the median marginal person doing the thing. It’s probably a bad idea to do work that you’re not a good fit for, even if you think the work is super impactful – this is a waste of your comparative advantage, and we need to invest good people in all kinds of different bets. Pick something that looks remotely sensible that you think you might enjoy and give it a shot. Do some work, get some experience, and keep thinking about it.

On the “keep thinking” part, there’s also a sort of competitive exclusion principle at play here. If you follow your curiosity and keep making adjustments as your ability to appraise research improves, you’ll naturally gradient descent into more impactful work. In particular, it’ll become clearer over time whether the original reasons you wanted to work on your thing were robust or not. If they weren’t, you can always move on to something else, which is easier after you’ve already done a first thing.

On how to update off of people you talk to

OK, this isn’t a question, but it’s really hard, as a non-expert, to tell whether to trust one expert’s hot takes or another’s. AI safety is a field full of hot takes and of people who can make their hot takes sound really convincing. There’s also a massive asymmetry between how much they’ve thought about it and how much you’ve thought about it – for any objection you can come up with on the spot, they probably have a cached response from dozens of previous conversations that makes your objection sound naive. As a result, you should start off with (but not necessarily keep forever) a healthy amount of blanket skepticism about everything, no matter how convincing it sounds.

Some particularly common examples of ways this might manifest:

I emphatically do not mean to say that all positions on these spectrums are equally correct, and it’s super important that we have truth-seeking discussions about these questions. But as someone new, you haven’t yet learned how to evaluate different positions, and it’s easy to prematurely set your Overton window based on the first few takes you hear.

Some encouragement

This isn’t a question either, but dropping whatever you were doing before is hard, and so is finding a foothold in something new. Opportunities are competitive and rejections are common in this space, and interpreting those rejections as “you’re not good enough” stings especially hard when it’s a cause you care deeply about. Keep in mind that applications skew heavily towards false negatives, and that to whatever degree “not meeting the bar” is true, it is a dynamic statement about your current situation, not a static statement about who you fundamentally are. Remember to take care of yourself, and good luck.


Thanks for feedback: Jesse Hoogland, Zach Furman

  1. ^

    "George's research log is amazing and yes you can quote me"  
    –Jesse

3 comments

Comments sorted by top scores.

comment by Tapatakt · 2024-06-24T15:24:10.743Z · LW(p) · GW(p)

There is a meaningful difference between the programming skills that you typically need to be effective at your job and the skills that will let you get a job. I’m sympathetic to the view that the job search is inefficient / unfair and that it doesn’t really test you on the skills that you actually use day to day. It’s still unlikely that things like LeetCode are going to go away. A core argument in their favor is that there’s highly asymmetric information between the interviewer and interviewee and that the interviewee has to credibly signal their competence in a relatively low bandwidth way. False negatives are generally much less costly than false positives in the hiring process, and LeetCode style interview questions are skewed heavily towards false negatives.

You talk about algorithms/data structures. As I see it, this is at most half of "programming skills". The other half, which includes things like "How to program something big without going mad", "How to learn a new tool/library fast enough", and "How to write good unit tests", always seemed more difficult to me.

Replies from: gw, korin43
comment by gw · 2024-06-24T19:10:19.432Z · LW(p) · GW(p)

I agree! This is mostly focused on the "getting a job" part though, which typically doesn't end up testing those other things you mention. I think this is the thing I'm gesturing at when I say that there are valid reasons to think that the software interview process feels like it's missing important details.

comment by Brendan Long (korin43) · 2024-06-26T20:42:12.937Z · LW(p) · GW(p)

Yes, but testing that people have memorized the appropriate number of LeetCode questions is much easier than testing that they can write something big without going mad :(