How to pursue a career in technical AI alignment

post by Charlie Rogers-Smith (charlie.rs) · 2022-06-04T21:11:46.501Z

Contents

  Types of alignment work
  What type of alignment work should you do?
      High-level heuristics for choosing which work to do
      Some things to keep in mind when exploring different paths
  How to pursue alignment work
    How to pursue empirical alignment work
      Activities that are useful for both empirical research leads and contributors
      Whether and how to do a PhD
      How to pursue research contributor (ML engineering) roles
    How to pursue theoretical alignment work
  Learning
    Basic deep learning
    Machine learning
    AI alignment
      One path for learning about alignment
      Forming your own views on alignment is important when you have control over the direction of your work
  Funding
  Broadly useful career advice
    Look for ways to demonstrate your competence
    Focus on becoming excellent early in your career
    Engaging with the AI alignment community will help you a lot
    Take care of yourself

(Crossposted at the EA forum.)

This guide is written for people who are considering direct work on technical AI alignment. I expect it to be most useful for people who are not yet working on alignment, and for people who are already familiar with the arguments for working on AI alignment. If you aren’t familiar with the arguments for the importance of AI alignment, you can get an overview of them by reading Why AI alignment could be hard with modern deep learning (Cotra, 2021) and one of The Most Important Century Series (Karnofsky, 2021) or AGI Safety from First Principles (Ngo, 2020).

It might not be best for you to work on technical AI alignment. You can have a large impact on reducing existential risk from AI by working on AI strategy, governance, policy, security, forecasting, support roles, field-building, grant-making, and governance of hardware. That’s not counting other areas, such as bio-risk. It is probably better to do great work in one of those areas than mediocre technical alignment work, because impact is heavy-tailed. One good exercise is to go through Holden Karnofsky’s aptitudes podcast/post, and think about which of the aptitudes you might be able to become great at. Then ask yourself or others how you could use those aptitudes to solve the problems you care about. I also recommend applying to speak with 80,000 Hours.

I’ll probably be wrong but I might be helpful. Feedback was broadly positive, but I wouldn’t be surprised if some people think that this guide is net-negative, for example because it pushes people toward/away from theoretical research, or empirical research, or ML engineering, or getting a PhD. I have tried to communicate my all-things-considered view here, after integrating feedback. But I can only suggest that you try to form your own view on what’s best for you to do, and take this guide as one input to that process.

I had lots of help. Neel Nanda helped me start this project. I straight-up copied stuff from Rohin Shah, Adam Gleave, Neel Nanda, Dan Hendrycks, Catherine Olsson, Buck Shlegeris, and Oliver Zhang. I got great feedback from Adam Gleave, Arden Koehler, Rohin Shah, Dan Hendrycks, Neel Nanda, Noa Nabeshima, Alex Lawson, Jamie Bernardi, Richard Ngo, Mark Xu, Oliver Zhang, Andy Jones, and Emma Abele. I wrote most of this at Wytham Abbey, courtesy of Elizabeth Garrett.

 

Types of alignment work

(The following is almost all copied from Rohin Shah’s Career FAQ.)

For direct technical alignment research aimed at solving the problem (i.e. ignoring meta work, field building, AI governance, etc), these are the rough paths:

  1. Research Lead (theoretical): These roles come in a variety of types (industry, nonprofit, academic, or even independent). You are expected to propose and lead research projects; typically ones that can be answered with a lot of thinking and writing in Google Docs/LaTeX, and maybe a little bit of programming. Theoretical alignment work can be more conceptual or more mathematical: the output of math work tends to be a proof of a theorem or a new mathematical framework, whereas in conceptual work math is used as one (very good) tool to tell if a problem has been solved. Conceptual work is more philosophical. A PhD is not required but is helpful. Relevant skills: extremely strong epistemics and research taste, strong knowledge of AI alignment; this is particularly important due to the lack of feedback loops from reality.
  2. Research Contributor (theoretical): These roles are pretty rare; as far as I know they are only available at ARC [as of May 2022]. You should probably just read their hiring post.
  3. Research Lead (empirical): Besides academia, these roles are usually available in industry orgs and similar nonprofits, such as DeepMind, OpenAI, Anthropic, and Redwood Research. You are expected to propose and lead research projects; typically ones that involve achieving or understanding something new with current ML systems. A PhD is not strictly required but in practice most Research Leads have one. Relevant skills: strong research taste, strong knowledge of AI alignment and ML, moderate skill at programming and ML engineering.
  4. Research Contributor (empirical): These roles are usually available at industry orgs or similar nonprofits, such as DeepMind, OpenAI, Anthropic, and Redwood Research. You are expected to work on a team to execute on research projects proposed by others. A PhD is not required. Relevant skills: strong skill at programming, moderate research taste, moderate knowledge of AI alignment, jobs vary in how much they require skill at ML engineering (but most require strong skill).
  5. Professor: This is a specific route for either of the “Research Lead” career paths, but with additional requirements: as an academic, you are not only expected to propose and lead a research agenda, but also to take on and mentor grad students in pursuit of that research agenda, to teach classes, etc. A PhD is required; that’s the clear first step on this career path. Relevant skills: strong research taste, strong AI knowledge, moderate technical communication. Programming ability and ML ability is typically not tested or required, though they are usually needed to be successful during the PhD.
  6. Software Engineer: Many organizations can also benefit from strong software engineers, for example by creating frameworks for working with large neural nets that don’t fit on a GPU, or by reorganizing codebases to make them cleaner and more modular to enable faster experimentation. However, I expect you should only aim for this if you already have these skills (or can gain them quickly), or if for some reason you think you could become a world-class expert in these areas but not in any of the other paths.

The main difference between research leads and research contributors is that the research leads are expected to add value primarily by choosing and leading good research projects, while the research contributors are expected to add value primarily by executing projects quickly. However, it isn’t feasible to fully separate these two activities, and so [research] leads still need to have some skill in executing projects, and contributors still need to have some skill in choosing how to move forward on a project. Some orgs like DeepMind make the difference explicit (“Research Scientist” and “Research Engineer” titles), while others like OpenAI [Anthropic] do not (“Member of Technical Staff” title).

The main reason I carve up roles as “lead” vs “contributor” is that as far as I can tell, “lead” roles tend to be filled by people with PhDs. DeepMind explicitly requires PhDs for the Research Scientist role, but not for the Research Engineer role. (Both roles are allowed to lead projects, if they can convince their manager and collaborators that it is worth pursuing, but it’s only an explicit expectation for Research Scientists.) Other orgs don’t have a PhD as an explicit requirement, but nonetheless it seems like most people who end up choosing and leading research projects have PhDs anyway. I think this is because PhDs are teaching research skills that are hard to learn by other routes.

I don’t want to emphasize this too much: it is still possible to lead projects without a PhD. In April 2022, I could name 10 people without PhDs whose work was best categorized as “Research Lead”, who seemed clearly worth funding. (Note that “clearly worth funding without a PhD” doesn’t necessarily mean the PhD is a bad choice: for several of these people, it’s plausible to me that they would do much better work in 5 years’ time if they got a PhD instead of doing the things they are currently doing.)

 

What type of alignment work should you do?

I don’t have a strong view on what type of alignment work is most valuable, so I’ll mostly focus on personal fit. There is widespread disagreement in the community about the relative value of different work. However, the main decision you’ll have to make early on is whether, if at all, to pursue empirical or theoretical alignment work. And I think most people believe there’s good work to be done in both camps. If that’s true, it means you can probably just focus on becoming excellent at either theoretical or empirical work based on your personal fit, while you form your own views about what specific theoretical/empirical alignment work is worth doing.

However, I think most people agree that if you can become a research lead who can set good, novel research agendas, then you should do that. You’ll need to have strong research taste and end-to-end thinking on AI alignment, which is a high bar. Paul Christiano and Chris Olah are examples of people who did this.

 

High-level heuristics for choosing which work to do

If you’re already a strong software engineer, consider applying to non-ML roles immediately, or retraining as an ML engineer. Some engineering work on alignment teams doesn’t require ML knowledge. For example, creating frameworks for working with large neural nets that don’t fit on a GPU, or reorganizing codebases to make them cleaner and more modular to enable faster experimentation. Some ML engineering roles might not even require experience with ML if you’re a sufficiently strong software engineer. That is at least the case at Anthropic: “Lots of history writing code and learning from writing code is the hard part. ML is the easy bit, we can teach that.” I suggest reading AI Safety Needs Great Engineers, DeepMind is hiring for the scalable alignment and alignment teams, and 80,000 Hours’ Software Engineering career review.

To the extent that you think you might enjoy machine learning and coding, consider looking into How to pursue empirical alignment work. You can test whether you like ML and coding by learning Basic deep learning. The early steps for research leads and research contributors are similar, so you can pursue those steps while figuring out which is better for you.

To the extent that you love theory, have or could get a very strong math/theoretical CS background, and think you might enjoy building end-to-end models of AI alignment, consider looking into How to pursue theoretical alignment work.
 

Some things to keep in mind when exploring different paths

Pay attention to whether you're enjoying yourself and growing and flourishing and kicking ass. But don’t give up immediately if you’re not. Enjoying yourself is really important, especially for research. But often people enjoy things more as they gain more mastery, or think they should already be good and suffer until they get there. Often people have bad luck. If you're enjoying yourself and kicking ass then that's a great sign. If you're not enjoying yourself and kicking ass after a while then consider switching to something else.

Sometimes very capable people are insecure about how good they are, and miscalibrated about how good they could become. Here are some more objective indications you can use to assess your fit: 

Talk to people and ask them to honestly evaluate whether you're on track to do good technical work. This is a good way to address the point above. Make it easy for them to tell you that you're not on track in worlds where you're not—for example, by emphasising to them how helpful it would be for you to switch to something you’re better at sooner. You could do this at Effective Altruism Global, or by talking to 80,000 Hours.

Recommended resources:

 

How to pursue alignment work

This section gives context and high-level heuristics for pursuing different types of alignment work, with pointers to other places in the doc that go into more depth.

How to pursue empirical alignment work

The early steps for research leads and research contributors are similar, so you can pursue those steps while figuring out which is better for you. Whether you want to pursue research lead or research contributor roles will mostly depend on how much you like and are good at research, end-to-end thinking on alignment, and machine learning, relative to how much you like and are good at ML engineering. It will also depend on whether you want and are able to get into a top PhD programme. If you’re uncertain, I recommend learning Basic deep learning, doing some ML implementation, and trying to get some research experience (see the next section). Then assess your personal fit from there, which might include talking to people about your fit.

 

Activities that are useful for both empirical research leads and contributors

Everyone should learn Basic deep learning: You’ll need to learn basic Python coding, basic math (linear algebra, calculus, and probability), and get a basic understanding of deep learning (DL) models and how to implement them. DL is by far the dominant paradigm within machine learning, which in turn is the dominant paradigm within AI safety. I’ve included the best resources I know of in Basic deep learning.

You’ll need to become a decent ML engineer, even if you want to become a research lead. To become good at ML engineering, you’ll need to get experience implementing DL models. 

Research experience is essential for research leads, and useful for research contributors

Learning Machine learning: how, and how much? It’s easiest to learn by being immersed in a research environment, so it’s sensible to focus on learning enough ML to get to that point. That means having enough breadth to talk about the main areas of DL sensibly and know about the recent advances, and having depth in the area you want to go into. You don’t need to learn all of ML to become part of a research environment, though research leads should probably eventually know a bunch of ML. You can get breadth by taking courses in the most important subfields of ML (see Machine learning), and using resources that curate and summarise/explain recent advances (see Machine learning). You can get depth by reading a bunch of a sub-field’s main papers (~10+, or until you get diminishing returns) and doing your own research, or practical homeworks, or paper replications. You can see what areas people are interested in by looking at blogs of the labs you’re interested in working at, or by checking the Alignment Newsletter. If you can take ML courses for credits, that is probably a great idea. See Machine learning for more details.

Learning AI alignment: how, and how much? I recommend AGI Safety from First Principles (Ngo, 2020) and My Overview of the AI Alignment Landscape (Nanda, 2022) to get started, then the AGI safety fundamentals seminar programme or similar alignment reading sometime after learning Basic deep learning. Learning AI alignment is a lot more important for research leads than research contributors: doing the stuff above is not sufficient for research leads and is not necessary for some research contributor roles, but it will likely be pretty useful for both. There’s much more detailed advice in AI alignment.

 

Whether and how to do a PhD

If you want to be a research lead, the default path is to get a PhD. However, it is also possible to start working as a research engineer and gradually transition toward a research lead role, though as a research engineer you’ll have less time for research activities than you would in a PhD programme. It is also possible to become a research lead without a PhD, if you do a residency program. It’s worth noting that the boundary between researcher and engineer is dissolving at places like Anthropic and OpenAI. This is partially because they care less about the signalling of PhDs, and partially because their research leans relatively more heavily on engineering (scaling) than on coming up with novel research directions. The most important thing for becoming a good research lead is getting mentorship from a great researcher and being able to practice research in a good environment. That’s most often achieved in a PhD but is sometimes possible in industry.

There is pretty widespread disagreement about how good PhDs are. My impression is that the bulk of the disagreement comes down to how effectively PhDs train research taste and skills that are useful for alignment research, and secondarily, how quickly people expect AGI will be developed: if 5 years, then PhDs don’t look good, because PhD students likely won’t do any useful work in that window; if 15 years, then it’s less of an issue. My understanding of the main benefit of a PhD is that it develops your research taste and skills so that when you graduate, ideally, you’re able to set and execute your own (good) alignment research agenda in industry (at an existing or new org) or in academia. Failing that, the idea is that you’d come away from a PhD with great research skills that help with alignment research. A PhD also opens some doors that ML engineering wouldn’t be able to, for example, research scientist roles at DeepMind or Google Brain.

Here are some simplifying questions you can ask yourself to make the decision easier

If you’re uncertain about which path to pursue, it might be worth optimising for doing research in the short term while you get a better sense of whether a PhD makes sense for you (or whether you get offers from a top programme), and decide later, or apply to both PhDs and ML research engineering roles and compare options. Doing research will look pretty good for engineering roles as long as you stay away from theory-heavy research topics and eventually do enough ML engineering. And it’s a good test of fit. But optimising for ML engineering won’t help as much for PhDs, because publications and reference letters are key. You can however apply for a PhD after doing ML research engineering in industry.

How to do a PhD: If you are considering doing a PhD, I strongly recommend reading Careers in Beneficial AI Research (Gleave, 2020), Rohin Shah’s Career FAQ, Andrej Karpathy’s survival guide for PhDs, and Machine Learning PhD Applications — Everything You Need to Know.

 

How to pursue research contributor (ML engineering) roles

Read Activities that are useful for both empirical research leads and contributors. That section talks about how to learn Basic deep learning, ML, and AI alignment, and how to get research experience. If you’re sure you want to shoot for research contributor/ML engineering work, getting research experience is less important than for research lead roles, but might still be a useful source of mentorship and skill-building. Strong knowledge of AI alignment is also less important for getting research contributor roles, but how much you want to invest will depend on how much you want to eventually direct your own research, and investing where possible seems valuable. See AI alignment for more details.

Being a good software engineer will make you a better ML engineer. If you can get a software engineering (SWE) internship at a top company early on, that will likely prove valuable. More broadly, getting mentored by someone much better than you at SWE will likely be valuable, as will reading and writing lots of code. In addition to internships and jobs and your own projects, you might be able to get mentorship by contributing to open-source projects and asking some senior person on that project whether they might mentor you. Perhaps check out 80,000 Hours’ Software Engineering career review.

Do some paper replications. To become good at ML engineering, you’ll need to get experience implementing ML models. A good way to do that is to replicate a few foundational papers in a sub-field you might want to work in. This is similar to the task of implementing novel algorithms, but with training wheels: you know that the algorithm works and what good performance looks like. It will also give you a great understanding of the methods you implement. Look for ways to demonstrate your competence, by open-sourcing your code and maybe writing a blog post on your work. You can apply for funding to do paper replications. See “Paper replication resources” below for more advice.

Below are some paper replication ideas. These are pretty off-the-cuff. If you’re serious about spending a couple of hundred hours on paper replications, it might be a good idea to reach out to a lab you want to work at with a specific plan so that they can give feedback on it. Ideally, see if you can get someone to mentor you. It will be useful to have an open-source codebase on hand, so try to find one before you set out. Check out Machine learning for the relevant background.

Apply to MLAB: Redwood Research is running another fully funded (competitive) coding bootcamp in summer 2022. The deadline for the application has passed, but there might be future cohorts. Practising leetcode problems is probably useful for getting accepted.

What does it take to get a job?

Where should you work? Adam Gleave: “The best way to learn research engineering is to work somewhere there is both high-quality engineering and cutting-edge research. Apply to [very competitive] residency programs at industrial labs. The top-4 labs are DeepMind, OpenAI, Google Brain and Facebook AI Research (FAIR); there are also smaller (but good) safety-focused labs like Anthropic and Redwood Research. There are also many smaller players like Amazon AI, NVidia, Vicarious, etc. These are generally less desirable, but still good options.” Since Adam wrote that, some new organisations focused on language models have formed that could be good places to build skills. Those are Conjecture (safety-focused), cohere.ai (some near-term safety and lots of EAs working there; I wouldn’t bet on it being good to end up there though), and Hugging Face (no existential safety).

For the first couple of years, it might be worth going where you’ll grow the most. After that, you’ll want to go wherever you can do the best alignment research. However, I am personally worried about people skill-building for a couple of years and then not switching to doing the most valuable alignment work they can, because it can be easy to justify that your work is helping when it isn’t. This can happen even at labs that claim to have a safety focus! Working at any of Anthropic, DeepMind, Redwood Research, or OpenAI seems like a safe bet though. If you can’t work at one of those places, whether skill-building outside of safety teams (e.g. at Google Brain or FAIR) is good will depend pretty strongly on whether you expect to be able to later shift to more impactful work (requires continuing to form your own views on alignment, and agency), whether you’ll be motivated doing work that doesn’t help with alignment, and how useful it is to be surrounded by people who work on alignment relative to people who are great ML engineers. The former is more important the more you want to direct your own research; the latter is more important the more you expect ML engineering to be your main contribution.

Paper replication resources:

Career resources

How to pursue theoretical alignment work

I don’t know that much about theoretical work, sorry. If you are a theoretical researcher and have thoughts on how to improve this section, please let me know! The paths to doing theoretical work are also a lot less well-scoped than the path to empirical work, so it’s not all my fault. Anyway, here’s what I’ve got:

Theoretical alignment work can be more conceptual or more mathematical.

What does conceptual work look like? Conceptual alignment work often involves reasoning about hypothetical behaviour. For example, Mark Xu (of the Alignment Research Center) describes most of his work as “coming up with good properties for algorithms to have, checking if algorithms have those properties, and trying to find algorithms that have those properties.” This is pretty similar to a skill-set you’d expect a theoretical computer scientist to have. The work tends to involve a lot of mathematical and philosophical reasoning. Conceptual researchers also need strong research taste, and strong knowledge of AI alignment. This is so that they don’t get lost in theoretical research that doesn’t help with alignment, which is easy to do since theory work has poor feedback loops. Examples of conceptual research include Paul Christiano’s Eliciting Latent Knowledge (ELK), Evan Hubinger’s Risks from Learned Optimization, John Wentworth’s Natural Abstractions, and MIRI’s agent foundations work.

What does mathematical work look like? I think the main difference is that in math work, the output is a proof of a theorem or a counterexample or a new mathematical framework, whereas in conceptual work math is used as one (very good) tool to tell if a problem has been solved. Conceptual work is more philosophical: the arguments are rarely watertight, and a lot more judgement is required. Examples of mathematical work include Michael Cohen’s Pessimism About Unknown Unknowns Inspires Conservatism; Vanessa Kosoy’s Infrabayesianism; Scott Garrabrant’s work on Logical induction, Cartesian frames, and Finite factored sets; Cooperative Inverse Reinforcement Learning; and Tom Everitt’s work (thesis, current work). You can see more topics here. This is in contrast to semi-formal, conceptual work, of which Evan Hubinger’s Risks from Learned Optimization is a central example.

Where does this work happen? The space is pretty weird. There aren’t established orgs doing shovel-ready work. It’s more like a mixed bag of people in academia (mostly math stuff, e.g. CIRL and Michael Cohen’s stuff), independent people on grants (such as John Wentworth), the Machine Intelligence Research Institute (MIRI) (houses Evan Hubinger, Scott Garrabrant, and Vanessa Kosoy among others), the Alignment Research Center (ARC) (which Paul Christiano directs), a few people at DeepMind (e.g. Ramana Kumar), and now some stuff at conjecture.dev too.

I don’t have a great sense of whether math or conceptual research is better to work on. Fortunately, the skill-sets are pretty similar, so you can probably just try each a bit while you develop your own views about which work is most valuable, and then decide based on where you think you’ll do the best work.

How to test fit for conceptual research: (I don’t really know, sorry.) 

How to test fit for mathematical research: (I don’t really know, sorry). 

It’s worth bearing in mind that pursuing theoretical alignment work is much riskier than ML-focused work, because you’ll build fewer transferable skills than you would in ML work, you’ll have less credibility outside the alignment community, and the infrastructure for this work is just starting to be built. That said, if you think you could have a good fit, it might be worth testing it out!

How to pursue conceptual alignment research: Again, I don’t really know. For that reason, getting mentorship seems pretty important. If you can produce something, perhaps from one of the exercises above, I think Mark Xu or Evan Hubinger would consider chatting with you and giving you career advice. Here are some short-to-medium-term options: work independently on a grant (or at an existing organisation, though you’d probably need a PhD for that), work at ARC or MIRI (not sure whether MIRI is hiring as of June 2022), apprentice under a conceptual researcher, or do a PhD (in math/CS theory, with a smart and open professor who’s regularly publishing in COLT or FOCS or similar. You probably won’t be able to publish conceptual alignment work during a PhD, but you might build useful skills). My guess is that mentorship should be the main consideration early on in your career: if you can work with and get mentored by a strong conceptual alignment researcher, that is probably better than a PhD (unless you have the opportunity to work closely with a really strong or value-aligned advisor), and a good PhD probably looks better than independent work. If you want to try to apprentice under a conceptual researcher, or work at ARC/MIRI, some of the exercises in the previous section will be useful: reading and distilling and absorbing someone’s worldview, posting on the AI Alignment Forum, and trying to get more mentorship from there. More broadly, I recommend spending time learning about AI alignment and forming your own views. It’s worth noting that conceptual research is particularly mentorship-constrained at the moment, so it might be hard to work closely with a strong conceptual researcher. It’s probably still worth trying though, and in particular everyone should probably apply to ARC.

How to pursue mathematical alignment research: (I don’t really know, sorry.) Probably read a bunch of the mathematical alignment literature (you can see some of the literature here). More broadly, I recommend spending time learning about AI alignment and forming your own views. If you can get a theory PhD at the Center for Human-Compatible AI (CHAI), that seems like a great bet. If you can do a theory PhD on something related to alignment, that is probably good too. It should be doable even if the professor doesn’t work on alignment, as long as they’re really smart and you can convince them that the topic is publishable. You could also work on something that’s useful skill-building for alignment, such as probability theory as applied to AI, or some part of theoretical CS (look for profs who publish in COLT or FOCS or similar). You might get better supervision that way. Ctrl+F “How to do a PhD” for resources on how to get an ML PhD; a lot of it should transfer to theory PhDs. Please try to speak to someone more knowledgeable than me before jumping into a PhD though!

 

Learning

Basic deep learning

This is just the basics: I’ve included stuff that’s sufficient to get you a basic understanding of deep learning models and how to implement them. This isn’t all you need to become a great empirical research lead or contributor. In particular, investing in coding and math beyond what is indicated here will prove worthwhile. Please skip my suggestions if you already have the knowledge/skill.

When to do what: The coding and math can be done in parallel. The deep learning (DL) courses require basic coding and math. Strictly speaking, you can understand DL with a very basic understanding of linear algebra and calculus. But sooner or later your lack of foundation will cause problems. That said, you can probably comfortably start studying DL after a semester of math classes, alongside building stronger mathematical foundations.

Coding: You’ll need to know how to read and write code in Python. www.learnpython.org/ is good for that. There’s also the skill of being able to do stuff in the Python ecosystem, which people often end up picking up slowly because it’s not taught. For that, I recommend The Hitchhiker’s Guide to Python, and The Great Research Code Handbook. You might be able to get funding for a tutor. Here are some extra resources you might find helpful: Things I Wish Someone Had Told Me When I Was Learning How to Code, and learntocodewith.me/resources/coding-tools/.

Math: Here are the areas of math required to learn basic DL. Other areas of math—like statistics—can be directly useful, and mathematical maturity beyond what is written here is certainly useful.

Deep learning: DL is by far the dominant paradigm within machine learning, which in turn is the dominant paradigm within AI. Getting a good understanding of DL is essential for all empirical alignment work. I recommend that you get practical experience by doing something like (1), and do one of (2) and (3). Participating in the ML Safety Scholars Programme [EA · GW] (fully funded, applications close May 31st 2022) over the summer seems like a great, structured way to learn DL.

  1. fast.ai is a practical course in deep learning (DL) that approaches DL from a coding (not math/statistics) perspective. If you already have some knowledge of how DL works, it is probably better to learn from the PyTorch tutorials. Or learn from those tutorials after doing fast.ai. PyTorch is a good framework to start with, but if you’re already good with TensorFlow or JAX you probably don’t need to pick PyTorch up until a project/job requires it. 
  2. Deep Learning Specialization (Ng), your standard DL class (CS 230 at Stanford).
  3. Deep Learning by NYU (LeCun).

 

Machine learning

Summary: It’s easiest to learn by being immersed in a research environment, so it’s sensible to focus on doing enough to get to that point. That means having enough breadth to talk about the main areas of DL sensibly and know about the recent advances, and having depth in the area you want to go into. You don’t need to learn all of ML to become part of a research environment. Though ML researchers should eventually know a lot of ML, and taking university courses in ML where you can is probably a good idea. You can get breadth by taking courses in the most important subfields of DL (see Learning about DL sub-fields), and using resources that curate and summarise/explain recent advances (see Resources). You can get depth by reading a bunch of a sub-field’s main papers (~10+, or until you get diminishing returns) and doing your own research, or practical homeworks, or paper replications [LW · GW] (though this takes a while, and might not be worth it for researchers). You can see what areas people are interested in by looking at blogs of the labs you’re interested in working at, or by checking the Alignment Newsletter (see Resources).

Learning about DL sub-fields: Once you finish Basic deep learning [LW · GW], you should have the background to go into any of these areas. I wouldn’t worry too much about nailing all of these areas straight away, especially if it trades off against research or engineering.

Resources: (You don’t have to keep up-to-date with all of these things! See which sources you like and benefit from.)

How to read papers: At some point you’ll need to be able to read papers well. Here are some resources for learning how to do that. Most of the time, you’ll want to be in “skim mode” or “understand deeply” mode, not somewhere in between.

 

AI alignment

Compared to other research fields—like math or theoretical physics—the EA-focused alignment space doesn’t have that much content. It still takes months of full-time study to get fully up to date, but you can 80/20 much faster than that, and not everyone has to be an expert. 

Buck: “I think it’s quite normal for undergraduates to have a pretty good understanding of whatever areas of [alignment] they’ve looked into.”

Buck: “Try to spend a couple of hours a week reading whatever AI safety content and EA content interests you. Your goal should be something like “over the years I’m in college, I should eventually think about most of these things pretty carefully” rather than “I need to understand all of these things right now”.”

 

One path for learning about alignment

Getting started: I recommend AGI Safety from First Principles [? · GW] (Ngo, 2019) and My Overview of the AI Alignment Landscape (Nanda, 2022). If you would like to learn more about the motivation for AI risk, I recommend Why AI alignment could be hard with modern deep learning (Cotra, 2021) and The Most Important Century Series (Karnofsky, 2021), which are also available in podcast format.

AGI safety fundamentals seminar programme: I recommend applying to participate in the alignment track. If you have time, the governance track might also be valuable. Each track takes around 5h per week, for 8 weeks. To get the most out of the programme I would do it after Basic deep learning [LW · GW]. 

The Alignment Newsletter is really good. It summarises recent work in AI alignment and ML. One exercise (among many) that will help orient you on what is happening is reading the highlight sections from the 20-50 most recent Alignment Newsletters (takes around 10h). The AN requires some background in machine learning, so you might need to get that before reading, or alongside. Some tips:

Keep up to date: with the Alignment Newsletter, LessWrong, the EA Forum, the AI Alignment Forum (AF), the ML Safety Newsletter; reading posts that excite you. Blogs/Twitter from the alignment labs. There is also the 80,000 Hours podcast, the AXRP podcast (Richard and Paul’s episodes are great starting points; Beth’s and Evan’s are great too), and the FLI podcast. And Rob Miles’ YouTube channel. There is a bunch of content so you’ll need to filter! One way to filter is by looking through the Alignment Newsletter. If you want to read old stuff, on the AF you can sort by upvotes [? · GW].

Some people think that reading a lot is good, especially for conceptual work. The advice is “read everything”. This won’t be possible or good for most people! But if you can find a way to enjoyably sink 500h into active reading of alignment content, that will probably be really good for forming your own views. You might want to try out several resources, because some will be way more fun for you to read. The Alignment Newsletter is one source. Others include Paul Christiano’s blog (difficult to read but you might love it), the MIRI dialogues [? · GW] (also hard to read but juicy), and Rationality: From AI to Zombies (some people love this and others are put off). Reading lots is less good advice if you’re trying to do very competitive stuff, such as an ML PhD, because you’ll need to spend a lot of time getting research experience.

 

Forming your own views on alignment is important when you have control over the direction of your work

I recommend reading Rohin Shah’s Career FAQ (ctrl+F for “How can I do good AI alignment research?”), How I Formed My Own Views About AI Safety (Nanda, 2022), and Want to be an expert? Build deep models [EA · GW] (Bye, 2021). I’ll copy from these and add my own spin, but I think it’s probably worth reading them directly. 

Rohin Shah: “We want to think, figure out some things to do, and then, if we do those things, the world will be better. An important part of that, obviously, is making sure that the things you think about, matter for the outcomes you want to cause to happen. 

In practice, it seems to me that what happens is people get into an area, look around, look at what other people are doing. They spend a few minutes, possibly hours thinking about, “Okay, why would they be doing this?” This seems great as a way to get started in a field. It's what I did. 

But then they just continue and stay on this path, basically, for years as far as I can tell, and they don't really update their models of "Okay, and this is how the work that I'm doing actually leads to the outcome." They don't try to look for flaws in that argument or see whether they're missing something else. 

Most of the time when I look at what a person is doing, I don't really see that. I just expect this is going to make a lot of their work orders of magnitude less useful than it could be.”

What does it mean to “form your own views”? I mean something like forming a detailed model, starting from some basic and reasonable beliefs about the world, that gets you to a conclusion like ‘working on AI alignment is important’, or ‘this research direction seems like it might shift the needle on AI-induced x-risk’, or ‘Power-seeking AI poses a decent chance of extinction [LW · GW]’, without having to defer to other people. Ideally that model has depth, so that if you double-click on any part of the argument chain, there’s likely to be substance there. Good examples of this kind of reasoning include Buck Shlegeris’ My Personal Cruxes for Working on AI Safety [EA · GW], Richard Ngo’s AGI Safety from First Principles [? · GW], and Joseph Carlsmith’s report on Existential Risk from Power-Seeking AI [LW · GW].

Why form your own views

You don’t need your own views straight away, and maybe not at all

How do you form your own views? Here are some ideas:

Forecasting questions:

Technical questions:

Resources:

Funding

People don’t apply for funding enough. Here are some rebuttals to common objections to applying for funding: You don’t need to be doing valuable AI alignment research right now in order to get funded; students are prime targets for funding, because money is likely to be particularly useful to them; getting rejected probably won’t negatively affect you down the line, as long as you’re honest and well-intentioned; often people are miscalibrated about whether their proposal is worth the money; grant-makers really want to fund good projects.

What can you apply for funding for? Here are some things that you could apply to the Long Term Future Fund (LTFF) for: 

It is often easy to apply for funding – e.g. the application for the Long-Term Future Fund takes 1-2 hours.

How to apply: Aim to have an application that is honest and straightforward. If the point is to help directly with alignment, give your best guess as to whether and how your project helps alignment. If the point is to advance your career, write about how you expect it to advance your career relative to the counterfactual. If you don’t have trustworthy signals of your competence and alignment, it helps to have a reference who knows you and is respected by the funding body. If you have either of those, consider applying immediately. If not, still consider applying immediately. But if you want a better shot, you might do an alignment project first and post it to LessWrong, for example as part of the AGI safety fundamentals seminar programme, or the ML Safety Scholars Programme [EA · GW] (fully funded, applications close May 31st 2022), or as part of forming your own views on alignment [LW · GW].

Funding sources:

Broadly useful career advice

Look for ways to demonstrate your competence

I have mostly talked about how to become competent. This is the most important thing and it should be your main focus early on; it is also much easier to appear competent when you actually are. But when you start to do competitive stuff like job or PhD applications, it’s useful to be able to demonstrate your competence in order to distinguish yourself from others.

Once you know which competencies to shoot for, find hard-to-fake signals that you are competent and work them into projects that build your competence. Search for ways to cash in on your competencies/cool shit you do. You can also ask people in the community/employers what signals they’d consider hard to fake. For PhDs, doing research < ArXiv paper < published paper < published paper + reference letter from someone who has seen lots of students and has a good track record of predicting research success. Similarly, ML paper replication < open-source paper replication < open-source replication plus blog post about what you learned. Failed research < blog post about failed research… You’ll probably soon have lots of knowledge/skills/cool stuff that you’ve done, that people won’t know about. Sometimes, it’s easy to transform those into a competency signal by making your knowledge/skill/cool stuff legible and visible.

 

Focus on becoming excellent early in your career

Most of your impact comes from later in your career. Early in your career (for the first few years out of undergrad, at least), your focus should be on doing things where you can grow and become excellent. You can ask yourself (and others) where you’re likely to grow the most, and then go there. That might be alignment organisations, and it might not. Growth is largely a function of your environment and the mentorship available to you. The vast majority of good mentorship can be found outside of alignment, and alignment is heavily mentorship-constrained. If you become an excellent ML engineer/researcher or theoretical researcher, it will probably be easy to later specialise in empirical or theoretical alignment work. It is certainly fine (and maybe preferable, because of publications) to do non-alignment research as an undergraduate. 

That said, it might not be good to become excellent if it means advancing AI capabilities. Though there is nuance in ‘capabilities’: working on improving Bayesian inference approximation (useless-to-maybe-helpful for alignment) is very different from scaling up large language models (probably pretty bad). However, Anthropic believe that staying at the frontier of capabilities is necessary for doing good alignment work, so I don’t know how coherent the capabilities-safety dichotomy is (this is an active area of debate).

One way that working on stuff that doesn’t help with alignment could go badly is that you get stuck doing research that sounds like it helps but doesn’t actually have a path to impact, like random robustness or interpretability research. This can happen even if you join a safety team. To avoid this, I recommend continuing to build your own views on alignment [LW · GW], speaking with more knowledgeable alignment people about your career decisions, and holding the intention to actually consider where you can do the best alignment research once you’ve built some skills.

 

Engaging with the AI alignment community will help you a lot

Why? I’m finding it a little hard to explain this. When I see people start to hang around in alignment communities, they seem to start doing much better stuff. That might be because they’re supported or mentored, they pick up implicit knowledge, they’re more motivated, or because they become aware of opportunities. Here are some ways to engage:

 

Take care of yourself

I don’t really know what to write here. I do know that taking care of yourself is extremely important. I burned out while trying to work on AI alignment, and can attest that burnout can be really bad. I don’t feel super qualified to give advice here, but I do have some things that seem useful to say: If your work becomes a slog/grind that daunts you when you wake up, as opposed to a source of strong internal desire, I think that’s worth paying attention to. You can take diagnostic tests right now or regularly for depression, anxiety, and burnout (takes less than 30 minutes in total). And maybe see a therapist if any of those are concerning, or preventatively, which you can get funding [LW(p) · GW(p)] for. Having good mentors, managers, and buddies will help a lot.

Trying to work on AI alignment might be particularly bad for some people’s mental health. Here are some reasons for that: Believing that we might all die might be really scary and totalising; there aren’t that many jobs in alignment at the moment, and ML opportunities in general are pretty competitive; you might not be able to help with technical alignment work, and that might be crushing; some of the actions I suggest are hard and unstructured—such as forming your own views on alignment, or doing paper replications—and a lot of people don’t thrive in unstructured environments; “technical AI alignment” is not a well-scoped career path or set of paths—and it’s often hard to know what’s best to do.

I don’t want you to feel bad about yourself if you’re struggling, or can’t help in a specific way. If you’re struggling, consider talking to your friends or to people who have been through similar experiences, talking with AI Safety Support, taking time off, getting therapy, or trying a different type of work or environment.

1 comment

comment by jingxiangmo · 2023-04-02T00:48:24.758Z · LW(p) · GW(p)

This is a very excellent article, thank you.