General advice for transitioning into Theoretical AI Safety

post by Martín Soto (martinsq) · 2022-09-15T05:23:06.956Z · LW · GW · 0 comments

Contents

    Why this post?
    Why Alignment research?
    Why Theoretical Alignment research?
  Consensus advice
    Skilling up
    Building your career
  Controversial advice

Cross-posted from the EA Forum [EA · GW].

During the past few months I've privately talked with 20+ AI Safety researchers[1] about how to transition into Theoretical AI Safety. Here's a distillation of their advice and opinions (both general consensus and controversial takes).

Why this post?

Some [EA · GW] great posts [EA · GW] already exist with general advice for transitioning into AI Safety. However, these and others are mainly centered on technical Computer Science research and Machine Learning engineering. They don't delve into how more theoretical aptitudes and backgrounds (such as careers in Mathematics, theoretical Computer Science or Philosophy) can be steered into the more abstract Alignment research that exploits them (except for Critch's short post). I think that's mainly because:

  1. Most people in the Alignment community work in applied research, and accordingly there are (way) more open positions for that kind of research. Still, most people I've talked to agree we need a non-trivial percentage of the community working on Theory (and we certainly need more people working in any subfield of Alignment [EA · GW] they want to work on[2]).
  2. Applied career paths are more standardized, and thus easier to give advice on. As happens more broadly in academia and the job market, the more applied the work, the more obvious the job prospects. Theoretical careers tend to feel more like creating your own path: they often involve autonomous projects or agendas, individualized assessment of your learning and progress, and informally improving your epistemics.

This post tries to fill that gap by being a helpful first read for graduates and researchers from abstract disciplines who are interested in AI Alignment. I'd recommend using it as a complement to the other, more general introductions and advice. The following two sections are just a summary of general community knowledge. The advice sections do include some new insights and opinions which I haven't seen comprehensively presented elsewhere.

Why Alignment research?

I presuppose familiarity with the basic arguments [? · GW] encouraging AI Alignment research. This is a somewhat risky and volatile area in which to work for positive impact, given how little we understand the problem, so I recommend forming a good inside view [? · GW] of this field's theory of change [? · GW] (and our doubts [? · GW] about it) before committing hard to any path (and performing a ladder of tests to check your personal fit).

Of course, I do think careers in AI Safety have an expected positive impact absurdly larger than the median, equaled only by other EA cause areas. Furthermore, if you're into an intellectual challenge, Alignment is one of the most exciting, foundational and mind-bending problems humanity is facing right now!

Why Theoretical Alignment research?

There are sound arguments [LW · GW] for the importance of theoretical research, since its methods allow for more general results that might scale beyond current paradigms and capabilities (which is what mainly worries us). But we are not even sure such general solutions exist, and carrying out this research faces some serious downsides, such as the scarcity of feedback loops.

Truth is, there's no consensus on whether applied or theoretical[3] research is more helpful for Alignment. It's fairly safe to say we don't yet know, and so we need people working on all fronts. If you have especially good aptitudes [EA · GW] for abstract thinking, mathematics, epistemics and leading research agendas, theoretical research might be your best shot at impact. That is, given the uncertainty about each approach's impact, you should mainly maximize personal fit.

That said, I'd again encourage developing a good inside view to judge for yourself whether this kind of research can be useful. And of course, trying out some theoretical work early on doesn't lock you out of applied research.

By theoretical research I mean both:

  1. Mathematical research (sometimes referred to as "Theoretical research"): Using mathematics to pin down how complex systems will behave. This usually involves Decision Theory, Game Theory, Statistics, Logic or the construction of abstract structures. Its goal is mostly deconfusion about fundamental concepts. Prime examples are Garrabrant's Logical Induction or Vanessa Kosoy's work [EA(p) · GW(p)].
  2. Conceptual research: Using epistemics, philosophy and everything else you can think of to devise approaches to the Alignment problem. This usually involves hypothesizing, forecasting, consequentialist thinking, building plans and poking holes in them. Its goal is mostly coming up with a global, big-picture strategy (with many moving parts and underspecified details) that might totally or partially ensure Alignment. Prime examples are Christiano's ELK or Critch's ARCHES.

This research is mainly carried out in private organizations, in academia, or as funded independent research. Any approach will need, or at least heavily benefit from, a basic understanding of the current paradigm for constructing these systems (Machine Learning theory), although some approaches abstract away many of its details.

As preliminary advice, the first step in your ladder of tests can be just reading through ELK and thinking about it for a non-trivial number of hours, trying to come up with strategies or poke holes in others [AF · GW]. If you enjoy that, then you probably wouldn't be miserable doing Theoretical AI Safety. If, furthermore, you have (or can acquire) the relevant theoretical background and your aptitudes are well suited to it, then you're probably a good fit. Many more details can be weighed to assess your fit: how comfortable you'll be tackling a problem about which everyone is mostly confused, how good you are at autonomous study and self-directed research, whether you'd enjoy moving to another country, how anxious deadlines or academia make you feel...

The following advice presupposes that you have already reached the (working) conclusion that this field can be impactful enough and that you are a good fit for it.

Consensus advice

These points were endorsed by almost everyone I talked to.

Skilling up

Building your career

Controversial advice

These points prompted radically different opinions amongst the researchers I talked to, and some reflect ongoing debates in the community.

  1. ^

    If you're one of those researchers and would like nominal recognition in this post, let me know. I've taken the approach of default anonymity given the informal and private nature of some of the discussions.

  2. ^

    "AI Safety Technical research" usually refers to any scientific research trying to ensure the default non-harmfulness of AI deployment, as opposed to the political research and action in AI Governance [? · GW]. It thus includes the Theoretical/Conceptual research I talk about in this post despite the applied connotations of the word "Technical".

  3. ^

    The division between applied and theoretical is not binary but gradual, and even theoretical researchers usually build on some empirical data.

  4. ^

    I've been positively surprised by how many well-known researchers have been kind enough to answer some of my specific questions.
