AMA: Paul Christiano, alignment researcher

paulfchristiano

post by paulfchristiano · 2021-04-28T18:55:39.707Z · LW · GW · 197 comments

I'll be running an Ask Me Anything on this post from Friday (April 30) to Saturday (May 1).

If you want to ask something just post a top-level comment; I'll spend at least a day answering questions.

You can find some background about me here.

comment by Gedusa · 2021-04-29T09:27:36.930Z · LW(p) · GW(p)

A number of people seem to have departed OpenAI at around the same time as you. Is there a particular reason for that which you can share? Do you still think that people interested in alignment research should apply to work at OpenAI?

Replies from: paulfchristiano, paulfchristiano

↑ comment by paulfchristiano · 2021-04-30T18:29:41.261Z · LW(p) · GW(p)

A number of people seem to have departed OpenAI at around the same time as you. Is there a particular reason for that which you can share?

My own departure was driven largely by my desire to work on more conceptual/theoretical issues in alignment. I've generally expected to transition back to this work eventually and I think there are a variety of reasons that OpenAI isn't the best place for it. (I would likely have moved earlier if Geoffrey Irving's departure hadn't left me managing the alignment team.)

I'm pretty hesitant to speak on behalf of other people who left. It's definitely not a complete coincidence that I left around the same time as other people (though there were multiple important coincidences), and I can talk about my own motivations:

A lot of the people I talked with at OpenAI left, decreasing the benefits of remaining at OpenAI and increasing the benefits of talking to people outside of OpenAI.
The departures led to a lot of safety-relevant shakeups at OpenAI. It's not super clear whether that makes it an unusually good or bad time to shake up management of my team, but I think it felt unusually good to me (this might have been a rationalization, hard to say).

↑ comment by paulfchristiano · 2021-04-30T18:29:58.764Z · LW(p) · GW(p)

Do you still think that people interested in alignment research should apply to work at OpenAI?

I think alignment is a lot better if there are strong teams trying to apply best practices to align state-of-the-art models, who have been learning about what it actually takes to do that in practice and building social capital. Basically that seems good because (i) I think there's a reasonable chance that we fail not because alignment is super-hard but because we just don't do a very good job during crunch time, and I think such teams are the best intervention for doing a better job, and (ii) even if alignment is very hard and we need big new ideas, I think that such teams will be important for empirically characterizing and ultimately adopting those big new ideas. It's also an unusually unambiguous good thing.

I spent a lot of time at OpenAI largely because I wanted to help get that kind of alignment effort going. For some color see this post [LW · GW]; that team still exists (under Jan Leike) and there are now some other similar efforts at the organization.

I'm not as in the loop as I was a few months ago, so you might want to defer to folks at OpenAI, but from the outside I still tentatively feel pretty enthusiastic about the work of this kind that's happening at OpenAI. If you're excited about this kind of work then OpenAI still seems to me like a good place to go. (It also seems reasonable to think about DeepMind and Google, and of course I'm a fan of ARC for people who are a good fit, and I suspect that there will be more groups doing good applied alignment work in the future.)

comment by Ben Pace (Benito) · 2021-04-30T06:44:36.531Z · LW(p) · GW(p)

Who's the best critic of your alignment research? What have they been right about?

comment by Neel Nanda (neel-nanda-1) · 2021-04-28T21:36:20.700Z · LW(p) · GW(p)

What are the most important ideas floating around in alignment research that don't yet have a public write-up? (Or, even better, that have a public write-up but could do with a good one?)