FYI - the footnotes seem broken. Footnote 6 links to halfway through footnote 4.
[I privately wrote the following quick summary of some publicly-available information on (~safety-relevant) talent leaving OpenAI since the founding of Anthropic. It seems worth pasting here since it already exists, but I'd have been more careful if I had written it with public sharing in mind; it's not comprehensive, and I don't have time to really edit it. I'd advise against updating too hard on it because:
- I basically don't have any visibility into OpenAI
- Inferences from LinkedIn often don't give a super accurate sense of somebody's contribution.
- I wrote down what I know about departures from OpenAI but didn't try to write up new hires in the same way.
- It's often impossible for people at orgs to talk publicly about personnel issues/departures so if Jacob/others don't correct me, it's not very strong evidence that nothing below is inaccurate/misleading.]
The main group of people working on alignment (other than interpretability) at OpenAI at the time of the Anthropic split at the end of 2020 was the Reflection team, which has since been renamed to the Alignment team. Of the 7 members of the team at that time (who are listed on the summarization paper), 4 are still working at OpenAI, and none are working at Anthropic.
Like Habryka, I believe it's literally true that nobody from the "Alignment team" left for Anthropic and 4/7 are still working at OpenAI. But it seems possible that things look different if you weight by seniority and account for potential contributions to OpenAI's attention to existential safety made by people who weren’t technical safety researchers, who were researchers on another team, etc.
Important: I don't know why the below people left OpenAI and their inclusion doesn't mean there's any bad blood between them or that they necessarily have criticisms of OpenAI's attitude toward safety.
If I understand correctly,
1 The alignment team lost its team lead (Paul).
2 Two senior people who weren’t counted as on the team but oversaw it or helped with its research direction left for Anthropic.
- VP of safety and policy (Daniela), whose LinkedIn says she oversaw the safety and policy teams
- VP of Research (Dario), who was the Team Lead for AI Safety before he got promoted and says he built and led several of their long-term safety teams. He was also an author on the summarization paper Jacob references. I'd guess that he continued to be a contributor to their AI Safety work after being promoted.
3 The head of the interpretability team (Chris Olah), which is one of the other teams that seems most relevant to existential safety, left for Anthropic.
- (Jacob acknowledges this earlier in the post)
4 Other Anthropic co-founders who left OpenAI include
- Tom Brown (led the engineering of GPT-3)
- Sam McCandlish and Jared Kaplan (just a consultant), who I think led their scaling laws research? I think I heard Jared is leading an Anthropic alignment team? I think Sam M did a fellowship on the safety team before building the scaling laws team
5 Another person who worked on technical safety at OpenAI and left for Anthropic
- Tom Henighan was on the technical staff of the safety team, but I guess not on the alignment team?
6 Several people on the policy team left for Anthropic including the director and two EAs who are interested in alignment.
- Policy Director, Jack Clark
- Danny Hernandez
- Amanda Askell
7 Another EA who I believe cares about alignment and left OpenAI for Anthropic:
- Nicholas Joseph
8 Other people I don’t know who left for Anthropic
- Kamal Ndousse
- Benjamin Mann. LinkedIn says he was on their security and safety working groups
9 Holden is no longer on OpenAI's board (though Helen Toner now is).
On the other hand, they’ve also hired some EAs who care about alignment since then. I believe examples include:
- Jan Leike, alignment team lead
- Richard Ngo, team lead(?) for futures subteam of the policy team
- Daniel Kokotajlo, futures subteam
- Surely others I don't know of or am leaving out
Thanks for writing this. I found it useful and have shared it with others.
I’ve also heard a number of people tell me that EA or AI safety efforts caused them to lose the ability to have serious hobbies, or serious intellectual interests, and I would guess this was harmful to long-term AI safety potential in most cases.
If you'd be up for sharing, I'd be pretty interested in a rough estimate of how many specific people you know of who have had this experience (and maybe also how many were "people who (IMO) have a better shot and a better plan than most for reducing AI risk.")
To be clear, I totally buy that this happens and there's a problem here. I just find it useful to get some sense of the prevalence of this kind of thing.