Why I'm joining Anthropic
post by evhub · 2023-01-05T01:12:13.822Z · LW · GW · 4 comments
Personal blogpost. Previously: “What I'll be doing at MIRI [AF · GW]”
For the last three years (since I left OpenAI), I've been a Research Fellow at MIRI. Starting next week, however, I'm going to be stepping down from that position and joining Anthropic as a safety researcher instead.[1]
To start with, some things that this does not mean:
- That I have anything in particular against MIRI.
- That I endorse everything Anthropic does.
- That I'm going to stop doing theoretical AI safety research.
- That there's been any particular substantive change in my beliefs about AI safety.
So why do I think this is a good idea? The most basic reason is just that I think things are heating up and it'll be valuable for me to be closer to the action. I think current large language models are getting quite scary, but there's a lot of work to be done in understanding exactly how they're scary and what to do about it; see e.g. the recent paper I collaborated with Anthropic on [AF · GW]. I'll be splitting my time between theoretical and empirical work, with the idea that being closer to current models should improve my ability to do both.
I expect that a lot of my time will be spent on the Conditioning Predictive Models agenda I've been working on for the past ~6 months, which should be published in the next month or so. Until then, this post [AF · GW] and this post [AF · GW] probably contain the best current public writeups of some of the basic ideas. That being said, I won't be particularly tied down to it and might end up deciding to work on something completely different (as happened the last time I wrote up a big agenda [AF · GW]).
Since I'm sure I'm going to be asked about it a bunch now, here are some of my thoughts on Anthropic as an organization (obviously, all opinions are my own):
- I think it's really impressive and a very positive sign (and costly signal) that Anthropic has been able to do a bunch of capability work without publishing it.
- I think Anthropic is doing some of the best AI safety work of any organization right now—e.g. Transformer Circuits.
- Nevertheless, Anthropic is undoubtedly working on improving the capabilities of their models, and that absolutely has negative externalities, e.g. in increasing overall competition between labs.
- That being said, as someone whose overall estimates of AI existential risk are on the pessimistic side [AF · GW], I think high-variance bets—e.g. build cutting-edge models so we can do cutting-edge AI safety work, have more leverage to influence other AI labs, etc.—can often make a lot of sense, especially when combined with strategies for mitigating potential downsides (e.g. not publishing capabilities advances).
- Overall, I think Anthropic's current strategy seems reasonable to me, but I am quite uncertain.
[1] Though I will technically be keeping my MIRI affiliation as a Research Associate.
4 comments
comment by Edward Kmett (edward-kmett) · 2023-01-05T04:58:26.750Z · LW(p) · GW(p)
Time to update my position on
comment by DragonGod · 2023-01-05T19:01:24.000Z · LW(p) · GW(p)
What do you think MIRI is currently doing wrong? What should they change about their approach or general strategy?
comment by evhub · 2023-01-05T20:39:50.187Z · LW(p) · GW(p)
I thought I was pretty clear in the post that I don't have anything against MIRI. I guess if I were to provide feedback, the one thing I most wish MIRI would do is hire more researchers: I think MIRI's hiring bar is currently too high.
comment by DragonGod · 2023-01-06T11:01:41.655Z · LW(p) · GW(p)
I did not think you had anything against MIRI. It's just that leaving your position there gives you more latitude to be candid when offering critical feedback.
I would probably have asked this question of any departing MIRI staff member. If there was ever a time to get opinions on what MIRI is doing wrong, it's now.