What are MIRI's big achievements in AI alignment?
post by tailcalled · 2023-03-07T21:30:58.935Z · LW · GW
This is a question post.
Contents
Answers
34 Neel Nanda
23 Garrett Baker
18 Akash
13 Charlie Steiner
No comments
Answers
I give them a lot of credit for, to my eyes, realising this was a big deal way earlier than almost anyone else, doing a lot of early advocacy, and working out some valuable basic ideas, like early threat models and ways in which standard arguments and counter-arguments were silly. I think this kind of foundational work feels less relevant now, but it's actually really hard and worthwhile!
(I don't see much recent stuff I'm excited about, unless you count Risks from Learned Optimisation)
I think almost every aspiring conceptual alignment researcher should read basically all of the work in Arbital's AI alignment section. Not all of it is right, but you'll avoid some obvious-in-retrospect pitfalls you likely would have otherwise fallen into. So I'd count that corpus as a big achievement.
They have a big paper on logical induction. It doesn't have any applications yet, but it may serve as theoretical grounding for later work. And I think the more general idea of seeing inexploitable systems as markets [LW · GW] has a good chance of being broadly applicable.
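For concreteness, the paper's core criterion (paraphrased from memory, so treat the notation as approximate) says a market of sentence prices \(\overline{\mathbb{P}} = (\mathbb{P}_1, \mathbb{P}_2, \dots)\) is a logical inductor relative to a deductive process \(\overline{D}\) iff no efficiently computable trader \(\overline{T}\) exploits it, i.e. for every such trader the set of plausible values of its accumulated holdings
\[
\Big\{\, \mathbb{W}\Big(\sum_{i \le n} T_i\Big) \;:\; n \in \mathbb{N},\ \mathbb{W} \in \mathcal{PC}(D_n) \,\Big\}
\]
is not both bounded below and unbounded above (no unbounded profit at bounded risk), where \(\mathcal{PC}(D_n)\) is the set of worlds propositionally consistent with what has been proved by step \(n\).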
Scott Garrabrant [LW · GW] has done a lot in the public eye, and so has Vanessa Kosoy [LW · GW].
Risks From Learned Optimization, as others have mentioned, explained & made palatable the idea of "mesa optimizers" to skeptics.
I think a lot of threat models (including modern threat models) are found in, or heavily inspired by, old MIRI papers. I also think MIRI papers provide unusually clear descriptions of the alignment problem, why MIRI expects it to be hard, and why MIRI thinks intuitive ideas won't work (see e.g., Intelligence Explosion: Evidence and Import, Intelligence Explosion Microeconomics, and Corrigibility).
Regarding more recent stuff, MIRI has been focusing less on research output and more on shaping discussion around alignment. They are essentially "influencers" in the alignment space. Some people I know label this as "not real research", which I think is true in some sense, but I care more about "what was the impact of this?" than "does it fit the definition of a particular term?"
For specifics, List of Lethalities [LW · GW] and Death with Dignity [LW · GW] have had a pretty strong effect on discourse in the alignment community (whether or not this is "good" depends on the degree to which you think MIRI is correct and the degree to which you think the discourse has shifted in a good vs. bad direction). On how various plans miss the hard bits of the alignment challenge [LW · GW] remains one of the best overviews/critiques of the field of alignment, and the sharp left turn [LW · GW] post is a recent piece that is often cited to describe a particularly concerning (albeit difficult to understand) threat model. Six dimensions of operational adequacy [? · GW] is currently one of the best (and only) posts that tries to envision a responsible AI lab.
Some people have found the 2021 MIRI Dialogues [? · GW] to be extremely helpful at understanding the alignment problem, understanding threat models, and understanding disagreements in the field.
I believe MIRI occasionally advises people at other organizations (like Redwood, Conjecture, Open Phil) on various decisions. It's unclear to me how impactful their advice is, but it wouldn't surprise me if one or more orgs had changed their mind about meaningful decisions (e.g., grantmaking priorities or research directions) partially as a result of MIRI's advice.
There's also MIRI's research, though I think this gets less attention at the moment because MIRI isn't particularly excited about it. But my guess is that if someone made a list of all the alignment teams, MIRI would currently have 1-2 teams in the top 20.
- Being ~50% of where people were thinking about AI alignment until about 2018; putting out educational materials, running workshops and conferences, etc. Each individual thing is fairly small, but they add up.
- Publishing basic explainers in respectable enough formats that academics have citations for them (especially Soares and Fallenstein 2014).
- Jessica's Quantilizers paper (sketched briefly after this list).
- Evan's Risks From Learned Optimization.
- Peter de Blanc's Ontological Crises paper.
- Eliezer's Intelligence Explosion Microeconomics and related arguments.
- (Probably some other publications I'm forgetting)
- Blue-sky research on doing new things that might be good (probably non-disclosed stuff; infrabayesianism might go here or below).
- All the stuff that's interesting and was probably good for the field, but turned out not to be super relevant to training big neural nets, yet might still turn out to be useful to have in our toolbox (decision theory, logical inductors, open-source game theory, etc.)
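On the quantilizers item above: the core idea is that instead of taking the argmax action by expected utility, a q-quantilizer samples an action from the top q-fraction (by base-distribution mass) of actions ranked by utility, which bounds how hard it optimizes relative to the base distribution. Here is a minimal sketch over a finite action set; the names (quantilize, base_weight) are made up for illustration and the quantile boundary is handled crudely, so this shows the idea rather than the paper's formal construction.

```python
import random

def quantilize(actions, utility, base_weight, q=0.1):
    """Sample an action like a q-quantilizer: draw from the top q-fraction
    (by base-distribution mass) of actions ranked by estimated utility,
    rather than taking the single highest-utility action."""
    # Rank candidate actions from best to worst by the utility estimate.
    ranked = sorted(actions, key=utility, reverse=True)

    # Keep the best actions until they cover q of the base distribution's mass.
    kept, mass = [], 0.0
    for action in ranked:
        kept.append(action)
        mass += base_weight(action)
        if mass >= q:
            break  # crude handling of the action straddling the q boundary

    # Sample within the kept set in proportion to the base distribution,
    # so no single strange-but-high-scoring action dominates the choice.
    weights = [base_weight(action) for action in kept]
    return random.choices(kept, weights=weights, k=1)[0]

# Toy usage: 100 actions, a noisy utility estimate, a uniform base distribution.
if __name__ == "__main__":
    acts = list(range(100))
    print(quantilize(acts, utility=lambda a: a % 37, base_weight=lambda a: 0.01, q=0.1))
```

The safety argument is roughly that a q-quantilizer's expected cost (under the base distribution's notion of cost) is at most 1/q times that of sampling directly from the base distribution, so small q buys optimization power at a quantified price.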
comment by trevor (TrevorWiesinger) · 2023-03-08T00:01:14.455Z · LW(p) · GW(p)
> Being ~50% of where people were thinking about AI alignment until about 2018; putting out educational materials, running workshops and conferences, etc.
I think this is important to mention: from 2000 to 2018 they were doing basically all the heavy lifting, and 2018-2022 was a low period of contributions. That's a pretty great ratio of peak to valley.
They also spent almost all of that second period trying to find a way out by coming across something big [LW · GW] again, as they had been for almost two years prior; their work with CFAR seems to me to have been a solid bet at the time (in fact, I myself am still betting on CFAR in 2023, in spite of everything).
comment by niplav · 2023-03-08T01:29:36.992Z · LW(p) · GW(p)
I agree that the work on ontological crises was good, and feels like a strong precursor to model-splintering [LW · GW] and concept/value extrapolation [? · GW].
No comments