Question: MIRI Corrigibility Agenda

post by algon33 · 2019-03-13T19:38:05.729Z · score: 16 (7 votes) · LW · GW · 6 comments

MIRI's reading list on corrigibility seems outdated, and I can't find a centralised list. Does anyone have, or know of, one?

As a side note, has MIRI stopped updating their reading list? It seems like that's the case.


Links to do with corrigibility are given in the comment section. I'll try to update this post with some summaries as I read them.


Comments sorted by top scores.

comment by Rob Bensinger (RobbBB) · 2019-03-15T05:36:42.850Z · score: 14 (5 votes) · LW · GW

The only major changes we've made to the MIRI research guide since mid-2015 are to replace Koller and Friedman's Probabilistic Graphical Models with Pearl's Probabilistic Inference; replace Rosen's Discrete Mathematics with Lehman et al.'s Mathematics for CS; add Taylor et al.'s "Alignment for Advanced Machine Learning Systems", Wasserman's All of Statistics, Shalev-Shwartz and Ben-David's Understanding Machine Learning, and Yudkowsky's Inadequate Equilibria; and remove the Global Catastrophic Risks anthology. So the guide is missing a lot of new material. I've now updated the guide to add the following note at the top:

This research guide has been only lightly updated since 2015. Our new recommendation for people who want to work on the AI alignment problem is:
1. If you have a computer science or software engineering background: Apply to attend our new workshops on AI risk and to work as an engineer at MIRI. For this purpose, you don’t need any prior familiarity with our research.
If you aren’t sure whether you’d be a good fit for an AI risk workshop, or for an engineer position, shoot us an email and we can talk about whether it makes sense.
You can find out more about our engineering program in our 2018 strategy update.
2. If you’d like to learn more about the problems we’re working on (regardless of your answer to the above): See “Embedded Agency [LW · GW]” for an introduction to our agent foundations research, and see our Alignment Research Field Guide [LW · GW] for general recommendations on how to get started in AI safety.
After checking out those two resources, you can use the links and references in “Embedded Agency” and on this page to learn more about the topics you want to drill down on. If you want a particular problem set to focus on, we suggest Scott Garrabrant’s “Fixed Point Exercises [LW · GW].”
If you want people to collaborate and discuss with, we suggest starting or joining a MIRIx group, posting on LessWrong, applying for our AI Risk for Computer Scientists workshops, or otherwise letting us know you’re out there.
comment by Rob Bensinger (RobbBB) · 2019-03-15T05:44:20.940Z · score: 5 (3 votes) · LW · GW

For corrigibility in particular, some good material that's not discussed in "Embedded Agency" or the reading guide is Arbital's Corrigibility and Problem of Fully Updated Deference articles.

comment by Wei_Dai · 2019-03-15T19:00:21.925Z · score: 7 (3 votes) · LW · GW

Is Jessica Taylor's A first look at the hard problem of corrigibility still a good reference or is it outdated?

comment by Rob Bensinger (RobbBB) · 2019-03-16T14:31:39.854Z · score: 4 (2 votes) · LW · GW

I'd expect Jessica/Stuart/Scott/Abram/Sam/Tsvi to have a better sense of that than me. I didn't spot any obvious signs that it's no longer a good reference.

comment by Rob Bensinger (RobbBB) · 2019-03-16T18:28:52.502Z · score: 4 (2 votes) · LW · GW

I've now also highlighted Scott's tip from "Fixed Point Exercises [LW · GW]":

Sometimes people ask me what math they should study in order to get into agent foundations. My first answer is that I have found the introductory class in every subfield to be helpful, but I have found the later classes to be much less helpful. My second answer is to learn enough math to understand all fixed point theorems.
These two answers are actually very similar. Fixed point theorems span all across mathematics, and are central to (my way of) thinking about agent foundations.
comment by BurntVictory · 2019-03-15T00:58:18.281Z · score: 3 (2 votes) · LW · GW

The CHAI reading list is also fairly out of date (last updated April 2017), but it has a few more papers, especially if you go to the top and select [3] or [4] so that it shows lower-priority ones.

(And in case others haven't seen it, here's the MIRI reading guide for learning agent foundations.)