Provide feedback on Open Philanthropy’s AI alignment RFP

post by abergal, Nick_Beckstead · 2021-08-20T19:52:55.309Z · 6 comments

Open Philanthropy is planning a request for proposals (RFP) for AI alignment projects working with deep learning systems, and we're seeking feedback both on the RFP itself and on the research directions it solicits proposals in. We'd be really interested in feedback from people on the Alignment Forum on the current (incomplete) draft of the RFP.

The main RFP text can be viewed here. It links to several documents describing two of the research directions we're interested in.

Please feel free to comment either directly on the documents, or in the comments section below.

We are unlikely to add or remove research directions at this stage, but we are open to making any other changes, including to the structure of the RFP. We’d be especially interested in getting the Alignment Forum’s feedback on the research directions we present, and on the presentation of our broader views on AI alignment. It’s important to us that our writing about AI alignment is accurate and easy to understand, and that it’s clear how the research we’re proposing relates to our goals of reducing risks from power-seeking systems.

6 comments


comment by RyanCarey · 2021-08-22T23:03:02.590Z

The implication seems to be that this RFP is for AIS work that is especially focused on DL systems. Is there likely to be a future RFP for AIS research that applies equally well to DL and non-DL systems? Regardless of where my research lands, I imagine a lot of useful and underfunded research fits in the latter category.

Replies from: abergal
comment by abergal · 2021-08-28T02:46:10.026Z

This RFP is an experiment for us, and we don't yet know if we'll be doing more of them in the future. I think we'd be open to including research directions that we think are promising and that apply equally well to both DL and non-DL systems; I'd be interested in hearing any particular suggestions you have.

(We'd also be happy to fund particular proposals in the research directions we've already listed that apply to both DL and non-DL systems, though we will be evaluating them on how well they address the DL-focused challenges we've presented.)

Replies from: RyanCarey
comment by RyanCarey · 2021-08-28T08:33:21.538Z

I imagine you could catch useful work under headings like i) models of AI safety or ii) analysis of failure modes, though I'm obviously biased here.

comment by Alex Flint (alexflint) · 2021-08-29T19:01:19.349Z

Thank you for posting this, Asya and Nick. After reading it, I realized it connects to something I've been thinking about for a while that seems like it might actually be a fit for this RFP under research direction 3 or 4 (interpretability, truthful AI). I drafted a very rough 1.5-pager this morning in a way that hopefully connects fairly obviously to what you've written above:

https://docs.google.com/document/d/1pEOXIIjEvG8EARHgoxxI54hfII2qfJpKxCqUeqNvb3Q/edit?usp=sharing

Interested in your thoughts.

Feedback from everyone is most welcome, too, of course.

comment by adamShimi · 2021-08-21T12:31:29.782Z

Great initiative! I'll try to leave some comments sometime next week.

Is there a deadline? (I've seen the 15th of September floating around, but I guess feedback would be most valuable before then, so you can take it into account?)

Also, is this the proposal mentioned by Rohin in his last newsletter, or a parallel effort?

Replies from: abergal
comment by abergal · 2021-08-22T01:35:17.967Z

Getting feedback in the next week would be ideal; September 15th will probably be too late.

Different request for proposals!