List of good AI safety project ideas?

post by Aryeh Englander (alenglander) · 2021-05-26T22:36:07.910Z · LW · GW · 1 comment

This is a question post.

Contents

  Answers
    9 evhub
    5 G Gordon Worley III
    4 James_Miller
    2 Aryeh Englander
    2 Aryeh Englander
    1 Esben Kran
1 comment

Can we compile a list of good project ideas related to AI safety that people can work on? There are occasions at work when I have the opportunity to propose interesting project ideas for potential funding, and it would be really useful if there were somewhere I could look for projects that people here would really like someone to work on, even if they themselves don't have the time or resources to do so. I also keep meeting people who are searching for useful alignment-related projects they can work on for school, work, or as personal projects, and I think a list of project ideas might be helpful for them as well.

I'm particularly interested in project ideas that are currently not being worked on (to your knowledge) but where it would be great if someone would take up that project. Or alternatively, project ideas that are currently being worked on but where there are variations on those ideas that nobody has yet attempted but someone should.

Occasionally someone will post an idea or set of ideas on the Alignment Forum, for example Ajeya Cotra's "sandwiching" idea [AF · GW] or the recent list of ideas from Stuart Armstrong and Owain Evans [AF · GW]. I also sometimes come across ideas mentioned towards the end of a paper or buried somewhere in a research agenda. But I think having a larger list somewhere could be really useful.

(Note: I am not looking for lists of open problems, challenges, or very general research directions. I'm looking for suggestions that at least point towards a concrete project idea, and where an individual or small team might be able to produce useful results given current technology and with sufficient time and resources.)

Please post ideas or links / references to published ideas in the comments, if you know of any. Ideas mentioned as part of a larger post or paper would count, but please point to the section where the idea is mentioned.

If I get enough links or references maybe I'll try to compile a list that others can use.

Answers

answer by evhub · 2021-05-27T19:02:17.427Z · LW(p) · GW(p)

Though they're both somewhat outdated at this point, there are certainly still some interesting concrete experiment ideas to be found in my “Towards an empirical investigation of inner alignment [LW · GW]” and “Concrete experiments in inner alignment [LW · GW].”

answer by Gordon Seidoh Worley (G Gordon Worley III) · 2021-05-27T13:54:07.056Z · LW(p) · GW(p)

I wrote a research agenda that suggests additional work to be done and that I'm not doing.

https://www.lesswrong.com/posts/k8F8TBzuZtLheJt47/deconfusing-human-values-research-agenda-v1 [LW · GW]

answer by James_Miller · 2021-05-26T23:59:24.302Z · LW(p) · GW(p)

I co-authored a paper suggesting that we take advantage of AI's superhuman chess abilities to create trustworthy and untrustworthy chess oracles, as a testbed for developing strategies for dealing with possibly unfriendly oracles: https://arxiv.org/abs/2010.02911

answer by Aryeh Englander · 2021-06-03T12:53:47.084Z · LW(p) · GW(p)

New post on the EA Forum: Some AI Governance Research Ideas [EA · GW]

answer by Aryeh Englander · 2021-05-28T19:25:34.379Z · LW(p) · GW(p)

Just came across this: Research ideas to study humans with AI Safety in mind [AF · GW]

answer by Esben Kran · 2022-10-31T15:47:20.442Z · LW(p) · GW(p)

We have developed AI Safety Ideas, a collaborative AI safety research platform with many research project ideas.

1 comment

Comments sorted by top scores.

comment by Vika · 2021-07-25T22:26:14.209Z · LW(p) · GW(p)

Thanks Aryeh for collecting these! I added them to a new Project Ideas section in my AI Safety Resources list.