List of technical AI safety exercises and projects
post by JakubK (jskatt) · 2023-01-19T09:35:18.171Z · LW · GW · 5 comments
This is a link post for https://docs.google.com/document/d/1-58zgC2lRMbMK-CXU44VR3ApGYbTI0aJKX-cKkxDeyo/edit?usp=sharing
EDIT 3/17/2023: I've reorganized the doc and added some governance projects.
I intend to maintain a list at this doc. I'll paste the current state of the doc (as of January 19th, 2023) below. I encourage people to comment with suggestions.
- Levelling Up in AI Safety Research Engineering [Public] (LW [LW · GW])
- Highly recommended list of AI safety research engineering resources for people at various skill levels.
- AI Alignment Awards
- Alignment jams / hackathons from Apart Research
- Past / upcoming hackathons: LLM, interpretability 1, AI test, interpretability 2
- Projects on AI Safety Ideas: LLM, interpretability, AI test
- Resources: black-box investigator of language models [LW · GW], interpretability playground (LW [LW · GW]), AI test
- Examples of past projects; interpretability winners [LW · GW]
- How to run one as an in-person event at your school
- Neel Nanda: 200 Concrete Open Problems in Mechanistic Interpretability [? · GW] (doc and previous version) – see the first sketch after this list for a typical starting setup
- Project page from AGI Safety Fundamentals and their Open List of Project ideas
- AI Safety Ideas by Apart Research; EAF post [EA · GW]
- Most Important Century writing prize [EA · GW] (Superlinear page)
- Center for AI Safety
- Competitions like SafeBench
- Student ML Safety Research Stipend Opportunity – provides stipends for doing ML research.
- course.mlsafety.org projects – CAIS is looking for someone to add details about these projects on course.mlsafety.org
- Distilling / summarizing / synthesizing / reviewing / explaining [LW · GW]
- Forming your own views on AI safety (without stress!) – also see Neel's presentation slides and "Inside Views Resources" doc
- Answer some of the winter 2022 SERI MATS application questions, such as Vivek Hebbar's problems
- 10 exercises from Akash [EA(p) · GW(p)] in “Resources that (I think) new alignment researchers should know about”
- [T] Deception Demo Brainstorm has some ideas (message Thomas Larsen [LW · GW] if these seem interesting)
- Upcoming 2023 Open Philanthropy AI Worldviews Contest [EA · GW]
- Alignment research at ALTER [EA · GW] – interesting research problems, many of which have a theoretical math flavor
- Open Problems in AI X-Risk [PAIS #5] [? · GW]
- Amplify creative grants [EA · GW] (old)
- Evan Hubinger: Concrete experiments in inner alignment [LW · GW], ideas someone should investigate further [AF(p) · GW(p)], sticky goals [LW · GW]
- Richard Ngo: Some conceptual alignment research projects [LW · GW], alignment research exercises [LW · GW]
- Buck Shlegeris: Some fun ML engineering projects that I would think are cool, The case for becoming a black box investigator of language models [AF · GW]
- Implement a key paper in deep reinforcement learning – see the second sketch after this list for a minimal starting point
- “Paper replication resources” section in “How to pursue a career in technical alignment [EA · GW]”
- Daniel Filan idea [LW(p) · GW(p)]
- Summarize a reading from Reading What We Can
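A common starting point for the mechanistic interpretability problems above is to load a small open model, run it on a prompt, and inspect the cached activations. The sketch below is only an illustration under assumed tooling (the TransformerLens library and GPT-2 small; the prompt and choice of layer are arbitrary) – the problem list itself doesn't prescribe a particular library.

```python
# Minimal sketch, assuming TransformerLens is installed (pip install transformer-lens).
# Load GPT-2 small, run one prompt, and cache the intermediate activations.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # small model, fast to iterate on
prompt = "When Mary and John went to the store, John gave a drink to"
logits, cache = model.run_with_cache(prompt)

# The cache maps activation names to tensors; e.g. layer-0 attention patterns.
attn = cache["pattern", 0]  # shape: [batch, n_heads, query_pos, key_pos]
print(attn.shape)
```

From a setup like this, many of the exercises amount to ablating or patching specific activations with hooks and checking how the model's logits change.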
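For the deep RL paper-replication item, one low-risk way to start is to get a bare-bones policy-gradient loop working on an easy environment before moving to the paper's actual method. This is a sketch only, assuming PyTorch and Gymnasium, with illustrative, untuned hyperparameters.

```python
# Minimal REINFORCE loop on CartPole, assuming gymnasium and torch are installed.
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))  # obs -> action logits
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated
    # Discounted returns, computed backwards over the episode, then normalized.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.insert(0, g)
    returns = torch.as_tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Policy-gradient loss: push up log-probs of actions that led to high return.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Once something like this trains, replicating a specific paper is mostly a matter of swapping in its environment, network, and update rule, then matching the reported results.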
5 comments
comment by Steven Byrnes (steve2152) · 2023-01-19T13:58:46.250Z · LW(p) · GW(p)
Not sure if this is what you're looking for, but I have some wish-list big projects listed here [LW · GW].
↑ comment by JakubK (jskatt) · 2023-01-19T18:47:41.560Z · LW(p) · GW(p)
Thanks for sharing that, I just added it to the Google doc.
comment by plex (ete) · 2023-01-19T18:52:08.607Z · LW(p) · GW(p)
Nice! Would you be up for putting this in the aisafety.info Google Drive folder too, with a question-shaped title?
↑ comment by JakubK (jskatt) · 2023-01-19T18:54:37.429Z · LW(p) · GW(p)
Done; the current title is "What are some exercises and projects I can try?"
↑ comment by plex (ete) · 2023-01-19T18:58:07.556Z · LW(p) · GW(p)
Great, thanks!