List of technical AI safety exercises and projects
post by JakubK (jskatt) · 2023-01-19T09:35:18.171Z · LW · GW · 5 comments
This is a link post for https://docs.google.com/document/d/1-58zgC2lRMbMK-CXU44VR3ApGYbTI0aJKX-cKkxDeyo/edit?usp=sharing
EDIT 3/17/2023: I've reorganized the doc and added some governance projects.
I intend to maintain a list at this doc. I'll paste the current state of the doc (as of January 19th, 2023) below. I encourage people to comment with suggestions.
- Levelling Up in AI Safety Research Engineering [Public] (LW [LW · GW])
- Highly recommended list of AI safety research engineering resources for people at various skill levels.
- AI Alignment Awards
- Alignment jams / hackathons from Apart Research
- Past / upcoming hackathons: LLM, interpretability 1, AI test, interpretability 2
- Projects on AI Safety Ideas: LLM, interpretability, AI test
- Resources: black-box investigator of language models [LW · GW], interpretability playground (LW [LW · GW]), AI test
- Examples of past projects; interpretability winners [LW · GW]
- How to run one as an in-person event at your school
- Neel Nanda: 200 Concrete Open Problems in Mechanistic Interpretability [? · GW] (doc and previous version) – see the first sketch after this list for a typical starting setup
- Project page from AGI Safety Fundamentals and their Open List of Project ideas
- AI Safety Ideas by Apart Research; EAF post [EA · GW]
- Most Important Century writing prize [EA · GW] (Superlinear page)
- Center for AI Safety
- Competitions like SafeBench
- Student ML Safety Research Stipend Opportunity – provides stipends for doing ML research.
- course.mlsafety.org projects – CAIS is looking for someone to add details about these projects on course.mlsafety.org
- Distilling / summarizing / synthesizing / reviewing / explaining [LW · GW]
- Forming your own views on AI safety (without stress!) – also see Neel's presentation slides and "Inside Views Resources" doc
- Answer some of the winter 2022 SERI MATS application questions, such as Vivek Hebbar's problems
- 10 exercises from Akash [EA(p) · GW(p)] in “Resources that (I think) new alignment researchers should know about”
- [T] Deception Demo Brainstorm has some ideas (message Thomas Larsen [LW · GW] if these seem interesting)
- Upcoming 2023 Open Philanthropy AI Worldviews Contest [EA · GW]
- Alignment research at ALTER [EA · GW] – interesting research problems, many of which have a theoretical math flavor
- Open Problems in AI X-Risk [PAIS #5] [? · GW]
- Amplify creative grants [EA · GW] (old)
- Evan Hubinger: Concrete experiments in inner alignment [LW · GW], ideas someone should investigate further [AF(p) · GW(p)], sticky goals [LW · GW]
- Richard Ngo: Some conceptual alignment research projects [LW · GW], alignment research exercises [LW · GW]
- Buck Shlegeris: Some fun ML engineering projects that I would think are cool, The case for becoming a black box investigator of language models [AF · GW]
- Implement a key paper in deep reinforcement learning – see the second sketch after this list for a minimal starting point
- “Paper replication resources” section in “How to pursue a career in technical alignment [EA · GW]”
- Daniel Filan idea [LW(p) · GW(p)]
- Summarize a reading from Reading What We Can
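A common starting point for the mechanistic interpretability problems above is to load a small open model, run it on a prompt, and inspect the cached activations. The sketch below is only an illustration under assumed tooling (the TransformerLens library and GPT-2 small; the prompt and choice of layer are arbitrary) – the problem list itself doesn't prescribe a particular library.

```python
# Minimal sketch, assuming TransformerLens is installed (pip install transformer-lens).
# Load GPT-2 small, run one prompt, and cache the intermediate activations.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # small model, fast to iterate on
prompt = "When Mary and John went to the store, John gave a drink to"
logits, cache = model.run_with_cache(prompt)

# The cache maps activation names to tensors; e.g. layer-0 attention patterns.
attn = cache["pattern", 0]  # shape: [batch, n_heads, query_pos, key_pos]
print(attn.shape)
```

From a setup like this, many of the exercises amount to ablating or patching specific activations with hooks and checking how the model's logits change.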
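For the deep RL paper-replication item, one low-risk way to start is to get a bare-bones policy-gradient loop working on an easy environment before moving to the paper's actual method. This is a sketch only, assuming PyTorch and Gymnasium, with illustrative, untuned hyperparameters.

```python
# Minimal REINFORCE loop on CartPole, assuming gymnasium and torch are installed.
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))  # obs -> action logits
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards, done = [], [], False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated
    # Discounted returns, computed backwards over the episode, then normalized.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + 0.99 * g
        returns.insert(0, g)
    returns = torch.as_tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Policy-gradient loss: push up log-probs of actions that led to high return.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Once something like this trains, replicating a specific paper is mostly a matter of swapping in its environment, network, and update rule, then matching the reported results.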
5 comments
comment by Steven Byrnes (steve2152) · 2023-01-19T13:58:46.250Z · LW(p) · GW(p)
Not sure if this is what you're looking for, but I have some wish-list big projects listed here [LW · GW].
↑ comment by JakubK (jskatt) · 2023-01-19T18:47:41.560Z · LW(p) · GW(p)
Thanks for sharing that, I just added it to the Google doc.
comment by plex (ete) · 2023-01-19T18:52:08.607Z · LW(p) · GW(p)
Nice! Would you be up for putting this in the aisafety.info Google Drive folder too, with a question-shaped title?
↑ comment by JakubK (jskatt) · 2023-01-19T18:54:37.429Z · LW(p) · GW(p)
Done; the current title is "What are some exercises and projects I can try?"
↑ comment by plex (ete) · 2023-01-19T18:58:07.556Z · LW(p) · GW(p)
Great, thanks!