MATS AI Safety Strategy Curriculum

post by Ryan Kidd (ryankidd44), Ronny Fernandez (ronny-fernandez) · 2024-03-07T19:59:37.434Z · LW · GW · 2 comments

Contents

  Week 1: How will AGI arise?
    Suggested discussion questions
  Week 2: Is the world vulnerable to AI?
    Suggested discussion questions
  Week 3: How hard is AI alignment?
    Suggested discussion questions
  Week 4: How should we prioritize AI safety research?
    Suggested discussion questions
  Week 5: What are AI labs doing?
    Suggested discussion questions
  Week 6: What governance measures reduce AI risk?
    Suggested discussion questions
  Week 7: What do positive futures look like?
    Suggested discussion questions
  Acknowledgements

As part of the MATS Winter 2023-24 Program, scholars were invited to take part in a series of weekly discussion groups on AI safety strategy. Each strategy discussion focused on a specific crux we deemed relevant to prioritizing AI safety interventions and was accompanied by a reading list and suggested discussion questions. The discussion groups were facilitated by several MATS alumni and other AI safety community members and generally ran for 1-1.5 hours.

As assessed by our alumni reviewers [LW · GW], scholars in our Summer 2023 Program were much better at writing concrete plans for their research than they were at explaining their research’s theory of change. We think it is generally important for researchers, even those early in their careers, to critically evaluate the impact of their work, to:

We expect that the majority of improvements in the above areas will occur through repeated practice, ideally with high-quality feedback from a mentor or research peers. However, we also think that engaging with some core literature and discussing it with peers is beneficial. This is our attempt to create a list of core literature for AI safety strategy appropriate for the average MATS scholar, who should have completed the AISF Alignment Course.

We are not confident that the reading lists and discussion questions below are the best possible version of this project, but we thought they were worth publishing anyway. MATS welcomes feedback and suggestions for improvement.

Week 1: How will AGI arise?

What is AGI?

How large will models need to be and when will they be that large?

How far can current architectures scale?

What observations might make us update?

Suggested discussion questions

Week 2: Is the world vulnerable to AI?

Conceptual frameworks for risk: What kinds of technological advancements is the world vulnerable to in general?

Attack vectors: How might AI cause catastrophic harm to civilization?

AI’s unique threat: What properties of AI systems make them more dangerous than malicious human actors?

Suggested discussion questions

Week 3: How hard is AI alignment?

What is alignment?

How likely is deceptive alignment?

What is the distinction between inner and outer alignment? Is this a useful framing?

How many tries do we get, and what's the argument for the worst case?

How much do alignment techniques for SOTA models generalize to AGI? What does that say about how valuable alignment research on present-day SOTA models is?

Suggested discussion questions

Week 4: How should we prioritize AI safety research?

What is an "alignment tax" and how do we reduce it?

What kinds of alignment research, if any, will we be able to delegate to models?

How should we think about prioritizing work within the control paradigm relative to work within the alignment paradigm?

How should we prioritize alignment research in light of the amount of time we have left until transformative AI?

How should you prioritize your research projects in light of the amount of time you have left until transformative AI?

Suggested discussion questions

Week 5: What are AI labs doing?

How are the big labs approaching AI alignment and AI risk in general?

How are small non-profit research orgs approaching AI alignment and AI risk in general?

General summaries:

Suggested discussion questions

Week 6: What governance measures reduce AI risk?

Should we try to slow down or stop frontier AI research through regulation?

What AI governance levers exist?

What catastrophes uniquely occur in multipolar AGI scenarios?

Suggested discussion questions

Week 7: What do positive futures look like?

Note: attending this week's discussion was highly optional.

What near-term positive advancements might occur if AI is well-directed?

What values might we want to actualize with the aid of AI?

What (very speculative) long-term futures seem possible and promising?

Suggested discussion questions

Acknowledgements

Ronny Fernandez was the chief author of the reading lists and discussion questions; Ryan Kidd planned, managed, and edited this project; and Juan Gil coordinated the discussion groups. Many thanks to the MATS alumni and other community members who helped as facilitators!

2 comments


comment by Ronny Fernandez (ronny-fernandez) · 2024-03-26T15:24:21.862Z · LW(p) · GW(p)

I want to note for posterity that I tried to write this reading list somewhat impartially. That is, I have a lot of takes about a lot of this stuff, and I tried to include a lot of material that I disagree with but which I have found helpful in some way or other. I also included things that people I trust have found helpful even if I personally never found them helpful.

comment by Kabir Kumar (kabir-kumar-1) · 2024-03-08T15:41:51.267Z · LW(p) · GW(p)

Week 3: How hard is AI alignment?

https://www.lesswrong.com/posts/3pinFH3jerMzAvmza/on-how-various-plans-miss-the-hard-bits-of-the-alignment#comments [LW · GW]

Seems like something important to be aware of, even if they may disagree.