MATS AI Safety Strategy Curriculum v2

post by DanielFilan, Ryan Kidd (ryankidd44) · 2024-10-07T22:44:06.396Z · LW · GW · 6 comments

Contents

  Week 1: How powerful is intelligence?
    Core readings
    Other readings
    Discussion questions
  Week 2: How and when will transformative AI be made?
    Core readings
    Other readings
    Discussion questions
  Week 3: How could we train AIs whose outputs we can’t evaluate?
    Core readings
    Other readings
    Discussion questions
  Week 4: Will AIs fake alignment?
    Core readings
    Other readings
      On inner and outer alignment
      On reasons to think deceptive alignment is likely
    Discussion questions
  Week 5: How should AI be governed?
    Core readings
    Other readings
    Discussion questions
  Readings that did not fit into any specific week
  Acknowledgements

As part of our Summer 2024 Program, MATS ran a series of discussion groups focused on questions and topics we believe are relevant to prioritizing research into AI safety. Each weekly session focused on one overarching question and was accompanied by readings and suggested discussion questions. The purpose of running these discussions was to increase scholars’ knowledge of the AI safety ecosystem and of models of how AI could cause a catastrophe, and to hone their ability to think critically about threat models, ultimately in service of helping them become excellent researchers.

The readings and questions were largely based on the curriculum from the Winter 2023-24 Program [LW · GW], with two changes:

In addition, the curriculum was supplemented in two ways:

As in the post about the previous cohort’s curriculum [LW · GW], we think that there is likely significant room to improve this curriculum, and welcome feedback in the comments.

Week 1: How powerful is intelligence?

Core readings

Other readings

Discussion questions

Week 2: How and when will transformative AI be made?

Core readings

Other readings

Discussion questions

Week 3: How could we train AIs whose outputs we can’t evaluate?

Core readings

Other readings

Discussion questions

Week 4: Will AIs fake alignment?

Core readings

Scheming AIs: Will AIs fake alignment during training in order to get power?, abstract and introduction (Carlsmith - 45 min)
[In retrospect, this probably took longer than 45 minutes for most people to read]

Other readings

On inner and outer alignment

On reasons to think deceptive alignment is likely

Discussion questions

Week 5: How should AI be governed?

Core readings

Other readings

Discussion questions

Readings that did not fit into any specific week

Acknowledgements

Daniel Filan was the primary author of the curriculum (to the extent that it differed from the Winter 2023-24 curriculum [LW · GW]) and coordinated the discussion groups. Ryan Kidd scoped, managed, and edited the project. Many thanks to the MATS alumni and other community members who helped as facilitators and to the scholars who showed up and had great discussions!

6 comments


comment by Steven Byrnes (steve2152) · 2024-10-08T01:22:18.668Z · LW(p) · GW(p)

Quick thoughts, feel free to ignore:

  • You should be sure to point out that many of the readings are dumb and wrong (i.e., the readings that I disagree with). :-P
  • I was going to suggest Carl Shulman + Dwarkesh podcast as another week 1 option but I forgot that it’s 6 hours!
  • I hope week 2 doesn’t make the common mistake of supposing that the scaling hypothesis is the only possible reason someone might think timelines will be short-ish (see here, here [EA(p) · GW(p)])
  • Week 3 title should maybe say “How could we safely train AIs…”? I think there are other training options if you don’t care about safety.

Good luck!

Replies from: DanielFilan, MaimedUbermensch
comment by DanielFilan · 2024-10-15T21:07:01.872Z · LW(p) · GW(p)

A thing you are maybe missing is that the discussion groups are now in the past.

You should be sure to point out that many of the readings are dumb and wrong

The hope is that the scholars notice this on their own.

Week 3 title should maybe say “How could we safely train AIs…”? I think there are other training options if you don’t care about safety.

Lol nice catch.

comment by Luca (MaimedUbermensch) · 2024-10-08T15:31:26.277Z · LW(p) · GW(p)

Can you expand on which readings you think are dumb and wrong? 

Replies from: steve2152
comment by Steven Byrnes (steve2152) · 2024-10-08T15:43:16.478Z · LW(p) · GW(p)

I was just being silly … they’re trying to present arguments on both sides of various contentious issues, so of course any given reader is going to think that ≈50% of those arguments are wrong.

comment by Akash (akash-wasil) · 2024-10-08T16:09:17.568Z · LW(p) · GW(p)

It's quite hard to summarize AI governance in a few readings. With that in mind, here are some AI governance ideas/concepts/frames that I would add:

  • Emergency Preparedness (Wasil et al; exec summary + policy proposals - 3 mins)

    Governments should invest in strategies that can help them detect and prepare for time-sensitive AI risks. Governments should have ways to detect threats that would require immediate intervention & have preparedness plans for how they can effectively respond to various acute risk scenarios.

  • Safety cases (Irving - 3 mins; see also Clymer et al)

    Labs should present arguments that AI systems are safe within a particular training or deployment context.

(Others that I don't have time to summarize but still want to include:)

Replies from: DanielFilan
comment by DanielFilan · 2024-10-15T21:04:27.485Z · LW(p) · GW(p)

We included a summary of Situational Awareness as an optional reading! I guess I thought the full thing was a bit too long to ask people to read. Thanks for the other recs!