I designed an AI safety course (for a philosophy department)

eleni-angelou

I designed an AI safety course (for a philosophy department)

post by Eleni Angelou (ea-1) · 2023-09-23T22:03:00.036Z · LW · GW · 15 comments

  Background
  Goals
  Course Overview 
    INTRO TO AI 
    Week 1 (8/28-9/1): The foundations of Artificial Intelligence (AI)
    Week 2 (9/5-8): AI, Machine Learning (ML), and Deep Learning (DL)
    Week 3 (9/11-16): What can current AI models do? 
    AI AND THE FUTURE OF HUMANITY 
    Week 4 (9/18-22): What are the stakes? 
    Week 5 (9/25-29): What are the risks? 
    Week 6 (10/2-6): From Intelligence to Superintelligence 
    Week 7 (10/10-13): Human-Machine interaction and cooperation 
    THE BASICS OF AI SAFETY 
    Week 8 (10/16-20): Value learning and goal-directed behavior  
    Week 9 (10/23-27): Instrumental rationality and the orthogonality thesis  
    METAPHYSICAL & EPISTEMOLOGICAL CONSIDERATIONS 
    Week 10 (10/30-11/4): Thinking about the Singularity
    Week 11 (11/6-11): AI and Consciousness 
    ETHICAL QUESTIONS
    Week 12 (11/13-17): What are the moral challenges of high-risk technologies?   
    Week 13 (11/20-22): Do we owe anything to the future? 
    WHAT CAN WE DO NOW 
    Week 14 (11/27-12/1): Technical AI Alignment 
    Week 15 (12/4-8): AI governance and regulation 
None
15 comments

Background

In the fall of 2023, I'm teaching a course called "Philosophy and The Challenge of the Future"^[1] [EA(p) · GW(p)] which is focused on AI risk and safety. I designed the syllabus keeping in mind that my students:

will have no prior exposure to what AI is or how it works
will not necessarily have a strong philosophy background (the course is offered by the Philosophy department, but is open to everyone)
will not necessarily be familiar with Effective Altruism at all

Goals

My approach combines three perspectives: 1) philosophy, 2) AI safety, and 3) Science, Technology, Society (STS); this combination reflects my training in these fields and attempts to create an alternative introduction to AI safety (that doesn't just copy the AISF curriculum). That said, I plan to recommend the AISF course towards the end of the semester; since my students are majoring in all sorts of different things, from CS to psychology, it'd be great if some of them considered AI safety research as their career path.

Course Overview

INTRO TO AI

Week 1 (8/28-9/1): The foundations of Artificial Intelligence (AI)

Required Readings:

Artificial Intelligence, A Modern Approach, pp. 1-27, Russell & Norvig.
Superintelligence, pp. 1-16, Bostrom.

Week 2 (9/5-8): AI, Machine Learning (ML), and Deep Learning (DL)

Required Readings:

You Look Like a Thing and I Love You, Chapters 1, 2, and 3, Shane.
But what is a neural network? (video)
ML Glossary (optional but helpful for terminological references)

Week 3 (9/11-16): What can current AI models do?

Required Readings:

Artificial Intelligence, A Modern Approach, pp. 27-34, Russell & Norvig.
ChatGPT Explained (video)
What is Stable Diffusion? (video)

AI AND THE FUTURE OF HUMANITY

Week 4 (9/18-22): What are the stakes?

Required Readings:

The Precipice, pp. 15-21, Ord.
Existential risk and human extinction: An intellectual history, Moynihan.
Everything might change forever this century (video)

Week 5 (9/25-29): What are the risks?

Required Readings:

Taxonomy of Risks posed by Language Models, Weidinger et al.
Human Compatible, pp. 140-152, Russell.
Loss of Control: “Normal Accidents and AI Systems”, Chan.

Week 6 (10/2-6): From Intelligence to Superintelligence

Required Readings:

A Collection of Definitions of Intelligence, Legg & Hutter.
Artificial Intelligence as a positive and negative factor in global risk, Yudkowsky.
Paths to Superintelligence, Bostrom.

Week 7 (10/10-13): Human-Machine interaction and cooperation

Required Readings:

Cooperative AI: machines must learn to find common ground, Dafoe et. al.
AI-written critiques help humans notice flaws
AI Generates Hypotheses Human Scientists Have Not Thought Of

THE BASICS OF AI SAFETY

Week 8 (10/16-20): Value learning and goal-directed behavior

Required Readings:

Machines Learning Values, Petersen.
The Basic AI Drives, Omuhundro.
The Value Learning Problem, Soares.

Week 9 (10/23-27): Instrumental rationality and the orthogonality thesis

Required Readings:

The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents, Bostrom.
General Purpose Intelligence: Arguing The Orthogonality Thesis, Armstrong.

METAPHYSICAL & EPISTEMOLOGICAL CONSIDERATIONS

Week 10 (10/30-11/4): Thinking about the Singularity

Required Readings:

The Singularity: A Philosophical Analysis, Chalmers.
Can Intelligence Explode?, Hutter.

Week 11 (11/6-11): AI and Consciousness

Required Readings:

Could a Large Language Model be Conscious?, Chalmers.
Will AI Achieve Consciousness? Wrong Question, Dennett.

ETHICAL QUESTIONS

Week 12 (11/13-17): What are the moral challenges of high-risk technologies?

Required Readings:

Human Compatible, “Misuses of AI”, Russell.
The Ethics of Invention, “Risk and Responsibility”, Jasanoff.

Week 13 (11/20-22): Do we owe anything to the future?

Required Readings:

What We Owe The Future, Chapter 1, MacAskill.
The Future of Humanity, Bostrom.
On future people, looking back at 21st century longtermism, Carlsmith.

WHAT CAN WE DO NOW

Week 14 (11/27-12/1): Technical AI Alignment

Required Readings:

Concrete Problems in AI Safety, Amodei et al.
AI Safety needs social scientists, Irving & Askell.

Week 15 (12/4-8): AI governance and regulation

Required Readings:

AI Governance, A research agenda, Dafoe.
AI Strategy, Policy, and Governance (optional but helpful video).

Feedback is welcome! Especially if you have readings in mind that you can imagine your 19-year-old self being excited about.

15 comments

Comments sorted by top scores.

comment by gjm · 2023-09-24T03:08:20.262Z · LW(p) · GW(p)

To what extent, if any, will this course acknowledge that some people disagree very vigorously with what I take to be the positions you're generally advocating for?

(I ask not because I think those people are right and you're wrong -- I think those people are often wrong and sometimes very silly indeed and expect I would favour your position over theirs at least 80% of the time -- but because I think it's important that your students be able to distinguish "this is uncontroversial fact about which basically no one disagrees" from "this is something I am very confident of, but if you talked to some of the other faculty they might think I'm as crazy as I think they are" from "this is my best guess and I am not terribly sure it's right", and the fact that pretty much all the required reading is from an LW-ish EA-ish perspective makes me wonder whether you're making those distinctions clearly. My apologies in advance if I turn out to be being too uncharitable, which I may well be.)

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-09-24T16:39:45.392Z · LW(p) · GW(p)

In addition to acknowledging uncertainty, I think the proper way to address this is to 'teach the controversy.' Have some articles and tweets by Yann LeCun peppered throughout, for example. Also that Nature article: "Stop Worrying About AI Doomsday." Etc.

Replies from: nikolas-kuhn

↑ comment by Amalthea (nikolas-kuhn) · 2023-09-24T23:03:25.557Z · LW(p) · GW(p)

I'm not sure how much space to give the more unreasonable criticisms like the ones you point out. My call would be that the most high quality considerations in all directions should be prioritized over critics being influential or figures of authority - although of course that these voices exist deserves mention, although it might illustrate less the factual dimension than the social one.

Replies from: daniel-kokotajlo

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-09-24T23:59:22.831Z · LW(p) · GW(p)

I agree those criticisms are pretty unreasonable. However I think they are representative of the discourse -- e.g. Yann LeCun is a very important and influential person, and also an AI expert, so he's not cherry-picked.

Also see this recent review from someone who seems thoughtful and respected: Notes on Existential Risk from Artificial Superintelligence (michaelnotebook.com) who says

I will say this: those pieces all make a case for extraordinary risks from AI (albeit in different ways); I am somewhat surprised that I have not been able to find a work of similar intellectual depth arguing that the risks posed by ASI are mostly of "ordinary" types which humanity knows how to deal with. This is often asserted as "obviously" true, and given a brief treatment; unfortunately-often the rebuttal is mere proof by ridicule, or by lack-of-imagination (often people whose main motivation appears to be that people they don't like are worried about ASI xrisk). It's perhaps not so surprising: "the sky is not falling" is not an obvious target for a serious book-length treatment. Still, I hope someone insightful and imaginative will fill the gap. Three brief-but-stimulating shorter treatments are: Anthony Zador and Yann LeCun, Don't Fear the Terminator (2019); Katja Grace, Counterarguments to the basic AI x-risk case (2022); and: David Krueger, A list of good heuristics that the case for AI x-risk fails [LW · GW] (2019).↩︎

i.e. he thinks there just isn't much actually good criticism out there, to the point where he thinks LeCun is one of the top three!!!! (And note that the other two aren't exactly harsh critics, they are kinda AI safety people playing devil's advocate...)

Replies from: nikolas-kuhn, dr_s

↑ comment by Amalthea (nikolas-kuhn) · 2023-09-25T07:20:41.621Z · LW(p) · GW(p)

Completely agreed on the state of the discourse. I think the more interesting discussions start once you acknowledge at least the vague general possibility of serious risk (see e.g. the recent debate posts on the EA forum). I still think these are wrong, but at least worth engaging with.

If I was giving a course, I just wouldn't really know what to do with actively bad opinions beyond "this person says XYZ" and maybe having the students reason about it as an exercise. But if you do this too much it feels like gloating.

↑ comment by dr_s · 2023-09-25T05:47:58.710Z · LW(p) · GW(p)

Honestly I think the strongest criticism will come from someone arguing that there's not enough leverage in our world for superintelligence to be much more powerful than us, for good or bad. People who argue that ASI is absolutely necessary because it will make us immortal and colonise the stars but doesn't warrant any worry about the possibility it may direct its vast power to less desirable goals are just unserious though. Also obviously the possibility that AGI may actually be still far off, but that doesn't say much about whether it's dangerous, just whether the danger is imminent.

comment by Firinn · 2023-09-24T06:37:38.577Z · LW(p) · GW(p)

oh, great, I'm glad someone is doing this! Will you collect some data about how your students respond, and write up what you feel worked well or badly? Are you aware of any existing syllabi that you took inspiration from? It'd be great if people doing this sort of thing could learn from one another!

comment by gjm · 2023-09-24T02:54:45.409Z · LW(p) · GW(p)

This is very much not what I (or I think anyone) would expect to be in a course with the very general-sounding title "Philosophy and the challenge of the future". Is it the case that anyone choosing whether to study this will first look at the syllabus (or maybe some other document that gives a shorter summary of what's going to be in the course) and therefore not be at risk of being misled? If not, you might consider a more informative title, or maybe a subtitle. "Philosophy and the challenge of artificial intelligence". "Philosophy and the challenge of the future: hard thinking about AI". "Opportunities and threats of artificial intelligence: a philosophical perspective". Or something.

Replies from: ea-1

↑ comment by Eleni Angelou (ea-1) · 2023-09-24T03:01:38.065Z · LW(p) · GW(p)

Course titles are fixed so I didn't choose that, but because it's a non-intro course it's up to the instructor to decide the course's focus. And yes, the students had seen the description before selecting it.

Replies from: gjm

↑ comment by gjm · 2023-09-24T03:10:00.816Z · LW(p) · GW(p)

Huh. So is there a course every year titled "Philosophy and the challenge of the future", with radically different content each time depending on the particular interests of whoever's lecturing that year?

Replies from: nikolas-kuhn, daniel-kokotajlo

↑ comment by Amalthea (nikolas-kuhn) · 2023-09-24T06:57:19.953Z · LW(p) · GW(p)

This doesn't appear to be too unusual, almost every department I've been in has such "topics" courses in certain areas. One point is that the lecturer can present their specific knowledge or current developments.

Replies from: Firinn

↑ comment by Firinn · 2023-09-24T07:14:05.929Z · LW(p) · GW(p)

Yep, I think my university called these "special topics" or "selected topics" papers sometimes. As in, a paper called "Special Topics In X" would just be "we got three really good researchers who happen to study different areas of X, we asked them each to spend one-third of the year teaching you about their favourite research, and then we test you on those three areas at the end of the year". Downside is that you don't necessarily get the optimal three topics that you wanted to learn about, upside is you get to learn from great researchers.

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-09-24T16:40:15.832Z · LW(p) · GW(p)

Yep that's how it was in my program at UNC

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-09-24T16:46:37.702Z · LW(p) · GW(p)

Really cool stuff, I'll be interested to hear how it goes! I know some people who taught a similar course at UNC in previous years on the same topics, if you like I can put you in touch and you can compare notes!

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-09-24T16:45:32.814Z · LW(p) · GW(p)

IMO you should lean more in the direction of having less EA content and more technical AI alignment and forecasting and governance content.

E.g. Ajeya's training game report, Ajeya's timelines model + the Davidson-Epoch takeoffspeeds.com extension.

I designed an AI safety course (for a philosophy department)

Contents

Background

Goals

Course Overview

15 comments