Flashcards for AI Safety?

post by Søren Elverlin (soren-elverlin-1) · 2019-05-14T14:37:59.417Z · LW · GW · 1 comment

This is a question post.

I sometimes struggle to remember the contents of all the articles I've read on AI Safety. Spaced repetition might be helpful, but this requires someone to write flashcards. For Anki, I've found 2 decks, titled "Superintelligence" and "AI Policy".

Do more AI Safety relevant decks exist?

What would be a good strategy for generating useful flashcards?

Answers

answer by habryka · 2019-05-14T19:08:39.077Z · LW(p) · GW(p)

This is the only repository of LW-related Anki decks that I know:

https://www.lesswrong.com/posts/Kbp8riSbpKv8jie8o/anki-decks-by-lw-users [LW · GW]

No explicit section for AI safety, though a lot of the sequences decks apply.

However, I personally think copying other people's Anki decks is a bad idea, and it has caused me to stop using Anki completely a few times in the past. Others seem to have had more positive experiences with doing that, though (see the top comments on the thread above for discussion).

answer by chrizbo · 2019-06-04T03:42:58.394Z · LW(p) · GW(p)

Are you looking to memorize them, or to have them on hand while actively working on something? I've found randomized card decks useful when there are a lot of options. Drawing cards at random lets you explore a bit more than you otherwise would, and it doesn't depend on your ability to recall the options or on your being unbiased for or against certain ones.

I talk about it from an ideation POV here:

https://interaction19.ixda.org/program/talk-using-randomness-to-break-down-biases-chris-butler/

1 comment

comment by riceissa · 2019-05-14T20:04:06.447Z · LW(p) · GW(p)

I've made around 250 Anki cards about AI safety. I haven't prioritized sharing my cards because I think finding a specific card useful requires someone to have read the source material generating the card (e.g. if I made the card based on a blog post, one would need to read that exact blog post to get value out of reviewing the card; see learn before you memorize). Since there are many AI safety blog posts and I don't have the sense that lots of Anki users read any particular blog post, it seems to me that the value generated from sharing a set of cards about a blog post isn't high enough to overcome the annoyance cost of polishing, packaging, and uploading the cards.

More generally, from a consumer perspective, I think people tend to be pretty bad at making good Anki cards (I'm often embarrassed by cards I created several months ago!), which makes it unexciting for me to spend a lot of effort trying to collaborate with others on making cards (because I expect to receive poorly-made cards in return for the cards I provide). I think collaborative card-making can be done though, e.g. Michael Nielsen and Andy Matuschak's quantum computing guide comes with pre-made cards that I think are pretty good.

Different people also have different goals/interests so even given a single source material, the specifics one wants to Ankify can be different. For example, someone who wants to understand the technical details of logical induction will want to Ankify the common objects used (market, pricing, trader, valuation feature, etc.), the theorems and proof techniques, and so forth, whereas someone who just wants a high-level overview and the "so what" of logical induction can get away with Ankifying much less detail.

Something I've noticed is that many AI safety posts aren't very good at explaining things (not enough concrete examples, not enough emphasis on common misconceptions and edge cases, not enough effort to answer what I think of as "obvious" questions); this fact is often revealed by the comments people make in response to a post. This makes it hard to make Anki cards because one doesn't really understand the content of the post, at least not well enough to confidently generate Anki cards (one of the benefits of being an Anki user is having a greater sensitivity to when one does not understand something; see "illusion of explanatory depth" and related terms). There are other problems like conflicting usage of terminology (e.g. multiple definitions of "benign", "aligned", "corrigible") and the fact that some of the debates are ongoing/some of the knowledge is still being worked out.

For "What would be a good strategy for generating useful flashcards?": I try to read a post or a series of posts and once I feel that I understand the basic idea, I will usually reread it to add cards about the basic terms and ask myself simple questions. Some example cards for iterated amplification:

what kind of training does the Distill step use?
in the pseudocode, what step gets repeated/iterated?
how do we get A[0]?
write A[1] in terms of H and A[0]
when Paul says IDA is going to be competitive with traditional RL agents in terms of time and resource costs, what exactly does he mean?
advantages of A[0] over H
symbolic expression for the overseer
why should the amplified system (of human + multiple copies of the AI) be expected to perform better than the human alone?
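
For anyone who does want to package cards like these, here is a minimal sketch in Python, assuming you import them into Anki as a tab-separated text file and map the columns yourself in the import dialog. The function name `cards_to_tsv` and the tag `iterated-amplification` are hypothetical, and the answer field is deliberately left blank: the answers should come from your own reading of the source posts.

```python
import csv
import io

# Card fronts copied from the example list above.
# Backs are intentionally empty: fill them in after reading the source material.
QUESTIONS = [
    "what kind of training does the Distill step use?",
    "how do we get A[0]?",
    "write A[1] in terms of H and A[0]",
]


def cards_to_tsv(cards, tag="iterated-amplification"):
    """Serialize (front, back) pairs as tab-separated rows: front, back, tag.

    The tag column is a hypothetical convention for this sketch; which
    column means what is decided when the file is imported.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for front, back in cards:
        writer.writerow([front, back, tag])
    return buffer.getvalue()


cards = [(question, "") for question in QUESTIONS]
tsv_text = cards_to_tsv(cards)
print(tsv_text)
```

Writing `tsv_text` to a `.txt` file gives one card per line. Keeping the cards in a plain-text file like this also makes it easier to revise poorly worded cards later, which matters given how often old cards turn out to need rewriting.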
