Takeaways from a survey on AI alignment resources

post by DanielFilan · 2022-11-05T23:40:01.917Z · LW · GW · 10 comments

Contents

  What am I talking about?
  My summary of the results
  Basic stats
  Context for questions
  Usefulness ratings
    Among all respondents:
    Among people trying to get into alignment:
    Among people who spend time solving alignment problems:
    Among people paid to work on technical AI alignment research:
  Recommendation ratings
    Alignment professionals recommend to peers:
    Alignment professionals recommend to newcomers (= people trying to move into AI alignment career):
    Newcomers recommend to newcomers:
  Details of the survey

What am I talking about?

In June and July of this year, I ran a survey asking people how useful they had found a variety of resources on AI alignment. I was particularly interested in “secondary resources”: that is, not primary research outputs, but resources that summarize, discuss, analyze, or propose concrete research efforts. I had many people promote the survey so that it would not be obvious that I was running it (and therefore would not affect what people said about AXRP, the podcast that I run). CEA helped a great deal with shaping and promoting the survey.

The goal of the survey was initially to figure out how useful AXRP was, but I decided it would be worthwhile to get a broader look at the space of these secondary resources. My hope is that the results give people a better sense of which secondary resources might be worth checking out, as well as of gaps that could be filled.

Participants were shown a list of resources, selected those they’d engaged with for more than 30 minutes, and then rated each selected resource on a scale from 0 to 4 for how useful they’d found it, how likely they’d be to recommend it to a friend getting into the field who hadn’t read widely, and how likely they’d be to recommend it to someone paid to do AI alignment research. You can do a test run of the survey at this link.

My summary of the results

Basic stats

Context for questions

When sorting things by ratings, I’ve included the top 5, plus anything just below the top 5 if that was only a small number of entries. I also included ratings for AXRP, the podcast I make. Ratings are paired with the standard error of the mean (for total ratings, this standard error is multiplied by the number of people in the sample). Only resources that at least 2 people engaged with were included.

Ratings were generally rounded to two significant figures, and standard errors were reported to the same precision.
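
To make the aggregation concrete, here is a minimal sketch in Python of how the reported numbers relate to one another, using made-up ratings for a single resource (this is an illustration, not the actual analysis script in the repository linked at the end of the post):

```python
import statistics

# Made-up ratings (0-4) from respondents who engaged with one resource.
ratings = [4, 3, 3, 2, 4, 3]

n = len(ratings)                            # reach: number of people who rated it
avg = statistics.mean(ratings)              # average usefulness
sem = statistics.stdev(ratings) / n ** 0.5  # standard error of the mean
total = avg * n                             # total usefulness = average rating * reach
total_err = sem * n                         # its error bar: the SEM scaled by the sample size

print(f"average: {avg:.2g} +/- {sem:.2g}")          # rounded to two significant figures
print(f"total:   {total:.2g} +/- {total_err:.2g}")
```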

Usefulness ratings

Among all respondents:

Total usefulness (multiplying average rating by reach):

  1. 80k podcast: 167 +/- 8
  2. Superintelligence: 166 +/- 8
  3. Talks by AI alignment researchers: 134 +/- 6
  4. Rob Miles videos: 131 +/- 7
  5. AI Alignment Newsletter: 117 +/- 7
  6. Conversations with AI alignment researchers at conferences: 107 +/- 5

Everything else 85 or below; AXRP is at 59 +/- 4.

Average usefulness ratings:

  1. AI Safety Camp: 3.4 +/- 0.2
  2. Conversations: 3.1 +/- 0.2
  3. AGI Safety Fundamentals Course (AGISF): 3.0 +/- 0.2
  4. MLAB: 2.8 +/- 0.8
  5. Rob Miles videos: 2.7 +/- 0.1
  6. 80k podcast: 2.6 +/- 0.1
  7. Superintelligence: 2.6 +/- 0.1
  8. AXRP: 2.6 +/- 0.2

Everything else 2.5 or below.

Among people trying to get into alignment:

Total usefulness:

  1. 80k podcast: 95 +/- 6
  2. AI Alignment Newsletter: 76 +/- 6
  3. Talks by AI alignment researchers: 72 +/- 4
  4. AGISF: 68 +/- 3
  5. Rob Miles videos: 67 +/- 5
  6. Superintelligence: 64 +/- 5

Everything else 50 or below; AXRP is at 37 +/- 3.

Average usefulness:

  1. Tie between AI Safety Camp at 3.5 +/- 0.3 and MLAB at 3.5 +/- 0.4
  2. AGISF: 3.2 +/- 0.2
  3. Convos: 3.1 +/- 0.2
  4. ARCHES agenda: 3.0 +/- 0.7
  5. 80k podcast: 2.7 +/- 0.2

Then there’s a tail just under that; AXRP is at 2.6 +/- 0.2.

Among people who spend time solving alignment problems:

Total usefulness:

  1. Superintelligence: 48 +/- 5
  2. Talks: 47 +/- 4
  3. Convos: 45 +/- 4
  4. AI Alignment Newsletter: 42 +/- 5
  5. 80k podcast: 37 +/- 4
  6. Embedded Agency sequence: 36 +/- 5

Everything else 29 or below; AXRP is at 20 +/- 2.

Average usefulness:

  1. Convos: 3.2 +/- 0.3
  2. AI Safety Camp: 3.2 +/- 0.3
  3. Tie between AGISF at 2.7 +/- 0.4 and ML Safety Newsletter at 2.7 +/- 0.3
  4. AI Alignment Newsletter: 2.6 +/- 0.3
  5. Embedded Agency sequence: 2.6 +/- 0.3

Then a smooth drop in average usefulness; AXRP is at 2.2 +/- 0.3.

Among people paid to work on technical AI alignment research:

Total usefulness:

  1. Convos: 28 +/- 3
  2. Talks: 26 +/- 2
  3. Superintelligence: 23 +/- 4
  4. AXRP: 22 +/- 3
  5. Embedded Agency sequence: 20 +/- 3

Everything else 19 or below.

Average usefulness:

  1. AI Safety Camp: 3.7 +/- 0.3
  2. AI Alignment Newsletter: 3.2 +/- 0.4
  3. Convos: 3.1 +/- 0.3
  4. Rob Miles videos: 2.8 +/- 0.5 (honourable mention to AIRCS workshops, which had one rating and scored 3 for usefulness)
  5. AXRP: 2.8 +/- 0.3

Everything else 2.5 or below.

Recommendation ratings

Alignment professionals recommend to peers:

  1. Convos with researchers: 3.7 +/- 0.2
  2. AXRP: 3.3 +/- 0.2
  3. Tie between ML Safety Newsletter at 3.0 +/- 0.4 and AI Alignment Newsletter at 3.0 +/- 0.5
  4. Rob Miles videos: 2.6 +/- 0.5
  5. Embedded Agency sequence: 2.5 +/- 0.5

Everything else 2.4 or lower.

Alignment professionals recommend to newcomers (= people trying to move into an AI alignment career):

  1. AGISF: 3.7 +/- 0.2
  2. Rob Miles: 3.4 +/- 0.3
  3. The Alignment Problem: 3.2 +/- 0.3
  4. 80k podcast: 3.1 +/- 0.3
  5. AI Safety Camp: 3.0 +/- 0.5

Everything else 2.8 or lower (AXRP is at 1.9 +/- 0.4).

Newcomers recommend to newcomers:

  1. MLAB: 4.0 +/- 0.0 (2 ratings)
  2. AGISF: 3.7 +/- 0.1
  3. Rob Miles: 3.4 +/- 0.2
  4. AI Safety Camp: 3.0 +/- 0.9
  5. Human Compatible (the book): 2.8 +/- 0.3 (honourable mention to AIRCS workshops which had one rating, and scored 3)
  6. The Alignment Problem: 2.8 +/- 0.3

Everything else 2.6 or lower (AXRP is at 2.4 +/- 0.3).

One tidbit: judging by the ratings, newcomers seem to agree with the professionals about what newcomers should engage with.

Details of the survey

The survey was run on GuidedTrack. Due to an error on my part, if anybody pressed the ‘back’ button and changed a rating, their results were messed up unrecoverably (hence the drop-off between the total number of entries and the number with data I could use).

The list of resources, the rating scale for usefulness, and the probability rating scale that participants saw are specified in the GuidedTrack code in the repository linked below.

In addition to the details published here, I also collected how many years people had been interested in AI alignment and/or paid to work on technical AI alignment research, as applicable. People could also write in comments about specific resources and about the survey as a whole, and could write in where they heard about the survey.

For more details, see my GitHub repository for this survey. It contains the GuidedTrack code specifying the survey, the results, and a script to analyze the results. Note that I redacted parts of some comments to remove details that might identify a respondent.

10 comments

Comments sorted by top scores.

comment by DanielFilan · 2023-12-18T01:27:17.441Z · LW(p) · GW(p)

I'm glad I did this project and wrote this up. When your goal is to make something that makes the AI alignment community wiser, it's not really obvious how to tell whether you're succeeding, and this was a nice step forward in doing that in a way that "showed my work". That said, it's hard to draw super firm conclusions, because of bias in who takes the survey and some amount of vagueness in the questions. Also, if the survey says that a small number of people used a resource and all found it very useful, it's hard to tell whether people who chose not to use the resource would also have found it useful.

That said, I'm not sure it's had much of an impact. I don't see people talking about it much, and the one link-back is an LTFF reviewer saying they didn't really pay much attention to the survey in deciding whether to fund AXRP. Probably the biggest impact is that the process of figuring out how to make the survey helped me clarify how I would ask people in my day-to-day life whether a resource has been useful (my current favourite is "have you learned something by listening to / reading / partaking in the resource", alongside whether they like it), but this is not conveyed well in the post itself.

comment by brook · 2022-11-06T17:20:20.094Z · LW(p) · GW(p)

Thanks for running this survey! I'm looking to move into AI alignment, and this represents a useful aggregator of recommendations from professionals and from other newcomers; I was already focussing on AGISF, but it's useful to see that many of the resources advertised as 'introductory' on the alignment forum (e.g. the Embedded Agency sequence) are not rated as very useful.

I was also surprised that conversations with researchers ranked quite low as a recommendation to newcomers, but I guess it makes sense that most alignment researchers are not as good at 'interpreting' research as e.g. Rob Miles or Richard Ngo.

Replies from: DanielFilan
comment by DanielFilan · 2022-11-06T17:39:23.548Z · LW(p) · GW(p)

My guess is that it's also because conversations are less optimized (being done on the fly) and maybe harder to access. It's still the case that people getting into alignment found them "very" useful on average, which seems like high praise to me.

comment by Vika · 2022-11-10T16:50:31.332Z · LW(p) · GW(p)

Too bad that my list of AI safety resources didn't make it into the survey - would be good to know to what extent it would be useful to keep maintaining it. Will you be running future iterations of this survey? 

Replies from: DanielFilan
comment by DanielFilan · 2022-11-11T19:42:35.728Z · LW(p) · GW(p)

Would have been good to ask about that and also mine it for resources.

Re: future iterations, I'm not sure. On one hand, I think it's kind of bad for this kind of thing to be run by a person who stands to benefit from his thing ranking high on the survey. On the other hand, I'm not sure if anyone else wants to do it, and I think it would be good to run future iterations.

If anyone does want to take it over, please let me know. I'm not sure how many would be interested in doing that (maybe grantmaking orgs?), but if there are multiple such people it would probably be good to pick a designated successor. I should say that I reserve the right to wait until next year to make any sort of decision on this.

Replies from: DanielFilan
comment by DanielFilan · 2022-11-14T01:21:45.430Z · LW(p) · GW(p)

Similarly, the next iteration of this should definitely ask about The Inside View.

comment by plex (ete) · 2022-11-07T17:06:31.004Z · LW(p) · GW(p)

I'd love to see a spreadsheet with more than the top 5 for each category, to help prioritization of aisafety.info resources.

Replies from: DanielFilan
comment by DanielFilan · 2022-11-08T19:41:29.585Z · LW(p) · GW(p)

I didn't want to name and shame lower-ranked entries, but if you go to the GitHub repository and run the code, you can see the whole ranked list for each category you're interested in - you just have to uncomment the relevant part of the script.

Replies from: ete
comment by plex (ete) · 2022-11-08T22:17:56.131Z · LW(p) · GW(p)

Makes sense. As a not-really-programmer, this is for sure enough of an inconvenience that I won't. Would you be up for listing the top 50% in a spreadsheet?

Edit: Thanks Joern Stoehler for DMing me a readout!

comment by Rohin Shah (rohinmshah) · 2022-11-06T09:34:13.350Z · LW(p) · GW(p)

Thanks for running this!

Overall, I'm not sure what the takeaways are for the Alignment Newsletter.

On the one hand, it scored pretty highly in several categories, particularly "total usefulness". It also probably takes less time than some other things above it, making it pretty cost-effective if you just quickly look at the numbers.

On the other hand, the main theory of impact is that people who work on alignment can use it to keep up to date about alignment work. But deriving reach as total usefulness / average usefulness, we see that <= 6/30 people who are paid to work on technical AI alignment research and 16/51 people who spend time solving alignment problems said that they engaged with the newsletter. Intuitively that feels pretty low and a signal against the main theory of impact.
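
For concreteness, here is a rough reconstruction of that back-of-the-envelope calculation using the figures from the post (the newsletter's total among paid researchers isn't listed in the top 5, so the 19-point cap below is inferred from "Everything else 19 or below"):

```python
# Reach = total usefulness / average usefulness, since total = average * reach.

# Among people who spend time solving alignment problems (51 in this group):
print(round(42 / 2.6))  # AI Alignment Newsletter: 42 total / 2.6 average ~= 16 of 51

# Among people paid to do technical alignment research (30 in this group), the
# newsletter is not in the listed top 5, so its total is at most 19:
print(19 / 3.2)         # at most ~5.9, i.e. <= 6 of 30
```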