Cost-effectiveness of professional field-building programs for AI safety research

post by Dan H (dan-hendrycks) · 2023-07-10T18:28:36.677Z · LW · GW · 5 comments

comment by Natália (Natália Mendonça) · 2023-07-11T20:25:56.607Z · LW(p) · GW(p)

If you don’t mind me asking — what was the motivation behind posting 3 separate posts on the same day with very similar content, rather than a single one? 

It looks like a large chunk (roughly a quarter to a third) of the sentences in this post are identical to those in “Cost-effectiveness of student programs for AI safety research” or differ only slightly (e.g. by replacing the word “students” with “professionals” or “participants”).

Moreover, some paragraphs in both of those posts also appear verbatim in the introductory post, “Modeling the impact of AI safety field-building programs.”

This can generate confusion, as people usually don’t expect blog posts to be this similar.

Replies from: oliver-zhang
comment by ozhang (oliver-zhang) · 2023-07-24T05:59:25.308Z · LW(p) · GW(p)

The main overlap between “Modeling the impact of AI safety field-building programs” and the other two posts is the disclaimers, which we believe belong in all three posts, and the main QARY definition, which seemed important enough to repeat. Beyond that, the intro post is distinct from the two analysis posts.

This post does have much in common with “Cost-effectiveness of student programs for AI safety research.” The two posts are structured very similarly. That said, the corresponding sections apply the same analysis to different sets of programs, so the graphs, numbers, and conclusions may differ.

It's plausible that we could have dramatically shortened the section "The model" in one of the posts. Ultimately, we decided not to, and instead let readers decide whether to skip it. (This has the added benefit of making each post more self-contained.) However, we can see arguments for the opposing view.

comment by DanielFilan · 2023-07-10T21:26:57.238Z · LW(p) · GW(p)

It seems like the cost-effectiveness estimates for the NeurIPS social and workshop rely heavily on estimates of the number of "conversions" each produced, but I couldn't find an explanation in the post of how those estimates were made. Any chance you could walk us through the back-of-the-envelope math there?

Replies from: oliver-zhang
comment by ozhang (oliver-zhang) · 2023-07-23T16:14:03.473Z · LW(p) · GW(p)

Of course!

We ask practitioners who have direct experience with these programs which research avenues they believe participants pursue before and after the program. Research relevance (before/without, during, or after) is then the sum, over research avenues, of each avenue's probability multiplied by CAIS’s judgement of that avenue's relevance (in the sense defined here). You can find the explicit calculations for workshops at lines 28-81 of this script, and for socials at lines 28-38 of this script.
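
The "sum product" described above is an expectation: each research avenue's probability is multiplied by its relevance weight, and the products are summed. Below is a minimal Python sketch, assuming relevance weights are expressed as multiples of adversarial robustness. The avenue names and all numbers are hypothetical placeholders, not values from the linked scripts.

```python
from typing import Mapping


def research_relevance(
    probabilities: Mapping[str, float],
    relevance: Mapping[str, float],
) -> float:
    """Sum over avenues of P(avenue) * relevance weight of that avenue.

    Relevance weights are assumed (for illustration) to be multiples of
    the relevance of adversarial robustness.
    """
    missing = set(probabilities) - set(relevance)
    if missing:
        raise KeyError(f"no relevance weight for: {sorted(missing)}")
    return sum(p * relevance[avenue] for avenue, p in probabilities.items())


# Hypothetical relevance weights (multiples of adversarial robustness).
RELEVANCE = {"avenue_a": 10.0, "avenue_b": 1.0, "avenue_c": 0.1}

# Hypothetical beliefs about which avenues participants pursue.
before = {"avenue_a": 0.1, "avenue_b": 0.3, "avenue_c": 0.6}
after = {"avenue_a": 0.2, "avenue_b": 0.4, "avenue_c": 0.4}

# Change in research relevance attributable to the program.
treatment_effect = research_relevance(after, RELEVANCE) - research_relevance(before, RELEVANCE)
print(round(treatment_effect, 2))  # 1.08
```

The same function gives the before/without, during, and after values; only the probability mapping changes.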

Using workshop contenders’ research relevance without the program (during and after the program period) as an example:

  1. There are ~107 papers submitted with unique sets of authors. (Not exactly: this is the number of non-unique authors across submitted papers divided by the average number of authors per paper, which is how the main practitioner interviewed about workshops found it most natural to think through this problem.)
  2. What might the distribution of research avenues among these papers look like without the program?
    1. Practitioners believe that around 3% cover research avenues that CAIS considers 100x as relevant as adversarial robustness (e.g. power aversion), 30% cover avenues 10x as relevant (e.g. trojans), 30% cover avenues equally relevant to adversarial robustness, and most of the remaining papers would cover avenues 0.1x as relevant. (The next point covers the rest of the remainder.)
    2. Additionally, practitioners believe that in expectation the workshop will produce 0.05 papers with 100x relevance, 0.25 papers with 10x relevance, 1 paper with 2x relevance, 4 papers with 1x relevance, and 1 paper with 0.5x relevance. Of these, the 100x and 10x papers are fully counterfactual, and the remaining papers are 30% likely to be counterfactual.
    3. Working this through, the average research relevance among contenders without the program is 3.41.
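
Two pieces of the arithmetic above can be sketched directly: the effective paper count from step 1, and the counterfactual, relevance-weighted output of the workshop from the second sub-point of step 2. This is a minimal Python sketch, assuming relevance is measured in multiples of adversarial robustness and that "X% likely to be counterfactual" means a paper's relevance is weighted by X. The inputs to `effective_paper_count` are hypothetical, since only the ~107 result is given. The sketch does not try to reproduce the 3.41 figure, whose full derivation is in the linked script.

```python
from typing import Iterable, Tuple


def effective_paper_count(total_author_slots: int, mean_authors_per_paper: float) -> float:
    """Approximate number of papers with unique author sets (step 1).

    total_author_slots counts authors with repetition across all submissions.
    """
    return total_author_slots / mean_authors_per_paper


# Expected workshop-produced papers as (expected count, relevance multiple,
# probability the paper is counterfactual), taken from step 2's second sub-point.
WORKSHOP_OUTPUT = [
    (0.05, 100.0, 1.0),
    (0.25, 10.0, 1.0),
    (1.0, 2.0, 0.3),
    (4.0, 1.0, 0.3),
    (1.0, 0.5, 0.3),
]


def counterfactual_relevance(output: Iterable[Tuple[float, float, float]]) -> float:
    """Relevance-weighted paper count the workshop adds counterfactually."""
    return sum(count * rel * p_cf for count, rel, p_cf in output)


print(effective_paper_count(300, 3.0))  # hypothetical inputs -> 100.0
print(round(counterfactual_relevance(WORKSHOP_OUTPUT), 2))  # 9.45
```

Under these assumptions, the 100x and 10x papers contribute 7.5 of the roughly 9.45 relevance-weighted papers, so the small expected number of high-relevance papers dominates the workshop's counterfactual output.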

Clearly, this is far from a perfect process; hence the strong disclaimer regarding parameter values. In the future, we would want to survey participants before and after the program rather than rely on practitioner intuitions. We are very open to the possibility that better methods would produce parameter values inconsistent with the current ones! Our hope with these posts is to provide a helpful framework for thinking about these programs, not to give confident conclusions.

Finally, it’s worth mentioning that the cost-effectiveness of these programs relative to one another does not rely very heavily on conversions. You can see this by reading off cost-effectiveness from the change in research relevance here. Further, research avenue relevance treatment effects across programs (excluding engineers submitting to the TDC, where we can be more confident) differ by a factor of ~2, whereas differences in e.g. cost per participant are ~20x and differences in average scientist-equivalence are ~7x.
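
The comparison of spreads above can be sketched numerically, assuming (as an illustration, not necessarily the post's exact model) that cost-effectiveness multiplies the treatment effect by scientist-equivalence and divides by cost per participant. Under that assumption, a factor that varies ~k-fold across programs can shift the relative cost-effectiveness of two programs by at most k-fold, and on a log scale the three quoted spreads compare as follows:

```python
import math
from typing import Dict, Mapping

# Approximate spreads across programs, as quoted above.
SPREADS = {
    "research relevance treatment effect": 2.0,
    "average scientist-equivalence": 7.0,
    "cost per participant": 20.0,
}


def log_shares(spreads: Mapping[str, float]) -> Dict[str, float]:
    """Fraction of the combined log-scale spread contributed by each factor.

    Assumes (for illustration) that cost-effectiveness is a product/quotient
    of these factors, so each factor acts multiplicatively on any ratio.
    """
    total = sum(math.log(s) for s in spreads.values())
    return {name: math.log(s) / total for name, s in spreads.items()}


for name, share in log_shares(SPREADS).items():
    print(f"{name}: {share:.0%}")
# research relevance treatment effect: 12%
# average scientist-equivalence: 35%
# cost per participant: 53%
```

On this rough reading, the ~2x spread in treatment effects accounts for about an eighth of the combined log-scale spread.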

Replies from: DanielFilan
comment by DanielFilan · 2023-07-24T19:01:36.023Z · LW(p) · GW(p)

I guess I just don't have a strong sense of where the practitioners' numbers come from, or why they believe what they believe. That's fine if you want to build a pipeline that turns some intuitions into decisions, but it's not obviously very useful for the rest of us (beyond just telling us those intuitions).

Finally, it’s worth mentioning that the cost-effectiveness of these programs relative to one another do not rely very heavily on conversions.

The thing you link shows that if you change the conversion ratio of both programs by the same amount, the relative cost-effectiveness doesn't change, which makes sense. But if workshops produced 100x more conversions than socials, or vice versa, presumably that would make a difference. If you say that the treatment effects only differ by a factor of 2, then fair enough, but that's not obvious a priori (and the fact that you claim that (a) you can measure the TDC better and (b) the TDC has a different treatment effect makes me skeptical).
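
The distinction above can be made concrete with a toy model, assuming (hypothetically) that cost-effectiveness is conversions × value per conversion ÷ cost. The program numbers are placeholders, not estimates from the post; the point is only that a common scaling cancels in a ratio while a differential scaling does not.

```python
import math


def cost_effectiveness(conversions: float, value_per_conversion: float, cost: float) -> float:
    """Hypothetical toy model: value produced per unit of cost."""
    return conversions * value_per_conversion / cost


def relative(a: dict, b: dict) -> float:
    """Cost-effectiveness of program a relative to program b."""
    return cost_effectiveness(**a) / cost_effectiveness(**b)


def scale_conversions(program: dict, factor: float) -> dict:
    """Copy of a program with its conversion count multiplied by factor."""
    return {**program, "conversions": program["conversions"] * factor}


# Placeholder programs; the numbers are hypothetical.
workshop = {"conversions": 4.0, "value_per_conversion": 1.0, "cost": 100.0}
social = {"conversions": 2.0, "value_per_conversion": 1.0, "cost": 20.0}

base = relative(workshop, social)

# Same factor applied to both programs: the ratio does not change.
common = relative(scale_conversions(workshop, 10), scale_conversions(social, 10))
print(math.isclose(base, common))  # True

# Factor applied to one program only: the ratio moves by that factor.
differential = relative(scale_conversions(workshop, 100), social)
print(round(differential / base))  # 100
```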

(For the record, I couldn't really make heads or tails of the spreadsheet you linked or of what the calculations in the script were supposed to be, but I didn't try very hard to understand them; perhaps I'd write something different if I really understood them.)