Anonymous advice: If you want to reduce AI risk, should you take roles that advance AI capabilities?

post by Benjamin Hilton (80000hours) · 2022-10-11T14:16:02.550Z · LW · GW · 9 comments

comment by trevor (TrevorWiesinger) · 2022-10-11T20:43:55.116Z · LW(p) · GW(p)

For anyone who seriously needs to choose (or might choose at some point), it's worth spending at least 2 hours reading this post and other resources. Probably more like a week talking to people if you're already in the area.

Just assuming "I'ma boycott it, because it's bad" is a spectacularly naive oversimplification of the full ecosystem. There are a ludicrously large number of potential side effects, many of which you and all your friends have almost zero chance of noticing on your own without reaching out to a ton of different experts.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2022-10-12T19:29:11.464Z · LW(p) · GW(p)

I find this post quite helpful in clarifying my thoughts on this matter. Thank you.

[Edit: comment retracted. The organization I referred to addressed my primary objection, and I thus wish to withdraw my complaint.]

Replies from: qemqemqem
comment by Andrew Keenan Richardson (qemqemqem) · 2022-10-12T20:12:33.969Z · LW(p) · GW(p)

I'm the Astera researcher that Nathan spoke to. This is a pretty bad misrepresentation of my views, based on a 5-minute conversation that Nathan had with me about this subject (at the end of a technical interview).

A few responses:

  • We do publish open-source code at https://github.com/Astera-org, but we are considering moving to closed source at some point in the future due to safety concerns
  • It is untrue that we are "not interested in securing [our] code or models against malicious actors", but it is true that we are not currently working on the interventions suggested by Nathan
  • My personal view is that AI alignment needs to be tailored to the model, an approach that I am working on articulating further and hope to write up on this forum
  • Steve Byrnes works at the Astera Institute on alignment issues
comment by Ben Pace (Benito) · 2022-10-11T19:43:24.313Z · LW(p) · GW(p)

Pretty interesting post.

comment by Yonatan Cale (yonatan-cale-1) · 2022-10-11T17:51:56.734Z · LW(p) · GW(p)

If you have something to ask or say, I recommend discussing it in the original comment [LW(p) · GW(p)] rather than splitting the comments between places.

comment by Yonatan Cale (yonatan-cale-1) · 2022-10-11T15:58:30.202Z · LW(p) · GW(p)

Poll: Agree/Disagree:

Working for a company that advances AI capabilities is a good idea for advancing safety, because you can speak up if you disagree with something, and this outweighs the downside of helping them advance capabilities

comment by Yonatan Cale (yonatan-cale-1) · 2022-10-11T15:57:13.699Z · LW(p) · GW(p)

Poll: Agree/Disagree:

Working for companies that advance AI capabilities is generally a good idea for people worried about AI risk

comment by RobertM (T3t) · 2022-10-12T06:06:01.536Z · LW(p) · GW(p)

Something I find interesting is the relationship between believing that the marginal researcher's impact, if they take a capabilities role, is likely to be negligible, and holding a position somewhere on the spectrum other than "this is obviously a terrible idea".

On one hand, that seems obvious, maybe even logically necessary.  On the other hand, I think that the impact of the marginal researcher on capabilities research has a much higher mean than median (impact is heavy-tailed), and this may be even more true for those listening to this advice.  I also think the arguments for working on capabilities seem quite weak:

  • "up-skilling"
    • My first objection is that it's not clear why anybody needs to up-skill in a capabilities role before switching to work on alignment.  Most alignment organizations don't have bureaucratic requirements like "[x] years of experience in a similar role", and being an independent researcher obviously has no requirements whatsoever.  The actual skills that might make one more successful at either option... well, that leads to my second objection.
    • My second objection is that "capabilities" is a poorly-defined term.  If one wants to up-skill in ML engineering by e.g. working at an organization which only uses existing techniques to build consumer features, I expect this to have approximately no first-order[1] risk of advancing the capabilities frontier.  However, this kind of role by definition doesn't help you up-skill in areas like "conduct research on unsolved (or worse, unspecified) problems".  To the extent that a role does exercise that kind of skill, that role becomes correspondingly riskier.  tl;dr: you can up-skill in "python + ML libraries" pretty safely, as long as the systems you're working on don't themselves target inputs to AI (e.g. making cheaper chips, better algorithms, etc.), but not in "conduct novel research".
  • "influence within capabilities organization"
    • I think the median outcome of an early-career alignment researcher joining a capabilities org and attempting to exert influence to steer the organization in a more alignment-friendly direction is net-negative (though I'm pretty uncertain).  I suspect that for this to be a good idea, it needs to be the primary focus of the person going into the organization, and that person needs to have a strong model of what exactly they're trying to accomplish and how they're going to accomplish it, given the structure and political landscape of the organization that they'll be joining.  If you don't have experience successfully doing this in at least one prior organization, it's difficult to imagine a justified inside-view expectation of success.
  • "connections"
    • See "influence" - what is the plan here?  Admittedly connections can at least preserve some optionality when you leave, but I don't think I've actually seen anyone argue the case for how valuable they expect connections to be, and what their model is for deriving that.

 

In general, I think the balance of considerations quite strongly favors not working on capabilities (in the narrower sense, rather than the "any ML application that isn't explicitly alignment" sense).  The experts themselves seem to be largely split between "obviously bad" and "unclear, balance of trade-offs", and the second camp's views seem to mostly be conditional on beliefs like:

  • "I don’t think it’s obvious that capabilities work is net negative"
  • "I don’t think on the margin AI risk motivated individuals working in these spaces would boost capabilities much"
  • other confusions or disagreements around the category of "capabilities work"
  • what I think are very optimistic beliefs about the ability of junior researchers to exert influence over large organizations

I recognize that "We think this is a hard question!" is not necessarily a summary of the surveyed experts' opinions, but I would be curious to hear the "positive" case for taking a capabilities role implied by it, assuming there's ground not covered by the opinions above.

  1. ^

    And I think the second-order effects, like whatever marginal impact your decision has on the market for ML engineers, are pretty trivial in this case.

Replies from: gw
comment by gw · 2022-10-12T08:10:32.301Z · LW(p) · GW(p)

Re: "up-skilling": I think this is underestimating the value of developing maturity in an area before trying to do novel research. These are two separate skills, and developing both simultaneously from scratch doesn't seem like the fastest path to proficiency to me. Difficulties often multiply.

There is a long-standing certification for "proving you've learned to do novel research": the PhD. A prospective student would find it difficult to enter a grad program without any relevant coursework, and it's not because those institutions think such a student has the same chance of success as one who does.