Lab governance reading list

post by Zach Stein-Perlman · 2024-10-25T18:00:28.346Z · LW · GW · 3 comments

What labs should do

OpenAI[3]

Resources

Suggestions are welcome. You can put suggestions that don't deserve their own LW comment in this doc.

  1. ^

     There are two main lines of defense you could employ to prevent schemers from causing catastrophes.

    • Alignment: Ensure that your models aren't scheming.
    • Control: Ensure that even if your models are scheming, you'll be safe, because they are not capable of subverting your safety measures.

    Source: The case for ensuring that powerful AIs are controlled [LW · GW] (Redwood: Greenblatt and Shlegeris 2024).

  2. ^

    [Maybe a lot of early AI risk—risk from AIs that are just powerful enough to be extremely useful—]comes from the lab using AIs internally to do AI development (by which I mean both research and engineering). This is because the AIs doing AI development naturally require access to compute and model weights that they can potentially leverage into causing catastrophic outcomes—in particular, those resources can be abused to run AIs unmonitored.

    Using AIs for AI development looks uniquely risky to me among applications of early-transformative AIs, because unlike all other applications I know about:

    • It’s very expensive to refrain from using AIs for this application.
    • There’s no simple way to remove affordances from the AI such that it’s very hard for the AI to take a small sequence of actions which plausibly lead quickly to loss of control. In contrast, most other applications of AI probably can be controlled just by restricting their affordances.

    Source: Shlegeris 2024 [LW(p) · GW(p)].

  3. ^

     I wrote this post because I'm helping BlueDot create/run a lab governance session. One constraint they impose is focusing on OpenAI, so I made an OpenAI section. Other than that, this doc is just my recommendations.

3 comments

comment by Akash (akash-wasil) · 2024-10-25T23:40:12.512Z · LW(p) · GW(p)

Perhaps this isn’t in scope, but if I were designing a reading list on “lab governance”, I would try to include at least 1-2 perspectives that highlight the limitations of lab governance, criticisms of focusing too much on lab governance, etc.

Specific examples might include criticisms of RSPs, Kelsey’s coverage of the OpenAI NDA stuff, alleged instances of labs or lab CEOs misleading the public/policymakers, and perspectives from folks like Tegmark and Leahy (who generally see a lot of lab governance as safety-washing and probably have less trust in lab CEOs than the median AIS person).

(Perhaps such perspectives get covered in other units, but part of me still feels like it’s pretty important for a lab governance reading list to include some of these more “fundamental” critiques of lab governance. Especially insofar as, broadly speaking, I think a lot of AIS folks were more optimistic about lab governance 1-3 years ago than they are now.)

Replies from: Erich_Grunewald
comment by Erich_Grunewald · 2024-10-26T06:14:02.957Z · LW(p) · GW(p)

Specific examples might include criticisms of RSPs, Kelsey’s coverage of the OpenAI NDA stuff, alleged instances of labs or lab CEOs misleading the public/policymakers, and perspectives from folks like Tegmark and Leahy (who generally see a lot of lab governance as safety-washing and probably have less trust in lab CEOs than the median AIS person).

Isn't much of that criticism itself a form of lab governance? I've always understood the field of "lab governance" as something like "analysing and suggesting improvements for practices, policies, and organisational structures in AI labs". By that definition, many critiques of RSPs would count as lab governance, as could the coverage of OpenAI's NDAs. But arguments of the sort "labs aren't responsive to outside analyses/suggestions, dooming such analyses/suggestions" would indeed be criticisms of lab governance as a field or activity.

(ETA: Actually, I suppose there's no reason why a piece of X research cannot critique X (the field it's a part of). So my whole comment may be superfluous. But eh, maybe it's worth pointing out that the stuff you propose adding can also be seen as a natural part of the field.)

Replies from: akash-wasil
comment by Akash (akash-wasil) · 2024-10-26T19:56:07.260Z · LW(p) · GW(p)

Yeah, I think there's a useful distinction between two different kinds of "critiques":

  • Critique #1: I have reviewed the preparedness framework and I think the threshold for "high-risk" in the model autonomy category is too high. Here's an alternative threshold.
  • Critique #2: The entire RSP/PF effort is not going to work because [they're too vague//labs don't want to make them more specific//they're being used for safety-washing//labs will break or weaken the RSPs//race dynamics will force labs to break RSPs//labs cannot be trusted to make or follow RSPs that are sufficiently strong/specific/verifiable]. 

I feel like critique #1 falls more neatly into "this counts as lab governance" whereas IMO critique #2 falls more into "this is a critique of lab governance." In practice the lines blur. For example, I think last year there was a lot more "critique #1" style stuff, and then over time as the list of specific object-level critiques grew, we started to see more support for things in the "critique #2" bucket.