Slowing AI: Interventions

post by Zach Stein-Perlman · 2023-04-18T14:30:35.746Z · LW · GW · 0 comments

Contents

  Affordances
  Plans
  Interventions[1]

Disclaimer: this post is underdeveloped and doesn't have the answers. Hopefully a future version will be valuable, but I'm mostly posting this to help facilitate brainstorms and dialogues in my personal conversations.

Affordances

What important actions related to slowing AI could actors take; what levers do they have?

I haven't thought much about this yet. There are some lists and analyses that aren't focused on slowing AI. Copying from "Actors' levers" in "Slowing AI: Reading list" [LW · GW]:

Rough short lists of actors' levers relevant to slowing AI:

US government levers (see also my List of lists of government AI policy ideas):

Lab levers:

How can you affect those decisions? It depends on who you are.

In addition to reasoning from levers like those listed above, it might be useful to try to generate affordances by starting at goals. Try asking "what goals might actor X be able to achieve" or "how can actor X achieve goal Y" in addition to "how can actor X leverage its ability Z."

Some ways of slowing AI are independent of relevant actors' affordances, but many should focus on causing actors to use their abilities well or on giving actors new abilities.

Plans

Possible interventions or their properties are listed in the following section. But it may be useful to start not at the level of interventions but at a plan (or playbook, or maybe theory of victory) and then find interventions flowing from that plan. For example, the plan "develop model evaluations and have leading labs agree to them" currently seems promising; we can ask

(Or not just slowing-related interventions. This paragraph applies to strategy in general, not just slowing AI. And most AI plans aren't focused on slowing AI; that's fine.)

A big class of AI plans seems to have components (1) do technical AI safety research and (2) slow the deployment of systems that would cause existential catastrophe (especially by slowing their development). Particular existing plans may incorporate observations on slowing-related frames, considerations, variables, affordances, or interventions. Perhaps we can improve and/or steal the slowing-related component of an existing plan.

Interventions[1]

This is a list of possible interventions, classes of interventions, and characteristics and consequences of interventions related to slowing AI. It's poorly organized and I don't analytically endorse it. I also don't necessarily endorse listed interventions. (One major ambiguity here: interventions by whom? Different actors have different options.)

Perhaps some interventions or affordances will become possible in the future; some in particular may become possible in the endgame.[8] I'm tentatively excited about last-minute coordination to slow down for safety, enabled by something like scary demos, relevant actors' attitudes, craziness in the world, and strategic clarity.

Some important actions to slow AI progress don't fit into the interventions list, like avoid speeding up AI progress, help others slow AI progress, discover new interventions or affordances for relevant actors, and cause [people/organizations/communities/memes] that will slow AI to gain [influence/resources/power].

Some interventions that seem both unpromising and norm-violating are omitted.

(Bay Area) AI safety people often seem to assume that government interventions require persuading politicians, but in fact technocrats could largely suffice.

  1. ^

    Related lists:

    - Thomas Larsen et al.'s Ways to buy time [LW · GW] (2022)
    - Katja Grace's "Restraint is not terrorism, usually" in "Let’s think about slowing down AI" [LW · GW] (2022).

    My list kind of incorporates those lists but has a sufficiently different (broader) approach that skimming Ways to buy time [LW · GW] and reading Restraint is not terrorism, usually [LW · GW] is worthwhile too.

  2. ^

    Thomas Larsen et al. have lots of specific possibilities: Ways to buy time [LW · GW]. See also Vael Gates et al.'s AI Risk Discussions [LW · GW] (2023) and Michael Aird's "How bad/good would shorter AI timelines be?" (unpublished).

  3. ^

    Thomas Larsen et al. have lots of specific possibilities: Ways to buy time [LW · GW]. A side benefit of helping labs determine what is risky is that they will do better safety research.

  4. ^

    It's not obvious why it might be that a lab would slow down if and only if that would cause others to slow down. This relates to my desire for a great account of "racing."

  5. ^

    See e.g. Paul Christiano's Honest organizations (2018). Why is it good for labs to be able to make themselves partially transparent? Transparency allows actors to coordinate without needing to trust each other. In some scenarios, labs would behave more safely if they knew that others were behaving in certain ways, and moreover they could make deals with others to behave more safely.

  6. ^

    In addition to AI killing everyone, security-flavored risks include hacking, chemical and biological engineering, and advantaging adversaries. One frame on publication practices is that American labs publishing helps China (it helps everyone but China is #2 in AI) and "they" are taking value from "us." Perhaps this frame converges with the AI killing everyone risk in suggesting that American AI capabilities research should be siloed or illegal to share or reviewed by the state ('born secret') or something.

  7. ^

    Katja Grace says [LW · GW]:

    E.g. a journal verifies research results and releases the fact of their publication without any details, maintains records of research priority for later release, and distributes funding for participation. (This is how Szilárd and co. arranged the mitigation of 1940s nuclear research helping Germany, except I’m not sure if the compensatory funding idea was used.)

    And (if I recall correctly) Daniel Kokotajlo, Siméon Campos, and Akash Wasil also have thoughts on publication mechanisms.

  8. ^

    "Endgame" in my usage roughly means when there is sufficient clarity and simplicity that

    - Possibilities are few/simple/clear enough to be considered pretty exhaustively and
    - It's useful to optimize directly for terminal goals and do direct search, rather than use heuristics and intermediate goals.

    Note that there may not be an endgame. See also AI endgame [LW(p) · GW(p)].
