Slowing AI: Interventions
post by Zach Stein-Perlman · 2023-04-18
Disclaimer: this post is underdeveloped and doesn't have the answers. Hopefully a future version will be valuable, but I'm mostly posting this to help facilitate brainstorms and dialogues in my personal conversations.
Affordances
What important actions related to slowing AI could actors take; what levers do they have?
I haven't thought much about this yet. There are some lists and analyses that aren't focused on slowing AI. Copying from "Actors' levers" in "Slowing AI: Reading list":
- "Levers of governance" in "Literature Review of Transformative AI Governance" (Maas draft)
- AI Policy Levers: A Review of the U.S. Government’s Tools to Shape AI Research, Development, and Deployment (Fischer et al. 2021)
- "Affordances" in "Framing AI strategy" (Stein-Perlman 2023)
- Various private collections of AI policy ideas, notably including "AI Policy Ideas Database" (private work in progress)
- Current UK government levers on AI development (Hadshar 2023)
- Existential and global catastrophic risk policy ideas database (filter for "Artificial intelligence") (Sepasspour et al. 2022)
Rough short lists of actors' levers relevant to slowing AI:
US government levers (see also my List of lists of government AI policy ideas):
- Ban or moratorium
- Standards & regulation
- Evals
- Export controls
- Tracking compute
- Publication regulation
- "Born secret"
- Expropriation for slowing
- Government control
- Expropriation
- Manhattan/Apollo-style project
- Liability: affect companies' liability for misuse of their AI
- Migration of talent
- Visa vetting
- Expanding immigration
- Antitrust (or its absence)
Lab levers:
- Pausing/slowing (on risky systems)
- Coordination
- Publication & diffusion of ideas
- Industry self-regulation & professional norms
How can you affect those decisions? It depends on who you are.
In addition to reasoning from levers like those listed above, it might be useful to generate affordances by starting from goals. Try asking "what goals might actor X be able to achieve?" or "how can actor X achieve goal Y?" in addition to "how can actor X leverage its ability Z?"
Some ways of slowing AI are independent of relevant actors' affordances, but many should focus on causing actors to use their abilities well or on giving actors new abilities.
Plans
Possible interventions and their properties are listed in the following section. But it may be useful not to think at the level of interventions, and instead to start from a plan (or playbook, or maybe theory of victory) and then find interventions flowing from that plan. For example, the plan "develop model evaluations and have leading labs agree to them" currently seems promising; we can ask:
- How can the evals plan help slow AI?
- How can we support/facilitate the evals plan? What slowing-related desiderata help enable the evals plan (and what interventions would promote those desiderata)?
- What slowing-related interventions does the evals plan enable or synergize with? In worlds where people work on the evals plan, what new opportunities for slowing AI appear?
(Or not just slowing-related interventions. This paragraph applies to strategy in general, not just slowing AI. And most AI plans aren't focused on slowing AI; that's fine.)
A big class of AI plans seems to have components (1) do technical AI safety research and (2) slow the deployment of systems that would cause existential catastrophe (especially by slowing their development). Particular existing plans may incorporate observations on slowing-related frames, considerations, variables, affordances, or interventions. Perhaps we can improve and/or steal the slowing-related component of an existing plan.
Interventions[1]
This is a list of possible interventions, classes of interventions, and characteristics and consequences of interventions related to slowing AI. It's poorly organized and I don't analytically endorse it. I also don't necessarily endorse listed interventions. (One major ambiguity here: interventions by whom? Different actors have different options.)
- Help labs slow down for safety (now or later)
- Cause labs to want to slow down for safety
- Advocacy to labs and the ML research community[2]
- Make safety more prestigious
- Help labs determine what is risky[3]
- Safety research
- Making safety research legible to labs
- Model evaluations
- Help labs coordinate to slow down[4]
- Help labs develop and agree to safety standards
- Help labs make themselves partially transparent[5]
- Auditing
- Whistleblowing
- Make labs' (and researchers') attitudes on risk and safety common knowledge
- Make labs' (and researchers') shared values common knowledge
- Model evaluations
- Katja Grace says: "Formulate specific precautions for AI researchers and labs to take in different well-defined future situations, Asilomar Conference style. These could include more intense vetting by particular parties or methods, modifying experiments, or pausing lines of inquiry entirely. Organize labs to coordinate on these."
- Cause labs to prefer projects and systems that are less risky and less likely to lead to risk
- [Not sure how to do that, other than regulation, discussed below]
- Help researchers or lab employees develop affordances for collective action (Katja Grace says: "Help organize the researchers who think their work is potentially omnicidal into coordinated action on not doing it")
- Decrease labs' access to inputs to AI progress
- Decrease labs' access to compute (for large training runs)
- Track compute and regulate access to compute for large training runs
- Intervene on the supply chain or regulate production
- Increase the short-term price of compute by increasing short-term demand
- Buy compute
- (Maybe) export controls and regulating trade
- Decrease labs' money
- Decrease investment in AI labs
- Make AI products less profitable (through policy)
- Decrease labs' access to data
- Cause companies to not train AI on private data
- Through policy
- Motivated by protecting privacy
- Motivated by protecting intellectual property
- Through companies wanting to respect privacy
- Cause companies to not train AI on synthetic data
- Cause companies to not train AI on certain parts of the internet
- Decrease labs' access to capability-increasing external research, or decrease diffusion of ideas, algorithms, and models (related desiderata: decrease labs' access to risk-increasing external research)
- Cause capability-increasing ideas to not be published or otherwise propagated
- Decrease labs' access to research talent
- Cause researchers to less prefer to do work that increases risk (e.g., perhaps large language models, reinforcement learning agents, and compute-intensive models vs self-driving cars and image generation) (somewhat related: prestige races)
- Cause researchers to less prefer to work at leading labs
- Tell your friends not to advance risky AI capabilities
- Avoid actions that cause people to pursue careers in which they advance risky AI capabilities (including some AI safety research training programs)
- Improve migration of AI talent
- Increase emigration from China
- Decrease labs' access to AI research tools or ability to automate research
- Make it harder to deploy AI research tools
- Other
- Policy hinders developing, deploying, and profiting from AI
- Policy differentially hinders developing, deploying, and profiting from AI that is relatively risky or likely to lead to risky systems (e.g., large language models) (so labs substitute from more-risky to less-risky projects)
- Regulation & standards
- Regulating development
- Regulating deployment
- Model evaluations
- Katja Grace says: "Try to get the message to the world that AI is heading toward being seriously endangering. If AI progress is broadly condemned, this will trickle into myriad decisions: job choices, lab policies, national laws. To do this, for instance produce compelling demos of risk, agitate for stigmatization of risky actions, write science fiction illustrating the problems broadly and evocatively (I think this has actually been helpful repeatedly in the past), go on TV, write opinion pieces, help organize and empower the people who are already concerned, etc."
Some interventions or affordances may become possible only in the future, perhaps especially in the endgame.[8] In particular, I'm tentatively excited about last-minute coordination to slow down for safety, enabled by something like scary demos, relevant actors' attitudes, craziness in the world, and strategic clarity.
Some important actions to slow AI progress don't fit into the interventions list, like avoiding speeding up AI progress, helping others slow AI progress, discovering new interventions or affordances for relevant actors, and causing [people/organizations/communities/memes] that will slow AI to gain [influence/resources/power].
Some interventions that seem both unpromising and norm-violating are omitted.
(Bay Area) AI safety people often seem to assume that government interventions require persuading politicians, but in fact technocrats could largely suffice.
[1] Related lists:
- Thomas Larsen et al.'s Ways to buy time (2022)
- Katja Grace's "Restraint is not terrorism, usually" in "Let’s think about slowing down AI" (2022). My list kind of incorporates those lists but has a sufficiently different (broader) approach that skimming "Ways to buy time" and reading "Restraint is not terrorism, usually" is worthwhile too.
[2] Thomas Larsen et al. have lots of specific possibilities: Ways to buy time. See also Vael Gates et al.'s AI Risk Discussions (2023) and Michael Aird's "How bad/good would shorter AI timelines be?" (unpublished).
[3] Thomas Larsen et al. have lots of specific possibilities: Ways to buy time. A side benefit of helping labs determine what is risky is that they will do better safety research.
[4] It's not obvious why a lab would slow down if and only if doing so would cause others to slow down. This relates to my desire for a great account of "racing."
[5] See e.g. Paul Christiano's Honest organizations (2018). Why is it good for labs to be able to make themselves partially transparent? Transparency allows actors to coordinate without needing to trust each other. In some scenarios, labs would behave more safely if they knew that others were behaving in certain ways, and moreover they could make deals with others to behave more safely.
[6] In addition to AI killing everyone, security-flavored risks include hacking, chemical and biological engineering, and advantaging adversaries. One frame on publication practices is that American labs publishing helps China (it helps everyone, but China is #2 in AI) and "they" are taking value from "us." Perhaps this frame converges with the AI-killing-everyone risk in suggesting that American AI capabilities research should be siloed, illegal to share, reviewed by the state ("born secret"), or something similar.
[7] E.g. a journal verifies research results and releases the fact of their publication without any details, maintains records of research priority for later release, and distributes funding for participation. (This is how Szilárd and co. arranged the mitigation of 1940s nuclear research helping Germany, except I’m not sure whether the compensatory funding idea was used.)
And (if I recall correctly) Daniel Kokotajlo, Siméon Campos, and Akash Wasil also have thoughts on publication mechanisms.
[8] "Endgame" in my usage roughly means when there is sufficient clarity and simplicity that
- Possibilities are few/simple/clear enough to be considered pretty exhaustively and
- It's useful to optimize directly for terminal goals and do direct search, rather than use heuristics and intermediate goals.
Note that there may not be an endgame. See also AI endgame.