Framing AI strategy


post by Zach Stein-Perlman · 2023-02-07T19:20:04.535Z · LW · GW · 1 comments

This is a link post for https://aiimpacts.org/framing-ai-strategy/

  Make a plan
  Affordances
  Intermediate goals
  Threat modeling
  Theories of victory
  Tactics and policy development
  Memes & frames
  Exploration, world-modeling, and forecasting
  Nearcasting
  Leverage

Strategy is the activity or project of doing research to inform interventions to achieve a particular goal.¹ AI strategy is strategy from the perspective that AI is important, focused on interventions to make AI go better. An analytic frame is a conceptual orientation that makes salient some aspects of an issue, including cues for what needs to be understood, how to approach the issue, what your goals and responsibilities are, what roles to see yourself as having, what to pay attention to, and what to ignore.

This post discusses ten strategy frames, focusing on AI strategy. Some frames are comprehensive approaches to strategy; some are components of strategy or prompts for thinking about an aspect of strategy. This post focuses on meta-level exploration of frames, but the second and last sections have some object-level thoughts within a frame.

Sections are overlapping but independent; focus on sections that aren’t already in your toolbox of approaches to strategy.

Epistemic status: exploratory, brainstormy.

Make a plan

See Jade Leung’s Priorities in AGI governance research (2022) and How can we see the impact of AI strategy research? (2019).

One output of strategy is a plan describing relevant (kinds of) actors’ behavior. More generally, we can aim for a playbook: something like a function from (sets of observations about) world-states to plans. A plan is good insofar as it improves important decisions in the counterfactual where you try to implement it, in expectation.
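To make the "function from observations to plans" idea concrete, here is a minimal sketch. Every name in it (`Plan`, `Rule`, `make_playbook`, the `"warning_shot"` observation, and the placeholder plan steps) is a hypothetical chosen for illustration, not something from the sources cited here.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence, Tuple

# A set of observations about the world-state (hypothetical representation).
Observations = FrozenSet[str]


@dataclass(frozen=True)
class Plan:
    """A plan, reduced here to an ordered tuple of step descriptions."""
    steps: Tuple[str, ...]


@dataclass(frozen=True)
class Rule:
    """Pairs a test on observations with the plan to use when the test passes."""
    applies: Callable[[Observations], bool]
    plan: Plan


def make_playbook(rules: Sequence[Rule], default: Plan) -> Callable[[Observations], Plan]:
    """Build a playbook: a function from observations to a plan.

    Rules are checked in order; the first rule that applies wins,
    and the default plan is returned when no rule applies.
    """
    def playbook(observations: Observations) -> Plan:
        for rule in rules:
            if rule.applies(observations):
                return rule.plan
        return default

    return playbook


# Placeholder plans; the step text is illustrative only.
PLAN_A = Plan(("hypothetical step for worlds with a warning shot",))
DEFAULT = Plan(("hypothetical default step",))

playbook = make_playbook(
    rules=[Rule(applies=lambda obs: "warning_shot" in obs, plan=PLAN_A)],
    default=DEFAULT,
)

print(playbook(frozenset({"warning_shot"})).steps)
print(playbook(frozenset()).steps)
```

The point of the sketch is only that a playbook conditions on what you observe, whereas a single plan is the special case where every set of observations maps to the same output.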

To make a plan or playbook, identify (kinds of) actors that might be affectable, then figure out

what they could do,
what it would be good for them to do,
what their incentives are (if relevant), and then
how to cause them to act better.

It is also possible to focus on decisions rather than actors: determine what decisions you want to affect (presumably because they’re important and affecting them seems tractable) and how you can affect them.

For AI, relevant actors include AI labs, states (particularly America), non-researching non-governmental organizations (particularly standard-setters), compute providers, and the AI risk and EA communities.²

Insofar as an agent (not necessarily an actor that can take directly important actions) has distinctive abilities and is likely to try to execute good ideas you have, it can be helpful to focus on what the agent can do or how to leverage the agent’s distinctive abilities rather than backchain from what would be good.³

Affordances

As in the previous section, a natural way to improve the future is to identify relevant actors, determine what it would be good for them to do, and cause them to do those things. “Affordances” in strategy are “possible partial future actions that could be communicated to relevant actors, such that they would take similar actions.”⁴ The motivation for searching for and improving affordances is that there probably exist actions that would be great and relevant actors would be happy to take, but that they wouldn’t devise or recognize by default. Finding great affordances is aided by a deep understanding of how an actor thinks and its incentives, as well as a deep external understanding of the actor, to focus on its blind spots and identify feasible actions.⁵ Separately, the actor’s participation would sometimes be vital.

Affordances are relevant not just to cohesive actors but also to non-structured groups. For example, for AI strategy, discovering affordances for ML researchers (as individuals or for collective action) could be valuable. Perhaps there also exist great possible affordances that don’t depend much on the actor: generally helpful actions that people just aren’t aware of.

For AI, two relevant kinds of actors are states (particularly America) and AI labs. One way to discover affordances is to brainstorm the kinds of actions particular actors can take, then find creative new plans within that list. Going less meta, I made lists of the kinds of actions states and labs can take that may be strategically significant, since such lists seem worthwhile and I haven’t seen anything like them.

Kinds of things states can do that may be strategically relevant (or consequences or characteristics of possible actions):

Regulate (and enforce regulation in their jurisdiction and investigate possible violations)
Expropriate property and nationalize companies (in their territory)
Perform or fund research (notably including through Manhattan/Apollo-style projects)
Acquire capabilities (notably including military and cyber capabilities)
Support particular people, companies, or states
Disrupt or attack particular people, companies, or states (outside their territory)
Affect what other actors believe on the object level
- Share information
- Make information salient in a way that predictably affects beliefs
- Express attitudes that others will follow
Negotiate with other actors, or affect other actors’ incentives or meta-level beliefs
Make agreements with other actors (notably including contracts and treaties)
Establish standards, norms, or principles
Make unilateral declarations (as an international legal commitment) [less important]

Kinds of things AI labs⁶ can do—or choose not to do—that may be strategically relevant (or consequences or characteristics of possible actions):

Deploy an AI system
Pursue capabilities
- Pursue risky (and more or less alignable) systems
- Pursue systems that enable risky (and more or less alignable) systems
- Pursue weak AI that’s mostly orthogonal to progress in risky stuff for a specific (strategically significant) task or goal
  - This could enable or abate catastrophic risks besides unaligned AI
Do alignment (and related) research (or: decrease the alignment tax [EA · GW] by doing technical research)
- Including interpretability and work on solving or avoiding alignment-adjacent problems like decision theory and strategic interaction [LW · GW] and maybe delegation involving multiple humans or multiple AI systems
Advance global capabilities
- Publish capabilities research
- Cause investment or spending in big AI projects to increase
Advance alignment (or: decrease the alignment tax) in ways other than doing technical research
- Support and coordinate with external alignment researchers
Attempt to align a particular system (or: try to pay the alignment tax)
Interact with other labs⁷
- Coordinate with other labs (notably including coordinating to avoid risky systems)
  - Make themselves transparent to each other
  - Make themselves transparent to an external auditor
  - Merge
  - Effectively commit to share upsides
  - Effectively commit to stop and assist
- Affect what other labs believe on the object level (about AI capabilities or risk in general, or regarding particular memes)
  - Practice selective information sharing [LW(p) · GW(p)]
  - Demonstrate AI risk (or provide evidence about it)
- Negotiate with other labs, or affect other labs’ incentives or meta-level beliefs
Affect public opinion, media, and politics
- Publish research
- Make demos or public statements
- Release or deploy AI systems
Improve their culture or operational adequacy [? · GW]
- Improve operational security
- Affect attitudes of effective leadership
- Affect attitudes of researchers
- Make a plan for alignment (e.g., OpenAI’s); share it; update and improve it; and coordinate with capabilities researchers, alignment researchers, or other labs if relevant
- Make a plan for what to do with powerful AI (e.g., CEV or some specification of long reflection [? · GW]), share it, update and improve it, and coordinate with other actors if relevant
- Improve their ability to make themselves (selectively) transparent
Try to better understand the future, the strategic landscape, risks, and possible actions
Acquire resources
- E.g., money, hardware, talent, influence over states, status/prestige/trust
- Capture scarce resources
  - E.g., language data from language model users
Affect other actors’ resources
- Affect the flow of talent between labs or between projects
Plan, execute, or participate in pivotal acts or processes [LW · GW]

(These lists also exist on the AI Impacts wiki, where they may be improved in the future: Affordances for states and Affordances for AI labs. These lists are written from an alignment-focused and misuse-aware perspective, but prosaic risks may be important too.)

Maybe making or reading lists like these can help you notice good tactics. But innovative affordances are necessarily not things that are already part of an actor’s behavior.

Maybe making lists of relevant things similar actors have done in the past would illustrate possible actions, build intuition, or aid communication.

This frame seems like a potentially useful complement to the standard approach: backchain [LW · GW] from goals to actions of relevant actors. And it seems good to understand actions that should be items on lists like these (both understanding these list-items well and expanding or reframing these lists) so you can notice opportunities.

Intermediate goals

No great sources are public, but illustrating this frame, see “Catalysts for success” and “Scenario variables” in Marius Hobbhahn et al.’s What success looks like [EA · GW] (2022). On goals for AI labs, see Holden Karnofsky’s Nearcast-based “deployment problem” analysis [LW · GW] (2022).

An intermediate/instrumental goal is a goal that is valuable because it promotes one or more final/terminal goals. (“Goal” sounds discrete and binary, like “there exists a treaty to prevent risky AI development,” but often should be continuous, like “gain resources and influence.”) Intermediate goals are useful because we often need more specific and actionable goals than “make the future go better” or “make AI go better.”

Knowing what specifically would be good for people to do is a bottleneck on people doing useful things. If the AI strategy community had better strategic clarity, in terms of knowledge about the future and particularly intermediate goals, it could better utilize people’s labor, influence, and resources. Perhaps an overlapping strategy framing is finding or unlocking effective opportunities to spend money. See Luke Muehlhauser’s A personal take on longtermist AI governance [EA · GW] (2021).⁸

It is also sometimes useful to consider goals about particular actors.

Threat modeling

Illustrating threat modeling for the technical component of AI misalignment, see the DeepMind safety team’s Threat Model Literature Review [LW · GW] and Clarifying AI X-risk [LW · GW] (2022), Sam Clarke and Sammy Martin’s Distinguishing AI takeover scenarios [LW · GW] (2021), and GovAI’s Survey on AI existential risk scenarios [EA · GW] (2021).

The goal of threat modeling [? · GW] is deeply understanding one or more risks for the purpose of informing interventions. A great causal model of a threat (or class of possible failures) can let you identify points of intervention and determine what countering the threat would require.

A related project involves assessing all threats (in a certain class) rather than a particular one, to help account for and prioritize between different threats.

Technical AI safety research informs AI strategy through threat modeling. A causal model of (part of) AI risk can generate a model of AI risk abstracted for strategy, with relevant features made salient and irrelevant details black-boxed. This abstracted model gives us information including necessary and sufficient conditions or intermediate goals for averting the relevant threats. These in turn can inform affordances, tactics, policies, plans, influence-seeking, and more.

Theories of victory

I am not aware of great sources, but illustrating this frame, see Marius Hobbhahn et al.’s What success looks like [EA · GW] (2022).

Considering theories of victory is another natural frame for strategy: consider scenarios where the future goes well, then find interventions to nudge our world toward those worlds. (Insofar as it’s not clear what the future going well means, this approach also involves clarifying that.) To find interventions to make our world like a victorious scenario, I sometimes try to find necessary and sufficient conditions for the victory-making aspect of that scenario, then consider how to cause those conditions to hold.⁹

Great threat-model analysis can be an excellent input to theory-of-victory analysis, to clarify the threats and what their solutions must look like. And it could be useful to consider scenarios in which the future goes well and scenarios where it doesn’t, then examine the differences between those worlds.

Tactics and policy development

Collecting progress on possible government policies, see GovAI’s AI Policy Levers (2021) and GCRI’s Policy ideas database.

Given a model of the world and high-level goals, we must figure out how to achieve those goals in the messy real world. For a goal, what would cause success, which of those possibilities are tractable, and how could they become more likely to occur? For a goal, what are necessary and sufficient conditions for achievement and how could those occur in the real world?

Memes & frames

I am not aware of great sources on memes & frames in strategy, but see Jade Leung’s How can we see the impact of AI strategy research? (2019). See also the academic literature on framing, e.g. Robert Entman’s Framing (1993).

(“Frames” in this context refers to the lenses through which people interpret the world, not the analytic, research-y frames discussed in this post.)

If certain actors held certain attitudes, they would make better decisions. One way to affect attitudes is to spread memes [? · GW]. A meme could be explicit agreement with a specific proposition; the attitude that certain organizations, projects, or goals are (seen as) shameful; the attitude that certain ideas are sensible and respectable or not; or merely a tendency to pay more attention to something. The goal of meme research is finding good memes—memes that would improve decisions if widely accepted (or accepted by a particular set of actors¹⁰) and are tractable to spread—and figuring out how to spread them. Meme research is complemented by work actually causing those memes to spread.

For example, potential good memes in AI safety include things like “AI is powerful but not robust, and in particular [specification gaming or Goodhart or distributional shift or adversarial attack] is a big deal.” Perhaps “misalignment as catastrophic accidents” is easier to understand than “misalignment as power-seeking agents,” or vice versa. And perhaps misuse risk is easy to understand and unlikely to be catastrophically misunderstood, but less valuable-if-spread.

A frame tells people what to notice and how to make sense of an aspect of the world. Frames can be internalized by a person or contained in a text. Frames for AI might include frames related to consciousness, Silicon Valley, AI racism, national security, or specific kinds of applications such as chatbots or weapons.

Higher-level research could also be valuable. This would involve topics like how to communicate ideas about AI safety or even how to communicate ideas and how groups form beliefs.

This approach to strategy could also involve researching how to stifle harmful memes, like perhaps “powerful actors are incentivized to race for highly capable AI” or “we need a Manhattan Project for AI.”

Exploration, world-modeling, and forecasting

Sometimes strategy greatly depends on particular questions about the world and the future.

More generally, you can reasonably expect that increasing clarity about important-seeming aspects of the world and the future will inform strategy and interventions, even without thinking about specific goals, actors, or interventions. For AI strategy, exploration includes central questions about the future of AI and relevant actors, understanding the effects of possible actions, and perhaps also topics like decision theory, acausal trade, digital minds, and anthropics.

Constructing a map is part of many different approaches to strategy. This roughly involves understanding the landscape and discovering analytically useful concepts, like reframing “victory means causing AI systems to be aligned” to “it’s necessary and sufficient to cause the alignment tax to be paid, so it’s necessary and sufficient to reduce the alignment tax and increase the amount-of-tax-that-would-be-paid such that the latter is greater.”
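The alignment-tax reframing can be written as a single inequality. The symbols are my own shorthand, not notation from any source: let T be the alignment tax (the extra cost of making a system aligned) and W be the amount of tax that relevant actors would be willing to pay.

```latex
\[
\text{victory}
\;\iff\; \text{the alignment tax is paid}
\;\iff\; W > T .
\]
% Two levers follow directly from the inequality:
%   (1) reduce T  (e.g., technical research that lowers the alignment tax)
%   (2) increase W (raise the amount of tax that would be paid)
```

On this reading, any intervention can be roughly sorted by which side of the inequality it moves.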

One exploratory, world-model-y goal is a high-level understanding of the strategic landscape. One possible approach to this goal is creating a map of relevant possible events, phenomena, actions, propositions, uncertainties, variables, and/or analytic nodes.

Nearcasting

Discussing nearcasting, see Holden Karnofsky’s AI strategy nearcasting [LW · GW] (2022). Illustrating nearcasting, see Karnofsky’s Nearcast-based “deployment problem” analysis [LW · GW] (2022).

Holden Karnofsky defines “AI strategy nearcasting” as

trying to answer key strategic questions about transformative AI, under the assumption that key events (e.g., the development of transformative AI) will happen in a world that is otherwise relatively similar to today’s. One (but not the only) version of this assumption would be “Transformative AI will be developed soon, using methods like what AI labs focus on today.”

When I think about AI strategy nearcasting, I ask:

What would a near future where powerful AI could be developed look like?
In this possible world, what goals should we have?
In this possible world, what important actions could relevant actors take?
- And what facts about the world make those actions possible? (For example, some actions would require that a lab has certain AI capabilities, or most people believe a certain thing about AI capabilities, or all major labs believe in AI risk.)
In this possible world, what interventions are available?
Relative to this possible world, how should we expect the real world to be different?¹¹
- And how do those differences affect the goals we should have, and the interventions that are available to us?

Nearcasting seems to be a useful tool for

predicting relevant events concretely and
forcing you to notice how you think the world will be different in the future and how that matters.

Leverage

I’m not aware of other public writeups on leverage. See also Daniel Kokotajlo’s What considerations influence whether I have more influence over short or long timelines? [LW · GW] (2020). Related concept: crunch time [LW · GW].

When doing strategy and planning interventions, what should you focus on?

A major subquestion is: how should you prioritize focus between possible worlds?¹² Ideally you would prioritize the worlds where your work has the highest expected value, or something like the worlds that have the greatest product of probability and how much better they would go if you worked on them. But how can you guess which worlds are high-leverage for you to work on? There are various reasons to prioritize certain possible worlds, both for reasoning about strategy and for evaluating possible interventions. For example, it seems higher-leverage to work on making AI go well conditional on human-level AI appearing in 2050 than in 3000: the former is more foreseeable, more affectable, and more neglected.
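As a rough sketch of the "product of probability and how much better they would go if you worked on them" heuristic, the snippet below ranks worlds by that product. The `World` class, the function names, and especially the numbers are made up for illustration; they are not estimates from this post or any source.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class World:
    name: str
    probability: float               # your credence that this world obtains
    improvement_if_worked_on: float  # how much better it goes with your work (arbitrary units)


def leverage(world: World) -> float:
    """Crude leverage score: probability times marginal improvement from your work."""
    return world.probability * world.improvement_if_worked_on


def rank_by_leverage(worlds: List[World]) -> List[World]:
    """Return the worlds sorted from highest to lowest leverage score."""
    return sorted(worlds, key=leverage, reverse=True)


# Illustrative, made-up numbers only.
worlds = [
    World("human-level AI appears in 3000", probability=0.05, improvement_if_worked_on=0.1),
    World("human-level AI appears in 2050", probability=0.30, improvement_if_worked_on=1.0),
]

for w in rank_by_leverage(worlds):
    print(f"{w.name}: leverage {leverage(w):.3f}")
```

Foreseeability, affectability, and neglectedness all enter through the second factor here, which is why a world can deserve more attention than its probability alone suggests.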

We currently lack a good account of leverage, so (going less meta) I’ll begin one for AI strategy here. Given a baseline of weighting possible worlds by their probability, all else equal, you should generally:

Upweight worlds that you have more control over and that you can better plan for
- Upweight worlds with short-ish timelines [? · GW] (since others will exert more influence over AI in long-timelines worlds, and since we have more clarity about the nearer future, and since we can revise strategies in long-timelines worlds)
- Take into account future strategy research
  - For example, if you focus on the world in 2030 (or assume that human-level AI is developed in 2030) you can be deferring, not neglecting, some work on 2040
  - For example, if you focus on worlds in which important events happen without much advance warning or clearsightedness, you can be deferring, not neglecting, some work on worlds in which important events happen foreseeably
- Focus on what you can better plan for and influence; for AI, perhaps this means:
  - Short timelines
  - The deep learning paradigm continues
  - Powerful AI is resource-intensive
  - Maybe some propositions about risk awareness, warning shots, and world-craziness
- Upweight worlds where the probability of victory is relatively close to 50%¹³
- Upweight more neglected worlds (think on the margin)
Upweight short-timelines worlds insofar as there is more non-AI existential risk in long-timelines worlds
Upweight analysis that better generalizes to or improves other worlds
Notice the possibility that you live in a simulation (if that is decision-relevant; unfortunately, the practical implications of living in a simulation are currently unclear)
Upweight worlds that you have better personal fit for analyzing
- Upweight worlds where you have more influence, if relevant
Consider side effects of doing strategy, including what you gain knowledge about, testing fit, and gaining credible signals of fit [EA · GW]

In practice, I tentatively think the biggest (analytically useful) considerations for weighting worlds beyond probability are generally:

Short timelines
1. More foreseeable¹⁴
2. More affectable
3. More neglected (by the AI strategy community)
  1. Future people can work on the further future
    1. The AI strategy field is likely to be bigger in the future
4. Less planning or influence exerted from outside the AI strategy community
Fast takeoff [? · GW]¹⁵
1. Shorter, less foreseeable a certain time in advance, and less salient to the world in advance
  1. More neglected by the AI strategy community; the community would have a longer clear-sighted period to work on slow takeoff
  2. Less planning or influence exerted from outside the AI strategy community

(But there are presumably diminishing returns to focusing on particular worlds, at least at the community level, so the community should diversify the worlds it analyzes.) And I’m most confused about

Upweighting worlds where probability of victory is closer to 50% (I’m confused about what the probability of victory is in various possible worlds),
How leverage relates to variables like total influence exerted to affect AI (the rest of the world exerting influence means that you have less relative influence insofar as you’re pulling the rope along similar axes, but some interventions are amplified by something like greater attention on AI) (and related variables like attention on AI and general craziness due to AI), and
The probability and implications of living in a simulation.

A background assumption or approximation in this section is that you allocate research toward a world and the research is effective just if that world obtains. This assumption is somewhat crude: the impact of most research isn’t so binary, being fully effective in some possible futures and totally ineffective in the rest.¹⁶ And thinking in terms of influence over a world is crude: influence depends on the person and on the intervention. Nevertheless, reasoning about leverage in terms of worlds to allocate research toward might sometimes be useful for prioritization. And we might discover a better account of leverage.

Leverage considerations should include not just prioritizing between possible worlds but also prioritizing within a world. For example, it seems high-leverage to focus on important actors’ blind spots and on certain important decisions or “crunchy” periods. And for AI strategy, it might be high-leverage to focus on the first few deployments of powerful AI systems.

Strategy work is complemented by

actually executing interventions, especially causing actors to make better decisions,
gaining resources to better execute interventions and improve strategy, and
field-building to better execute interventions and improve strategy.

An individual’s strategy work is complemented by informing the relevant community of their findings (e.g., for AI strategy, the AI strategy community).

In this post, I don’t try to make an ontology of AI strategy frames, or do comparative analysis of frames, or argue about the AI strategy community’s prioritization between frames.¹⁷ But these all seem like reasonable things for someone to do.

Related sources are linked above as relevant; see also Sam Clarke’s The longtermist AI governance landscape [EA · GW] (2022), Allan Dafoe’s AI Governance: Opportunity and Theory of Impact [EA · GW] (2020), and Matthijs Maas’s Strategic Perspectives on Long-term AI Governance [? · GW] (2022).

If I wrote a post on “Framing AI governance,” it would substantially overlap with this list, and it would substantially draw on The longtermist AI governance landscape [EA · GW]. See also Allan Dafoe’s AI Governance: A Research Agenda (2018) and hanadulset and Caroline Jeanmaire’s A Map to Navigate AI Governance [EA · GW] (2022). I don’t know whether an analogous “Framing technical AI safety” would make sense; if so, I would be excited about such a post.

Many thanks to Alex Gray. Thanks also to Linch Zhang for discussion of leverage and to Katja Grace, Eli Lifland, Rick Korzekwa, and Jeffrey Heninger for comments on a draft.


comment by Zach Stein-Perlman · 2023-02-15T00:00:10.152Z · LW(p) · GW(p)

I expect to update this comment with additional sources—and perhaps new analytic frames—as I become aware of them and they become public. Last updated 23 May 2023.

Affordances:

[draft] Matthijs Maas's "Levers of governance" in Transformative AI Governance: A Literature Review
Eugene Bardach's Things Governments Do (affordances for states from a non-AI perspective) (thanks to Matthijs Maas for this suggestion)
Observation: you get different taxonomies if you start at goals (like "slow China") vs levers (like "immigration policy"). And your uncertainties are like "how can actor X achieve goal Y" vs "how can actor X leverage its ability Z." Maybe you think of different affordances; try both ways.
Alex Gray mentions as a motivating/illustrative example (my paraphrasing): windfall clauses (or equity-sharing or other benefit-sharing mechanisms) are unlikely to be created by labs but it's relatively easy for labs to take an existing windfall-clause affordance

Intermediate goals:

Rethink Priorities's Survey on intermediate goals in AI governance [EA · GW] (2023)
[draft] Matthijs Maas's "Parameters of Transformative AI Governance"

Theories of victory:

Matthijs Maas's Strategic Perspectives on Transformative AI Governance [EA · GW] (2022) and related work in progress

Memes & frames:

Maybe Holden Karnofsky's Spreading messages to help with the most important century (2023)

Leverage:

[draft] Alex Lintz's "A simple model for how to prioritize different timelines"
Vaniver's Weight by Impact (2023)

If I were rewriting this post today, I would probably discuss something like object-level frames or strategic perspectives. They make aspects of a situation more salient; whether or not they’re true, and whether or not they’re the kind-of-thing that can be true, they can be useful. See Matthijs Maas's Strategic Perspectives on Transformative AI Governance [EA · GW] for illustration.
