The Governance Problem and the "Pretty Good" X-Risk

post by Zach Stein-Perlman · 2021-08-29T18:00:28.190Z · 2 comments


  I. Introduction
  II. Good Uses of Superintelligence
  III. Short-Term Issues
  IV. Unipolar Failure Modes
  V. Multipolar Failure Modes
  VI. Conclusion

No longer endorsed.

Like a child suddenly given awesome strength, I could have pushed too hard, and left the world a broken toy I could never repair. —The Hero of Ages

I hope we create superintelligence, and I hope it does what we want. But this would not suffice to make the future great. For a great future from superintelligence, a third condition is necessary: what we want is great. This post is about what the controllers of a superintelligence will want and how we can improve that.

Epistemic status: confident about the problem, not confident about specific failure modes or solutions.


I. Introduction

If it were available to them, I think current human elites (or almost any other group of humans) would overwhelmingly choose this bargain over the status quo:

Earthly Utopia. Human civilization will last on Earth for ten billion years. During that time, almost every person's life will be better than the best life ever lived before now. Earth will be a place of great learning and discovery, personal excellence and achievement, expansive freedom, extraordinary beauty, wonderful experiences, deep relationships, harmony with nature, meaningful projects, and profound joy. However, we may never use resources outside our solar system.

It may be that more intelligent, thoughtful, and wise versions of ourselves would take such a bargain. But more likely, I think, our enlightened counterparts would regard it as the worst mistake and the greatest catastrophe ever. If an astronomical number of wonderful planets is astronomically better than one, we have a problem:

Our decisionmakers and decisionmaking institutions would take a terrible bargain because they neglect nontraditional sources of value (viz., optimizing the rest of the universe) that happen to be overwhelmingly important (in expectation). This is not surprising: these same processes egregiously fail to appreciate the value or risk of AI. So if we solve alignment and create superintelligence, it will not just take care of the rest. It's not enough to have nice people control the superintelligence. To adapt from The Rocket Alignment Problem:

Superintelligence may be developed by ill-intentioned people. But that's not the failure mode I'm worried about right now. I'm more worried that right now, even nice and generous potential-controllers-of-superintelligence would want it to do non-optimal things. Whether Google or the United States or North Korea is the one to develop superintelligence won't make a major difference to the probability of an existential win from my perspective, because right now potential controllers of superintelligence don't want what's optimal.

Aligned superintelligence is good to the extent that the operator wants good things. Longtermists/EAs/rationalists worry a lot about alignment, but little about aligning the eventual operator's instructions with what is best. I aim to investigate what could make us succeed or fail and explain why we should care a lot about this problem, which I call the governance problem since I expect governance issues to be vital.[1] If we survive and create powerful aligned AI, I believe it is almost certain that the future will be wonderful by provincial human standards. But most such scenarios are still existential catastrophes.


II. Good Uses of Superintelligence

Some uses of superintelligence would be near-optimal.[2] Call such uses "great," and call a possible future "great" if it involves great use of superintelligence. Under reasonable normative and empirical assumptions, whether a future is great depends overwhelmingly on how we use most of the matter available to us in the universe in the long run, not on the short-term future or the future of Earth.[3] I make such assumptions here, although some of my conclusions hold without them.

A final use of superintelligence would be great if and only if we near-perfectly:

Fortunately, we need not make a final decision—one that closes off most possible futures—immediately after creating superintelligence. The immediate problem of what to do with superintelligence is not how to optimize the universe but how to decide how to optimize the universe. A use of superintelligence is great if we:

Some prima facie great systems for choosing how to use superintelligence are:

But I won't speculate here on the details of great systems. Instead, I'll consider what affects the system that we will end up with.


III. Short-Term Issues

We will presumably be able to choose what precisely to do with aligned superintelligence after we create it (rather than the use being determined before or as we create the superintelligence). But what happens before superintelligence affects what we ultimately choose. Our short-term actions affect:

For example, a poorly-designed "AI constitution," whether hard (truly binding) or soft (with political and psychological power), would be bad. A well-designed one would be good.

I believe it is very likely that the controller of superintelligence will be a state, a group of states, or a new international organization. In particular, I expect that states will appreciate AI before we create superintelligence and will nationalize or oversee promising projects.

The organizations and ideas with power just before superintelligence will determine how we use it. I expect that at the beginning of a fast-takeoff intelligence explosion, how we use superintelligence will be predictable; it will appear very likely or very unlikely that we will use it well. So while the failures I will discuss manifest after superintelligence, and while the period during and directly after the intelligence explosion may be important, I think whether we succeed on governance will be mostly determined before the intelligence explosion.[6]


IV. Unipolar Failure Modes

A world is "unipolar" if it is dominated by a single agent (such as an organization, human, or AI), called a singleton. A unipolar world order could arise if multiple powerful organizations (presumably states) unite. More plausibly, an organization could form a singleton if it becomes sufficiently powerful relative to others, such as by creating powerful aligned AI as the result of an intelligence explosion.

Suppose a single aligned superintelligence is much more powerful than the rest of the world, and its controller is a singleton, making decisions for the whole world. What does the controller do with this power? It depends on the controller's values and decisionmaking structure — we should expect different choices depending on the nature of the controller (individual, state, coalition of states, international organization), its decisionmaking structure, and popular ideas about AI and what to do with it. Brainstorming potential failure modes, examples or subcases indented (in no particular order):

But attempting to separate these possibilities may be analytically counterproductive by concealing the large-scale reason for concern. Few people want what's optimal, so political forces just don't push in that direction. Most people, interest groups, and policymakers will just have prosaic goals. Political institutions like ours would likely struggle to achieve anything meaningful: it is much easier to imagine decisive political support for something prosaic than for a plan for using our cosmic endowment. A radical (by our standards) plan for how to use our cosmic endowment—which is presumably necessary for a great future[7]—is not politically feasible. Instead, we may end up with a "pretty good" future: one excellent along prosaic dimensions, like Earthly Utopia, but not great. Extremely democratic uses of AI are prima facie similarly problematic—most people don't want what's best—and also have coordination issues.

I think superintelligence will likely be governed by something like our current political institutions and these institutions may fare poorly. We should not trust the long-term future to decisionmaking systems based in current humans' preferences; we prima facie should delegate this choice to another system. Any candidate system must be both politically feasible and likely to choose well. That is, I think we need something like long reflection to use our cosmic endowment well.[8] If ideas like long reflection sound radical or just unreasonable immediately before the intelligence explosion, our future is in trouble.

"In the very near future, we are going to lift something to Heaven."[9] And even if it's aligned with what we want, it might not be aligned with what's good. So what can we do? Some desiderata:


V. Multipolar Failure Modes

Despite the parallel titles, this section does not complement section IV. That section was about the risk that an agent able to do whatever it wants fails to choose well. This section is about additional ways we could fail if nobody has such power after the intelligence explosion.

I am generally more pessimistic about multipolar scenarios (although it really depends on the specifics): roughly, multipolar scenarios include the risk of unipolar failure for each powerful agent. If an omnipotent organization can fail, two semi-omnipotent organizations can each fail in the same way. But multipolar scenarios have their own special failure modes as well:


VI. Conclusion

The governance problem is the practical problem of getting the controller of superintelligence to use it near-optimally.

Here are some propositions I believe:

If we create aligned superintelligence, how we use it will involve political institutions and processes. Superintelligence will probably be controlled by a state or a group of states. This is more likely the more AI becomes popularly appreciated and the more legibly powerful AI is created before the intelligence explosion.

Aligned superintelligence enables directing the arbitrarily distant future. Consider what an intelligent but not omniscient observer would predict about the long-term future of Earth and Earth-originating systems. Throughout human history, events have almost always had negligible effects on the observer's credences. In the last century, some events had non-negligible effects on the credences through their effects on extinction risk. But because of the possibility of superintelligence, it may soon become possible to lock in narrow classes of possible futures. This could happen intentionally (a singleton could optimize for any preferences) or unintentionally (if we create unaligned powerful AI or fail to coordinate to use aligned powerful AI well).

Accidental governance failure is possible. We could create aligned superintelligence but still end up with an outcome that nobody wants.

"Pretty good" governance failure is possible. We could end up with an outcome that many or most influential people want, but that wiser versions of ourselves would strongly disapprove of. This scenario is plausibly the default outcome of aligned superintelligence: great uses of power are a tiny subset of the possible uses of power, the people and institutions that currently want great outcomes hold a tiny share of total influence, and unless we work to change this, those who want non-great outcomes will not be persuaded and those who want great outcomes will not gain much influence.

The governance problem depends (in largely predictable ways) on various factors that we can affect before transformative AI (TAI). These include:

To improve our chances of achieving successful governance, we should think about what affects how superintelligence is used and how we can affect those factors, then do it.


Thanks to Daniel Kokotajlo for suggestions.

  1. I am not aware of an existing name for the important problem of getting a superintelligence that does what its operator wants to do what is best. This problem roughly requires wisdom and caution to avoid locking in object-level values prematurely, and coordination among people with influence over how superintelligence is used.

    Nick Bostrom defined the "political problem," complementing the control problem, as "how to achieve a situation in which individuals or institutions empowered by such AI use it in ways that promote the common good." To the extent that value is binary, it matters less whether AI promotes the common good on net and more whether AI does astronomical good. To the extent that superintelligence (not previous AI) is all that matters after superintelligence exists, it only matters how we use the superintelligence. I assume Bostrom used this less carving-at-the-joints-y definition for simplicity and to decrease inferential distance for people outside the community; I'm pretty sure that my "governance problem" is closer to how we should be thinking about the problem of using AI well.

    Will MacAskill once called some related issues the "second-level alignment problem," but it's not clear what exactly he meant.

    Note that, roughly, P(win) = P(aligned powerful AI) * P(great use | aligned powerful AI) = P(survive until powerful AI) * P(powerful AI is aligned | survive until powerful AI) * P(great use | aligned powerful AI). This suggests a decomposition of the problem of achieving an existential win into three subproblems: the survival problem, the alignment problem, and the governance problem. ↩︎

  2. That is, some uses of superintelligence would have near-optimal expected value, where optimal expected value is roughly what we would achieve if we were thoughtful, wise, coordinated, and successful, by our standards. ↩︎

  3. Acausal trade is a conceivable source of value that does not necessarily require our colonizing the universe well. But it is prima facie even more politically challenging. Regardless, the prospect of it and other speculative, potentially radically effective strategies gives us additional reason to increase our collective ability to do unintuitive things with superintelligence. ↩︎

  4. These look similar in practice — rather than just telling the superintelligence to optimize for X, we'd probably have it tell us what optimizing for X would look like first, so we're effectively hearing the object-level way to optimize for X and then telling it to pursue that path. ↩︎

  5. Similarly to note 4, level number isn't really meaningful; it just matters that there's a chain of delegation that ends in something great. ↩︎

  6. More uncertain scenarios would occur if (1) the controller of superintelligence does not make decisions in a predictable way (e.g., it's a group of states with different goals, or it's an international organization without a clear mandate for using superintelligence) or (2) there is a multipolar outcome of some sort — e.g., if there is slow takeoff (in particular, no threshold-y behavior) or superintelligence is not able to form a singleton. ↩︎

  7. Presumably necessary because almost all of the resources eventually available to us require colonizing the universe, and it's prima facie unlikely that what sounds normal to current humans is optimal. ↩︎

  8. I am sympathetic to long reflection but will not defend it here. I merely use it as a prima facie example of a system that could have the two necessary properties for successful governance: acceptability and great decisionmaking. ↩︎

  9. Scott Alexander's Meditations on Moloch. While superintelligence could kill Moloch dead, Moloch might choose how we use it. That would be ironic. ↩︎


Comments

comment by Mitchell_Porter · 2021-08-30T05:45:57.108Z

Are you familiar with CEV?

comment by Zach Stein-Perlman · 2021-08-30T11:45:47.493Z

Yes, I definitely consider (successful, philosophically sound) CEV to be a great use of superintelligence. An earlier draft mentioned CEV explicitly, but I decided to just mention the broader category "indirect normativity," which should include any sound method for specifying values indirectly.