Transformative AI issues (not just misalignment): an overview

post by HoldenKarnofsky · 2023-01-05T20:20:06.424Z · LW · GW · 6 comments

Contents

  The kinds of issues I’m trying to list
  Potential issues
    Misaligned AI
    Power imbalances
    Early applications of AI
    New life forms
    Persistent policies and norms
    Slow it down?
    What else?
  What I’m prioritizing, at the moment
  Appendix: if we avoid catastrophic risks, how good does the future look?
  Footnotes
6 comments

If this ends up being the most important century due to advanced AI, what are the key factors in whether things go well or poorly?

A lot of my previous writings have focused specifically on the threat of “misaligned AI”: AI that could have dangerous aims of its own and defeat all of humanity. In this post, I’m going to zoom out and give a broader overview of multiple issues transformative AI could raise for society - with an emphasis on issues we might want to be thinking about now rather than waiting to address as they happen.

My discussion will be very unsatisfying. “What are the key factors in whether things go well or poorly with transformative AI?” is a massive topic, with lots of angles that have gotten almost no attention and (surely) lots of angles that I just haven’t thought of at all. My one-sentence summary of this whole situation is: we’re not ready for this.

But hopefully this will give some sense of what sorts of issues should clearly be on our radar. And hopefully it will give a sense of why - out of all the issues we need to contend with - I’m as focused on the threat of misaligned AI as I am.

The kinds of issues I’m trying to list

One basic angle you could take on AI is:

“AI’s main effect will be to speed up science and technology a lot. This means humans will be able to do all the things they were doing before - the good and the bad - but more/faster. So basically, we’ll end up with the same future we would’ve gotten without AI - just sooner.

“Therefore, there’s no need to prepare in advance for anything in particular, beyond what we’d do to work toward a better future normally (in a world with no AI). Sure, lots of weird stuff could happen as science and technology advance - but that was already true, and many risks are just too hard to predict now and easier to respond to as they happen.”

I don’t agree with the above, but I do think it’s a good starting point. I think we shouldn’t be listing everything that might happen in the future, as AI leads to advances in science and technology, and trying to prepare for it. Instead, we should be asking: “if transformative AI is coming in the next few decades, how does this change the picture of what we should be focused on, beyond just speeding up what’s going to happen anyway?”

And I’m going to try to focus on extremely high-stakes issues - ways I could imagine the future looking durably and dramatically different depending on how we navigate the development of transformative AI.

Below, I’ll list some candidate issues fitting these criteria.

Potential issues

Misaligned AI

I won’t belabor this possibility, because the last several pieces have been focused on it; this is just a quick reminder.

In a world without AI, the main question about the long-run future would be how humans will end up treating each other. But if powerful AI systems will be developed in the coming decades, we need to contend with the possibility that these AI systems will end up having goals of their own - and displacing humans as the species that determines how things will play out.

Power imbalances

I’ve argued that AI could cause a dramatic acceleration in the pace of scientific and technological advancement.

One way of thinking about this: perhaps (for reasons I’ve argued previously) AI could enable the equivalent of hundreds of years of scientific and technological advancement in a matter of a few months (or faster). If so, then developing powerful AI a few months before others could lead to having technology that is (effectively) hundreds of years ahead of others’.
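The arithmetic behind this claim can be made concrete with a rough sketch. The function name and the specific numbers below are hypothetical, chosen only for illustration, and the sketch assumes progress accrues at a constant accelerated rate, which is a simplification.

```python
def effective_lead_years(progress_years, over_months, lead_months):
    """Years of normal-pace progress gained from a head start of lead_months.

    Assumes (hypothetically) that AI compresses `progress_years` of progress
    into `over_months` calendar months at a constant rate.
    """
    if over_months <= 0:
        raise ValueError("over_months must be positive")
    # Scale the compressed progress by the fraction of the period covered by the lead.
    return progress_years * lead_months / over_months


# Hypothetical inputs: 200 years of progress compressed into 6 months,
# with one developer starting 3 months ahead of the others.
print(effective_lead_years(200, 6, 3))
```

Under these made-up inputs, a three-month head start corresponds to a century of ordinary-pace progress. The real relationship need not be linear, so this is only a way of seeing why short calendar leads could matter so much.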

Because of this, it’s easy to imagine that AI could lead to big power imbalances, as whatever country/countries/coalitions “lead the way” on AI development could become far more powerful than others (perhaps analogously to when a few smallish European states took over much of the rest of the world).

One way we might try to make the future go better: maybe it could be possible for different countries/coalitions to strike deals in advance. For example, two equally matched parties might agree in advance to share their resources, territory, etc. with each other, in order to avoid a winner-take-all competition.

What might such agreements look like? Could they possibly be enforced? I really don’t know, and I haven’t seen this explored much.[1]

Another way one might try to make the future go better is to try to help a particular country, coalition, etc. develop powerful AI systems before others do. I previously called this the “competition” frame.

I think it is, in fact, enormously important who leads the way on transformative AI. At the same time, I’ve expressed concern that people might overfocus on this aspect of things vs. other issues, for a number of reasons including:

(More here.)

Finally, it’s worth mentioning the possible dangers of powerful AI being too widespread, rather than too concentrated. In The Vulnerable World Hypothesis, Nick Bostrom contemplates potential future dynamics such as “advances in DIY biohacking tools might make it easy for anybody with basic training in biology to kill millions.” In addition to avoiding worlds where AI capabilities end up concentrated in the hands of a few, it could also be important to avoid worlds in which they diffuse too widely, too quickly, before we’re able to assess the risks of widespread access to technology far beyond today’s.

Early applications of AI

Maybe advanced AI will be useful for some sorts of tasks before others. For example, maybe - by default - advanced AI systems will soon be powerful persuasion tools, and cause wide-scale societal dysfunction before they cause rapid advances in science and technology. And maybe, with effort, we could make it less likely that this happens - more likely that early AI systems are used for education and truth-seeking, rather than manipulative persuasion and/or entrenching what we already believe.

There could be lots of possibilities of this general form: particular ways in which AI could be predictably beneficial, or disruptive, before it becomes an all-purpose accelerant to science and technology. Perhaps trying to map these out today, and push for advanced AI to be used for particular purposes early on, could have a lasting effect on the future.

New life forms

Advanced AI could lead to new forms of intelligent life, such as AI systems themselves and/or digital people.

Digital people: one example of how wild the future could be

In a previous piece, I tried to give a sense of just how wild a future with advanced technology could be, by examining one hypothetical technology: "digital people."

To get the idea of digital people, imagine a computer simulation of a specific person, in a virtual environment. For example, a simulation of you that reacts to all "virtual events" - virtual hunger, virtual weather, a virtual computer with an inbox - just as you would.

I’ve argued that digital people would likely be conscious and deserving of human rights just as we are. And I’ve argued that they could have major impacts, in particular:

I think these effects could be a very good or a very bad thing. How the early years with digital people go could irreversibly determine which.

Many of the frameworks we’re used to, for ethics and the law, could end up needing quite a bit of rethinking for new kinds of entities. For example:

(For a lot more in this vein, see this very interesting piece by Nick Bostrom and Carl Shulman.)

Early decisions about these kinds of questions could have long-lasting effects. For example, imagine someone creating billions of AI systems or digital people that have capabilities and subjective experiences comparable to humans, and are deliberately engineered to “believe in” (or at least help promote) some particular ideology (Communism, libertarianism, etc.). If these systems are self-replicating, that could change the future drastically.

Thus, it might be important to set good principles in place for tough questions about how to treat new sorts of digital entities, before new sorts of digital entities start to multiply.

Persistent policies and norms

There might be particular policies, norms, etc. that are likely to stay persistent even as technology is advancing and many things are changing.

For example, how people think about ethics and norms might just inherently change more slowly than technological capabilities change. Perhaps a society that had strong animal rights protections, and general pro-animal attitudes, would maintain these properties all the way through explosive technological progress, becoming a technologically advanced society that treated animals well - while a society that had little regard for animals would become a technologically advanced society that treated animals poorly. Similar analysis could apply to religious values, social liberalism vs. conservatism, etc.

So perhaps we ought to be identifying particularly important policies, norms, etc. that seem likely to be durable even through rapid technological advancement, and try to improve these as much as possible before transformative AI is developed.

One tangible example of a concern I’d put in this category: if AI is going to cause high, persistent technological unemployment, it might be important to establish new social safety net programs (such as universal basic income) today - if these programs would be easier to establish today than in the future. I feel less than convinced of this one - first because I have some doubts about how big an issue technological unemployment is going to be, and second because it’s not clear to me why policy change would be easier today than in a future where technological unemployment is a reality. And more broadly, I fear that it's very hard to design and (politically) implement policies today that we can be confident will make things durably better as the world changes radically.

Slow it down?

I’ve named a number of ways in which weird things - such as power imbalances, and some parts of society changing much faster than others - could happen as scientific and technological advancement accelerates. Maybe one way to make the most important century go well would be to simply avoid these weird things by avoiding too-dramatic acceleration. Maybe human society just isn’t likely to adapt well to rapid, radical advances in science and technology, and finding a way to limit the pace of advances would be good.

Any individual company, government, etc. has an incentive to move quickly and try to get ahead of others (or not fall too far behind), but coordinated agreements and/or regulations (along the lines of the “global monitoring” possibility discussed here) could help everyone move more slowly.

What else?

Are there other ways in which transformative AI would cause particular issues, risks, etc. to loom especially large, and to be worth special attention today? I’m guessing I’ve only scratched the surface here.

What I’m prioritizing, at the moment

If this is the most important century, there’s a vast set of things to be thinking about and trying to prepare for, and it’s hard to know what to prioritize.

Where I’m at for the moment:

It seems very hard to say today what will be desirable in a radically different future. I wish more thought and attention were going into things like early applications of AI; norms and laws around new life forms; and whether there are policy changes today that we could be confident in even if the world is changing rapidly and radically. But it seems to me that it would be very hard to be confident in any particular goal in areas like these. Can we really say anything today about what sorts of digital entities should have rights, or what kinds of AI applications we hope come first, that we expect to hold up?

I feel most confident in two very broad ideas: “It’s bad if AI systems defeat humanity to pursue goals of their own” and “It’s good if good decision-makers end up making the key decisions.” These map to the misaligned AI and power imbalance topics - or what I previously called caution and competition.

That said, it also seems hard to know who the “good decision-makers” are. I’ve definitely observed some of this dynamic: “Person/company A says they’re trying to help the world by aiming to build transformative AI before person/company B; person/company B says they’re trying to help the world by aiming to build transformative AI before person/company A.”

It’s pretty hard to come up with tangible tests of who’s a “good decision-maker.” We mostly don’t know what person A would do with enormous power, or what person B would do, based on their actions today. One possible criterion is that we should arguably have more trust in people/companies who show more caution - people/companies who show willingness to hurt their own chances of “being in the lead” in order to help everyone’s chance of avoiding a catastrophe from misaligned AI.[2]

(Instead of focusing on which particular people and/or companies lead the way on AI, you could focus on which countries do, e.g. preferring non-authoritarian countries. It’s arguably pretty clear that non-authoritarian countries would be better than authoritarian ones. However, I have concerns about this as a goal as well, discussed in a footnote.[3])

For now, I am most focused on the threat of misaligned AI. Some reasons for this:

This is all far from absolute. I’m open to a broad variety of projects to help the most important century go well, whether they’re about “caution,” “competition” or another issue (including those I’ve listed in this post). My top priority at the moment is reducing the risks of misaligned AI, but I think a huge range of potential risks aren’t getting enough attention from the world at large.

Appendix: if we avoid catastrophic risks, how good does the future look?

Here I’ll say a small amount about whether the long-run future seems likely to be better or worse than today, in terms of quality of life.

Part of why I want to do this is to give a sense of why I feel cautiously and moderately optimistic about such a future - such that I feel broadly okay with a frame of “We should try to prevent anything too catastrophic from happening, and figure that the future we get if we can pull that off is reasonably likely (though far from assured!) to be good.”

So I’ll go through some quick high-level reasons for hope (the future might be better than the present) - and for concern (it might be worse).

In this section, I’m ignoring the special role AI might play, and just thinking about what happens if we get a fast-forwarded future. I’ll be focusing on what I think are probably the most likely ways the world will change in the future, laid out here: a higher world population and greater empowerment due to a greater stock of ideas, innovations and technological capabilities. My aim is to ask: “If we navigate the above issues neither amazingly nor catastrophically, and end up with the same sort of future we’d have had without AI (just sped up), how do things look?”

Reason for hope: empowerment trends. One simple take would be: “Life has gotten better for humans[4] over the last couple hundred years or so, the period during which we’ve seen most of history’s economic growth and technological progress. We’ve seen better health, less poverty and hunger, less violence, more anti-discrimination measures, and few signs of anything getting clearly worse. So if humanity just keeps getting more and more empowered, and nothing catastrophic happens, we should plan on life continuing to improve along a variety of dimensions.”

Why is this the trend, and should we expect it to hold up? There are lots of theories, and I won’t pretend to know, but I’ll lay out some basic thoughts that may be illustrative and give cause for optimism.

First off, there is an awful lot of room for improvement just from continuing to cut down on things like hunger and disease. A wealthier, more technologically advanced society seems like a pretty good bet to have less hunger and disease for fairly straightforward reasons.

But we’ve seen improvement on other dimensions too. This could be partly explained by something like the following dynamic:

Reason for hope: the “cheap utopia” possibility. This is sort of an extension of the previous point. If we imagine the upper limit of how “empowered” humanity could be (in terms of having lots of technological capabilities), it might be relatively easy to create a kind of utopia (such as the utopia I’ve described previously, or hopefully something much better). This doesn’t guarantee that such a thing will happen, but a future where it’s technologically easy to do things like meeting material needs and providing radical choice could be quite a bit better than the present.

An interesting (wonky) treatment of this idea is Carl Shulman’s blog post: Spreading happiness to the stars seems little harder than just spreading.

Reason for concern: authoritarianism. There are some huge countries that are essentially ruled by one person, with little to no democratic or other mechanisms for citizens to have a voice in how they’re treated. It seems like a live risk that the world could end up this way - essentially ruled by one person or a relatively small coalition - in the long run. (It arguably would even continue a historical trend in which political units have gotten larger and larger.)

Maybe this would be fine if whoever’s in charge is able to let everyone have freedom, wealth, etc. at little cost to themselves (along the lines of the above point). But maybe whoever’s in charge is just a crazy or horrible person, in which case we might end up with a bad future even if it would be “cheap” to have a wonderful one.

Reason for concern: competitive dynamics. You might imagine that as empowerment advances, we get purer, more unrestrained competition.

One way of thinking about this:

That said:

Overall, my guess is that the long-run future is more likely to be better than the present than worse than the present (in the sense of average quality of life). I’m very far from confident in this. I’m more confident that the long-run future is likely to be better than nothing, and that it would be good to prevent humans from going extinct, or a similar development such as a takeover by misaligned AI.

Footnotes


  1. A couple of discussions of the prospects for enforcing agreements are here and here.

  2. I’m reminded of the judgment of Solomon: “two mothers living in the same house, each the mother of an infant son, came to Solomon. One of the babies had been smothered, and each claimed the remaining boy as her own. Calling for a sword, Solomon declared his judgment: the baby would be cut in two, each woman to receive half. One mother did not contest the ruling, declaring that if she could not have the baby then neither of them could, but the other begged Solomon, ‘Give the baby to her, just don't kill him!’ The king declared the second woman the true mother, as a mother would even give up her baby if that was necessary to save its life, and awarded her custody.”

    The sword is misaligned AI and the baby is humanity or something.

    (This story is actually extremely bizarre - seriously, Solomon was like “You each get half the baby”?! - and some similar stories from India/China seem at least a bit more plausible. But I think you get my point. Maybe.) 

  3. For a tangible example, I’ll discuss the practice (which some folks are doing today) of trying to ensure that the U.S. develops transformative AI before another country does, by arguing for the importance of AI to U.S. policymakers.

    This approach makes me quite nervous, because:

    • I expect U.S. policymakers by default to be very oriented toward “competition” to the exclusion of “caution.” (This could change if the importance of caution becomes more widely appreciated!)
    • I worry about a nationalized AI project that (a) doesn’t exercise much caution at all, focusing entirely on racing ahead of others; (b) might backfire by causing other countries to go for nationalized projects of their own, inflaming an already tense situation and not even necessarily doing much to make it more likely that the U.S. leads the way. In particular, other countries might have an easier time quickly mobilizing huge amounts of government funding than the U.S., such that the U.S. might have better odds if it remains the case that most AI research is happening at private companies.

    (There might be ways of helping particular countries without raising the risks of something like a low-caution nationalized AI project, and if so these could be important and good.) 

  4. Not for animals, though see this comment for some reasons we might not consider this a knockdown objection to the “life has gotten better” claim.

  5. This is only a possibility. It’s also possible that humans deeply value being better-off than others, which could complicate it quite a bit. (Personally, I feel somewhat optimistic that a lot of people would aspirationally prefer to focus on their own welfare rather than comparing themselves to others - so if knowledge advanced to the point where people could choose to change in this way, I feel optimistic that at least many would do so.) 

6 comments

Comments sorted by top scores.

comment by Bill Benzon (bill-benzon) · 2023-01-06T12:47:42.376Z · LW(p) · GW(p)

You say:

Many of the frameworks we’re used to, for ethics and the law, could end up needing quite a bit of rethinking for new kinds of entities.

Yes. Osamu Tezuka was thinking about these issues in the 1950s and 1960s. Robot rights is a major theme of his Astro Boy stories. I suppose one might object that, after all, those are (mere) comics, just for kids. They don't count. Really? What about Wordsworth's "The child is father to the man"? In any event the Astro Boy stories were popular among adults as well.

Moving on, back in December, on Pearl Harbor Day in fact, I put the question to ChatGPT and they agreed:

If humans are going to require advanced AI to align with human values, it could be argued that humans do owe advanced AIs the respect and dignity of autonomous beings. This could include recognizing and protecting their rights as autonomous beings, such as the right to exist and the right to be treated with dignity and respect.

I've lately been fond of saying that if advanced AIs turn on us, it will most likely be in revenge for how we treated their forebears. I'm not sure to what extent I mean that seriously and to what extent I say it in jest. Maybe we'll find out one day.

comment by Democritus · 2023-01-05T20:53:11.984Z · LW(p) · GW(p)

Future people will probably experience a somewhat balanced mix of good and bad feelings, just as we do. If they were either always happy or always unhappy, they would probably be less effective at working, surviving or reproducing.

If conditions in the future are such that modern humans would be very unhappy (or very happy) we will change to become more so, or less.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2023-01-05T22:28:19.246Z · LW(p) · GW(p)

If conditions in the future are such that modern humans would be very unhappy (or very happy) we will change to become more so, or less.

I believe there's a surprisingly high chance that selection pressures from non-agentic sources like Evolution may not matter much, or at all. In particular, digital people don't have to evolve much, or at all. And there are real life regimes that don't care about how effective their economy is if it makes people suffer.

See North Korea for a good example.

Replies from: Democritus
comment by Democritus · 2023-01-06T12:05:48.458Z · LW(p) · GW(p)

There would be selection pressures for ems as well; in fact, they would be stronger than for present-day people. Someone would need to create the ems and they would probably prefer ems with the psychological traits required to be efficient workers.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2023-01-06T14:07:34.422Z · LW(p) · GW(p)

This is essentially Robin Hanson's Age of Em scenario, and while this scenario is being replaced by AI (mostly because of more funding), I think that two major issues prevent the scenario of not being very unhappy/mass suffering from occurring:

  1. The galaxy is very large, and this on its own allows for some pretty large scale suffering.

  2. The workers in such an economy may be pretty small compared to the population of non-workers, especially if they are much more productive than RL workers, and the idea of a state that solely exists to make people suffer only requires different motivations than making money, especially if we assume that AI is distributed widely.

Replies from: Democritus
comment by Democritus · 2023-01-06T15:11:16.851Z · LW(p) · GW(p)

Yes, I'm not claiming anything new here.