Transcript of a presentation on catastrophic risks from AI

post by RobertM (T3t) · 2023-05-05

Contents

  Assumptions and Content Warning
  Preface
  What is intelligence?
  Instrumental Convergence
  Orthogonality Thesis
  Takeover
  The Builders are Worried
  Open Research Questions
  Further Reading

This is an approximate outline/transcript of a presentation I'll be giving for a class of COMP 680, "Advanced Topics in Software Engineering", with footnotes and links to relevant source material.

Assumptions and Content Warning

This presentation assumes a couple of things.  The first is that something like materialism/reductionism (non-dualism) is true, particularly with respect to intelligence - that intelligence is the product of deterministic phenomena which we are, with some success, reproducing.  The second is that humans are not the upper bound of possible intelligence.

This is also a content warning that this presentation includes discussion of human extinction.

If you don't want to sit through a presentation with those assumptions, or given that content warning, feel free to sign off for the next 15 minutes.

Preface

There are many real issues with current AI systems; those are not the subject of this presentation.

This is about the unfortunate likelihood that, if we create sufficiently intelligent AI using anything resembling the current paradigm, then we will all die.

Why? I'll give you a more detailed breakdown, but let's sketch out some basics first.

What is intelligence?

One useful way to think about intelligence is that it's what lets us imagine that we'd like the future to be a certain way, and then - intelligently - plan and take actions to cause that to happen.  The default state of nature is entropy.  We have nice things because we, intelligent agents optimizing for specific goals, can reliably cause things to happen in the external world by understanding it and using that understanding to manipulate it into a state we like more.
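To make this framing concrete, here's a deliberately tiny sketch (my own toy illustration, not anything from a real AI system): an "agent" consisting of a world model, a utility function over outcomes, and a planning loop that searches for the action sequence whose predicted outcome it likes best.  The dynamics, actions, and utility function are all made up for illustration.

```python
# Toy sketch: "intelligence" framed as search over plans.  The world model,
# actions, and utility function below are made-up stand-ins; the point is only
# the shape of the loop: imagine a preferred future, then pick the action
# sequence that steers the world toward it.

from itertools import product

ACTIONS = [-1, 0, +1]          # toy actions: move left, stay, move right


def world_model(state: int, action: int) -> int:
    """Deterministic toy dynamics: the next state is just state + action."""
    return state + action


def utility(state: int) -> float:
    """Toy preferences: the agent likes states close to 10."""
    return -abs(state - 10)


def plan(start_state: int, horizon: int = 5) -> list:
    """Brute-force search: return the action sequence whose predicted
    final state the agent likes most."""
    best_plan, best_value = None, float("-inf")
    for candidate in product(ACTIONS, repeat=horizon):
        state = start_state
        for action in candidate:
            state = world_model(state, action)
        if utility(state) > best_value:
            best_plan, best_value = list(candidate), utility(state)
    return best_plan


print(plan(start_state=6))     # -> [0, 1, 1, 1, 1]: a plan whose final state is 10
```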

We know that humans are not anywhere near the frontier of possible intelligence.  We have many reasons to believe this, both theoretical and empirical.

Instrumental Convergence

Terminal goals are those things that you value for their own sake.  Some examples in humans: aesthetic experiences, success in overcoming challenges, the good regard of friends and family.

Instrumental goals are those things that you value for the sake of achieving other things (such as other instrumental or terminal goals).  A common human example: making money.  The case of human goals is complicated by the fact that many goals are dual-purpose - you might value the good regard of your friends not just for its own sake, but also for the sake of future benefits that might accrue to you as a result.  (How much less would you value the fact that your friends think well of you, if you never expected that to make any other observable difference in your life?)

The theory of instrumental convergence says that sufficiently intelligent agents will converge to a relatively small set of instrumental goals, because those goals are useful for a much broader set of terminal goals. 

A narrow example of this has been demonstrated, both formally and experimentally, by Turner in Parametrically retargetable decision-makers tend to seek power.

Convergent instrumental goals include power seeking, resource acquisition, goal preservation, and avoiding shut-off.
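To build intuition for why this convergence happens, here's a toy numerical illustration (mine, and far simpler than Turner's formal setup): an agent chooses between a branch that leaves a single outcome reachable and a branch that leaves ten.  If we sample random "terminal goals" over the outcomes, the branch that keeps more options open is optimal for the large majority of goals.

```python
# Toy illustration (not Turner's formal result) of the retargetability
# intuition: when one choice leaves more options open, then for *most*
# randomly chosen goals that choice is the optimal one.

import random

random.seed(0)

NARROW_OPTIONS = ["n1"]                          # dead-end branch: 1 reachable outcome
BROAD_OPTIONS = [f"b{i}" for i in range(10)]     # "power-seeking" branch: 10 outcomes
ALL_OUTCOMES = NARROW_OPTIONS + BROAD_OPTIONS

broad_preferred = 0
TRIALS = 100_000
for _ in range(TRIALS):
    # Draw a random "terminal goal": an arbitrary reward over final outcomes.
    reward = {outcome: random.random() for outcome in ALL_OUTCOMES}
    # An optimal agent picks the branch containing its highest-reward outcome.
    best_narrow = max(reward[o] for o in NARROW_OPTIONS)
    best_broad = max(reward[o] for o in BROAD_OPTIONS)
    if best_broad > best_narrow:
        broad_preferred += 1

# Prints roughly 0.91 (= 10/11): most goals favor the branch with more options.
print(broad_preferred / TRIALS)
```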

Orthogonality Thesis

Arbitrarily intelligent agents can have arbitrary goals.

Most goals are not compatible with human flourishing; even goals that seem "very close" to ideal when described informally will miss enormous amounts of relevant detail. (Intuition pump: most possible configurations of atoms do not satisfy human values very well.) Human values are a very complicated manifold in a high-dimensional space.  We ourselves have not yet figured out how to safely optimize for our own values - every ethical system to date has degenerate outcomes (e.g. the repugnant conclusion under total utilitarianism).

We don't currently know how to use existing ML architectures to create models with any specific desired goal, let alone avert future problems that might crop up if we do solve that problem (such as ontological shifts).

Humans care about other humans because of evolutionary pressures that favored cooperative strategies, and "actually caring" turned out to outcompete other methods of achieving cooperation. Needless to say, this is not how AIs are trained.

Takeover

A sufficiently intelligent agent would have little difficulty in taking over.  There is no single canonical source for this claim, but the intuition is that an agent much smarter than humans could find routes to power that we would not anticipate or be able to block.

The Builders are Worried

A bunch of very smart people are trying very hard to build AGI.  As the amount of available compute grows, the cleverness required to do so drops.  Those people claim to think that this is a real risk, but are not taking it as seriously as I would expect.

  1. Sam Altman:
    1. "Development of superhuman machine intelligence (SMI) is probably the greatest threat to the continued existence of humanity"[1] (2015)
    2. "AI will probably most likely lead to the end of the world, but in the meantime, there'll be great companies."[2] (2015)
    3. "Some people in the AI field think the risks of AGI (and successor systems) are fictitious; we would be delighted if they turn out to be right, but we are going to operate as if these risks are existential."[3] (2023)
  2. Shane Legg:
    1. "Eventually, I think human extinction will probably occur, and technology will likely play a part in this.  But there's a big difference between this being within a year of something like human level AI, and within a million years. As for the former meaning...I don't know.  Maybe 5%, maybe 50%. I don't think anybody has a good estimate of this."[4]  (2011, responding to the question "What probability do you assign to the possibility of negative/extremely negative consequences as a result of badly done AI?")
    2. "It's my number 1 risk for this century, with an engineered biological pathogen coming a close second (though I know little about the latter)." (same interview as above, responding to "Do possible risks from AI outweigh other possible existential risks, e.g. risks associated with the possibility of advanced nanotechnology?")
  3. Dario Amodei:
    1. "I think at the extreme end is the Nick Bostrom style of fear that an AGI could destroy humanity. I can’t see any reason and principle why that couldn’t happen."[5] (2017)
  4. Other AI luminaries who are not attempting to build AGI, such as Stuart Russell and Geoffrey Hinton, also think it's a real risk.
    1. Geoffrey Hinton:
      1. "It's somewhere between, um, naught percent and 100 percent. I mean, I think, I think it's not inconceivable."[6] (2023, responding to “...what do you think the chances are of AI just wiping out humanity? Can we put a number on that?”)
      2. “Well, here’s a subgoal that almost always helps in biology: get more energy. So the first thing that could happen is these robots are going to say, ‘Let’s get more power. Let’s reroute all the electricity to my chips.’ Another great subgoal would be to make more copies of yourself. Does that sound good?”[7] (2023)
      3. Quit his job at Google 3 days ago to talk about AI safety “without having to worry about how it interacts with Google’s business”[8] (2023).

Open Research Questions

  1. Deep reinforcement learning agents will not come to intrinsically and primarily value their reward signal; reward is not the trained agent’s optimization target.[9]
  2. Utility functions express the relative goodness of outcomes. Reward is not best understood as being a kind of utility function. Reward has the mechanistic effect of chiseling cognition into the agent's network. Therefore, properly understood, reward does not express relative goodness and is therefore not an optimization target at all.
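To illustrate the mechanistic claim above, here's a minimal sketch (my own, not code from the cited post) of a REINFORCE-style bandit learner.  Reward enters only as a scalar multiplier on the parameter update - it "chisels" the policy's weights - and nothing inside the policy represents reward as an object to pursue.

```python
# Minimal sketch of the mechanistic point: in a REINFORCE-style update, reward
# is just a multiplier on the gradient that reshapes the policy's parameters.
# The policy itself never represents "reward" as a goal.

import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)              # policy parameters for a 2-armed bandit
TRUE_REWARDS = np.array([0.2, 0.8])
LEARNING_RATE = 0.1


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


for _ in range(2000):
    probs = softmax(logits)
    action = rng.choice(2, p=probs)
    reward = TRUE_REWARDS[action]

    # Policy-gradient update: reward scales how strongly this action's
    # computation gets reinforced ("chiseled in").  The policy never sees
    # or models the reward function itself.
    grad_log_prob = -probs
    grad_log_prob[action] += 1.0
    logits += LEARNING_RATE * reward * grad_log_prob

print(softmax(logits))            # ends up heavily weighted toward the higher-reward arm
```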

Further Reading

  1. https://archive.is/o/8GkKl/blog.samaltman.com/machine-intelligence-part-1
  2. https://archive.is/u3pjs
  3. https://openai.com/blog/planning-for-agi-and-beyond
  4. https://www.lesswrong.com/posts/No5JpRCHzBrWA4jmS/q-and-a-with-shane-legg-on-risks-from-ai
  5. https://80000hours.org/podcast/episodes/the-world-needs-ai-researchers-heres-how-to-become-one/#transcript
  6. https://www.youtube.com/watch?v=qpoRO378qRY
  7. https://www.technologyreview.com/2023/05/02/1072528/geoffrey-hinton-google-why-scared-ai/
  8. https://www.technologyreview.com/2023/05/02/1072528/geoffrey-hinton-google-why-scared-ai/
  9. https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target
