Posts

CapResearcher's Shortform 2025-02-06T19:08:00.358Z

Comments

Comment by CapResearcher on CapResearcher's Shortform · 2025-02-06T20:18:29.450Z · LW · GW

I don't know how to avoid ASI killing us. However, when I try to imagine worlds in which humanity isn't immediately destroyed by ASI, humanity's success can often be traced back to some bottleneck in the ASI's capabilities.

For example, point 35 of Eliezer's List of Lethalities argues that "Schemes for playing 'different' AIs off against each other stop working if those AIs advance to the point of being able to coordinate via reasoning about (probability distributions over) each others' code", because "Any system of sufficiently intelligent agents can **probably** behave as a single agent, even if you imagine you're playing them against each other." Note that he says "probably" (boldface mine).

In a world where humanity wasn't immediately destroyed by ASI, I find it plausible (let's say 10%) that something like Arrow's impossibility theorem exists for coordination, and that we were able to exploit it to successfully pit different AIs against each other.

Of course, you may argue that "10% of worlds not immediately destroyed by ASI" is a tiny slice of probability space, that even in those worlds the ability to pit AIs against each other is not sufficient, or that the scenario isn't plausible in the first place. However, I hope I've explained why I believe the idea of exploiting ASI limitations is a step in the right direction.

Comment by CapResearcher on CapResearcher's Shortform · 2025-02-06T13:46:38.755Z · LW · GW

ASI will be severely limited in what it can do.

No matter how smart it is, ASI can't predict the outcome of a fair dice roll, predict the weather far into the future, or beat you in a fair game of tic-tac-toe if you play optimally. Why is this important? Because strategies for avoiding x-risk from ASI might exploit limitations like these.

Some general classes of limitations:

  • Limited information. A chatbot ASI can't know how many fingers you're holding up behind your back, unless you tell it.
  • Predicting chaotic systems. A chaotic system is highly sensitive to its initial conditions. Without perfect information about those initial conditions, which Heisenberg's uncertainty principle makes impossible to obtain, the far-future states of such systems cannot be predicted. This famously makes it impossible to predict the weather far into the future, or the motion of a double pendulum. Plausibly, many complex systems like human thoughts and the stock market are also chaotic. (See the logistic-map sketch after this list.)
  • Physical limitations. ASI can't travel faster than the speed of light or make a perpetual motion machine.
  • At best optimal. Many idealized games have optimal strategies, and ASI can't beat those. Hence ASI can't beat you at tic-tac-toe if you play optimally, and can't make money playing fair blackjack. This likely generalizes to non-idealized situations where even the optimal strategy performs poorly. (See the minimax sketch after this list.)
  • Computational limits. By the time hierarchy theorem, we know there are computational problems which require exponential time to solve; ASI can't solve large instances of those in reasonable time. While proving computational hardness is notoriously difficult, many experts believe that P != NP, which would imply that ASI can't efficiently solve worst-case instances of practical problems like the travelling salesman problem. Plausibly, we can build practical encryption algorithms which ASI can't crack. (See the brute-force TSP sketch after this list.)
  • Mathematical impossibilities. The ASI can't prove a false theorem, can't make a voting system which beats Arrow's impossibility theorem, and can't solve the halting problem. (See the diagonalization sketch after this list.)
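
To make the chaos point concrete, here is a minimal Python sketch using the logistic map, a standard toy chaotic system standing in for weather or a double pendulum. Two trajectories that start 10^-12 apart become completely uncorrelated within a few dozen steps:

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x); r=4 is a chaotic regime."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.3, 60)
b = logistic_trajectory(0.3 + 1e-12, 60)  # perturb the 12th decimal place
for n in (0, 20, 40, 60):
    print(f"step {n:2d}: {a[n]:.6f} vs {b[n]:.6f}")
```

Note that a perfect model of the dynamics doesn't help: for this map the initial-condition error roughly doubles every step, so each extra digit of measurement precision only buys a few more steps of prediction.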
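
The "at best optimal" point can be checked directly for tic-tac-toe. A plain minimax search (a minimal sketch below) confirms that the game's value under optimal play is a draw, so no amount of intelligence squeezes out a win against an optimal opponent:

```python
from functools import lru_cache

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for i, j, k in LINES:
        if board[i] != "." and board[i] == board[j] == board[k]:
            return board[i]
    return None

@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value under perfect play: +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    values = []
    for i, c in enumerate(board):
        if c == ".":
            nxt = board[:i] + player + board[i+1:]
            values.append(minimax(nxt, "O" if player == "X" else "X"))
    return max(values) if player == "X" else min(values)

print(minimax("." * 9, "X"))  # prints 0: a draw under optimal play
```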
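
And here is a toy illustration of the computational wall: brute-force TSP checks all (n-1)! tours, so each added city multiplies the runtime by roughly n. (Better exact algorithms exist, e.g. Held-Karp at O(n^2 * 2^n), but they hit the same kind of wall only slightly later.)

```python
from itertools import permutations
import math, random, time

def tsp_brute_force(dist):
    """Exact TSP by trying all (n-1)! tours starting and ending at city 0."""
    n = len(dist)
    best = math.inf
    for perm in permutations(range(1, n)):
        tour = (0, *perm, 0)
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        best = min(best, cost)
    return best

# Runtime grows factorially: each extra city multiplies the work.
for n in (8, 9, 10):
    dist = [[random.random() for _ in range(n)] for _ in range(n)]
    t0 = time.perf_counter()
    tsp_brute_force(dist)
    print(n, f"{time.perf_counter() - t0:.2f}s")
```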
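
Finally, a minimal sketch of Turing's diagonal argument for the halting problem: given any candidate "halting oracle" (here just an ordinary Python function, since a real one can't exist), we can construct a program that does the opposite of whatever the oracle predicts about it, so no oracle is right about every program:

```python
def make_counterexample(halts):
    """Given any candidate halting-oracle, build a program it misjudges."""
    def paradox():
        if halts(paradox):   # oracle predicts we halt...
            while True:      # ...so loop forever instead
                pass
        # oracle predicts we loop, so halt immediately
    return paradox

always_yes = lambda f: True          # a (wrong) oracle: "everything halts"
p = make_counterexample(always_yes)
print(always_yes(p))  # True, but calling p() would loop forever: the oracle is wrong
```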

Caveats: In practice, the ASI can likely partially bypass some of these limitations. It might use social engineering to make you reveal how many fingers you're holding behind your back, count cards to make money at blackjack, or exploit implementation bugs in an encryption algorithm; and our current understanding of physics might be wrong. However, I still think the listed limitations correlate well with what is hard for the ASI, making the list directionally correct.