Otherness and control in the age of AGI

post by Joe Carlsmith (joekc) · 2024-01-02

(Cross-posted from my website.)

“With malice toward none; with charity for all; with firmness in the right, as God gives us to see the right…”

- Abraham Lincoln

Lincoln’s Second Inaugural (image)

I’ve written a series of essays that I’m calling “Otherness and control in the age of AGI.” The series examines a set of interconnected questions about how agents with different values should relate to one another, and about the ethics of seeking and sharing power. They’re old questions – but I think that we will have to grapple with them in new ways as increasingly powerful AI systems come online. And I think they’re core to some parts of the discourse about existential risk from misaligned AI (hereafter, “AI risk”).[1]

The series covers a lot of ground, but I’m hoping the individual essays can be read fairly well on their own. Here’s a brief summary of the essays that have been released thus far (I’ll update it as I release more):

I’ll also note two caveats about the series as a whole. First, the series is centrally an exercise in philosophy, but it also touches on some issues relevant to the technical challenge of ensuring that the AI systems we build do not kill all humans, and to the empirical question of whether our efforts in this respect will fail. And I confess to some worry about bringing the philosophical stuff too near to the technical/empirical stuff. In particular: my sense is that people are often eager, in discussions about AI risk, to argue at the level of grand ideological abstraction rather than brass-tacks empirics – and I worry that these essays will feed such temptations. This isn’t to say that philosophy is irrelevant to AI risk – to the contrary, part of my hope, in these essays, is to help us see more clearly the abstractions that move and shift underneath certain discussions of the issue. But we should be very clear about the distinction between affiliating with some philosophical vibe and making concrete predictions about the future. Ultimately, it’s the concrete-prediction thing that matters most;[3] and if the right concrete prediction is “advanced AIs have a substantive chance of killing all the humans,” you don’t need to do much philosophy to get upset, or to get to work. Indeed, particularly in AI, it’s easy to argue about philosophical questions over-much. Doing so can be distracting candy, especially if it lets you bounce off more technical problems. And if we fail on certain technical problems, we may well end up dead.

Second: even as the series focuses on philosophical stuff rather than technical/empirical stuff, it also focuses on a very particular strand of philosophical stuff – namely, a cluster of related philosophical assumptions and frames that I associate most centrally with Eliezer Yudkowsky, whose writings have done a lot to frame and popularize AI risk as an issue. And here, too, I worry about pushing the conversation in the wrong direction. That is: I think that Yudkowsky’s philosophical views are sufficiently influential, interesting, and fleshed-out that it’s worth interrogating them in depth. But I don’t want people to confuse their takes on Yudkowsky’s philosophical views (or his more technical/empirical views, or his vibe more broadly) with their takes on the severity of existential risk from AI more generally – and I worry these essays might prompt such a conflation. So please, remember: there is a very wide variety of ways to care about making sure that advanced AIs don’t kill everyone. Fundamentalist Christians can care about this; deep ecologists can care about this; solipsists can care about this; people who have no interest in philosophy at all can care about this. Indeed, in many respects, these essays aren’t centrally about AI risk in the sense of “let’s make sure that the AIs don’t kill everyone” (i.e., “AInotkilleveryoneism”) – rather, they’re about a set of broader questions about otherness and control that arise in the context of trying to ensure that the future goes well more generally. And what’s more, as I note in various places in the series, much of my interrogation of Yudkowsky’s views has to do with the sort of philosophical momentum they create in various directions, rather than with whether Yudkowsky in particular takes them there. In this sense, my concern is not ultimately with Yudkowsky’s views per se, but rather with a sort of abstracted existential narrative that I think Yudkowsky’s writings often channel and express – one that I think different conversations about advanced AI live within to different degrees, and which I hope to help us see more whole.

Thanks to Katja Grace, Ketan Ramakrishnan, Carl Shulman, Anna Salamon, Will MacAskill, and many others over the years for conversation about these topics; and thanks to Carl Shulman for written comments. Some of my thinking and writing on these topics occurred in the context of my work for Open Philanthropy, but I am here speaking only for myself and not for my employer.

1. There are lots of other risks from AI, too; but I want to focus on existential risk from misalignment, here, and I want the short phrase “AI risk” for the thing I’m going to be referring to repeatedly.

2. My relationship to the MtG Color Wheel is mostly via somewhat reinterpreting the presentation by Duncan Sabien here, who credits Mark Rosewater for a lot of his understanding. My characterization won’t necessarily resonate with people who actually play Magic.

3. See here and here for a few of my attempts at more quantitative forecasts.
