Spreading messages to help with the most important century

post by HoldenKarnofsky · 2023-01-25T18:20:07.322Z · LW · GW · 4 comments

Contents

  Challenges of AI-related messages
  Messages that seem risky to spread in isolation
  Messages that seem important and helpful (and right!)
    We should worry about conflict between misaligned AI and all humans
    AIs could behave deceptively, so “evidence of safety” might be misleading
    AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems
    Alignment research is prosocial and great
    It might be important for companies (and other institutions) to act in unusual ways
    We’re not ready for this
  How to spread messages like these?
  Footnotes

In the most important century series, I argued that the 21st century could be the most important century ever for humanity, via the development of advanced AI systems that could dramatically speed up scientific and technological advancement, getting us more quickly than most people imagine to a deeply unfamiliar future.

In this more recent series, I’ve been trying to help answer this question: “So what? What can I do to help?”

So far, I’ve just been trying to build a picture of some of the major risks we might face (especially the risk of misaligned AI that could defeat all of humanity), what might be challenging about these risks, and why we might succeed anyway. Now I’ve finally gotten to the part where I can start laying out tangible ideas for how to help (beyond the pretty lame suggestions I gave before).

This piece is about one broad way to help: spreading messages that ought to be more widely understood.

One reason I think this topic is worth a whole piece is that practically everyone can help with spreading messages at least some, via things like talking to friends; writing explanations of your own that will appeal to particular people; and, yes, posting to Facebook and Twitter and all of that. Call it slacktivism if you want, but I’d guess it can be a big deal: many extremely important AI-related ideas are understood by vanishingly small numbers of people, and a bit more awareness could snowball. Especially because these topics often feel too “weird” for people to feel comfortable talking about! Engaging in credible, reasonable ways could contribute to an overall background sense that it’s OK to take these ideas seriously.

And then there are a lot of potential readers who might have special opportunities to spread messages. Maybe they are professional communicators (journalists, bloggers, TV writers, novelists, TikTokers, etc.), maybe they’re non-professionals who still have sizable audiences (e.g., on Twitter), maybe they have unusual personal and professional networks, etc. Overall, the more you feel you are good at communicating with some important audience (even a small one), the more this post is for you.

That said, I’m not excited about blasting around hyper-simplified messages. As I hope this series has shown, the challenges that could lie ahead of us are complex and daunting, and shouting stuff like “AI is the biggest deal ever!” or “AI development should be illegal!” could do more harm than good (if only by associating important ideas with being annoying). Relatedly, I think it’s generally not good enough to spread the most broad/relatable/easy-to-agree-to version of each key idea, like “AI systems could harm society.” Some of the unintuitive details are crucial.

Instead, the gauntlet I’m throwing is: “find ways to help people understand the core parts of the challenges we might face, in as much detail as is feasible.” That is: the goal is to try to help people get to the point where they could maintain a reasonable position in a detailed back-and-forth, not just to get them to repeat a few words or nod along to a high-level take like “AI safety is important.” This is a lot harder than shouting “AI is the biggest deal ever!”, but I think it’s worth it, so I’m encouraging people to rise to the challenge and stretch their communication skills.

Below, I will discuss some of the challenges of AI-related messages; go through messages that seem risky to spread in isolation; go through messages that seem important and helpful (and right!); and discuss how to spread messages like these.

Challenges of AI-related messages

Here’s a simplified story for how spreading messages could go badly.

(Click to expand) More on the “competition” frame vs. the “caution” frame

In a previous piece, I talked about two contrasting frames for how to make the best of the most important century:

The caution frame. This frame emphasizes that a furious race to develop powerful AI could end up making everyone worse off. This could be via: (a) AI forming dangerous goals of its own and defeating humanity entirely; (b) humans racing to gain power and resources and “lock in” their values.

Ideally, everyone with the potential to build powerful enough AI would be able to pour energy into building something safe (not misaligned), and carefully planning out (and negotiating with others on) how to roll it out, without a rush or a race. With this in mind, perhaps we should be doing things like:

The “competition” frame. This frame focuses less on how the transition to a radically different future happens, and more on who's making the key decisions as it happens.

This means it could matter enormously "who leads the way on transformative AI" - which country or countries, which people or organizations.

Some people feel that we can make confident statements today about which specific countries, and/or which people and organizations, we should hope lead the way on transformative AI. These people might advocate for actions like:

Tension between the two frames. People who take the "caution" frame and people who take the "competition" frame often favor very different, even contradictory actions. Actions that look important to people in one frame often look actively harmful to people in the other.

For example, people in the "competition" frame often favor moving forward as fast as possible on developing more powerful AI systems; for people in the "caution" frame, haste is one of the main things to avoid. People in the "competition" frame often favor adversarial foreign relations, while people in the "caution" frame often want foreign relations to be more cooperative.

That said, this dichotomy is a simplification. Many people - including myself - resonate with both frames. But I have a general fear that the “competition” frame is going to be overrated by default for a number of reasons, as I discuss here.

Unfortunately, I’ve seen something like the above story play out in multiple significant instances (though I shouldn’t give specific examples).

And I’m especially worried about this dynamic when it comes to people in and around governments (especially in national security communities), because I perceive governmental culture as particularly obsessed with staying ahead of other countries (“If AI is dangerous, we’ve gotta build it first”) and comparatively uninterested in things that are dangerous for our country because they’re dangerous for the whole world at once (“Maybe we should worry a lot about pandemics?”).1

You could even argue (although I wouldn’t agree!2) that to date, efforts to “raise awareness” about the dangers of AI have done more harm than good (via causing increased investment in AI, generally).

So it’s tempting to simply give up on the whole endeavor - to stay away from message spreading entirely, beyond people you know well and/or are pretty sure will internalize the important details. But I think we can do better.

This post is aimed at people who are good at communicating with at least some audience. This could be because of their skills, or their relationships, or some combination. In general, I’d expect to have more success with people who hear from you a lot (because they’re your friend, or they follow you on Twitter or Substack, etc.) than with people you reach via some viral blast of memery - but maybe you’re skilled enough to make the latter work too, which would be awesome. I'm asking communicators to hit a high bar: leave people with strong understanding, rather than just getting them to repeat a few sentences about AI risk.

Messages that seem risky to spread in isolation

First, here are a couple of messages that I’d rather people didn’t spread in isolation (or that I at least have mixed feelings about spreading in isolation), i.e., without serious efforts to include some of the other messages I cover below.

One category is messages that generically emphasize the importance and potential imminence of powerful AI systems. The reason for this is in the previous section: many people seem to react to these ideas (especially when unaccompanied by some other key ones) with a “We’d better build powerful AI as fast as possible, before others do” attitude. (If you’re curious about why I wrote The Most Important Century anyway, see footnote for my thinking.3)

Another category is messages that emphasize that AI could be risky/dangerous to the world, without much effort to fill in how, or with an emphasis on easy-to-understand risks.

Messages that seem important and helpful (and right!)

We should worry about conflict between misaligned AI and all humans

Unlike the messages discussed in the previous section, this one directly highlights why it might not be a good idea to rush forward with building AI oneself.

The idea that an AI could harm the same humans who build it has very different implications from the idea that AI could be generically dangerous/powerful. Less “We’d better get there before others,” more “there’s a case for moving slowly and working together here.”

The idea that AI could be a problem for the same people who build it is common in fictional portrayals of AI (HAL 9000, Skynet, The Matrix, Ex Machina) - maybe too much so? It seems to me that people tend to balk at the “sci-fi” feel, and what’s needed is more recognition that this is a serious, real-world concern.

The main pieces in this series making this case are Why would AI “aim” to defeat humanity? and AI could defeat all of us combined. There are many other pieces on the alignment problem (see list here); also see Matt Yglesias's case for specifically embracing the “Terminator”/Skynet analogy.

I’d be especially excited for people to spread messages that help others understand - at a mechanistic level - how and why AI systems could end up with dangerous goals of their own, deceptive behavior, etc. I worry that by default, the concern sounds like lazy anthropomorphism (thinking of AIs just like humans).

Transmitting ideas about the “how and why” is a lot harder than getting people to nod along to “AI could be dangerous.” I think there’s a lot of effort that could be put into simple, understandable, relatable metaphors/analogies/examples (my pieces make some effort in this direction, but there’s tons of room for more).

AIs could behave deceptively, so “evidence of safety” might be misleading

I’m very worried about a sequence of events like:

I worry about AI systems’ being deceptive in the same way a human might: going through chains of reasoning like “If I do X, I might get caught, but if I do Y, no one will notice until it’s too late.” But it can be hard to get this concern taken seriously, because it means attributing behavior to AI systems that we currently associate exclusively with humans (today’s AI systems don’t really do things like this4).

One of the central things I’ve tried to spell out in this series is why an AI system might engage in this sort of systematic deception, despite being very unlike humans (and not necessarily having e.g. emotions). It’s a major focus of both of these pieces from this series:

Whether this point is widely understood seems quite crucial to me. We might end up in a situation where (a) there are big commercial and military incentives to rush ahead with AI development; (b) we have what seems like a set of reassuring experiments and observations.

At that point, it could be key whether people are asking tough questions about the many ways in which “evidence of AI safety” could be misleading, which I discussed at length in AI Safety Seems Hard to Measure.

(Click to expand) Why AI safety could be hard to measure

In previous pieces, I argued that:

4 comments

Comments sorted by top scores.

comment by Michaël Trazzi (mtrazzi) · 2023-01-26T04:57:13.726Z · LW(p) · GW(p)

meta: it seems like the collapse feature doesn't work on mobile, and the table is hard to read (especially the first column)

Replies from: Raemon
comment by Raemon · 2023-01-26T05:21:09.089Z · LW(p) · GW(p)

it's more that the collapse feature doesn't work on LessWrong (it's from Holden's blog, which this is crossposted from)

I do think it'd be a good thing to build

Replies from: MondSemmel
comment by MondSemmel · 2023-01-26T10:16:20.575Z · LW(p) · GW(p)

Oooh, I would love to have Workflowy-style collapsible toggles on LW. That's a way to write essays which expand on small details, or which meander somewhat, without inconveniencing readers who just want to see the important points. (Side note: Notion furthermore has collapsible headings, which are also great.)

comment by TsviBT · 2023-01-26T12:51:11.459Z · LW(p) · GW(p)

Also, "No one knows how to make AI systems that try to do what we'd want them to do."