Spreading messages to help with the most important century
post by HoldenKarnofsky · 2023-01-25
In the most important century series, I argued that the 21st century could be the most important century ever for humanity, via the development of advanced AI systems that could dramatically speed up scientific and technological advancement, getting us more quickly than most people imagine to a deeply unfamiliar future.
In this more recent series, I’ve been trying to help answer this question: “So what? What can I do to help?”
So far, I’ve just been trying to build a picture of some of the major risks we might face (especially the risk of misaligned AI that could defeat all of humanity), what might be challenging about these risks, and why we might succeed anyway. Now I’ve finally gotten to the part where I can start laying out tangible ideas for how to help (beyond the pretty lame suggestions I gave before).
This piece is about one broad way to help: spreading messages that ought to be more widely understood.
One reason I think this topic is worth a whole piece is that practically everyone can help with spreading messages at least some, via things like talking to friends; writing explanations of your own that will appeal to particular people; and, yes, posting to Facebook and Twitter and all of that. Call it slacktivism if you want, but I’d guess it can be a big deal: many extremely important AI-related ideas are understood by vanishingly small numbers of people, and a bit more awareness could snowball. Especially because these topics often feel too “weird” for people to feel comfortable talking about them! Engaging in credible, reasonable ways could contribute to an overall background sense that it’s OK to take these ideas seriously.
And then there are a lot of potential readers who might have special opportunities to spread messages. Maybe they are professional communicators (journalists, bloggers, TV writers, novelists, TikTokers, etc.), maybe they’re non-professionals who still have sizable audiences (e.g., on Twitter), maybe they have unusual personal and professional networks, etc. Overall, the more you feel you are good at communicating with some important audience (even a small one), the more this post is for you.
That said, I’m not excited about blasting around hyper-simplified messages. As I hope this series has shown, the challenges that could lie ahead of us are complex and daunting, and shouting stuff like “AI is the biggest deal ever!” or “AI development should be illegal!” could do more harm than good (if only by associating important ideas with being annoying). Relatedly, I think it’s generally not good enough to spread the most broad/relatable/easy-to-agree-to version of each key idea, like “AI systems could harm society.” Some of the unintuitive details are crucial.
Instead, the gauntlet I’m throwing down is: “find ways to help people understand the core parts of the challenges we might face, in as much detail as is feasible.” That is: the goal is to try to help people get to the point where they could maintain a reasonable position in a detailed back-and-forth, not just to get them to repeat a few words or nod along to a high-level take like “AI safety is important.” This is a lot harder than shouting “AI is the biggest deal ever!”, but I think it’s worth it, so I’m encouraging people to rise to the challenge and stretch their communication skills.
Below, I will:
- Outline some general challenges of this sort of message-spreading.
- Go through some ideas I think it’s risky to spread too far, at least in isolation.
- Go through some of the ideas I’d be most excited to see spread.
- Talk a little bit about how to spread ideas - but this is mostly up to you.
Challenges of AI-related messages
Here’s a simplified story for how spreading messages could go badly.
- You’re trying to convince your friend to care more about AI risk.
- You’re planning to argue: (a) AI could be really powerful and important within our lifetimes; (b) Building AI too quickly/incautiously could be dangerous.
- Your friend just isn’t going to care about (b) if they aren’t sold on some version of (a). So you’re starting with (a).
- Unfortunately, (a) is easier to understand than (b). So you end up convincing your friend of (a), and not (yet) (b).
- Your friend announces, “Aha - I see that AI could be tremendously powerful and important! I need to make sure that people/countries I like are first to build it!” and runs off to help build powerful AI as fast as possible. They’ve chosen the competition frame (“will the right or the wrong people build powerful AI first?”) over the caution frame (“will we screw things up and all lose?”), because the competition frame is easier to understand.
- Why is this bad? See previous pieces on the importance of caution.
More on the “competition” frame vs. the “caution” frame
In a previous piece, I talked about two contrasting frames for how to make the best of the most important century:
The caution frame. This frame emphasizes that a furious race to develop powerful AI could end up making everyone worse off. This could be via: (a) AI forming dangerous goals of its own and defeating humanity entirely; (b) humans racing to gain power and resources and “lock in” their values.
Ideally, everyone with the potential to build sufficiently powerful AI would be able to pour energy into building something safe (not misaligned), and carefully planning out (and negotiating with others on) how to roll it out, without a rush or a race. With this in mind, perhaps we should be doing things like:
- Working to improve trust and cooperation between major world powers. Perhaps via AI-centric versions of Pugwash (an international conference aimed at reducing the risk of military conflict), perhaps by pushing back against hawkish foreign relations moves.
- Discouraging governments and investors from shoveling money into AI research, encouraging AI labs to thoroughly consider the implications of their research before publishing it or scaling it up, working toward standards and monitoring, etc. Slowing things down in this manner could buy more time to do research on avoiding misaligned AI, more time to build trust and cooperation mechanisms, and more time to generally gain strategic clarity.
The “competition” frame. This frame focuses less on how the transition to a radically different future happens, and more on who's making the key decisions as it happens.
- If something like PASTA (a Process for Automating Scientific and Technological Advancement) is developed primarily (or first) in country X, then the government of country X could be making a lot of crucial decisions about whether and how to regulate a potential explosion of new technologies.
- In addition, the people and organizations leading the way on AI and other technology advancement at that time could be especially influential in such decisions.
This means it could matter enormously "who leads the way on transformative AI" - which country or countries, which people or organizations.
Some people feel that we can make confident statements today about which specific countries, and/or which people and organizations, we should hope lead the way on transformative AI. These people might advocate for actions like:
- Increasing the odds that the first PASTA systems are built in countries that are e.g. less authoritarian, which could mean e.g. pushing for more investment and attention to AI development in these countries.
- Supporting and trying to speed up AI labs run by people who are likely to make wise decisions (about things like how to engage with governments, what AI systems to publish and deploy vs. keep secret, etc.)
Tension between the two frames. People who take the "caution" frame and people who take the "competition" frame often favor very different, even contradictory actions. Actions that look important to people in one frame often look actively harmful to people in the other.
For example, people in the "competition" frame often favor moving forward as fast as possible on developing more powerful AI systems; for people in the "caution" frame, haste is one of the main things to avoid. People in the "competition" frame often favor adversarial foreign relations, while people in the "caution" frame often want foreign relations to be more cooperative.
That said, this dichotomy is a simplification. Many people - including myself - resonate with both frames. But I have a general fear that the “competition” frame is going to be overrated by default for a number of reasons, as I discuss here.
Unfortunately, I’ve seen something like the above story play out in multiple significant instances (though I shouldn’t give specific examples).
And I’m especially worried about this dynamic when it comes to people in and around governments (especially in national security communities), because I perceive governmental culture as particularly obsessed with staying ahead of other countries (“If AI is dangerous, we’ve gotta build it first”) and comparatively uninterested in things that are dangerous for our country because they’re dangerous for the whole world at once (“Maybe we should worry a lot about pandemics?”).[1]
You could even argue (although I wouldn’t agree![2]) that to date, efforts to “raise awareness” about the dangers of AI have done more harm than good (via causing increased investment in AI, generally).
So it’s tempting to simply give up on the whole endeavor - to stay away from message spreading entirely, beyond people you know well and/or are pretty sure will internalize the important details. But I think we can do better.
This post is aimed at people who are good at communicating with at least some audience. This could be because of their skills, or their relationships, or some combination. In general, I’d expect to have more success with people who hear from you a lot (because they’re your friend, or they follow you on Twitter or Substack, etc.) than with people you reach via some viral blast of memery - but maybe you’re skilled enough to make the latter work too, which would be awesome. I'm asking communicators to hit a high bar: leave people with strong understanding, rather than just getting them to repeat a few sentences about AI risk.
Messages that seem risky to spread in isolation
First, here are a couple of messages that I’d rather people didn’t spread (or at least have mixed feelings about spreading) in isolation, i.e., without serious efforts to include some of the other messages I cover below.
One category is messages that generically emphasize the importance and potential imminence of powerful AI systems. The reason for this is in the previous section: many people seem to react to these ideas (especially when unaccompanied by some other key ones) with a “We’d better build powerful AI as fast as possible, before others do” attitude. (If you’re curious about why I wrote The Most Important Century anyway, see the footnote for my thinking.[3])
Another category is messages that emphasize that AI could be risky/dangerous to the world, without much effort to fill in how, or with an emphasis on easy-to-understand risks.
- Since “dangerous” tends to imply “powerful and important,” I think there are similar risks to the previous section.
- If people have a bad model of how and why AI could be risky/dangerous (missing key risks and difficulties), they might be too quick to later say things like “Oh, turns out this danger is less bad than I thought, let’s go full speed ahead!” Below, I outline how misleading “progress” could lead to premature dismissal of the risks.
Messages that seem important and helpful (and right!)
We should worry about conflict between misaligned AI and all humans
Unlike the messages discussed in the previous section, this one directly highlights why it might not be a good idea to rush forward with building AI oneself.
The idea that an AI could harm the same humans who build it has very different implications from the idea that AI could be generically dangerous/powerful. Less “We’d better get there before others,” more “there’s a case for moving slowly and working together here.”
The idea that AI could be a problem for the same people who build it is common in fictional portrayals of AI (HAL 9000, Skynet, The Matrix, Ex Machina) - maybe too much so? It seems to me that people tend to balk at the “sci-fi” feel, and what’s needed is more recognition that this is a serious, real-world concern.
The main pieces in this series making this case are Why would AI “aim” to defeat humanity? and AI could defeat all of us combined. There are many other pieces on the alignment problem (see list here); also see Matt Yglesias's case for specifically embracing the “Terminator”/Skynet analogy.
I’d be especially excited for people to spread messages that help others understand - at a mechanistic level - how and why AI systems could end up with dangerous goals of their own, deceptive behavior, etc. I worry that by default, the concern sounds like lazy anthropomorphism (thinking of AIs just like humans).
Transmitting ideas about the “how and why” is a lot harder than getting people to nod along to “AI could be dangerous.” I think there’s a lot of effort that could be put into simple, understandable, relatable metaphors/analogies/examples (my pieces make some effort in this direction, but there’s tons of room for more).
AIs could behave deceptively, so “evidence of safety” might be misleading
I’m very worried about a sequence of events like:
- As AI systems become more powerful, there are some concerning incidents, and widespread concern about “AI risk” grows.
- But over time, AI systems are “better trained” - e.g., given reinforcement to stop them from behaving in unintended ways - and so the concerning incidents become less common.
- Because of this, concern dissipates, and it’s widely believed that AI safety has been “solved.”
- But what’s actually happened is that the “better training” has caused AI systems to behave deceptively - to appear benign in most situations, and to cause trouble only when (a) this wouldn’t be detected or (b) humans can be overpowered entirely.
I worry about AI systems’ being deceptive in the same way a human might: going through chains of reasoning like “If I do X, I might get caught, but if I do Y, no one will notice until it’s too late.” But it can be hard to get this concern taken seriously, because it means attributing behavior to AI systems that we currently associate exclusively with humans (today’s AI systems don’t really do things like this[4]).
One of the central things I’ve tried to spell out in this series is why an AI system might engage in this sort of systematic deception, despite being very unlike humans (and not necessarily having e.g. emotions). It’s a major focus of two of the pieces in this series.
Whether this point is widely understood seems quite crucial to me. We might end up in a situation where (a) there are big commercial and military incentives to rush ahead with AI development; (b) we have what seems like a set of reassuring experiments and observations.
At that point, it could be key whether people are asking tough questions about the many ways in which “evidence of AI safety” could be misleading, which I discussed at length in AI Safety Seems Hard to Measure.
Why AI safety could be hard to measure
In previous pieces, I argued that:
- If we develop powerful AIs via ambitious use of the “black-box trial-and-error” common in AI development today, then there’s a substantial risk that:
  - These AIs will develop unintended aims (states of the world they make calculations and plans toward, as a chess-playing AI "aims" for checkmate);
  - These AIs could deceive, manipulate, and even take over the world from humans entirely as needed to achieve those aims.
- People today are doing AI safety research to prevent this outcome, but such research has a number of deep difficulties:
“Great news - I’ve tested this AI and it looks safe.” Why might we still have a problem?

| Problem | Key question | Explanation |
|---|---|---|
| The Lance Armstrong problem | Did we get the AI to be actually safe or good at hiding its dangerous actions? | When dealing with an intelligent agent, it’s hard to tell the difference between “behaving well” and “appearing to behave well.” When professional cycling was cracking down on performance-enhancing drugs, Lance Armstrong was very successful and seemed to be unusually “clean.” It later came out that he had been using drugs with an unusually sophisticated operation for concealing them. |
| The King Lear problem | The AI is (actually) well-behaved when humans are in control. Will this transfer to when AIs are in control? | It's hard to know how someone will behave when they have power over you, based only on observing how they behave when they don't. AIs might behave as intended as long as humans are in control - but at some future point, AI systems might be capable and widespread enough to have opportunities to take control of the world entirely. It's hard to know whether they'll take these opportunities, and we can't exactly run a clean test of the situation. Like King Lear trying to decide how much power to give each of his daughters before abdicating the throne. |
| The lab mice problem | Today's "subhuman" AIs are safe. What about future AIs with more human-like abilities? | Today's AI systems aren't advanced enough to exhibit the basic behaviors we want to study, such as deceiving and manipulating humans. Like trying to study medicine in humans by experimenting only on lab mice. |
| The first contact problem | Imagine that tomorrow's "human-like" AIs are safe. How will things go when AIs have capabilities far beyond humans'? | AI systems might (collectively) become vastly more capable than humans, and it's ... just really hard to have any idea what that's going to be like. As far as we know, there has never before been anything in the galaxy that's vastly more capable than humans in the relevant ways! No matter what we come up with to solve the first three problems, we can't be too confident that it'll keep working if AI advances (or just proliferates) a lot more. Like trying to plan for first contact with extraterrestrials (this barely feels like an analogy). |
An analogy that incorporates these challenges is Ajeya Cotra’s “young businessperson” analogy:
Imagine you are an eight-year-old whose parents left you a $1 trillion company and no trusted adult to serve as your guide to the world. You must hire a smart adult to run your company as CEO, handle your life the way that a parent would (e.g. decide your school, where you’ll live, when you need to go to the dentist), and administer your vast wealth (e.g. decide where you’ll invest your money).
You have to hire these grownups based on a work trial or interview you come up with -- you don't get to see any resumes, don't get to do reference checks, etc. Because you're so rich, tons of people apply for all sorts of reasons.
If your applicants are a mix of "saints" (people who genuinely want to help), "sycophants" (people who just want to make you happy in the short run, even when this is to your long-term detriment) and "schemers" (people who want to siphon off your wealth and power for themselves), how do you - an eight-year-old - tell the difference?
AI projects should establish and demonstrate safety (and potentially comply with safety standards) before deploying powerful systems
I’ve written about the benefits we might get from “safety standards.” The idea is that AI projects should not deploy systems that pose too much risk to the world, as judged by a systematic evaluation regime: AI systems could be audited to see whether they are safe. I've outlined how AI projects might self-regulate by publicly committing to having their systems audited (and not deploying dangerous ones), and how governments could enforce safety standards both nationally and internationally.
Today, development of safety standards is in its infancy. But over time, I think it could matter a lot how much pressure AI projects are under to meet safety standards. And I think it’s not too early, today, to start spreading the message that AI projects shouldn’t unilaterally decide to put potentially dangerous systems out in the world; the burden should be on them to demonstrate and establish safety before doing so.
How standards might be established and become national or international
I previously laid out a possible vision on this front, which I’ll give a slightly modified version of here:
- Today’s leading AI companies could self-regulate by committing not to build or deploy a system that they can’t convincingly demonstrate is safe (e.g., see Google’s 2018 statement, "We will not design or deploy AI in weapons or other technologies whose principal purpose or implementation is to cause or directly facilitate injury to people”).
  - Even if some people at the companies would like to deploy unsafe systems, it could be hard to pull this off once the company has committed not to.
  - Even if there’s a lot of room for judgment in what it means to demonstrate an AI system is safe, having agreed in advance that certain evidence is not good enough could go a long way.
- As more AI companies are started, they could feel soft pressure to do similar self-regulation, and refusing to do so could be off-putting to potential employees, investors, etc.
- Eventually, similar principles could be incorporated into various government regulations and enforceable treaties.
  - Governments could monitor for dangerous projects using regulation and even overseas operations. E.g., today the US monitors (without permission) for various signs that other states might be developing nuclear weapons, and might try to stop such development with methods ranging from threats of sanctions to cyberwarfare or even military attacks. It could do something similar for any AI development projects that are using huge amounts of compute and haven’t volunteered information about whether they’re meeting standards.
Alignment research is prosocial and great
Most people reading this can’t go and become groundbreaking researchers on AI alignment. But they can contribute to a general sense that the people who can do this (mostly) should.
Today, my sense is that most “science” jobs are pretty prestigious, and seen as good for society. I have pretty mixed feelings about this:
- I think science has been good for humanity historically.
- But I worry that as technology becomes more and more powerful, there’s a growing risk of a catastrophe (particularly via AI or bioweapons) that wipes out all the progress to date and then some. (I've written that the historical trend to date arguably fits something like "Declining everyday violence, offset by bigger and bigger rare catastrophes.") I think our current era would be a nice time to adopt an attitude of “proceed with caution” rather than “full speed ahead.”
- I resonate with Toby Ord’s comment (in The Precipice), “humanity is akin to an adolescent, with rapidly developing physical abilities, lagging wisdom and self-control, little thought for its longterm future and an unhealthy appetite for risk.”
I wish there were more effort, generally, to distinguish between especially dangerous science and especially beneficial science. AI alignment seems squarely in the latter category.
I’d be especially excited for people to spread messages that give a sense of the specifics of different AI alignment research paths, how they might help or fail, and what’s scientifically/intellectually interesting (not just useful) about them.
The main relevant piece in this series is High-level hopes for AI alignment, which distills a longer piece (How might we align transformative AI if it’s developed very soon?) that I posted on the Alignment Forum.
There are a (hopefully growing) number of other careers that I consider especially valuable, which I'll discuss in my next post on this topic.
It might be important for companies (and other institutions) to act in unusual ways
It always makes me sweat when I’m talking to someone from an AI company and they seem to think that commercial success and benefiting humanity are roughly the same goal/idea.
A lot of the most helpful actions might be “out of the ordinary.” When racing through a minefield, I hope key actors will:
- Put more effort into alignment, threat assessment, and security than is required by commercial incentives;
- Consider measures for avoiding races and global monitoring that could be very unusual, even unprecedented;
- Do all of this in the possible presence of ambiguous, confusing information about the risks.
(To be clear, I don't think an AI project's only goal should be to avoid the risk of misaligned AI. I've given this risk a central place in this piece partly because I think it's especially at risk of being too quickly dismissed - but I don't think it's the only major risk. I think AI projects need to strike a tricky balance between the caution and competition frames, and consider a number of issues beyond the risk of misalignment. But I think it's a pretty robust point that they need to be ready to do unusual things rather than just following commercial incentives.)
I’m nervous about a world in which:
- Most people stick with paradigms they know - a company should focus on shareholder value, a government should focus on its own citizens (rather than global catastrophic risks), etc.
- As the pace of progress accelerates, we’re sitting here with all kinds of laws, norms and institutions that aren’t designed for the problems we’re facing - and can’t adapt in time. A good example would be the way governance works for a standard company: it’s legally and structurally obligated to be entirely focused on benefiting its shareholders, rather than humanity as a whole. (There are alternative ways of setting up a company without these problems![5])
At a minimum (as I argued previously), I think AI companies should be making sure they have whatever unusual governance setups they need in order to prioritize benefits to humanity - not returns to shareholders - when the stakes get high. I think we’d see more of this if more people believed something like: “It might be important for companies (and other institutions) to act in unusual ways.”
We’re not ready for this
If we’re in the most important century, there’s likely to be a vast set of potential challenges ahead of us, most of which have gotten very little attention. (More here: Transformative AI issues (not just misalignment): an overview)
If it were possible to slow everything down, by default I’d think we should. Barring that, I’d at least like to see people generally approaching the topic of AI with a general attitude along the lines of “We’re dealing with something really big here, and we should be trying really hard to be careful and humble and thoughtful” (as opposed to something like “The science is so interesting, let’s go for it” or “This is awesome, we’re gonna get rich” or “Whatever, who cares”).
I’ll re-excerpt this table from an earlier piece:
| Situation | Appropriate reaction (IMO) |
|---|---|
| "This could be a billion-dollar company!" | "Woohoo, let's GO for it!" |
| "This could be the most important century!" | "... Oh ... wow ... I don't know what to say and I somewhat want to vomit ... I have to sit down and think about this one." |
I’m not at all sure about this, but one potential way to spread this message might be to communicate, with as much scientific realism, detail and believability as possible, about what the world might look like after explosive scientific and technological advancement brought on by AI (for example, a world with digital people). I think the enormous unfamiliarity of some of the issues such a world might face - and the vast possibilities for utopia or dystopia - might encourage an attitude of not wanting to rush forward.
How to spread messages like these?
I’ve tried to write a series that explains the key issues to careful readers, hopefully better equipping them to spread helpful messages. From here, individual communicators need to think about the audiences they know and the mediums they use (Twitter? Facebook? Essays/newsletters/blog posts? Video? In-person conversation?) and what will be effective with those audiences and mediums.
The main guidelines I want to advocate:
- Err toward sustained, repeated, relationship-based communication as opposed to prioritizing “viral blasts” (unless you are so good at the latter that you feel excited to spread the pretty subtle ideas in this piece that way!)
- Aim high: try for the difficult goal of “My audience walks away really understanding key points” rather than the easier goal of “My audience has hit the ‘like’ button for a sort of related idea.”
- A consistent piece of feedback I’ve gotten on my writing is that making things as concrete as possible is helpful - so giving real-world examples of problems analogous to the ones we’re worried about, or simple analogies that are easy to imagine and remember, could be key. But it’s important to choose these carefully so that the key dynamics aren’t lost.
Footnotes

[2] When I imagine what the world would look like without any of the efforts to “raise awareness,” I picture a world with close to zero awareness of - or community around - major risks from transformative AI. While this world might also have more time left before dangerous AI is developed, on balance this seems worse. A future piece will elaborate on the many ways I think a decent-sized community can help reduce risks.
[3] I do think “AI could be a huge deal, and soon” is a very important point that somewhat serves as a prerequisite for understanding this topic and doing helpful work on it, and I wanted to make this idea more understandable and credible to a number of people - as well as to create more opportunities to get critical feedback and learn what I was getting wrong.
But I was nervous about the issues noted in this section. With that in mind, I did the following things:
- The title, “most important century,” emphasizes a time frame that I expect to be less exciting/motivating for the sorts of people I’m most worried about (compared to the sorts of people I most wanted to draw in).
- I tried to persistently and centrally raise concerns about misaligned AI (raising it in two pieces, including one (guest piece) devoted to it, before I started discussing how soon transformative AI might be developed), and extensively discussed the problems of overemphasizing “competition” relative to “caution.”
- I ended the series with a piece arguing against being too “action-oriented.”
- I stuck to “passive” rather than “active” promotion of the series, e.g., I accepted podcast invitations but didn’t seek them out. I figured that people with proactive interest would be more likely to give in-depth, attentive treatments rather than low-resolution, oversimplified ones.
I don’t claim to be sure I got all the tradeoffs right.
[4] There are some papers arguing that AI systems do something like this (e.g., see the “Challenges” section of this post), but I think the dynamic is overall pretty far from what I’m most worried about.