The Social Alignment Problem

post by irving (judith) · 2023-04-28T14:16:17.825Z · LW · GW · 13 comments

Contents

  Real Talking To The Public has never been tried
    They won’t listen/They won’t understand
    Don’t cry wolf/Preserve dry powder
    We need to avoid angering the labs
    Don’t create more idiot disaster monkeys
  My Bigger Point: We Lack Coordination

TLDR: I think public outreach is a very hopeful path to victory. More importantly, extended large-scale conversation around questions such as whether public outreach is a hopeful path to victory would be very likely to decrease p(doom).

You’re a genius mechanical engineer in prison. You stumble across a huge bomb rigged to blow in a random supply closet. You shout for two guards passing by, but they laugh you off. You decide to try to defuse it yourself.

This is arguably a reasonable response, given that this is your exact skill set. This is what you were trained for. But after a few hours of fiddling around with the bomb, you start to realize that it's much more complicated than you thought. You have no idea when it’s going to go off, but you start despairing that you can defuse it on your own. You sink to the floor with your face in your hands. You can’t figure it out. Nobody will listen to you.

 

Real Talking To The Public has never been tried

Much like the general public has done with the subject of longevity, I think many people in our circle have adopted an assumption of hopelessness toward social alignment (public outreach) before a relevant amount of effort has been expended. In truth, there are many reasons to expect this strategy to be quite realistic, and very positively impactful too. A world in which the cause of AI safety is as trendy as the cause of climate change, and in which society is as knowledgeable about questions of alignment as it is about vaccine efficacy (meaning not even that knowledgeable), is one where sane legislation designed to slow capabilities and invest in alignment becomes probable, and where capabilities research is stigmatized and labs find talent and resources harder to come by.

I’ve finally started to see individual actors taking steps towards this goal, but I’ve seen a shockingly small amount of coordinated discussion about it. When the topic is raised, there are four common objections: They Won’t Listen, Don’t Cry Wolf, Don’t Annoy the Labs, and Don’t Create More Disaster Monkeys.
 

They won’t listen/They won’t understand

I cannot overstate how utterly false this is at this point.

It’s understandable that this has been our default belief. I think debating e/accs on Twitter has broken our brains. Explaining again and again why something smarter than you that doesn’t care about you is dangerous, and being met with these arguments [LW(p) · GW(p)], is a soul-crushing experience. It made sense to expect that if it’s this hard to explain to a fellow computer enthusiast, there’s no hope of reaching the average person. For a long time I avoided talking about it with my non-tech friends (let's call them "civilians") for that reason. However, when I finally did, it felt like the breath of life. My hopelessness broke, because they instantly and vigorously agreed, even finishing some of my arguments for me. Every single AI safety enthusiast I’ve spoken with who has engaged with civilians has had the exact same experience. I think it would be very healthy for anyone who is still pessimistic about convincing people to just try talking to one non-tech person in their life about this. It’s an instant shot of hope.

The truth is, if we were to decide that getting the public on our side is our goal, I think we would have one of the easiest jobs any social alignment researchers have ever had.

Far from being closed to the idea, civilians in general literally already get it. It turns out Terminator and The Matrix have been in their minds this whole time. We assumed they'd been inoculated against serious AI risk concern; it turns out they walked out of the theaters thinking “wow, that’ll probably happen someday”. They’ve been thinking that the entire time we’ve been agonizing about nobody understanding us. And now, ChatGPT has taken that “someday” and made it feel real.

At this point AI optimists are like the Black Knight from Monty Python. You can slice apart as many of their arguments as you want but they can’t be killed – however, you can just go around them. We're spending all our time and effort debating them and getting nowhere, when we could just go around them to the hosts of civilians perfectly willing to listen.

The belief is already there. They just haven’t internalized it, like a casual Christian casually sinning even though their official internal belief is that they’re risking being tortured literally forever. They just need the alief.

A month ago, there had only been a handful of attempts at social alignment from us. Rob Miles has been producing accessible, high-quality content for half a decade. A petition [LW · GW] was floated to shut down Bing, which we downvoted into oblivion. There was the Bankless podcast. There was the 6-month open letter, and then the Time opinion piece and several podcast appearances. This wasn't that much effort as PR pushes go, and yet it accomplished a very appreciable news cycle that hasn't yet ended (although there were unforced errors in messaging that more coordination likely could have avoided).

Additionally, it seems to me that the incentives of almost all relevant players already align with being open to the message of slowing progress (beyond the free-bingo-square incentive of not wanting to die).

(Tangent: I think it’s worth mentioning here that stigmatization also seems very relevant to the problem of Chinese AI enthusiasm. China has invested many resources into mitigating climate change risk in order to improve its global reputation. A future where AI capabilities research carries a heavy moral stigma globally and China decides to disinvest as a result isn’t entirely unrealistic. China has the additional incentive here that American companies are clearly ahead: a global pause would benefit China, just as it would benefit smaller companies wanting a chance to catch up, so China would have reason not to undermine an American pause.)

 

Don’t cry wolf/Preserve dry powder

The question of whether now is the time to seriously go public is a valid one. But the question assumes that at some point in the future it will be the correct time. This almost mirrors the AI risk debate itself: even if the crucial moment is in the future, it doesn’t make sense to wait until then to start preparing. A public-facing campaign can take months to plan and hone, and it seems like it makes sense to start preparing one now, even if we decide that now isn’t the correct moment.

 

We need to avoid angering the labs

Practically speaking I’ve seen no evidence that the very few safety measures labs have taken have been for our benefit. Possibly, to some small extent, they've been for PR points because of public concern we’ve raised, but certainly not out of any loyalty or affection for us. The opportunity to regulate them or impose bottlenecks on access to talent and resources via stigmatization of the field of capabilities research seems much larger than the expected benefit of hoping that they'll hold back because we've been polite.

 

Don’t create more idiot disaster monkeys

It’s true that we’re mostly in this situation because certain people heard about the arguments for risk and either came up with terrible solutions to them or smelled a potent fount of personal power. A very valid concern I’ve heard raised is that something similar could happen with governments, which would be an even worse situation than the one we’re in.

It seems unlikely that AI capabilities can advance much further without governments and other parties taking notice of their potential. If we could have a choice between them realizing the potential without hearing about the risks, or realizing the potential via hearing about the risks, the latter seems preferable. The more the public is convinced of the risk, the more incentivized governments are to act as though they are, too. Additionally, there doesn't seem to be an alternative. Unaligned superintelligence approaches by default unless something changes.


Without concerted effort from us, there are two possible outcomes. Either the current news cycle fizzles out like the last ones did, or AI risk goes truly mainstream but we lose all control over the dialogue. If it fizzles out, there’s always a chance to start another one after the next generation of AI and another doom-dice roll, assuming we don’t just make the same "not yet" argument again when that time comes. But even then, much of our dry powder will be gone and our time much shorter. It's hard to say how bad losing control over the dialogue could be; I don’t know how asinine the debate around this could get. But if we believe that our thinking about this topic tends to be more correct than the average person's, retaining control over it should have positive expected value.

Realistically, the latter failure appears much, much more likely. I’m fairly certain that this movement is in the process of taking off with or without us. There are a few groups already forming that are largely unaffiliated with EA/rationalism but are very enthusiastic. They've mostly heard of the problem through us, but they're inviting people who haven't, who will invite more people who haven't. I’ve started to see individuals scared out of all reason, sounding more and more unhinged, because they have no guidance and nowhere to get it, at least until they find these groups. A very realistic possible future includes a large AI safety movement that we have no influence over, doing things we would never have sanctioned for goals we disagree with. Losing the ability to influence something once it gets sufficiently more powerful than you; why does that sound familiar?

 

My Bigger Point: We Lack Coordination

You probably disagree with many things I’ve said, which brings me to my main point: questions like these haven’t been discussed enough for there to be much prior material to reference, let alone consensuses reached. I could be wrong about a lot of what I suggested; maybe going public is the wrong move, or maybe now isn't the right time; but I wouldn't know it, because there is no extended conversation around real-world strategy. The point has been raised a couple of times before that actions taken by individuals in our circle have been very uncoordinated. Every [LW · GW] time [LW · GW] this [LW · GW] is raised, some people agree and a handful of comment chains are written, but then the conversation fizzles out and nothing results.

One very annoying practical consequence of this lack of coordination is that I never have any idea what prominent figures like Eliezer are thinking. It would have been extremely useful, for example, to know how his meeting with Sam Altman went, or whether he considers the famous tweet to be as indicative of personal cruelty as it seems, but I had to watch a podcast for the former and still don't know the latter. It would have been useful for his TIME article to have been proofread by many people. It would currently be extremely useful to know what dialogue, if any, he’s having with Elon Musk (probably none, but if there is any, it changes the gameboard). I'm not wishing I could personally ask these questions; I'm wishing there were a public record of somebody asking him, after deeming these important data points for strategy. In general there seems to be no good way to cooperate with AI safety leadership.

I don’t like saying the phrase “we should”, but it is my strong belief that a universe in which a sizable portion of our dialogue and efforts is dedicated to ongoing, coordinated real-world strategizing is, ceteris paribus, much safer. It seems clear that this will be the case at some point: even most outreach-skeptics say only that now is too soon. But starting now can only maximize the time available.

To avoid passing the buck and simply hoping this time is different, I’ve set up the subreddit r/AISafetyStrategy to serve as a dedicated extended conversation about strategy for now, funded it with $1000 for operations, and am building a dedicated forum to replace it. I realize unilateral action like this is considered a little gauche on here. To be clear, I think these actions are very suboptimal – I would much prefer something with equivalent function to be set up with the approval and input of everyone here, and I hope something is created that supersedes my thing. Even simply adding a “strategy” tag to LessWrong would probably be better. But until that something better exists, feel free to join and contribute your strategy questions and ideas.
 

13 comments


comment by romeostevensit · 2023-04-29T03:12:42.254Z · LW(p) · GW(p)

An important dimension of previous social movements involved giving concerned people concrete things to do so that emotional energy doesn't get wasted and/or harm the person.

comment by gilch · 2023-04-30T01:11:58.394Z · LW(p) · GW(p)

Well, the bomb story was a good hook, but it doesn't seem to fit the rest very well. There are good points in here, so I weak-upvoted anyway.

Re cry wolf/dry powder, if there's one thing I learned from our COVID experience, it's that the timing of exponential changes is hard to call and reason about. Even when you're right, you're either going to be too early, or too late. Too late is unacceptable in this case. ChatGPT made the mainstream finally take notice. The time to act is now.

comment by Dagon · 2023-04-28T17:01:07.081Z · LW(p) · GW(p)

The prisoner/bomb analogy is not useful, for the reason that there's zero chance the prisoner is incorrect.  The bomb is definitely there, and it's just a matter of getting a guard to look, not a complicated multi-step set of actions that may or may not delay a maybe-fictional-or-dud bomb.

I'm deeply suspicious of "we should" (as you acknowledge you are), and I probably won't put much effort into this, but I want to give you full credit and honor for actual attempts to improve things.  The opposite of your apology for unilateral action - you deserve praise for action rather than discussion.  All real action is unilateral to some extent - it cannot start as a full consensus of humanity, and it's probably not going to get there ever.

Wait, reverse the sequence of my pessimism and my praise.  Good job, whether it works or not!

Replies from: judith
comment by irving (judith) · 2023-04-28T22:46:53.433Z · LW(p) · GW(p)

Yes, I should have been clearer that I was addressing people who have a very high p(doom). The prisoner/bomb analogy is indeed somewhat of a simplification, but I do think there's a valid connection in the form of half-heartedly attempting to get the assistance of people more powerful than you and prematurely giving it up as hopeless.

Thank you for your kind words! I was expecting most reactions to be fairly anti-"we should", but I figured it was worth a try.

comment by Seth Herd · 2023-05-01T06:59:45.916Z · LW(p) · GW(p)

I couldn't agree more. Count me in. I'll join the subreddit.

My one thought is that a good PR strategy should include ways to not create polarization on the issue.

I know you saw my post AI scares and changing public beliefs [LW · GW] but I thought I'd link it here as a few thoughts on the polarization issue.

comment by rotatingpaguro · 2023-04-29T19:56:01.954Z · LW(p) · GW(p)

I think it would be very healthy for anyone who is still pessimistic about convincing people to just try talking to one non-tech person in their life about this. It’s an instant shot of hope.

I've experienced this, but I've also seen people become dependent on ChatGPT.

Replies from: judith
comment by irving (judith) · 2023-05-02T03:50:23.337Z · LW(p) · GW(p)

I've seen the latter but much more of the former.

comment by [deleted] · 2023-05-02T07:27:29.334Z · LW(p) · GW(p)

On the subject of losing control of the discourse, this tumblr post on the development of traditional social movements seems to have some relevant insights. (this is not to say it's 1:1 applicable) https://balioc.tumblr.com/post/187004568356/your-ideology-if-it-gets-off-the-ground-at-all

(Disclaimer: I'm newer to the alignment problem than most here, I'm not an experienced researcher, just sharing this in case it helps)

Your ideology – if it gets off the ground at all – will start off with a core base of natural true believers.  These are the people for whom the ideology is made.  Unless it’s totally artificial, they are the people by whom the ideology is made.  It serves their psychological needs; it’s compatible with their temperaments; it plays to their interests and preferences.  They’re easy to recruit, because you’re offering something that’s pretty much tailor-made for them.   

This is the level at which ideological movements are the most diverse, in terms of human qualities.  Natural true believers are heavily selected, and different movements select for different things.  A natural true radical feminist is a very different creature from a natural true fascist, and neither of them looks very much like a natural true Hastur cultist.  

Life in a baby movement, populated entirely (or almost entirely) by natural true believers, can be pretty sweet.  You may not necessarily be getting a lot done, but you’re surrounded by kindred spirits, and that’s worth a lot by itself.

One of the most common ideological failure modes involves imagining that expansion is tantamount to “transforming outsiders into natural true believers.”  It’s not.  The population of natural true believers is a limited and precious resource, and while it’s theoretically possible to make more…if you have some truly gifted cultural engineers…it’s a difficult, costly, and failure-prone process at the best of times.  It doesn’t work at scale.  

You can grow, but the growth process necessarily involves attracting other kinds of people to your ideology.  And then it won’t be the same.  

Success, I think, requires some understanding of what growth is actually going to bring you, and being able to roll with those changes. 

**********

The first outsiders to flock to your banner will be the perpetual seekers – or, to put it less charitably, the serial converters.  These are the hipsters and connoisseurs of belief, the people who join movements because they really like joining movements.  

They’ll think that you and your doctrines are amazing, at least for a little while.  They’re primed for that.  But they get bored easily, and they like chasing after the high of new epiphanies.  Unless you figure out how to hold their attention in a sustained way, which requires constant work, they’ll drift off.  

This is the second-most-common way for a movement to die (after “never really getting anywhere in the first place”).  You attract a few interested seekers, but not enough of them to give you a foothold in less-accessible demographics, and after a while they just give up and move on.  If you’re lucky, they leave you with something like the original core of natural true believers, sadder but wiser after their experience trying to go big.  If you’re unlucky, they cause lots of drama and shred everything on the way out.  

These guys can be very annoying to natural true believers, but if you want to expand, you 100% absolutely need them.  If you’re smart, you’ll take precautions to make sure they don’t walk off with key pieces of your infrastructure. 

**********

If you display some serious growth potential, you start getting the profiteers, who don’t much care about your doctrine or your happy vibe but do care about that growth potential.  These are people who see your movement as a vehicle for their private ambitions, who want to sell you to the world and ride you all the way to the top.

…I’ve used some mercantile language here, but they’re not necessarily merchants trying to get rich, although that’s the prototype case I have in mind.  They may be going for political power, or simple fame, or all sorts of things.  Whatever it is they want, they think that you can help them get it, because your star is rising.  

In the long term, even the medium term, the profiteers can utterly wreck you if you’re not careful.  They tend to amass a lot of movement-internal power very fast, because they have big plans, and they promise concrete rewards quick.  But they usually don’t get whatever-it-is that the movement is really about, and even if they do get it, they don’t care as much as you do.  Their instinct is to make your Whole Thing as bland and generic and palatable as they can, so that they can sell it to the widest possible consumer base in the shortest possible timeframe.  This is a miserable and degrading experience, of course, but it’s also bad strategy in an eating-your-seed-corn kind of way.  The world gets a constant stream of bland generic palatable Hot New Things, and it chews through them fast.  There’s a future in being something genuinely weird enough to change the world; there’s no future in being last year’s fad.  The profiteers, however, aren’t interested in being careful shepherds of your movement’s power and credibility.  The arc of an individual’s career is not that long.  Consciously or otherwise, they are happy to burn you up as fuel for themselves.

In the short term, the profiteers are super awesome.  They will work tirelessly to help your movement grow, and they will do so in a very effective and practical-minded sort of way, without getting bogged down in the dysfunctions and the arcane abstract concerns that (probably) dominate your natural true believers.

Yes – these first three groups map roughly onto the geeks, MOPs, and sociopaths of that one Meaningness essay.  There’s a lot of applicable insight in there.  It’s important to note, however, that if your group is built around a serious ideology rather than a consumable toy, standard-issue Members of the Public aren’t going to come flocking to you during these early stages.  Members of the Public don’t adopt new ideologies that easily.  Your weirdos will be able to attract only other, different kinds of weirdos.

**********

Close on the heels of the profiteers, you will get the exploiters.  Where the profiteers are trying to sell you to the world, the exploiters are trying to sell themselves to you; where the profiteers are trying to make your movement grow (for their own purposes), the exploiters see you as an environment that’s already big enough for them to thrive in it.  

Some of them are hucksters and con artists.  Some of them are, yes, sexual predators in the classic mold, going after a known population of unusually-naive unusually-vulnerable people who let their guard down around anyone speaking the right shibboleths.  (That describes pretty much any ideological movement at this stage.  Sorry.)

And some of them are just lonely people desperate to belong to something, who think that they’ve found your movement’s cheat codes for belonging.  Some of them are fetishist-types who don’t have the whatever-it-takes to be one of your natural true believers, but who admire or desire that thing, and hope that they can be around their favorite people and get a Your Movement GF or whatever.  

Often they’ll be harmless.  Sometimes they really, really, really won’t.  There will be more of them than you expect. 

At the very least, they’re a marker of success.  Apparently you’re worth exploiting!

**********

You’ll know that you’ve really made it, as a movement, when you start getting the fifth wave of converts: the status-mongers.  They’re joining up with you because they think it will be good for their social lives or their careers – not in an “I’m going to be the guy who gets rich off of this” kind of way, but in a much lower-key “this makes me look cool or smart or moral, this is good for my reputation” kind of way.  They want the generic approval that comes from being on the forefront of the zeitgeist, and apparently the forefront of the zeitgeist is where you are, now.  Congratulations. 

The arrival of the status-mongers represents a crisis point for your ideology.  There will be a lot of them; they’ll soon outnumber all your other people by an order of magnitude or more.  (Status-mongers attract more status-mongers, as each one makes it clearer to the world-at-large that your ideology is in fact cool.)  They will become the general public’s image of your movement, whether you like it or not.  Most of them definitely will not get your Whole Thing, not really.  They are interested mostly in being comfortable, in showing off to unenlightened mainstream audiences, and in using your doctrine as a cudgel to beat on their personal rivals.  

At this point you don’t really have to fear disappearing into obscurity, but you’re in more danger than ever of losing your way and becoming something totally alien.  The status-mongers will be doing their level best to make that happen.  You will also start attracting enemies far more powerful and dangerous than any you’ve known before.  Anything truly popular and high-status represents a threat to someone big.  You need to start prepping for persecution, culture war, and other varieties of large-scale social conflict.  

**********

If you can weather all that and come out on top, you finally get the sixth wave of converts, the big prize: the normies.  People will join your movement because that’s what everyone else is doing, because that’s what they’ve been taught, because they don’t want to stand out or make waves, because they don’t really care and you represent a plausible default.  

Most of the people out there are normies.  

That’s the endgame, the victory condition for an expansionist ideology: that you are the normies’ choice.  

**********

These are the groups that are out there.  This is what you’ll get, when you turn your gaze toward the path of growth.  This, and not whatever visions of radical social transformation dance before your eyes when you look at your beloved allies who are just like you.

Brace yourself for it.

comment by AGO · 2023-04-30T15:03:12.973Z · LW(p) · GW(p)

I’ve often heard and seen the divide people are talking about where people involved in the tech world are a lot more skeptical than your average non-tech friend. I’m curious what people think is the reason for this. Is the main claim that people in tech have been lulled into a false sense of security by familiarity? Or perhaps that they look down on safety concerns as coming from a lay audience scared by Terminator or vague sci fi ideas without understanding the technology deeply enough?

Replies from: judith
comment by irving (judith) · 2023-05-02T04:10:45.778Z · LW(p) · GW(p)

I honestly can't say. I wish I could.

comment by Dande · 2023-04-29T14:11:15.606Z · LW(p) · GW(p)

It’s true that we’re mostly in this situation because certain people heard about the arguments for risk and either came up with terrible solutions to them or smelled a potent fount of personal power.

A little new to the AI Alignment Field building effort, would you put head researchers at OpenAI in this category?

Replies from: judith
comment by irving (judith) · 2023-05-02T03:51:44.278Z · LW(p) · GW(p)

Hmm, not necessarily the researchers, but the founders undoubtedly. OpenAI was specifically formed to increase AI safety.

comment by RussellThor · 2023-04-29T02:52:48.608Z · LW(p) · GW(p)

Good article. I agree that we definitely need to try now, and it's likely that if we don't, another group will take over the narrative.

I also think that it is important for people to know what they are working towards as well as away from. Imagining what a positive Singularity would be like for them personally is something I think the general public should also start doing. Positive visions inspire people; we know that. To me it's obvious that such a future would involve different groups with different values somewhat going their own ways. Thinking about it, that is about the only thing I can be sure of. Some people will obviously be much more enthusiastic about biological/tech enhancement than others, and of course about living off Earth. We agree that coherent extrapolated volition is important; it's time we thought a bit about what its details are.