Simpler explanations of AGI risk
post by Seth Herd · 2023-05-14T01:29:29.289Z
We're getting a shot at presenting our concerns about AI X-risk to the general public. It would be useful to have a brief presentation that plays well with less-technical people, or technical people who don't want to listen to a half hour of explanation just at that moment. The other goal here is to avoid polarization with a gentle approach. We don't want AGI risk to become polarized like the climate change "debate" did.
This is my suggestion for a conversation template, based on personal success. I'm hoping others chip in ideas and say what's worked for them.
- If we make something smarter than us, why wouldn't it become our overlord?
- We're going to make AI smarter than us,
- and by default, it will treat us like we treat all of the species we've accidentally eliminated.
- (You might be able to stop here. For some people, the above is totally intuitive.)
- We're not talking about tools here, like all current and previous AI.
- We're talking about something more like a new species, with its own goals and the intelligence to figure out how to accomplish them.
- We don't know how long this will take,
- Or how soon it might happen.
- GPT-4 is acing some exams and failing only the toughest tests of logic,
- and it's getting smarter with code that prompts it to do things like
- "check your reasoning"
- And break problems into pieces
- And call on other AI tools like WolframAlpha
- This will almost certainly replace a bunch of jobs,
- And it's definitely going to get smarter
- Something smarter than us will wind up outsmarting us,
- and doing whatever it wants.
- (We're unlikely to put even the first one in a box, given how we're treating AI now
- And if we do, it will probably outsmart us and get out
- And we'll keep making more until we screw one up)
- There's no good reason to think it's going to be nice
- Unless we get a hell of a lot better at building it so it's nice.
- Nobody knows how.
- Including the people saying "Oh we'll figure it out."
- Not one of them has a plan that sounds worth betting on,
- Let alone betting the future of the species on it.
- But we're not doomed.
- We just need to pull together and figure this out
- But quickly.
- -"Can't we just..."
- Maybe. But probably not.
- Tons of smart people have offered their "can't we just" suggestions.
- Not one of them stands up to sober, close inspection.
- Some of them offer ways to approach the problem, but they don't make it easy.
- Making a new being that actually loves us is not easy
- (Even humans aren't all that safe for other humans, and we have no idea how to reproduce what makes humans nice).
This approach has worked for me in conversation, but only when I also get the emotional tone right. Logic is emotional for everyone. People without strong rationalist ambitions are even more prone to think with their feelings. So:
- Don't argue.
- Arguing makes people want to prove you wrong.
- It engages their motivated reasoning and confirmation bias to find counterarguments,
- And to avoid thinking about your arguments.
- It's key to not get dragged into details
- You want to keep it brief, and you'll never get there if you go into a discussion of a point that doesn't really matter for the main argument
- For instance, "what do you mean by smarter?"
- Don't sound condescending.
- Sounding condescending will make them want to prove you wrong, as above
- This could color their whole take on the topic
- possibly for years that we don't have
- You've had this conversation a million times.
- They haven't.
- This all sounds weird and new
- And the new logic is likely to trip them up.
- So you'll need to be patient. If you're as impatient as I am, this is the hard part.
- Don't try to get them to agree with you on the spot.
- It's challenging to move on without a conclusion, but it's important.
- You can't change someone's mind.
- You can only offer arguments that will cause them to change their own minds,
- over time,
- IF they're thinking about them without wanting to prove you wrong.
This approach is intended for casual conversations, or for times when you've got the floor, but you don't want to overstay your welcome on that floor.
When the conversation gets sidetracked into details, steering it back to the top level, with epistemic modesty, seems useful. Ask something like "Can you really be sure that something smarter than us won't outsmart us somehow? I wish I could be sure, but I'm not." Or say something like "It just seems like we shouldn't trust something that thinks differently than us, if its goals were programmed or trained in without anyone really knowing how to do that." This may present you as being on the same team and at the same level as the person you're talking to.
This set of suggestions is offered with low certainty. I'm no expert at persuasion, but I have researched it a bit, and researched cognitive biases a lot.
I also tried to make a similar set of simple presentations as an accordion-style FAQ, to provide as a link instead of in conversation.
So, how could the above be better? Or is my premise mistaken?
9 comments
Comments sorted by top scores.
comment by Mitchell_Porter · 2023-05-14T02:23:01.314Z
> The other goal here is to avoid polarization
Opinion just within tech already seems pretty polarized, or rather, all over the place. You have doomers, SJWs, accelerationists, deniers... And avoiding all forms of polarization, at all scales, seems impossible. People naturally form opposing alliances. Is there a particular polarization that you especially want to prevent?
comment by Seth Herd · 2023-05-14T05:37:01.448Z
I agree that opinions are already divided in the tech community. I'm not sure about the emotional and communication dynamics. So I think it might be important to not make that divide worse, and instead make it easier for people to cross that divide.
I think most nontechnical people aren't polarized yet, and they probably get a vote, figuratively and literally. So trying to avoid polarizing them might still be worthwhile.
comment by Mitchell_Porter · 2023-05-14T19:53:49.105Z
I'm still very vague about what you want to prevent. You want non-technical people to all agree on something? To be mild rather than passionate, if they do disagree? Are you aiming to avoid political polarisation, specifically? Do you just want people to agree that there's a problem, but not necessarily agree on the solution?
comment by Seth Herd · 2023-05-14T20:02:38.820Z
Yes, it's fair to say that I'd like people to disagree mildly rather than passionately, if they do disagree. Belief in human-caused climate change actually decreased among half of the US population even as evidence accumulated, due to polarization effects. And I think those effects could be deadly here, since having a lot of people disagree might well produce no regulatory action whatsoever.
I don't think this is likely to polarize along existing political lines, and thank goodness. But it is a pretty important issue that people are passionate about, and that creates a strong potential for polarization.
comment by M. Matter · 2023-05-30T17:08:04.828Z
In a similar way, it would be helpful to find ways to overcome the Bystander Effect. That is, building awareness is necessary but not sufficient. Awareness without a sense of agency breeds hopelessness and fatalistic disengagement. So, an important next step, beyond what you discuss here, is to say, "And here are things we can do." I hope that list of things extends beyond "write your representatives and donate money." It seems cruel to tell people about a problem without hinting at ways they can act to mitigate it, even from completely outside the spheres of academia, venture capital, or the tech industry. I wonder whether any such ways exist.
comment by Seth Herd · 2023-05-30T17:48:49.440Z
Good point.
I think it's useful to separate the request to act from the argument itself. Feeling like you'd have to change your life if you allow yourself to believe there's a problem will activate motivated reasoning to preserve your current beliefs.
But feeling hopeless about the future, or even helpless, will do the same thing. So I'd alter this to include something along the lines of "there are things everyone can do to help, like asking for good public policies".
I think I did include an optimistic statement to head off hopelessness in both of those short treatments, but helplessness is important too.
comment by Ben Smith (ben-smith) · 2023-05-21T00:58:39.286Z
Well written, I really enjoyed this. This is not really on topic, but I'd be curious to read an "idiot's guide" or maybe an "autist's guide" on how to avoid sounding condescending.
comment by Seth Herd · 2023-05-21T01:20:56.227Z
Aw, thanks!
I think that not sounding condescending is absolutely critical to having good discussions on this (and many other obscure and technical topics).
I have had a lifelong journey of going from sounding condescending way too much, to sounding less condescending, at least when I remember to try. I don't know if I'm a bit on the autism spectrum, or just raised to value logic and winning arguments over social skills.
I think a lot of it is tone of voice and timing. I'm not going to get those by acting, so I just try to adopt a soft and patient emotional tone, and continually remind myself that the person I'm talking to hasn't thought about this topic nearly as much, and I probably sound like an idiot when I talk about other people's favorite topics. Finding points of agreement and voicing them before moving on to points of disagreement is key. So is not expecting to change someone's mind in the moment. I think offering ideas and perspectives, and letting people think them through is how people learn and change beliefs.
comment by Seth Herd · 2023-05-14T19:12:23.615Z
I also want to recommend the FAQ at the r/ControlProblem subreddit for similar purposes. It's well-written and more succinct than any other resource I know of.
If anyone has seen other good, brief writeups that can be adapted for conversation, I'd love to hear about them.