Ideas for improving epistemics in AI safety outreach

post by mic (michael-chen) · 2023-08-21T19:55:45.654Z · LW · GW · 6 comments

Contents

  What are some ways that AI safety field building may be epistemically unhealthy?
  Why are good epistemics valuable?
  Ideas to improve epistemics

In 2022 and 2023, there has been a growing focus on recruiting talented individuals to work on mitigating the potential existential risks posed by artificial intelligence. For example, we’ve seen an increase in the number of university clubs, retreats, and workshops dedicated to introducing people to the issue of existential risk from AI.

However, these efforts might foster an environment with suboptimal epistemics. Given the goal of enabling people to contribute positively to AI safety, there’s an incentive to focus on that outcome without worrying as much about whether our arguments are solid. Many people working on field building are not domain experts in AI safety or machine learning but are motivated by a belief that AI safety is an important issue. Some participants may believe that addressing the risks associated with AI is important without fully understanding the reasoning behind that belief or having engaged with strong counterarguments.

This post briefly examines this issue and suggests some ideas for improving epistemics in outreach efforts.

Note: I first drafted this in December 2022. Since then, concern about AI x-risk has become increasingly mainstream, so AI safety field builders should hopefully be relying on fewer weird, epistemically poor arguments. Even so, I think epistemics remain relevant to discuss, especially after a recent post [EA · GW] noted poor epistemics in EA community building.

What are some ways that AI safety field building may be epistemically unhealthy?

Why are good epistemics valuable?

For the sake of epistemic rigor, I’ll also make a few possible arguments about why epistemics may be overrated.

Ideas to improve epistemics

6 comments


comment by Thomas Kwa (thomas-kwa) · 2023-08-22T16:09:59.953Z · LW(p) · GW(p)

Stay in touch with the broader ML community (e.g., by following them on Twitter, attending AI events)

I got a lot of value out of attending ICML and would probably recommend attending an ML conference to anyone who has the resources. You actually get to talk to authors about their field and research process, which gets you a lot more than reading papers or reading Twitter.

Anyway, I think you missed one of the best ideas: actually trying to understand the arguments yourself and only using correct ones. An argument isn't correct just because it's "grounded" or "contemporary", although it is good to have supporting evidence. The steps all have to be locally valid, and you have to make valid assumptions. Sometimes an argument needs to be slightly changed from the most common version to be valid [1], but this only makes understanding it yourself more important.

Community builders often don't do technical research themselves, so my guess is that it's easy to underinvest here. But sometimes the required steps are as simple as listing out an argument in great detail and looking at it skeptically, or checking with someone who knows ML whether a current ML system has some property that we assume, and whether we expect it to arise anytime soon.

[1]: two examples: making various arguments compatible with Reward is not the optimization target [LW · GW], and making coherence arguments work even though AI systems will not necessarily have a single fixed utility function

Replies from: BrianTan, michael-chen
comment by BrianTan · 2023-08-22T17:37:11.623Z · LW(p) · GW(p)

Your last sentence in the first paragraph seems to be cut off at "gets a lot more than"!

comment by mic (michael-chen) · 2023-08-22T19:26:18.904Z · LW(p) · GW(p)

Great point, I've added this suggestion to the post.

comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-08-22T11:29:49.226Z · LW(p) · GW(p)

“An Overview of Catastrophic AI Risks” seems more grounded than Superintelligence,

 

Huh. I guess it is more "grounded" in the sense that it has that GIF of a boat going in circles, and some other links to toy examples... but is it really more epistemically rigorous than Superintelligence? I think the answer is "obviously not, the opposite is true" though it's an unfair comparison since Superintelligence is a whole book written for an academic audience.

(This is a nitpick, I basically agree with your post overall)

comment by dr_s · 2023-08-22T07:42:49.341Z · LW(p) · GW(p)

Organizers may promote arguments for AI safety that may be (comparatively*) compelling yet flawed

 

I feel like there's an asymmetry here. "10% of researchers believe AI extinction is a possibility" isn't somehow offset by "but 90% don't". For such an outrageous claim, 10% is a huge number! Similarly, "maybe AIs won't be instrumentally convergent" is not enough here. "We are absolutely positive that we can build AIs that are not instrumentally convergent, and that no amount of unavoidable successive dumbass tinkering will suffice to change that" would be enough. Which is kind of what alignment research is about? Whenever people have a P(doom) lower than 100% (which is most people besides Yud), that margin usually lies somewhere in these possibilities. But even a P(doom) of 1% is stupid high and worth spending effort reducing further.

comment by Herb Ingram · 2023-08-22T23:08:31.664Z · LW(p) · GW(p)

I think any outreach must start with understanding where the audience is coming from. The people most likely to make the considerable investment of "doing outreach" are in danger of being too convinced of their position and thinking it obvious: "how can people not see this?"

If you want to have a meaningful conversation with someone and interest them in a topic, you need to listen to their perspective, even if it sounds completely false and missing the point, and be able to empathize without getting frustrated. For most people to listen and consider any object level arguments about a topic they don't care about, there must first be a relationship of mutual respect, trust and understanding. Getting people to consider some new ideas, rather than convincing them of some cause, is already a very worthy achievement.