AI safety content you could create

post by Adam Jones (domdomegg) · 2025-01-06

This is a link post for https://adamjones.me/blog/ai-safety-content/

Contents

  Communication of catastrophic AI safety problems outside alignment
  Case studies for analogous problems
  Plans for AI safety
  Defining things

This is a (slightly chaotic and scrappy) list of gaps in the AI safety literature: content I think it would be useful or interesting for someone to create. I’ve broken it down into the sections listed in the contents above.

If you think articles already exist covering the topics described, please verify that they actually meet the criteria, and then tell me.

Communication of catastrophic AI safety problems outside alignment

I’ve previously written about how alignment is not all you need. And before me, others had written great things on parts of these problems. Friends have written up articles on parts of the economic transition, and specifically the intelligence curse.

Few people appear to be working on these problems, despite them seeming extremely important and neglected (and plausibly tractable?). This might be because:

  1. there is little understanding of these problems in the community;
  2. the problems don’t match the existing community’s skills and experiences;
  3. few people have started, so there aren’t obvious tractable in-roads to these problems; and
  4. there aren’t organisations or structures that people can join to work on these problems.

The first two issues could be tackled by spreading these messages more clearly. The corresponding (semi-defined) audiences are:

  1. The existing community. Funders or decision makers at AI policy orgs might be particularly useful.
  2. People who would be useful to add to the community. These might be experts who could help by working on these problems, or at least beginning to think about them (e.g. economists, politics and international relations scholars, military/field strategists). I suspect there are many things where people from these fields will see obvious things we are missing.

There is downside risk here. We want to be particularly careful not to heat up race dynamics further, particularly in messaging to the general public or to people likely to make race-y decisions. For this reason I’m more excited about spreading messages about the coordination problem and the economic transition problem than about the power distribution problem (see my problem naming for more context).

Case studies for analogous problems

Related to the plans discussed below, I think we could probably get a lot of insight into building better plans by looking at case studies of analogous problems throughout history.

Unfortunately, a lot of existing resources on case studies are:

I think people should be flexible as to what they look into here (provided they expect it to be relevant to the key problems in AI safety). Some questions I brainstormed were:

Plans for AI safety

For the last few weeks, I’ve been working on trying to find plans for AI safety. They should cover the whole problem, including the major hurdles after intent alignment. Unfortunately, this has not gone well: my rough conclusion is that there aren’t any very clear and well-publicised plans (or even very plausible stories) for making this go well. (More context on some of this work can be found in BlueDot Impact’s AI safety strategist job posting.)

In short: what series of actions might get us to a state of existential security, ideally at a more granular level than ‘pause’ or ‘regulate companies’?

Things that are kind of going in this direction:

However, many of them stop (or become very vague) after preventing misalignment, or don’t describe how we will achieve the intended outcomes (e.g. bringing about a pause successfully). Additionally, while there has been criticism of some of the above plans, there has been relatively little consensus-building around them, or further development from the community to improve them.

Building a better plan, or improving one of these plans (not just criticising where it fails), would be really valuable.

Defining things

I run BlueDot Impact’s AI safety courses. This involves finding resources that explain what AI safety people are talking about.

There are useful concepts that people in AI safety take for granted, but there are ~no easy-to-find resources explaining them well. It’d be great to fix that.

I imagine these articles primarily as definitions. You might even want to create them as a LessWrong tag, AISafety.info article, or Arbital page (although I’m not certain Arbital is still maintained?). They could be a good match for a Wikipedia page, although I don’t know if they’re big enough for that.

I think these are slightly less important to write than the other ideas, but they might be a good low-stakes entry point.
