What does success look like?

post by Raymond D · 2025-01-23T17:48:35.618Z

Contents

  Win/lose asymmetry
  Noticing the gaps
  Robust, agnostic work
  So what does success look like?

The general movement around AI safety is currently pursuing many different agendas. These are individually very easy to motivate with some specific story of how things could naturally go wrong. For example:

I would further claim that this is how most people tend to think about each agenda most of the time. If you’re bought in on x-risk, it’s much easier to describe very specific failures than it is to describe very specific success stories.

But in the long term, I think that only trying to avoid all the losing conditions is a bad strategy. So I believe it’s pretty useful for anyone working on existential risk to at least consider what the success story is.

Win/lose asymmetry

There are many ways to lose and many ways to win, but crucially, we need to avoid all of the paths to failure, whereas we only need to achieve one of the paths to success. 

Of course, as a matter of strategy, it’s probably smart to spread your bets especially when things are so uncertain, and it is important that we consistently avoid failure.

Still, I think it’s easy for individuals to slip into working based on more easily-motivated stories about failure, and I think this:

It’s also actually pretty hard to describe and discuss good paths to success.

Nonetheless, I think that if humanity does eventually succeed, it's likely to be because at some point someone had an actual plan for how to succeed which some people actually followed, rather than just continually dodging mistakes and putting out fires. We can only defer these questions for so long.

Noticing the gaps

I think really engaging with the question of what success looks like is pretty connected with actually noticing the gaps in our current approach.

My impression is that there are a few pretty huge unanswered questions in alignment, including things like:

There’s a natural pull towards working ‘under the streetlight’ on the problems that seem easier to solve. I think it’s easy to not notice you’re doing this, and my impression is that the most reliable way to get out of that trap is to have really thought about what it would mean to solve the whole problem. 

Robust, agnostic work

It’s also possible to do work that is just generally useful, with that as your goal. For example:

I am in favour of this. But even here, I think it’s useful to have thought pretty hard at some point about what this is all building up to. Otherwise, there’s a risk that you:

I also think it’s really important to notice the skulls — I believe there are pretty compelling cases that a lot of work on each of the ‘agnostic’ approaches I’ve described above has ended up causing more harm than good: growth that damages the community, dual-use research, and social pressure causing value drift, for example.

So what does success look like?

The goal of this piece is mainly to spur people towards asking this question for themselves, about the work that they’re doing. Nonetheless, I’ll try to give some examples that currently seem salient to me:

I think all of these proposals have serious challenges and need a lot more work. And I would really like for that work to happen.
