When do alignment researchers retire?

post by Jordan Taylor (Nadroj) · 2024-06-25T23:30:25.520Z · LW · GW · 1 comment

This is a question post.


At what point will it no longer be useful for humans to be involved in the process of alignment research? After the first slightly-superhuman AGI, well into superintelligence, or somewhere in between?

Feel free to answer differently for different kinds of human involvement:

What do you envision we are doing between AGI and superintelligence?

Answers

answer by Seth Herd · 2024-06-26T04:23:18.625Z · LW(p) · GW(p)

All being dead? I don't think we'll necessarily get from AGI to ASI if we don't get the initial stages just right. This question sounds a bit blasé about our odds here. I don't think we're doomed; my point estimate of p(doom) is approximately 50%, but more importantly, my uncertainty keeps growing as I continue to learn more and take in more of the many interacting complex arguments and states of the world that I don't have enough expertise in to estimate well. And that's after spending a very substantial amount of time on the question. I don't think anyone has a good p(doom) estimate at this point.

I mention this before answering, because I think assuming success is the surest route to failure.

To take the question seriously, my point estimate is that some humans, hopefully many, will be doing technical alignment research for a few years between AGI and ASI. I think we'll be doing all three of the latter categories of research you mention; loosely, being in charge (for better or worse) and filling in gaps in AGI thinking at each point in its advancement.

I think it's somewhat likely that we'll create AGI that roughly follows our instructions as it passes through a parahuman band (in which it is better than us at some cognitive tasks and worse at others). As it advances, alignment per se will be out of our hands. But as we pass through that band, human work on alignment will be at its most intense and most important. We'll know what sort of mind we're aligning, and what details of its construction and training might keep it on track or throw it off. 

If we do a good job with that critical risk period, we can, and more of us will, advance to the more fun parts of current alignment thinking: deciding what sort of fantastic future we want and what values we want AGI to follow for the long term. If we get aligned human-plus AGI, and haven't destroyed the world yet through misalignment, misuse, or human conflict with AGI-created superweapons, we'll have pretty good odds of making it for the long haul, doing a long reflection [LW · GW], and inviting everyone in on the fun parts of alignment.

If we do our jobs well, our retirement will be as slow as we care to make it. But there's much to do in the meantime. Particularly, right now.

1 comment


comment by ryan_greenblatt · 2024-06-26T03:47:07.689Z · LW(p) · GW(p)

It's plausible that reflection and figuring out what should happen with the future will be ongoing work among humans for tens or hundreds of years after the singularity.