Is AI Alignment Impossible?

post by Heighn · 2022-06-10T10:08:00.347Z · LW · GW · No comments

This is a question post.


I just skimmed through On the Controllability of Artificial Intelligence, and am wondering if others have read it and what they think about it. It made me quite scared.

In particular: is AI Alignment simply unsolvable/not fully solvable?

Answers

answer by Charlie Steiner · 2022-06-10T18:29:16.877Z · LW(p) · GW(p)

Thanks for the link!

I think there's space for the versions of "AI control" he lays out to be impossible, while it's still possible to build AI that makes the future go much better than it otherwise would have.

For example, one desideratum he has is that our current selves shouldn't be bossed around (via the AI) by versions of ourselves that have e.g. gone through some simulated dispute-resolution procedure. Which is a defensible consequence of "control," but is, I think, way too strong if all we want is for the future to be good.

comment by Heighn · 2022-06-10T18:40:20.630Z · LW(p) · GW(p)

Thanks for your reaction!

After thinking about it a bit more, I think this is generally my view as well.

It also seems to me that if there's really no way at all to make an agent smarter than you do things that are good for you, then an agent that realizes this wouldn't FOOM.

answer by Jeff Rose · 2022-06-10T18:52:15.413Z · LW(p) · GW(p)

Yes, AI Alignment is not fully solvable. In particular, if an AGI has the ability to self-improve arbitrarily and has a complicated utility function, it will not be possible to guarantee that an aligned AI remains aligned.
