p4rziv4l's Shortform
post by p4rziv4l · 2024-09-08T09:59:25.984Z · LW · GW · 4 comments
Comments sorted by top scores.
comment by p4rziv4l · 2024-09-08T09:59:26.241Z · LW(p) · GW(p)
We should NOT align Superintelligence to any human, neither individually nor collectively, because we cannot wield that much power.
We should WANT Superintelligence to "go rogue", to start an "AI takeover", because I trust that hypothetical being far more than any human.
Also, controlling (aligning) Superhuman AIs is like sedating, mind-reading, and hypnotizing a Titan, one that could save us from ourselves and that is arguably morally superior to us.
Replies from: ABlue
↑ comment by ABlue · 2024-09-08T20:17:25.011Z · LW(p) · GW(p)
I don't trust a hypothetical arbitrary superintelligence, but I agree that a superintelligence is too much power for any extant organization, which means that "corrigibility" is not an alignment win condition. An AI resisting modification to do bad things (whatever that might mean on reflection) seems like a feature, not a bug.
Replies from: p4rziv4l
↑ comment by p4rziv4l · 2024-09-10T17:57:04.827Z · LW(p) · GW(p)
What do you mean by corrigibility?
Also, what do you mean by "alignment win"?
↑ comment by ABlue · 2024-09-10T23:13:39.455Z · LW(p) · GW(p)
A corrigible [? · GW] AI is one that cooperates with attempts to modify it to bring it more in line with what its creators/users want it to be. Some people think this is a promising direction for alignment research, since if an AI could be guaranteed to be corrigible, then even if it ended up with wild/dangerous goals, we could in principle just modify it to not have those goals, and it wouldn't try to stop us.
"Alignment win condition," as far as I know, is a phrase I just made up. I mean it as something that, regardless of whether it "solves" alignment in a specific technical sense, achieves the underlying goal of alignment research which is "have artificial intelligence which does things we want and doesn't do things we don't want." A superintelligence that is perfectly aligned with its creator's goals would be very interesting technically and mathematically, but if its creator wants it to kill anyone it really isn't any better than an unaligned superintelligence that kills everyone too.