Does any of AI alignment theory also apply to supercharged humans?

post by acylhalide (samuel-shadrach) · 2021-10-07T14:43:55.322Z
This is a question post.
By "supercharged human" I mean a human who can access more compute power or memory via a brain-computer interface, or a human who is capable of performing neurosurgery on themselves to edit their memory, reasoning processes, or goal content.
One problem with feeding human goals to an AGI is that it will notice the inherent contradictions in the goal content and delete the portions that are uninteresting or strictly dominated by other portions. Wouldn't humans also do the same thing to themselves if they were given the means to become smarter? Delete some of our own goals, using neurosurgery if absolutely necessary.
So maybe if we ourselves became smarter, we'd also become more narrow-minded, with more consistent goals - which would then be a lot more amenable to being fed into an AI. And then there wouldn't be an alignment problem. So the alignment problem might arise not because AI is smart but because humans are stupid (or at least stupid in the sense that we are unable to notice or meaningfully resolve contradictions in our own goal content).
Is there any existing reading material along these lines?