Does any of AI alignment theory also apply to supercharged humans?

post by acylhalide (samuel-shadrach) · 2021-10-07T14:43:55.322Z · LW · GW · 2 comments

This is a question post.

By supercharged human I mean a human who can access more compute power or memory via a brain-computer interface, or a human who is capable of doing neurosurgery on themselves to edit memory, reasoning processes, or goal content.

 

One problem with feeding human goals to an AGI is that it will notice the inherent contradictions in goal content and delete the portions that are uninteresting or strictly dominated by other portions. Won't humans also do the same thing to themselves if they were given the means to become smarter? That is, delete some of our own goals, using neurosurgery if absolutely necessary.

 

So maybe if we ourselves became smarter, we'd also become more narrow-minded with more consistent goals - which would then be a lot more amenable to being fed into an AI. And then there wouldn't be an alignment problem. So the alignment problem might exist not because AI is smart but because humans are stupid (or at least stupid in the sense that we are unable to notice or meaningfully resolve contradictions in our own goal content).

 

Is there any existing reading material along these lines?

2 comments

comment by Richard_Kennaway · 2021-10-07T16:10:06.108Z · LW(p) · GW(p)

I think it already applies to ordinarily highly charged humans. Consider the Great Dictators of the 20th century. Consider also "Reason as memetic immune disorder" vs. "Taking Ideas Seriously". Humans are dangerous General Intelligences.

Replies from: samuel-shadrach
comment by acylhalide (samuel-shadrach) · 2021-10-07T16:46:56.006Z · LW(p) · GW(p)

Thank you for this! Will read.