Dehumanisation *errors*

post by Stuart_Armstrong · 2020-09-23T09:51:53.091Z · LW · GW


  The meaning of errors

In response to my post contrasting value learning with anthropomorphisation [LW · GW], steve2152 brought up the fact that dehumanisation can be seen as the opposite of anthropomorphisation [LW(p) · GW(p)].

I agree with this insight, but only when dehumanisation causes errors of interpretation. I was using empathy in the sense of "insight into the other agent", rather than "sympathy with the other agent".

In practice, dehumanisation does tend to cause errors. We see outgroups as more homogeneous, coherent, and organised than they actually are. Despite the suave psychopaths depicted in movies, psychopaths tend to be less effective at achieving their goals (as evidenced by the large number of psychopaths in prison). Torturers are less effective at extracting true information than classical interrogators.

Now, it's not a universal law by any means, but it does seem that dehumanisation can often lead to errors, and from that perspective can be seen as a failure of value learning.

The meaning of errors

This is a very valid point, strawman, but I've also pointed out [LW · GW] that human theory of mind/empathy is very similar from human to human, and tends to agree with how we interpret our own goals. Because of this, there is a rough "universal human theory of mind", i.e. a universal way of going from human policy to human preferences.

When I'm talking about errors, I'm talking about deviations from this ideal[1].

  1. Because human theories of mind do not agree perfectly, there will always be an irreducible level of uncertainty in this ideal, but there is agreement on the broad strokes of it. ↩︎
