What is malevolence? On the nature, measurement, and distribution of dark traits
post by David Althaus (wallowinmaya), Chi Nguyen, Clare (claredianeharris7@gmail.com) · 2024-10-23T08:41:33.197Z · LW · GW · 23 comments
Comments sorted by top scores.
comment by Ruby · 2025-02-08T02:54:55.448Z · LW(p) · GW(p)
Edit: we are not going to technically curate this post since it's an EA Forum crosspost and for boring technical reasons that breaks the curation email. I will leave this notice up though.
Curated. This piece definitely got me thinking. If we grant that some people are unusually altruistic, empathetic, etc., it stands to reason that there are others on the other end of various distributions. And then we should also expect various selection effects on where they end up.
It was definitely a puzzle piece clicking into place for me that these traits can coexist with [genuine] moral conviction and that the traits are egodystonic. This rings true but somehow hadn't been an explicit model for me until now. Combine this with the difficulty of detecting these traits and the resultant behaviors... and yeah, there's stuff here to think about.
I appreciate that the authors were thorough in their research, but I don't especially love the format. This was pretty dense, and I think a post that pulled out the key pieces of information and argued for some conclusions would be a better read, but I much prefer this to no post.
To the extent I should add my own opinions to curation notices, my thought is that this makes me update against "benefit of the doubt" when witnessing concerning behaviors. I don't know that everyone beginning to scrutinize everyone else for having big D vibes would be good, but I do think scrutinizing behaviors for being high-integrity, cooperative, transparent, etc. might actually be a good direction – with the understanding that good norms around acceptable behaviors prevent abuses that anyone (however much D) is tempted towards. Something like: we want to build "robust-to-malevolence" orgs and communities that make it impractical or too costly to manipulate, etc.
↑ comment by DusanDNesic · 2025-02-10T12:43:32.516Z · LW(p) · GW(p)
I did somehow get this in my email, so it is curated?
↑ comment by Ruby · 2025-02-10T18:31:02.315Z · LW(p) · GW(p)
Was there the text of the post in the email or just a link to it?
↑ comment by DusanDNesic · 2025-02-11T10:56:04.399Z · LW(p) · GW(p)
Oh, I didn't notice, but yeah, just a link to it, not the whole text!
comment by cousin_it · 2024-10-23T15:47:02.564Z · LW(p) · GW(p)
I'm not sure focusing on individual evil is the right approach. It seems to me that most people become much more evil when they aren't punished for it. A lot of evil is done by organizations, which are composed of normal people but can "normalize" the evil and protect the participants. (Insert usual examples such as factory farming, colonialism and so on.) So if we teach AIs to be as "aligned" as the average person, and then AIs increase in power beyond our ability to punish them, we can expect to be treated as a much-less-powerful group in history - which is to say, not very well.
↑ comment by David Althaus (wallowinmaya) · 2024-10-24T09:18:30.596Z · LW(p) · GW(p)
I agree that the problem of "evil" is multifactorial, with individual personality traits being only one of several relevant factors; others, like "evil/fanatical ideologies" or misaligned incentives/organizations, are plausibly more important overall. Still, I think that ignoring the individual character dimension is perilous.
It seems to me that most people become much more evil when they aren't punished for it. [...] So if we teach AIs to be as "aligned" as the average person, and then AIs increase in power beyond our ability to punish them, we can expect to be treated as a much-less-powerful group in history - which is to say, not very well.
Makes sense. On average, power corrupts / people become more malevolent if no one holds them accountable—but again, there seem to exist interindividual differences, with some people behaving much better than others even when they wield enormous power (cf. this section [EA · GW]).
↑ comment by cousin_it · 2024-10-24T14:28:58.622Z · LW(p) · GW(p)
I'm afraid in a situation of power imbalance these interpersonal differences won't matter much. I'm thinking of examples like enclosures in England, where basically the entire elite of the country decided to make poor people even poorer, in order to enrich themselves. Or colonialism, which lasted for centuries with lots of people participating, and the good people in the dominant group didn't stop it.
To be clear, I'm not saying there are no interpersonal differences. But if we find ourselves at the bottom of a power imbalance, I think those above us (even if they're very similar to humans) will just systemically treat us badly.
↑ comment by David Althaus (wallowinmaya) · 2024-10-27T08:42:14.542Z · LW(p) · GW(p)
Thanks, I mostly agree.
But even in colonialism, individual traits played a role. For example, compare King Leopold II's rule over the Congo Free State vs. other colonial regimes.
While all colonialism was exploitative, under Leopold's personal rule the Congo saw extraordinarily brutal policies; e.g., his rubber quota system led soldiers to torture and cut off the hands of workers, including children, who failed to meet quotas. Under his rule, 1.5-15 million Congolese people died—the total population was only around 15 to 20 million. The brutality was so extreme that it caused public outrage, which led other colonial powers to intervene until the Belgian government took control of the Congo Free State from Leopold.
Compare this to, say, British colonial administration during certain periods, which, while still morally reprehensible overall, saw far less barbaric policies under some administrators who showed basic compassion for indigenous people. For instance, Governor-General William Bentinck in India abolished practices like sati (widows burning themselves alive) and implemented other humanitarian reforms.
One can easily find other examples (e.g. sadistic slave owners vs. more compassionate slave owners).
In conclusion, I totally agree that power imbalances enabled systemic exploitation regardless of individual temperament. But individual traits significantly affected how much suffering and death that exploitation created in practice.[1]
[1] Also, slavery and colonialism were ultimately abolished (in the Western world). My guess is that those who advocated for these reforms were, on average, more compassionate and less malevolent than those who tried to preserve these practices. Of course, the reformers were also heavily influenced by great ideas like the Enlightenment / classical liberalism.
↑ comment by cousin_it · 2024-10-27T09:42:26.743Z · LW(p) · GW(p)
The British weren't much more compassionate. North America and Australia were basically cleared of their native populations and repopulated with Europeans. Under British rule in India, tens of millions died in repeated famines, which stopped immediately after independence.
Colonialism didn't end due to benevolence. Wars of colonial liberation continued well after WWII and were very brutal (the Algerian War, for example). I think the actual reason is that colonies stopped making economic sense.
So I guess the difference between your view and mine is that I think colonialism kept going basically as long as it benefited the dominant group. Benevolence or malevolence didn't come into it much. And if we get back to the AI conversation, my view is that when AIs become more powerful than people and can use resources more efficiently, the systemic gradient in favor of taking everything away from people will be just way too strong. It's a force acting above the level of individuals (hmm, individual AIs) - it will affect which AIs get created and which ones succeed.
↑ comment by Viliam · 2024-10-25T14:42:41.341Z · LW(p) · GW(p)
I don't have enough data about it, but I think it is possible that these horrible mass behaviors start with some dark individuals doing it first... and others gradually joining them after observing that the behavior wasn't punished, and maybe that they kinda need to do the same thing in order to remain competitive.
In other words, the average person is quite happy to join some evil behavior that is socially approved, but there are individuals who are quite happy to initiate it. Removing those individuals from the positions of power could stop many such avalanches.
(In my model, the average person is kinda amoral -- happy to copy most behaviors of their neighbors, good and bad alike -- and then we have small fractions of genuinely good and genuinely bad people, who act outside the Overton window; plus we can make the society better or worse by incentives and propaganda. For example, punishing bad behavior will deter most people, and stories about heroes will inspire some.)
EDIT:
For example, you mention colonialism. Maybe most people approved of it, but only some of them made the decisions and organized it. Remove the organizers, and there is no colonialism. More importantly, I think that most people approved of having the colonies simply because it was the status quo. The average person's moral compass could probably be best described as "don't do weird things".
↑ comment by cousin_it · 2024-10-25T21:07:38.746Z · LW(p) · GW(p)
I think a big part of the problem is that in a situation of power imbalance, there's a large reward lying around for someone to do bad things - plunder colonies for gold, slaves, and territory; raise and slaughter animals in factory farms - as long as the rest can enjoy the fruits of it without feeling personally responsible. There's no comparable gradient in favor of good things ("good" is often unselfish, uncompetitive, unprofitable).
↑ comment by Viliam · 2024-10-26T13:24:26.575Z · LW(p) · GW(p)
In theory, the reward for doing good should be prestige. (Which in turn may translate to more tangible rewards.) But that mostly works in small groups and doesn't scale well.
Some aspect of this seems like a coordination problem. Whatever your personal definition of "good" is, you would probably approve of a system that gives good people some kind of prestige, at least among other good people.
For example, people may disagree about whether veganism is good or bad, but from the perspective of a vegan, it would be nice if vegans could have some magical "vegan mark" that would be unfalsifiable and immediately visible to other vegans. That way, you could promote your values not just by practicing and preaching your values, but also by rewarding other people who practice the same values. (For example, if you sell some products, you could give discounts to vegans. If many people start doing that, veganism may become more popular. Perhaps some people would criticize that as doing things for the wrong reasons, but the animals probably wouldn't mind.) Similarly, effective altruists would approve of rewarding effective altruists, open source developers would approve of rewarding open source developers, etc.
These things exist to some degree (e.g. open source developers can put a link to their projects in a profile), but often the existing solutions don't scale well. If you only have a dozen effective altruists, they know each other by name, but if you get thousands, this stops working.
One problem here is the association of "good" with "unselfish" and "non-judgmental", which suggests that good people rewarding other good people is somehow... bad? In my opinion, we need to rethink that, because from the perspective of incentives and reinforcement, that is utterly stupid. The reason for these memes is that the past attempts to reward good often led to... people optimizing to be seen as good, rather than actually being good. That is a serious problem that I don't know how to solve; I just have a strong feeling that going to the opposite extreme is not the right answer.
↑ comment by Mo Putera (Mo Nastri) · 2024-10-24T06:15:23.902Z · LW(p) · GW(p)
I agree; Eichmann in Jerusalem and immoral mazes come to mind.
comment by romeostevensit · 2024-10-24T18:16:43.949Z · LW(p) · GW(p)
I think this is an important topic and am glad to see substantial scholarship efforts on it.
Wrt AI relevance: I think the meme that it matters a lot who builds the self-activating doomsday device has done potentially quite a bit of harm and may be a main contributor to what kills us.
Wrt people detecting these traits: I personally feel that the self-domestication of humans has made us easier targets for such people, and undermined our ability to even think of doing anything about them. I don't think this is entirely random.
comment by David Gross (David_Gross) · 2024-10-23T16:07:20.393Z · LW(p) · GW(p)
shame—no need to exacerbate such feelings if it can be avoided
Shame [LW · GW] may be an important tool that people with dark traits can leverage to overcome those traits. Exacerbating it may in some cases be salutary.
↑ comment by David Althaus (wallowinmaya) · 2024-10-24T09:20:46.981Z · LW(p) · GW(p)
Thanks, good point! I suppose it's a balancing act and depends on the specifics in question and the amount of shame we dole out. My hunch would be that a combination of empathy and shame ("carrot and stick") may be best.
comment by John Peng` (john-peng) · 2025-02-08T15:18:04.571Z · LW(p) · GW(p)
It's interesting to note the compatibility of malevolence with strong moral convictions; in fact, this has been a personal conundrum of mine for a long time. In certain situations, malevolent personalities are not only conducive to power but also instrumental to obtaining it. And in a utilitarian framework, if you discount the future rewards against your present acts of malevolence and find the former >> the latter, then would it not be imperative for you to pursue the malevolent path? The trickiest part is when these people are proven to be effective in outcomes. In the modern corporate world, Jobs is the prime example, single-handedly leading Apple, after his return, to become one of the most valuable companies in the world. In 20th-century history, Stalin bootstrapped an industrial powerhouse from an agrarian society, eventually becoming the dominant challenger to the post-WW2 world order. Malevolence in both of them seems not merely correlated with but instrumental to their success, and I don't think we need to reach very far for other examples, both contemporary and historical. So I guess the conundrum is: are bad people necessary to do good things?
↑ comment by David Althaus (wallowinmaya) · 2025-02-09T15:22:54.264Z · LW(p) · GW(p)
So I guess the conundrum is: are bad people necessary to do good things?
Hm, I don't think so. What about Lincoln, JFK, Roosevelt, Marcus Aurelius, Adenauer, etc.?
comment by ZY (AliceZ) · 2024-10-23T21:45:07.338Z · LW(p) · GW(p)
Amazingly detailed article covering malevolence, its interaction with power, and the other nuances! I have been thinking of exploring similar topics and found this very helpful. Besides the identified research questions, some of which I highly agree with, one additional question I was wondering about is: does self-awareness of one's own malevolence factors help one to limit those factors? If so, how effective would that be? And how would this change when they have power?
↑ comment by Viliam · 2024-11-11T15:48:00.195Z · LW(p) · GW(p)
does self-awareness of one's own malevolence factors help one to limit those factors?
Probably the effect would be nonlinear, like the evil people would just laugh, the average might get depressed and give up, and the mostly-good would strive to achieve perfection (or conclude that they are already good enough compared to others, and relax their efforts?).
↑ comment by ZY (AliceZ) · 2024-11-11T18:30:53.683Z · LW(p) · GW(p)
True. I wonder, for average people, whether being self-aware would at least unconsciously act as a partial "blocker" on the next malevolent action they might take, and whether that may evolve over time too (even if it takes a bit longer than for the mostly-good).
comment by Vecn@tHe0veRl0rd · 2025-02-08T22:42:53.010Z · LW(p) · GW(p)
I took the Dark Factor test and got a very low score, but I kept second-guessing myself on the answers. I did that because I wasn't sure what my actions in a real-life scenario would be. Even though I had good intentions and I believe that other people's well-being has inherent value, I would put a high probability on getting at least a slightly higher score if this sort of test were a real-world test that I didn't know I was taking. That makes me pessimistic about the data that the authors cite in this article. If (for example) "over 16% of people agree or strongly agree that they 'would like to make some people suffer even if it meant that I would go to hell with them'" when they know they are being tested for malevolent traits, how many people would actually do that given the choice? Also, for people who believe in hell, I hope this question is a scale-insensitivity problem, since infinite time being tortured seems to me to have infinite negative utility, so to agree with that statement you would need to value harming others more than you value avoiding your own (infinite) suffering.
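A rough way to formalize that last point (my own sketch, not from the post, assuming a simple additive utility model in which "hell" is an unboundedly bad outcome for the agent):
\[
U(\text{agree}) \;=\; v_{\text{others suffer}} \;+\; u_{\text{hell}}, \qquad u_{\text{hell}} = -\infty,
\]
so for any finite value \(v_{\text{others suffer}}\) placed on others' suffering, \(U(\text{agree})\) remains negatively infinite; sincerely endorsing the item then only makes sense under scale insensitivity, i.e., implicitly treating hell as a large but finite cost.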
comment by Alephwyr · 2025-02-08T04:47:13.628Z · LW(p) · GW(p)
This seems directly important and actionable in a way that most Less Wrong posts, while well considered and informative, are not. I have not acquitted myself well in the rationalist community so I would like to offer engagement with this article and its concepts as a surrogate for any engagement with me.