What is malevolence? On the nature, measurement, and distribution of dark traits

post by David Althaus (wallowinmaya), Chi Nguyen, Clare (claredianeharris7@gmail.com) · 2024-10-23T08:41:33.197Z · LW · GW · 15 comments

Comments sorted by top scores.

comment by cousin_it · 2024-10-23T15:47:02.564Z · LW(p) · GW(p)

I'm not sure focusing on individual evil is the right approach. It seems to me that most people become much more evil when they aren't punished for it. A lot of evil is done by organizations, which are composed of normal people but can "normalize" the evil and protect the participants. (Insert usual examples such as factory farming, colonialism and so on.) So if we teach AIs to be as "aligned" as the average person, and then AIs increase in power beyond our ability to punish them, we can expect to be treated the way much-less-powerful groups have been treated throughout history - which is to say, not very well.

Replies from: wallowinmaya, Mo Nastri
comment by David Althaus (wallowinmaya) · 2024-10-24T09:18:30.596Z · LW(p) · GW(p)

I agree that the problem of "evil" is multifactorial, with individual personality traits being only one of several relevant factors; others, like "evil/fanatical ideologies" or misaligned incentives/organizations, are plausibly more important overall. Still, I think that ignoring the individual character dimension is perilous.

It seems to me that most people become much more evil when they aren't punished for it. [...] So if we teach AIs to be as "aligned" as the average person, and then AIs increase in power beyond our ability to punish them, we can expect to be treated the way much-less-powerful groups have been treated throughout history - which is to say, not very well.

Makes sense. On average, power corrupts / people become more malevolent if no one holds them accountable—but again, there seem to be interindividual differences, with some people behaving much better than others even when they hold enormous power (cf. this section [EA · GW]).

Replies from: cousin_it
comment by cousin_it · 2024-10-24T14:28:58.622Z · LW(p) · GW(p)

I'm afraid that in a situation of power imbalance these interpersonal differences won't matter much. I'm thinking of examples like the enclosures in England, where basically the entire elite of the country decided to make poor people even poorer in order to enrich themselves. Or colonialism, which lasted for centuries with lots of people participating, and the good people in the dominant group didn't stop it.

To be clear, I'm not saying there are no interpersonal differences. But if we find ourselves at the bottom of a power imbalance, I think those above us (even if they're very similar to humans) will just systemically treat us badly.

Replies from: wallowinmaya, Viliam
comment by David Althaus (wallowinmaya) · 2024-10-27T08:42:14.542Z · LW(p) · GW(p)

Thanks, I mostly agree.

But even in colonialism, individual traits played a role. For example, compare King Leopold II's rule over the Congo Free State with other colonial regimes.

While all colonialism was exploitative, under Leopold's personal rule the Congo saw extraordinarily brutal policies; for example, his rubber quota system led soldiers to torture workers and cut off the hands of those, including children, who failed to meet quotas. Under his rule, 1.5-15 million Congolese people died—out of a total population of only around 15 to 20 million. The brutality was so extreme that it caused public outrage and pressure from other colonial powers until the Belgian government took control of the Congo Free State from Leopold.

Compare this to, say, British colonial administration during certain periods, which, while still morally reprehensible overall, saw much less barbaric policies under some administrators who showed basic compassion for indigenous people. For instance, Governor-General William Bentinck in India abolished practices like sati (widows burning themselves alive) and implemented other humanitarian reforms.

One can easily find other examples (e.g. sadistic slave owners vs. more compassionate slave owners). 

In conclusion, I totally agree that power imbalances enabled systemic exploitation regardless of individual temperament. But individual traits significantly affected how much suffering and death that exploitation created in practice.[1] 

  1. ^

    Also, slavery and colonialism were ultimately abolished (in the Western world). My guess is that those who advocated for these reforms were, on average, more compassionate and less malevolent than those who tried to preserve these practices. Of course, the reformers were also heavily influenced by great ideas like the Enlightenment / classical liberalism.

Replies from: cousin_it
comment by cousin_it · 2024-10-27T09:42:26.743Z · LW(p) · GW(p)

The British weren't much more compassionate. North America and Australia were basically cleared of their native populations and repopulated with Europeans. Under British rule in India, tens of millions died in repeated famines, which stopped after independence.

Colonialism didn't end due to benevolence. Wars of colonial liberation continued well after WWII and were very brutal (the Algerian War, for example). I think the actual reason is that colonies stopped making economic sense.

So I guess the difference between your view and mine is that I think colonialism kept going basically as long as it benefited the dominant group; benevolence or malevolence didn't come into it much. And to get back to the AI conversation: my view is that when AIs become more powerful than people and can use resources more efficiently, the systemic gradient in favor of taking everything away from people will be just way too strong. It's a force acting above the level of individuals (or individual AIs) - it will affect which AIs get created and which ones succeed.

comment by Viliam · 2024-10-25T14:42:41.341Z · LW(p) · GW(p)

I don't have enough data on this, but I think it is possible that these horrible mass behaviors start with some dark individuals doing them first... and others gradually joining after observing that the behavior wasn't punished, and maybe that they kinda need to do the same thing in order to remain competitive.

In other words, the average person is quite happy to join some evil behavior that is socially approved, but there are individuals who are quite happy to initiate it. Removing those individuals from positions of power could stop many such avalanches.

(In my model, the average person is kinda amoral -- happy to copy most behaviors of their neighbors, good and bad alike -- and then we have small fractions of genuinely good and genuinely bad people, who act outside the Overton window; plus we can make society better or worse through incentives and propaganda. For example, punishing bad behavior will deter most people, and stories about heroes will inspire some.)

EDIT:

For example, you mention colonialism. Maybe most people approved of it, but only some of them made the decisions and organized it. Remove the organizers, and there is no colonialism. More importantly, I think most people approved of having colonies simply because that was the status quo. The average person's moral compass could probably best be described as "don't do weird things".

Replies from: cousin_it
comment by cousin_it · 2024-10-25T21:07:38.746Z · LW(p) · GW(p)

I think a big part of the problem is that in a situation of power imbalance, there's a large reward lying around for someone to do bad things - plunder colonies for gold, slaves, and territory; raise and slaughter animals in factory farms - as long as the rest can enjoy the fruits of it without feeling personally responsible. There's no comparable gradient in favor of good things ("good" is often unselfish, uncompetitive, unprofitable).

Replies from: Viliam
comment by Viliam · 2024-10-26T13:24:26.575Z · LW(p) · GW(p)

In theory, the reward for doing good should be prestige. (Which in turn may translate to more tangible rewards.) But that mostly works in small groups and doesn't scale well.

Some aspect of this seems like a coordination problem. Whatever your personal definition of "good" is, you would probably approve of a system that gives good people some kind of prestige, at least among other good people.

For example, people may disagree about whether veganism is good or bad, but from the perspective of a vegan, it would be nice if vegans could have some magical "vegan mark" that was unfalsifiable and immediately visible to other vegans. That way, you could promote your values not just by practicing and preaching them, but also by rewarding other people who practice the same values. (For example, if you sell some products, you could give discounts to vegans. If many people start doing that, veganism may become more popular. Perhaps some people would criticize that as doing things for the wrong reasons, but the animals probably wouldn't mind.) Similarly, effective altruists would approve of rewarding effective altruists, open source developers would approve of rewarding open source developers, etc.

These things exist to some degree (e.g. open source developers can put a link to their projects in a profile), but the existing solutions often don't scale well. If you only have a dozen effective altruists, they know each other by name, but once you get thousands, this stops working.

One problem here is the association of "good" with "unselfish" and "non-judgmental", which suggests that good people rewarding other good people is somehow... bad? In my opinion, we need to rethink that, because from the perspective of incentives and reinforcement, it is utterly stupid. The reason for these memes is that past attempts to reward good often led to... people optimizing to be seen as good, rather than actually being good. That is a serious problem that I don't know how to solve; I just have a strong feeling that going to the opposite extreme is not the right answer.

comment by David Gross (David_Gross) · 2024-10-23T16:07:20.393Z · LW(p) · GW(p)

shame—no need to exacerbate such feelings if it can be avoided

Shame [LW · GW] may be an important tool that people with dark traits can leverage to overcome those traits. Exacerbating it may in some cases be salutary.

Replies from: wallowinmaya
comment by David Althaus (wallowinmaya) · 2024-10-24T09:20:46.981Z · LW(p) · GW(p)

Thanks, good point! I suppose it's a balancing act and depends on the specifics in question and the amount of shame we dole out. My hunch would be that a combination of empathy and shame ("carrot and stick") may be best.  

comment by romeostevensit · 2024-10-24T18:16:43.949Z · LW(p) · GW(p)

I think this is an important topic and am glad to see substantial scholarship efforts on it.

Wrt AI relevance: I think the meme that it matters a lot who builds the self-activating doomsday device has done potentially quite a bit of harm and may be a main contributor to what kills us.

Wrt people detecting these traits: I personally feel that the self-domestication of humans has made us easier targets for such people, and undermined our ability to even think of doing anything about them. I don't think this is entirely random.

comment by ZY (AliceZ) · 2024-10-23T21:45:07.338Z · LW(p) · GW(p)

Amazingly detailed article covering malevolence, its interaction with power, and other nuances! I have been thinking of exploring similar topics and found this very helpful. Besides the identified research questions, some of which I highly agree with, one additional question I was wondering about: does self-awareness of one's own malevolence factors help one to limit them? If so, how effective would that be? And how would this change when one has power?

Replies from: Viliam
comment by Viliam · 2024-11-11T15:48:00.195Z · LW(p) · GW(p)

does self-awareness of one's own malevolence factors help one to limit them?

Probably the effect would be nonlinear: the evil people would just laugh, the average might get depressed and give up, and the mostly-good would strive to achieve perfection (or conclude that they are already good enough compared to others, and relax their efforts?).

Replies from: AliceZ
comment by ZY (AliceZ) · 2024-11-11T18:30:53.683Z · LW(p) · GW(p)

True. I wonder whether, for average people, being self-aware would at least unconsciously act as a partial "blocker" on the next malevolent action they might take, and whether that effect may grow over time (even if it takes a bit longer than for the mostly-good).