Enemies vs Malefactors

post by So8res · 2023-02-28T23:38:11.594Z · LW · GW · 60 comments

Comments sorted by top scores.

comment by Richard Korzekwa (Grothor) · 2023-03-01T04:07:46.760Z · LW(p) · GW(p)

It's maybe fun to debate about whether they had mens rea, and the courts might care about the mens rea after it all blows up, but from our perspective, the main question is what behaviors they’re likely to engage in, and there turn out to be many really bad behaviors that don’t require malice at all.

I agree this is the main question, but I think it's bad to dismiss the relevance of mens rea entirely. Knowing what's going on with someone when they cause harm is important for knowing how best to respond, both for the specific case at hand and the strategy for preventing more harm from other people going forward.

I used to race bicycles with a guy who did some extremely unsportsmanlike things, of the sort that gave him an advantage relative to others. After a particularly bad incident (he accepted a drink of water from a rider on another team, then threw the bottle, along with half the water, into a ditch), he was severely penalized and nearly kicked off the team, but the guy whose job was to make that decision was so utterly flabbergasted by his behavior that he decided to talk to him first. As far as I can tell, he was very confused about the norms and didn't realize how badly he'd been violating them. He was definitely an asshole, and he was following clear incentives, but it seems his confusion was a load-bearing part of his behavior because he appeared to be genuinely sorry and started acting much more reasonably after.

Separate from the outcome for this guy in particular, I think it was pretty valuable to know that people were making it through most of a season of collegiate cycling without fully understanding the norms. Like, he knew he was being an asshole, but he didn't really get how bad it was, and looking back I think many of us had taken the friendly, cooperative culture for granted and hadn't put enough effort into acculturating new people.

Again, I agree that the first priority is to stop people from causing harm, but I think that reducing long-term harm is aided by understanding what's going on in people's heads when they're doing bad stuff.

comment by Zack_M_Davis · 2023-03-01T08:10:08.909Z · LW(p) · GW(p)

I suggest minting a new word, for people who have the effects of malicious behavior, whether it's intentional or not.

Why only malicious behavior? It seems like the relevant idea is more general: oftentimes we care about what outcomes a pattern of behavior looks optimized to achieve in the world, not about the person's conscious subjective verbal narrative. (Separately from whether we think those outcomes are good or bad.)

Previously, I had suggested "algorithmic" intent [LW · GW], as contrasted to "conscious" intent. Claims about algorithmic intent correspond to predictions about how the behavior responds to interventions. Mistakes that don't repeat themselves when corrected are probably "honest mistakes." "Mistakes" that resist correction, that systematically steer the future in a way that benefits the actor, are probably algorithmically intentional.

Replies from: Linch
comment by Linch · 2023-03-02T00:02:54.217Z · LW(p) · GW(p)

"Mistakes" that resist correction, that systematically steer the future in a way that benefits the actor, are probably algorithmically intentional.

Is "benefits the actor" here load-bearing for you (as opposed to just predictably bad for others)? I can think of examples of behavior that rarely benefits the actor but that they seem unlikely to be talked out of (e.g. temper tantrums at the workplace are rarely selfishly positive in professional Western contexts).

Replies from: Zack_M_Davis
comment by Zack_M_Davis · 2023-03-02T03:35:53.945Z · LW(p) · GW(p)

Sorry, not load-bearing; I think "steering the future" was the important part of that sentence.

Although in the case of tantrums, I think the game-theoretic logic is pretty clear: if I predictably make a fuss when I don't get my way, then people who don't want me to make a fuss are more likely to let me get my way (to a point). The fact that tantrums don't benefit the actor when they happen isn't itself enough to show that they're not being used to successfully extort concessions to make them happen less often. If it doesn't work in the modern workplace, it probably worked in the environment of evolutionary adaptedness.

Replies from: martin-randall
comment by Martin Randall (martin-randall) · 2023-03-15T23:54:52.339Z · LW(p) · GW(p)

Sometimes also tantrums work in the training distribution of childhood and don't work in the deployment environment of professional work.

comment by RHollerith (rhollerith_dot_com) · 2023-03-01T02:45:42.651Z · LW(p) · GW(p)

I suggest minting a new word, for people who have the effects of malicious behavior, whether it’s intentional or not.

I've long used "destructive" for that.

Replies from: MichaelStJules, bortrand, Korz, ivy-mazzola, ivy-mazzola
comment by MichaelStJules · 2023-03-01T15:26:53.877Z · LW(p) · GW(p)

Harmful, maybe? Not all harms involve destruction (physical or relationships, etc.).

comment by bortrand · 2023-03-16T21:21:29.663Z · LW(p) · GW(p)

I recently started making a similar distinction in my life and using the word “toxic”

Replies from: Raemon
comment by Raemon · 2023-03-16T22:27:00.711Z · LW(p) · GW(p)

I don't like the word "toxic" because it's kind of essentialist without exposing actual causes/effects/mechanisms/inputs-outputs. I think it's useful sometimes as shorthand between people who have a high degree of agreement on what "toxic" means in a given context, but it's sort of a slippery word.

comment by Mart_Korz (Korz) · 2023-03-01T20:53:22.361Z · LW(p) · GW(p)

I also like "problematic" - it could be used as a 'we are not yet quite sure about how bad this is' version of "destructive"

Replies from: None, going-durden
comment by [deleted] · 2023-03-02T04:22:25.090Z · LW(p) · GW(p)

Problematic is already associated with bigotry and I don't think invoking a political frame is helpful for these sorts of situations.

Replies from: ivy-mazzola
comment by Ivy Mazzola (ivy-mazzola) · 2023-03-02T06:26:19.410Z · LW(p) · GW(p)

I don't think it does invoke a political frame if you use it right, but perhaps I have too much confidence in how I've used the term.

comment by Going Durden (going-durden) · 2023-03-20T13:11:58.205Z · LW(p) · GW(p)

"Problematic" does not differentiate between "bad", "harmful" and "difficult". Replacing the carburetor in a Honda Civic with only a spatula and a corkscrew for tools is problematic, but not necessarily harmful or bad.

comment by Ivy Mazzola (ivy-mazzola) · 2023-03-02T06:32:23.947Z · LW(p) · GW(p)

Maybe "troubling"

comment by Ivy Mazzola (ivy-mazzola) · 2023-03-02T06:25:27.294Z · LW(p) · GW(p)

I use problematic

comment by Vladimir_Nesov · 2023-03-01T04:07:13.306Z · LW(p) · GW(p)

Labeling (in particular) catastrophically incompetent people "maleficient" sounds malevolent. While the concern might be valid in theory, this label has connotations that probably don't help with the inherent practical witch hunt and reign of terror risks of the whole concept.

Also, the apparent Chesterton-Schelling fences [LW · GW] my intuition is loudly hallucinating at this post say to stop before instituting a habit of using such classification. Immediately-decision-relevant concepts are autonomous superweapons, controversial norms that resist attempts at keeping their boundaries in reasonable/intended places.

Replies from: Lukas_Gloor
comment by Lukas_Gloor · 2023-03-01T17:52:59.925Z · LW(p) · GW(p)

My stance is "the more we promote awareness of the psychological landscape around destructive patterns of behavior, the better." This isn't necessarily at odds with what you're saying because "the psychological landscape" is a descriptive thing, whereas your objection to Nate's proposal is that it seeks to be "immediately-decision-relevant," i.e., that it's normative (or comes with direct normative implications). 

So, maybe I'd agree that "maleficient" might be slightly too simplistic of a classification (because we may want to draw action-relevant boundaries in different places depending on the context – e.g., different situations call for different degrees of risk tolerance of false positives vs. false negatives). 

That said, I think there's an important message in Nate's post and (if I had to choose one or the other) I'm more concerned about people not internalizing that message than about it potentially feeding ammunition to witch hunts. (After all, someone who internalizes Nate's message will probably become more concerned about the possibility of witch hunts – if only explicitly-badly-intentioned people instigated witch hunts or added fuel to the fires, history would look very different.)

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2023-03-02T04:41:21.968Z · LW(p) · GW(p)

"maleficient" might be slightly too simplistic of a classification

There is an interesting phenomenon around culture wars where a crazy number of concepts are generated to describe the contested territory with mind-boggling nuance. I have a hunch that this is not just expertise signaling, but actually useful for dissolving the conceptual superweapons in a sea of distinctions. This divests the original contentious immediately-decision-relevant concept of the special role that gives it power, by replacing it with a hundred slightly-decision-relevant distinctions, none of which has significant power.

A disagreement that was disputing a definition about placement of its boundaries becomes a disagreement about decision procedures in terms of many unchanging and uncontroversial definitions that cover all contested territory in detail. After the dispute is over, most of the technical distinctions can once again be discarded.

Replies from: Avnix
comment by Sweetgum (Avnix) · 2023-03-11T13:29:45.942Z · LW(p) · GW(p)

Could you give some examples? I understand you may not want to talk about culture war topics on LessWrong, so it's fine if you decline, but without examples I unfortunately cannot picture what you're talking about.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2023-03-11T15:56:45.173Z · LW(p) · GW(p)

so it's fine if you decline

The cost of this statement is feeding the frame where it's not necessarily fine.

comment by PeterMcCluskey · 2023-03-01T02:48:49.545Z · LW(p) · GW(p)

Humans care about this stuff enough to bake it into their legal codes.

It's mostly Western culture that does this. There's a lot of variation [LW(p) · GW(p)] in how much cultures care about bad intentions.

comment by Richard_Kennaway · 2023-03-01T08:19:51.378Z · LW(p) · GW(p)

IANAL, but I believe that the doctrine of mens rea is different from what is suggested here, and the difference has application to the larger context.

The mens rea is simply the intention to have done the actus reus, the illegal act. If, for example, a company director puts their signature to a set of false accounts, knowing they are false, then there is mens rea. It will cut no ice in court for them to profess that "my goodness, I didn't know that was illegal!", or "oh, but surely that wasn't really fraud", or "but it was for a vital cause!"

What matters is that they did the thing, intending to do the thing.

I suggest minting a new word, for people who have the effects of malicious behavior

I thought that "toxic" was the usual word these days.

Replies from: Archimedes, going-durden
comment by Archimedes · 2023-03-03T03:05:52.080Z · LW(p) · GW(p)

IANAL either but I do know that certain crimes explicitly do hinge on the perpetrator's knowledge that what they did was illegal, not just that they intended to do it. This isn't common but does apply to some areas with complex legislation like tax evasion and campaign finance. As a high-profile example, Trump Jr. was deemed "too dumb to prosecute" for campaign finance violations.

More generally, there are multiple levels of mens rea. Some crimes can be prosecuted without proving any intent ("strict liability"). For those that do require intent, the mental states can be categorized into four levels of increasing severity: acting negligently, acting recklessly, acting knowingly, and acting purposefully. This list is representative, though not universal. Some US states refer to express/implied "malice".

I understand So8res to be saying that we can treat toxic behavior on a strict liability basis without deciding what level of knowledge and intent to assign the offender.

comment by Going Durden (going-durden) · 2023-03-20T13:16:14.758Z · LW(p) · GW(p)

I think "toxic" is more narrow: it hints at indirect, social, and emotional damage, and does not work well as a term in situations that are just pragmatic in nature.

comment by [DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2023-03-01T09:19:51.705Z · LW(p) · GW(p)

Copied text from a Facebook post that feels related (separating intent from result):

In Duncan-culture, there are more mistakes you're allowed to make, up-front, with something like "no fault."

e.g. the punch bug thing—if you're in a context where lots of people play punch bug, then you're not MORALLY CULPABLE if you slug somebody on the shoulder and then they say "Ouch, I don't like that, do not do that."

(You're morally culpable if you do it again, after their clear boundary, but Duncan-culture has more wiggle room for first-trespasses.)

However, Duncan-culture is MORE strict about something like ...

"I hurt people! But it's okay, I patched the dynamic that led to the hurt. But then I hurt other people! But it's okay, because I isolated and fixed that set of mistakes, too. But then I hurt other people! But it's okay, because I isolated and fixed that set of mistakes, too. But then I hurt other people! But it's okay..."

In Duncan-culture, you can get away with about two rounds of that. On the third screwup, pretty much everybody joins in to say "no. Stop. You are clearly just capable of inventing new mistakes every time. Cease this iterative process."

And if you don't—if you keep going, making a different error with a similar result every time—

In Duncan-culture, the resulting harm on rounds three and beyond is treated as, essentially, deliberate/intentional. Because the result was predictable, and this fact failed to move you.

This is not, as far as I can tell, robustly/reliably true in the broader culture I'm currently a part of.

EDIT: More disambiguation:

We give people protection, socially speaking, when we consider them to have had good intentions, but to have made a mistake with tragic results.

In Duncan-culture, you can't really get that protection three times in a row for three similar results. If you do A and it leads to X, that's just a mistake and we treat you sympathetically/generously. If you then do B and it leads to X, well, plausibly your first patch wasn't good enough, but like, okay, things are hard, your good intentions shine through, fair game. But if you then do C and it leads to X, all future X's resulting from D and E and so on are considered "your fault" in the not-excusable-as-a-mistake way. Good intentions cease to matter after three different Xings; your job now is to do whatever it takes to avoid more X, or to accept full responsibility for all future X, approximately as if you caused X on purpose/decided X was a side effect you felt worth causing.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2023-03-01T20:34:16.988Z · LW(p) · GW(p)

In Duncan-culture, when people say "no. Stop", what's the thing that they're saying should stop?

Replies from: Duncan_Sabien
comment by [DEACTIVATED] Duncan Sabien (Duncan_Sabien) · 2023-03-01T20:43:07.922Z · LW(p) · GW(p)

In this specific case, I was writing about a colleague who kept hurting people in their attempts to help them with rationality. They kept managing to hurt people in novel and interesting ways, every time they patched the previous failure mode. "No. Stop." would be in reference to "stop fiddling with people's brains in this way."

Similarly, Brent Dill had in fact been doing different damages to each of his romantic partners, but eventually the Berkeley community was like "no, we are horrified, we don't care if you're not making those specific mistakes anymore, we do not trust you to not make new ones." In that case "No. Stop." was in reference to "dating any of the women in our community."

comment by Linch · 2023-03-01T01:15:50.261Z · LW(p) · GW(p)

One hypothesis I have for why people care so much about some distinction like this is that humans have social/mental modes for dealing with people who are explicitly malicious towards them, who are explicitly faking cordiality in attempts to extract some resource. And these are pretty different from their modes of dealing with someone who's merely being reckless or foolish. So they care a lot about the mental state behind the act.

[...]

On this theory, most people who are in effect trying to exploit resources from your community, won't be explicitly malicious, not even in the privacy of their own minds. (Perhaps because the content of one’s own mind is just not all that private; humans are in fact pretty good at inferring intent from a bunch of subtle signals.) Someone who could be exploiting your community, will often act so as to exploit your community, while internally telling themselves lots of stories where what they're doing is justified and fine.

I note that while I find both paragraphs individually reasonable [and I find myself nodding along to them], there seems to be a soft contradiction between them that needs explanation.

Namely, why is human (whether genetic or cultural) evolution maladaptive? "Which humans are bad allies" seems to be close to the center of the problems we should expect evolution in a social context to be good at, so I feel like the burden of proof is on whoever is positing a local deviance to explain why the features are off in this case. Some possibilities:

1. "Our" community is different [why?]

2. People in history are in fact object-level wrong about the existence (or at least prevalence) of evil actors. In reality "Almost no one is evil, almost everything is broken." A possible evolutionarily concordant just-so story here is something in the direction of rational irrationality, perhaps humans are better at tribal ostracism etc if they collectively pretend (and/or genuinely believe) other humans who do bad things are genuinely evil and thus worthy of ostracism. 

3. ???

Both explanations are possible but I don't know which one is right (or both, or neither); I just want to highlight that there is something left to be explained in your model so far.

Replies from: maia, Korz
comment by maia · 2023-03-01T12:45:26.572Z · LW(p) · GW(p)

There's no contradiction. There are two competing sides of the evolutionary process: one side is racing to understand intentions as well as possible, the other side is racing to obscure its intentions, in this case by not having them consciously.

comment by Mart_Korz (Korz) · 2023-03-01T21:16:52.680Z · LW(p) · GW(p)

I think one aspect which softens the discrepancy is that our intuitions here might not be adapted to large-scale societies. If everyone really lives mainly with one's own tribe and has kind of isolated interactions with other tribes and maybe tribe-switching people every now and then (similar to village-life compared to city-life), I could well imagine that "are they truly part of our tribe?" actually manages to filter out a large portion of harmful cases.

Also, regarding 2): If indeed almost no one is evil, almost everyone is broken: there are strong incentives to make sure that the social rules do not rule out your way of exploiting the system. Because of this I would not be surprised if "common knowledge" around these things tends to be warped by the class of people who can make the rules. Another factor is that as a coordination problem, using "never try to harm others" seems like a very fine Schelling point to use as common denominator.

Replies from: Linch
comment by Linch · 2023-03-22T03:57:42.081Z · LW(p) · GW(p)

It's possible, but I would previously have assumed sociopathy, intentional maleficence, etc. to be less common in the ancestral environment relative to other harmful social situations. My own just-so story would suggest that people's intuitions from a tribal context are maladaptive in underpredicting sociopathy or deliberate deception.

Replies from: Korz
comment by Mart_Korz (Korz) · 2023-03-24T21:10:04.569Z · LW(p) · GW(p)

I am not sure we disagree with regard to the prevalence of maleficence. One reason why I would imagine that

"are they truly part of our tribe?" actually manages to filter out a large portion of harmful cases.

works in more tribal contexts would be that cities provide more "ecological" niches (would the term be sociological here?) for this type of behaviour.

intuitions [...] are maladaptive in underpredicting sociopathy or deliberate deception

Interesting. I would mostly think that people today are way more specialized in their "professions" such that for any kind of ability we will come into contact with significantly more skilled people than a typical ancestor of ours would have. If I try to think about examples where people are way too trusting, or way too ready to treat someone as an enemy, I have the impression that for both mistakes examples come to mind quite readily. Due to this, I think I do not agree with "underpredict" as a description and instead tend to a more general "overwhelmed by reality".

comment by Raemon · 2023-03-15T23:24:53.291Z · LW(p) · GW(p)

Curated. 

In some sense, I knew all this 10 years ago when I first started community-organizing and running into problems with various flavors of deception, manipulation, and people-hurting-each-other. 

But, I definitely struggled to defend my communities against people who didn't quite match my preconception of what "a person I would need to defend against" looked like. My sympathy and empathy for some people made me more hesitant to enforce my boundaries.

I don't know that I'm thrilled with "malefactor" or "maleficence" as words (they seem too similar to "malicious" and I don't think they convey the right set of things), but I very much agree with the distinction being useful.

comment by weft · 2023-03-03T23:54:52.798Z · LW(p) · GW(p)

Interpersonal abuse (eg parental, partner, etc) has a similar issue. People like to talk as if the abuser is twirling their mustache in their abuse-scheme. And while this is occasionally the case, I claim that MOST abuse is perpetrated by people with a certain level of good intent. They may truly love their partner and be the only one who is there for them when they need it, BUT they lack the requisite skills to be in a healthy relationship.

Sadly this is often due to a mental illness, or a history of trauma, or not getting to practice these skills growing up until there was a huge gulf between where they are and where they need to be.

This makes it extra difficult for the victim, because the abuser is sympathetic and seemingly ACTUALLY TRYING. Trying to get advice from the internet may not help when everyone paints your abuser as a scheming villain and you can tell they're not. They're just broken.

I've really appreciated the media that shows a more realistic picture of abusers as people who love you, but are too fucked up to not hurt you. I think more useful advice would acknowledge this harsh reality

comment by Harold (harold-1) · 2023-03-02T01:44:39.202Z · LW(p) · GW(p)

I don't have any terminological suggestions that I love

Following on my prior comment, the actual legal terms used for the (oxymoronic) "purposeless and unknowing mens rea" might provide an opening for the legal-social technologies to provide wisdom on operationalizing these ideas: "negligent" at first, and "reckless" when it's reached a tipping point.

comment by localdeity · 2023-03-01T02:50:34.632Z · LW(p) · GW(p)

When dealing with someone who's doing something bad, and it's not clear whether they're conscious of it or not, one tactic is to tell them about it and see how they respond.  (It is the most obviously prosocial approach.)  Ideally, this will either fix the situation or lead towards establishing that they are, at the very least, reprehensibly negligent, and then you can treat them as malicious.  (In principle, the difference between a malicious person and one who accidentally behaves badly is that, if both of them come to understand that their behavior causes bad results, the latter will stop while the former will keep going.  Applying this to the real world can be messy.)

To take an easy example [LW(p) · GW(p)], if the scenario involves a friend repeatedly doing something that hurts you, then probably you should tell them about it.  If they apologize and try to stop, this is good; if their attempts to stop fail, then you can tell them that too, and take it from there.  If, contrariwise, they insist "this can't actually be hurting you", or deny that it happened, or otherwise reject your feedback, then I'd consider this evidence that they're not such a good friend.

In the case of a non-friend, there is less of a presumption of good faith.  Since the effect of them agreeing with you would mean they have to restrict their behavior or otherwise do stuff they'd rather not, they may be reluctant to agree, and further they might take it as you attempting to grab power or bully them.  Which are things that people sometimes do, and so the details matter: exactly what evidence there is, the relation between them and the person(s) raising the issue, etc.

Suppose the issue involves subjective judgments of how someone behaved in 1:1 contexts.  If one person thought you behaved badly in a situation, and you think differently, maybe you're right.  If, the last N times you were in that type of situation, with N different people, they all thought you behaved badly, then that gets to be strong evidence, as N increases, that your approach is wrong.  (Depending on the issue, it's possible that all N people believe the wrong philosophy—e.g. if the interaction was that they said "Praise Jesus!" and you replied "Sorry, but I'm an atheist".  Though one then asks, why are you getting into all these situations that you can predict will go badly?  Are you doing what you should do to avoid them?)
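To make the "as N increases" point concrete, here is a minimal sketch of the underlying Bayesian update. Everything in it is an assumption added for illustration: the function name `posterior_wrong`, the prior, and the two complaint probabilities are invented, and the N judgments are treated as independent given whether your approach is actually wrong.

```python
def posterior_wrong(n_complaints, prior=0.1, p_if_wrong=0.8, p_if_fine=0.2):
    """Posterior probability that your approach is wrong after
    n_complaints independent people all judged that you behaved badly.

    prior      -- assumed probability of being wrong before any complaint
    p_if_wrong -- assumed chance a given person complains if you are wrong
    p_if_fine  -- assumed chance a given person complains if you are fine
    """
    weight_wrong = prior * p_if_wrong ** n_complaints
    weight_fine = (1 - prior) * p_if_fine ** n_complaints
    return weight_wrong / (weight_wrong + weight_fine)


if __name__ == "__main__":
    # Show how the posterior grows as unanimous complaints accumulate.
    for n in range(6):
        print(n, round(posterior_wrong(n), 3))
```

The independence assumption is exactly what fails in the "Praise Jesus!" case: if all N people share one philosophy, their judgments are correlated and count for much less than N separate observations.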

At a certain point, as the evidence mounts, a responsible person in your position, when confronted with the evidence, should say, "Ok, I still don't agree, but I have to admit there's an X% chance I'm wrong, and if I am wrong and continue like this, then the impact of being wrong is Y; meanwhile, there are certain safeguards, up to and including "stop it completely", which have their own expected values, and at this point safeguards A and B are reasonable and worth doing."  (A truly mature person in certain situations might even say, "I know I'm innocent, but I also know that others have no way of verifying this, and from their perspective there's an X% chance I'm guilty, and I'm in favor of the general policy of responding with these countermeasures to that level of evidence of this crime, and I'm not going to fight them on this.")
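The X, Y, and safeguard comparison above can be sketched as a small expected-cost calculation. This is only an illustration under stated assumptions: the helper `expected_cost`, the numbers for X and Y, and each safeguard's cost and effectiveness are all made up, and harm and cost are assumed to be measurable on one common scale.

```python
def expected_cost(p_wrong, harm_if_wrong, harm_reduction, safeguard_cost):
    """Expected total cost of a policy.

    p_wrong        -- X: probability that you are in fact wrong
    harm_if_wrong  -- Y: impact of continuing unchanged if you are wrong
    harm_reduction -- assumed fraction of that harm the safeguard prevents (0..1)
    safeguard_cost -- assumed cost of the safeguard itself (paid either way)
    """
    return safeguard_cost + p_wrong * harm_if_wrong * (1 - harm_reduction)


X = 0.3    # hypothetical chance of being wrong
Y = 100.0  # hypothetical harm if wrong and nothing changes

# Hypothetical options: (fraction of harm prevented, cost of the safeguard).
options = {
    "continue as before": expected_cost(X, Y, 0.0, 0.0),
    "safeguard A": expected_cost(X, Y, 0.6, 5.0),
    "safeguard B": expected_cost(X, Y, 0.9, 15.0),
    "stop it completely": expected_cost(X, Y, 1.0, 40.0),
}

# Pick the option with the lowest expected cost.
best = min(options, key=options.get)

if __name__ == "__main__":
    for name, cost in options.items():
        print(f"{name}: {cost:.1f}")
    print("lowest expected cost:", best)
```

With these invented numbers a partial safeguard beats both carrying on unchanged and stopping completely, which is the shape of the concession described above; different values of X and Y can easily change which option wins.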

A certain kind of narcissist would completely reject the feedback and say they're being unjustly persecuted, and (assuming our evidence is in fact good) we can condemn them here.  Depending on the situation, some predators would say, "Hmmph, those safeguards prevent me from doing the fun stuff or make it unacceptably risky; I'll agree and then just quietly leave the community".  Some others would pretend to agree and then try to continue misbehaving in whatever way they can.  There's always the possibility of an intelligent psychopath behaving exactly like an innocent person.

(If you want to get advanced about it, you could try having the "confronting" be initially done by some person who looks sane but not powerful, to maximize the likelihood that the "prideful narcissist" would openly reject it while the "reasonable, accidental misbehaver" would accept it; or, if the safeguard you have in mind is highly effective but is a major concession, you might have it be done by people who are officially "in charge" (e.g. with the power to ban people from events) so as to pressure cowardly offenders to agree.)

If you don't have enough evidence to be confident that the guy who rejects the feedback and insists he's correct is in fact wrong... Well, at the very least, by telling him, (a) if he's good but misguided, he should at least be more cautious in the future, and there is a chance you've helped; (b) if he's bad and cowardly, he knows that official eyes are on him and he'll have less benefit of the doubt in the future, which may dissuade him.  (This is conventionally known as a "warning".)  Having the right person tell him in the right way may help with (a) and possibly (b).

There may be circumstances in which you don't want to tell him about the evidence you do have.  (Maybe it would break a confidence; maybe it would teach predator-him how to hide his behavior in the future; maybe predator-he would know who snitched on him and take revenge [though my brain volunteers that this would be an excellent way to expose him, if you can protect the witness].)  There are also plenty in which this isn't a problem.

Overall, this is such a large topic, and appropriate responses depend so much on the details, that I think it would help to be more specific.

[edit: fixed link]

Replies from: metacoolus
comment by metacoolus · 2023-06-06T23:13:26.685Z · LW(p) · GW(p)

Yes! This is an excellent approach. Rather than focusing only on whether there is malicious intent, keeping in mind the more practical goal of wanting bad behavior to *stop* and seeking to understand how it might play out over time is a much more effective way of resolving the problem. Using direct communication to try and fix the situation or ascertain a history of established negligent or malicious behavior is very powerful.

comment by Harold (harold-1) · 2023-03-02T01:40:45.829Z · LW(p) · GW(p)

(As an example, various crimes legally require mens rea, lit. “guilty mind”, in order to be criminal. Humans care about this stuff enough to bake it into their legal codes.)

Even in the law of mental states, intent follows the advice in this post. U.S. law commonly breaks down the 'guilty mind' into at least four categories, which, in the absence of a confession, all basically work by observing the defendant's patterns of behaviour. There may be some more operational ideas in the legal treatment of reckless and negligent behaviour.

  1. acting purposely - the defendant had an underlying conscious object to act
  2. acting knowingly - the defendant is practically certain that the conduct will cause a particular result
  3. acting recklessly - the defendant consciously disregarded a substantial and unjustified risk
  4. acting negligently - the defendant was not aware of the risk, but should have been aware of the risk

comment by Slimepriestess (Hivewired) · 2023-03-01T02:02:15.625Z · LW(p) · GW(p)

This might be a bit outside the scope of this post, but it would probably help if there were a way to positively respond to someone who was earnestly messing up in this manner before they cause a huge fiasco. If there's a legitimate belief that they're trying to do better and act in good faith, then what can be done to actually empower them to change in a positive direction? That's of course if they actually want to change, if they're keeping themselves in a state that causes harm because it benefits them while insisting it's fine, well, to steal a Sith's turn of phrase: airlocked

Replies from: Lukas_Gloor
comment by Lukas_Gloor · 2023-03-01T16:48:54.809Z · LW(p) · GW(p)

If there's a legitimate belief that they're trying to do better and act in good faith, then what can be done to actually empower them to change in a positive direction? That's of course if they actually want to change; if they're keeping themselves in a state that causes harm because it benefits them while insisting it's fine, well, to steal a Sith's turn of phrase: airlocked

I agree that it's important to give people constructive feedback to help them change. However, I see some caveats around this (I think I'm expanding on the points in your comment rather than disagreeing with it). Sometimes it's easier said than done. If part of a person's "destructive pattern" is that they react with utter contempt when you give them well-meant and (reasonably-)well-presented feedback, it's understandable if you don't want to put yourself in the crossfire. In that case, you can always try to avoid contact with someone. Then, if others ask you why you're doing this, you can say something that conveys your honest impressions while making clear that you haven't given this other person much of a chance.

Just like it's important to help people change, I think it's also important to seriously consider the hypothesis that some people are so stuck in their destructive patterns that giving constructive feedback is no longer justifiable in terms of social opportunity costs. (E.g., why invest 100s of hours helping someone become slightly less destructive if you can promote social harmony 50x better by putting your energy into pretty much anyone else.) 

Someone might object as follows. "If someone is 'well-intentioned,' isn't there a series of words you* can kindly say to them so that they'll gain insight into their situation and they'll be able to change?" 

I think the answer here is "no" and I think that's one of the saddest things about life. Even if the answer was, "yes, BUT, ...", I think that wouldn't change too much and would still be sad.

*(Edit) Instead of "you can kindly say to them," the objection seems stronger if this said "someone can kindly say to them." Therapists are well-positioned to help people because they start with a clean history. Accepting feedback from someone you have a messy history with (or feel competitive with, or all kinds of other complications) is going to be much more difficult than the ideal scenario.

One data point that seems relevant here is success probabilities for evidence-based treatments of personality disorders. I don't think personality disorders capture everything about "destructive patterns" (for instance, one obvious thing that they miss is "person behaves destructively due to an addiction"), nor do I think that personality disorders perfectly carve reality at its joints (most traits seem to come on a spectrum!). Still, it seems informative that the treatment success for narcissistic personality disorder seems comparatively very low (but not zero!) for people who are diagnosed with it, in addition to it being vastly under-diagnosed since people with pathological narcissism are less likely to seek therapy voluntarily. (Note that this isn't the case for all personality disorders – e.g., I think I read that BPD without narcissism as a comorbidity has something like 80% chance of improvement with evidence-based therapy.) These stats are some indication that there are differences in people's brain wiring or conditioned patterns that are deep enough that they can't easily be changed with lots of well-intentioned and well-informed communication (e.g., trying to change beliefs about oneself and others). 

So, I think it's a trap to assume that being 'well-intentioned' means that a person is always likely to improve with feedback. Even if, from the outside, it looks as though someone would change if only they could let go of a particular mindset or set of beliefs that seems to be the cause behind their "destructive patterns," consider the possibility that this is more of a symptom rather than the cause (and that the underlying cause is really hard to address). 

comment by Marcello · 2023-03-03T23:28:22.104Z · LW(p) · GW(p)

I know this post was chronologically first, but since I read them out of order my reaction was "wow, this post is sure using some of the notions from the Waluigi Effect mega-post [LW · GW], but for humans instead of chatbots"!  In particular, they're both pointing at the notion that an agent (human or AI chatbot) can be in something like a superposition between good actor and bad actor unlike the naive two-tone picture of morality one often gets from children's books.

comment by Aleksey Bykhun (caffeinum) · 2023-03-18T02:00:12.873Z · LW(p) · GW(p)

After a recent article in the NY Times, I realized that it's a perfect analogy. The smartest people, when motivated by money, get so high that they venture into unsafe territory. They kinda know it's unsafe, but even internally it doesn't feel like crossing the red line.

It's not even about strength of character: when incentives are aligned 99:1 against your biology, you can try to work against it, but you most probably stand no chance.

It takes enormous willpower to quit smoking precisely because the risks are invisible and so "small". Not only do you have to fight against this irresistible urge, but there's also nobody on "your side" except for an intellectual realization, which you're not even so sure of.

In the same vein, being a CEO of a big startup, being able to single-handedly choose direction, and getting used to people around you being less smart, less hard-working, less competitive, you start trusting your own decision-process much more. That's when incentives start to seep in through the cracks in the shell. You don't even remember what feels right anymore; the only thing you know is that taking bold actions brings you more power, more money, more dukka. And you take them.

comment by NickGabs · 2023-03-01T18:29:01.487Z · LW(p) · GW(p)

Strong upvote. A corollary here is that a really important part of being a “good person” is being good at being able to tell when you’re rationalizing your behavior/otherwise deceiving yourself into thinking you’re doing good. The default is that people are quite bad at this but as you said don’t have explicitly bad intentions, which leads to a lot of people who are at some level morally decent acting in very morally bad ways.

comment by LVSN · 2023-03-01T02:49:08.313Z · LW(p) · GW(p)

Very excited for there to be definitely no differences between stereotypical malefactors and actual malefactors; no differences between stereotypical maleficence and actual maleficence; very excited for there to be no gameable cultural impressions about what makes a person a probable malefactor

... Not to imply that any gaming that would take place would be intentional, of course.

"This isn’t to say no coordination happens. I expect a little coordination happens openly, through prosocial slogans, just to overcome free rider problems. Remember Trivers’ theory of self-deception [LW · GW] – that if something is advantageous to us, we naturally and unconsciously make up explanations for why it’s a good prosocial policy, and then genuinely believe those explanations. If you are rich and want to oppress the poor, you can come up with some philosophy of trickle-down or whatever that makes it sound good. Then you can talk about it with other rich people openly, no secret organizations in smoke-filled rooms necessary, and set up think tanks together. If you’re in the patriarchy, you can push nice-sounding things about gender roles and family values. There is no secret layer beneath the public layer – no smoke-filled room where the rich people get together and say “Let’s push prosocial slogans about rising tides, so that secretly we can dominate everything”. It all happens naturally under the hood, and the Basic Argument isn’t violated."

https://slatestarcodex.com/2019/01/14/too-many-people-dare-call-it-conspiracy/

comment by Trevor Fordsman Weston · 2023-03-01T00:27:45.649Z · LW(p) · GW(p)

I agree with this very intensely. I strongly regret unilaterally promoting the CFAR Handbook in various groups on Facebook; I thought that it was critical to minimize the number of AI safety and adjacent people using Facebook and that spreading the CFAR handbook was the best way to do that, and I mistakenly believed that CFAR was bad at marketing their material instead of choosing not to in order to avoid overcomplicating things. I had no way of knowing about the long list of consequences for CFAR for spreading their research in the wrong places, and CFAR had no way of warning me because they had no idea who I was and what I would do in response to their request. Hopefully, this won't make it harder for CFAR to post helpful content to LessWrong in the future.

There are too many outside-the-box thinkers; the chaos factor is so high that it's like herding cats even when 99% of agents want to be cooperative. There need to be defense mechanisms that take confusion into account so that well-intentioned unilateralists don't get tangled up in systems meant for deliberate, consistently strategic harm-maximizers (who very clearly and unambiguously exist). The only thing I can think of is finding ways to discourage every cooperative person from acting unilaterally in the first place, but I agree with So8res that I can't think of good ways to do that.

Replies from: Linch, SaidAchmiz
comment by Linch · 2023-03-01T07:07:35.729Z · LW(p) · GW(p)

I thought that it was critical to minimize the number of AI safety and adjacent people using Facebook and that spreading the CFAR handbook was the best way to do that

Wait, your TOC for spreading the CFAR handbook on Facebook was that doing so would be so annoying that it'd get people to quit Facebook? If true, this is rather surprising to me and I did not predict this.

Replies from: dmitriy
comment by Dmitriy (dmitriy) · 2023-03-02T07:22:01.898Z · LW(p) · GW(p)

I read his thesis as

  1. FB use reduces the effectiveness of AI safety researchers and
  2. the techniques in the CFAR handbook can help people resist attention hijacking schemes like FB, therefore
  3. a FB group for EAs is a high leverage place to spread the CFAR handbook

comment by Said Achmiz (SaidAchmiz) · 2023-03-02T04:24:33.744Z · LW(p) · GW(p)

the long list of consequences for CFAR for spreading their research in the wrong places

What are these consequences? Is this “long list” published anywhere?

comment by Richard_Kennaway · 2023-03-06T16:09:12.748Z · LW(p) · GW(p)

Here is a fictional, but otherwise practical example: the attempted rape that sets in motion the action of "Thelma and Louise". Here on YouTube. Notice what Harlan says at 0:50: "I'm not gonna hurt you".

How does he experience his intentions at that moment? At the moment after Thelma slaps him and he beats her?

Does it matter?

comment by Mary Chernyshenko (mary-chernyshenko) · 2023-03-04T09:34:09.587Z · LW(p) · GW(p)

Yeah, we don't know if the people who sent the Boy Who Had Cried Wolf to guard the sheep were stupid or evil. But we do know they committed murder.

comment by SomeoneYouOnceKnew · 2023-03-01T02:11:23.514Z · LW(p) · GW(p)

What material policy changes are being advocated for, here? I am having trouble imagining how this won't turn into a witch-hunt.

comment by Tristan Miano (tristan-miano-2) · 2023-03-06T13:24:00.622Z · LW(p) · GW(p)

Harmful people often lack explicit malicious intent.

I was having a discussion with ChatGPT where it also claimed to believe the same thing as this. I asked it to explain why it thinks this. Its reasoning was that well-intentioned people often make mistakes, and that malign actors do not always succeed in their aims. I'll say!

I disagree completely with the idea that well-intentioned people can actually cause any harm, but even if you presume that they could, it isn't clear to me how malign actors being unable to succeed in their aims is enough to balance out the consequences such that more negativity falls on the well-intentioned. Perhaps the unsuccess of malign actors is due to correctly narrowing our focus onto them only?

Also, in my experience, if we followed the advice to focus on effects only, and were well-intentioned about doing so, we'd end up focusing on only the truly malign actors anyway. "Deploying defenses" against honest mistake-making just doesn't intuitively result in actions that don't seem a bit cartoonishly, ironically villainous.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2023-03-11T16:16:48.399Z · LW(p) · GW(p)

A version of this tends to happen with rather unintelligent or incompetent people placed in positions of power over other people, who can harm people without having any intention to do so.

Probably the best example here is the Great Chinese Famine, and the Holodomor to a lesser extent. One of the major problems was that the leadership set severely unrealistic goals because they didn't know enough, which, combined with incompetence, caused catastrophes on the scale of millions to tens of millions of lives.

comment by Noosphere89 (sharmake-farah) · 2023-03-01T00:38:51.977Z · LW(p) · GW(p)

As a former EA, I basically agree with this, and I definitely agree that we should start shifting to a norm that focuses on punishing bad actions, rather than trying to infer their mental state.

On SBF, I think a large part of the issue is that he was working in an industry called cryptocurrency that basically has fraud as the bedrock of it all. There was nothing real about crypto, so the collapse of FTX was basically inevitable.

Replies from: aphyer, Kenoubi, localdeity
comment by aphyer · 2023-03-01T02:25:14.088Z · LW(p) · GW(p)

Even if you accept that all cryptocurrency is valueless, it is possible to operate a crypto-related firm that does what it says it does or one that doesn't.  

For example, if two crypto exchanges accept Bitcoin deposits and say they will keep the Bitcoin in a safe vault for their customers, and then one of them keeps the Bitcoin in the vault while the other takes it to cover its founder's personal expenses/an affiliated firm's losses, I think it is fair to say that the second of these has committed fraud and the first has not, regardless of whether Bitcoin has anything 'real' about it or whether it disappears into a puff of smoke tomorrow.

comment by Kenoubi · 2023-03-01T01:23:07.098Z · LW(p) · GW(p)

On SBF, I think a large part of the issue is that he was working in an industry called cryptocurrency that basically has fraud as the bedrock of it all. There was nothing real about crypto, so the collapse of FTX was basically inevitable.

I don't deny that the cryptocurrency "industry" has been a huge magnet for fraud, nor that there are structural reasons for that, but "there was nothing real about crypto" is plainly false. The desire to have currencies that can't easily be controlled, manipulated, or implicitly taxed (seigniorage, inflation) by governments or other centralized organizations and that can be transferred without physical presence is real. So is the desire for self-executing contracts. One might believe those to be harmful abilities that humanity would be better off without, but not that they're just nothing.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2023-03-01T01:58:26.896Z · LW(p) · GW(p)

More specifically, the issue with crypto is that the benefits are much less than promised, and there are a whole lot of bullshit claims about crypto, like it being secure or not manipulable.

As one example of why cryptocurrencies fail as a currency: their fixed supply and lack of a central entity mean the value of the currency swings wildly, which is a dealbreaker for any currency.

Note, this is just one of the many, fractal problems here with crypto.

Crypto isn't all fraud. There's reality, but it's built on unsound foundations and trying to sell a fake castle to others.

comment by localdeity · 2023-03-01T03:19:32.398Z · LW(p) · GW(p)

I definitely agree that we should start shifting to a norm that focuses on punishing bad actions, rather than trying to infer their mental state.

Do you have limitations to this in mind?  Consider the political issue of abortion.  One side thinks the other is murdering babies; the other side thinks the first is violating women's rightful ownership of their own bodies.  Each side thinks the other is doing something monstrous.  If that's all you need to justify punishment, then that seems to mean both sides should fight a civil war.

("National politics?  I was talking about..."  The one example the OP gives is SBF, and other language alludes to sex predators and reputation launderers, and the explicit specifiers in the first few paragraphs are "harmful people" and "bad behavior"; it's such a wide range that it seems hard to declare anything offtopic.)

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2023-03-01T19:15:31.354Z · LW(p) · GW(p)

You've actually mentioned a depressing possibility around morality, and it's roughly that without shared ethical assumptions, conflict is the default, and there's nothing imposing any constraints except social norms, which can break down.

My answer for people in general is: Try to see what others think, but remember that sometimes, bad outcomes will happen to stop worse outcomes, and you should always focus on your own values to decide the answers.

comment by cousin_it · 2023-03-01T18:14:31.260Z · LW(p) · GW(p)