Against empathy-by-default

post by Steven Byrnes (steve2152) · 2024-10-16T16:38:49.926Z · LW · GW · 11 comments

Contents

  tl;dr
  1. What am I arguing against?
  2. Why I don’t buy it
    2.1 Tofu versus feta part 1: the common-sense argument
    2.2 Tofu versus feta part 2: The algorithm argument
      Start with the tofu versus feta example:
      The case of me-eating-tofu versus Ahmed-eating-tofu:
  3. Kernels of truth in the original story
    3.1 By default, we can expect transient spillover empathy … before within-lifetime learning promptly eliminates it
    3.2 The semantic overlap is stable by default, even if the motivational overlap (from reward model spillover) isn’t
11 comments

tl;dr

Section 1 presents an argument that I’ve heard from a couple people, that says that empathy[1] happens “for free” as a side-effect of the general architecture of mammalian brains, basically because we tend to have similar feelings about similar situations, and “me being happy” is a kinda similar situation to “someone else being happy”, and thus if I find the former motivating then I’ll tend to find the latter motivating too, other things equal.

Section 2 argues that those two situations really aren’t that similar in the grand scheme of things, and that our brains are very much capable of assigning entirely different feelings to pairs of situations even when those situations have some similarities. This happens all the time, and I illustrate my point via the everyday example of having different opinions about tofu versus feta.

Section 3 acknowledges a couple kernels of truth in the Section 1 story, just to be clear about what I’m agreeing and disagreeing with.

1. What am I arguing against?

Basically, the proposal (as I understand it) is that things-involving-me and corresponding things-involving-other-people wind up close to each other in the latent space, and thus the “learnt reward model”, being smooth and continuous by default, assigns values that spill over from the former to the latter.   

Here’s Beren Millidge (@beren), “Empathy as a natural consequence of learnt reward models” (2023):

…Here, I want to argue a different case. Namely that the basic cognitive phenomenon of empathy -- that of feeling and responding to the emotions of others as if they were your own, is not a special cognitive ability which had to be evolved for its social benefit, but instead is a natural consequence of our (mammalian) cognitive architecture and therefore arises by default. Of course, given this base empathic capability, evolution can expand, develop, and contextualize our natural empathic responses to improve fitness. In many cases, however, evolution actually reduces our native empathic capacity -- for instance, we can contextualize our natural empathy to exclude outgroup members and rivals.

The idea is that empathy fundamentally arises from using learnt reward models[2] to mediate between a low-dimensional set of primary rewards and reinforcers and the high dimensional latent state of an unsupervised world model. In the brain, much of the cortex is thought to be randomly initialized and implements a general purpose unsupervised (or self-supervised) learning algorithm such as predictive coding to build up a general purpose world model of its sensory input. By contrast, the reward signals to the brain are very low dimensional (if not, perhaps, scalar). There is thus a fearsome translation problem that the brain needs to solve: learning to map the high dimensional cortical latent space into a predicted reward value. Due to the high dimensionality of the latent space, we cannot hope to actually experience the reward for every possible state. Instead, we need to learn a reward model that can generalize to unseen states. Possessing such a reward model is crucial both for learning values (i.e. long term expected rewards), predicting future rewards from current state, and performing model based planning where we need the ability to query the reward function at hypothetical imagined states generated during the planning process. We can think of such a reward model as just performing a simple supervised learning task: given a dataset of cortical latent states and realized rewards (given the experience of the agent), predict what the reward will be in some other, non-experienced cortical latent state.

The key idea that leads to empathy is the fact that, if the world model performs a sensible compression of its input data and learns a useful set of natural abstractions, then it is quite likely that the latent codes for the agent performing some action or experiencing some state, and another, similar, agent performing the same action or experiencing the same state, will end up close together in the latent space. If the agent's world model contains natural abstractions for the action, which are invariant to who is performing it, then a large amount of the latent code is likely to be the same between the two cases. If this is the case, then the reward model might 'mis-generalize'[3] to assign reward to another agent performing the action or experiencing the state rather than the agent itself. This should be expected to occur whenever the reward model generalizes smoothly and the latent space codes for the agent and another are very close in the latent space. This is basically 'proto-empathy' since an agent, even if its reward function is purely selfish, can end up assigning reward (positive or negative) to the states of another due to the generalization abilities of the learnt reward function. …

Likewise, I think @Marc Carauleanu has made similar claims (e.g. here, here), citing (among other things) the “perception-action model for empathy”, if I understood him right.

Anyway, this line of thinking seems to me to be flawed—like, really obviously flawed. I’ll try to spell out why I think that in the next section, and then circle back to the kernels of truth at the end.

2. Why I don’t buy it

2.1 Tofu versus feta part 1: the common-sense argument

Tofu and feta are similar in some ways, and different in other ways. Let’s make a table!

Tofu versus Feta

| Similarities | Differences |
| --- | --- |
| They’re both food | They taste different |
| They look pretty similar | They’re made of different things |
| You can pick up both with a fork | They have different nutritional profiles |

OK, next, let’s compare “me eating tofu” with “my friend Ahmed eating tofu”. Again, they’re similar in some ways and different in other ways:

“Me eating tofu” versus “Ahmed eating tofu”

| Similarities | Differences |
| --- | --- |
| They both involve tofu being eaten | The person eating the tofu is different |
| | One will lead to me tasting tofu and feeling full; the other will lead to me tasting nothing at all and remaining hungry |
| | In one case, I should chew; in the other case, I shouldn’t |

Now, one could make an argument, in parallel with the excerpt at the top, that tofu and feta have some similarities, and so they wind up in a similar part of the latent space, and so the learnt reward model will assign positive or negative value in a way that spills over from one to the other.

But that argument is obviously wrong! That’s not what happens! Nobody in their right mind would like feta merely because they like tofu and because tofu and feta have some similarities, such that their feelings about tofu spill over into their feelings about feta. Quite the contrary, an adult’s feelings about tofu have no direct causal relation at all with their feelings about feta. We, being competent adults, recognize that they are two different foods, about which we independently form two different sets of feelings. It’s not like we find ourselves getting confused here.

So by the same token, in the absence of any specific evolved empathy-related mechanism, our strong assumption should be that an adult’s feelings (positive, negative, or neutral) about themselves eating tofu and about somebody else eating tofu have no direct causal relation at all. They’re really different situations! Nobody in their right mind would ever get confused about which is which!

And the same applies to myself-being-happy versus Ahmed-being-happy, and so on.

2.2 Tofu versus feta part 2: The algorithm argument

Start with the tofu versus feta example:

The latent space that Beren is talking about needs to be sufficiently fine-grained to enable good understanding of the world and good predictions. Thus, given that tofu and feta have lots of distinct consequences and implications, the learning algorithm needs to separate them in the latent space sufficiently to allow for them to map into different world-model consequences and associations. And indeed, that’s what happens: it’s vanishingly rare for an adult of sound mind to get confused between tofu and feta in the middle of a conversation.

Next, the “reward model” is a map from this latent space to a scalar value. And again, there’s a learning algorithm sculpting this reward model to “notice” “edges” where different parts of the latent space have different reward-related consequences. If every time I eat tofu, it tastes bad, and every time I eat feta, it tastes good, then the learning algorithm will sculpt the reward model to assign a high value to feta and low value to tofu.

So far this is all common sense, I hope. Now let’s flip to the other case:

The case of me-eating-tofu versus Ahmed-eating-tofu:

All the reasoning above goes through in the same way.

Again, the latent space needs to be sufficiently fine-grained to enable good understanding of the world and good predictions. Thus, given that me-eating-tofu and Ahmed-eating-tofu have lots of distinct consequences and implications, the learning algorithm needs to separate them in the latent space sufficiently to allow for them to map into different world-model consequences and associations. And indeed, no adult of sound mind would get confused between one and the other.

Next, the “reward model” is a map from this latent space to a scalar value. And again, there’s a learning algorithm sculpting this reward model to “notice” “edges” where different parts of the latent space have different reward-related consequences. If every time I eat tofu, it tastes yummy and fills me up (thanks to my innate drives / primary rewards), and if every time Ahmed eats tofu, it doesn’t taste like anything, and doesn’t fill me up, and hence doesn’t trigger those innate drives, then the learning algorithm will sculpt the reward model to assign a high value to myself-eating-tofu and not to Ahmed-eating-tofu.
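
To make the shape of this argument concrete, here is a minimal toy sketch. It is not a model of the brain and it is not taken from Beren’s post. Everything named in it is an assumption made up for illustration: the four hand-picked latent features, the linear reward model, the delta-rule update, and the learning rate. It shows only that a supervised learner can assign very different values to two latent codes that overlap heavily, as long as they differ in some feature that predicts reward.

```python
# Toy illustration only. The feature names and the linear model are hypothetical.
FEATURES = ["tofu", "being_eaten", "eater_is_me", "eater_is_ahmed"]

def encode(active):
    """Turn a set of active feature names into a binary latent vector."""
    return [1.0 if name in active else 0.0 for name in FEATURES]

ME_EATING_TOFU = encode({"tofu", "being_eaten", "eater_is_me"})
AHMED_EATING_TOFU = encode({"tofu", "being_eaten", "eater_is_ahmed"})

# Number of active features the two latent codes have in common.
shared_active = sum(a * b for a, b in zip(ME_EATING_TOFU, AHMED_EATING_TOFU))

def predict(weights, latent):
    """Linear reward model: latent vector -> scalar predicted reward."""
    return sum(w * x for w, x in zip(weights, latent))

def train(experiences, lr=0.1, epochs=500):
    """Delta-rule (least-mean-squares) updates on (latent, primary reward) pairs."""
    weights = [0.0] * len(FEATURES)
    for _ in range(epochs):
        for latent, reward in experiences:
            error = reward - predict(weights, latent)
            for i, x in enumerate(latent):
                weights[i] += lr * error * x
    return weights

# Assumed experience: my eating tofu triggers primary reward, Ahmed's eating does not.
experiences = [(ME_EATING_TOFU, 1.0), (AHMED_EATING_TOFU, 0.0)]
weights = train(experiences)

value_me = predict(weights, ME_EATING_TOFU)
value_ahmed = predict(weights, AHMED_EATING_TOFU)
print(shared_active, round(value_me, 3), round(value_ahmed, 3))
```

In this toy, the two states share two of their three active features, yet the learned values end up near 1 and near 0 respectively, because the features that differ are enough for the learner to carve an “edge” between them.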

And again, the same story applies equally well to myself-being-comfortable versus Ahmed-being-comfortable, etc.

3. Kernels of truth in the original story

3.1 By default, we can expect transient spillover empathy … before within-lifetime learning promptly eliminates it

If a kid really likes tofu, and has never seen or heard of feta before, then the first time they see feta they might well have general good feelings about it, because they’re mentally associating it with tofu.

This default basically stops mattering at the same moment that they take their first bite of feta. In fact, it can largely stop mattering even before they taste or smell it—it can stop mattering as soon as someone tells the kid that it’s not in fact tofu but rather an unrelated food of a similar color.

But still. It is a default, and it does have nonzero effects.

So by the same token, one might imagine that, in very early childhood, a baby who likes to be hugged might mentally lump together me-getting-hugged with someone-else-getting-hugged, and thereby have positive feelings about the latter. This is a “mistake” from the perspective of the learning algorithm for the reward model, in the sense that a hug has high value because (let us suppose) it involves affective touch inputs that trigger primary reward via some innate drive in the brainstem, and somebody else getting hugged will not trigger that primary reward. Thus, this “mistake” won’t last. The learnt reward model will update itself. But still, this “mistake” will plausibly happen for at least one moment of one day in very early childhood.
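
A toy sketch of that time course, under loudly stated assumptions: the three latent features, the linear reward model, the delta-rule update, and the learning rate are all hypothetical choices made for illustration, not claims about real brains. Phase 1 trains only on being hugged, and the never-experienced “someone else hugged” state inherits value through the shared feature. Phase 2 adds observations of someone else being hugged with no primary reward, and the inherited value fades.

```python
# Toy illustration only: a linear "reward model" over three made-up latent features.
# Feature order: [hug, recipient_is_me, recipient_is_other]  (hypothetical encoding)
ME_HUGGED = [1.0, 1.0, 0.0]
OTHER_HUGGED = [1.0, 0.0, 1.0]

LEARNING_RATE = 0.25  # arbitrary; it sets how fast the spillover fades in this toy

def predict(weights, latent):
    """Linear reward model: latent vector -> scalar predicted reward."""
    return sum(w * x for w, x in zip(weights, latent))

def update(weights, latent, primary_reward):
    """One delta-rule step toward the primary reward actually received."""
    error = primary_reward - predict(weights, latent)
    return [w + LEARNING_RATE * error * x for w, x in zip(weights, latent)]

weights = [0.0, 0.0, 0.0]

# Phase 1: only experiences of being hugged (assumed primary reward of 1).
for _ in range(50):
    weights = update(weights, ME_HUGGED, 1.0)
initial_spillover = predict(weights, OTHER_HUGGED)  # value of a never-experienced state

# Phase 2: also watching others get hugged, which triggers no primary reward.
spillover_over_time = [initial_spillover]
for _ in range(50):
    weights = update(weights, OTHER_HUGGED, 0.0)
    weights = update(weights, ME_HUGGED, 1.0)
    spillover_over_time.append(predict(weights, OTHER_HUGGED))

final_spillover = spillover_over_time[-1]
final_value_me = predict(weights, ME_HUGGED)
print(round(initial_spillover, 3), round(final_spillover, 3), round(final_value_me, 3))
```

In this toy, the weight on the shared `hug` feature gives the unseen case a value of about 0.5 after phase 1, and that value shrinks toward 0 once the zero-reward observations arrive, while the value of being hugged stays near 1. How fast it shrinks is set by the arbitrary learning rate, so the sketch illustrates the mechanism rather than the speed of correction in real learning.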

Is that fact important? I don’t think so! But still, it’s a kernel of truth in the story at the top.

(Unless, of course, there’s a specific evolved mechanism that prevents the learnt reward model from getting updated in a way that “corrects” the spillover. If that’s the hypothesis, then sure, let’s talk about it! But let’s focus the discussion on what exactly that specific evolved mechanism is! Incidentally, when I pushed back in the comments section of Beren’s post, his response was, I think, generally in this category, but a bit vague.)

3.2 The semantic overlap is stable by default, even if the motivational overlap (from reward model spillover) isn’t

Compare the neurons that activate when I think about myself-eating-tofu, versus when I think about Ahmed-eating-tofu. There are definitely differences, as I argued above, and I claim that these differences are more than sufficient to allow the reward model to fire in a completely different way for one versus the other. But at the same time, there are overlaps in those neurons. For example, both sets of neurons probably include some neurons in my temporal lobe that encode the idea of tofu and all of its associations and implications.

By the same token, compare the neurons that activate when I myself feel happy, versus when I think about Ahmed-being-happy. There are definitely differences! But there’s definitely some overlap too.

The point of this post is to argue that this overlap doesn’t give us any empathy by itself, because the direct motivational consequence (from spillover in the learnt reward model) doesn’t even last five minutes, let alone a lifetime. But still, the overlap exists. And I think it’s plausible that this overlap is an ingredient in one or more specific evolved mechanisms that lead to our various prosocial and antisocial instincts. What are those mechanisms? I have ideas! But that’s outside of the scope of this post. More on that in the near future, hopefully.

  1. ^

    The word “empathy” typically conveys a strongly positive, prosocial vibe, and that’s how I’m using that word in this post. Thus, for example, if Alice is very good at “putting herself in someone else’s shoes” in order to more effectively capture, imprison, and torture that someone, that’s NOT usually taken as evidence that Alice is a very “empathetic” person! (More discussion here.) If you strip away all those prosocial connotations, you get what I call “empathetic simulation”, a mental operation that can come along with any motivation, or none at all. I definitely believe in “empathetic simulation by default”, see §3.2 at the end.

  2. ^

    Steve interjection: What Beren calls “learnt reward model” is more-or-less equivalent to what I call “valence guess”; see for example this diagram. I’ll use Beren’s terminology for this post.

  3. ^

    Steve interjection: The word “misgeneralization” is typically used in a specific way in AI alignment (cf. here, here), which isn’t a perfect match to how Beren is using it here, so in the rest of the post I’ll talk instead about value “spillover” from one thing to another.

11 comments

comment by Jan_Kulveit · 2024-10-16T19:58:36.426Z · LW(p) · GW(p)

I expected a quite different argument for empathy

1. argument from simulation: the most important part of our environment is other people; people are very complex and hard to predict; fortunately, we have hardware which is extremely good at 'simulating a human' - our individual brains. To guess what another person will do or why they are doing what they are doing, it seems clearly computationally efficient to just simulate their cognition on my brain. Fortunately for empathy, simulations activate some of the same proprioceptive machinery and goal-modeling subagents, so the simulation leads to similar feelings.

2. mirror neurons: it seems we have a powerful dedicated system for imitation learning, which is extremely advantageous for overcoming the genetic bottleneck. Mirroring activation patterns leads to empathy.

Replies from: nathan-helm-burger, steve2152
comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-10-17T00:14:40.516Z · LW(p) · GW(p)

When I've been gradually losing at a strategic game where it seems like my opponent is slightly stronger than me, but then I have a flash of insight and turn things around at the last minute.... I absolutely model what my opponent is feeling as they are surprised by my sudden comeback. My reaction to such an experience is usually to smile, or (if I'm alone playing the game remotely) perhaps chuckle with glee at their imagined dismay. I feel proud of myself, and happy to be winning.

On the other hand, if I'm beating someone who is clearly trying hard but outmatched, I often feel a bit sorry for them. In such a case my emotions maybe align somewhat with theirs, but I don't think my slight feeling of pity, and perhaps superiority, is in fact a close match for what I imagine them feeling.

And both these emotional states are not what I'd feel in a real life conflict. A real life conflict would involve much more anxiety and stress, and concern for myself and sometimes the other. 

I don't just automatically feel what the simulated other person in my mind is feeling. I feel a reaction to that simulation, which can be quite different from what the simulation is feeling! I don't think that increasing the accuracy and fidelity of the simulation would change this.

comment by Steven Byrnes (steve2152) · 2024-10-16T20:06:28.999Z · LW(p) · GW(p)
  • I added a footnote at the top clarifying that I’m disputing that the prosocial motivation aspect of “empathy” happens for free. I don’t dispute that (what I call) “empathetic simulations” are useful and happen by default.
  • A lot of claims under the umbrella of “mirror neurons” are IMO pretty sketchy, see my post Quick notes on “mirror neurons”.
  • You can make an argument: “If I’m thinking about what someone else might do and feel in situation X by analogy to what I might do and feel in situation X, and then if situation X is unpleasant then that simulation will be unpleasant, and I’ll get a generally unpleasant feeling by doing that.” But you can equally well make an argument: “If I’m thinking about how to pick up tofu with a fork, I might analogize to how I might pick up feta with a fork, and so if tofu is yummy then I’ll get a yummy vibe and I’ll wind up feeling that feta is yummy too.” The second argument is counter to common sense; we are smart enough to draw analogies between situations while still being aware of differences between those same situations, and allowing those differences to control our overall feelings and assessments. That’s the point I was trying to make here.
Replies from: ben-lang
comment by Ben (ben-lang) · 2024-10-17T10:21:27.620Z · LW(p) · GW(p)

“If I’m thinking about what someone else might do and feel in situation X by analogy to what I might do and feel in situation X, and then if situation X is unpleasant then that simulation will be unpleasant, and I’ll get a generally unpleasant feeling by doing that.”

I think this is definitely true. Although, sometimes people solve that problem by just not thinking about what the other person is feeling. If the other person has ~no power, so that failing to simulate them carries ~no costs, then this option is ~free.

This kind of thing might form some kind of an explanation for Stockholm Syndrome. If you are kidnapped, and your survival potentially depends on your ability to model your kidnapper's motivations, and you have nothing else to think about all day, then any overspill from that simulating will be maximised. (Although from the Wikipedia article on Stockholm syndrome it looks like it is somewhat mythical: https://en.wikipedia.org/wiki/Stockholm_syndrome)

comment by Gunnar_Zarncke · 2024-10-17T13:01:55.859Z · LW(p) · GW(p)

I think the steelmanned version of Beren's argument is

The potential for empathy is a natural consequence of learned reward models

That you indeed get for free. It will not get you far, as you have pointed out, because once you get more information, the model will learn to distinguish the cases precisely. And we know from observation that some mammals (specifically territorial ones) and most other animals do not show general empathy.

But there are multiple ways that empathy can be implemented with small additional circuitry. I think this is the part of Beren's comment that you were referring to:

For instance, you could pass the RPE through to some other region to detect whether the empathy triggered for a friend or enemy and then return either positive or negative reward, so implementing either shared happiness or schadenfreude. Generally I think of this mechanism as a low level substrate on which you can build up a more complex repertoire of social emotions by doing reward shaping on these signals.

But it might even be possible that no additional circuitry is required if the environment is just right. Consider the case of a very social animal in an environment where individuals, esp. young ones, rarely can take care of themselves alone. In such an environment, there may be many situations where the well-being of others predicts your own well-being. For example, if you give something to the other (and that might just be a smile) that makes it more likely to be fed. This doesn't seem to necessarily require any extra circuits, though it might be more likely to bootstrap off some prior mechanisms, e.g., grooming or infant care.

This might not be stable because free-loading might evolve, but this is then secondary.

I wonder which of these cases this comment of yours is:

consider “seeing someone get unexpectedly punched hard in the stomach”. That makes me cringe a bit, still, even as an adult.

Replies from: steve2152, steve2152
comment by Steven Byrnes (steve2152) · 2024-10-17T14:17:42.216Z · LW(p) · GW(p)

But it might even be possible that no additional circuitry is required if the environment is just right. Consider the case of a very social animal in an environment where individuals, esp. young ones, rarely can take care of themselves alone. In such an environment, there may be many situations where the well-being of others predicts your own well-being. For example, if you give something to the other (and that might just be a smile) that makes it more likely to be fed. This doesn't seem to necessarily require any extra circuits, though it might be more likely to bootstrap off some prior mechanisms, e.g., grooming or infant care.

This might not be stable because free-loading might evolve, but this is then secondary.

I don’t really buy this. For my whole childhood, I was in an environment where it was illegal, dangerous, and taboo for me to drive a car (because I was underage). And then I got old enough to drive, and so of course I started doing so without a second thought. I had not permanently internalized the idea that “Steve driving a car” is bad. Instead, I got older, my situation changed, and my behavior changed accordingly. Likewise, I dropped tons of other habits of childhood—my religious practices, my street address, my bedtime, my hobbies, my political beliefs, my values, etc.—as soon as I got older and my situation changed.

So by the same token, when I was a little kid, yes it was in my self-interest (to some extent) for my parents to be healthy and happy. But that stopped being true as soon as I was financially independent. Why assume that people would permanently internalize that, when they fail to permanently internalize so many other aspects of childhood?

Actually it’s worse than that—adolescents are notorious for not feeling motivated by the well-being of their parents, even while such well-being is still in their own narrow self-interest!! :-P

(And generalizing across people seems equally implausible to generalizing across time. I called my parents “mom and dad”, but I didn’t generalize that to calling everyone I met “mom and dad”. So why assume that my brain would generalize being-nice-to-parents to being-nice-to-everyone?)

It’s true that sometimes childhood incentives lead to habits that last through adulthood, but I think that mainly happens via (1) the adult independently assesses those habits as being more appealing than alternatives, or (2) the adult continues the habits because it’s never really occurred to them that there was any other option.

As an example of (2), a religious person raised in a religious community might stay religious by default. Until, that is, they move to the big city, where they have atheist roommates and coworkers and friends. And at that point, they’ll probably at least imagine the possibility of becoming atheist. And they might or might not find that possibility appealing, based on their personality and so on.

But (2) doesn’t particularly apply to the idea of being selfish. I don’t think people are nice because it’s never even crossed their mind, not even once in their whole life, that maybe they could not do a nice thing. That’s a very obvious and salient idea! :)

[More on this in Heritability, Behaviorism, and Within-Lifetime RL :) ]

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2024-10-17T20:53:15.992Z · LW(p) · GW(p)

I think the point we agree on is

habits that last through adulthood [because] the adult independently assesses those habits as being more appealing than alternatives,

I think that the habit of being nice to people is empathy.

So by the same token, when I was a little kid, yes it was in my self-interest (to some extent) for my parents to be healthy and happy. But that stopped being true as soon as I was financially independent. Why assume that people would permanently internalize that, when they fail to permanently internalize so many other aspects of childhood?

I'm not claiming that they "permanently internalize" but that they correctly (well, modulo mistakes) predict that it is in their interests. You started driving a car because you correctly predicted that the situation/environment had changed. But across almost all environments, you get positive feedback from being nice to people and thus feel or predict positive valence about these.

Actually it’s worse than that—adolescents are notorious for not feeling motivated by the well-being of their parents, even while such well-being is still in their own narrow self-interest!! :-P

That depends on the type of well-being and your ability to predict it. And maybe other priorities get in the way during that age. And again, I'm not claiming unconditional goodness. The environment of young adults is clearly different from that of children, but it is comparable enough to predict positive value from being nice to your parents. 

Actually, psychopaths prove this point: The anti-social behavior is "learned" in many cases during abusive childhood experiences, i.e., in environments where it was exactly not in their interest to be nice - because it didn't benefit them. And on the other side, psychopaths can, in many cases, function and show prosocial behaviors in stable environments with strong social feedback. 

This also generalizes to the cultures example.

As an example of (2), a religious person raised in a religious community might stay religious by default. Until, that is, they move to the big city

I agree: In the city, many of their previous predictions of which behaviors exactly lead to positive feedback ("quoting the Bible") might be off and they will quickly learn new behaviors. But being nice to people in general will still work. In fact, I claim, it tends to generalize even more, which is why people who have been around more varied communities tend to develop more generalized morality (higher Kegan levels).

Replies from: steve2152
comment by Steven Byrnes (steve2152) · 2024-10-18T01:53:31.816Z · LW(p) · GW(p)

I’m not too sure what you’re arguing.

I think we agree that motivations need to ground out directly or indirectly with “primary rewards” from innate drives (pain is bad, eating-when-hungry is good, etc., other things equal). (Right?)

And then your comment kinda sounds like you’re making the following argument:

There’s no need to posit the existence of an innate drive / primary reward that ever makes it intrinsically rewarding to be nice to people, because “you get positive feedback from being nice to people”, i.e. you will notice from experience that “being nice to people” will tend to lead to (non-social) primary rewards like eating-when-hungry, avoiding pain, etc., so the learning algorithm in your brain will sculpt you to have good feelings around being nice to people.

If that’s what you’re trying to say, then I strongly disagree and I’m happy to chat about that … but I was under quite a strong impression that that’s not what you believe! Right?

I thought that you believed that there is a primary reward / innate drive that makes it feel intrinsically rewarding for adults to be nice (under certain circumstances); if so, why bring up childhood at all?

Sorry if I’m confused :)

comment by Steven Byrnes (steve2152) · 2024-10-17T14:56:13.595Z · LW(p) · GW(p)

I wonder which of these cases this comment of yours is:

consider “seeing someone get unexpectedly punched hard in the stomach”. That makes me cringe a bit, still, even as an adult.

  • One thing is, I think the brain invests like 10,000× more neurons into figuring out whether a thought is good vs bad (positive vs negative valence) than into figuring out whether a thought is or is not a good time to cringe. So I think the valence calculation can capture subtleties and complexities that the simpler cringe calculation can’t. This especially includes things like properly handling complex thoughts with subordinate clauses and so on. For example, in the thought “I’ll do X in order to avoid Y”, the more negative the valence of Y is, the more positive the valence of the whole thought is. So the hypothesis “our brains are unable to learn a strong valence-difference between two vaguely-related situations” is (even?) more implausible than the hypothesis “our brains are unable to learn a strong stomach-cringe-appropriateness-difference between two vaguely-related situations”.
  • Another thing is, I obviously do think there are specific evolved mechanisms at play here, even if I didn’t talk about them in this post.
  • Another thing is, occasionally lightly tensing my stomach, in situations where I don’t need to, just isn’t the kind of high-stakes mistake that warrants a strong update in any brain learning algorithm. Like, if some flash in the corner of your eye has a 2% chance of preceding getting hit in the stomach, it’s still the right move to cringe every time—I’m happy to trade 50 false positives where I tense my stomach unnecessarily, in exchange for 1 true positive where I protect myself from serious injury. So presumably the brain learning algorithm is tuned to update only very weakly on false positives. Now, I don’t normally see people get punched in the stomach, up close and personal. I can’t even remember the last time that happened. If I saw that every day, I might well get desensitized to it. I do seem to be pretty well desensitized to seeing people get punched on TV.
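
The arithmetic behind the 2% example in the last bullet can be checked with a short sketch. The function names and the cost units are hypothetical, and the sketch assumes that a cringe fully prevents the injury, which the comment does not claim. At a 2% hit rate, every 50 flashes contain on average 1 real hit and 49 false alarms, so always cringing costs 50 cringes per injury avoided.

```python
from fractions import Fraction

def false_positives_per_hit(p_hit):
    """Expected number of unnecessary cringes per flash that really precedes a hit."""
    return (1 - p_hit) / p_hit

def always_cringe_is_cheaper(p_hit, cost_cringe, cost_injury):
    """Compare expected cost per flash: always cringing vs. never cringing.

    Assumes (hypothetically) that a cringe fully prevents the injury.
    """
    return cost_cringe < p_hit * cost_injury

p = Fraction(2, 100)                      # the 2% figure from the comment
ratio = false_positives_per_hit(p)        # false alarms per real hit
worth_it = always_cringe_is_cheaper(p, cost_cringe=1, cost_injury=51)
not_worth_it = always_cringe_is_cheaper(p, cost_cringe=1, cost_injury=49)
print(ratio, worth_it, not_worth_it)
```

Under these assumptions, always cringing is the cheaper policy whenever an injury costs more than 50 times as much as one unnecessary cringe.
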
comment by Foyle (robert-lynn) · 2024-10-17T04:51:49.502Z · LW(p) · GW(p)

"In many cases, however, evolution actually reduces our native empathic capacity -- for instance, we can contextualize our natural empathy to exclude outgroup members and rivals."

Exactly as it should be.

Empathy is valuable in close community settings, a 'safety net' adaptation to make the community stronger with people we keep track of to ensure we are not being exploited by people not making concomitant effort to help themselves. But it seems to me that it is destructive at wider social scales enabled by social media where we don't or can't have effective reputation tracking to ensure that we are not being 'played' for the purpose of resource extraction by people making dishonest or exaggerated representations.

In essence at larger scales the instinct towards empathy rewards dishonest, exploitative, sociopathic and narcissistic behavior in individuals, and is perhaps responsible for a lot of the deleterious aspects of social media amongst particularly more naturally or generally empathic-by-default women. E.g. 'influencers' (and before them exploitative televangelists) cashing in on follower empathy. It also rewards misrepresentations of victimhood/suffering for attention and approval - again in absence of more in depth knowledge of the person that would exist in a smaller community - that may be a source of rapid increase in 'social contagion' mental health pathologies amongst particularly young women instinctually desirous of attention most easily attained by inventing or exaggerating issues in absence of other attributes that might garner attention.

In short the empathic charitable instinct that works so well in families and small groups is socially destructive and dysfunctional at scales beyond community level.

comment by ZY (AliceZ) · 2024-10-17T02:13:33.897Z · LW(p) · GW(p)

Would agree with most of the post. To me, humans have some general shared experiences that may activate empathy related to those experiences, but the numerous small differences in experience make it very hard to know exactly what others would think/feel, even in exactly the same situations. We could never really model the entire learning/experience history of another person.

The additional point I want to add/urge is that this should not be interpreted as saying that empathy is not needed because we don't get it right anyway. It is more to recognize that we are not naturally good at empathy (or are less good at it than we thought), and thus to create mindsets/systems (such as asking questions and promoting the gathering of more information about the other person) that encourage empathy consciously (when needed).