Signaling isn't about signaling, it's about Goodhart

valentine

post by Valentine · 2022-01-06T18:49:48.534Z · LW · GW · 31 comments

Epistemic status: Fuzzy conjecture in a faintly mathematically flavored way. Clear intuitions about Gears and a conclusion, but nothing like a formal proof or even formal definitions. Anecdotes offered to clarify the intuition rather than as an attempt at data. Plenty of room for development and increased rigor if so desired.

Suppose that for whatever reason, you want to convince someone (let's call them "Bob") that they can trust you.

I'd like to sketch two different strategy types for doing this:

1. You can try to figure out how Bob reads trust signals. Maybe you recognize that Bob is more likely to trust someone who brings a bottle of his favorite wine to the meeting because it signals thoughtfulness and attention. Maybe revealing something vulnerably helps Bob to relax. You're not really trying to deceive Bob per se here, but you recognize that in order for him to trust you, you need to put some energy into showing him that he can trust you.
2. You make a point within yourself to be in fact worthy of Bob's trust. Then, without knowing how Bob will take it, you drop all attempts to signal anything about your trustworthiness or lack thereof. Instead you just let Bob come to whatever conclusion he's going to come to.

That second strategy might sound nuts.

Despite that, I claim it's actually almost strictly more effective.

If you see why, you probably have the bulk of my point.

I'll say a few more things to spell this out, together with some Gears I see and some implications.

A rephrasing of Goodhart's Law goes something like this:

The more explicit attention a signal gets, the more pressure there is to decouple it from what it's a signal of.

The mechanism is basically analogous to wireheading. If you get a reward for a signal happening, you're incentivized to find the cheapest way to make that signal happen.

Like when someone's trying to lose weight, so they make a point of weighing themselves first thing in the morning before drinking water and after using the toilet.

This might accidentally create some kind of standard baseline, but that isn't what's motivating the person to do this. They're trying to make the scale's numbers be lower.

Even weirder is when they stop drinking as much water because the scales reward them for that.
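
To make the mechanism concrete, here's a toy sketch in Python. Everything in it is an assumption made up for illustration: the `Action` type, the three actions, and all of the numbers are hypothetical rather than measurements of anything. It only shows that an optimizer rewarded on the scale reading and one rewarded on health can pick different actions from the same menu.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    effort: float         # cost to the person, in arbitrary units
    scale_change: float   # change in the number on the scale (kg)
    health_change: float  # change in the thing actually cared about (arbitrary units)


# Hypothetical numbers, chosen only to illustrate the shape of the incentive.
ACTIONS = [
    Action("eat well and exercise", effort=5.0, scale_change=-0.5, health_change=1.0),
    Action("weigh in before drinking water", effort=1.0, scale_change=-0.5, health_change=0.0),
    Action("drink less water", effort=0.5, scale_change=-1.0, health_change=-1.0),
]


def best_for_signal(actions):
    """Pick the action with the largest drop in the scale reading per unit of effort."""
    return max(actions, key=lambda a: -a.scale_change / a.effort)


def best_for_goal(actions):
    """Pick the action with the largest health gain per unit of effort."""
    return max(actions, key=lambda a: a.health_change / a.effort)


signal_choice = best_for_signal(ACTIONS)
goal_choice = best_for_goal(ACTIONS)
print("Rewarded on the scale reading:", signal_choice.name)
print("Rewarded on health:", goal_choice.name)
```

With these made-up numbers, the cheapest way to move the scale is the one option that hurts health, while the health-focused choice moves the scale only slowly. Different numbers would give different winners; the point is only that the two objectives can come apart once the signal itself is what gets rewarded.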

An often missed corollary of Goodhart — and basically the center of what I want to point at here — is this:

If you want a signal to retain its integrity, minimize attention on the signal.

To be maybe just a little more formal, by "attention" I mean something like incentive structures.

For instance, maybe the person who's trying to lose weight wants to live longer. In which case, inner work they can put into viewing the scales at an emotional/intuitive level as a flawed window into their health (instead of as a signal to optimize for) will help to ameliorate Goodhart drift.

And in fact, if they don't do this, they'll start to do crazy things like drink too little water, losing track of the "why". They'll hurt their health for the sake of a signal of health.

This means that stable use of signals of what you care about requires that you not care about the signal itself.

What's required for this person to be able to use the scales, recognizing that the number relates to something they care about, but without caring about the number itself?

That's a prerequisite question to answer for sober use of that tool.

Back to Bob.

Suppose I'm trying to sell Bob a used car. This introduces the classic "lemons problem".

In strategy #1, where I try to signal as clearly as I can to Bob that the car is good, maybe I show him papers from the mechanic I had check out the car. I let him look under the hood. I try to connect with him to show him that I'm relatable and don't have anything to hide.

Of course, Bob knows I'm a used car salesman, so he's suspicious. Did the paper come from a trustworthy mechanic? Would he be able to notice the real problem with the car by looking under the hood? Maybe I'm just being friendly in order to get him to let his guard down. Etc.

So if I notice this kind of resistance in Bob, I have to find ways to overcome it. Maybe I reassure him that the mechanic has been in business for decades, and that he can call them at this number right here and now if he likes.

But I know that if Bob leaves the lot without buying the car, he probably won't come back. So in fact I do want Bob to buy the car right now. And, I tell myself, Bob is in fact looking for a car, and I know this one to be good! So it's a good deal for both of us if I can just convince him!

Bob of course picks up on this pressure and resists more. I try to hide it, knowing this, although Bob intuitively knows that both the pressure and the attempt to hide it are things that a sleazy used car salesman would do too.

The problem here is Goodhart: to the extent that signals have decoupled from what they're "supposed to" signal, Bob can't trust that the signals aren't being used to deceive.

But I have a weird incentive here to get him to trust the signals anyway.

Maybe I bias toward signals that (a) are harder for a dishonest version of me to send and (b) that Bob can tell are harder for sleazy-me to send.

I just have to find those signals.

Right?

Here's strategy #2:

I know the car is good.

I look to Bob and say something like this:

"Hey. I know the car is good. I know you don't know that, and you don't know if you can trust me. Let me know what you need here to make a good decision. I'll see what I can do."

And I drop all effort to convince him.

All.

(How? By the same magic inner move that the person aiming for ~~weight loss~~ health improvement uses to drop caring about their scales' numbers. It's doable, I promise.)

If he has questions about the car, I can honestly just answer them based on whatever caused me to believe it's a good car.

This means that I and the car will incidentally offer immensely clear signals of the truth of the situation to Bob.

One result is that those signals that would be costly to sleazy-me to send would appear much, much more effortlessly here.

They just happen, because the emphasis is on letting truth speak simply for itself.

In the standard culture of business, this is less effective at causing purchases. Maybe more energy put into digging out what inspires my customers to buy would cause them to get excited more reliably.

But focusing on whether the person buys the car puts me in a Goodhart-like situation. I start attending to the signals Bob needs, which is the same kind of attention that sleazy-me would put into those same signals.

I'm not trying to give business advice per se. I have reason to think this actually works better in the long run for business, but that's not a crux for me.

Much more interesting to me is the way that lots of salespeople are annoying. People know this.

How do you be a non-annoying salesperson?

By dropping the effort to signal.

This also has a nice coordination effect:

If there's an answer to the lemons problem between me and Bob, it'll be much, much easier to find. All signals will align with cooperation because we will in fact be cooperating.

And if there isn't a solution, we correctly conclude that much, much more quickly and effortlessly.

No signaling arms races needed.

In practice, signal hacking just can't keep up with this kind of honest transparency.

If I want my girlfriend's parents to think I'll be good to her… well, I can just drop all attempts to convince them one way or the other and just be honest. If I'm right, they'll conclude the truth if they're capable of it.

…or I could go with the usual thing of worrying about it, coming up with a plan about what I'm going to tell them, hoping it impresses them, maybe asking her about what will really impact them, etc.

Even if this latter scenario works, it can't work as efficiently as dropping all effort to signal and just being honest does. The signals just automatically reflect reality in the latter case. Whereas with the planning method, I have to try to make the signals reflect the reality I want her parents to believe in, which I assume is the truth.

The real cost (or challenge rather) of the "drop signaling" method is that in order for me to do it, I have to be willing to let her parents conclude the worst. I have to prefer that outcome if it's the natural result of letting reality reflect the truth without my meddling hands distorting things.

And that might be because maybe I'm actually bad for her, and they'll pick up on this.

Of course, maybe they're just pigheaded. But in that case I've just saved myself a ton of effort trying to convince them of something they were never going to believe anyway.

"But wait!" a thoughtful person might exclaim. "What if the default thing that happens from this approach isn't clear communication? What if because of others running manipulative strategies, you have to put some energy into signals in order for the truth to come out?"

Well, hypothetical thoughtful exclaimer, let me tell you:

I don't know.

…but I'm pretty sure this is an illusion.

This part is even fuzzier than the rest. So please bear with me here.

If I have to put effort into making you believe a signal over what directly reflects reality, then I'm encouraging you to make the same mistake that a manipulator would want you to make.

This means that even if this kind of move were necessary to get through someone's mental armor, on net it actually destabilizes the link between communication and grounded truth.

In a sense, I'm feeding psychopaths. I'm making their work easier.

Because of this, the person I'm talking to would be correct to trust my communication a little less just because of the method employed.

So on net, I think you end up quite a bit ahead if you let some of these communications fail instead of sacrificing pieces of your integrity to Goodhart's Demon.

The title is a tongue-in-cheek reference to the bit of Robin Hanson's memetic DNA that got into Less Wrong from the beginning:

"X isn't about X. X is about signaling."

I think this gives some wonderful insight into situations when examined from the outside.

I think it's often toxic and anti-helpful when used as an explicit method of navigating communication and coordination attempts. It usually introduces Goodhart drift.

Imagine I went to a used car sales lot and told the salesperson something like this:

"I'm interested in this car. I might buy it if you can convince me it's not a lemon even though I have reason not to trust you."

This seems very sensible on the surface. Maybe even honest and straightforward.

But now you've actually made it harder for the salesperson to drop focusing on signals. Most people have close to zero idea that focusing on signals creates Goodhart drift (other than in platitudes like "Just be yourself"). So now you're in a signaling-and-detection arms race where you're adversarially trying to sort out whether you two sincerely want to cooperate.

Compare with this:

"Hi! I'm interested in this car. Tell me about it?"

I think it's pretty easy to notice attempts to manipulate signals. If I were in this situation, I'd just keep sidestepping the signal manipulations and implicitly inviting (by example only!) the salesperson to meet me in clear honesty. If they can't or won't, then I'd probably decline to do business with them. I'd very likely be much more interested in living in this kind of clear integrity than I would be in the car!

(Or maybe I'd end up very confident I can see the truth despite the salesperson's distortions and feel willing to take the risk. But that would be in spite of the salesperson, and it sure wouldn't have been because I invited them into a signaling skirmish.)

This picture suggests that what others choose to signal just isn't any of your business.

If you focus on others' signals, you either Goodhart yourself or play into signaling arms races.

Far, far simpler and more reliable is just trusting reality to reflect truth. You just keep looking at reality.

This might sound abstract. For what it's worth, I think Jacob Falkovich might be saying the same thing in his sequence on selfless dating. The trend where people optimize for "fuckability instead of fucking" and end up frustrated that they're not getting sex is an example of this. Goodhart drift engendered by focusing on the signals instead of on reality.

(My understanding of) Jacob's solution is also a specific example of the general case.

If you try to signal "Hey, I'm hot!" in the language you think will be attractive to the kind of person you think will be attracted to that signal…

…well, the sort of person you'll draw is the one who needs you to put effort into that kind of signal.

(Here I'm assuming for simplicity that the goal is a long-term relationship.)

So now, every ounce of energy you put into sending that signal falls into one of two buckets:

1. It reflects reality, meaning you effortlessly would send that signal just by being transparently yourself. So the energy put into sending the signal is simply wasted and possibly anti-helpful (since it encourages you to mask the truth a little).
2. It's a bit off from reality, meaning you have to keep hiding the parts of you that don't match what your new partner thinks of you. (In practice this is rarely sustainable.)

So the solution is…

*drumroll*

…drop all effort to signal!

Yes, you might end up not attracting anyone. But if so, that is a correct reflection of you relative to the dating market. To do better you'd have to trick a potential partner (and possibly yourself).

Of course, maybe you'd rather be in a relationship made of signaling illusions than be alone.

That's up to you.

I'm just pointing out a principle.

What exactly does it mean to "drop all effort to signal"?

Honestly, I'm not sure.

I have a very clear intuition of it. I can feel it. I can notice cases where it happens and where it's not happening, and I can often mentally transform one into the other. I know a bunch of the inner work needed to do it.

But I don't know how to define it.

Hence the epistemic status of "fuzzy conjecture".

My hope is that this brings some thoughtfulness to discourse about "social signaling" and "social status" and all that. I keep seeing Goodhart drift in those areas due to missing this vision. Hopefully this will bring a little more awareness to those corners of discussion.

It's also something I'm working on embodying. This ties clearly to how much care and thoughtfulness goes into communication: "Oh dear, what will people think of this?" That seems like it can be helpful for making communication clearer — but it also acts as bait for Goodhart's Demon.

I don't know how to resolve that just yet.

I hope I will soon.

31 comments

Comments sorted by top scores.

comment by Raymond D · 2022-01-06T22:21:25.817Z · LW(p) · GW(p)

My main takeaway from this post is that it's important to distinguish between sending signals and trying to send signals, because the latter often leads to goodharting.

It's tricky, though, because obviously you want to be paying attention to what signals you're giving off, and how they differ from the signals you'd like to be giving off, and sometimes you do just have to try to change them.

For instance, I make more of an effort now than I used to, to notice when I appreciate what people are doing, and tell them, so that they know I care. And I think this has basically been very good. This is very much not me dropping all effort to signal.

But I think what you're talking about is very applicable here, because if I were just trying to maximise that signal, I would probably just make up compliments, and this would probably be obviously insincere. So I guess the big question is, which things do you stop trying to do?

(Also, I notice I'm now overthinking editing this comment because I've switched gears from 'what am I trying to say' to 'what will people interpret from this'. Time to submit, I guess.)

Replies from: Valentine

↑ comment by Valentine · 2022-01-07T13:36:57.900Z · LW(p) · GW(p)

My main takeaway from this post is that it's important to distinguish between sending signals and trying to send signals, because the latter often leads to goodharting.

That is a wonderful summary.

For instance, I make more of an effort now than I used to, to notice when I appreciate what people are doing, and tell them, so that they know I care. And I think this has basically been very good. This is very much not me dropping all effort to signal.
But I think what you're talking about is very applicable here, because if I were just trying to maximise that signal, I would probably just make up compliments, and this would probably be obviously insincere.

Yep.

There's an area of fuzz for me here that matters. I don't intellectually know how to navigate it.

A much more blatant example is with choosing a language. Right now I'm in Mexico. Often I'll talk to the person behind the counter in Spanish. Why? Because they'll understand me better. If they don't speak English, it's sort of pointless to try to communicate in English.

This is totally shaping my behavior to impact the other person.

But it's… different. It's really different. I can tell the difference intuitively. I just don't know what the difference really is.

I notice that your example absolutely hits my sense of "Oh, no, this is invoking the Goodhart thing." It seems innocent enough… but where my eyes drift to is: Why do you have to "make more of an effort now than [you] used to"? If I feel care for someone, and I notice that my sharing it lets them feel it more readily, and that strikes me as good, then I don't have to put in effort. It just happens, kind of like drinking water from my cup in my hand when I'm thirsty just happens.

I would interpret that effort as maintaining behavior in the face of not having taken the truth all the way into your body. Something like… you understand that people need to hear your appreciation in order to feel your care, but you haven't grokked it yet. You can still manipulate your own behavior without grokking, but it really is self-manipulation based on a mental idea of how you need to behave in order to achieve some imagined goal.

(I want to acknowledge that I'm reading a lot into a short statement. If I've totally misread you here, please take this as a fictional example. I don't mean any of this as a critique of your behavior or choices.)

I'd like to extend your example a bit to point out what I can see going wrong here.

Suppose a fictional version of you in fact doesn't care about these others and is only interested in how he benefits from others' actions. And maybe he recognizes that his "appreciation", if nakedly seen, would cause these people to (correctly!) feel dehumanized. This fictional you would therefore need to control his signals and make his appreciation come across as genuine in order to get the results he wants.

If he could, he might even want to convince himself of his sincerity so that his signal hacking is even harder to detect.

(I think of that as "Newcomblike self-deception".)

The fact that fictional you could be operating like this means that hacking your own signal is itself a subtle meta-signal that you might be this fictional version of you. The default thing people seem to try to do to get around this is to distract people with the volume of the signal. ("Oh, wow! This is sooo amazing! Thank you so, so, SO much!") This is the "feeding psychopaths" thing I mentioned.

If you happen to never notice and fear this, and the people you're expressing appreciation for never pick up on this, then you accidentally end up in a happy equilibrium.

(…although I think people pick up on this stuff pretty automatically and just try to be numb to it. Most people seem to be manipulating their signals at one another all the time, which sometimes requires signaling that they're not noticing what the other is signaling.)

It's just very unstable. All it takes is one misstep somewhere. One flicker of worry. And if it happens to hit someone where they're emotionally sensitive… KABLOOEY! Signaling arms race.

Whereas if you put your attention on grokking the thing and then letting people have whatever impression of you they're going to have, you end up in an immensely stable equilibrium. Your appreciation becomes transparent because you are transparent and you in fact appreciate them.

(…with a caveat here around the analog of learning Spanish. Which, again, I can feel but don't understand yet.)

So I guess the big question is, which things do you stop trying to do?

I agree. That's the big question. I don't know. But I like you bringing it up explicitly.

Replies from: AllAmericanBreakfast

↑ comment by DirectedEvolution (AllAmericanBreakfast) · 2022-01-07T14:56:44.724Z · LW(p) · GW(p)

First, I will summarize what seems to be your core thesis.

In a comment below, Dagon said:

I don't think you're "dropping all effort" to signal, you're rather getting good at signaling, by actually being truthful and information-focused.

You reply:

I agree with what I think you're saying. I think there's been a definitional sliding here. When I say "Drop all effort to signal", I'm describing the experience on the inside. I think you're saying that from the outside, signaling is still happening, and the benefits of "dropping all effort to signal" can be understood in signaling terms.
I agree with that.

So you seem to be focused in this post on ways to generate signals. You seem to suggest that there are two broad strategies, and that we have a choice about which to engage in:

Authentic ("drop all effort to signal!", "I feel care for someone", "I don't have to put in effort. It just happens, kind of like drinking water from my cup in my hand when I'm thirsty just happens.", "having taken the truth all the way into your body", "grokked it", "transparent").
Shallow acting ("the Goodhart thing", "seems innocent enough", "make more of an effort", "in fact doesn't care about these others and is only interested in how he benefits from others' actions", "'appreciation'", "(correctly!) feel dehumanized", "control his signals", "come across as genuine", "'feeding psychopaths'", "manipulating their signals").
Method acting. "Hacking your own signal" can allow acting to perhaps ("people pick up on this stuff pretty automatically") accurately produce the same results as authentic signal generation, as long as "you happen to never notice and fear this, and the people you're expressing appreciation for never pick up on this." This is probably a temporary success at best ("It's just very unstable").

You also claim that authentic signal generation is "almost strictly more effective," because "If there's an answer... it'll be much, much easier to find. If there isn't a solution, we correctly conclude that much, much more quickly and effortlessly," and also because "... you end up quite a bit ahead if you let some of these communications fail instead of sacrificing pieces of your integrity to Goodhart's Demon."

It feels like there's also an implication that authentic signal generation is more virtuous than acting, but that's never explicitly stated.

So as a summary of these claims:

Signals can be generated via authenticity, method acting, or shallow acting. Authenticity is typically more effective and reliable than either form of acting at achieving the results you want, and is also perhaps more virtuous.

This tends to frame authenticity and acting as opposites ("just be yourself"). Others, of course, frame acting as a means by which we can achieve authenticity ("fake it 'til you make it"). Here's the first convenient post on Psychology Today, which says "However, it turns out that the relationship between your emotions and your behavior is a little more reciprocal than that. This means that if you force a smile when you are feeling down, you will lift your mood, and alternatively, if you frown when you are happy, you will feel down."

So a more charitable explanation for Raymond's response is that Raymond is trying to "fake it 'til he makes it." By "making more of an effort," he is trying to cultivate authenticity. It might be wise to affirm that possibility, or to make an argument directly against the "fake it 'til you make it" strategy if that is your intention.

Of course, "authenticity" and "caring" are ill-defined terms, referring both to a short-term emotional state and a longer-term intention or sense of meaning about a relationship. Pinning down which is meant might allow us to draw upon the psychology literature to see if there is any strong consensus on whether or not "fake it 'til you make it" is an effective way to alter one's own internal state or to create perceptions in other people. My prior is that it's unlikely that a sufficiently broad and deep evidence base exists to answer these questions conclusively, but that there's at least some evidence in favor of some versions of the "fake it 'til you make it" strategy in some contexts.

Replies from: ChristianKl, Valentine

↑ comment by ChristianKl · 2022-01-08T09:52:10.020Z · LW(p) · GW(p)

Method acting. "Hacking your own signal" can allow acting to perhaps ("people pick up on this stuff pretty automatically") accurately produce the same results as authentic signal generation, as long as "you happen to never notice and fear this, and the people you're expressing appreciation for never pick up on this." This is probably a temporary success at best ("It's just very unstable").

I know one local rationalist who does method acting all the time. One of the main reasons he does it is that it's a way to dissociate from chronic physical pain. That means he does it much more consistently than someone who just does it in some social situations to get signaling benefits. I'll call him Bob for this story.

One time I had a conversation with Alice and Bob and Alice remarked that Bob is hard to read and mysterious because the emotions she reads in him don't seem to translate into direct action. Then I said "Of course not, he does his acting thing [I had deeper conversations with him before]" and then Bob clarified and said, "It's method acting".

Alice and Bob later got together into a relationship.

↑ comment by Valentine · 2022-01-14T14:23:55.995Z · LW(p) · GW(p)

So you seem to be focused in this post on ways to generate signals.

No. I'm focused on how attention to signals tends to create Goodhart drift.

Replies from: AllAmericanBreakfast

↑ comment by DirectedEvolution (AllAmericanBreakfast) · 2022-01-14T15:01:50.696Z · LW(p) · GW(p)

Right, I am including that aspect in my summary. I put it in different words ("shallow acting" vs. "attention to signals") to make the concepts a little easier to work with for my style of writing.

My impression was that you agreed that dropping efforts to signal brought the main benefits you're concerned with here, by changing the signals you're sending, typically in ways that come across as more trustworthy. Here are some additional quotes that gave me this impression:

Maybe I bias toward signals that (a) are harder for a dishonest version of me to send and (b) that Bob can tell are harder for sleazy-me to send.
One result is that those signals that would be costly to sleazy-me to send would appear much, much more effortlessly here... They just happen, because the emphasis is on letting truth speak simply for itself.
Even if this latter scenario works, it can't work as efficiently as dropping all effort to signal and just being honest does. The signals just automatically reflect reality in the latter case.
Raymond D: My main takeaway from this post is that it's important to distinguish between sending signals and trying to send signals, because the latter often leads to goodharting.
You: That is a wonderful summary.

As such, I interpret "drop all efforts to signal," which I labeled as "authenticity," as an approach to generating signals, which you're claiming is morally and instrumentally better than signal manipulation (labeled "shallow acting"). You claim that what makes shallow acting/attention to signals problematic is that it "tends to create Goodhart drift," and alleviating this problem is what makes dropping efforts to signal a superior way to generate signals. The brevity of your response makes me think you perceive me as having fundamentally misunderstood your post, but it doesn't give me a lot to go on as far as updating my understanding, if so.

comment by Odd anon · 2022-01-06T23:43:58.385Z · LW(p) · GW(p)

I am strongly reminded of the descriptions of the "upper class" in ACX's review of Fussell: "[T]he upper class doesn't worry about status because that would imply they have something to prove, which they don't.", and therefore they are extremely meticulous in making sure that nothing they do looks like signalling, ever, because otherwise people might think they have something to prove (which they don't). Boring parties, specifically non-ostentatious mansions, food which is just bland enough to avoid being too good and look like they're trying to show something, etc.

This kind of thing does happen. A group thinks they can be above signalling, and starts avoiding any attempts to signal, and then everyone notices that visible attempts to signal are bad signalling. Good signalling is looking like you're not trying to signal. And then the game starts all over again, only with yet another level of convoluted rules.

Replies from: Valentine

↑ comment by Valentine · 2022-01-07T14:18:41.243Z · LW(p) · GW(p)

That's a really good point. It's like stealth obsession with signaling, because there's a need to not signal.

This in turn reminds me of how beginning statistics students often confuse independence and anti-correlation. I'm trying to point at the analog of independence, but if folk who feel convinced that I'm pointing at something real don't grok what I'm pointing at, they're likely to land on the analog of anti-correlation.
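
For readers who want the statistical distinction spelled out, here's a small Python sketch. The two data sets and their mapping onto signaling are hypothetical, chosen only for illustration: `x` stands for "would this action look like signaling?" and `y` for "does the person do it?", each coded as 0 or 1. The `pearson` helper is a plain hand-written correlation coefficient, not a library call.

```python
import math


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Independence (by construction): every combination shows up equally often,
# so knowing x tells you nothing about y.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Anti-correlation: y is always the opposite of x,
# so x still fully determines y.
anti_correlated = [(0, 1), (1, 0), (0, 1), (1, 0)]

print("independent:", pearson(independent))
print("anti-correlated:", pearson(anti_correlated))
```

In the anti-correlated data, whether something looks like signaling still completely controls the behavior, just with the sign flipped. Only in the independent data does that consideration stop carrying any information about what the person does. One caveat: zero correlation does not imply independence in general; the first data set is independent because of how it was built, not because its coefficient happens to be zero.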

comment by Dagon · 2022-01-06T20:53:14.189Z · LW(p) · GW(p)

I don't think you're "dropping all effort" to signal, you're rather getting good at signaling, by actually being truthful and information-focused. The useful signals are those which are difficult/expensive to fake and cheap(er) to display truthfully.

When you say to Bob "Let me know what you need here to make a good decision. I'll see what I can do", THAT is a great signal, in that it's a request for Bob to tell you what further signals he wants, and an indication that you intend to provide them, even if they'd be difficult to fake.

I really like the insight that signaling is related to Goodhart - both are problematic due to the necessity of using proxies for the actual desired outcomes. I don't think we can go so far as to say they're equivalent, just that signaling is yet another domain subject to Goodhart's law.

Replies from: Valentine

↑ comment by Valentine · 2022-01-07T13:13:35.867Z · LW(p) · GW(p)

I don't think you're "dropping all effort" to signal, you're rather getting good at signaling, by actually being truthful and information-focused.

…which is much more likely to fail if I think of it like this while doing it.

I agree with what I think you're saying. I think there's been a definitional sliding here. When I say "Drop all effort to signal", I'm describing the experience on the inside. I think you're saying that from the outside, signaling is still happening, and the benefits of "dropping all effort to signal" can be understood in signaling terms.

I agree with that.

I'm just suggesting that in practice, the experience on the inside is of turning attention away from signals and entirely toward a plain and simple attention on what is.

I don't think we can go so far as to say they're equivalent, just that signaling is yet another domain subject to Goodhart's law.

I agree. I didn't mean to imply otherwise.

(I imagine this is a reaction to the title? That was tongue-in-cheek. I said so, though maybe you missed it. It was meant to artistically gesture at the thesis in an entertaining way rather than as a truth statement accurately summarizing the point.)

Replies from: Dagon

↑ comment by Dagon · 2022-01-07T17:05:23.332Z · LW(p) · GW(p)

Ah, that's a very helpful clarification, and a distinction that I missed on first reading. I absolutely agree that a focus on the underlying behaviors and true good intents yields better results (both better signals and better outcomes, and most importantly, for many, is personally more satisfying) than trying to consciously work out the best signals.

I'm not sure it's feasible to totally forget the signaling portion of your interactions - knowing about it MAY be helpful in choosing some marginal actions or statements, and it's certainly valuable in interpreting others' statements and behaviors. But I'm with you that the vast majority of your life should be driven by actual good intents.

I kind of wonder how much of this (and other Goodhart-related topics) is a question of complexity vs legibility, seen vs unseen. When you introspect and consider signaling, there's a pretty limited set of factors you can model and consider. When you just try to do something, there's a lot of unspecified consideration that goes into it.

comment by hath · 2022-01-07T00:25:36.115Z · LW(p) · GW(p)

Cal Newport has a similar stance on college admissions. The main point of his book on high school is that pursuing the things that you're actually interested in, and accomplishing something noteworthy with that interest, is a significantly better strategy for getting into college than loading your resume with extracurriculars/APs. That's effectively dropping all effort to signal your ability to Do Things and focusing instead on actually Doing Things. Zvi's post on The Thing and the Symbolic Representation of The Thing is relevant here as well.

The book also advocated some particular signaling strategies--a memorable one was "accomplishments that another person doesn't immediately see how they themselves could do, like writing a book, are especially well-regarded".

comment by Raemon · 2022-01-07T20:15:37.273Z · LW(p) · GW(p)

I like the general direction here. The exact wording of the post feels... a little wish-fulfilly to me.

Three different things that seem noteworthy:

I think there are some reasons to think "just doing your thing and letting people decide how much to trust you" should work at least pretty well. It's possible that it actually works better than actively strategically signaling, but that's a pretty strong claim.
Separately, there are reasons that "just doing your thing and letting people decide how much to trust you" contributes to a kind of epistemic commons.
Thirdly, I sure do have a strong aesthetic preference for this way of doing things.

But I think the arguments for those are fairly separate, and I'm worried about bundling them up the way they seem to be here.

comment by Alex Vermillion (tomcatfish) · 2022-01-08T06:06:15.598Z · LW(p) · GW(p)

This feels clearly wrong to me, so I think I've either misunderstood you or you are trusting honesty too much.

It might be easier to be honest than to signal correctly, but it can't be "almost strictly more effective."

To be honest and give the presentation of being "good", you have to do both of those things, which aren't perfectly correlated. To be dishonest and give the presentation of being "good", you only need to lie well. The skill "give the presentation of being 'good'" is in both sets; only the honest person also has to actually be "good". They're at a strict disadvantage.

In another sense, you can note that the dishonest person can be arbitrarily dishonest, meaning that they should be capable of any signal the honest person is, if they have enough resources. In that sense, it may not be cost effective to pretend to have a time machine, but it's easier than having a time machine.

Also, it may be that all other "actually good" people reliably signal. In that sense, you are signaling goodness worse than a liar.

I think these are 3 good reasons to suspect that this post isn't pointing at something as solid as it claims.

comment by Gordon Seidoh Worley (gworley) · 2022-01-08T04:54:31.575Z · LW(p) · GW(p)

I think a lot of the reason people are desperate to signal is because they are desperate: they need something to happen to feel safe, secure, fulfilled, or whatever and so they greedily grasp for it. Not doing that requires being sufficiently content with the world being as it is to not try to force it to be a particular way, but getting to a place where one has that kind of security is quite hard.

comment by knite · 2022-01-12T21:48:24.143Z · LW(p) · GW(p)

TLDR: Honesty is the best policy, and don't be a try-hard.

comment by Richard_Ngo (ricraz) · 2022-01-06T19:38:20.182Z · LW(p) · GW(p)

My immediate reaction is that there's no strong distinction between "signalling" and, say, "politeness" or "friendliness", and so it'll be quite difficult to drop all effort to signal (and probably lead to a bunch of bad consequences, like hostility). But I'm pretty uncertain, and I like the considerations the post brings up.

Replies from: Valentine

↑ comment by Valentine · 2022-01-07T14:13:43.175Z · LW(p) · GW(p)

Yep, I'm pretty uncertain too.

I think that at least some politeness falls more under the category of language. Like, I'm in Mexico, and it's often helpful for me to switch to Spanish. I'm totally manipulating my signals there, but it seems… fine? Like I just don't see the Goodhart pressure appearing there at all. Saying "Gracias, ¡hasta luego!" instead of "Thank you, have a good day!" seems perfectly fine.

But some politeness very much does introduce Goodhart drift. "How dare you say that?! That's so rude!" This is a weird signal suppression system that introduces what some folks near Toronto coined as "untalkaboutability" (read as: "un-talk-about-ability").

Likewise with pretending to be friendly. Lots of shop owners here will call out to me as I pass saying something like "Hey! Hey there my friend! Tell me, where are you from?" The context makes it pretty obvious that they're being friendly to hook me into their shop. But the reason the hook works at all is because of the plausible deniability that that's their purpose. "Oh, don't be like that! I'm just being friendly!" This is weaponization of signals of friendliness, which is possible because of the Goodhart drift applied to those signals.

But yeah, I have a question around language here, and cultural standards. Like shaking hands in North America vs. bowing in Japan. This is actually a better edge case than is Spanish: It seems fine to recognize and act on the cultural difference…

…unless I switch because I'm trying to make others feel more comfortable. At that point I'm focusing on the signal in order to manipulate the other, which starts to introduce Goodhart drift. The fact that my intentions are good or that this is common doesn't save the signal from Goodhart's Demon.

Whereas if I can focus on grokking the cultural difference, and then set that entirely aside and do what I feel like doing… I think something like that naturally results in the politeness that matters.

Replies from: ricraz

↑ comment by Richard_Ngo (ricraz) · 2022-01-07T14:45:58.549Z · LW(p) · GW(p)

It seems fine to recognize and act on the cultural difference…unless I switch because I'm trying to make others feel more comfortable.

Isn't the main point of acting on cultural differences to make others feel more comfortable? Or to show that you're interested in/you care about their culture?

Also, another weird case I just thought of: one of the biggest functions of clothing is signalling. Probably most people should lean towards wearing what they feel like more, but having this as a general policy might be quite costly, because people judge a lot based on clothing.

I wonder if the underlying disagreement here is: you're saying not to do things which consciously seem like signalling. And you say things like:

I have a very clear intuition of it. I can feel it. I can notice cases where it happens and where it's not happening, and I can often mentally transform one into the other. I know a bunch of the inner work needed to do it.

But I don't believe this claim from you, because I think that a large proportion of signalling involves unconscious calculations or self-deception, and it takes a huge amount of work to make those explicit. So the category of "signalling" may, because of that, seem more pervasive and deeper-rooted to me than it does to you.

Replies from: ChristianKl, Valentine

↑ comment by ChristianKl · 2022-01-08T10:29:55.058Z · LW(p) · GW(p)

But I don't believe this claim from you, because I think that a large proportion of signalling involves unconscious calculations or self-deception, and it takes a huge amount of work to make those explicit.

Valentine is someone who spent a huge amount of work on that. He was CFAR's head of curriculum. Later, he spent a lot of time meditating. Valentine is not someone who speaks here from a place of not having put in the work.

↑ comment by Valentine · 2022-01-07T16:03:24.758Z · LW(p) · GW(p)

Isn't the main point of acting on cultural differences to make others feel more comfortable? Or to show that you're interested in/you care about their culture?

As viewed from the outside, yes.

I think navigating this truthfully feels different from that analysis on the inside though.

If I think "I'm going to make these people feel comfortable by matching their cultural norms", this can often create the opposite effect. I described the dynamics of this in the OP.

The reason those norms help put people at ease is because of what they imply (signal) about a certain quality of attention and compatibility you're bringing. If you just are attentive then that'll emerge naturally. No reason to think explicitly about the norms.

This is a little like noticing how all things about love and romance are ultimately about sex, but how thinking about it that way can actually jam their ability to function properly. This isn't to deny the centrality of evolutionary forces. It's noticing how thinking about those forces while inside them can create loops that bring in influences you may not want. Hence the "Just be yourself" advice.

Probably most people should lean towards wearing what they feel like more, but having this as a general policy might be quite costly, because people judge a lot based on clothing.

Yep. And if you focus your attention on other people's judgments this way, you totally summon Goodhart's Demon.

So which do you want? The risk of paying a social cost for a while, or the risk of floating along in Goodhart drift?

[…] I think that a large proportion of signalling involves unconscious calculations or self-deception, and it takes a huge amount of work to make those explicit. So the category of "signalling" may, because of that, seem more pervasive and deeper-rooted to me than it does to you.

That's not what's going on here.

I'm guessing you think I'm talking about actually in fact dropping all signaling.

That's definitely not what I mean. That doesn't make sense to me. It'd be on par with "Stop being affected by physics."

When I say "Drop attempts to signal", I'm describing the subjective experience of enacting this shift as I currently understand it.

I mean the thing where, when sitting across from someone on a first date, I can track the thoughts that are about "making a good impression" and either lean into them or sort of drop them. The first one structurally creates problems. The second is less likely to.

On the inside it feels like going in the direction of just not caring about what impressions I do or don't give her. Which is to say, on the inside it feels like dropping all attempts to signal.

But of course my body language and word choice and dress and so on will signal all kinds of things to her. I haven't actually dropped all signaling, or even subconscious attempts to signal.

It's just that by pointing this optimization force away from those signals, I can encourage them to reflect reality instead of the (possibly false) image of myself a part of me wants her to see.

And by holding such a policy in myself, the signals I end up sending will always systematically (at least in the limit) align with the truth of this transparency. Signaling non-deception by not deceiving. Focus — even subconscious — on signals just can't beat this strategy for fidelity of transmission best as I can tell.

Which is to say, the strategy of "Drop all attempts to signal" is a signaling strategy.

…at least in one analysis. Because thinking of it that way makes it harder to use, it helps to reframe it.

But my guess is that this resolves the difference in perspective here between you and me. Yes?

Replies from: jimmy

↑ comment by jimmy · 2022-01-07T20:04:17.102Z · LW(p) · GW(p)

>>…unless I switch because I'm trying to make others feel more comfortable.
>Isn't the main point of acting on cultural differences to make others feel more comfortable? Or to show that you're interested in/you care about their culture?

As viewed from the outside, yes.

I think a better way of saying it is "...unless I switch because I'm trying to avoid making people uncomfortable".

There are all sorts of instrumental goals like "making people feel comfortable" which can be valid to focus on in the moment, provided that it's context appropriate and appropriately delimited. For example, to hit your target with a rifle you might focus on bringing the cross hairs over the bullseye... unless your scope isn't sighted in, or you're far enough that there's windage and drop to account for, etc. If you're familiar enough with long range rifle shooting you'll factor in drop and "Kentucky windage" intuitively, yet at close range your mind is going to be only on bringing the cross hairs to the target, and that's fine. Similarly, you can absolutely aim to "make people comfortable", so long as you are aware of the limitations of this alignment and don't get stuck when it doesn't fit. So long as you're happy to provoke temporary discomfort when it's necessary AND so long as "make them comfortable" translates automatically to "be nonthreatening" AND "be nonthreatening" includes the self-awareness that if the other person looks afraid, your self-perception of "non-threatening" can't be trusted and you have to actually look inwards to address potential threats until either you find a problem to fix or else seeing you do so causes them to feel comforted and stop giving you that error signal... then you're fine.

The problem comes in when you try to "get away", because almost everything that succeeds at "getting away" from a particular stimulus fails to get towards the actual goal. As a physical analogy, it's hard to "push rope" because there's nothing constraining it to that particular away and it just buckles towards any of the easier ones. You actually can push on similarly flexible throttle cables, but only to the extent that they're tightly encased in stiff shrouds which restrict the directions of "away" that work. It's a fundamentally unstable thing, and if you try to "get away from them being uncomfortable", you have to be damn sure you're restraining the buckling mode of "get away from them showing discomfort, by hiding it instead", and any other potential buckling modes. That's not what you want anyway, so better just to pull towards the actual goal, as best as you can identify it. You don't actually know what this is, and trying to get towards an ill defined thing helps you notice when it's not defined enough, and that further focusing on what you want is needed.

Dacyn's comment and your distillation are relevant here: "would you rather self deceive or [have unacceptably bad thing happen]?"

To the extent that the consequence is actually unacceptably bad, and the hypothetical actually free of third options, then you gotta choose to not die. In everything else, it's an open question of whether you can afford enough slack to accept risking the bad thing, and whether you can find a non-self-deceptive option that runs the risk down low enough. This gets especially bad when people lack the concept that "discomfort can be necessary and good", because then you can't even run the calculation and have to always err on the side of "not taking risks" and never getting the second marshmallow. If this is the case, then the more sensitive you are (in the "instrumentation" sense, where "sensitive" is good), then the more pathological this becomes.

For example, if you're sensitive to disapproval from your hypothetical girlfriend's parents, you try to "stay polite" and "not be rude or disrespectful" which totally sound like good things and you can easily convince yourself that they are unalloyed good... except that "treating someone like they can't handle a little offense" is actually pretty disrespectful too, and being the kind of person who is afraid to say necessary things when they're "slightly uncomfortable" isn't how you take care of your girlfriend and isn't how you gain her parents' respect and approval. Crank this to 11, and what do you see happening?

In real life, there's no such thing as "you either have to self deceive or you die", but there are situations where figuring out how to stay honest and not die is beyond your ability, or "not worth the effort", or maybe just "not something you currently see how to do". Having your cake and eating it too is always better though, and these skills do generalize a good deal, so it's worth putting some work into holding yourself to "hard mode" and developing both the skills to make it economical and the mental fortitude to keep the option on the table.

I generally refer to this as being "security limited". If you're insecure and "need" approval of your girlfriend's parents, then you can't do things that risk not getting it, and you're running away from disapproval ("death"). If you're secure enough to take these risks, then you can remain faithful to your goals of being good for your girlfriend (even when it risks her parents' disapproval, at least in the short term), and also your goals of being properly recognized as such because those two goals are not fundamentally and unchangeably misaligned. It applies far beyond what most people would recognize as "insecurity driven", but it's actually the same damn thing, and without a good understanding of the pattern you're trying to match to (and a way of handling this information that doesn't make awareness costly), it often flies beneath detection.

comment by ChristianKl · 2022-01-08T09:30:53.530Z · LW(p) · GW(p)

One problem is that there are political environments where people care about their allies signaling the right things. If you are in one of those environments you become untrustworthy if you don't play the signaling game. This goes for moral mazes, party politics, and also various other social spheres.

When I was at Toastmasters one of the people was regularly giving workshops. After she had given a workshop, one of the people who paid her complained that the car she was driving was too cheap for the status that would be expected for her role. In that environment it's just not possible to easily opt out of the signaling games.

One interesting question is whether people can be successful within the EA ecosystem without engaging in signaling EA things.

comment by Rudi C (rudi-c) · 2022-01-08T01:30:32.648Z · LW(p) · GW(p)

This post is simplistic and vague. E.g., does the OP think dressing in dirty, shabby clothes (which, from a nonsignalling perspective, aren’t a negative in our environment) is not an obvious failure of marketing that leads to lost opportunities?

Anyhow, caring about signaling is a priority in most cultures that I have glimpsed, almost all businesses, and human nature. Against such a strong prior, the OP fails to provide any strong updates to the contrary.

comment by Dave Lindbergh (dave-lindbergh) · 2022-01-06T22:14:01.806Z · LW(p) · GW(p)

In competitive situations where there's lots of optimization experience this tends to be a good strategy. People have been selling and buying used cars for 100 years - all the tricks and counter-tricks have been worked out, and pretty much cancel each other out.

So by not making any special attempt to signal and just being honest, you save the costs of all that signaling. And the other party saves costs protecting themselves from the false signals. Putting you both ahead.

comment by MSRayne · 2022-07-27T19:02:24.839Z · LW(p) · GW(p)

I think this is very virtuous but also utterly useless. If "just tell the honest truth and don't signal" worked, Andrew Yang would be president. People prefer simple, pleasant lies to complicated, unpleasant truths. The only kind of person who benefits from this not-signalling is someone whose only goal is to be honest. If you want to actually do things, you have to compete in the mind manipulation war with everyone else.

comment by Dacyn · 2022-01-07T14:22:59.832Z · LW(p) · GW(p)

This is all true.

And yet.

Very rarely, but I would guess at least once in your life, you will be faced with a decision whose outcome is so important that all of this is stripped away, like rationalization. At which point you are faced with the decision: to signal, or not to signal. But it is not clear which choice corresponds to which outcome: does treating it as a signal correspond to signalling more, or to signalling less, as suggested by Dagon's comment?

Would you rather be trustworthy, or trusted?

The OP suggests that maybe we can have both, but what if that's not always the case? And what if the outcome you get, is the exact opposite of the outcome you choose?

I have no good solutions here; my stopgaps are "double-check important decisions through people you trust" and "cooperate with yourself, even if you seem like a selfish person".

Replies from: ckai, Valentine

↑ comment by ckai · 2022-01-08T16:52:58.809Z · LW(p) · GW(p)

If I understand the sort of thing you're talking about correctly, I like Miles Vorkosigan's solution (from Memory, by Lois McMaster Bujold):

"The one thing you can't trade for your heart's desire is your heart."

↑ comment by Valentine · 2022-01-07T16:14:43.052Z · LW(p) · GW(p)

A different frame on what I see as the same puzzle:

If faced with the choice, would you rather self-deceive, or die?

It sure looks like the sane choice is self-deception. You might be able to unwind that over time, whereas death is hard to recover from.

Sadly, this means you can be manipulated and confused via the right kind of threat, and it'll be harder and harder for you over time to notice these confusions.

You can even get so confused you don't actually recognize what is and isn't death — which means that malicious (to you) forces can have some sway over the process of your own self-deception.

It's a bit like the logic of "Don't negotiate with terrorists":

The more scenarios in which you can precommit to choosing death over self-deception, the less incentive any force will have to try to present you with such a choice, and thus the more reliably clear your thinking will be (at least on this axis).

It just means you sincerely have to be willing to choose to die.

Replies from: Dacyn

↑ comment by Dacyn · 2022-01-07T17:03:33.613Z · LW(p) · GW(p)

Hmm, I am trying to see if it is really the same puzzle?? The self-deception I see, since if you get the opposite of whatever you choose then it motivates you to self-deceive so that you'll choose the opposite of whatever you want to get. But then why is the alternative death? Ah well, maybe it'll make sense to me later.

comment by avturchin · 2022-01-07T10:14:04.412Z · LW(p) · GW(p)

What is rational behaviour for a rich person is signalling for a poor one. Imagine that a rich person chooses the best car for his needs and it is, say, a 50K car. A poor person who wants to look rich will lease the same car.
