Adversarial Priors: Not Paying People to Lie to You

post by eva_ · 2022-11-10T02:29:12.708Z · LW · GW · 9 comments


Reply to Desiderata for an Adversarial Prior [LW · GW]

 

Assertion: An Ideal Agent never pays people to lie to them.

This seems sensible; only a very foolish person would knowingly incentivise dishonesty in others. But what does it actually mean in practice?

  1. You can't use unverifiable information obtained from a single person or from a faction of possibly-conspiring people in any way that benefits that person or faction in the hypothetical where the information is false. Otherwise, they're incentivised to give you the unverifiable and false information to motivate you to do that, and so you'd be paying them to lie to you.
  2. You can't use any information, even verifiable information, obtained from a single person or from a faction of actually-conspiring people in any way that harms that person or faction in the hypothetical where it is true. Otherwise they'd just not tell you, and you'd be paying them to dishonestly shut up.

If everyone followed (2), you could freely go around saying the truth no matter what and expect no personal negative consequences. This would maximise public knowledge (your information can still be used to the benefit or detriment of other people), and people would be better off.

If everyone followed (1), lying would be pointless: even though everyone believes you, they'll never believe you in a way that corresponds to doing something that benefits you. This condition imposes much stranger outcomes.

This seems particularly terrible; the whole policy of "refusing to be exploitable regardless of prior probability" goes a step too far. It's the kind of logic that leads to saying "since the murderer won't confess, and I want the murderer to be executed, we'll just have to execute everyone to make sure he didn't benefit by not confessing". That's a lot of utility destroyed just to avoid ever paying people to lie to you.

If we consider the two piles of utility:

It would seem like there's some ideal resistance-to-exploitation threshold that minimises total expected utility lost. If someone is sufficiently unable to verify things themselves, the price of ever believing anyone is setting a threshold low enough to let them through, and accepting the expected exploitation you'll be exposed to as a result.

Naturally, other people can't help you pick this threshold except with arguments you can personally verify, because they're obviously incentivised to convince you to be more trusting so that you'll believe / be exploitable by them.
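The threshold idea can be made concrete with a toy model. This is my own illustrative sketch, not anything from the post: suppose claims arrive with a credibility score x uniform on [0, 1], assume P(claim is true | x) = x, believing a true claim gains +1, and believing a lie costs c. It ignores the liars' strategic response, which is the post's real concern, but it shows the basic trade-off between utility lost to disbelieved truths and utility lost to exploitation:

```python
def expected_utility(t, c, n=10_000):
    """Average utility per claim if you believe only claims with
    credibility above threshold t, where credibility x is uniform
    on [0, 1], P(true | x) = x, a believed truth pays +1 and a
    believed lie costs c."""
    xs = [(i + 0.5) / n for i in range(n)]  # midpoint grid over [0, 1]
    return sum(x - (1 - x) * c for x in xs if x > t) / n

def best_threshold(c):
    """Grid-search the threshold that minimises expected utility lost."""
    grid = [i / 1000 for i in range(1001)]
    return max(grid, key=lambda t: expected_utility(t, c))
```

In this model the grid search recovers the closed form t* = c / (1 + c): the costlier exploitation is, the higher the threshold, matching the intuition that less-verifiable environments demand more scepticism.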

9 comments

Comments sorted by top scores.

comment by Sinclair Chen (sinclair-chen) · 2023-07-14T21:35:35.877Z · LW(p) · GW(p)

Interesting. In practice I try (not that successfully) to not punish people who tell me the truth. It requires reframing insults and bad news in a positive light, which is hard.

I think in practice people don't really listen to most sales pitches, and pitches that involve something objective and verifiable do better.  Alex Hormozi talks about a method of putting metrics into advertising like "X% of our customers last month increased their revenue by Y%" - it's just literally true, can be checked, and cannot be copied by your competitors unless they are actually better than you.

comment by ZT5 · 2022-11-11T19:44:16.484Z · LW(p) · GW(p)

Interesting post, thank you!

So, just to be clear.

This post explores the hypothetical/assertion that "an ideal agent shouldn't incentivize other agents to lie to it, by believing their lies".

However, it turns out that the costs of being completely inexploitable are higher than the costs of being (at least somewhat) exploitable.

It all adds up to normality [? · GW]: there is a rational reason we, at least occasionally, believe what other people say. The assertion above has been disproven. 

Have I correctly understood what you are saying?

(I apologize if this conclusion is meant to be obvious; I found myself somewhat confused whether that indeed is the conclusion, so I would like to verify my understanding)

Replies from: eva_
comment by eva_ · 2022-11-12T02:56:45.458Z · LW(p) · GW(p)

Yes, that's the intended point, and probably a better way of phrasing it. I am concluding against the initial assertion, and claiming that it does make sense to trust people in some situations even though you're implementing a strategy that isn't completely immune to exploitation.

comment by ShowMeTheProbability · 2022-11-10T02:50:34.928Z · LW(p) · GW(p)

Assertion: An Ideal Agent never pays people to lie to them.

 

What if an agent has built a lie-detector and wants to test it out? I expect that's a circumstance where you want someone to lie to you consistently and on demand.

What's the core real-world situation you are trying to address here?

Replies from: gwern, eva_
comment by gwern · 2022-11-10T03:11:50.008Z · LW(p) · GW(p)

I can think instantly of at least two useful cases where a fully rational intelligent person fully informed of the situation and premeditating it, would nevertheless still want to pay people to lie to them; and not in any tendentious meaning of 'lie' ("you pay artists to lie to you!"), but full outright deception in causing you to believe false facts about them, which you will then always believe*: pentesting and security testing where they deceive you into thinking they're authorized personnel etc, and 'randomized response technique' survey techniques on dangerous questions where a fraction of respondents are directed to eg flip a coin & lie to you in their response so you have false beliefs about each subject but can form a truthful aggregate.

* the pen testers might tell you their real names in the debrief, but don't have to and might not bother since it doesn't matter and you have bigger fish to fry; the survey-takers obviously never will. In neither case do you necessarily ever find out the truth, nor do you need to in order to benefit from the lies.
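The survey technique gwern mentions can be sketched in a few lines. This is a generic forced-response variant, my own illustration rather than anything from the comment: each respondent flips a coin; on heads they answer truthfully, on tails they flip again and answer "yes" on heads, "no" on tails. Any individual "yes" may be a coin-forced lie, but since P(yes) = 0.5·π + 0.25 for true support rate π, the aggregate is recoverable:

```python
import random

def survey(supports_x, rng):
    """One respondent's answer under the forced-response protocol.
    supports_x: whether this respondent actually supports X."""
    if rng.random() < 0.5:       # coin 1: heads -> answer honestly
        return supports_x
    return rng.random() < 0.5    # coin 2: forced "yes" on heads, "no" on tails

def estimate_support(answers):
    """Invert P(yes) = 0.5 * pi + 0.25 to recover the support rate pi."""
    p_yes = sum(answers) / len(answers)
    return 2 * (p_yes - 0.25)

rng = random.Random(0)
# simulate 100,000 respondents, 30% of whom truly support X
answers = [survey(rng.random() < 0.3, rng) for _ in range(100_000)]
```

The surveyor ends up with false beliefs about individual respondents while `estimate_support` stays truthful in aggregate, which is exactly the point being made here.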

Replies from: eva_
comment by eva_ · 2022-11-10T04:05:37.199Z · LW(p) · GW(p)

I don't consider the randomized response technique lying; it's mutually understood that their answer means "either I support X or both coins came up heads" or "either I support Y or both coins came up tails". There's no deception because you're not forming a false belief and you both know the precise meaning of what is communicated.

I don't consider penetration testing lying; you know that penetration testers exist and have hired them. It's a permitted part of the cooperative system, in a way that actual scam artists aren't.

What's a word that means "antisocially deceive someone in a way that harms them and benefits you", such that everyone agrees it's a bad thing for people to be incentivised to do? I want to be using that word but don't know what it is.

Replies from: Gunnar_Zarncke
comment by Gunnar_Zarncke · 2022-11-12T12:26:10.529Z · LW(p) · GW(p)

"Deceive" sounds fine. I think the anti-social part is implied - in fact, I have trouble coming up with an example of pro-social deceiving. Well, maybe variants of the old "hiding Jews from Nazis" example.

comment by eva_ · 2022-11-10T03:01:38.707Z · LW(p) · GW(p)

Not sure what's unclear here? I mean that you'd generally prefer not to have incentive structures where you need true information from other people and they can benefit at your loss by giving you false information. Paying someone to lie to you means creating an incentive for them to actually deceive you, not merely giving them money to speak falsehoods.

Replies from: jacopo
comment by jacopo · 2022-11-12T11:51:09.083Z · LW(p) · GW(p)

They commented without reading the post I guess...