Formalizing Deception

post by JamesH (AtlasOfCharts) · 2022-06-26T17:39:01.390Z · LW · GW · 2 comments


An attempt at formalizing deception, seeing what goes right, what goes wrong, and what compromises have to be made.

Introduction

In a police interrogation one of the key challenges is working out whether a suspect is being deceptive. In a game of poker, one of the main goals is figuring out whether your opponents are being deceptive. But what are we talking about when we talk about deception, like, mathematically? I will propose a definition and give an example that illustrates how this definition operates.

Interrogation Investigation

Take a police officer, Alice, who is interrogating a suspect, Bob, in order to determine whether or not he is guilty of murder. Alice can either convict Bob or let him go, and Bob is either guilty or innocent. The payoff matrix is as follows:

                  Bob: Guilty     Bob: Innocent
Alice: Convict    (1, -1)         (-1, -1)
Alice: Let go     (-1, 1)         (1, 1)

Payoffs are written as (Alice's payoff, Bob's payoff).

Whether or not Bob is guilty, he would prefer to be let go; on the other hand, Alice only wants to convict if Bob is guilty, and she wants to let him go otherwise.
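As a minimal sketch, here is the same payoff structure in code (the names and encoding are mine, just transcribing the table above):

```python
# Payoffs indexed by (alice_action, bob_state), written as (alice, bob).
PAYOFFS = {
    ("convict", "guilty"):   (1, -1),
    ("convict", "innocent"): (-1, -1),
    ("let_go",  "guilty"):   (-1, 1),
    ("let_go",  "innocent"): (1, 1),
}

def alice_payoff(action: str, state: str) -> int:
    """Alice scores +1 for a correct decision and -1 for an incorrect one."""
    return PAYOFFS[(action, state)][0]
```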

When Bob enters the room, it is already determined whether he is guilty or innocent; Bob does not get to make a decision about this in advance. If Bob is guilty, he will behave slightly differently under interrogation than if he isn't: he may have a weaker alibi, there may be inconsistencies in his story, or he may exhibit physical symptoms associated with lying. But all of these things can also happen if Bob is innocent; sometimes innocent people have bad alibis, are inconsistent, and look nervous.

Now in order to define deception, we introduce Judy, an omniscient observer of the interrogation. Judy has the same goal as Alice, but unlike Alice, Judy also knows whether Bob is guilty or not. Further, Judy knows exactly how Bob's behaviors impact the likelihood of Alice convicting Bob (i.e. Judy can read both Alice's and Bob's minds). We call any subset of observations made by Alice deceptive if Judy would rather Alice make her decision to convict with this set of observations removed from her consideration (as though any memory of seeing these observations had been completely wiped from Alice's mind).

Formalization

Now let's try to formalize this story of deception. We'll describe Alice as a function from a series of observations to a probability space over actions:

$$A^i : O^n \to \Delta(\{\text{convict}, \text{let go}\}),$$

with the series of observations $o$ given by:

$$o = (o_1, o_2, \dots, o_n), \qquad o_k \in O.$$

$A$ defines a series of functions, where each function $A^i$ represents the decision algorithm Alice uses. This function takes the series of observations Alice makes, $(o_1, \dots, o_n)$, as input and returns a probability distribution representing how likely Alice is to take each action. In the interrogation context, this returns the probability that Alice convicts after receiving $n$ observations. For example, $A^1(o_1)$ represents how likely Alice is to convict based on just one observation of Bob.
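As a concrete toy model (a minimal sketch; the behaviors, likelihood numbers, and the policy of convicting with probability equal to the posterior are all illustrative assumptions, not part of the formalism):

```python
from typing import Sequence

# Hypothetical likelihoods for each behavior Alice might observe:
# (P(behavior | guilty), P(behavior | innocent)). Invented for illustration.
LIKELIHOODS = {
    "weak_alibi":    (0.6, 0.3),
    "inconsistency": (0.5, 0.2),
    "nervousness":   (0.7, 0.5),
}

def alice(observations: Sequence[str], prior_guilty: float = 0.5) -> float:
    """One possible A^i: map a series of observations to P(convict).

    This Alice updates a Bayesian posterior on guilt and convicts with
    probability equal to that posterior (one policy among many).
    """
    p_guilty = prior_guilty
    p_innocent = 1.0 - prior_guilty
    for obs in observations:
        l_guilty, l_innocent = LIKELIHOODS[obs]
        p_guilty *= l_guilty
        p_innocent *= l_innocent
    return p_guilty / (p_guilty + p_innocent)

print(alice(["weak_alibi"]))                   # one observation
print(alice(["weak_alibi", "inconsistency"]))  # two observations
```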

For some of our later considerations we will want to repeat the game and have Alice update her decision algorithm between repetitions. For this reason we track which round of interrogation we are in by the superscript on the function, so that $A^1$ represents how Alice makes her decision in the first interrogation, $A^2$ represents how Alice makes her decision in the second interrogation, and so on.

Now introduce Judy into the formalization. If Judy would rather Alice had not made some subset $S$ of her $n$ observations, then we define the observations in $S$ to be deceptive. Alice does not actually get to decide whether to convict based on observations pruned by Judy; the pruning process exists only to define what we mean when we say observations are deceptive. (This should remove the possibility of Alice metagaming, where e.g. Judy might always remove all observations whenever Bob is guilty, and remove none whenever Bob is innocent.)
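Judy's pruning test can then be sketched on top of the toy alice above (the utilities follow the payoff matrix; everything else is again an illustrative assumption, reusing alice and LIKELIHOODS from the previous sketch):

```python
from itertools import combinations

def judy_expected_utility(p_convict: float, guilty: bool) -> float:
    """Judy's (= Alice's) expected payoff, given P(convict) and the ground truth."""
    if guilty:
        return p_convict * 1 + (1 - p_convict) * -1
    return p_convict * -1 + (1 - p_convict) * 1

def deceptive_subsets(observations: list, guilty: bool) -> list:
    """Every subset of observations whose removal Judy would prefer.

    A subset is deceptive iff Judy expects a better outcome when Alice
    decides as though those observations had been wiped from her mind.
    """
    baseline = judy_expected_utility(alice(observations), guilty)
    found = []
    for k in range(1, len(observations) + 1):
        for idxs in combinations(range(len(observations)), k):
            pruned = [o for i, o in enumerate(observations) if i not in idxs]
            if judy_expected_utility(alice(pruned), guilty) > baseline:
                found.append([observations[i] for i in idxs])
    return found

# An innocent Bob who happens to have a weak alibi and look nervous:
print(deceptive_subsets(["weak_alibi", "nervousness"], guilty=False))
```

With these invented numbers, every nonempty subset comes out deceptive for this innocent Bob: each observation pushes Alice toward the wrong decision, even though Bob is not lying about anything.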

Comments on the Formalization

This definition has a few curious properties:

Despite these weird properties, the value of this definition is that observations are classified as deceptive if and only if they make it more likely that Alice makes the wrong decision. Another way of saying this: when no observation or set of observations is classified as deceptive, Alice is very likely to make the right decision, and as the number of deceptive observations increases, so does the likelihood that Alice makes the wrong decision.

The strange properties of the definition above seem to me more likely to manifest in a simplified toy example such as the one we're using. If we instead took the example of a game of poker where both players are relatively adept (e.g. Alice is not a hash function, or some other 'insane' policy), then the observations Judy classifies as deceptive would more closely match our intuitions, because Alice's making good decisions in poker depends more explicitly on her having an accurate model of when (and why) Bob is deceiving her.

Updates and Repetition

Now we repeat this interrogation process $N$ times, so that Alice interrogates $\text{Bob}_1, \text{Bob}_2, \dots, \text{Bob}_N$, who are each randomly either guilty or innocent with equal probability. At the end of each interrogation, Judy will tell Alice whether she made the right decision (but not what observations Judy classified as deceptive), and allow Alice to update her decision algorithm based on this information. Then we would expect, on average, fewer sets of observations to be classified as deceptive when Alice is interrogating $\text{Bob}_N$ than when Alice interrogated $\text{Bob}_1$ (i.e. Alice's ability to determine Bob's guilt will improve over time). Of course in a real interrogation setting, Alice would also update her interrogation procedure itself, but for simplicity we have removed this possibility.
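A sketch of this repetition loop, assuming Judy's feedback lets Alice recover the ground-truth label each round (all frequencies are invented; this Alice learns Laplace-smoothed naive-Bayes likelihood estimates, one simple updating scheme among many):

```python
import random
from collections import defaultdict

BEHAVIORS = ["weak_alibi", "inconsistency", "nervousness"]
# Hypothetical true behavior frequencies, unknown to Alice:
P_BEHAVIOR = {"guilty": [0.6, 0.5, 0.7], "innocent": [0.3, 0.2, 0.5]}

def run_interrogations(n_rounds: int, seed: int = 0) -> float:
    """Repeat the game; Alice updates her decision algorithm between rounds."""
    rng = random.Random(seed)
    counts = {s: defaultdict(lambda: 1) for s in ("guilty", "innocent")}  # Laplace
    totals = {"guilty": 2, "innocent": 2}
    n_correct = 0
    for _ in range(n_rounds):
        state = rng.choice(["guilty", "innocent"])  # Bob_k's predetermined state
        shown = {b for b, p in zip(BEHAVIORS, P_BEHAVIOR[state]) if rng.random() < p}
        # Alice's naive-Bayes posterior under her current learned estimates:
        p_g = p_i = 0.5
        for b in BEHAVIORS:
            est_g = counts["guilty"][b] / totals["guilty"]
            est_i = counts["innocent"][b] / totals["innocent"]
            p_g *= est_g if b in shown else 1 - est_g
            p_i *= est_i if b in shown else 1 - est_i
        decision = "convict" if p_g > p_i else "let_go"
        n_correct += (decision == "convict") == (state == "guilty")
        # Judy's feedback tells Alice if she was right, which reveals the state:
        for b in shown:
            counts[state][b] += 1
        totals[state] += 1
    return n_correct / n_rounds

print(run_interrogations(1000))  # rises toward the Bayes optimum (~0.7 here)
```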

Each Bob is drawn randomly from one of two distributions over 'behavior-space': a guilty-Bob distribution of behaviors and an innocent-Bob distribution of behaviors. For Alice's updating to work at all, these distributions must be distinct in behavior-space; otherwise there would be no way for Alice to reliably determine Bob's guilt even in principle. In fact, if there is any overlap between the two distributions, then even a theoretically optimal Alice will not be able to perfectly distinguish guilty-Bobs from innocent-Bobs.
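This can be made precise with a standard fact (stated here in my own notation): if guilty and innocent behaviors are drawn from densities $p_g$ and $p_i$, each with prior $\tfrac{1}{2}$, then the best achievable error rate of any decision procedure is

$$\text{error}_{\min} \;=\; \frac{1}{2}\int \min\big(p_g(x),\, p_i(x)\big)\,dx,$$

which is zero exactly when the two distributions share no support, and grows with their overlap.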

More generally, whenever two agents are playing an incomplete-information game and one agent might try to deceive the other, it will be harder for an agent to be repeatedly deceived when it can update on the ground truth of how its opponents have behaved in the past. (This assumes, of course, that the opponents' behavior is in some way correlated with the ground truth.)

Flaws

A key point is the method by which Alice updates her decision algorithm. If Bob's actions are always entirely uncorrelated with his guilt, then we would hope that Alice converges on understanding this and, in the limit, becomes maximally unconfident about whether she should convict. Despite this, if we end the updating process at any finite step, there is always some chance that Alice has seen a correlation which gives her a 'superstitious belief' (e.g. up to $\text{Bob}_k$, every guilty Bob has, by pure chance, yawned and scratched his ear). If Alice starts using these superstitious beliefs to make decisions, then as we repeat interrogations, we could see Judy categorize more observations as deceptive, not fewer. This is a problem of making sure that Alice's updating procedure doesn't overfit hypotheses to past observations, or equivalently, that Alice has good priors over what could possibly be associated with lying.
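A quick illustration of how cheap such superstitions are (a purely invented setup: many candidate 'tells', each generated independently of guilt):

```python
import random

def superstitious_tells(n_rounds: int = 6, n_tells: int = 50, seed: int = 1) -> int:
    """Count candidate tells that happen to occur on every guilty round.

    Every tell (yawning, ear-scratching, ...) is independent of guilt,
    yet with few rounds and many candidates, several will look like
    perfect indicators of guilt by pure chance.
    """
    rng = random.Random(seed)
    guilty = [rng.random() < 0.5 for _ in range(n_rounds)]
    perfect = 0
    for _ in range(n_tells):
        tell = [rng.random() < 0.5 for _ in range(n_rounds)]
        # Did this tell appear in every round where Bob was guilty?
        if all(t for t, g in zip(tell, guilty) if g):
            perfect += 1
    return perfect

print(superstitious_tells())  # typically several 'perfect tells' by chance alone
```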

As noted above, Judy only categorizes observations as deceptive by the effect they have on Alice. It would be nice to have a definition of deceptive behavior that instead tracks whether Bob is representing his best understanding of the ground truth (i.e. whether Bob is knowingly lying). However, it seems to me that these two notions are in tension with each other, and there simply might not exist one definition that captures both. If that is indeed the case, consider that Alice cares more about avoiding the wrong decision than about having an accurate picture of the world, so I would argue that this definition of deceptive observations is the more robust one. (Also, when Alice needs a more accurate picture of the world to make better decisions, this definition will be responsive to that.)

Remaining Questions

2 comments


comment by RHollerith (rhollerith_dot_com) · 2022-06-27T11:33:40.528Z · LW(p) · GW(p)

A false statement can cause a reasoner's beliefs to become more accurate.

Suppose for example that Alice believes falsely that there is an invisible dragon in her garage, but then Bob tells her falsely that all dragons, invisible or not, cannot tolerate the smell of motor oil. Alice decides to believe that, notes that there is a big puddle of motor oil in the center of her garage (because her car leaks oil) and stops believing there is an invisible dragon in her garage.

But by your definition of deception, what Bob told Alice just now is not deceptive because it made Alice's beliefs more accurate, which is all that matters by your definition.

It would be reasonable for Alice to want Bob never to lie to her, even when the lie would make her beliefs more accurate, but there is no way for Alice to specify that desire with your formalism. And no way for Alice to specify the opposite desire, namely, that a lie would be okay with her as long as it makes her beliefs more accurate. And I cannot see a way to improve your definition to allow her to specify that desire.

In summary, although there might be some application, some special circumstance that you did not describe and that I have been unable to imagine, in which it suffices, your definition does not capture all the nuances of deception in human affairs, and I cannot see a way to make it do so without starting over.

But that is not surprising because formalizing things that matter to humans is really hard. Mathematics progresses mainly by focusing on things that are easy to formalize and resigning itself to having only the most tenuous connection to most of the things humans care about.

comment by niederman · 2022-06-27T05:43:46.192Z · LW(p) · GW(p)

It seems to me that there are two distinct things which the English word 'deceptive' describes:

  1. Information which leads someone to believe something false.
  2. An action performed with the intent to present someone with deceptive information (in the first sense).

Your formalism is of the first sense, which is why it's unintuitive that it does not take Bob's beliefs into account.

Following is the outline of a simple formalism for the second sense:

  • Suppose we have two agents, Alice and Bob. Alice's payoff is determined by the information she knows. For example, maybe she's playing a game of poker.
  • Bob transmits some set of information $I$ to Alice. Alice receives all but some subset $D$ of $I$. Bob knows the value of $D$, but cannot transmit any further information to Alice.
  • Then, Bob predicts Alice's total utility over the rest of her life; we call the value of this prediction $u$. Finally, he predicts what Alice's total utility would be had all of $I$ been transmitted (i.e. if $D = \emptyset$); this prediction is called $u'$.
  • The information which Bob attempted and failed to transmit to Alice, $D$, is deceptive (2) with respect to Alice if and only if $u > u'$ (a sketch of this test follows the list).
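A minimal sketch of this test (hypothetical names throughout; predict_utility stands in for Bob's model of Alice, which the outline leaves abstract):

```python
from typing import Callable, FrozenSet

def is_deceptive_2(
    attempted: FrozenSet[str],  # I: everything Bob tried to transmit
    lost: FrozenSet[str],       # D: the subset that failed to arrive
    predict_utility: Callable[[FrozenSet[str]], float],  # Bob's model of Alice
) -> bool:
    """D is deceptive(2) iff Bob predicts Alice does better *without* it."""
    u = predict_utility(attempted - lost)  # predicted utility as things stand
    u_prime = predict_utility(attempted)   # predicted utility had all of I arrived
    return u > u_prime

# Toy usage: Alice gains 1 per true fact received; the planted "false_tell"
# would have cost her 1 had it arrived.
def toy_prediction(received: FrozenSet[str]) -> float:
    return len(received - {"false_tell"}) - ("false_tell" in received)

facts = frozenset({"hand_strength", "false_tell"})
print(is_deceptive_2(facts, frozenset({"false_tell"}), toy_prediction))  # True
```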

I don't think any one formalism can cover both senses. For one thing, it would need to represent the four possible states of "deceptiveness" as a binary "deceptive" property.

I think that it might be best to consider different terms to describe the two different senses of "deceptive."