Algorithms of Deception!

post by Zack_M_Davis · 2019-10-19T18:04:17.975Z · LW · GW · 7 comments


I want you to imagine a world consisting of a sequence of independent and identically distributed random variables X_1, X_2, X_3, ..., and two computer programs.

The first program is called Reporter. As input, it accepts a bunch of the random variables X_i. As output, it returns a list of sets whose elements belong to the domain of the X_i.

The second program is called Audience. As input, it accepts the output of Reporter. As output, it returns a probability distribution.

Suppose the X_i are drawn from the following distribution:

P(X=1) = 1/2,  P(X=2) = 1/4,  P(X=3) = 3/16,  P(X=4) = 1/16

We can model drawing a sample from this distribution using this function in the Python programming language:

import random

def x():
    r = random.random()
    if 0 <= r < 1/2:
        return 1
    elif 1/2 <= r < 3/4:
        return 2
    elif 3/4 <= r < 15/16:
        return 3
    else:
        return 4
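
As a quick sanity check (mine, not part of the original post), we can draw from x a large number of times and confirm that the empirical frequencies land near 1/2, 1/4, 3/16, and 1/16:

from collections import Counter

# Empirical frequencies from 100,000 draws; these should come out close to
# {1: 0.5, 2: 0.25, 3: 0.1875, 4: 0.0625}
counts = Counter(x() for _ in range(100000))
print({outcome: count/100000 for outcome, count in sorted(counts.items())})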

For compatibility, we can imagine that Reporter and Audience are also written in Python. This is just for demonstration in the blog post that I'm writing—the real Reporter and Audience (out there in the world I'm asking you to imagine) might be much more complicated programs written for some kind of alien computer the likes of which we have not yet dreamt! But I like Python, and for the moment, we can pretend.

So pretend that Audience looks like this (where the dictionary, or hashmap, that gets returned represents a probability distribution, with the keys being random-variable outcomes and the values being probabilities):

from collections import Counter

def audience(report):
    # Accumulate pseudo-counts, splitting one unit of weight evenly among
    # the possibilities in each reported sighting
    a = Counter()
    for sight in report:
        for possibility in sight:
            a[possibility] += 1/len(sight)
    # Estimate the distribution from the mode of a Dirichlet distribution
    # with these pseudo-counts as its parameters (see footnote 1)
    d = sum(a_j - 1 for a_j in a.values())
    return {x: (a_i - 1)/d for x, a_i in a.items()}
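
To see the mechanics on a tiny hand-made report (my example, not the post's), suppose Audience receives three unambiguous sightings of 1, one sighting of 2, and one ambiguous sighting that could have been either. The ambiguous sighting splits its unit of weight between the two possibilities, giving pseudo-counts of 3.5 and 1.5, which the Dirichlet-mode step maps to (3.5 - 1)/3 and (1.5 - 1)/3:

>>> audience([{1}, {1}, {1}, {2}, {1, 2}])
{1: 0.8333333333333334, 2: 0.16666666666666666}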

Let's consider multiple possibilities for the form that Reporter could take. A particularly simple implementation of Reporter (call it reporter_0) might look like this:

def reporter_0(xs):
    output = []
    for x in xs:
        output.append({x})
    return output

The pairing of audience and reporter_0 has a Very Interesting Property! When we call our Audience on the output of this Reporter, the probability distribution that Audience returns is very similar to the distribution that our random variables X_i were drawn from![1]

>>> audience(reporter_0([x() for _ in range(100000)]))
{1: 0.5003300528084493, 2: 0.2502900464074252, 3: 0.1873799807969275, 4: 0.062119939190270444}

# Compare to P(X) expressed as a Python dictionary—
>>> {1: 1/2, 2: 1/4, 3: 3/16, 4: 1/16}
{1: 0.5, 2: 0.25, 3: 0.1875, 4: 0.0625}

Weird, right?!

Of course, there are other possible implementations of Reporter. For example, this choice of Reporter (reporter_1) does not result in the Very Interesting Property—

def reporter_1(xs):
    output = []
    for _ in range(len(xs)):
        output.append({4})
    return output

It instead induces Audience to output a very different (and rather boring) distribution. It doesn't even matter how the X_i turned up; the result will always be the same:

>>> audience(reporter_1([x() for _ in range(100000)]))
{4: 1.0}

We could go on imagining other versions of Reporter, like this one (reporter_2)—

def reporter_2(xs):
    output = []
    for x in xs:
        if x == 4 or random.random() < 0.2:
            output.append({x})
        else:
            continue
    return output

While the distribution that reporter_2 makes Audience output isn't as boring as the one we saw for reporter_1, it still doesn't result in the Very Interesting Property of matching the distribution of the X_i. It comes closer than reporter_1 did—notice how the ratios of probabilities assigned to the first three outcomes are similar to those of the original distribution—but it's assigning way too much probability-mass to the outcome "4":

>>> audience(reporter_2([x() for _ in range(100000)]))
{1: 0.3971289947471831, 2: 0.20309555314968522, 3: 0.14860259032038173, 4: 0.2516540358474678}
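
A quick back-of-the-envelope check (my own, not from the post) explains the inflated "4": a 4 always makes it into the report, but the other outcomes only do so a fifth of the time, so 4s should make up (1/16) / (1/16 + 0.2 * 15/16) = 1/4 of what gets reported:

# Expected distribution of outcomes appearing in reporter_2's report,
# assuming P(X) as above and the 20% pass-through for non-4 outcomes
p = {1: 1/2, 2: 1/4, 3: 3/16, 4: 1/16}
pass_prob = {outcome: 1.0 if outcome == 4 else 0.2 for outcome in p}
total = sum(p[outcome] * pass_prob[outcome] for outcome in p)
expected = {outcome: p[outcome] * pass_prob[outcome] / total for outcome in p}
# expected is approximately {1: 0.4, 2: 0.2, 3: 0.15, 4: 0.25},
# matching the empirical output above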

So far, all of the Reporters we've imagined are still only putting one element in the inner sets of the list-of-sets that they return. But we could imagine reporter_3

def reporter_3(xs):
    output = []
    for x in xs:
        if x == 1 or x == 4:
            output.append({1, 4})
        else:
            output.append({x})
    return output

Unlike reporter_2 (which typically returned a list with fewer elements than it received as input), the list returned by reporter_3 has exactly as many elements as the list it took in. Yet this Reporter still prompts Audience to return a distribution with too many "4"s—and unlike reporter_2, it doesn't even get the ratio of the other outcomes right, yielding disproportionately fewer "1"s compared to "2"s and "3"s than the original distribution—

>>> audience(reporter_3([x() for _ in range(100000)]))
{1: 0.2808949431909106, 2: 0.24795967354776766, 3: 0.19037045927348376, 4: 0.2808949431909106}
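
The same kind of back-of-the-envelope check (again mine, not the post's) explains these numbers: every 1 and every 4 arrives as {1, 4} and contributes half a pseudo-count to each of those outcomes, so both end up with (1/2 + 1/16)/2 = 9/32 of the total weight, while 2 and 3 keep their true shares of 8/32 and 6/32:

# Expected long-run share of Audience's pseudo-counts under reporter_3,
# assuming P(X) as above
p = {1: 1/2, 2: 1/4, 3: 3/16, 4: 1/16}
shares = {
    1: (p[1] + p[4]) / 2,  # half of every {1, 4} sighting
    2: p[2],
    3: p[3],
    4: (p[1] + p[4]) / 2,  # the other half
}
# shares is {1: 0.28125, 2: 0.25, 3: 0.1875, 4: 0.28125},
# matching the empirical output above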

Again, I've presented Audience and various possible Reporters as simple Python programs for illustration and simplicity, but the same input-output relationships could be embodied as part of a more complicated system—perhaps an entire conscious mind which could talk.

So now imagine our Audience as a person with her own hopes and fears and ambitions ... ambitions whose ultimate fulfillment will require dedication, bravery—and meticulously careful planning based on an accurate estimate of P(X), with almost no room for error.

So, too, imagine each of our possible Reporters as a person: loyal, responsible—and, entirely coincidentally, the supplier of a good that Audience's careful plans call for in proportion to the value of P(X=4).

When the expected frequency of "4"s fails to appear, Audience's lifework is in ruins. All of her training, all of her carefully calibrated plans, all the interminable hours of hard labor, were for nothing. She confronts Reporter in a furor of rage and grief.

"You lied," she says through tears of betrayal, "I trusted you and you lied to me!"

The Reporter whose behavior corresponds to reporter_2 replies, "How dare you accuse me of lying?! Sure, I'm not a perfect program free from all bias, but everything I said was true—every outcome I reported corresponded to one of the X_i. You can't call that misleading! [LW · GW]"

He is perfectly sincere. Nothing in his consciousness reflects intent to deceive Audience, any more than an eight-line Python program could be said to have such "intent." (Does a for loop "intend" anything? Does a conditional "care"? Of course not!)

The Reporter whose behavior corresponds to reporter_3 replies, "Lying?! I told you the truth, the whole truth, and nothing but the truth: everything I saw, I reported. When I said an outcome was a oneorfour, it actually was a oneorfour. Perhaps you have a different category system, such that what I think of as a 'oneorfour', appears to you to be any of several completely different outcomes, which you think my 'oneorfour' concept is conflating. If those outcomes had wildly different probabilities, if one was much more common than fou—I mean, than the other—then you'd have no way of knowing that from my report. But using language in a way you dislike, is not lying. I can define a word any way I want! [LW · GW]"

He, too, is perfectly sincere.

Commentary

Much has been written on this website about reducing mental notions of "truth", "evidence", &c. to the nonmental [LW · GW]. One need not grapple with tendentious mysteries [LW · GW] of "mind" or "consciousness", when so much more can be accomplished by considering systematic cause-and-effect processes that result in [LW · GW] the states of one physical system becoming correlated with the states of another—a "map" that reflects a "territory."

The same methodology that was essential for studying truthseeking, is equally essential for studying the propagation of falsehood. If true "beliefs" are models that make accurate predictions [LW · GW], then deception would presumably be communication that systematically results in less accurate predictions (by a listener applying the same inference algorithms that would result in more accurate predictions when applied to direct observations or "honest" reports).
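
To make "systematically less accurate predictions" concrete, here is a small sketch of my own (not part of the original post) that scores each Reporter by the Kullback-Leibler divergence from the true P(X) to the distribution it induces in Audience. Honest reporting should score near zero; the deceptive Reporters should score worse:

import math

def kl_divergence(p, q):
    # D_KL(p || q): the expected extra log-loss from predicting with q
    # when outcomes are really drawn from p
    return sum(p[o] * math.log(p[o] / q[o]) for o in p if p[o] > 0)

true_p = {1: 1/2, 2: 1/4, 3: 3/16, 4: 1/16}
for reporter in (reporter_0, reporter_1, reporter_2, reporter_3):
    beliefs = audience(reporter([x() for _ in range(100000)]))
    # Outcomes a report never mentions get no probability mass at all;
    # substitute a tiny epsilon so the divergence stays finite
    q = {outcome: beliefs.get(outcome, 1e-9) for outcome in true_p}
    print(reporter.__name__, kl_divergence(true_p, q))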

In a peaceful world where most falsehood was due to random mistakes, there would be little to be gained by studying processes that systematically create erroneous maps. In a world of conflict, where there are forces trying to slash your tires [LW · GW], one would do well to study these—algorithms of deception!


  1. But only "very" similar: the code for audience is not the mathematically correct thing to do in this situation; it's just an approximation that ought to be good enough for the point I'm trying to make in this blog post, for which I'm trying to keep the code simple. (Specifically, the last two lines of audience are based on the mode of the Dirichlet distribution, but, firstly, that part about increasing the hyperparameters fractionally when you're uncertain about what was observed (a[possibility] += 1/len(sight)) is pretty dodgy, and secondly, if you were actually going to try to predict an outcome drawn from a categorical distribution like X using the Dirichlet distribution as a conjugate prior, you'd need to integrate over the Dirichlet hyperparameters; you shouldn't just pretend that the mode/peak represents the true parameters of the categorical distribution—but as I said, we are just pretending.) ↩︎
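
For what it's worth, here is a minimal sketch (my own, not the post's code) of what that more careful approach might look like for predicting a single future observation: with a uniform Dirichlet(1, 1, 1, 1) prior, integrating over the hyperparameters gives the posterior mean rather than the mode, which amounts to adding one to each count before normalizing (the fractional-count trick for ambiguous sightings is kept, dodginess and all).

from collections import Counter

def audience_predictive(report, outcomes=(1, 2, 3, 4)):
    # Sketch (not the post's code): posterior-predictive probabilities for a
    # Dirichlet-categorical model. Start each outcome at 1 (a uniform
    # Dirichlet(1, 1, 1, 1) prior), add the (still fractional, still dodgy)
    # observed counts, and normalize. This averages over the posterior on the
    # category probabilities instead of plugging in its mode.
    a = Counter({outcome: 1.0 for outcome in outcomes})
    for sight in report:
        for possibility in sight:
            a[possibility] += 1/len(sight)
    total = sum(a.values())
    return {x: a_i/total for x, a_i in a.items()}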

7 comments


comment by Pongo · 2020-05-06T05:38:20.476Z · LW(p) · GW(p)
If true "beliefs" are models that make accurate predictions [LW · GW], then deception would presumably be communication that systematically results in less accurate predictions (by a listener applying the same inference algorithms that would result in more accurate predictions when applied to direct observations or "honest" reports).

This helped me clarify that these algorithms of deception are not just adversarially attempting to deceive, but in fact adversarially crafted for one's belief-forming mechanisms.

comment by romeostevensit · 2019-10-20T02:11:39.629Z · LW(p) · GW(p)

I might summarize the hansonian/elephant in the brain position thusly: sincerity is selected for.

comment by Isnasene · 2019-10-20T07:42:04.049Z · LW(p) · GW(p)

This is the kind of issue that being really specific [LW · GW] can be used to resolve. If the Reporters were specific about what they were trying to do in their reporting, the Audience would not have used their reports. If this is unfeasible, the Audience could still specifically ask the Reporters to clarify whether they think she will be able to figure out the specific things she is trying to figure out using their reports. If they mislead her here, Audience can credibly claim that either she was lied to or that the reporters are not competent enough to be relied on for information. Pragmatically, both these possibilities imply a course-of-action that involves not using the reporters as reporters.

To illustrate in detail:

So, too, imagine each of our possible Reporters as a person: loyal, responsible—and, entirely coincidentally, the supplier of a good that Audience's careful plans call for in proportion to the value of P(X=4).

In this context, the audience has an explicit goal of estimating how frequently X=4 occurs. Keeping this in mind, the problem the audience is having can be understood in the context of interacting with reporters with misaligned goals.

Sure, I'm not a perfect program free from all bias, but everything I said was true—every outcome I reported corresponded to one of the X_i. You can't call that misleading! [LW · GW]"

Reporter 2's goal seems to be "report values that correspond to one of the X_i." This is misaligned with a goal associated with estimating the distribution of X.

I told you the truth, the whole truth, and nothing but the truth: everything I saw, I reported. When I said an outcome was a oneorfour, it actually was a oneorfour. Perhaps you have a different category system, such that what I think of as a 'oneorfour', appears to you to be any of several completely different outcomes, which you think my 'oneorfour' concept is conflating.

Reporter 3's goal seems to be "report values but conflate ones and fours into a 'oneorfour' category" (in the actual programming, this is a misrepresentation of course. Reporting a one or a four as a one and a four isn't the same thing). This is misaligned with the goal of figuring out how frequent fours specifically are.

If Reporter 2 or Reporter 3 had explicitly specified what they were trying to do, the Audience wouldn't have used them as an information source.

If Reporter 2 had told Audience that he believed she could determine probabilities just from his reporting, he would be lying, or too incompetent to trust.

If Reporter 3 had told Audience that he believed she could distinguish the probability of fours separate from the probabilities of ones, he would be lying, or too incompetent to trust.

comment by Matt Goldenberg (mr-hire) · 2019-10-20T01:02:54.274Z · LW(p) · GW(p)

This is interesting! The straightforward research program here seems to just be to study heuristics and biases, yes? I'm curious if you're going in a different direction.

comment by Pattern · 2019-10-20T04:30:51.517Z · LW(p) · GW(p)
very similar

The same for 4 significant digits.

[a] perfectly sincere.
[b] In a peaceful world where most falsehood was due to random mistakes, there would be little to be gained by studying processes that systematically create erroneous maps.

Systematic error is conflated with conflict (in b), following sections (in the vicinity of a) which claim error is not conscious. Even if I accept b,

[c] In a world of conflict, where there are forces trying to slash your tires [LW · GW], one would do well to study these—algorithms of deception!

why should c follow? Why not tune out what can't be verified, or isn't worth verifying? (I say this as someone who intends to vote on this post only after running the code.)

comment by Isnasene · 2019-10-20T07:51:11.152Z · LW(p) · GW(p)
why should c follow? Why not tune out what can't be verified, or isn't worth verifying? (I say this as someone who intends to vote on this post only after running the code.)

Because of things like selective reporting [LW · GW], the vast majority of information reported to us tends to be misleading, but not without some level of useful information (and sometimes a lot of useful information). Instead of tuning out information or spending exhaustive amounts of time attempting to verify it, a faster solution is often to figure out ways to adjust the (inaccurate) information for deception to get only the useful information and a better understanding of uncertainty within it.

Of course, if there is a 100% un-deceptive source for a given piece of information, there's not much value in trying to use deceptive sources for that same piece of information (unless the former source is much more expensive than the latter).

comment by artifex · 2019-10-19T22:00:21.113Z · LW(p) · GW(p)

Category gerrymandering doesn’t seem like a different algorithm from selective reporting. In both cases, the reporter is providing only part of the evidence.