How to Throw Away Information in Causal DAGs

post by johnswentworth · 2020-01-08T02:40:05.489Z · LW · GW · 2 comments

When constructing a high-level abstract causal DAG from a low-level DAG, one operation which comes up quite often is throwing away information from a node. This post is about how to do that.

First, how do we throw away information from random variables in general? Sparknotes:

For more explanation of this, see Probability as Minimal Map.
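
As a concrete illustration of the general idea, the sketch below checks numerically whether a function f of a random variable X keeps everything X says about another variable Y, using the criterion P[Y | f(X)] = P[Y | X]. The toy joint distribution, the helper names (`conditional`, `is_sufficient`), and the two candidate functions are all assumptions made up for this example.

```python
from collections import defaultdict
from fractions import Fraction

def conditional(joint, key):
    """Return P[Y | key(X)] as {key_value: {y: prob}} from a joint {(x, y): prob}."""
    totals = defaultdict(Fraction)
    table = defaultdict(lambda: defaultdict(Fraction))
    for (x, y), p in joint.items():
        k = key(x)
        totals[k] += p
        table[k][y] += p
    return {k: {y: p / totals[k] for y, p in ys.items()} for k, ys in table.items()}

def is_sufficient(joint, f):
    """True if P[Y | f(X)] matches P[Y | X = x] for every x in the joint."""
    given_x = conditional(joint, lambda x: x)
    given_fx = conditional(joint, f)
    return all(given_x[x] == given_fx[f(x)] for x in given_x)

# Hypothetical toy model: X is uniform on {-1, 0, 1}, and Y is a noisy
# indicator of "X is nonzero" (correct with probability 3/4).
joint = {}
for x in (-1, 0, 1):
    p_y1 = Fraction(3, 4) if x != 0 else Fraction(1, 4)
    joint[(x, 1)] = Fraction(1, 3) * p_y1
    joint[(x, 0)] = Fraction(1, 3) * (1 - p_y1)

square = lambda x: x * x      # merges -1 and 1, which Y cannot tell apart
positive = lambda x: x > 0    # merges -1 and 0, which Y can tell apart

print(is_sufficient(joint, square))    # True
print(is_sufficient(joint, positive))  # False
```

Squaring throws away the sign of X, which Y never depended on, so nothing relevant is lost; the `positive` function lumps together two values of X that imply different distributions over Y, so it discards relevant information.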

For our purposes, starting from a low-level causal DAG, we want to:

Here $\bar{S}$ denotes all the node indices outside $S$. (Specifying $S$ rather than $\bar{S}$ directly will usually be easier in practice, since $S$ is usually a small neighborhood of nodes around the chosen node $X_i$.) In English: we want to throw away information from $X_i$, while retaining all information relevant to nodes outside the set $S$.
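
Written out, the condition described in this paragraph can be stated as follows. The symbols are notation assumed for this sketch: $X_i$ is the node being coarsened, $S$ is the chosen set of node indices (containing $i$), $\bar{S}$ is the set of all indices outside $S$, and $f$ is the function that discards information.

```latex
P\left[X_{\bar{S}} \mid f(X_i)\right] = P\left[X_{\bar{S}} \mid X_i\right]
```

That is, conditioning on $f(X_i)$ should give the same distribution over the far-away nodes as conditioning on $X_i$ itself.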

Two prototypical examples:

In both examples, we’re throwing out “local” information, while maintaining any information which is relevant “globally”. This will mean that local queries - e.g. the voltage in one wire given the voltage in a neighboring wire at the same time - are not supported; short-range correlations violate the abstraction. However, large-scale queries - e.g. the voltage in a wire now given the voltage in a wire a few seconds ago - are supported.

Modifying Children

We still have one conceptual question to address: when we replace $X_i$ by $f(X_i)$, how do we modify the child nodes of $X_i$ to use $f(X_i)$ instead?

The first and most important answer is: it doesn’t matter, so long as whatever they do is consistent with $f(X_i)$. For instance, suppose $X_i$ ranges over {-1, 0, 1}, and $f(X_i) = X_i^2$. When $f(X_i) = 1$, the children can act as though $X_i$ were -1 or 1 - it doesn’t matter which, so long as they don’t act like $X_i = 0$. As long as the children’s behavior is consistent with the information in $f(X_i)$, we will be able to support long-range queries.

There is one big catch, however: the children do need to all behave as if $X_i$ had the same value, whatever value they choose. The joint distribution $P[X_{ch(i)} \mid X_{sp(i)}, f(X_i)]$ (where $ch(i)$ = children of $i$ and $sp(i)$ = spouses of $i$) must be equal to $P[X_{ch(i)} \mid X_{sp(i)}, X_i = x_i^*]$ for some value $x_i^*$ consistent with $f(X_i)$. The simplest way to achieve this is to pick a particular “representative” value $x_i^*(f^*)$ for each possible value $f^*$ of $f(X_i)$, so that $f(x_i^*(f^*)) = f^*$.

Example: in the digital circuit case, we would pick one representative “high” voltage (for instance the supply voltage) and one representative “low” voltage (for instance the ground voltage). Composed with the binarization, the representative value function would then map any high voltage to the supply voltage and any low voltage to the ground voltage.

Once we have our representative value function $x_i^*$, we just have the children use $x_i^*(f(X_i))$ in place of $X_i$.
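
A minimal sketch of this recipe for the digital circuit example, assuming a 3.3 V supply, a 0 V ground, and a simple midpoint threshold for binarization. The constants, the function names (`f`, `x_star`, `child`, `abstract_child`), and the toy inverter used as a child node are all hypothetical, chosen only to illustrate the pattern.

```python
V_SUPPLY = 3.3   # assumed supply voltage in volts (illustrative only)
V_GROUND = 0.0   # assumed ground voltage in volts
THRESHOLD = (V_SUPPLY + V_GROUND) / 2  # assumed logic threshold

def f(voltage):
    """Throw away information: keep only whether the voltage is high or low."""
    return "high" if voltage >= THRESHOLD else "low"

def x_star(f_value):
    """Representative voltage for each possible value of f."""
    return V_SUPPLY if f_value == "high" else V_GROUND

def child(parent_voltage):
    """Hypothetical child node: an idealized inverter reading its parent's voltage."""
    return V_GROUND if parent_voltage >= THRESHOLD else V_SUPPLY

def abstract_child(f_value):
    """The same child, fed the representative value instead of the raw voltage."""
    return child(x_star(f_value))

# Representatives must be consistent with the information kept: f(x_star(v)) == v.
for value in ("high", "low"):
    assert f(x_star(value)) == value

# For this idealized inverter, the child's output is unchanged by the swap.
for raw in (0.1, 0.4, 2.9, 3.2):
    assert abstract_child(f(raw)) == child(raw)
```

The child's code is left untouched; only its input changes. In a more realistic analog model the child's exact output would generally shift a little after the swap, which is acceptable as long as the information relevant to far-away nodes is preserved.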

If we want, we could even simplify one step further: we could just choose $f$ to spit out representative values directly. That convention is cleaner for proofs and algorithms, but a bit more confusing for human usage and examples.

2 comments

comment by philip_b (crabman) · 2020-01-08T11:03:49.611Z · LW(p) · GW(p)

Instead of saying "$f(X)$ contains all information in $X$ relevant to $Y$", it would be better to say that $f(X)$ contains all information in $X$ that is relevant to $Y$ if you don't condition on anything. This is because, if you condition on some additional random variable $Z$, it may be the case that $f(X)$ no longer contains all relevant information.

Example:

Let be i.i.d. binary uniform random variables, i.e. each of the variables takes the value 0 with probability 0.5 and the value 1 with probability 0.5. Let be a random variable. Let be another random variable, where is the xor operation. Let be the function .

Then $f(X)$ contains all information in $X$ that is relevant to $Y$. But if we know the value of $Z$, then $f(X)$ no longer contains all information in $X$ that is relevant to $Y$.
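
The effect described here can be checked by brute-force enumeration. The construction below is one that fits the description and is an assumption, not necessarily the exact variables used in this comment: $X_1, X_2, Z$ are i.i.d. uniform bits, $X = (X_1, X_2)$, $Y = (X_1, X_2 \oplus Z)$, and $f(X) = X_1$. The helper names (`cond_dist`, `keeps_all_info`) are likewise hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def cond_dist(samples, target, given):
    """P[target | given] over equally likely samples: {given_value: {target_value: prob}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in samples:
        counts[given(s)][target(s)] += 1
    return {g: {t: Fraction(n, sum(ts.values())) for t, n in ts.items()}
            for g, ts in counts.items()}

def keeps_all_info(samples, target, full, reduced):
    """True if conditioning on reduced(s) gives the same P[target | .] as full(s)."""
    by_full = cond_dist(samples, target, full)
    by_reduced = cond_dist(samples, target, reduced)
    return all(by_full[full(s)] == by_reduced[reduced(s)] for s in samples)

# Assumed construction: each sample is (x1, x2, z), all eight equally likely.
samples = list(product((0, 1), repeat=3))
Y = lambda s: (s[0], s[1] ^ s[2])   # Y = (X1, X2 xor Z)
X = lambda s: (s[0], s[1])          # X = (X1, X2)
fX = lambda s: s[0]                 # f(X) = X1
Z = lambda s: s[2]

# Without conditioning on Z: does f(X) say as much about Y as X does?
unconditional = keeps_all_info(samples, Y, X, fX)

# With Z known: compare conditioning on (X, Z) against (f(X), Z).
conditional_on_z = keeps_all_info(
    samples, Y, lambda s: (X(s), Z(s)), lambda s: (fX(s), Z(s)))

print(unconditional, conditional_on_z)  # True False
```

Without $Z$, the second component of $Y$ is a fair coin independent of $X$, so only $X_1$ matters. Once $Z$ is known, $X_2$ pins down that component exactly, and $f(X) = X_1$ no longer suffices.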

comment by johnswentworth · 2020-01-08T17:58:54.932Z · LW(p) · GW(p)

Good point, thanks.