[LDSL#4] Root cause analysis versus effect size estimation

post by tailcalled · 2024-08-11T16:12:14.604Z · LW · GW · 0 comments

Contents

  Is root cause analysis a special case of effect size estimation?
  Heuristics for direct root cause analysis

Followup to: Information-orientation is in tension with magnitude-orientation. This post is also available on my Substack.

In the conventional theory of causal inference, such as Rubin’s potential outcomes model or Pearl’s DAG approach, causality is modelled as a relationship of functional determination, X := f(Y). The question of interest then becomes the properties of f, especially how f differs across different values of Y. I would call this “effect size estimation”, because the goal is to quantify the magnitude of the effect of one variable on another.

But as I mentioned in my post on conundrums, people seem to have some intuitions about causality that don’t fit well into effect size estimation, most notably in wanting “the” cause of some outcome when really there’s often thought to be complex polycausality.

Linear diffusion of sparse lognormals provides an answer: an outcome is typically a mixture of many different variables, X := Σᵢ Yᵢ, and one may desire an account which describes how the outcome breaks down into these variables to better understand what is going on. This is “root cause analysis”, and it yields one or a small number of factors because most of the variables tend to be negligible in magnitude. (If the root cause analysis yields a large number of factors, that is evidence that the RCA was framed poorly.)
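
As a rough illustration of such a breakdown, the sketch below draws hypothetical contributions from a wide lognormal distribution and reports each contribution's share of the total. The count of 1000, the spread parameter 3.0, and the names decompose and contributions are all assumptions made for the example, not taken from any real system.

```python
import random


def decompose(contributions):
    """Return (name, value, share of total) tuples, largest contribution first."""
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, value, value / total) for name, value in ranked]


# Hypothetical data: 1000 contributions drawn from a wide lognormal.
rng = random.Random(0)
contributions = {f"Y{i}": rng.lognormvariate(0.0, 3.0) for i in range(1000)}

breakdown = decompose(contributions)
for name, value, share in breakdown[:5]:
    print(f"{name}: {value:.1f} ({share:.1%} of X)")
```

With a spread this wide, the first few entries typically account for a large share of X while most of the rest are negligible, which is the pattern root cause analysis relies on.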

Is root cause analysis a special case of effect size estimation?

If you know X, Y, and f, then it seems you can do root cause analysis automatically by setting each of the Y’s to zero, seeing how it influences X, and then reporting the Y’s in descending order of influence. Thus, root cause analysis ought to be a special case of effect size estimation, right?
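
The procedure described above can be sketched as follows, assuming f is available as an ordinary function of a dictionary of inputs. The names rank_by_ablation and total, and the example values, are hypothetical.

```python
def rank_by_ablation(f, inputs):
    """Zero out each input in turn and rank inputs by how much f changes."""
    baseline = f(inputs)
    influence = {}
    for name in inputs:
        ablated = {**inputs, name: 0.0}  # copy with one input set to zero
        influence[name] = baseline - f(ablated)
    return sorted(influence.items(), key=lambda kv: abs(kv[1]), reverse=True)


def total(inputs):
    """Hypothetical f: the outcome is simply the sum of its inputs."""
    return sum(inputs.values())


inputs = {"Y1": 0.3, "Y2": 120.0, "Y3": 2.5, "Y4": 0.01}
ranking = rank_by_ablation(total, inputs)
for name, change in ranking:
    print(f"{name}: {change:.2f}")
```

Note that this only works when X, Y, and f are all known exactly, which is the assumption the rest of this section questions.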

There are two big flaws with this view:

You can try to use statistical effect size estimation for root cause analysis. However, doing so creates an exponentially strong bias in favor of common things over important things, so it’s unlikely to work unless you can somehow absorb all the information in the system.

Heuristics for direct root cause analysis

I don’t think I have a complete theory of root cause analysis yet, but I know of some general heuristics for root cause analysis which don’t require comprehensive effect size estimation.

These both require a special sort of data, which I like to think of as “accounting data”. It differs from statistical data in that it needs to be especially comprehensive and quantitative. It would often be hard to perform this type of inference using a small random sample of the system, at least unless the root cause affects the system extraordinarily broadly.
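
A back-of-the-envelope calculation illustrates the sampling problem. Assuming, purely for illustration, that the root cause is visible in a fraction p of records and that records are sampled independently, the chance that a sample of n records contains it at all is 1 - (1 - p)^n. The function name and the example numbers below are hypothetical.

```python
def prob_sample_contains(p, n):
    """Probability that n independent draws include at least one record
    showing a cause that appears in a fraction p of all records."""
    return 1.0 - (1.0 - p) ** n


# A cause visible in 0.1% of records, looked for in a sample of 100 records.
print(f"{prob_sample_contains(0.001, 100):.3f}")  # about 0.095
```

Under these assumptions, a narrow root cause is usually absent from a small sample altogether, whereas a cause that touches most records would show up almost immediately.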
