[LINK] If correlation doesn’t imply causation, then what does?

post by Strilanc · 2013-07-12T05:39:02.814Z · score: 4 (13 votes) · LW · GW · Legacy · 24 comments

A post about how, for some causal models, causal relationships can be inferred without doing experiments that control one of the random variables.

If correlation doesn’t imply causation, then what does?

To help address problems like the two example problems just discussed, Pearl introduced a causal calculus. In the remainder of this post, I will explain the rules of the causal calculus, and use them to analyse the smoking-cancer connection. We’ll see that even without doing a randomized controlled experiment it’s possible (with the aid of some reasonable assumptions) to infer what the outcome of a randomized controlled experiment would have been, using only relatively easily accessible experimental data, data that doesn’t require experimental intervention to force people to smoke or not, but which can be obtained from purely observational studies.
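For reference, the kind of identity the linked post works toward can be written compactly. This is only a sketch, and it assumes Pearl's front-door setup, which the excerpt above does not spell out: smoking $X$ affects cancer $Y$ only through an observed mediator $Z$ (tar deposits), and any hidden common cause of $X$ and $Y$ has no direct effect on $Z$. Under those assumptions the effect of an intervention is expressed purely in terms of observational quantities:

```latex
P(y \mid do(x)) \;=\; \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z)\, P(x')
```

Every term on the right-hand side can be estimated from observational data, which is what makes the randomized experiment unnecessary when the assumed model holds.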


comment by Stuart_Armstrong · 2013-07-12T07:39:39.268Z · score: 8 (10 votes) · LW(p) · GW(p)

Correlation may not imply causation - but it is highly correlated with causation! In fact, the most likely theory is that correlation causes causation... ;-)

comment by Luke_A_Somers · 2013-07-12T13:46:14.878Z · score: 2 (2 votes) · LW(p) · GW(p)

That last sentence? Ow.

comment by IlyaShpitser · 2013-07-15T14:35:47.727Z · score: 0 (0 votes) · LW(p) · GW(p)

If A causes B, then artificially inducing A results in (or increases the frequency of, etc.) B. I am not really sure what you are trying to say, nor what "most likely theory" here means (under what setup?)

I realize you are being glib, but this is important!
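The distinction between observing A and artificially inducing A can be illustrated with a small simulation. This is a toy sketch, not anything from the linked post: the hidden variable C, the 0.9 copy probability, and the function names are all assumptions chosen for illustration. In this model A has no causal effect on B at all, yet the two are strongly correlated because C drives both.

```python
import random


def sample(rng, do_a=None):
    """Draw one (a, b) pair from a toy model where a hidden C drives both A and B.

    A has no causal effect on B here. Passing do_a overrides A's usual
    mechanism, which is what "artificially inducing A" means.
    """
    c = rng.random() < 0.5
    # A normally copies C with probability 0.9, unless we force its value.
    a = (c if rng.random() < 0.9 else not c) if do_a is None else do_a
    # B copies C with probability 0.9 and never looks at A.
    b = c if rng.random() < 0.9 else not c
    return a, b


def rate_of_b_given_a(rng, n, do_a=None):
    """Estimate the frequency of B among samples where A is true."""
    hits = total = 0
    for _ in range(n):
        a, b = sample(rng, do_a)
        if a:
            total += 1
            hits += b
    return hits / total


rng = random.Random(0)
observed = rate_of_b_given_a(rng, 100_000)            # conditioning on A
induced = rate_of_b_given_a(rng, 100_000, do_a=True)  # forcing A
print(f"P(B | A) ~ {observed:.2f}, P(B | do(A)) ~ {induced:.2f}")
```

In expectation the observational rate is 0.9 × 0.9 + 0.1 × 0.1 = 0.82, while the rate under forcing stays at 0.5: correlation without causation, detected by intervening.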

comment by RichardKennaway · 2013-07-16T10:52:10.727Z · score: 0 (0 votes) · LW(p) · GW(p)

nor what "most likely theory" here means (under what setup?)

Rupert Sheldrake has a theory of how correlation causes causation.

comment by wedrifid · 2013-07-15T15:29:17.747Z · score: 0 (0 votes) · LW(p) · GW(p)

If A causes B, then artificially inducing A results in (or increases the frequency of, etc.) B. I am not really sure what you are trying to say, nor what "most likely theory" here means (under what setup?)

My impression was that it was an absurdity for the purpose of satire.

comment by Stuart_Armstrong · 2013-07-15T14:42:40.784Z · score: 0 (2 votes) · LW(p) · GW(p)

It is what's known as a joke - it has no hidden wisdom or meaning.

comment by IlyaShpitser · 2013-07-15T17:32:21.164Z · score: 0 (0 votes) · LW(p) · GW(p)

I think part of what is amusing here is that this joke is a serious theory for some folks.

comment by RichardKennaway · 2013-07-16T10:54:45.273Z · score: 0 (0 votes) · LW(p) · GW(p)

I think part of what is amusing here is that this joke is a serious theory for some folks.

Like Rupert Sheldrake?

comment by gwern · 2013-07-12T15:27:56.566Z · score: 6 (6 votes) · LW(p) · GW(p)

Previously submitted: http://lesswrong.com/lw/9jw/michael_nielsen_explains_judea_pearls_causality/

comment by Strilanc · 2013-07-12T16:32:33.633Z · score: 1 (1 votes) · LW(p) · GW(p)

Should I delete? I have no qualms with deleting.

(It's unfortunate that submitting doesn't include the link in a structured way, which would allow duplicate detection.)

comment by tim · 2013-07-12T17:16:16.126Z · score: 4 (4 votes) · LW(p) · GW(p)

Given that the original submission is a year and a half old, it's likely that enough people are unfamiliar with it that it's worth keeping up. (afaik, resubmission isn't a big enough problem in discussion to enact a delete-all-duplicates policy)

comment by laofmoonster · 2014-04-05T02:33:58.130Z · score: 2 (2 votes) · LW(p) · GW(p)

Looks promising, but requiring the graph to be acyclic makes it difficult to model processes where feedback is involved. A workaround would be to treat each time stamp of a process as a different event. Have A(0)->B(1), where event A at time 0 affects event B at time 1, B(0)->A(1), A(0)->A(1), B(0)->B(1), A(t)->B(t+1), etc. But this gets unwieldy very quickly.
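The time-indexing workaround can be sketched in a few lines. The helper names `unroll` and `is_acyclic` are made up for illustration, and the sketch assumes every edge acts with a delay of exactly one time step:

```python
def unroll(edges, steps):
    """Unroll a possibly cyclic graph into a graph over time slices.

    edges: list of (source, target) pairs, read as "source at time t
    affects target at time t + 1".
    Returns a list of ((source, t), (target, t + 1)) edges.
    """
    return [((u, t), (v, t + 1)) for t in range(steps) for (u, v) in edges]


def is_acyclic(graph_edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    nodes = {n for edge in graph_edges for n in edge}
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for u, v in graph_edges:
        indegree[v] += 1
        children[u].append(v)
    frontier = [n for n in nodes if indegree[n] == 0]
    seen = 0
    while frontier:
        n = frontier.pop()
        seen += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                frontier.append(c)
    # Nodes on a cycle never reach indegree 0, so they are never counted.
    return seen == len(nodes)


# A and B influence each other and themselves: cyclic as written.
feedback = [("A", "B"), ("B", "A"), ("A", "A"), ("B", "B")]
unrolled = unroll(feedback, steps=3)
print(is_acyclic(feedback), is_acyclic(unrolled))
```

Because every unrolled edge points from time t to time t + 1, the result cannot contain a cycle. The cost is that the edge count grows linearly with the number of time steps.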

comment by Anders_H · 2014-04-05T03:55:41.812Z · score: 2 (2 votes) · LW(p) · GW(p)

Your workaround is correct, and not as unwieldy as it may appear at first glance. A lot of people have been using causal diagrams with this structure very successfully in situations where the data generating mechanism has loops. As a starting point, see the literature on inverse probability weighting and marginal structural models.

Processes with feedback loops are, in fact, a primary motivation for using causal directed acyclic graphs. If there are no feedback loops, reasoning about causality is relatively simple even without graphs; whereas if there are loops, even very smart people will get it wrong unless they are able to analyze the situation in terms of the graphical concept of 'collider stratification bias'.

comment by yanavancat · 2013-07-13T19:02:40.425Z · score: 1 (1 votes) · LW(p) · GW(p)

The correlation/causation conundrum is a particularly frustrating one in the social sciences due to the complex interaction of variables related to human experience.

I've found that looking at time order and thinking of variables as events is a helpful way to simplify experimental designs that seek to get at causal mechanisms in my behavioral research.

Take the smoking example:

I would consider measuring changes in strength of correlation at various points in an ongoing experiment.

Once a baseline measurement is obtained from those already smoking subjects/participants, we measure the correlation between avg. number of cigarettes smoked per week and lung capacity. This way one doesn't have to randomize or control, unethically asking people to smoke if they don't already. We already have a hypothesis based on the prior that volume of cigarettes smoked has a strong positive correlation with lung damage, and so reducing the number of cigarettes smoked would improve lung functioning in smokers.

But here we assume that the lifestyles of the smokers studied are relatively stable across the span of the experiment.

The researcher must take into account mediating factors that could impact lung functioning outside of smoking, e.g. intermittent exercise and lifestyle improvements.

In any case, following the same group of people over time is a lot easier than matching comparison groups by race/age/gender/education, or any of the other million human variables.

comment by IlyaShpitser · 2013-07-14T17:42:18.020Z · score: 1 (1 votes) · LW(p) · GW(p)

Once a baseline measurement is obtained from those already smoking subjects/participants, we measure the correlation between avg. number of cigarettes smoked per week and lung capacity. This way one doesn't have to randomize or control, unethically asking people to smoke if they don't already. We already have a hypothesis based on the prior that volume of cigarettes smoked has a strong positive correlation with lung damage, and so reducing the number of cigarettes smoked would improve lung functioning in smokers.

It was not clear from this description what exactly your design was. Is it the case that you find some smokers, and then track the relationship between lung capacity and how much they smoke per week (which varies due to [reasons])? Or do you artificially reduce the nicotine intake in smokers (which is an ethical intervention)? Or what?

comment by Xachariah · 2013-07-12T05:54:47.897Z · score: 1 (5 votes) · LW(p) · GW(p)

Seems like a much longer (and harder to read) version of Eliezer's Causal Model post. What can I expect to get out of this one that I wouldn't find in Eliezer's version?

Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'.

-XKCD

comment by Qiaochu_Yuan · 2013-07-12T08:16:44.246Z · score: 10 (10 votes) · LW(p) · GW(p)

Details? Content? Eliezer doesn't even define d-separation, for starters.

comment by [deleted] · 2013-07-12T15:14:07.061Z · score: 0 (0 votes) · LW(p) · GW(p)

Do you know if there's an efficient algorithm for determining when two subsets of a DAG are d-separated given another? The naive algorithm seems to be a bit slow.

comment by IlyaShpitser · 2013-07-12T16:22:18.649Z · score: 4 (4 votes) · LW(p) · GW(p)

http://www.gatsby.ucl.ac.uk/~zoubin/course05/BayesBall.pdf

Amusing name, linear time algorithm. Also amusingly, I happen to have a direct line of sight on the author while writing this post :).

In some sense, we know a priori that d-separation has to be linear time because it is a slightly fancy graph traversal. If you don't like Bayes Ball, you can use the moralization algorithm due to Lauritzen (described here:

http://www.stats.ox.ac.uk/~steffen/teaching/grad/graphicalmodels.pdf

see slide titled "alternative equivalent separation"), which is slightly harder to follow for an unaided human, but which has a very simple implementation (which reduces to a simple DFS traversal of an undirected graph you construct).
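A minimal sketch of the moralization approach, assuming the DAG is given as a dict from each node to the set of its parents (the function name and this representation are illustrative choices, not taken from the linked slides):

```python
def d_separated(parents, xs, ys, zs):
    """Return True if xs and ys are d-separated given zs in the DAG.

    parents: dict mapping a node to the set of its parents; nodes
    without parents may be omitted.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)

    # 1. Keep only the nodes in xs, ys, zs and their ancestors.
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralize: link each node to its parents, "marry" co-parents,
    #    and forget edge directions.
    neighbours = {node: set() for node in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for i, p in enumerate(ps):
            neighbours[child].add(p)
            neighbours[p].add(child)
            for q in ps[i + 1:]:
                neighbours[p].add(q)
                neighbours[q].add(p)

    # 3. Delete zs, then run a DFS from xs looking for any node in ys.
    stack = list(xs - zs)
    visited = set(stack)
    while stack:
        node = stack.pop()
        if node in ys:
            return False
        for nxt in neighbours[node] - zs - visited:
            visited.add(nxt)
            stack.append(nxt)
    return True


collider = {"C": {"A", "B"}}           # A -> C <- B
chain = {"B": {"A"}, "C": {"B"}}       # A -> B -> C
print(d_separated(collider, {"A"}, {"B"}, set()))   # collider blocks the path
print(d_separated(collider, {"A"}, {"B"}, {"C"}))   # conditioning opens it
```

Restricting to the ancestral set first is what makes the collider case come out right: with nothing conditioned on, C is dropped before moralization, so A and B are never married.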

edit: fixed links, hopefully.

comment by [deleted] · 2013-07-13T03:58:54.313Z · score: 1 (1 votes) · LW(p) · GW(p)

Yeah, sadly both links are broken for me.

comment by Qiaochu_Yuan · 2013-07-13T00:11:20.048Z · score: 1 (1 votes) · LW(p) · GW(p)

Link is broken for me.

comment by RichardKennaway · 2013-07-12T08:24:11.970Z · score: 3 (3 votes) · LW(p) · GW(p)

More detail, more mathematics, more exercises, more references. More, that's what you get. Eliezer's post is only an appetiser, and the XKCD a mere amuse-bouche.

comment by Manfred · 2013-07-12T07:34:25.719Z · score: 3 (3 votes) · LW(p) · GW(p)

What can I expect to get out of this one that I wouldn't find in Eliezer's version?

Some of the useful (if you're going to use it or enjoy it, that is) math from chapters 1-3 of Pearl's book.

comment by JQuinton · 2013-07-12T17:26:33.974Z · score: -2 (2 votes) · LW(p) · GW(p)

Correlation doesn't necessitate causation, but it is certainly (weak?) Bayesian evidence.