“Prediction” and “explanation” are not causation

post by jasoncrawford · 2020-10-24T18:55:55.767Z · LW · GW · 5 comments

This is a link post for https://twitter.com/jasoncrawford/status/1320068209916035072

Everyone knows that correlation is not causation. Many people don't know that in scientific jargon, “predict” and “explain” are also not causation. They are forms of correlation.

(Technically, “association” might be a better term than “correlation”, which can have a narrower technical meaning in statistics. But since I'm writing this for non-experts, I'm going to use the term “correlation” in the colloquial, wider sense.)

These terms can cause extreme miscommunication:

In lay usage, “X predicts Y” implies that X comes before Y. Predictions are about the future. In statistics, there is no time implication at all. It is just a type of correlation. If I said that I could use 2020 data to “predict” things that happened in 2019 (or 1920), most people would laugh at me. But this is a perfectly legitimate usage of statistical “prediction”.
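To make this concrete, here is a minimal sketch with made-up data (numpy assumed available): an ordinary regression that "predicts" an outcome recorded in 2019 from measurements taken in 2020. Nothing in the math knows or cares which variable came first in time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a 2020 measurement for each unit (say, a region),
# and an outcome recorded back in 2019 for the same units.
x_2020 = rng.normal(size=100)
y_2019 = 2.0 * x_2020 + rng.normal(scale=0.5, size=100)

# Ordinary least squares: statistically "predict" the 2019 outcome
# from the 2020 data. This is a perfectly legitimate use of the term.
slope, intercept = np.polyfit(x_2020, y_2019, deg=1)
y_hat = intercept + slope * x_2020

print(f"fitted slope: {slope:.2f}")
```

The fitted slope recovers the relationship (close to the true 2.0 here), but the word "predict" carries no claim about time order or causation.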

Similarly, the general sense of “explanation” means a conceptual understanding of a phenomenon. In statistics, “explanation” implies no such understanding, only a certain type of correlation.

Because of the time-implication of “prediction” and the conceptual-understanding implication of “explanation”, most people are likely to interpret these as evidence for or even proof of causation. But in many cases, they merely mean correlation.

If you are reading scientific papers, be careful with how you interpret these terms. If you are reading reports about science, know that the reporter might not be clear on this point. If you are reporting science yourself, be responsible with what you write.

(By the way, without @NeuroStats I wouldn't know this stuff either. Any value in this post is thanks to her; errors are mine alone.)


Comments sorted by top scores.

comment by Ericf · 2020-10-25T02:16:44.073Z · LW(p) · GW(p)

I was not vulnerable to this potential confusion, since my internal definition of "causes" is that the result is subject to intervention at the cause point. If I decide to not mow my lawn:

A) ice cream sales don't change, even though lawnmower usage predicts ice cream sales (correlation)

B) my grass doesn't get taller, even though grass height is partially explained by lawnmower usage

C) but I won't have a bunch of grass clippings, because mowing causes grass clippings.

comment by NicholasKross · 2020-10-24T20:58:25.630Z · LW(p) · GW(p)

> the general sense of “explanation” means a conceptual understanding of a phenomenon. In statistics, “explanation” implies no such understanding

I don't understand what you're saying here. Does statistics use "explanation" as a technical jargon term for something that's not gearsy?

comment by jasoncrawford · 2020-10-24T22:00:34.790Z · LW(p) · GW(p)

Yes. You will hear phrases like “X explains Y% of Z”, and that refers to a statistical association. Examples:

“Micro data show that an aging firm distribution fully explains i) the concentration of employment in large firms, ii) and trends in average firm size and exit rates, key determinants of the firm entry rate. An aging firm distribution also explains the decline in labor’s share of GDP.” https://www.nber.org/papers/w25382

“We found that twelve conditions most responsible for changing life expectancy explained 2.9 years of net improvement (85 percent of the total).” https://www.healthaffairs.org/doi/10.1377/hlthaff.2020.00284

Now, maybe in one or both of these cases, there actually is an explanation. But you can't assume that just because the term “explain” is used.
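In papers like these, "X explains Y% of Z" is often a variance-accounting claim, something like R² in a regression. A minimal sketch with simulated data (numpy assumed available) shows how a purely statistical "explains" is computed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated association: y is correlated with x, with added noise.
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(scale=1.0, size=200)

# Fit y on x and compute R^2, the "fraction of variance explained".
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)
r_squared = 1 - residuals.var() / y.var()

print(f"x explains {100 * r_squared:.0f}% of the variance in y")
```

The number is a fact about the fit, not about mechanism: x would "explain" the same share of variance even if both were driven by some third factor.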

comment by Stuckwork · 2020-10-25T12:02:40.850Z · LW(p) · GW(p)

This fallacy is known as Post hoc ergo propter hoc and is indeed a mistake that is often made. However, there are some situations in which we can infer causation from correlation, and where the arrow of time is very useful. These methods are mostly known as Granger causality methods, of which the basic premise is: X has a Granger causal influence on Y if the prediction of Y from its own past, and the past of all other variables, is improved by additionally accounting for X. In practice, Granger causality relies on some heavy assumptions, such as that there are no unobserved confounders.
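That premise can be sketched as two nested regressions on simulated data (numpy assumed available; a real analysis would use a formal test, e.g. `grangercausalitytests` in statsmodels):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a system where x drives y with a one-step lag,
# so x's past genuinely helps predict y.
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.3)

# Restricted model: predict y[t] from y[t-1] alone.
A_r = np.column_stack([np.ones(n - 1), y[:-1]])
coef_r, *_ = np.linalg.lstsq(A_r, y[1:], rcond=None)
rss_r = np.sum((y[1:] - A_r @ coef_r) ** 2)

# Full model: additionally account for x's past, x[t-1].
A_f = np.column_stack([np.ones(n - 1), y[:-1], x[:-1]])
coef_f, *_ = np.linalg.lstsq(A_f, y[1:], rcond=None)
rss_f = np.sum((y[1:] - A_f @ coef_f) ** 2)

# x "Granger-causes" y when including x's past clearly improves the fit.
print(f"RSS without x: {rss_r:.1f}, with x: {rss_f:.1f}")
```

Here the residual sum of squares drops sharply when x's past is included. The caveat in the comment above still applies: an unobserved confounder driving both series could produce the same improvement.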

comment by jasoncrawford · 2020-10-25T16:06:06.146Z · LW(p) · GW(p)

Yes, @NeuroStats likes to call it “Granger prediction” for this reason.