Causal inference for the home gardener

post by braces · 2024-11-27T17:55:52.629Z · LW · GW · 1 comments

Note: This is meant to be an accessible introduction to causal inference. Comments appreciated.

Let’s say you buy a basil plant and put it on the counter in your kitchen. Unfortunately, it dies in a week.

So the next week you buy another basil plant and feed it a special powder, Vitality Plus. This second plant lives. Does that mean Vitality Plus worked? 

Not necessarily! Maybe the second week was a lot sunnier, you were better about watering, or you didn’t grab a few leaves for a pasta. In other words, it wasn’t a controlled experiment. If some other variable like sun, water, or pasta is driving the results you’re seeing, your study is confounded, and you’ve fallen prey to a core issue in science.
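To see how a confounder can manufacture an effect out of nothing, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the function name, the survival probabilities, and the idea that the gardener happens to use the powder mostly in sunny weeks. In this toy world the powder does nothing at all, yet treated plants still survive far more often.

```python
import random


def simulate_confounded_study(n_plants=10_000, seed=0):
    """Toy world where the powder has NO effect, but sun drives both
    who gets the powder and who survives. All numbers are illustrative."""
    rng = random.Random(seed)
    treated_outcomes = []
    untreated_outcomes = []
    for _ in range(n_plants):
        sunny = rng.random() < 0.5
        # Assumption: the powder is used mostly during sunny weeks.
        treated = rng.random() < (0.8 if sunny else 0.2)
        # Survival depends only on sun, never on the powder.
        survived = rng.random() < (0.9 if sunny else 0.3)
        if treated:
            treated_outcomes.append(survived)
        else:
            untreated_outcomes.append(survived)
    rate_treated = sum(treated_outcomes) / len(treated_outcomes)
    rate_untreated = sum(untreated_outcomes) / len(untreated_outcomes)
    return rate_treated, rate_untreated


rate_treated, rate_untreated = simulate_confounded_study()
naive_difference = rate_treated - rate_untreated
print(f"survival with powder:    {rate_treated:.2f}")
print(f"survival without powder: {rate_untreated:.2f}")
print(f"naive 'effect':          {naive_difference:.2f}")
```

The gap between the two survival rates comes entirely from sun, which is exactly what a confounded comparison looks like from the outside.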

When someone says “correlation is not causation,” they’re usually talking about confounding. Here are some examples:

So now you know that you shouldn’t compare plants that you bought at different times, because this risks confounding. One way to address confounding is to try to hold all the important variables constant—a controlled experiment. You buy two plants at the same time from the same store. You put them in the same spot and water them equally, and always pluck the same number of leaves from each. The treated plant survives, and the control plant withers.

Does the powder work? A remaining problem is that even holding constant many of the variables (store, date bought, and so on), there’s still some inherent randomness in the life of a basil plant. 

This randomness could be due to genetics or to the soil conditions when the plant was a wee sprout. With enough plants, it would wash out, since each group is as likely to be lucky as unlucky on average. With just two plants, however, random factors could easily cloud or even exceed the benefit from the powder. When your sample is too small to tell a real benefit apart from random noise, your study is underpowered. In engineering terms, this is a signal-to-noise problem: with only two plants, the noise (random variation) might overwhelm the signal (the effect of Vitality Plus).
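The signal-to-noise point can be checked with a small simulation sketch. The survival probabilities below (50% for control plants, 70% for treated ones) and the function name are assumptions chosen purely for illustration. The code repeats the same experiment many times and counts how often the treated group fails to out-survive the control group, even though the powder genuinely helps in this toy world.

```python
import random


def share_of_misleading_studies(plants_per_group, n_studies=2_000, seed=1):
    """Fraction of simulated studies in which the treated group does NOT
    beat the control group, despite a real (assumed) benefit."""
    rng = random.Random(seed)
    p_control = 0.5  # assumed survival chance without the powder
    p_treated = 0.7  # assumed survival chance with the powder
    misleading = 0
    for _ in range(n_studies):
        control_survivors = sum(
            rng.random() < p_control for _ in range(plants_per_group)
        )
        treated_survivors = sum(
            rng.random() < p_treated for _ in range(plants_per_group)
        )
        if treated_survivors <= control_survivors:
            misleading += 1
    return misleading / n_studies


# One treated plant and one control plant, as in the two-plant experiment.
small_study = share_of_misleading_studies(plants_per_group=1)
# A much larger study with the same assumed effect.
large_study = share_of_misleading_studies(plants_per_group=100)
print(f"misleading with 1 plant per group:    {small_study:.0%}")
print(f"misleading with 100 plants per group: {large_study:.0%}")
```

Under these assumptions, the two-plant study fails to show the benefit most of the time, while the larger study almost never does.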

Now you know that you shouldn’t compare plants raised in different conditions (because there could be confounding) and you can’t just compare two plants, even with lots of control over their conditions (because of random variation—one plant could get lucky, independent of Vitality Plus).

We need a large sample of plants, with random variation in which ones get treated. What are some of the techniques?
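One way to create that random variation yourself is to flip a coin for each plant. The sketch below is a hypothetical simulation, not a recipe from any particular library: the function name, the survival probabilities, and the assumed true effect of 0.2 are all made up for illustration. Sun still matters a great deal for survival here, but because the coin flip ignores sun, sunny and shady plants end up spread evenly across both groups.

```python
import random


def randomized_experiment(n_plants=10_000, true_effect=0.2, seed=2):
    """Assign the powder by coin flip and estimate its effect as the
    difference in survival rates. All numbers are illustrative."""
    rng = random.Random(seed)
    treated_outcomes = []
    control_outcomes = []
    for _ in range(n_plants):
        sunny = rng.random() < 0.5
        # Coin flip: treatment is independent of sun by construction.
        treated = rng.random() < 0.5
        # Sun still affects survival; the powder adds the assumed true effect.
        p_survive = (0.7 if sunny else 0.3) + (true_effect if treated else 0.0)
        survived = rng.random() < p_survive
        if treated:
            treated_outcomes.append(survived)
        else:
            control_outcomes.append(survived)
    rate_treated = sum(treated_outcomes) / len(treated_outcomes)
    rate_control = sum(control_outcomes) / len(control_outcomes)
    return rate_treated - rate_control


estimated_effect = randomized_experiment()
print(f"estimated effect of the powder: {estimated_effect:.2f}")
```

With a large randomized sample, the simple difference in survival rates lands close to the effect that was built into the simulation, without needing to measure sun at all.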

Whether you're testing plant powder, educational methods, or medical treatments, the principles remain the same: Watch out for confounding variables. Use large enough samples to overcome random noise. And create or find random variation in who gets the treatment, so that your estimate is reliable. These are some of the best defenses against the bad ideas that inevitably sprout up.

1 comment

comment by niplav · 2024-11-27T23:26:38.047Z · LW(p) · GW(p)

I think I'd've wanted to know about tigramite when learning about causal inference; it's a library for doing causal inference on time-series data.