Choosing prediction over explanation in psychology: Lessons from machine learning
post by Kaj_Sotala · 2017-01-17T21:23:55.068Z · LW · GW · Legacy · 6 comments
This is a link post for https://figshare.com/articles/Choosing_prediction_over_explanation_in_psychology_Lessons_from_machine_learning/2441878
Comments sorted by top scores.
comment by Anders_H · 2017-01-18T01:23:22.918Z · LW(p) · GW(p)
I skimmed this paper and plan to read it in more detail tomorrow. My first thought is that it is fundamentally confused. I believe the confusion comes from the fact that the word "prediction" is used with two separate meanings: are you interested in predicting Y given an observed value of X (i.e. Pr[Y | X = x]), or in predicting Y given an intervention on X (i.e. Pr[Y | do(X = x)])?
The first of these may be useful for certain purposes, but if you intend to use the research for decision making and optimization (i.e. you want to intervene to set the value of X in order to optimize Y), then you really need the second type of predictive ability, in which case you need to extract causal information from the data. This is only possible if you have a randomized trial or a correct causal model.
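To make the distinction concrete, here is a minimal sketch assuming a toy linear model with an unobserved confounder U (the variable names and coefficients are purely illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta, gamma = 1.0, 2.0  # beta: true causal effect of X on Y; gamma: effect of the confounder U

# Observational regime: U confounds both X and Y
U = rng.normal(size=n)
X = U + rng.normal(size=n)
Y = beta * X + gamma * U + rng.normal(size=n)

# Pr[Y | X = x]: slope of the observational regression of Y on X
obs_slope = np.cov(X, Y)[0, 1] / np.var(X)  # approx beta + gamma/2 = 2.0

# Pr[Y | do(X = x)]: set X by intervention, independently of U (a randomized trial)
X_do = rng.normal(size=n)
Y_do = beta * X_do + gamma * U + rng.normal(size=n)
do_slope = np.cov(X_do, Y_do)[0, 1] / np.var(X_do)  # approx beta = 1.0

print(obs_slope, do_slope)  # roughly 2.0 vs 1.0: conditioning and intervening disagree
```

A model that only estimates the first slope will predict Y well from passively observed X, yet it overstates (here by a factor of two) what an intervention on X would actually achieve.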
You can use the word "prediction" to refer to the second type of research objective, but this is not the kind of prediction that machine learning algorithms are designed to do.
In the conclusions, the authors write:
"By contrast, a minority of statisticians (and most machine learning researchers) belong to the “algorithmic modeling culture,” in which the data are assumed to be the result of some unknown and possibly unknowable process, and the primary goal is to find an algorithm that results in the same outputs as this process given the same inputs. "
The definition of "algorithmic modelling culture" is somewhat circular, as it just moves the ambiguity surrounding "prediction" to the word "input". If by "input" they mean that the algorithm observes the value of an independent variable and makes a prediction for the dependent variable, then you are talking about a true prediction model, which may be useful for certain purposes (diagnosis, prognosis, etc.) but which is unusable if you are interested in optimizing the outcome.
If you instead claim that the "input" can also include observations about interventions on a variable, then your predictions will certainly fail unless the algorithm was trained in a dataset where someone actually intervened on X (i.e. someone did a randomized controlled trial), or unless you have a correct causal model.
Machine learning algorithms are not magic: they do not solve the problem of confounding unless they have a correct causal model. The fact that these algorithms are good at predicting stuff in observational datasets does not tell you anything useful for the purposes of deciding what the optimal value of the independent variable is.
In general, this paper is a very good example of why I keep insisting that machine learning people urgently need to read up on Pearl, Robins, or van der Laan. The field is in danger of falling into the same failure mode as epidemiology, i.e. essentially ignoring the problem of confounding. In the case of machine learning, this may be more insidious because the research is dressed up in fancy math and therefore looks superficially more impressive.
Replies from: Kaj_Sotala, Vaniver, jacob_cannell
↑ comment by Kaj_Sotala · 2017-02-13T12:26:44.442Z · LW(p) · GW(p)
Not entirely sure I understand you; I read the paper mostly as pointing out that current psych methodology tends to overfit, and that psychologists don't even know what overfitting means. This is true regardless of which type of prediction we're talking about.
↑ comment by Vaniver · 2017-01-18T06:20:17.947Z · LW(p) · GW(p)
This is only possible if you have a randomized trial, or if you have a correct causal model.
You can use the word "prediction" to refer to the second type of research objective, but this is not the kind of prediction that machine learning algorithms are designed to do.
I think there are ML algorithms that do figure out the second type. (I don't think this is simple conditioning, as jacob_cannell seems to be suggesting, but more like this.)
Replies from: Anders_H
↑ comment by Anders_H · 2017-01-18T13:58:16.346Z · LW(p) · GW(p)
Thank you for the link; that is a very good presentation, and it is good to see that ML people are thinking about these things.
There certainly are ML algorithms that are designed to make the second kind of prediction, but generally they only work if you have a correct causal model.
It is possible that there are some ML algorithms that try to discover the causal model from the data. For example, /u/IlyaShpitser works on these kinds of methods. However, these methods only work to the extent that they are able to discover the correct causal model, so it seems disingenuous to claim that we can ignore causality and focus on "prediction".
↑ comment by jacob_cannell · 2017-01-18T02:43:47.322Z · LW(p) · GW(p)
If you instead claim that the "input" can also include observations about interventions on a variable [...]
Yes - general prediction, i.e. a full generative model, can already encompass causal modelling, avoiding any distinction between dependent and independent variables: one can learn to predict any variable conditioned on all previous variables.
For example, consider a full generative model of an ATARI game, which includes both the video and the control input (from human play, say). Learning to predict all future variables from all previous ones automatically entails learning the conditional effects of actions.
For medicine, the full machine learning approach would entail using all available data (test measurements, diet info, drugs, interventions, etc.) to learn a full generative model, which can then be conditionally sampled on any 'action variables' and integrated to generate recommended high-utility interventions.
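As a minimal sketch of that workflow, assuming (per the caveat above) logged data in which the action variable really was intervened on; the data-generating process and variable names are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy logged data: patient state s, treatment a, outcome y.
# Crucially, a is randomized in this log; with purely observational data
# the conditional model below would simply inherit any confounding.
n = 50_000
s = rng.normal(size=n)
a = rng.integers(0, 2, size=n)
y = 0.5 * s + 1.5 * a * (s < 0) + rng.normal(scale=0.5, size=n)

# Stand-in for a learned generative model: estimate E[y | state bin, action]
edges = np.linspace(-2, 2, 9)
bins = np.digitize(s, edges)
model = {}
for b in range(len(edges) + 1):
    for act in (0, 1):
        mask = (bins == b) & (a == act)
        if mask.any():
            model[(b, act)] = y[mask].mean()

def recommend(s_new):
    """Condition the learned model on each candidate action and pick the best one."""
    b = int(np.digitize(s_new, edges))
    scores = {act: model.get((b, act), -np.inf) for act in (0, 1)}
    return max(scores, key=scores.get)

print(recommend(-1.0))  # 1: in this toy setup the treatment helps when s < 0
print(recommend(1.0))   # essentially a coin flip: the two actions are ~tied here
```

The "integrated to generate recommended high-utility interventions" step is just an argmax over conditionally evaluated actions here; a real system would sample from a much richer generative model rather than binning.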
then your predictions will certainly fail unless the algorithm was trained in a dataset where someone actually intervened on X (i.e. someone did a randomized controlled trial)
In any practical near-term system, sure. In theory, though, a powerful enough predictor could learn enough of the world's physics to invent de novo interventions whole cloth. For example, AlphaGo inventing new moves that weren't in its training set, which it essentially learned from internal simulations.
comment by morganism · 2017-01-19T07:15:45.582Z · LW(p) · GW(p)
"Misguided math: Faulty Bayesian reasoning may explain some mental disorders."
"Applying math to mental disorders “is a very young field,” he adds, pointing to Computational Psychiatry, which plans to publish its first issue this summer. “You know a field is young when it gets its first journal.”
"This reckoning requires the brain to give the right amount of weight to prior expectations and current information. Depending on the circumstances, those weights change. When the senses falter, for instance, the brain should lean more heavily on prior expectations."
https://www.sciencenews.org/article/bayesian-reasoning-implicated-some-mental-disorders
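The precision-weighting idea in that last quote can be written out directly for the Gaussian case; a small sketch of my own (not from the article):

```python
def posterior_mean(prior_mean, prior_var, obs, obs_var):
    """Combine a prior expectation and a sensory observation, weighted by precision."""
    w_prior = 1.0 / prior_var  # precision (reliability) of the prior expectation
    w_obs = 1.0 / obs_var      # precision (reliability) of the current observation
    return (w_prior * prior_mean + w_obs * obs) / (w_prior + w_obs)

print(posterior_mean(0.0, 1.0, 5.0, 0.1))   # reliable senses: ~4.55, belief tracks the data
print(posterior_mean(0.0, 1.0, 5.0, 10.0))  # faltering senses: ~0.45, belief stays near the prior
```

When the observation noise grows, the same update rule automatically leans more heavily on the prior expectation, which is the trade-off the article describes.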