comment by alex_hhr ·
2017-12-02T20:39:18.339Z
"Statistical models with fewer assumptions" is a tricky one, because the conditions under which your inferences work are not identical to the conditions you assume when deriving them.
I mostly have in mind a historical controversy in the mathematical study of evolution. Joseph Felsenstein introduced maximum likelihood methods for inferring phylogenetic trees. He assumed a probabilistic model for how DNA sequences change over time, and from that he derived maximum likelihood estimates of phylogenetic trees of species based on their DNA sequences.
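The simplest case of this recipe is a single branch between two aligned sequences. As a minimal sketch, assuming the Jukes-Cantor (JC69) substitution model (a standard textbook model, not necessarily the one used in Felsenstein's original papers), the maximum likelihood estimate of the distance d (expected substitutions per site) has a closed form. The function names are illustrative, not from any library.

```python
import math


def jc69_diff_prob(d):
    """Probability that two sequences differ at a site, given distance d, under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of d > 0, up to an additive constant that does not depend on d."""
    p = jc69_diff_prob(d)
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


def jc69_ml_distance(seq1, seq2):
    """Closed-form maximum likelihood estimate of d for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n_diff = sum(a != b for a, b in zip(seq1, seq2))
    p_hat = n_diff / len(seq1)  # the MLE of the per-site difference probability
    if p_hat >= 0.75:
        raise ValueError("too divergent: the JC69 estimate is infinite")
    # Invert jc69_diff_prob at p_hat to get the MLE of d.
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)


d_hat = jc69_ml_distance("ACGTACGTAC", "ACGTTCGTAA")  # 2 differences in 10 sites
```

The model assumption does real work here: the same observed 20% mismatch would translate into a different distance under a different substitution model.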
Felsenstein's maximum likelihood method was an alternative to another method, the "maximum parsimony" method. The maximum parsimony tree is the tree that requires you to assume the fewest possible sequence changes when explaining the data.
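For a handful of species, the number of changes a tree requires can be counted site by site with Fitch's algorithm. The sketch below is illustrative and limited to four taxa: it scores each of the three possible unrooted four-taxon trees and keeps the cheapest. The names `QUARTETS`, `fitch` and `max_parsimony_quartet` are hypothetical.

```python
# The three unrooted topologies for taxa 0, 1, 2, 3, each written as a rooted
# binary tree with the root on the internal edge. Fitch's parsimony score does
# not depend on where an unrooted tree is rooted.
QUARTETS = {
    "01|23": ((0, 1), (2, 3)),
    "02|13": ((0, 2), (1, 3)),
    "03|12": ((0, 3), (1, 2)),
}


def fitch(node, site):
    """Fitch's algorithm on a tree written as nested tuples.

    Leaves are taxon indices into `site`. Returns (candidate state set,
    minimum number of changes within this subtree).
    """
    if isinstance(node, int):
        return {site[node]}, 0
    left, right = node
    left_states, left_cost = fitch(left, site)
    right_states, right_cost = fitch(right, site)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: one extra change is needed on one of the two child branches.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total number of changes the tree needs, summed over alignment columns."""
    return sum(fitch(tree, site)[1] for site in zip(*alignment))


def max_parsimony_quartet(alignment):
    """Return the cheapest topology (ties go to the first listed) and all scores."""
    scores = {name: parsimony_score(tree, alignment) for name, tree in QUARTETS.items()}
    return min(scores, key=scores.get), scores


alignment = ["ACGTA", "ACGTC", "TCATA", "TCATC"]
best, scores = max_parsimony_quartet(alignment)
print(best, scores)  # 01|23 {'01|23': 4, '02|13': 5, '03|12': 6}
```

Note that nothing in this procedure mentions branch lengths or substitution probabilities, which is why it looks "assumption-free".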
Some people criticized Felsenstein's maximum likelihood method, since it assumed a statistical model, whereas the maximum parsimony method did not. Felsenstein's response was to exhibit a phylogenetic tree and a model of sequence change where maximum parsimony fails. Specifically, it was a tree connecting four species. When you randomly generate DNA sequences using this tree and the specified probability model for sequence change, maximum parsimony gives the wrong result. For short sequences it may give the right result by chance, but as the sequences get longer, the probability that maximum parsimony returns the wrong tree approaches 1. In statistical terms, maximum parsimony is inconsistent: it fails in the infinite-data limit, at least when that is the data-generating process.
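The failure can be checked exactly for a toy version of this setup. The sketch assumes a two-state symmetric model (a simplification of DNA's four states) on the true tree ((A,B),(C,D)), with long branches leading to A and C; the branch parameters are illustrative values chosen for this example, each being the probability that a branch's endpoints differ. With four taxa, only sites that split the taxa 2+2 favour one tree over another, so parsimony picks whichever split is most common. Summing over all internal and leaf states gives each split's probability per site.

```python
from itertools import product


def branch_prob(p, parent, child):
    """Probability of the child state given the parent state; p = P(states differ)."""
    return p if parent != child else 1.0 - p


def pattern_probs(p_long, p_short, p_internal):
    """Per-site probability of each 2+2 split on the true tree ((A,B),(C,D)).

    u and v are the internal nodes joining (A,B) and (C,D); A and C sit on
    long branches, B, D and the internal branch are short.
    """
    probs = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for u, v, a, b, c, d in product((0, 1), repeat=6):
        w = 0.5  # uniform state at u, used as the root
        w *= branch_prob(p_internal, u, v)
        w *= branch_prob(p_long, u, a) * branch_prob(p_short, u, b)
        w *= branch_prob(p_long, v, c) * branch_prob(p_short, v, d)
        if a == b != c == d:
            probs["AB|CD"] += w  # supports the true tree
        elif a == c != b == d:
            probs["AC|BD"] += w  # groups the two long branches
        elif a == d != b == c:
            probs["AD|BC"] += w
    return probs


probs = pattern_probs(p_long=0.4, p_short=0.05, p_internal=0.05)
```

With these values the wrong split AC|BD has probability about 0.139 per site, against about 0.038 for the true split AB|CD. By the law of large numbers, the AC|BD count eventually exceeds the AB|CD count, so parsimony settles on the wrong tree as sequences grow longer.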
What does this mean for the criticism that maximum likelihood makes assumptions? Well, it's true that maximum likelihood works when the data-generating process matches our assumptions, and may not work otherwise. But maximum parsimony also works for a limited set of data-generating processes. Can users of maximum parsimony, then, be accused of making the assumption that the data-generating process is one on which maximum parsimony is consistent?
The field of phylogenetic inference has since become very simulation-heavy: researchers assume a data-generating process, simulate data from it, and test the output of maximum likelihood, maximum parsimony, and other methods. The concern is therefore not so much how many assumptions a statistical method makes, but the range of data-generating processes on which it gives correct results.
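A toy version of that workflow: fix a data-generating process, simulate many replicate datasets, and record how often a method recovers the true tree. This is a minimal sketch assuming the same two-state model and illustrative branch parameters as a Felsenstein-style four-taxon example; only parsimony is implemented, and another method could be swapped in for `parsimony_quartet`. All names are hypothetical.

```python
import random


def simulate_quartet(n_sites, p_long, p_short, p_internal, rng):
    """Simulate binary sequences for A, B, C, D on the true tree ((A,B),(C,D))."""
    def evolve(state, p):
        # Flip the parent's state with probability p.
        return state ^ (rng.random() < p)

    seqs = {"A": [], "B": [], "C": [], "D": []}
    for _ in range(n_sites):
        u = rng.randint(0, 1)  # uniform state at the internal node u
        v = evolve(u, p_internal)
        seqs["A"].append(evolve(u, p_long))
        seqs["B"].append(evolve(u, p_short))
        seqs["C"].append(evolve(v, p_long))
        seqs["D"].append(evolve(v, p_short))
    return seqs


def parsimony_quartet(seqs):
    """Four-taxon binary parsimony: choose the 2+2 split with the most supporting sites."""
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in zip(seqs["A"], seqs["B"], seqs["C"], seqs["D"]):
        if a == b != c == d:
            support["AB|CD"] += 1
        elif a == c != b == d:
            support["AC|BD"] += 1
        elif a == d != b == c:
            support["AD|BC"] += 1
    return max(support, key=support.get)  # ties go to the first listed split


def recovery_rate(n_sites, n_reps=100, seed=1, p_long=0.4, p_short=0.05, p_internal=0.05):
    """Fraction of simulated replicates in which parsimony returns the true tree."""
    rng = random.Random(seed)
    hits = sum(
        parsimony_quartet(simulate_quartet(n_sites, p_long, p_short, p_internal, rng)) == "AB|CD"
        for _ in range(n_reps)
    )
    return hits / n_reps


for n_sites in (20, 200, 2000):
    print(n_sites, recovery_rate(n_sites))
```

Re-running with `p_long=0.05` and a longer internal branch (say `p_internal=0.2`) changes the data-generating process to one where parsimony does recover the true tree, which is exactly the kind of map of "where does the method work" that these simulation studies build.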
This is an important distinction because, while we can expect the maximum likelihood method to work when its assumptions are true, it may also work when they are false. We have to explore, with theory and simulations, the set of data-generating processes on which it is effective, just as we do with "assumption-free" methods like maximum parsimony.
For more info, some of this story can be found in Felsenstein's book "Inferring Phylogenies", which also contains references to many of the original papers.