Why the tails come apart
score: 4 (4 votes) ·
A technical difficulty with saying that overfitting happens when there are "too many parameters" is that the parameters may do arbitrarily complicated things. For example they may encode C functions, in which case a model with a single (infinite-precision) real parameter can fit anything very well! Functions that are linear in their parameters and inputs do not suffer from this problem; the number of parameters summarizes their overfitting capacity well. The same is not true of some nonlinear functions.
To avoid confusion it may be helpful to define overfitting more precisely. The gist of any reasonable definition of overfitting is: If I randomly perturb the desired outputs of my function, how well can I find new parameters to fit the new outputs? I can't do a good job of giving more detail than that in a short comment, but if you feel confused about overfitting, here's a good (and famous) article about frequentist learning theory by Vladimir Vapnik that may be useful: