
comment by shminux · 2022-12-23T04:22:54.443Z · LW(p) · GW(p)

I have no idea what you are talking about. Consider adding better examples and references, and using more standard terminology.

comment by simon · 2022-12-23T09:31:23.010Z · LW(p) · GW(p)

To summarize my understanding of the post:

You have some data which include some input variables and an outcome. You want to predict the outcome based on the input variables if you get some new data. In order to do this you use a mathematical function (the "model") which takes the input and produces an output. This mathematical function is relatively simple, with a few parameters, and assumes a relatively simple relationship between each variable and the output, which you call a "linkage" and which is typically selected from a few common relationships like additive, multiplicative or logistic. So, you apply the model to the data and find the parameter values which make the least bad predictions of the output according to some loss function (least squares perhaps, or whatever). Then you use this model to predict the outcome of new data.

However, in your opinion, the fact that the model uses one of these few common relationships between each variable and the outcome is a problem, since in reality the relationship will be more complicated, and this incentivizes more complicated models (presumably with more parameters).

So, you propose to take the output of the first function (first model) you got after optimizing its parameters, and then apply a second function (second model) to that output. This second model is also optimized to match the actual outcomes as closely as possible, presumably using the same loss function. You don't specify exactly how this second function can vary: whether it has one parameter, a few, or many.
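To check that I'm reading this right, here's a minimal toy sketch of that two-step procedure. Everything in it (the synthetic data, the linear first model, the cubic polynomial for the second function) is my own illustration rather than something the post specifies.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                                  # input variables
y = np.exp(0.5 * X[:, 0] + 0.3 * X[:, 1]) + rng.normal(scale=0.1, size=500)  # outcome

# First model f: simple, few parameters, assumes an additive "linkage".
f = LinearRegression().fit(X, y)
f_pred = f.predict(X)

# Second model g: fit to f's output, again minimising squared error against
# the actual outcomes. The cubic polynomial is an arbitrary choice.
g_coeffs = np.polyfit(f_pred, y, deg=3)
final_pred = np.polyval(g_coeffs, f_pred)                      # the composed model g(f(x))

print("mean squared error of f alone:", np.mean((y - f_pred) ** 2))
print("mean squared error of g ∘ f:  ", np.mean((y - final_pred) ** 2))
```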

Interesting proposal; my comments: 

Let's call the first function (model) f and the second function (model) g.

1. Your approach of first optimizing f and then optimizing g, and then taking g ∘ f as your final model has the obvious alternative of directly optimizing g ∘ f with all parameters of each function optimized together. ("Obvious" only in the sense of being obvious once one already knows about your approach.) Do you expect the two-stage process to overfit enough less to make up for the accuracy lost by not optimizing everything together?

2. Overall this seems like complexity laundering to me: your final model is g ∘ f even with the two-stage process, and is it really simpler than what you would otherwise have used for f?

3. If there are multiple input variables I'm not sure I would conceptualize this as correcting the linkage, since it's correcting the overall output and not specifically the relationship with any one input variable?

Replies from: abstractapplic
comment by abstractapplic · 2022-12-23T12:01:48.451Z · LW(p) · GW(p)

Thanks for putting in the time to make sense of my cryptic and didactic ranting.

You don't specify exactly how this second function can vary: whether it has one parameter, a few, or many.

Segmented linear regression usually does the trick. There's only one input, and I've never seen discontinuities be necessary when applying this method, so only a few segments (<10) are needed.

I didn't specify this because almost any regression algorithm would work and be interpretable, so readers can do whatever is most convenient to them.
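For concreteness, here's roughly what I mean; the knot placement, segment count, and function name are all just illustrative, and any similar piecewise-linear fitter would do.

```python
import numpy as np

def fit_segmented_linear(f_pred, y, n_segments=5):
    """Continuous piecewise-linear fit of the outcome y against f's predictions."""
    # Interior knots at quantiles of f's output: n_segments pieces need
    # n_segments - 1 knots.
    knots = np.quantile(f_pred, np.linspace(0, 1, n_segments + 1)[1:-1])

    def design(u):
        # Intercept, identity, and one hinge term per knot. Hinge terms keep
        # the fit continuous: only the slope changes at each knot.
        cols = [np.ones_like(u), u] + [np.maximum(0.0, u - k) for k in knots]
        return np.column_stack(cols)

    coeffs, *_ = np.linalg.lstsq(design(f_pred), y, rcond=None)
    return lambda u: design(u) @ coeffs   # this is g
```

The resulting g is just a handful of slopes, which keeps it easy to read off and sanity-check.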

Your approach of first optimizing f and then optimizing g, and then taking  g ∘ f as your final model has the obvious alternative of directly optimizing g ∘ f with all parameters of each function optimized together.

What I actually do is optimize f until returns diminish, then optimize f and g together. I suggested "f then g" instead of "f then f&g" because it achieves most of the same benefit and I thought most readers would find it easier to apply.

(I don't optimize f&g together from the outset because doing things that way ends up giving g a misleadingly large impact on predictions.)
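In code, that schedule looks roughly like this; the linear f and quadratic g are stand-ins for whatever models the problem actually calls for, and the optimizer choice is arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

def f(theta, X):                          # stand-in for the first model
    return X @ theta[:-1] + theta[-1]

def g(phi, u):                            # stand-in for the correction
    return phi[0] + phi[1] * u + phi[2] * u ** 2

def loss_f(theta, X, y):
    return np.mean((y - f(theta, X)) ** 2)

def loss_joint(params, X, y, n_theta):
    theta, phi = params[:n_theta], params[n_theta:]
    return np.mean((y - g(phi, f(theta, X))) ** 2)

def fit_f_then_f_and_g(X, y):
    n_theta = X.shape[1] + 1
    # Stage 1: optimize f on its own until the optimizer stalls.
    stage1 = minimize(loss_f, np.zeros(n_theta), args=(X, y))
    # Stage 2: optimize f and g together, starting g at the identity
    # (phi = [0, 1, 0]) so it only departs from "do nothing" where that helps.
    start = np.concatenate([stage1.x, [0.0, 1.0, 0.0]])
    stage2 = minimize(loss_joint, start, args=(X, y, n_theta))
    return stage2.x[:n_theta], stage2.x[n_theta:]
```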

is it really simpler than what you would otherwise have used for f?

Sometimes. Sometimes it isn't. It depends how wrong the linkage is.

If there are multiple input variables I'm not sure I would conceptualize this as correcting the linkage, since it's correcting the overall output and not specifically the relationship with any one input variable?

I would. When the linkage is wrong - like when you use an additive model on a multiplicative problem [LW · GW] - models either systematically mis-estimate their extreme predictions or add unnecessary complexity in the form of interactions between features.

I often work in a regression modelling context where model interpretability is at a premium, and where the optimal linkage is almost but not quite multiplicative: that is, if you fit a simple multiplicative model, you'll be mostly right but your higher predictions will be systematically too low.
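Here's a toy simulation of that pattern. The data and numbers are entirely synthetic, picked only to show the shape of the failure rather than to resemble my actual modelling setup.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(5000, 3))

# Truth: multiplicative in each feature, plus a small extra kick when all
# three features are large at once.
y = np.exp(X.sum(axis=1) + 0.4 * X.prod(axis=1))

# Fit a simple multiplicative model: log(y) as a linear function of the X's.
A = np.column_stack([np.ones(len(y)), X])
coeffs, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
pred = np.exp(A @ coeffs)

# Mostly right overall, but the highest predictions run systematically low.
top = pred >= np.quantile(pred, 0.95)
print("mean relative error, overall:", np.mean((y - pred) / y))
print("mean relative error, top 5% :", np.mean((y[top] - pred[top]) / y[top]))
```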

The conventional way to correct for this is to add lots of complex interactions between features: "when X12 and X34 and X55 all take their Y-maximizing values, increase Y a bit more than you would otherwise have done", repeated for various combinations of Xes. This 'works' but makes the model much less interpretable, and requires more data to do correctly.
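Sketched in the same toy setting (again purely illustrative; PolynomialFeatures is just one convenient way to generate the interaction columns):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(5000, 3))
y = np.exp(X.sum(axis=1) + 0.4 * X.prod(axis=1))    # same synthetic data as above

# Generate every pairwise interaction column and let the (log-scale) linear
# model fit a coefficient for each one.
interactions = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)
X_int = interactions.fit_transform(X)
model = LinearRegression().fit(X_int, np.log(y))
print("feature columns before/after:", X.shape[1], "->", X_int.shape[1])   # 3 -> 6
```

With three features that's only six columns, but the number of pairwise interactions grows quadratically with the feature count, which is where the interpretability and data-hunger costs bite.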