The One Mistake Rule
post by Zvi
Epistemic Status: The Bed of Procrustes
If a model gives a definitely wrong answer anywhere, it is useless everywhere.
This principle is doubtless ancient, and has doubtless gone by many names with many different formulations.
All models are wrong. That does not make them useless. What makes them useless is when they are giving answers that you know are definitely wrong. You need to fix that, if only by having the model more often spit out “I don’t know.”
To take an example of saying “I don’t know” from the comments: if you want to use Newtonian physics, you need to be aware that it will give wrong answers in relativistic situations, and therefore slightly wrong answers in other places, and introduce the relevant error bars.
Of course, a wrong prediction of what is probably going to happen is not definitely wrong, in this sense. An obviously wrong probability is definitely wrong no matter the outcome.
The origin of this particular version of this principle was when a partner and I were, as part of an ongoing campaign of wagering, attempting to model the outcomes of sporting events.
He is the expert on sports. I am the expert on creating models and banging on databases and spreadsheets. My specialty was assuming the most liquid sports betting market odds were mostly accurate, and extrapolating what that implied elsewhere.
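To make "starting from the market" concrete, here is a minimal sketch of one common first step. It converts decimal odds into implied probabilities, then removes the bookmaker's margin by proportional normalization. The function name and the odds are hypothetical. Proportional normalization is only one of several ways to remove the margin, and this is not presented as the method the author actually used.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds on mutually exclusive outcomes into probabilities.

    Raw implied probabilities (1 / odds) sum to more than 1 because of the
    bookmaker's margin (the "overround"). Dividing each by that sum rescales
    them to sum to 1 (proportional de-vigging, one method among several).
    """
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must be greater than 1")
    raw = [1.0 / o for o in decimal_odds]
    overround = sum(raw)
    return [p / overround for p in raw]


# Hypothetical two-way market with both sides priced at 1.91.
probs = implied_probabilities([1.91, 1.91])
print([round(p, 3) for p in probs])  # [0.5, 0.5]
```

Everything downstream in such a model is an extrapolation from these numbers, so an error here propagates everywhere.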
First we would talk and he would explain how things worked. Then I would look at the data lots of different ways and create a spreadsheet that modeled things. Then, he would vary the inputs to that spreadsheet until he got it to give him a wrong answer, or at least one that seemed wrong to him.
Then he’d point out the wrong answer and explain why it was definitely wrong. I could either argue that the answer was right and change his mind, or I could accept that it was wrong and go back and fix the model. Then the cycle repeated until he couldn’t find a wrong answer.
Until this cycle stopped, we did not use the new model for anything at all, anywhere, no matter what. If a new wrong answer was found, we stopped using the model in question until we resolved the problem.
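The build, attack, and fix loop above can be sketched in code as a gate: the model is usable only when every known sanity check passes, and a single failure blocks it everywhere. In the post, the "checks" were a human expert probing a spreadsheet, not code. The check names, both toy models, and the rating-difference input are hypothetical. The logistic revision is simply one plausible fix for this particular failure.

```python
import math
from typing import Callable, Dict, List

# A hypothetical "model" maps a rating difference between two teams to a win probability.
Model = Callable[[float], float]
Check = Callable[[Model], bool]


def failing_checks(model: Model, checks: Dict[str, Check]) -> List[str]:
    """Return the names of every sanity check the model fails."""
    return [name for name, check in checks.items() if not check(model)]


def usable(model: Model, checks: Dict[str, Check]) -> bool:
    """One Mistake Rule: a single known wrong answer blocks the model everywhere."""
    return not failing_checks(model, checks)


# Hypothetical checks, each standing in for a wrong answer an expert once found.
CHECKS: Dict[str, Check] = {
    "even matchup is a coin flip": lambda m: abs(m(0.0) - 0.5) < 1e-9,
    "probabilities stay in [0, 1]": lambda m: all(0.0 <= m(d) <= 1.0 for d in range(-50, 51)),
    "stronger team is never less likely to win": lambda m: all(
        m(d) <= m(d + 1) for d in range(-50, 50)
    ),
}


def linear_model(d: float) -> float:
    # First draft: reasonable for close games, nonsense for lopsided ones.
    return 0.5 + 0.02 * d


def logistic_model(d: float) -> float:
    # Revised draft: a logistic curve keeps every output strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-0.1 * d))


print(failing_checks(linear_model, CHECKS))  # ['probabilities stay in [0, 1]']
print(usable(logistic_model, CHECKS))        # True
```

Each wrong answer the expert finds becomes a new entry in `CHECKS`, so later revisions cannot quietly reintroduce it.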
Two big reasons:
If we did use the model, even if it was only wrong in this one place, then the one place it was wrong would be the one place we would disagree with the market. Fools and their money would be soon parted.
Also, if the model was obviously wrong here, there’s no reason to trust anything else the model says, either. Fix your model.
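The first reason is a form of adverse selection. Here is a toy illustration with hypothetical probabilities, threshold, and function name: the model matches the market on every game except one, where a bug inflates its estimate. As a result, the only bet it recommends is on that one game.

```python
def recommended_bets(model_probs, market_probs, min_edge=0.05):
    """Indices where the model thinks an outcome is underpriced by at least min_edge."""
    return [
        i
        for i, (p_model, p_market) in enumerate(zip(model_probs, market_probs))
        if p_model - p_market >= min_edge
    ]


market = [0.62, 0.48, 0.30, 0.71]
# Hypothetical model: agrees with the market except at index 2, where a bug inflates it.
model = [0.61, 0.49, 0.55, 0.70]

print(recommended_bets(model, market))  # [2]
```

Every dollar the model wants to wager lands on the one game where it is wrong.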
This included tail risk style events that were extremely unlikely. If you can’t predict the probability of such events in a reasonable way, even if those outliers won’t somehow bankrupt you directly, you’re going to get the overall distributions wrong.
This also includes the change in predictions between different states of the world. If your model predictably doesn’t agree with itself over time, or changes its answer based on things that can’t plausibly matter much, then it’s wrong. Period. Fix it.
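One way to make "doesn't agree with itself over time" testable uses the law of total probability. Today's forecast should equal the probability-weighted average of the forecasts the model would make in each possible next state. The game states, probabilities, and function names below are hypothetical, and the 0.01 tolerance is an arbitrary illustrative choice.

```python
def expected_next_forecast(transitions):
    """Average of next-state forecasts, weighted by how likely each state is.

    transitions: list of (probability of reaching the state, forecast in that state).
    """
    total = sum(p for p, _ in transitions)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("next-state probabilities must sum to 1")
    return sum(p * forecast for p, forecast in transitions)


def is_time_consistent(current_forecast, transitions, tol=0.01):
    """True if today's forecast matches the model's own expected future forecast."""
    return abs(current_forecast - expected_next_forecast(transitions)) <= tol


# Hypothetical: 60% pregame for the home team. After the first quarter the home
# team is either leading or trailing, each with probability 0.5.
inconsistent = [(0.5, 0.80), (0.5, 0.30)]  # averages to 0.55, not 0.60
consistent = [(0.5, 0.85), (0.5, 0.35)]    # averages to 0.60

print(is_time_consistent(0.60, inconsistent))  # False
print(is_time_consistent(0.60, consistent))    # True
```

A model that fails this check is, in effect, predicting its own revision.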
You should be deeply embarrassed if your model outputs an obviously wrong or obviously time-inconsistent answer even in a hypothetical situation. You should be even more embarrassed if it gives such an answer to the actual situation.
The cycle isn’t bad. It’s good. It’s an excellent way to improve your model: Build one, show it to someone, they point out a mistake, you figure out how it happened and fix it, repeat. And in the meantime, you can still use the model’s answers to help supplement your intuitions, as a sanity check or very rough approximation, or as a jumping off point. But until the cycle is over, don’t pretend you have anything more than that.
comment by Donald Hobson (donald-hobson) ·
2020-04-10T16:10:51.739Z
You should be deeply embarrassed if your model outputs an obviously wrong or obviously time-inconsistent answer even in a hypothetical situation.
Suppose you have a particle accelerator that goes up to half the speed of light. You notice an effect whereby faster particles become harder to accelerate.
You curve fit this effect and get two candidate formulas. Both fit the data, though the first one fits slightly better. However, when you test the first formula on the case of a particle travelling at twice the speed of light, you get back nonsensical imaginary numbers. Clearly the real formula must be the second one. (The real formula is actually the first one.)
A good model will often give a nonsensical answer when asked a nonsensical question, and nonsensical questions don't always look nonsensical.
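Here is a quick way to see the failure Donald describes. Assume the first (correct) formula is built on the standard relativistic Lorentz factor, gamma = 1/sqrt(1 - v^2/c^2). Evaluated with complex arithmetic, it gives an ordinary real value at half the speed of light and a purely imaginary value at twice the speed of light. The function name is hypothetical, and units are chosen so that c = 1.

```python
import cmath

C = 1.0  # work in units where the speed of light is 1


def lorentz_factor(v):
    """gamma = 1 / sqrt(1 - v^2 / c^2), computed with complex arithmetic."""
    return 1 / cmath.sqrt(1 - (v / C) ** 2)


print(lorentz_factor(0.5))  # real, about 1.1547
print(lorentz_factor(2.0))  # purely imaginary, imaginary part about -0.577
```

Nothing is wrong with the formula here. The nonsensical thing is the question.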
↑ comment by Dagon ·
2020-04-10T17:29:03.568Z
This is worth being explicit about. I took the advice as applying to hypothetical situations consistent with the intended usage of the model. A sports prediction model probably doesn't need to work for low-gravity situations, most physics models don't need to include FTL particles.
It would be nice to say more formally that you can fix it by improving the model OR by specifying which subset of the imagination space the model is intended for.
edit based on replies (thank you for making me think further): A LOT hinges on the meta-model and the theories behind your belief that the model is useful in the first place. It makes sense to be VERY skeptical of correlations found in data with no explanation of what it means. For these, I agree that any failure is grounds to reject. For models that start with a hypothesis, it makes a lot of sense to use the theory to identify reasonable exclusions, where you don't throw out the model for weird results, because you don't expect it to apply.
↑ comment by Donald Hobson (donald-hobson) ·
2020-04-10T19:38:03.917Z
I was imagining doing this before the speed of light limit was known. In that case, you can find yourself saying that the subset where the model produces sensible results is the subset of imagination space the model is intended for.
↑ comment by FeepingCreature ·
2020-04-11T10:57:55.370Z
But to be fair, if you then fixed the model to output errors once you exceeded the speed of light, as the post recommends, you would have come up with a model that actually communicated a deep truth. There's no reason a model has to be continuous, after all.
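Below is a sketch of the fix FeepingCreature suggests. It also implements Dagon's alternative of stating which subset of inputs the model is meant for. Outside |v| < c, the function refuses to answer instead of returning a number. The function name and the use of `ValueError` are illustrative choices, not anything from the thread. Units are again chosen so that c = 1.

```python
import math

C = 1.0  # units where the speed of light is 1


def lorentz_factor_checked(v):
    """Return gamma for |v| < c; refuse to answer anywhere else."""
    # `not abs(v) < C` also rejects NaN, which a plain `abs(v) >= C` would let through.
    if not abs(v) < C:
        raise ValueError(f"speed {v} is outside the model's domain (|v| < c)")
    return 1 / math.sqrt(1 - (v / C) ** 2)


print(round(lorentz_factor_checked(0.5), 4))  # 1.1547
try:
    lorentz_factor_checked(2.0)
except ValueError as err:
    print("I don't know:", err)
```

The jump from a finite answer to an error at v = c is a deliberate discontinuity.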
comment by orthonormal ·
2020-04-10T19:05:10.811Z
I think the need for this is relatively higher in an anti-inductive environment like trying to beat the market than in an inductive environment like materials science or disease modeling. In the former, you don't dare deploy until you're confident it's missing nothing; in the latter, what you have mid-iteration is still very useful, especially if you're making big time-sensitive decisions.
↑ comment by Unnamed ·
2020-04-10T21:20:12.055Z
This seems important.
Another feature of competitive markets is that "not betting" is always available as a safe default option. Maybe that means waiting to bet until some unknown future date when your models are good enough; maybe it means never betting in that market. In many other contexts (like responding to covid-19) there is no safe default option.
↑ comment by Dagon ·
2020-04-10T22:26:55.069Z
"Not betting" is an illusion. In all cases, choosing not to take any given bet is itself a bet: you're betting that you'll find something better to do with that chunk of resources.
↑ comment by orthonormal ·
2020-04-10T23:55:51.792Z
Not betting, in the sense of keeping the money in index funds or somewhere else on the risk/reward Pareto frontier of easy strategies, at least limits your expected downside compared to entering shark-infested waters in an imperfect cage.
comment by Unnamed ·
2020-04-14T09:08:38.033Z
One obviously mistaken model that I got a lot of use out of during a stretch of Feb-Mar is the one where the cumulative number of coronavirus infections in a region doubles every n days (for some globally fixed, unknown value of n).
This model has ridiculous implications if you extend it forward for a few months, as well as various other flaws. I was aware of those ridiculous implications and some of those other flaws, and used it anyways for several days before trying to find less flawed models.
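The doubling model fits in one line, N(t) = N0 * 2^(t/n), which makes its long-range implications easy to check. The starting count, doubling time, and function name below are hypothetical round numbers chosen for illustration, not Unnamed's actual inputs. The population figure is an approximation of the 2020 world population.

```python
def doubling_model(n0, doubling_days, t_days):
    """Cumulative infections t_days from now if the count doubles every doubling_days."""
    return n0 * 2 ** (t_days / doubling_days)


WORLD_POPULATION = 7.8e9  # approximate world population in 2020

N0 = 1_000        # hypothetical cumulative infections today
DOUBLING = 3.0    # hypothetical doubling time, in days

for t in (7, 30, 90):
    cases = doubling_model(N0, DOUBLING, t)
    note = "  <- more people than exist" if cases > WORLD_POPULATION else ""
    print(f"day {t:>2}: {cases:,.0f}{note}")
```

With these made-up inputs, the day-90 figure is about 1.07 trillion, more than 100 times the world population. The near-term figures come from the same line of code, which is the tension with the One Mistake Rule that the comment raises.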
I'm glad that I did, since it helped me have a better grasp of the situation and be more prepared for what was coming. And I don't think it would've made much difference at the time if I'd learned more about SEIR models and so on.
It's unclear how examples like this are supposed to fit with the One Mistake Rule or the exceptions in the last paragraph.
comment by Chris_Leong ·
2020-04-10T21:34:11.637Z
I think you're making this argument a bit too strongly. Now, I've written a number of posts arguing that most people are too dismissive of flaws in models that only occur in hypothetical or unrealistic situations, but I don't think perfection is realistic. It seems that a model with no flaws would have to approach infinite complexity in most cases. The only reason this rule might work is that eventually your model will become complex enough that you can't find the mistake. Additionally, you will be limited by the data you have. It's no good knowing that prediction X is wrong because you ignore factor F if you don't have data related to factor F.
comment by Decius ·
2020-04-12T11:51:33.319Z
What are good methods of determining when the model is giving a definitely wrong answer, and when your not-a-model is grossly overconfident and wrong?
comment by jmh ·
2020-04-11T14:45:46.807Z
Thinking about Donald's and Dagon's comments, I do think there is something that can be more fully acknowledged (though it was also reasonably assumed, I think).
I walked away from the OP, but those two comments further clarified my own thinking: the one-strike rule is a good way to refine our maps when we know the territory we are mapping out.
However, that is not always the case, so when making a call on a strike we should make sure that we're actually in the territory (at home plate) and have not wandered somewhere else. I think that was supposed to be addressed as part of the discussion about whether it was a mistake or not.
I think this makes for a useful heuristic when modeling, but clearly other heuristics should also be applied to ensure you stay in the intended territory. So is there a good one for helping to keep one on the right sports field? Is that needed? If not, why not, and if so, when?