Why we should err in both directions

post by owencb · 2014-08-21T11:10:59.654Z · LW · GW · Legacy · 6 comments

  Some trade-offs
  The principle
    Refinements
  Predictions and track records
  Failures
  Conclusions and applications to prioritisation
    Questions for readers
  Appendix: a sketch proof of the principle

Crossposted from the Global Priorities Project

This is an introduction to the principle that when we are making decisions under uncertainty, we should choose so that we may err in either direction. We justify the principle, explore the relation with Umeshisms, and look at applications in priority-setting.

Some trade-offs

How much should you spend on your bike lock? A cheaper lock saves you money at the cost of security.

How long should you spend weighing up which charity to donate to before choosing one? Longer means less time for doing other useful things, but you’re more likely to make a good choice.

How early should you aim to arrive at the station for your train? Earlier means less chance of missing it, but more time hanging around at the station.

Should you be willing to undertake risky projects, or stick only to safe ones? The safer your threshold, the more confident you can be that you won’t waste resources, but some of the best opportunities may have a degree of risk, and you might be able to achieve a lot more with a weaker constraint.

The principle

We face trade-offs and make judgements all the time, and inevitably we sometimes make bad calls. In some cases we should have known better; sometimes we are just unlucky. As well as trying to make fewer mistakes, we should try to minimise the damage from the mistakes that we do make.

Here’s a rule which can be useful in helping you do this:

When making decisions that lie along a spectrum, you should choose so that you think you have some chance of being off from the best choice in each direction.

We could call this principle erring in both directions. It might seem counterintuitive -- isn’t it worse to not even know what direction you’re wrong in? -- but it’s based on some fairly straightforward economics. I give a non-technical sketch of a proof at the end, but the essence is: if you’re not going to be perfect, you want to be close to perfect, and this is best achieved by putting your actual choice near the middle of your error bar.

So the principle suggests that you should aim to arrive at the station early enough that a bit of time will probably be wasted, but not so early that you could not miss the train even if something went wrong.

Refinements

Just saying that you should have some chance of erring in either direction isn’t enough to tell you what you should actually choose. It can still be a useful warning sign when you’re going substantially wrong, though, and since those are the most important cases to fix, the principle has some use even in this form.

A more careful analysis would tell you that at the best point on the spectrum, a small change in your decision produces about as much expected benefit as expected cost. In ideal circumstances we can use this to work out exactly where on the spectrum we should be (in some cases more than one point may fit this, so you need to compare them directly). In practice it is often hard to estimate the marginal benefits and costs well enough for this to be a useful approach. So although it is theoretically optimal, you will only sometimes want to try to apply this version.

Say in our train example that you found missing the train as bad as 100 minutes waiting at the station. Then you want to leave time so that an extra minute of safety margin reduces the absolute chance of missing the train by 1 percentage point.

For instance, say your options in the train case look like this:

Safety margin (min)	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
Chance of missing train (%)	50	30	15	8	5	3	2	1.5	1.1	0.8	0.6	0.4	0.3	0.2	0.1

Then the optimal safety margin to leave is somewhere between 6 and 7 minutes: this is where the marginal minute reduces the chance of missing the train by 1 percentage point (from 3% to 2%).
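
The table can be checked directly. What follows is a minimal sketch, assuming (as in the example above) that missing the train is as bad as 100 minutes of waiting and that each minute of safety margin costs exactly one minute. The names `MISS_COST_MIN`, `miss_chance_pct` and `expected_cost` are illustrative and not part of the original post.

```python
# Illustrative sketch: expected cost (in minutes) of each safety margin,
# using the table above. Assumes missing the train is as bad as waiting
# 100 minutes, and that each minute of margin costs exactly one minute.
MISS_COST_MIN = 100  # assumed cost of a missed train, in minutes of waiting

# Chance of missing the train (%) for safety margins of 1..15 minutes.
miss_chance_pct = [50, 30, 15, 8, 5, 3, 2, 1.5, 1.1, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1]


def expected_cost(margin_min: int) -> float:
    """Minutes spent waiting plus the expected cost of missing the train."""
    p_miss = miss_chance_pct[margin_min - 1] / 100
    return margin_min + MISS_COST_MIN * p_miss


costs = {m: expected_cost(m) for m in range(1, len(miss_chance_pct) + 1)}
lowest = min(costs.values())
# Margins whose expected cost is within rounding error of the minimum.
best_margins = [m for m, c in costs.items() if abs(c - lowest) < 1e-9]

for m, c in costs.items():
    print(f"{m:2d} min margin: expected cost {c:5.1f} min")
print("Best margins:", best_margins)
```

Under these assumptions, margins of 6 and 7 minutes tie at an expected cost of 9 minutes, which fits the claim that the optimum lies between them. At either choice there is still a 2-3% chance of missing the train, so both kinds of error remain possible.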

Predictions and track records

So far, we've phrased the idea in terms of the predicted outcomes of actions. Another, better-known perspective on the idea looks at events that have already happened. For example:

“If you've never missed a flight, you're spending too much time in airports.”
“If your code never has bugs, you’re being too careful.”

These formulations, dubbed 'Umeshisms', only work for decisions that you make multiple times, so that you can gather a track record.

An advantage of applying the principle to track records is that it’s more obvious when you’re going wrong. Introspection can be hard.

You can even apply the principle to track records of decisions which don’t look like they are choosing from a spectrum. For example it is given as advice in the game of bridge: if you don’t sometimes double the stakes on hands which eventually go against you, you’re not doubling enough. Although doubling or not is a binary choice, erring in both directions still works because ‘how often to double’ is a trait that roughly falls on a spectrum.

Failures

There are some circumstances where the principle may not apply.

First, if you think the correct point is at one extreme of the available spectrum. For instance nobody says ‘if you’re not worried about going to jail, you’re not committing enough armed robberies’, because we think the best number of armed robberies to commit is probably zero.

Second, if the available points in the spectrum are discrete and few in number. Take the example of the bike locks. Perhaps there are only three options available: the Cheap-o lock (£5), the Regular lock (£20), and the Super lock (£50). You might reasonably decide on the Regular lock, thinking that maybe the Super lock is better, but that the Cheap-o one certainly isn’t. When you buy the Regular lock, you’re pretty sure you’re not buying a lock that’s too tough. But since only two of the locks are good candidates, there is no decision you could make which tries to err in both directions.

Third, in the case of evaluating track records, it may be that your record isn’t long enough to expect to have seen errors in both directions, even if they should both come up eventually. If you haven’t flown that many times, you could well be spending the right amount of time -- or even too little -- in airports, even if you’ve never missed a flight.
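
To get a feel for how long a track record needs to be, here is a rough sketch. It assumes every trip is independent with the same chance of missing the flight, which is a simplification, and the 2% figure is borrowed from the train table purely for illustration. The function name `chance_of_clean_record` is hypothetical.

```python
# Illustrative sketch: how likely is a spotless record if you are already
# taking a sensible level of risk? Assumes independent trips with a fixed
# per-trip chance of missing (a simplification, not a claim from the post).

def chance_of_clean_record(p_miss: float, trips: int) -> float:
    """Probability of never missing across `trips` independent trips."""
    return (1 - p_miss) ** trips


P_MISS = 0.02  # assumed per-trip chance of missing; illustrative only

for trips in (10, 50, 200):
    p_clean = chance_of_clean_record(P_MISS, trips)
    print(f"{trips:3d} trips: {p_clean:.0%} chance of never having missed")
```

With these assumed numbers, a clean record after 10 trips is unremarkable (roughly 82%), whereas a clean record after 200 trips would be surprising (under 2%). Only in the second case does "never missed" start to count as evidence of too much caution.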

Finally, a warning about a case where the principle is not supposed to apply. It shouldn’t be applied directly to try to equalise the probability of being wrong in either direction, without taking any account of the magnitude of the loss. So for example if someone says you should err on the side of caution by getting an early train to your job interview, it might look as though that were in conflict with the idea of erring in both directions. But normally what’s meant is that you should have a higher probability of failing in one direction (wasting time by taking an earlier train than needed), because the consequences of failing in the other direction (missing the interview) are much worse.

Conclusions and applications to prioritisation

Seeking to err in both directions can provide a useful tool in helping to form better judgements in uncertain situations. Many people may already have internalised key points, but it can be useful to have a label to facilitate discussion. Additionally, having a clear principle can help you to apply it in cases where you might not have noticed it was relevant.

How might this principle apply to priority-setting? It suggests that:

You should spend enough time and resources on the prioritisation itself that you think some of the time may have been wasted (for example you should spend a while at the end without changing your mind much), but not so much that you are totally confident you have the right answer.
If you are unsure what discount rate to use, you should choose one so that you think that it could be either too high or too low.
If you don’t know how strongly to weigh fragile cost-effectiveness estimates against more robust evidence, you should choose a level so that you might be over- or under-weighing them.
When you are providing a best-guess estimate, you should choose a figure which could plausibly be wrong either way.

And one on track records:

Suppose you’ve made lots of grants. Then if you’ve never backed a project which has failed, you’re probably too risk-averse in your grantmaking.

Questions for readers

Do you know any other useful applications of this idea? Do you know anywhere where it seems to break? Can anyone work out easier-to-apply versions, and the circumstances in which they are valid?

Appendix: a sketch proof of the principle

Assume the true graph of value (on the vertical axis) against the decision you make (on the horizontal axis, representing the spectrum) is smooth, looking something like this:

[Figure: a smooth, single-peaked curve of value against decision, with its highest point at d]

The highest value is achieved at d, so this is where you’d like to be. But assume you don’t know quite where d is. Say your best guess is that d=g. But you think it’s quite possible that d>g, and quite unlikely that d<g. Should you choose g?

Suppose we compare g to g’, which is just a little bit bigger than g. If d>g, then switching from g to g’ would be moving up the slope on the left of the diagram, which is an improvement. If d=g then it would be better to stick with g, but it doesn’t make so much difference because the curve is fairly flat at the top. And if g were bigger than d, we’d be moving down the slope on the right of the diagram, which is worse for g’ -- but this scenario was deemed unlikely.

Aggregating the three possibilities, we found that two of them were better for sticking with g, but in one of these (d=g) it didn’t matter very much, and the other (d<g) just wasn’t very likely. In contrast, the third case (d>g) was reasonably likely, and noticeably better for g’ than g. So overall we should prefer g’ to g.

In fact we’d want to continue moving until the marginal upside from going slightly higher was equal to the marginal downside; this would have to involve a non-trivial chance that we are going too high. So our choice should have a chance of failure in either direction. This completes the (sketch) proof.
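
The argument can be illustrated numerically. This is only a sketch under assumptions that are not in the original post: the value of choosing x when the true optimum is d is taken to be -(x - d)**2 (a smooth curve with a single peak at d), and d is assumed to be g - 1, g, or g + 1 with probabilities 0.1, 0.5 and 0.4, with g set to 0.

```python
# Illustrative sketch of the appendix argument. Assumptions (not from the
# post): the value of choosing x when the true optimum is d is -(x - d)**2,
# and d is g - 1, g, or g + 1 with the probabilities below, where g = 0.
G = 0.0  # best guess for d (the single most likely value)

# Beliefs about d: most likely g, quite possibly above, unlikely below.
beliefs = {G - 1.0: 0.1, G: 0.5, G + 1.0: 0.4}


def expected_value(x: float) -> float:
    """Expected value of choosing x, averaged over beliefs about d."""
    return sum(p * -((x - d) ** 2) for d, p in beliefs.items())


# Search a fine grid of candidate choices between g - 1 and g + 1.
candidates = [G - 1.0 + i / 100 for i in range(201)]
best_x = max(candidates, key=expected_value)

# Chance that the chosen point turns out too low or too high.
p_too_low = sum(p for d, p in beliefs.items() if d > best_x)
p_too_high = sum(p for d, p in beliefs.items() if d < best_x)

print(f"best choice: {best_x:.2f} (best guess was {G:.2f})")
print(f"chance too low: {p_too_low:.1f}, chance too high: {p_too_high:.1f}")
```

Under these assumptions the best choice is about g + 0.3 rather than g itself, and at that point there is a real chance of having gone too high (0.6) as well as too low (0.4). A different value curve or different beliefs would move the numbers, but the choice would still sit strictly between the plausible values of d.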

Note: There was an assumption of smoothness in this argument. I suspect it may be possible to get slightly stronger conclusions or work from slightly weaker assumptions, but I’m not certain what the most general form of this argument is. It is often easier to build a careful argument in specific cases.

Acknowledgements: thanks to Ryan Carey, Max Dalton, and Toby Ord for useful comments and suggestions.

6 comments

Comments sorted by top scores.

comment by Alejandro1 · 2014-08-21T15:17:56.521Z · LW(p) · GW(p)

The example of the three locks brings to mind another possible failure of this principle: that it can be exploited by deliberately giving us additional choices. For example, perhaps in this example the cheap lock is perfectly adequate for our needs, but seeing the existence of an expensive lock makes us believe that the regular one is the one that has equal chance of erring in both directions. I believe I read (in LW? or in Marginal Revolution?) that restaurant menus and sales catalogs often include some outrageously priced items to induce customers to buy the second-tier priced items, which look reasonable in comparison, but are the ones where most profit is made. Attempts to shift the Overton Window in politics rely on the same principle.

↑ comment by owencb · 2014-08-22T10:30:18.642Z · LW(p) · GW(p)

Good example. It highlights that although erring on both sides should be a necessary condition for optimality when there's a full spectrum, it certainly isn't sufficient (and so as a fast rule of thumb it can be misled).

comment by Metus · 2014-08-21T14:05:34.454Z · LW(p) · GW(p)

I recall a similar discussion on LW.

But yes, try to find some mapping of physical reality to utility and optimise the latter. E.g. with time spent commuting and not being late, assign a dollar value both to being late (it can be either a discrete or a continuous penalty, depending on your work) and to wasting additional time. As with most advice, the mapping can be highly individual.

comment by Agathodaimon · 2014-08-29T05:14:21.639Z · LW(p) · GW(p)

This sounds like the principle of entropy maximization. I recommend reading Wissner-Gross's "Causal Entropic Forces".

comment by Stuart_Armstrong · 2014-08-21T15:10:01.354Z · LW(p) · GW(p)

There was an assumption of smoothness in this argument.

And a monomodal assumption as well.

But many real-world distributions are approximately like that, so it's good.

↑ comment by owencb · 2014-08-22T10:27:42.930Z · LW(p) · GW(p)

I think the same argument works if there could be multiple peaks (even if my picture doesn't cover that case) -- you just need the local properties around the optimum to run things. But in that case you can't assume a local optimum is a global optimum, so it's harder to apply.

As you say, in many cases we don't need to worry about these complications, so I haven't spent too much time on that.
