Optimizing Multiple Imperfect Filters

post by johnswentworth · 2021-09-15T22:57:16.961Z · LW · GW · 4 comments

Contents

  Sales Funnel
  Why Is This Interesting?
4 comments

Imagine we’re in the quality control department of a widget-production company. Our widgets have quite high standards - they go through 15 independent tests, and if a widget fails any test, it’s thrown out. Of course, the tests aren’t perfect: there are false positives and false negatives, and we can adjust the cutoffs for each test to trade off between false positives and false negatives. Make the cutoff very high, and every widget fails the test - no false negatives, but a very high false positive rate. Make the cutoff very low, and we have the opposite problem - every widget passes the test, so we have no false positives but tons of false negatives. Our assignment is to optimize the test pipeline - specifically, to choose the cutoff for each test to yield the lowest overall false negative rate for a given false positive rate, across the whole pipeline.

It turns out there’s a simple principle for this: the marginal trade off should be the same for each test.

Here’s what that means. For each test, we compute the change in both false positives and false negatives from adjusting the cutoff by some small amount. (In other words, we’re estimating the derivative of false positive rate and false negative rate with respect to the cutoff parameter.) Then, take the ratio of those two changes. That’s the marginal trade off. If that ratio is different between any two tests, then we can achieve a Pareto improvement (i.e. improve one error rate without worsening the other, or improve both) by raising the cutoff on one test and lowering it on the other.
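A minimal sketch of that estimate, assuming we have a sample of widgets whose true quality is known. The function names and the sign convention are assumptions made for illustration, not something the post specifies: here a widget passes when its score is above the cutoff, a "false positive" is a defective widget that passes, and a "false negative" is a good widget that fails.

```python
def error_rates(samples, cutoff):
    """Return (false_positive_rate, false_negative_rate) at a cutoff.

    samples: list of (score, is_good) pairs for widgets of known quality.
    Assumed convention: a widget passes when its score is above the cutoff.
    """
    bad = [score for score, is_good in samples if not is_good]
    good = [score for score, is_good in samples if is_good]
    fp = sum(score > cutoff for score in bad) / len(bad)      # defective widgets that pass
    fn = sum(score <= cutoff for score in good) / len(good)   # good widgets that fail
    return fp, fn


def marginal_trade_off(samples, cutoff, step=1.0):
    """Finite-difference estimate of the trade off ratio around a cutoff.

    Returns false positives removed per false negative added when the
    cutoff is raised by `step`, or None if no false negatives are added.
    """
    fp_low, fn_low = error_rates(samples, cutoff)
    fp_high, fn_high = error_rates(samples, cutoff + step)
    fp_removed = fp_low - fp_high
    fn_added = fn_high - fn_low
    if fn_added == 0:
        return None  # ratio is undefined for this step size
    return fp_removed / fn_added
```

With real data, the step should be small enough that the change is roughly linear, but large enough that a reasonable number of widgets fall between the two cutoffs.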

Let’s walk through an example to see how that works.

Let’s imagine that Test 1 involves a numerical score, and the score has to be above 36 for the widget to pass. With that cutoff, we find a 6% false positive rate, and a 15% false negative rate. If we increase the cutoff one unit to 37, then the false positive rate goes down to 5.5%, and the false negative rate goes up to 17%. We’ll assume that the change is approximately linear as long as we only change the cutoff by a small amount (let’s say 1 unit or less), so decreasing the cutoff by one unit results in the same size changes but in the opposite direction, increasing the cutoff by one-half unit results in changes half as large, etc.

With these numbers, each one-unit increase in the cutoff decreases false positive rate by 6% - 5.5% = 0.5% and increases false negative rate by 17% - 15% = 2%. So, they trade off at a ratio of 0.5:2 = 0.25.

 

|        | Δ False Positive | Δ False Negative | Ratio |
|--------|------------------|------------------|-------|
| Test 1 | 0.5%             | 2%               | 0.25  |

Now, let’s add a second test with a different trade off ratio, and see how that allows a Pareto improvement.

Let’s say our second test has a 3% false positive rate and a 6% false negative rate, and we can raise its cutoff a little bit to achieve 2% false positive and 6.5% false negative. Again, we’ll assume linearity for small changes.

 

|        | Δ False Positive | Δ False Negative | Ratio |
|--------|------------------|------------------|-------|
| Test 1 | 0.5%             | 2%               | 0.25  |
| Test 2 | 1%               | 0.5%             | 2     |
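The ratios for both tests can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the example; the function name `trade_off_ratio` is made up for illustration, and all rates are in percentage points.

```python
def trade_off_ratio(fp_before, fp_after, fn_before, fn_after):
    """False positives removed per false negative added by raising the cutoff.

    All arguments are error rates in percentage points.
    """
    fp_removed = fp_before - fp_after
    fn_added = fn_after - fn_before
    return fp_removed / fn_added


# Numbers taken from the worked example.
test_1 = trade_off_ratio(fp_before=6.0, fp_after=5.5, fn_before=15.0, fn_after=17.0)
test_2 = trade_off_ratio(fp_before=3.0, fp_after=2.0, fn_before=6.0, fn_after=6.5)

print(test_1)  # 0.25
print(test_2)  # 2.0
```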

How can we achieve a Pareto gain in this example? Well, our ratios say that we can reduce our false positive rate by 1% at a cost of 0.5% more false negatives by increasing the cutoff a little on test 2. So, how about we adjust test 1 to cancel out those extra false negatives? To reduce false negatives on test 1 by 0.5%, we lower its cutoff by a quarter unit, which increases its false positives by 0.25 * 0.5% = 0.125%. So, with both of these changes, our false positive rate decreases 1% - 0.125% = 0.875% overall, while the false negative rate stays the same overall.
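Here is the same calculation as a small sketch. It assumes, as the example does, that changes are linear for small adjustments and that the per-test changes simply add up across the pipeline. The function name and parameter names are hypothetical.

```python
def net_false_positive_change(ratio_low, ratio_high, fn_shift):
    """Net change in false positives when two tests trade `fn_shift` false negatives.

    ratio_low:  trade off ratio of the test that takes on fewer false negatives
    ratio_high: trade off ratio of the test that accepts more false negatives
    fn_shift:   false negatives moved between the tests, in percentage points

    The net false negative change is zero by construction.
    """
    fp_removed = ratio_high * fn_shift  # gained by tightening the high-ratio test
    fp_added = ratio_low * fn_shift     # paid by loosening the low-ratio test
    return fp_added - fp_removed


# Test 1 has ratio 0.25, test 2 has ratio 2, and they trade 0.5 points of false negatives.
print(net_false_positive_change(ratio_low=0.25, ratio_high=2.0, fn_shift=0.5))  # -0.875
```

A negative result means fewer false positives overall. When the two ratios are equal, the result is zero and no gain is available from this trade.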

Conceptually, our two tests are “trading”: test 1 has a lower opportunity cost for reducing false negatives, and test 2 has a lower opportunity cost for reducing false positives. So, test 1 reduces false negatives a bit, and “trades” some of those reduced false negatives to test 2 in exchange for reduced false positives - which cancel out the extra false positives test 1 created. Test 2 reduces false positives a bit, and “trades” some of those reduced false positives to test 1 in exchange for reduced false negatives - which cancel out the extra false negatives test 2 created. And since the two have different ratios, this “trade” can produce a net Pareto improvement. It’s the Principle of Comparative Advantage: each test specializes a little more in the error-type for which it has lower opportunity cost.

In general, if we have more tests, we can do this sort of “trade” between any two tests with different ratios. We reach Pareto optimality when each test has the same marginal trade off ratio as the other tests.
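One way to look for such a trade among many tests is sketched below. The helper `find_trade` is hypothetical; it takes each test's ratio of false positives removed per false negative added, and picks the two tests whose ratios differ the most.

```python
def find_trade(ratios, tolerance=1e-9):
    """Pick the pair of tests with the most different trade off ratios.

    ratios maps a test name to its ratio of false positives removed per
    false negative added. Returns (reduce_fn_test, reduce_fp_test): the
    first test should take on fewer false negatives, the second fewer
    false positives. Returns None when all ratios match within tolerance.
    """
    lowest = min(ratios, key=ratios.get)
    highest = max(ratios, key=ratios.get)
    if ratios[highest] - ratios[lowest] <= tolerance:
        return None  # no trade available: ratios are already equal
    return lowest, highest


print(find_trade({"Test 1": 0.25, "Test 2": 2.0}))  # ('Test 1', 'Test 2')
```

Since the ratios are only locally valid, a practical loop would make a small adjustment, re-measure the ratios, and repeat until `find_trade` returns None.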

Sales Funnel

Here’s an application of the same idea, to a problem which looks different at first glance.

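Let’s imagine a prototypical sales funnel for a car dealership or mortgage company or the like. Customers come in from online ads, fill out some basic info on a website, get pre-screened on a phone call, then spend a bunch of time talking to a salesperson. At each step, we have two conflicting goals: filter out “duds” (customers who will not end up buying), and avoid “missed opportunities” (customers who get filtered out even though they would have bought).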

At each step, we can tune how much we optimize for each goal. Our ads could be very narrowly targeted to only the most promising customers, or we could cast a wider net to bring in more people. The website could ask for more or less information, or we could adjust our cutoff for moving a customer through to the pre-screen call. Same with the pre-screen: we could ask for more or less information, or adjust our cutoff for moving a customer on to a salesperson.

As we go down the funnel, number of customers decreases but quality (i.e. probability of buying) goes up.

How do we optimize this?

Just like before, we can compute the marginal trade offs at each stage. For instance, we might tighten our ad targeting a little, and find that the change brings in 5% fewer people, but only 2% fewer actual buyers. So, that’s a 5 - 2 = 3 percentage point decrease in duds in exchange for a 2 percentage point increase in missed opportunities - a ratio of 3:2.

Meanwhile, we might remove one step on the website to allow through 10% more people and increase the number of buyers by 5%. That’s a 10 - 5 = 5 percentage point increase in duds, in exchange for a 5 percentage point decrease in missed opportunities - a ratio of 1:1.

 

|         | Δ Duds | Δ Missed Opportunities | Ratio |
|---------|--------|------------------------|-------|
| Ads     | 3%     | 2%                     | 1.5   |
| Website | 5%     | 5%                     | 1     |
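The table entries follow from the measured changes in people and buyers. A rough sketch of that bookkeeping, with a hypothetical helper `stage_effect` and all quantities in percentage points:

```python
def stage_effect(people_change, buyers_change):
    """Convert changes in people and buyers into (duds_change, missed_change).

    A dud is someone let through who does not buy, so the change in duds is
    the change in people minus the change in buyers. Every buyer lost is a
    missed opportunity, so the sign flips for missed opportunities.
    """
    duds_change = people_change - buyers_change
    missed_change = -buyers_change
    return duds_change, missed_change


# Measurements from the example.
ads = stage_effect(people_change=-5.0, buyers_change=-2.0)     # (-3.0, 2.0)
website = stage_effect(people_change=10.0, buyers_change=5.0)  # (5.0, -5.0)

for name, (duds, missed) in {"Ads": ads, "Website": website}.items():
    print(name, abs(duds) / abs(missed))  # Ads 1.5, then Website 1.0
```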

One possible Pareto gain: tighten the ad targeting exactly as above, then remove the extra step on the website for only 50% of users. The ad change will decrease duds by 3 percentage points, while the website change will increase duds by 2.5 percentage points, for a 0.5 percentage point drop in duds overall. Meanwhile, the ad change will increase missed opportunities by 2 percentage points, and the website change will decrease missed opportunities by 2.5 percentage points, so overall missed opportunities will fall by 0.5 percentage points. The total number of people reaching a salesperson will remain the same, but 0.5% more will actually buy.
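To check the combined effect of this kind of mix, one can scale each stage's change by the fraction of users it applies to and add the results. This is a sketch under the same linearity assumption as the example; `combined_effect` is a made-up name.

```python
def combined_effect(stage_changes):
    """Sum (duds, missed) changes, each scaled by the fraction of users affected.

    stage_changes: list of (fraction, (duds_change, missed_change)) pairs,
    with changes in percentage points.
    """
    net_duds = sum(fraction * duds for fraction, (duds, _) in stage_changes)
    net_missed = sum(fraction * missed for fraction, (_, missed) in stage_changes)
    return net_duds, net_missed


changes = [
    (1.0, (-3.0, 2.0)),  # tighten ad targeting for everyone
    (0.5, (5.0, -5.0)),  # remove the website step for half of users
]
print(combined_effect(changes))  # (-0.5, -0.5)
```

Both numbers are negative, so the mix reduces duds and missed opportunities at the same time, which is what makes it a Pareto gain.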

In general: for each stage, we test how many more duds we need to let through in order to get some amount of additional buyers (i.e. reduction in missed opportunities). From that, we calculate the marginal trade off ratio between duds and missed opportunities.

If those ratios are different across stages, then we can make a “trade” - filter a little harder where it’s “cheap” (i.e. loses fewer buyers), and filter a little less where it’s “expensive” (i.e. loses more buyers). Once the funnel is fully optimized, we should find that every stage either has the same marginal trade off ratio as the other stages, or cannot be adjusted any further in the direction that would produce a gain.

As long as all stages can be adjusted in either direction, and the trade off ratios are different at different stages, we can achieve Pareto gains from “trades” between stages.

Why Is This Interesting?

Other than potential usefulness to businesses optimizing their sales funnels, this is a great example of a cool insight from the framing practica. This one came from an in-person version of the practicum; Eli came up with a sales funnel as an example where we could apply the comparative advantage frame, and Ruby suggested “false positives” vs “false negatives” as a good tradeoff to think about while reviewing a draft of the post. (Ruby also thinks it would be cool if someone built a widget to play around with this idea; if anyone tries that, leave a comment with a link!)

I spent several years at companies with deep sales funnels (an online car dealership and a mortgage startup). I worked directly on optimizing those funnels, at times. And I never heard of anything like this. It’s simple and intuitive enough to explain to a not-very-technical manager, and the potential gains are obvious - especially in cases where the marginal trade offs differ by an order of magnitude or more. Just calculate a magic number for each stage of the funnel, and if those numbers aren’t equal, adjust the stages for Pareto gains! It’s the sort of idea which I’d expect startup-types to love. It’s also exactly the sort of insight which I hope/expect people who use the framing practica will stumble on regularly.

4 comments

Comments sorted by top scores.

comment by Measure · 2021-09-16T01:20:26.681Z

Make the cutoff very high, and every widget fails the test - no false positives, but a very high false negative rate.

This is reversed. (The following sentence is correct.)

even though they would have bought

A couple of errors here.

Replies from: johnswentworth
comment by johnswentworth · 2021-09-16T01:24:46.959Z

Fixed, thank you.

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2021-09-16T10:14:18.516Z

I thought the fix was the wrong way round, but then I realised that you don't say anywhere what you mean by "positive", "negative", or "cutoff". When a test judges a widget defective, is that called a "positive" or "negative" result? If the test involves calculating some score to be compared against the "cutoff", does exceeding it mean that the widget passes or that it fails?

Replies from: johnswentworth
comment by johnswentworth · 2021-09-16T16:21:55.952Z

Yeah, I didn't want to spend a paragraph on definitions which nobody would be able to keep straight anyway. "False positive" and "false negative" are just very easy-to-confuse terms in general. That's why I switched to "duds" and "missed opportunities" in the sales funnel section.