The Epsilon Fallacy

post by johnswentworth · 2018-03-17T00:08:01.203Z · score: 70 (19 votes) · LW · GW · 7 comments

This is a link post for https://medium.com/@johnwentworth/the-epsilon-fallacy-94184386b1b1

Contents

  Program Optimization
  Carbon Emissions
  The 80/20 Rule
  Conclusion: Profile Your Code
  Footnotes

Program Optimization

One of the earlier lessons in every programmer’s education is how to speed up slow code. Here’s an example. (If this is all Greek to you, just note that there are three different steps and then skip to the next paragraph.)

// Step 1: import
import foobarlib.*
// Step 2: initialize random Foo array
foo_field = Foo[1000]
// Step 3: smooth foo field
for x in [1...998]:
    foo_field[x] = (foo_field[x+1] + foo_field[x-1])/2

Our greenhorn programmers jump in and start optimizing. Maybe they decide to start at the top, at step 1, and think “hmm, maybe I can make this import more efficient by only importing Foo, rather than all of foobarlib”. Maybe that will make step 1 ten times faster. So they do that, and they run it, and lo and behold, the program’s run time goes from 100 seconds to 99.7 seconds.

In practice, most slow computer programs spend the vast majority of their time in one small part of the code. Such slow pieces are called bottlenecks. If 95% of the program’s runtime is spent in step 2, then even the best possible speedup in steps 1 and 3 combined will only improve the total runtime by 5%. Conversely, even a small improvement to the bottleneck can make a big difference in the runtime.
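The arithmetic behind that claim is usually written as Amdahl's law. Here is a minimal Python sketch of it; the function name `new_runtime` and the specific fractions are assumptions chosen to match the numbers in this post, not measurements of any real program.

```python
def new_runtime(total_seconds, fraction, local_speedup):
    """Amdahl's law: runtime after the part taking `fraction` of the
    original runtime is made `local_speedup` times faster."""
    untouched = 1.0 - fraction           # share of runtime left alone
    improved = fraction / local_speedup  # share of runtime after the speedup
    return total_seconds * (untouched + improved)

total = 100.0  # seconds, as in the example above

# Assumed: the import is about 1/300 of the runtime. Making it ten
# times faster takes the program from 100 seconds to about 99.7.
print(new_runtime(total, 1 / 300, 10.0))

# Assumed: 5% of the runtime is outside the bottleneck. Even an
# infinite speedup there leaves about 95 seconds.
print(new_runtime(total, 0.05, float("inf")))

# Merely doubling the speed of the 95% bottleneck leaves about 52.5 seconds.
print(new_runtime(total, 0.95, 2.0))
```

The last two calls are the whole argument in miniature: a perfect fix outside the bottleneck loses to a mediocre fix inside it.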

Back to our greenhorn programmers. Having improved the run time by 0.3%, they can respond one of two ways:

The first response is what I’m calling the epsilon fallacy. (If you know of an existing and/or better name for this, let me know!)

The epsilon fallacy comes in several reasonable-sounding flavors:

The epsilon fallacy is tricky because these all sound completely reasonable. They’re even technically true. So why is it a fallacy?

The mistake, in all cases, is a failure to consider opportunity cost. The question is not whether our greenhorn programmers’ 0.3% improvement is good or bad in and of itself. The question is whether our greenhorn programmers’ 0.3% improvement is better or worse than spending that same amount of effort finding and improving the main bottleneck.

Even if the 0.3% improvement was really easy - even if it only took two minutes - it can still be a mistake. Our programmers would likely be better off if they had spent the same two minutes timing each section of the code to figure out where the bottleneck is. Indeed, if they just identify the bottleneck and speed it up, and don’t bother optimizing any other parts at all, then that will probably be a big win. Conversely, no matter how much they optimize everything besides the bottleneck, it won’t make much difference. Any time spent optimizing non-bottlenecks could have been better spent identifying and optimizing the bottleneck.

This is the key idea: time spent optimizing non-bottlenecks could have been better spent identifying and optimizing the bottleneck. In that sense, time spent optimizing non-bottlenecks is time wasted.
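Those "two minutes timing each section" can be as simple as the Python sketch below. It is only an illustration: `make_field` and `smooth` are hypothetical stand-ins for steps 2 and 3 of the example, and `timed` is a helper invented here, not part of any library.

```python
import random
import time

def timed(label, func, timings):
    """Run func(), store its wall-clock seconds under `label`, return its result."""
    start = time.perf_counter()
    result = func()
    timings[label] = time.perf_counter() - start
    return result

def make_field(n):
    # Hypothetical stand-in for step 2: a random field of n values.
    return [random.random() for _ in range(n)]

def smooth(field):
    # Hypothetical stand-in for step 3: average each interior point's
    # two neighbours, leaving the endpoints unchanged.
    middle = [(field[i - 1] + field[i + 1]) / 2 for i in range(1, len(field) - 1)]
    return [field[0]] + middle + [field[-1]]

timings = {}
field = timed("initialize", lambda: make_field(100_000), timings)
smoothed = timed("smooth", lambda: smooth(field), timings)

# Print the sections from slowest to fastest, with each one's share of the total.
total = sum(timings.values())
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:12s} {seconds:.4f}s  {100 * seconds / total:5.1f}%")
```

Whichever line prints first is the place to start; everything below it is a candidate for the epsilon fallacy.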

In programming, this all seems fairly simple. What’s more surprising is that almost everything in the rest of the world also works like this. Unfortunately, in the real world, social motivations make the epsilon fallacy more insidious.

Carbon Emissions

Back in college, I remember watching a video about this project. The video overviews many different approaches to carbon emissions reduction: solar, wind, nuclear, and bio power sources, grid improvements, engine/motor efficiency, etc. It argues that none of these will be sufficient, on its own, to cut carbon emissions enough to make a big difference. But each piece can make a small difference, and if we put them all together, we get a viable carbon reduction strategy.

This is called the “wedge approach”.

Here’s a chart of US carbon emissions from electricity generation by year, shamelessly cribbed from a Forbes article.

Note that emissions have dropped considerably in recent years, and are still going down. Want to guess what that’s from? Hint: it ain’t a bunch of small things adding together.

In the early 00’s, US oil drilling moved toward horizontal drilling and fracking. One side effect of these new technologies was a big boost in natural gas production - US natgas output has been growing rapidly over the past decade. As a result, natgas prices became competitive with coal prices in the mid-00’s, and electricity production began to switch from coal to natgas. The shift is already large: electricity from coal has fallen by 25%, while natgas has increased 35%.

The upshot: natgas emits about half as much carbon per BTU as coal, and electricity production is switching from coal to natgas en masse. Practically all of the reduction in US carbon emissions over the past 10 years has come from that shift.

Now, back to the wedge approach. One major appeal of the wedge narrative is that it’s inclusive: we have all these well-meaning people working on all sorts of different approaches to carbon reduction. The wedge approach says “hey, all these approaches are valuable and important pieces of the effort, let’s all work together on this”. Kum-bay-a.

But then we look at the data. Practically all the carbon reduction over the past decade has come from the natgas transition. Everything else - the collective effort of hundreds of thousands of researchers and environmentalists on everything from solar to wind to ad campaigns telling people to turn off their lights when not in the room - all of that adds up to barely anything so far, compared to the impact of the natgas transition.

Now, if you’re friends with some of those researchers and environmentalists, or if you did some of that work yourself, then this will all sound like a status attack. We’re saying that all these well-meaning, hard-working people were basically useless. They were the 0.3% improvement to run time. So there’s a natural instinct to defend our friends/ourselves, an instinct to say “no, it’s not useless, that 0.3% improvement was valuable and meaningful and important!” And we reach into our brains for a reason why our friends are not useless-

And that’s when the epsilon fallacy gets us.

“It’s still a positive change, so it’s worthwhile!”

“If we keep generating these small changes, it will add up to something even bigger than natgas!”

“Carbon emissions are huge, so even a small percent change matters a lot!”

This is the appeal of the wedge approach: the wedge approach says all that effort is valuable and important. It sounds a lot nicer than calling everyone useless. It is nicer. But niceness does not reduce carbon emissions.

Remember why the epsilon fallacy is wrong: opportunity cost.

Take solar photovoltaics as an example: PV has been an active research field for thousands of academics for several decades. They’ve had barely any effect on carbon emissions to date. What would the world look like today if all that effort had instead been invested in accelerating the natgas transition? Or in extending the natgas transition to China? Or in solar thermal or thorium for that matter?

Now, maybe someday solar PV actually will be a major energy source. There are legitimate arguments in favor.¹ Even then, we need to ask: would the long-term result be better if our efforts right now were focussed elsewhere? I honestly don’t know. But I will make one prediction: one wedge will end up a lot more effective than all others combined. Carbon emission reductions will not come from a little bit of natgas, a little bit of PV, a little bit of many other things. That’s not how the world works.

The 80/20 Rule

Suppose you’re a genetic engineer, and you want to design a genome for a very tall person.

Our current understanding is that height is driven by lots of different genes, each of which has a small impact. If that’s true, then integral(epsilon) isn’t a fallacy. A large number of small changes really is the way to make a tall person.

On the other hand, this definitely is not the case if we’re optimizing a computer program for speed. In computer programs, one small piece usually accounts for the vast majority of the run time. If we want to make a significant improvement, then we need to focus on the bottleneck, and any improvement to the bottleneck will likely be significant on its own. “Lots of small changes” won’t work.

So… are things usually more like height, or more like computer programs?

A useful heuristic: the vast majority of real-world cases are less like height, and more like computer programs. Indeed, this heuristic is already well-known in a different context: it’s just the 80/20 rule. 20% of causes account for 80% of effects.

If 80% of any given effect is accounted for by 20% of causes, then those 20% of causes are the bottleneck. Those 20% of causes are where effort needs to be focused to have a significant impact on the effect. For examples, here’s Wikipedia on the 80/20 rule:

You can go beyond Wikipedia to find whole books full of these things, and not just for people-driven effects. In the physical sciences, it usually goes under the name “power law”.

(As the examples suggest, the 80/20 rule is pretty loose in terms of quantitative precision. But for our purposes, qualitative is fine.)
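To make "20% of causes account for 80% of effects" concrete, here is a rough Python sketch that measures how much of a total comes from the largest fifth of causes. Everything in it is an assumption for illustration: the data are simulated, `top_share` is a name made up here, and the Pareto shape of 1.16 is used because it is the textbook value at which a Pareto distribution gives an 80/20 split in theory.

```python
import random

def top_share(effects, top_fraction=0.2):
    """Fraction of the total contributed by the largest `top_fraction` of causes."""
    ranked = sorted(effects, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

rng = random.Random(0)  # fixed seed so reruns give the same numbers

# Assumed "computer program"-like world: a few causes dwarf the rest.
heavy_tailed = [rng.paretovariate(1.16) for _ in range(10_000)]

# Assumed "height"-like world: every cause is about the same size.
similar_sized = [rng.uniform(0.9, 1.1) for _ in range(10_000)]

print(f"heavy-tailed causes:  top 20% give {top_share(heavy_tailed):.0%} of the total")
print(f"similar-sized causes: top 20% give {top_share(similar_sized):.0%} of the total")
```

In the similar-sized case the top fifth contributes only a little over a fifth of the total, so there is no bottleneck to find. In the heavy-tailed case the top fifth dominates, though the exact share in any one sample is noisy, which fits the loose quantitative precision noted above.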

So we have a heuristic. Most of the time, the epsilon fallacy will indeed be a fallacy. But how can we notice the exceptions to this rule?

One strong hint is a normal distribution. If an effect results from adding up many small causes, then the effect will (typically) be normally distributed. Height is a good example. Short-term stock price movements are another good example. They might not be exactly normal, or there might be a transformation involved (stock price movements are roughly log-normal). If there’s an approximate normal distribution hiding somewhere in there, that’s a strong hint.
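As a sanity check on that hint, here is a small Python simulation. It is a sketch under assumptions: the "many small causes" are 100 made-up uniform effects per individual, and the contrasting case is a single heavy-tailed Pareto cause. It uses one crude normality check, the fraction of samples within one standard deviation of the mean, which is about 68% for a normal distribution.

```python
import random
import statistics

def share_within_one_sd(samples):
    """Fraction of samples within one standard deviation of the mean."""
    mean = statistics.mean(samples)
    sd = statistics.pstdev(samples)
    return sum(abs(x - mean) <= sd for x in samples) / len(samples)

rng = random.Random(0)  # fixed seed so reruns give the same numbers

# Assumed height-like effect: each individual is the sum of 100 small causes.
additive = [sum(rng.uniform(-1, 1) for _ in range(100)) for _ in range(2_000)]

# Assumed bottleneck-like effect: one heavy-tailed cause per individual.
dominated = [rng.paretovariate(1.16) for _ in range(2_000)]

print(f"sum of many small causes: {share_within_one_sd(additive):.2f} within one sd")
print(f"single heavy-tailed cause: {share_within_one_sd(dominated):.2f} within one sd")
```

The additive case should land near 0.68. The heavy-tailed case lands well above that, because a handful of enormous values inflate the standard deviation until almost everything else sits inside it.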

But in general, absent some obvious normal distribution, our prior assumption should be that most things are more like computer programs than like height. The epsilon fallacy is usually fallacious.

Conclusion: Profile Your Code

Most programmers, at some point in their education/career, are given an assignment to speed up a program. Typically, they start out by trying things, looking for parts of the code which are obviously suboptimal. They improve those parts, and it does not seem to have any impact whatsoever on the runtime.

After wasting a few hours of effort on such changes, they finally “profile” the code - the technical name for timing each part, to figure out how much time is spent in each section. They find out that 98% of the runtime is in one section which they hadn’t even thought to look at. Of course all the other changes were useless; they didn’t touch the part where 98% of the time is spent!

The intended lesson of the experience is: ALWAYS profile your code FIRST. Do not attempt to optimize any particular piece until you know where the runtime is spent.
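For readers who have never done it, here is roughly what profiling looks like with Python's standard-library `cProfile` and `pstats` modules. The functions `cheap_setup`, `expensive_loop`, and `main` are hypothetical, written only so the profiler has something lopsided to measure.

```python
import cProfile
import io
import pstats

def cheap_setup():
    # Hypothetical fast step: build a small list.
    return list(range(1_000))

def expensive_loop(values):
    # Hypothetical bottleneck: quadratic work over the list.
    total = 0
    for a in values:
        for b in values:
            total += a ^ b
    return total

def main():
    values = cheap_setup()
    return expensive_loop(values)

# Record where the time goes while main() runs.
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Summarize the five entries with the largest cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

In the printed report, nearly all of the cumulative time should sit on the `expensive_loop` row, which is the profiler's way of saying where to look first.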

As in programming, so in life: ALWAYS identify the bottleneck FIRST. Do not waste time on any particular small piece of a problem until you know which piece actually matters.

Footnotes

¹The Taleb argument provides an interesting counterweight to the epsilon fallacy. If we’re bad at predicting which approach will be big, then it makes sense to invest a little in many different approaches. We expect most of them to be useless, but a few will have major results - similar to venture capital. That said, it’s amazing how often people who make this argument just happen to end up working on the same things as everyone else.

7 comments

Comments sorted by top scores.

comment by gjm · 2018-03-17T22:03:55.732Z · score: 23 (5 votes) · LW · GW

This has nothing at all to do with the point actually under discussion, but my reaction on looking at those three lines of code was: hmmmm, that third line almost certainly isn't doing what its author intends it to do. It replaces entries in order, left to right, and replaces each entry by the average of the new entry on the left and the old entry on the right. But if someone wrote code like that, without a comment saying otherwise, I would bet they meant it just to replace each entry by the average of its two neighbours.

Also, it's a weird sort of smoothing; e.g., if the input is +1, -1, +1, -1, +1, -1, etc., then it won't smooth it at all, just invert it. It would likely be better to convolve with something like [1,2,1]/4 instead of [1,0,1]/2.
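For anyone who wants to see the difference described above, here is a minimal Python sketch. The three function names are made up for illustration, and keeping the endpoints fixed is an assumption carried over from the post's loop bounds.

```python
def smooth_in_place(field):
    # Left-to-right update as written in the post: by the time we reach x,
    # field[x-1] already holds its NEW value.
    for x in range(1, len(field) - 1):
        field[x] = (field[x + 1] + field[x - 1]) / 2
    return field

def smooth_neighbours(field):
    # Average of the two ORIGINAL neighbours, i.e. the kernel [1, 0, 1] / 2.
    old = list(field)
    middle = [(old[x - 1] + old[x + 1]) / 2 for x in range(1, len(old) - 1)]
    return [old[0]] + middle + [old[-1]]

def smooth_121(field):
    # Kernel [1, 2, 1] / 4: each point also keeps half of its own value.
    old = list(field)
    middle = [(old[x - 1] + 2 * old[x] + old[x + 1]) / 4 for x in range(1, len(old) - 1)]
    return [old[0]] + middle + [old[-1]]

alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(smooth_in_place(list(alternating)))  # mixes new and old values
print(smooth_neighbours(alternating))      # interior is inverted, not smoothed
print(smooth_121(alternating))             # interior is flattened to zero
```

On the alternating input, the [1,0,1]/2 version just flips the sign of every interior point, while the [1,2,1]/4 version flattens the interior to zero.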

comment by johnswentworth · 2018-03-18T14:07:51.784Z · score: 4 (1 votes) · LW · GW

Oh lol I totally missed that. Apparently I've been using numpy for everything so long that I've forgotten how to do it C-style.

comment by gjm · 2018-03-17T23:03:09.273Z · score: 13 (3 votes) · LW · GW

I'm willing to defend the wedge argument a bit. Let's consider those thousands of scientists who worked on solar electricity generation. Clearly, as you say, what they did wasn't useless -- they did produce a technology that does a useful thing. So the question, again as you say, is opportunity costs. What should those scientists have done instead of working on solar electricity generation?

Perhaps they should have gone into hydraulic fracking. But: 1. Presumably these are mostly experts in things like semiconductor physics, the material-science properties of silicon, etc. They'd not be that much use to the frackers. 2. A lot of their work happened before (so far as I know) there was good reason to think that horizontal drilling and fracking would be both effective and politically acceptable. So what's the actual principle these people could and should have followed, that would have led them to do something more effective? I suspect there isn't one.

(Also ... I have the impression, though it's far from an expert one and may be mostly a product of dishonest propaganda, that fracking has a bunch of bad environmental consequences that aren't captured by that graph showing carbon emissions. Unless the only thing we care about is carbon emissions, you can't just go from "biggest reduction in carbon emissions is from fracking" to "fracking should dominate our attempts to reduce carbon emissions" without some consideration of the other effects of fracking and other carbon-emissions-reducing activities.)

comment by johnswentworth · 2018-03-18T14:27:03.683Z · score: 4 (1 votes) · LW · GW

The first objection is particularly interesting, and I've been mulling another post on it. As a general question: if you want to have high impact on something, how much decision-making weight should you put on leveraging your existing skill set, versus targeting whatever the main bottleneck is regardless of your current skills? I would guess that very-near-zero weight on current skillset is optimal, because people generally aren't very strategic about which skills they acquire. So e.g. people in semiconductor physics etc probably didn't do much research in clean energy bottlenecks before choosing that field - their skillset is mostly just a sunk cost, and trying to stick to it is mostly sunk cost fallacy (to the extent that they're actually interested in reducing carbon emissions). Anyway, still mulling this.

Totally agree with the second objection. That said, there are technologies which have been around as long as PV which look at-least-as-promising-and-probably-more-so but receive far less research attention - solar thermal and thorium were the two which sprang to mind, but I'm sure there's more. From an outside view, we should expect this to be the case, because academics usually don't choose their research to maximize impact - they choose it based on what they know how to study. Which brings us back to the first point.

comment by lifelonglearner · 2018-03-17T19:17:59.814Z · score: 9 (2 votes) · LW · GW

The carbon emissions example is a great one that I think people don't take into account that often. EX: *Even if* every recycling / energy reduction campaign worked, i.e. if residential emissions dropped to 0%, this is still only about 12% of the US's overall emissions.

comment by Teja Prabhu (0xpr) · 2018-03-17T05:01:01.518Z · score: 3 (1 votes) · LW · GW

“The first response is what I’m calling the epsilon fallacy. (If you know of an existing and/or better name for this, let me know!)”

This reminds me of Amdahl's Law. You could call it Amdahl's fallacy, but I'm not sure if it is a better name.

comment by mraxilus · 2018-03-17T17:01:06.019Z · score: 5 (2 votes) · LW · GW

As a fellow programmer, I think the epsilon fallacy is more memorable. If it were Amdahl's fallacy, it would be one of those fallacies I have to look up fifty times or so (terrible memory, and not enough slack/motivation for a fallacy memory palace).