Mathematical simplicity bias and exponential functions

post by taw · 2009-08-26T18:34:25.269Z · LW · GW · Legacy · 87 comments

One of the biases that is extremely prevalent in science, but rarely talked about anywhere, is the bias towards models that are mathematically simple and easy to operate on. Nature doesn't care all that much for mathematical simplicity. In particular I'd say that, as a good first approximation, if you think something fits an exponential function of either growth or decay, you're wrong. We got so used to exponential functions, and to how convenient they are to work with, that we completely forgot that nature doesn't work that way.

But what about nuclear decay, you might be asking now... That's as close to real exponential decay as you get... and it's still nowhere close enough. Well, here's a log-log graph of the Chernobyl release versus a theoretical exponential function.

Well, that doesn't look all that exponential... The thing is that even if you have perfect exponential decay processes, as with the decay of a single nuclide, once you start mixing a heterogeneous group of such processes, the exponential character is lost. Early on, the faster-decaying components dominate; then, gradually, those that decay more slowly take over. Somewhere along the way you might have to deal with decay products (pure depleted uranium at first gets more radioactive with time, not less, as it decays into short-half-life nuclides), and perhaps even with processes you didn't have to consider at all (like the creation of fresh radioactive nuclides by cosmic radiation).
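
To make the mechanism concrete, here's a minimal sketch (with made-up half-lives, not Chernobyl's actual isotope inventory): each component is perfectly exponential on its own, but their sum decays ever more slowly, and a single exponential fitted to the early behavior is off by many orders of magnitude in the tail.

```python
import numpy as np

# Four decay processes with very different half-lives, in days
# (made-up values, not Chernobyl's actual isotope inventory).
half_lives = np.array([8.0, 30.0, 2.0 * 365.0, 30.0 * 365.0])
decay_constants = np.log(2) / half_lives
initial_activity = np.ones(4)          # equal starting activity for each

def total_activity(t):
    """Activity of the mixture: a sum of exponentials, one per nuclide."""
    return np.sum(initial_activity * np.exp(-np.outer(t, decay_constants)), axis=1)

t = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])
mixture = total_activity(t)
# Naive model: a single exponential extrapolated from the fastest,
# initially dominant component.
naive = initial_activity.sum() * np.exp(-decay_constants[0] * t)

for ti, m, n in zip(t, mixture, naive):
    print(f"t={ti:7.0f} d  mixture={m:.3e}  single-exponential={n:.3e}")
```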

And that's the ideal case of counting how much radiation a sample produces, where the underlying process is exponential by the basic laws of physics - and it still gets us orders of magnitude wrong. When you're measuring something much vaguer, with much more complicated underlying mechanisms - like changes in population, economy, or processing power - expect to do even worse.

According to the IMF, the world economy in 2008 was worth 69 trillion $ PPP. Assuming 2% annual growth and naive growth models, the entire world economy produced 12 cents PPP worth of value over the entire first century. And assuming a fairly stable population, an average person in 3150 will produce more than the entire world does now. And with enough time, the dollar value of one hydrogen atom will be higher than the current dollar value of everything on Earth. And of course, with proper time discounting of utility, the life of one person now is worth more than half of humanity a millennium into the future - exponential growth and exponential decay are both equally wrong.
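
These absurdities are easy to reproduce; here's a back-of-envelope sketch under the stated assumptions (constant 2% growth, the IMF's 69 trillion figure, and an illustrative 7 billion people):

```python
# Back-of-envelope check of the figures above, assuming a constant
# 2% annual growth rate extrapolated in both directions from 2008.
WORLD_GDP_2008 = 69e12   # dollars PPP, per the IMF figure cited above
GROWTH = 1.02

def gdp(year):
    """World GDP under naive constant exponential growth."""
    return WORLD_GDP_2008 * GROWTH ** (year - 2008)

# Total output of the entire first century AD under this model:
first_century = sum(gdp(y) for y in range(1, 101))
print(f"World output, years 1-100: ${first_century:.2f}")   # roughly $0.12

# Per-capita output in 3150, assuming ~7e9 people:
print(f"Per person in 3150: ${gdp(3150) / 7e9:.2e}")        # ~ today's entire world GDP
```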

To me they all look like clear artifacts of our growth models, but there are people who are so used to them that they take predictions like that seriously.

In case you're wondering, here are some estimates of past world GDP.


comment by byrnema · 2009-08-27T16:35:11.701Z · LW(p) · GW(p)

I have this vague idea that sometime in our past, people thought that knowledge was like an almanac: a repository of zillions of tiny true facts that summed up to being able to predict stuff about stuff, but without a general understanding of how things work. There was no general understanding because any heuristic that would begin to explain how things work would immediately be discounted by the single tiny fact, easily found, that contradicted it. Details and concern with minutiae and complexity are actually anti-science for this reason. It's not that details and complexity aren't important, but you make no progress if you consider them from the beginning.

And then I wondered: is this knee-jerk reaction to dismiss any challenge of the keep-it-simple conventional wisdom the reason why we’re not making more progress in complex fields like biology?

For classical physics it has been the case that the simpler the hypothetical model you verify, the more you cash out in terms of understanding physics. The simpler the hypothesis you test, the easier it is to establish if the hypothesis is true and the more you learn about physics if it is true. However, what considering and verifying simpler and simpler hypotheses actually does is transfer the difficulty of understanding the real-world problem to the experimental set-up. To verify your super-simple hypothesis, you need to eliminate confounding factors from the experiment. Success in classical physics has occurred because when experiments were done, confounding factors could be eliminated through a well-designed set-up or were small enough to neglect. (Consider Galileo’s purported experiment of dropping two objects from a height – in real life that particular experiment doesn’t work because the lighter object may fall more slowly.)

In complex fields this type of modeling via simplification doesn’t seem to cash out as well, because it’s more difficult to control the experimental set-up and the confounding effects aren't negligible. So while I've always believed that models need to be simple, I would consider a different paradigm if it could work. How could understanding the world work any other way than through simple models?

Some methodological trends in biology: high-throughput, random searches, brute force, etc.

Replies from: taw, Johnicholas, Gavin
comment by taw · 2009-08-28T14:34:01.703Z · LW(p) · GW(p)

I must disagree with the premise that biology is not making progress while physics is. As far as I can tell, biology is currently making progress many orders of magnitude greater and more practically significant than physics.

And it requires this messy, complex paradigm of accumulating plenty of data and mining it for complicated regularities - even the closest things biology has to "physical laws", like the Central Dogma or the rules by which DNA sequences translate to protein sequences, have enough exceptions and footnotes to fill a small book.

The world isn't simple. Simple models are usually very wrong. Exceptions to this pattern like basic physics are extremely unusual, and shouldn't be taken as a paradigm for all science.

Replies from: Simon_Jester, byrnema
comment by Simon_Jester · 2009-08-29T10:16:59.350Z · LW(p) · GW(p)

The catch is that complex models are also usually very wrong. Most possible models of reality are wrong, because there are an infinite legion of models and only one reality. And if you try too hard to create a perfectly nuanced and detailed model, because you fear your bias in favor of simple mathematical models, there's a risk. You can fall prey to the opposing bias: the temptation to add an epicycle to your model instead of rethinking your premises. As one of the wiser teachers of one of my wiser teachers said, you can always come up with a function that fits 100 data points perfectly... if you use a 99th-order polynomial.

Naturally, this does not mean that the data are accurately described by a 99th-order polynomial, or that the polynomial has any predictive power worth giving a second glance. Tacking on more complexity and free parameters doesn't guarantee a good theory any more than abstracting them out does.
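
To see the teacher's point in action, here's a small sketch with hypothetical data (a noisy quadratic): the degree-99 fit passes through every point and is worthless one step outside them.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# 100 noisy samples from a simple underlying law (a quadratic).
x = np.linspace(0.0, 1.0, 100)
y = 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)

# numpy may warn that the degree-99 fit is ill-conditioned -
# that is part of the point.
overfit = Polynomial.fit(x, y, deg=99)   # interpolates the noise
simple = Polynomial.fit(x, y, deg=2)     # matches the underlying law

# Inside the data both look fine; just outside, the overfit explodes.
print("true value at x=1.1 :", 3.0 * 1.1**2)
print("degree-2 fit        :", simple(1.1))
print("degree-99 fit       :", overfit(1.1))
```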

comment by byrnema · 2009-09-01T00:44:12.156Z · LW(p) · GW(p)

I must disagree with the premise that biology is not making progress while physics is. As far as I can tell, biology is currently making progress many orders of magnitude greater and more practically significant than physics.

I actually entirely agree with you. Biology is making terrific progress, and shouldn't be overly compared with physics. Two supporting comments:

First, when biology is judged as nascent, this may be because it is being overly compared with physics. Success in physics meant finding and describing the most fundamental relationships between variables analytically, but this doesn't seem to be what the answers look like in biology. (As Simon Jester wrote here, describing the low-level rules is just the beginning, not the end.) And the relatively simple big ideas, like the theory of evolution and the genetic code, are still often judged as somehow inferior as scientific principles, perhaps because they're not so closely identified with mathematical equations.

Second, the scientific culture that measures progress in biology using the physics paradigm may still be slowing our progress down. While we are making good progress, I also feel a resistance: the reality of biology doesn't seem to be responding well to the scientific epistemology we are throwing at it. But I'm still open-minded; maybe our epistemology needs to be updated, or maybe our epistemology is fine and we just need to keep forging on.

comment by Johnicholas · 2009-08-27T17:19:53.517Z · LW(p) · GW(p)

Rather than describing the difference between physics and biology as "simple models" vs. "complex models", describe them in terms of expected information content.

Physicists generally expect an eventual Grand Unified Theory to be small in information content (one or a few pages of very dense differential equations, maybe as small as this: http://www.cs.ru.nl/~freek/sm/sm4.gif ). On the order of kilobytes, plus maybe some free parameters.

Biologists generally expect an eventual understanding of a species to be much much bigger. At the very least, the compressed human genome alone is almost a gigabyte; a theory describing how it works would be (conservatively) of the same order of magnitude.

All things being equal, would biologists prefer a yottabyte-sized theory to a zettabyte-sized theory? No, absolutely not! The scientific preference is still MOSTLY in the direction of simplicity.

There are a lot of sizes out there, and the fact that gigabyte-sized theories seem likely to defeat kilobyte-sized theories in the biological domain shouldn't be construed as a violation of the general "prefer simplicity" rule.

Replies from: timtyler
comment by timtyler · 2010-08-23T17:51:24.180Z · LW(p) · GW(p)

The uncompressed human genome is about 750 megabytes.
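
For reference, the arithmetic behind that figure, assuming roughly 3.1 billion base pairs at 2 bits per base:

```python
base_pairs = 3.1e9                     # approximate human genome length
bits_per_base = 2                      # A, C, G, T
megabytes = base_pairs * bits_per_base / 8 / 1e6
print(f"{megabytes:.0f} MB")           # ~775 MB raw, before any compression
```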

Replies from: Johnicholas
comment by Johnicholas · 2010-08-23T22:17:40.819Z · LW(p) · GW(p)

Thanks, and I apologize for the error.

comment by Gavin · 2009-08-30T08:12:44.369Z · LW(p) · GW(p)

Biology is a special case of physics. Physicists may at some point arrive at a Grand Unified Theory of Everything that theoretically implies all of biology.

Biology is the classification and understanding of the complicated results of physics, so it is in many ways basically an almanac.

Replies from: byrnema, Simon_Jester
comment by byrnema · 2009-09-01T01:00:55.758Z · LW(p) · GW(p)

I hope that when we understand biology better, it won't seem like an almanac. I predict that our understanding of what "understanding" means will shift dramatically as we continue to make progress in biology. For example -- just speculating -- perhaps we will feel like we understand something if we can compute it. Perhaps we will develop and run models of biological phenomena as trivially as using a calculator, so that such knowledge seems like an extension of what we "know". And then understanding will mean identifying the underlying rules, while the almanac part will just be the nitty gritty output; like doing a physics calculation for specific forces. (For example, it's pretty neat that WHO is using modeling in real time to generate information about the H1N1 pandemic.)

Replies from: Gavin
comment by Gavin · 2009-09-01T21:07:01.068Z · LW(p) · GW(p)

My use of the word "almanac" was more of a reference to the breadth of the area covered by biology, rather than a comment on the difficulty or content of the information.

It's funny that you mention predictive modeling--one of the main functions of an Almanac is to provide predictions based on models.

From http://en.wikipedia.org/wiki/Almanac: "Modern almanacs include a comprehensive presentation of statistical and descriptive data covering the entire world. Contents also include discussions of topical developments and a summary of recent historical events."

Replies from: byrnema
comment by byrnema · 2009-09-01T21:43:24.428Z · LW(p) · GW(p)

Yes, I noticed that I was still describing biology as an almanac, as a library of information (predictions) that we will feel like we own because we can generate it. I suppose the best way to say what I was trying to say is that I hope that when we have a better understanding of biology, the term "almanac" won't seem pejorative, but will seem like the legitimate way of understanding something that has large numbers of similar interacting components.

comment by Simon_Jester · 2009-08-30T11:02:31.378Z · LW(p) · GW(p)

This is profoundly misleading. Physicists already have a good handle on how the things biological systems are made of work, but it's a moot point because trying to explain the details of how living things operate in terms of subatomic particles is a waste of time. Unless you've got a thousand tons of computronium tucked away in your back pocket, you're never going to be able to produce useful results in biology purely by using the results of physics.

Therefore, the actual study of biology is largely separate from physics, except for the very indirect route of quantum physics => molecular chemistry => biochemistry => biology. Most of the research in the field has little to do with those paths, and each step in the indirect chain is another level of abstraction that allows you to ignore more of the details of how the physics itself works.

Replies from: Gavin
comment by Gavin · 2009-08-30T21:53:53.946Z · LW(p) · GW(p)

The ultimate goal of physics is to break things down until we discover the simplest, most basic rules that govern the universe.

The goals of biology do not lead down what you call the "indirect route." As you state, Biology abstracts away the low-level physics and tries to understand the extremely complicated interactions that take place at a higher level.

Biology attempts to classify and understand all of the species, their systems, their subsystems, their biochemistry, and their interspecies and environmental interactions. The possible sum total of biological knowledge is an essentially limitless dataset, what I might call the "Almanac of Life."

I'm not sure quite where you think we disagree. I don't see anything in our two posts that's contradictory--unless you find the use of the word "Almanac" disparaging to biologists? I hope it's clear that it wasn't a literal use -- biology clearly isn't a yearly book of tabular data, so perhaps the simile is inapt.

Replies from: Simon_Jester
comment by Simon_Jester · 2009-08-31T23:15:05.590Z · LW(p) · GW(p)

The way you put it does seem to disparage biologists, yes. The biologists are doing work that is qualitatively different from what physicists do, and that produces results the physicists never will (without the aforementioned thousand tons of computronium, at least). In a very real sense, biologists are exploring an entirely different ideaspace from the one the physicists live in. No amount of investigation into physics in isolation would have given us the theory of evolution, for instance.

And weirdly, I'm not a biologist; I'm an apprentice physicist. I still recognize that they're doing something I'm not, rather than something that I might get around to by just doing enough physics to make their results obvious.

comment by Psychohistorian · 2009-08-27T17:33:44.128Z · LW(p) · GW(p)

I recall a discussion I had with a fellow econ student on the effects of higher taxes. He said something to the effect of, "Higher taxes are inefficient, and all you need to do to prove that is to draw the graph." (Unfortunately the topic changed before I could point out the problems with this statement.)

This (rather common) view reflects two major problems with modeling (particularly in economics): an amoral value (economic efficiency) becomes a normative value because it's relatively easy to understand and (in theory) measure, and, more relevant as an example for this post, the model is seen as demonstrating reality, rather than vice versa. The model thus becomes a complete way of looking at the world, as it is both normative and the world is supposed to conform to it.

I think a lot of scientists see theory as the highest good: reality is defective insofar as it fails to conform to an elegant theory, rather than the other way around. When expressed this way, it's obviously a foolish idea, but it's an insidious one nonetheless. "I'd be right if it weren't for all those confounding variables!" may be true, but you're still wrong.

Replies from: Vladimir_Nesov, PhilGoetz
comment by Vladimir_Nesov · 2009-08-27T19:09:28.279Z · LW(p) · GW(p)

This is related to this post by Katja Grace:

If something complicated is obvious, such as anything that anybody seriously studies, then for it to be simple you must be abstracting it a lot. When people find such things obvious, what they often mean is that the abstraction is so clear and simple its implications are unarguable. This is answering the wrong question. Most of the reasons such conclusions might be false are hidden in what you abstracted away. The question is whether you have the right abstraction for reality, not whether the abstraction has the implications it seems to.

comment by PhilGoetz · 2009-08-29T19:22:46.451Z · LW(p) · GW(p)

I would bet that your fellow econ student became a Republican or Libertarian before convincing himself that higher taxes are provably inefficient. (Higher than what? Failing to have an answer to that proves irrationality.) Confusion induced by ideology is different from confusion induced by math.

Replies from: Douglas_Knight
comment by Douglas_Knight · 2009-08-30T06:40:55.509Z · LW(p) · GW(p)

Most Democratic academic economists agree with the claim that higher taxes are inefficient ("deadweight loss"). That inefficiency is the main cost of taxation, which must be balanced against the good that can be accomplished by the government using the revenue. ("Higher than what" you ask? Almost any increase in tax is inefficient. But DeLong and Mankiw certainly agree that a height tax is efficient.)

Replies from: CronoDAS
comment by CronoDAS · 2009-08-30T07:03:57.975Z · LW(p) · GW(p)

Well, the key is "what kind of taxes, and on what?"

Taxes that distort incentives away from the no-externality, no-taxes perfect-competition equilibrium do create econ-101-style inefficiency. But not all possible taxes distort incentives, not all possible taxes fall on things free of negative externalities, and not all markets are in a perfect-competition equilibrium.

In the real world, all else is never equal.
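
For concreteness, here's the econ-101 calculation under discussion, sketched with a made-up linear market (illustrative numbers only, not a claim about any real tax):

```python
# Toy linear market: demand P = 100 - Q, supply P = Q (made-up numbers).
# Without a tax the market clears at Q = 50, P = 50.
def equilibrium(tax):
    """Quantity traded when buyers' price equals sellers' price plus a per-unit tax."""
    # 100 - Q = Q + tax  =>  Q = (100 - tax) / 2
    return (100 - tax) / 2

q0 = equilibrium(0)
for tax in (0, 10, 20):
    q = equilibrium(tax)
    # Deadweight loss = area of the triangle between the curves over
    # the trades that no longer happen: 1/2 * tax * (q0 - q).
    dwl = 0.5 * tax * (q0 - q)
    print(f"tax={tax:2d}  quantity={q:4.1f}  revenue={tax * q:6.1f}  deadweight loss={dwl:5.1f}")
```

Note that in this sketch the loss grows roughly with the square of the tax, which is one reason "higher than what" matters.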

comment by SforSingularity · 2009-08-27T12:25:24.053Z · LW(p) · GW(p)

One of biases that are extremely prevalent in science, but are rarely talked about anywhere, is bias towards models that are mathematically simple and easier to operate on.

I think that this is a heuristic rather than a bias, because favoring simple models over complex ones is generally a good thing. In particular, the complexity prior is claimed by some to be a fundamental principle of intelligence.

Replies from: taw
comment by taw · 2009-08-28T14:38:15.189Z · LW(p) · GW(p)

This is only true as long as the differences between simple and complex models are small, and only because the simple model avoids the overfitting problem. For failures of many orders of magnitude, choosing simple-and-wrong over complex-and-right is not very intelligent.

Replies from: SforSingularity
comment by SforSingularity · 2009-08-29T20:58:25.849Z · LW(p) · GW(p)

There is a formal theory describing how to balance model complexity against fit to data: describe the model as a program on a simple, fixed Turing machine, and then penalize that model by assigning it a prior probability of 2^-L, where L is the length of the program...

this has all been worked out.
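
A toy rendering of that recipe - true program length is uncomputable, so this sketch proxies L by parameter count times an arbitrary bit budget, a crude stand-in for the real universal prior:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
y = np.exp(2 * x) + rng.normal(scale=0.2, size=x.size)   # data from a non-polynomial law

BITS_PER_PARAM = 16   # crude stand-in: L ~ 16 bits per coefficient

def score(deg):
    """log posterior ~ log P(data | model) + log prior, with prior = 2^-L."""
    coeffs = np.polyfit(x, y, deg)
    resid = y - np.polyval(coeffs, x)
    sigma2 = np.mean(resid**2) + 1e-12
    loglik = -0.5 * x.size * np.log(2 * np.pi * sigma2) - 0.5 * x.size
    L = (deg + 1) * BITS_PER_PARAM
    return loglik - L * np.log(2)

best = max(range(9), key=score)
print("selected degree:", best)   # a low degree wins despite worse raw fit
```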

Replies from: Nick_Tarleton, byrnema, whpearson
comment by Nick_Tarleton · 2009-08-29T23:07:12.422Z · LW(p) · GW(p)

Which Turing machine, though?

Replies from: SforSingularity
comment by SforSingularity · 2009-08-29T23:38:14.240Z · LW(p) · GW(p)

How about picking one of these?

comment by byrnema · 2009-08-29T22:40:30.416Z · LW(p) · GW(p)

this has all been worked out.

...using a model. (I suppose someone could argue that it's not complex enough.)

comment by whpearson · 2009-08-29T22:21:16.614Z · LW(p) · GW(p)

It can't be bettered on average, assuming that the thing you are modelling is computable.

But I haven't seen any proof to say that any other strategy will do worse on average. Anyone got any links?

Replies from: SforSingularity
comment by SforSingularity · 2009-08-29T22:23:54.915Z · LW(p) · GW(p)

See Hutter, Legg.

Replies from: whpearson
comment by whpearson · 2009-08-29T23:26:39.496Z · LW(p) · GW(p)

If I understand the maths right, the important part of http://www.hutter1.net/ai/paixi.ps for using Kolmogorov complexity is the part of section 2 that says

"The SPΘμ system is best in the sense that EnΘμ ≤ Enρ for any ρ."

That doesn't rule out E_n^Θμ = E_n^ρ holding for large numbers of different ρ, which would invalidate any claim that it is the one right way of doing things.

I was interested in links to papers where that possibility is disproved.

comment by Mike Bishop (MichaelBishop) · 2009-08-27T13:41:41.961Z · LW(p) · GW(p)

To me, the problem is not "Mathematical Simplicity Bias," but rather, failing to check the model with empirical data. It seems totally reasonable to start with a simple model and add the complexity necessary to explain the phenomenon. (Of course it is best to test the model on new data.)

Also, if you're going to claim Mathematical Simplicity Bias is "one of the biases that is extremely prevalent in science," it would help to provide real examples of scientists failing because of it.

Replies from: Daniel_Burfoot
comment by Daniel_Burfoot · 2009-08-27T14:41:15.044Z · LW(p) · GW(p)

It seems totally reasonable to start with a simple model and add the complexity necessary to explain the phenomenon.

Careful. It is reasonable to add complexity if the complexity is justified by increased explanatory power on a sufficiently large quantity of data. If you attempt to use a complex model to explain a small amount of data, you will end up overfitting the data. Note that this leaves us in a somewhat unpleasant situation: if there is a complex phenomenon regarding which we can obtain only small amounts of data, we may be forced to accept that the phenomenon simply cannot be understood.

Replies from: MichaelBishop
comment by Mike Bishop (MichaelBishop) · 2009-08-27T17:27:18.074Z · LW(p) · GW(p)

Yes, this is exactly the point I was getting at when I wrote: "Of course it is best to test the model on new data."

comment by Johnicholas · 2009-08-27T03:31:25.582Z · LW(p) · GW(p)

In general, rules of thumb have two dimensions - applicability (that is, the size of the domain where the rule applies) and efficacy (the amount or degree of guidance that the rule provides).

Simplicity, a.k.a. Occam's Razor, is mentioned frequently as a guide in these (philosophy of science/atheist/AI aficionado) circles. However, it is notable more for its broad applicability than for its efficacy compared to other, less broadly applicable guidelines.

Try formulating a rule for listing the natural numbers (positive integers) without repeats that does not generally trend upwards. For example, you could alternate between powers of ten and powers of two: 1, 10, 2, 100, 4, 1000, ... Regardless of the rule, you cannot list the natural numbers from largest to smallest; there is no largest. Whichever number you pick first, you will eventually be forced past it by the "no repeats" clause.

A general learner can be viewed as outputting a list of numbers (coding for hypotheses). Occam's razor is roughly the observation "the numbers will generally trend upwards". There's still a lot up in the air after that observation.
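
The example sequence as a generator, just to make the picture concrete:

```python
from itertools import count, islice

def interleaved():
    """1, then alternate powers of ten with powers of two: 10, 2, 100, 4, ..."""
    yield 1
    for n in count(1):
        yield 10 ** n
        yield 2 ** n

print(list(islice(interleaved(), 13)))
# [1, 10, 2, 100, 4, 1000, 8, 10000, 16, 100000, 32, 1000000, 64]
# The list dips below its running maximum forever, yet still trends
# upward overall - as any repeat-free listing of the naturals must.
```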

Replies from: timtyler
comment by timtyler · 2009-08-31T13:10:48.331Z · LW(p) · GW(p)

What's with the Occam bashing? Yes, the OP wrote:

"Nature doesn't care all that much for mathematical simplicity."

...but that doesn't make it true: Occam's Razor is great!

comment by CronoDAS · 2009-08-26T22:05:18.935Z · LW(p) · GW(p)

A discharging capacitor is a pretty good fit for exponential decay. (At least, until it's very very close to being completely discharged.)
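
The ideal model behind this, with made-up component values (real parts deviate for reasons like those taw gives below):

```python
import numpy as np

# Ideal RC discharge: V(t) = V0 * exp(-t / (R * C)).
R, C, V0 = 10e3, 100e-6, 5.0          # 10 kOhm, 100 uF, 5 V (made-up values)
tau = R * C                            # time constant: 1 second here

for t in np.arange(0, 6) * tau:
    print(f"t = {t:.0f} s  V = {V0 * np.exp(-t / tau):.4f} V")
# After ~5 time constants the voltage is below 1% of V0; down there,
# leakage, dielectric absorption, and meter loading dominate, which is
# where the clean exponential breaks down.
```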

Replies from: Bo102010, taw
comment by Bo102010 · 2009-08-27T02:31:04.599Z · LW(p) · GW(p)

This is consistent with my experience...

I've always been skeptical of very simple models to describe large behaviors, but in my first EE labs classes I was astounded at how well a very simple superposition of functions described empirical measurements.

comment by taw · 2009-08-26T22:54:00.026Z · LW(p) · GW(p)

Funny that you mention it - I remember performing high school physics experiments with discharging a battery (not a capacitor), and due to heating there was a very significant deviation from exponential decay: as the battery discharges, it heats up, and that changes its resistance. That's more a 10% kind of error than an order of magnitude kind of error. (With a capacitor, heating will more likely occur inside the load than inside the capacitor, but you might get a similar effect.)

And of course properties near complete discharge will be very different, which should be very clear on a log-log plot.

comment by fnc · 2009-08-28T22:28:19.878Z · LW(p) · GW(p)

I don't see how they can even try to apply -any- curve to something that has feedbacks over time, like population or GDP. Technology is an obvious confounding variable there, with natural events coming into play as well.

comment by Shalmanese · 2009-08-28T16:08:32.869Z · LW(p) · GW(p)

"All models are wrong, some are useful" - George Box

Replies from: Furcas
comment by Furcas · 2009-08-28T18:33:30.406Z · LW(p) · GW(p)

If model X is more useful than model Y, it's probably because model X is closer to the truth than model Y. "All models are wrong" only means that 100% certainty is impossible.

Replies from: Johnicholas
comment by Johnicholas · 2009-08-28T21:03:38.107Z · LW(p) · GW(p)

"If model X is more useful than model Y, it's probably because model X is closer to the truth than model Y."

What if model X is tractable in some useful way? Box's emphasis on utility over correctness would be nigh-meaningless if they were the same thing.

Replies from: Furcas
comment by Furcas · 2009-08-28T23:35:06.303Z · LW(p) · GW(p)

Sure. To use Eliezer's example, if we want to fire artillery shells, Newtonian mechanics is more useful than general relativity, because we're more interested in computational speed than in accuracy. But that's not the point that the people who say things like the quote above are usually trying to make. When I hear similar statements, it's from people who say they don't believe in a theory because it's true, but because it's useful for making predictions, as if the two concepts were completely disconnected!

That said, after googling George Box, he's certainly not one of those people. Wikiquote gives another quote from him, which I like better: "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful."

comment by RolfAndreassen · 2009-08-27T17:01:30.218Z · LW(p) · GW(p)

Comparing a made-up exponential to a process that no scientist who knew anything about radioactivity would expect to model with anything but a sum of coupled exponentials is a bit of a straw man. There's a bias to simplicity, certainly, but there's not that much bias!

Replies from: taw
comment by taw · 2009-08-28T14:36:34.956Z · LW(p) · GW(p)

I used the radioactivity example because it was painfully clear (as in smacked-in-the-face-by-truth clear) what the correct answer is. But people do use stupid models like simple exponential growth for things like population and economic growth all the time.

Replies from: PhilGoetz
comment by PhilGoetz · 2009-08-29T19:25:17.628Z · LW(p) · GW(p)

Like certain singularitarian futurists.

Replies from: milindsmart
comment by milindsmart · 2016-08-25T16:12:25.741Z · LW(p) · GW(p)

So someone has mentioned it on LW after all. Lots of singularitarian ideas depend heavily on exponential growth.

comment by Psychohistorian · 2009-08-27T03:48:27.369Z · LW(p) · GW(p)

Obligatory topical XKCD link. Though it's linear, not exponential.

comment by talisman · 2009-08-29T03:00:46.321Z · LW(p) · GW(p)

I do not think your claim is what you think it is.

I think your claim is that some people mistake the model for the reality, the map for the territory. Of course models are simpler than reality! That's why they're called "models."

Physics seems to have gotten wiser about this. The Newtonians, and later the Copenhagenites, did fall quite hard for this trap (though the Newtonians can be forgiven to some degree!). More recently, however, the undisputed champion physical model, whose predictions hold to 987 digits of accuracy (not really), has the humble name "The Standard Model," and it's clear that no one thinks it's the ultimate true nature of reality.

Can you give specific examples of people making big mistakes from map/territory confusion? The closest thing I can think of offhand is the Stern Report, which tries to make economic calculations a century from now based on our current best climate+social+political+economic models.

comment by CronoDAS · 2009-08-27T07:29:37.338Z · LW(p) · GW(p)

Any continuous function is approximately linear over a small enough scale. ;)

Replies from: SforSingularity, Johnicholas, Vladimir_Nesov
comment by SforSingularity · 2009-08-27T12:23:41.355Z · LW(p) · GW(p)

False. What you mean is "any differentiable function is approximately linear over a small enough scale".

See this

Replies from: Eliezer_Yudkowsky, ArthurB, CronoDAS
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-08-27T17:11:37.600Z · LW(p) · GW(p)

Heck, any linear function is approximately exponential over a small enough scale.

Replies from: SforSingularity
comment by SforSingularity · 2009-08-27T18:55:07.814Z · LW(p) · GW(p)

Do you mean "the exponential function is approximately linear over a small enough scale"?

Replies from: None
comment by [deleted] · 2009-08-27T19:11:17.185Z · LW(p) · GW(p)

Both are true.

comment by ArthurB · 2009-08-27T18:24:28.626Z · LW(p) · GW(p)

The question is what you mean by "approximately".

If you mean that, for any error size, the supremum of the distance between the linear approximation and the function is lower than this error for all scales smaller than some given scale, then a necessary and sufficient condition is "continuous". Differentiable is merely sufficient.

When the function is differentiable, you can make claims on how fast the error decreases asymptotically with scale.
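
For reference, a sketch of the two standard statements being contrasted:

```latex
% Continuity at a: the error of the (constant) approximation f(a)
% goes to zero, with no rate attached:
\lim_{x \to a} \left| f(x) - f(a) \right| = 0

% Differentiability at a: the error of the tangent-line approximation
% vanishes faster than the scale |x - a| itself:
f(x) = f(a) + f'(a)\,(x - a) + o\!\left( |x - a| \right) \quad (x \to a)
```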

Replies from: Johnicholas
comment by Johnicholas · 2009-08-27T21:44:26.660Z · LW(p) · GW(p)

And if you use the ArthurB definition of "approximately" (which is an excellent definition for many purposes), then a piecewise constant function would do just as well.

Replies from: ArthurB
comment by ArthurB · 2009-08-27T22:05:57.121Z · LW(p) · GW(p)

Indeed.

But I may have gotten "scale" wrong here. If we scale the error at the same time as we scale the part we're looking at, then differentiability is necessary and sufficient. If we're concerned simply with approximating the function on a smallish part, then continuity is what we're looking for.

comment by CronoDAS · 2009-08-27T19:16:00.853Z · LW(p) · GW(p)

Indeed, you can't get a good linear approximation to that function...

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-08-27T19:18:55.771Z · LW(p) · GW(p)

Locally, even an elephant is approximately a tree trunk. Or a rope.

comment by Johnicholas · 2009-08-27T10:29:09.292Z · LW(p) · GW(p)

Under the usual mathematical meanings of "continuous", "function" and so on, this is strictly false. See: http://en.wikipedia.org/wiki/Weierstrass_function

It might be true under some radically intuitionist interpretation (a family of philosophies I have a lot of sympathy with). For example, I believe Brouwer argued that all "functions" from "reals" to "reals" are "continuous", though he was using his own interpretation of the terms inside of quotes. However, such an interpretation should probably be explained rather than assumed. ;)

Replies from: ArthurB, CronoDAS, tut
comment by ArthurB · 2009-08-27T18:33:11.827Z · LW(p) · GW(p)

No, he's right. The Weierstrass function can be approximated with a piecewise linear function. It's obvious: pick N equally spaced points and join them linearly. For N big enough, you won't see the difference - the error becomes infinitesimally small as N gets bigger.

Replies from: SforSingularity, SforSingularity
comment by SforSingularity · 2009-08-27T18:49:22.263Z · LW(p) · GW(p)

you won't see the difference

that's because you can't "see" the Weierstrass function in the first place; our eyes cannot see functions that are everywhere (or almost everywhere) nondifferentiable. When you look at a picture of the Weierstrass function on Google image search, you are looking at a piecewise linear approximation of it. Hence, if you compare what you see on Google image search with a piecewise linear approximation of it, they will look the same...

Replies from: None
comment by [deleted] · 2009-08-28T16:14:30.954Z · LW(p) · GW(p)

I'm sort of annoyed by your insistence that the Weierstrass function cannot be approximated by piecewise linear functions when, after all, it is the limit of a series of piecewise linear functions.

RTFM.

comment by SforSingularity · 2009-08-27T18:53:04.563Z · LW(p) · GW(p)

you won't see the difference

that is because our eyes cannot see nowhere differentiable functions, so a "picture" of the Weierstrass function is some piecewise linear function that is used as a human-readable symbol for it.

Consider that when you look at a "picture" of the Weierstrass function and pick a point on it, you would swear to yourself that the curve happens to be "going up" at that point. Think about that for a second: the function isn't differentiable - it isn't "going" anywhere at that point!

Replies from: ArthurB
comment by ArthurB · 2009-08-27T19:01:13.043Z · LW(p) · GW(p)

that is because our eyes cannot see nowhere differentiable functions

That is because they are approximated by piecewise linear functions.

Consider that when you look at a "picture" of the Weierstrass function and pick a point on it, you would swear to yourself that the curve happens to be "going up" at that point. Think about that for a second: the function isn't differentiable - it isn't "going" anywhere at that point!

It means that at any point you can't make a linear approximation whose precision increases like the inverse of the scale; it doesn't mean you can't approximate.

Replies from: SforSingularity
comment by SforSingularity · 2009-08-27T20:38:27.419Z · LW(p) · GW(p)

taboo "approximate" and restate.

Replies from: ArthurB
comment by ArthurB · 2009-08-27T21:04:49.937Z · LW(p) · GW(p)

I defined approximate in another comment.

Approximate around x: for every epsilon > 0, there is a neighborhood of x over which the absolute difference between the approximation and the approximated function is always lower than epsilon.

Adding a slope to a small segment doesn't help or hurt the ability to make a local approximation, so continuity is both sufficient and necessary.

Replies from: SforSingularity
comment by SforSingularity · 2009-08-27T21:28:33.478Z · LW(p) · GW(p)

ok, but with this definition of "approximate", a piecewise linear function with finitely many pieces cannot approximate the Weierstrass function.

Furthermore, two nonidentical functions f and g cannot approximate each other. Just choose an x where f(x) ≠ g(x) and an epsilon less than |f(x) - g(x)|; then no matter how small your neighbourhood is, |f(x) - g(x)| > epsilon.

Replies from: ArthurB
comment by ArthurB · 2009-08-27T21:45:04.270Z · LW(p) · GW(p)

ok, but with this definition of "approximate", a piecewise linear function with finitely many pieces cannot approximate the Weierstrass function.

The original question is whether a continuous function can be approximated by a linear function at a small enough scale. The answer is yes.

If you want the error to decrease linearly with scale, then continuous is not sufficient of course.

Replies from: SforSingularity
comment by SforSingularity · 2009-08-27T22:13:00.597Z · LW(p) · GW(p)

The answer is yes.

I think we have just established that the answer is no... for the definition of "approximate" that you gave...

Replies from: ArthurB
comment by ArthurB · 2009-08-27T23:20:12.239Z · LW(p) · GW(p)

Hum no you haven't. The approximation depends on the scale of course.

comment by CronoDAS · 2009-08-27T19:24:18.590Z · LW(p) · GW(p)

Yeah, you're right. I think I needed to say any analytic function, or something like that.

comment by tut · 2009-08-27T11:21:04.092Z · LW(p) · GW(p)

Mathematically he should have said "any C1 function". But if you are measuring with a tolerance level that allows a step function to be called exponential, then we can probably say that any continuous function is analytic too.

comment by Vladimir_Nesov · 2009-08-27T09:38:00.626Z · LW(p) · GW(p)

Which is the origin of many physical laws, since Nature usually doesn't care about the scale at which nonlinear effects kick in, leaving huge areas of applicability for laws based on linear approximation.

comment by [deleted] · 2009-08-27T01:21:07.971Z · LW(p) · GW(p)

My computer is biased toward not running at 100 petahertz and having 70 petabytes of RAM. My brain is biased toward not using so many complicated models that it needs 1 trillion neurons each with 1 million connections and firing up to 10,000 times per second.

And now for something perhaps more useful than sarcasm: it seems to me that people tend to simply come up with the consistent model that is either the easiest one to compute or the simplest one to describe. Are heuristics for inconsistency, such as "exponential growth/decay rarely occurs in nature", quickly spread and often used? How about better approximations such as logistic growth?
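
For what it's worth, a minimal sketch of that comparison (all parameters made up): exponential and logistic growth agree early on, but the logistic curve saturates at the carrying capacity.

```python
import numpy as np

r, K, p0 = 0.03, 10e9, 1e9   # growth rate, carrying capacity, initial population

def exponential(t):
    return p0 * np.exp(r * t)

def logistic(t):
    # Closed-form solution of dp/dt = r * p * (1 - p / K).
    return K / (1 + (K / p0 - 1) * np.exp(-r * t))

for t in (0, 50, 100, 200, 400):
    print(f"t={t:3d}  exponential={exponential(t):.2e}  logistic={logistic(t):.2e}")
```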

Replies from: cousin_it
comment by cousin_it · 2010-08-23T22:46:50.791Z · LW(p) · GW(p)

Hahaha, anthropic Occam's Razor! If a science allows simple theories that can fit in our tiny brains, we call it a good science and observe with satisfaction that it "obeys Occam". If a science doesn't allow simple theories, we call it a bad science and go off to play somewhere else!

Come to think of it, physics seems to be the only science where Occam's Razor actually works. Even math is a counterexample: there's no law of nature saying simple theorems should have short proofs, and easy-to-formulate statements like 4-color or Fermat's last can cause huge explosions of complexity when you try to prove them.

Replies from: None
comment by [deleted] · 2010-09-01T13:14:58.082Z · LW(p) · GW(p)

Occam's razor still applies. If we're looking for the most elegant possible proof of a theorem (whatever that means), any sufficiently short proof is much more likely to be it than any sufficiently long proof. If you want to take a completely wild guess about what statement an unknown theorem proves, you're better off guessing short statements than long ones.

Replies from: cousin_it, cousin_it
comment by cousin_it · 2010-09-01T13:39:13.159Z · LW(p) · GW(p)

If we're looking for the most elegant possible proof of a theorem (whatever that means), any sufficiently short proof is much more likely to be it than any sufficiently long proof.

Could you try to make that statement more precise? Because I don't believe it.

If you take the shortest possible proofs to all provable theorems of length less than N, both the maximum and the average length of those proofs will be extremely (uncomputably) fast-growing functions of N. To see that, imagine Gödel-like self-referential theorems that say "I'm not provable in less than 3^^^^3 steps" or somesuch. They're all true (because otherwise the axiom system would prove a false statement), short and easy to formulate, trivially seen to be provable by finite enumeration, but not elegantly provable because they're true.

Another way to reach the same conclusion: if the "expected length of the shortest proof" were bounded from above by some computable f(N), where N is theorem length in bits, we could write a simple algorithm that determines whether a theorem is provable: check all possible proofs up to length f(N)*2^N (there are at most 2^N candidate theorems of length N, so an average of f(N) caps any single shortest proof at f(N)*2^N). If the search succeeds, say "yes". If the search fails, the shortest proof (if it exists) must be longer than f(N)*2^N, which is impossible because that alone would push the average above f(N). Therefore no shortest proof exists, therefore no proof exists at all, so say "no". But we know that provability cannot be decided by an algorithm, so f(N) must grow uncomputably fast.
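
The same point can be packed into one definition (a standard construction, closely related to the busy-beaver function):

```latex
% f(N): the worst-case shortest-proof length over statements of size <= N.
f(N) \;=\; \max_{\substack{|T| \le N \\ T \text{ provable}}} \; \min \{\, |p| : p \text{ is a proof of } T \,\}

% If f(N) <= g(N) for some computable g, then searching all proofs of
% length up to g(N) would decide provability, which is impossible;
% hence f eventually exceeds every computable function.
```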

comment by cousin_it · 2010-09-01T13:32:51.389Z · LW(p) · GW(p)

If we're looking for the most elegant possible proof of a theorem (whatever that means), any sufficiently short proof is much more likely to be it than any sufficiently long proof.

Could you give a precise meaning to that statement? I can't think of any possible meaning except "if a proof exists, it has finite length", which is trivial. Are short proofs really more likely? Why?

Replies from: wedrifid
comment by wedrifid · 2010-09-01T13:34:50.490Z · LW(p) · GW(p)

More emphasis on the most elegant possible.

Replies from: cousin_it
comment by cousin_it · 2010-09-01T13:48:34.318Z · LW(p) · GW(p)

Sorry for deleting my comment, I got frustrated and rewrote it. See my other reply to grandparent.

Replies from: wedrifid
comment by wedrifid · 2010-09-01T14:27:28.197Z · LW(p) · GW(p)

Could you try to make that statement more precise? Because I don't believe it.

I don't believe it either, by the way.

comment by MendelSchmiedekamp · 2009-08-26T20:40:12.665Z · LW(p) · GW(p)

Generally (and therefore somewhat inaccurately) speaking, one way our brains seem to handle the sheer complexity of computing in the real world is a tendency to simplify the information we gather.

In many cases these sorts of extremely simple models didn't start that way. They may have started with more parameters and complexity. But as a model is repeated, explained, and applied, it becomes, in effect, simpler. The example begins to represent the entire model, rather than serving to show only a piece of it.

Technically, the exponential radioactive decay model for the radioactivity of a mixture has most of the pieces you describe fairly directly. But this hardly means they will be appropriately applied, or that they will be available when we are thinking of how to use the model. We need to fight the simplification effect to be able to make our models more nuanced and detailed - even though they are still almost certainly lossy compressions of the facts, observations, and phenomena they were built from.

On the other hand, the simplification serves its purpose too: if we devoted unlimited cognitive resources to a model, we would risk never actually reaching a decision from it.

Replies from: fburnaby, taw
comment by fburnaby · 2009-08-27T01:28:06.275Z · LW(p) · GW(p)

So pretty much, this: http://en.wikipedia.org/wiki/Medawar_zone

Replies from: MendelSchmiedekamp
comment by MendelSchmiedekamp · 2009-08-27T17:44:42.622Z · LW(p) · GW(p)

No. The Medawar zone is more about scientific discoveries as marketable products to the scientific community, not about the cultural and cognitive pressures within those communities that affect how those products are used as they become adopted.

Different phenomena, although there are almost certainly common causes.

comment by taw · 2009-08-26T21:16:51.322Z · LW(p) · GW(p)

If the errors were a few percent randomly up or down it wouldn't matter, but the inaccuracy is not tiny: over long timescales it's many orders of magnitude, and almost always in the same direction - growth and decay are slower over the long term than exponential models predict.

Replies from: MendelSchmiedekamp
comment by MendelSchmiedekamp · 2009-08-27T04:44:28.840Z · LW(p) · GW(p)

Oh yes, but it's not just a predilection for simple models in the first place; there's also a tendency to culturally and cognitively simplify the model we access and use - even if the original model had extensions to handle this case, and even at the cost of orders of magnitude of error.

Of course, sometimes it may be worth computing, in a very short amount of time, an estimate that is (unknown to you) orders of magnitude off. Certainly, if the impact of the estimate is delayed and subtle, less conscious trade-offs may factor in between the cognitive effort of accessing and using a more detailed model and the consequences of error. Yet another form of akrasia.