# How valuable is it to learn math deeply?

post by JonahS (JonahSinick) · 2013-09-02T18:01:13.433Z · score: 23 (23 votes) · LW · GW · Legacy · 79 comments

## Contents

- My experience learning math deeply
- How generalizable is this?
- 79 comments

I've been wondering how useful it is for the typical academically strong high schooler to learn math deeply. Here by "learn deeply" I mean "understanding the concepts and their interrelations" as opposed to learning narrow technical procedures exclusively.

## My experience learning math deeply

When I started high school, I wasn't interested in math and I wasn't good at my math coursework. I even got a D in high school geometry, and had to repeat a semester of math.

I subsequently became interested in chemistry, and I thought that I might become a chemist, and so figured that I should learn math better. During my junior year of high school, I supplemented the classes that I was taking by studying calculus on my own, and auditing a course on analytic geometry. I also took physics concurrently.

Through my studies, I started seeing the same concepts over and over again in different contexts, and I became versatile with them, capable of fluently applying them in conjunction with one another. This awakened a new sense of awareness in me, of the type that Bill Thurston described in his essay Mathematics Education:

Mathematics is like a flight of fancy, but one in which the fanciful turns out to be real and to have been present all along. Doing mathematics has the feel of fanciful invention, but it is really a process of sharpening our perception so that we discover patterns that are everywhere around.

I understood the physical world, the human world, and myself in a way that I had never before. Reality seemed full of limitless possibilities. Those months were the happiest of my life to date.

More prosaically, my academic performance improved a lot, and I found it much easier to understand technical content (physics, economics, statistics etc.) ever after.

So in my own case, learning math deeply had very high returns.

## How generalizable is this?

I have an intuition that many other people would benefit a great deal from learning math deeply, but I know that I'm unusual, and I'm aware of the human tendency to implicitly assume that others are similar to us. So I would like to test my beliefs by soliciting feedback from others.

Some ways in which learning math deeply can help are:

- **Reduced need for memorization (while learning math).** When you understand math deeply, you see how many different mathematical problems are special cases of a single more general problem, so that in order to remember how to do all of the problems, it suffices to remember the solution to that more general problem. This reduces the cognitive load of doing math relative to what it would be if one was considering each individual problem in isolation. When I taught calculus to freshmen at University of Illinois, I got the impression that many of the students studied for tests by trying to memorize all of the homework problems individually. There were too many homework problems to memorize, so this didn't work very well. Had they learned the material on a deep level, they wouldn't have had this problem.
- **Ability to apply knowledge in novel contexts (that require mathematical reasoning).** When you understand general mathematical principles, you can apply mathematical knowledge to tackle mathematical problems that you've never seen before. This contrasts with mathematical knowledge that's restricted to knowledge of how to solve specified problems.
- **Higher retention of (mathematical) material.** Cognitive psychologists have found that students retain information better when they engage in "deep level processing" rather than "shallow level processing" (see the notes on Video 2 of Stephen Chew's "How to Get the Most Out of Studying" video series). Developing deep understanding of math reduces need to review mathematical material when one needs to know it for future units and courses (whether within math or adjacent to math). This cuts down on the amount of study time necessary to master later material.
- **Developing better general reasoning skills (across domains).** Learning math deeply is closely connected with developing mathematical reasoning skills. Distilling general principles from special cases involves abstract reasoning. In the other direction, when you understand general principles, it makes mathematical reasoning feel a lot less cumbersome, which incentivizes one to do more of it (relative to the counterfactual). Mathematical reasoning ability may be transferable to reasoning ability in other contexts, so that learning math deeply builds general reasoning skills.

Some arguments against learning math deeply being useful are:

- **It may be too hard.** Sometimes when I suggest that learning math deeply is helpful, people respond by saying that most people aren't capable of learning abstract concepts with enough ease so that it makes sense for them to try to learn math deeply rather than just memorizing how to do specific problems. This is an ill-defined claim, but it can be made precise by specifying a population and a given level of mathematical abstraction.
- **The span of the payoff may be too short.** For people who won't go on to take many math courses, the benefits of reduced future study time and higher retention might not be worth the upfront investment of learning math deeply.
- **Mathematical reasoning may not be very transferable.** A counterpoint to the "developing better reasoning skills" point above: it's known that transfer of learning from one domain to another is often very low. So learning mathematical reasoning skills may not be an efficient way of developing reasoning skills that can be used in the context of one's career or personal life.

I'd be grateful to anyone who's able to expand on these three considerations, or who offers additional considerations against the utility of learning math deeply. I would also be interested in any anecdotal evidence about benefits (or lack thereof) that readers have received from learning math deeply.

## 79 comments

Comments sorted by top scores.

Reason to learn math deeply: by forcing you to master alternating quantifiers, it expands your ability to understand and handle complex arguments.

This falls, possibly, under your "developing better general reasoning skills", but I would stress it separately, because I think it's an especially transferable skill that you get from learning rigorous math. Humans find chains of alternating quantifiers (statements like "for every x, there exists y, such that for every z...") very difficult to process. Even at length 2, people without training often confuse the meanings of forall-exist and exist-forall. To get anywhere in rigorous math, a student needs to confidently handle chains of length 4-5 without confusion or undue mental strain. This is drilled into the student during the first 1-2 years of undergraduate rigorous math, starting most notably with the epsilon-delta formalism in analysis. The reason this formalism is notoriously difficult for many students to master is precisely that it trains and drills longer chains of quantifiers than the students have hitherto been exposed to.
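To make the forall-exist versus exist-forall distinction concrete, here is a minimal sketch over a small finite domain. The helper names `forall_exists` and `exists_forall` and the toy predicate are illustrative assumptions, not anything from the comment itself.

```python
# Toy illustration of how swapping two quantifiers changes the meaning.
domain = range(5)


def forall_exists(pred, xs, ys):
    """True if for every x there exists some y with pred(x, y)."""
    return all(any(pred(x, y) for y in ys) for x in xs)


def exists_forall(pred, xs, ys):
    """True if there exists a single y such that pred(x, y) holds for every x."""
    return any(all(pred(x, y) for x in xs) for y in ys)


def equals(x, y):
    """Hypothetical predicate: y is the same number as x."""
    return x == y


# Every x has its own matching y, but no single y matches all x at once.
print(forall_exists(equals, domain, domain))  # True
print(exists_forall(equals, domain, domain))  # False
```

The two functions differ only in which quantifier is on the outside, which is exactly the kind of difference that is easy to lose track of in a longer chain.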

(other math-y subjects have their own analogues; for example, I think the chief reason the pumping lemma is taught in CS education is to force the same sort of training)

The benefit you get is not merely the ability to easily handle statements of this sort in other, non-math fields. It seems to be (personal speculation, no hard data) more general: your ability to understand complex descriptions and arguments with multiple "points of view" improves. Understanding chains of quantifiers demands calmly keeping track of several interrelated facts in your short "reasoning memory", so to speak. With this ability improved, it's easier to keep track of complex interrelated he said-she said-they said scenarios, what-if-x-then-y-then-z-but-not-w scenarios, and so on.

Reasons not to learn math deeply, assuming you're not a research mathematician or going to become one:

- It's effectively bottomless (unless you actually become a mathematician, and even then you remain mostly ignorant of many subfields). That is, you'll never say "OK, *now* I understand the deep principles of why math is the way it is". Rather, you'll always be aware there's fascinating depth beyond what you already know. This can be frustrating.
- If you don't use it, you'll forget it in a few years. Maybe you'll remember the basic definitions, but you'll likely forget the deep results, and definitely their proofs. If you were emotionally invested in their beauty and thought them a valuable part of your mindscape, this will frustrate and dismay you.
- Unlike with almost any other scientific discipline, you likely won't be able to construct handwavy cool laymen-accessible metaphors of the deep stuff you found so fascinating. You can chat with outsiders about neutrons, chemical solutions, DNA or phonemes, but not about L-functions.

I do agree with the part about the quantifiers. This is, at least in theory, one of the reasons that we are supposed to teach the epsilon-delta definition of limit in college calculus courses. I generally try to frame it as a game between the prover and the skeptic, see for instance the description here. One of the main difficulties that students have with the definition is staying clear of whose strategic interest lies in what, for instance, who should be the one picking the epsilon, and who should be the one picking the delta (the misconceptions on the same page highlight common mistakes that students make in this regard).

Incidentally, this closely connects with the idea of steelmanning: in a limit proof or other mathematical proof showing that a definition involving quantifiers is satisfied, one needs to demonstrate that for all the moves of one's opponent, one has a winning strategy to respond to the *best* move the opponent could possibly make.
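The prover-versus-skeptic framing can be sketched in code. The example below is an illustration under stated assumptions: it uses the limit of 3x as x approaches 2 (which is 6), a prover strategy of delta = epsilon / 3, and a skeptic who only tries a finite grid of points, so it demonstrates the game without proving anything. All function names are hypothetical. Exact fractions are used to avoid floating-point rounding at the boundary.

```python
from fractions import Fraction


def f(x):
    return 3 * x


def prover_delta(epsilon):
    """Prover's strategy: answer any epsilon > 0 with delta = epsilon / 3."""
    return epsilon / 3


def skeptic_wins(epsilon, delta, x, c=Fraction(2), limit=Fraction(6)):
    """The skeptic wins with x if 0 < |x - c| < delta but |f(x) - limit| >= epsilon."""
    return 0 < abs(x - c) < delta and abs(f(x) - limit) >= epsilon


def skeptic_best_effort(epsilon, delta, steps=1000):
    """Try a grid of x values strictly inside (c - delta, c + delta)."""
    c = Fraction(2)
    xs = (c + delta * Fraction(k, steps) for k in range(-steps + 1, steps))
    return any(skeptic_wins(epsilon, delta, x) for x in xs)


# The skeptic moves first by picking epsilon; the prover replies with delta.
for epsilon in (Fraction(1), Fraction(1, 10), Fraction(1, 1000)):
    print(epsilon, skeptic_best_effort(epsilon, prover_delta(epsilon)))

# A prover who ignores epsilon and always answers delta = 1 can be beaten.
print(skeptic_best_effort(Fraction(1, 10), Fraction(1)))
```

The order of moves mirrors the order of the quantifiers: the skeptic's epsilon comes first, and the prover's delta is allowed to depend on it.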

The first time I taught the epsilon-delta definition in a (non-honors) calculus class at the University of Chicago, even though I did use the game setup, almost nobody understood it. I've had considerably more success in later years, and it seems like students get something like 30-50% of the underlying logic on average (I'm judging based on their performance on hard conceptual multiple choice questions based on the definition).

Couldn't you develop the same skill more efficiently by just studying formal logic?

Probably? But the number of people who study formal logic to the required degree is dwarfed by the number of people who need this skill.

Also, mathematical logic, studied properly, is *hard*. It forces you to conceptualize a clean-cut break between syntax and semantics, and then to learn to handle them separately and jointly. That's a skill many *mathematicians* don't have (to be fair, not because they couldn't acquire it, they absolutely could, but because they never found it useful).

I have a personal story. Growing up I was a math whiz, I loved popular math books, and in particular logical puzzles of all kinds. I learned about Gödel's incompleteness from Smullyan's books of logical riddles, for example. I was also fascinated by popular accounts of set theory and the independence of the Continuum Hypothesis. In my first year at college, I figured it was time to learn this stuff rigorously. So, independent of any courses, I just went to the math library and checked out the book by Paul Cohen where he sets out his proof of the independence of CH from scratch, including first-order logic and axiomatic set theory from first principles.

I failed *hard*. It felt so weird. I just couldn't get through. Cohen begins with setting up rigorous definitions of what logical formulas and sentences are, I remember he used the term "w.f.f.-s" (well-formed formulas), which are defined by structural induction and so on. I could understand every word, but it was as if my mind went into overload after a few paragraphs. I couldn't process all these things together and understand what they *mean*.

Roll forward maybe a year or 1.5 years, I don't remember. I'm past standard courses in linear algebra, analysis, abstract algebra, a few more math-oriented CS courses (my major was CS). I have a course in logic coming up. Out of curiosity, I pick up the same book in the library and I am *blown away* - I can't understand what it was that stopped me before. Things just make sense; I read a chapter or two leisurely until it gets hard again, but a different kind of hard, deep inside set theory.

After that, whenever I opened a math textbook and saw in the preface something like "we assume hardly any prior knowledge at all, and our Chapter 0 recaps the very basics from scratch, but you will need some mathematical maturity to read this", I understood what they meant. Mathematical maturity - that thing I didn't have when I tried to read a math logic book that ostensibly developed everything from scratch.

I think this notion of "mathematical maturity" is hard to grasp for a beginning student.

I had a very similar experience. Introduction to (the Russian edition of) Fomenko & Fuchs "Homotopic topology" said that "later chapters require higher level of mathematical culture". I thought that this was just a weasel-y way to say "they are not self-contained", and disliked this way of putting it as deceptive. Now, a few years later I know fairly well what they meant (although, alas, I still have not read those "later chapters").

I wonder if there is a way to explain this phenomenon to those who have not experienced it themselves.

**[deleted]** · 2013-09-24T12:28:19.811Z · score: 0 (0 votes) · LW(p) · GW(p)

Fomenko

Interesting off-topic fact about Fomenko: I'd read his book on symplectic geometry, and then discovered he's a massive crackpot. That was a depressing day.

He is a massive crackpot in "pseudohistory", but he is also a decent mathematician. His book on symplectic geometry is probably fine, so unless you are generally depressed by the fact that mathematicians can be crackpots in other fields, I don't think you should be too depressed.

**[deleted]** · 2013-09-25T12:26:10.583Z · score: 3 (3 votes) · LW(p) · GW(p)

so unless you are generally depressed by the fact that mathematicians can be crackpots in other fields,

Yes.

Your point 1 resonates with me. Learning math has steadily increased my effectiveness as a scientist/engineer/programmer. Sometimes just knowing a mathematical concept exists and roughly what it does is enough to give you an edge in solving a problem - you can look up how to do it in detail when you need it. However, despite the fact that life continues to demonstrate to me the utility of knowing the math that I've learned, this has failed to translate into an impulse within me to actively learn more math. Pretty much at any time in the past I've felt like I knew "enough" math, and yet always see a great benefit when I learn more. You'd think this would sink in, you'd think I would start learning math for its own sake with the implicit expectation that it will very probably come in handy, but it hasn't.

Thanks for the thoughtful and insightful comment. I really appreciate it :)

Random thoughts:

The decision that smart high school students should take calculus rather than statistics (in the U.S.) strikes me as pretty seriously misguided. Statistics has broader uses.

I got through four semesters of engineering calculus; that was the clear limit of my abilities without engaging in the troublesome activity of "trying." I use virtually no calculus now, and would be fine if I forgot it all (and I'm nearly there). I think it gave me no or almost no advantages. One readthrough of Scarne on Gambling (as a 12-year-old) gave me more benefit than the entirety of my calculus education.

I ended up as the mathiest guy around in a non-math job. But it's really my facility with numbers that makes it; my wife (who has a master's degree in math) says what I am doing is arithmetic and not math, but very fast and accurate arithmetic skills strike me as very handy. (As a prosecutor, my facility with numbers comes as a surprise to expert witnesses. Sometimes, they are sad afterward.)

Anecdotally, math education may make people crazy or attract crazy people disproportionately. I think that pursuit of any topic aligns your brain to think in a way conducive to that topic.

My tentative conclusions are that advanced statistics has uses in understanding the world; other serious math is fun but probably not optimal use of time, unless it's really fun. "Really fun," has value. This conclusion is based on general observation, and is hardly scientific; I may well be wrong.

I agree that basic probability and statistics is more practically useful than basic calculus, and should be taught at the high-school level or even earlier. Probability is fun and could usefully be introduced to elementary-school children, IMO.

However, more advanced probability and stats stuff often requires calculus. I have a BS in math and many years of experience in software development (IOW, not much math since college). I am in a graduate program in computational biology, which involves more advanced statistical methods than I'd been exposed to before, including practical Bayesian techniques. Calculus is used quite a lot, even in the definition of basic probabilistic concepts such as expectation of a random variable. Anything involving continuous probability distributions is going to be a lot more straightforward if approached from a calculus perspective. I, too, had four semesters of calculus as an undergrad and had forgotten most of it, but I found it necessary to refresh intensely in order to do well.
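As a small illustration of calculus sitting inside a basic probabilistic definition, the expectation of a continuous random variable is the integral of x times its density. The sketch below approximates that integral numerically; the choice of an exponential distribution with rate 2, the integration range, and the function names are all assumptions made for the example.

```python
import math


def expectation(pdf, lo, hi, steps=100_000):
    """Approximate E[X] = integral of x * pdf(x) dx over [lo, hi] (midpoint rule)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += x * pdf(x) * width
    return total


rate = 2.0  # hypothetical rate parameter


def exponential_pdf(x):
    """Density of an exponential distribution with the rate above."""
    return rate * math.exp(-rate * x)


# The density is negligible beyond x = 50, so the integral is truncated there.
approx = expectation(exponential_pdf, 0.0, 50.0)
print(approx)  # should be close to the closed-form value 1 / rate
```

Integration by parts gives the closed form 1 / rate directly, which is the kind of step that is hard to follow without calculus.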

"Computational biology," sounds really cool. Or made up. But I'm betting heavily on "really cool." (Reads Wikipedia entry.) Outstanding!

Anyway, I concede that you are right that calculus has uses in advanced statistics. Calculus *does* make some problems easier; I'd like calculus to be used as a fuel for statistics rather than almost pure signaling. I actually know people who ended up having real uses for some calculus, and I've tried to stay fluent in high school calculus partly for its rare use and partly for the small satisfaction of not losing the skill. And probably partly for reasons my brain has declined to inform me of.

I nonetheless generally stand by my statement that we're wasting one hell of a lot of time teaching way too much calculus. So we basically agree on all of this; I appreciate your points.

Calculus has value for signalling intelligence to colleges. I'm told that for professions (e.g. economists) that do use calculus, real analysis plays more or less the same role: a rarely used signal of intelligence.

It seems to me that making it mandatory for everyone to learn math beyond percents and simple fractions is even less useful than the old approach of making ancient Greek and Latin mandatory.

When I first read your comment, I thought, "that's not obvious to me". Then a few seconds later I realized: less useful *given the opportunity cost of not learning the best possible alternatives*. And while math is useful (so are Greek and Latin), there are much better alternatives for mandatory high-school education, basic programming for one.

Some lessons that I've learned from attempting to solve hard and tricky math problems, which I've found can be applied to problem-solving in general: (a) Focus hard and listen to confusions; (b) Your tendency to give up occurs much before the point at which you should give up; (c) Don't get stuck on one approach, keep trying many different approaches and ideas; (d) Find simpler versions of your problem; (e) Don't beat yourself up over stupid mistakes; (f) Don't be embarrassed to get help.

But of course I don't mean to say that learning math is the only way or the best way to learn these techniques.

I agree that math can teach all these lessons. It's best if math is taught in a way that encourages effort and persistence.

One problem with putting *too* much time into learning math deeply is that math is much more precise than most things in life. When you're good at math, with work you can usually become completely clear about what a question is asking and when you've got the right answer. In the rest of life this isn't true.

So, I've found that many mathematicians avoid thinking hard about ordinary life: the questions are imprecise and the answers may not be right. To them, mathematics serves as a *refuge* from real life.

I became very aware of this when I tried getting mathematicians interested in the Azimuth Project. They are often sympathetic but feel unable to handle the problems involved.

So, I'd say math should be done in conjunction with other 'vaguer' activities.

So, I've found that many mathematicians avoid thinking hard about ordinary life: the questions are imprecise and the answers may not be right. To them, mathematics serves as a refuge from real life.

Have you noticed any difference between pure mathematicians and theoretical physicists in this regard?

**[deleted]** · 2013-09-03T18:06:33.525Z · score: 2 (2 votes) · LW(p) · GW(p)

Thanks for pointing me toward the Azimuth Project.

I used to follow your "this week" blog for a while, but I must have lost track of it a few years ago. Must have been before this showed up on the radar.

Yes, but it is genuinely the case that imprecision and low quality of answers indicate lower utility of an activity, or lower gains due to mathematical skill. Furthermore, what you are saying contradicts existence of mathematicians who did contribute to philosophy (e.g. Godel). edit: I mostly meant, the stories of such - it seems to me that mathematicians who come up with important insights not so rarely try to apply them.

what you are saying contradicts existence of mathematicians who did contribute to philosophy

It doesn't; "many mathematicians avoid..." doesn't imply that all do.

Well, not existence per se, that was very poor wording on my part, but specific circumstances of their contribution. I think that whenever a mathematician has relevant novel insights, they not so rarely apply them to various relevant problems including 'fuzzy' ones. Or, when they don't, applied mathematicians do.

It's just that novel mathematical concepts are very difficult to generate in general and even more difficult to generate starting from some broad problem statement.

I wanted to thank you for this. I read this post a few weeks ago, and while it was probably a matter of like two minutes for you to type it up, it was extremely valuable to me.

Specifically a paraphrase of point B, "The point where you feel like you should give up is way before the point at which you should ACTUALLY give up" has become my new mantra in learning maths, and since I do math tutoring when the work's there, I'm passing this message on to my students as well.

So, thank you very much for this advice.

A counterpoint to the "developing better reasoning skills" point above: it's known that transfer of learning from one domain to another is often very low.

In my anecdotal experience, math is the most transferable of all skills I've learnt.

Add physics to that.

**[deleted]** · 2013-09-20T09:19:20.282Z · score: 1 (1 votes) · LW(p) · GW(p)

One of the barriers I run into when I delve into physics is that I have a very rationalist approach to math. I hate terminology and I want as little of it as possible in my reasoning. Physics has rather high barriers in that way in that academic physicists don't really like mathematical rigour, and don't precisely specify, say, the abstract algebraic axioms of the structures they are using. But when I get to a point of being able to specify what structure is behind a physical theory, I can usually intuit it readily.

Physics is domain knowledge compared to mathematical reasoning ability.

**[deleted]** · 2013-09-21T13:57:19.721Z · score: 2 (2 votes) · LW(p) · GW(p)

I have a very rationalist approach to math.

What does this mean? You only attack problems with high VoI?

**[deleted]** · 2013-09-22T20:52:53.250Z · score: -2 (2 votes) · LW(p) · GW(p)

I have a bad habit of stating things and then explaining them. I meant it is rationalist in that I:

- Hate terminology. Give me axioms, definitions and theorems; then we can discuss them in words later.
- Build up my intuitions, and especially weed out the useless ones. I don't really do proofs if it is not necessary, and sometimes I even skimp on the formal details, using my connectionist intelligence to its full potential.
- I try to explore as much as possible, and look for people to learn from. Proving things is a question of strategy, and many a Nobel laureate has had mentors who were Nobel laureates too.

**[deleted]** · 2013-09-22T21:25:34.784Z · score: 2 (2 votes) · LW(p) · GW(p)

How are those things particularly rationalist? Sounds to me you're just using the word in some inflationary sense.

**[deleted]** · 2013-09-23T09:50:30.213Z · score: -2 (2 votes) · LW(p) · GW(p)

The "A Human's Guide to Words" sequence and the concept of "words should refer to something" pertain to the first item.

The Quantum Mechanics sequence and the concept of "It all adds up to normality" pertains to the second item.

The third is based on an inversion of the idea behind the Sequences in general, that I need giants to stand on the shoulders of, and I forget exactly where it says that the most valuable skills in maths are non-verbal.

These three points I have on reflexive gedankenexperiment and discourse with more experienced CS and mathematics students, attempted to disprove and I have found that this is difficult, long winded and that the counterarguments are weak.

I also recognize that maths have tremendous instrumental value in the work I plan to do in the future.

All of this is basic bayesian skills, and I have met several people, CS, maths and physics students who were doing things adverse to understanding maths, which could be fixed by implementing any of the above strategies.

**[deleted]** · 2013-09-23T16:43:44.658Z · score: 3 (3 votes) · LW(p) · GW(p)

The word "rational" does not mean "was discussed in the Sequences" and certainly doesn't mean "was analogous to something that was discussed in the Sequences".

I relish the irony of your belief that "words should refer to something" when you readily inflate the meaning of "rational" and "bayesian".

These three points I have on reflexive gedankenexperiment and discourse with more experienced CS and mathematics students, attempted to disprove and I have found that this is difficult, long winded and that the counterarguments are weak.

This indicates to me that you've assumed I'm criticizing the substance of your advice. This is a false assumption.

**[deleted]** · 2013-10-10T12:09:25.869Z · score: -2 (2 votes) · LW(p) · GW(p)

Great. Now you have really confused me.

Do we agree that you can implement more or less winning strategies as a member of the species homo sapiens, congruent with the utility-concept of 'making the world a better place', and that there is an absolute ranking criterion on how good said strategies are?

Do we agree that a very common failure mode of homo sapiens is statistical biases in their bayesian cognition, and that these biases have clear causal origin in our evolutionary history?

Do we agree that said biases hamper homo sapiens' ability to implement winning strategies in the general case?

Do we agree that the writings of Eliezer Yudkowsky and the content of this site as a whole describe ways to partially get around these built in flaws of homo sapiens?

I am fairly confident that a close reading of my comments will find the interprentation of 'rational' to be synonymous with 'winning-strategy-implementation', and 'bayesian' to be synonymous with (in the case that it refers to a person) 'lesswrong-site-member/sequence-implementor/bayes-conspiracist' or (in the case that it refers to cognitive architectures) 'bayesian inference' and I am tempted to edit them as such.

**[deleted]** · 2013-10-11T13:01:22.643Z · score: 1 (1 votes) · LW(p) · GW(p)

I am nonplussed at your attempt to lull readers into agreeing with you by asking a lot of rhetorical questions. It'd have been less wrong to post just the last paragraph:

I am fairly confident that a close reading of my comments will find the interprentation [sic] of 'rational' to be synonymous with 'winning-strategy-implementation', and 'bayesian' to be synonymous with (in the case that it refers to a person) 'lesswrong-site-member/sequence-implementor/bayes-conspiracist' or (in the case that it refers to cognitive architectures) 'bayesian inference' and I am tempted to edit them as such.

The missing link in the argument here is how your examples are, in fact, winning strategies. You claimed some superficial resemblance to things in the sequences, and that you did better than some small sample of humans.

I disapprove of this expanded definition of "bayesian" on the basis that it conflates honest mathematics with handwaving and specious analogies. For example, "it all adds up to normality" is merely a paraphrase of the correspondence principle in QM and does not have any particular legislative force outside that domain.

**[deleted]** · 2013-10-13T00:48:15.705Z · score: 0 (0 votes) · LW(p) · GW(p)

I'll concede the point, partially because I tire of this discourse.

If mathematical details matter, they should be specified (or be clear anyway - e.g. you don't define "real numbers" in a physics paper). Physics can need some domain knowledge, but knowledge alone is completely useless - you need the same general reasoning ability as in mathematics to do anything (both for experimental and theoretical physics).

In fact, many physics problems get solved by reducing them to mathematical problems (that is the physics part) and then solving those mathematical problems (still considered as "solving the physical problem", but purely mathematics).

Logic even more so.

I'd start with an anecdote from the local practice here, with regard to learning math shallowly vs with an understanding from the ground up:

It is fairly common to derive supposed ultra low prevalences of geniuses in populations with lower mean IQs.

For example, an IQ of 160 or more is 5 SDs from 85, but 4 SDs from 100, so the rarity is roughly 1/3,483,046 vs 1/31,560, for a huge ratio of about 110 times the prevalence of genius in the population with the mean IQ of 100.
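The arithmetic behind this naive calculation can be reproduced with a short sketch. It assumes a pure Gaussian model with a standard deviation of 15 in both populations, which is the assumption under dispute in this comment; the function names are illustrative. The results should land close to the quoted figures, though the exact digits depend on rounding.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def rarity(threshold, mean, sd=15.0):
    """Return N such that about 1 in N people exceed `threshold` under a Gaussian model."""
    return 1.0 / upper_tail((threshold - mean) / sd)


one_in_low = rarity(160, 85)    # threshold is 5 SDs above a mean of 85
one_in_high = rarity(160, 100)  # threshold is 4 SDs above a mean of 100

print(round(one_in_low), round(one_in_high), one_in_low / one_in_high)
```

The ratio comes out on the order of 100, which is what makes the pure-Gaussian extrapolation look so dramatic.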

This is not how it works; the higher means are a result of decreased prevalence of negative contributors - iodine deficiency, perhaps some alleles, etc. For a very extreme example, suppose that you have a population which is like the US baseline, but with 50% prevalence of iodine deficiency. The mean IQ could well be 85, but the ratio of prevalences at high IQs will still be about 2, rather than increasing exponentially with the deviation. Of course, in practice it won't be as clear-cut as this; the example is just to illustrate the point.
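The extreme example above can be checked numerically with a two-component mixture. The sketch assumes that the unaffected half is Gaussian with mean 100 and the affected half is Gaussian with mean 70 (so that the overall mean is 85), both with a standard deviation of 15. Those specific numbers and the function names are hypothetical choices made for illustration only.

```python
import math


def tail(threshold, mean, sd=15.0):
    """P(X > threshold) for a Gaussian with the given mean and standard deviation."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def mixture_tail(threshold, components):
    """Tail probability of a mixture given as (weight, mean) pairs, all with sd = 15."""
    return sum(weight * tail(threshold, mean) for weight, mean in components)


baseline = [(1.0, 100.0)]
half_impaired = [(0.5, 100.0), (0.5, 70.0)]  # assumed split; overall mean is 85

for threshold in (130, 145, 160):
    mixture_ratio = mixture_tail(threshold, baseline) / mixture_tail(threshold, half_impaired)
    naive_ratio = tail(threshold, 100.0) / tail(threshold, 85.0)
    print(threshold, round(mixture_ratio, 3), round(naive_ratio, 1))
```

Under these assumptions the mixture ratio stays just below 2 at every threshold, while the ratio from a single shifted Gaussian keeps growing as the threshold rises.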

Figuring things like this out is not so much helped by knowing the concepts as by training and actual practice, and of course, by being trained to know where things like the Gaussian distribution come from, not merely declaratively but procedurally as well. (I'm mostly speaking from the perspective of applied mathematics here).

I heard somewhere that IQ scores are normally distributed by definition, because they are calculated by projecting the measured rank onto the normal distribution with mean 100 and stddev 15. Can't seem to find a reference on Wikipedia though, so maybe that's not true.

IQ distributions are *calibrated* based on a reference sample, such that the reference sample has mean 100 and std 15 and follows a normal distribution. I believe the reference sample is generally British nationals or European Americans, so that interracial comparisons are sensible.

That doesn't mean that the distribution of all test-takers follows a normal distribution with mean 100 and std 15.

Precisely. If you are looking at some third world nation, well, there are all those kids who have various nutritional deficiencies, and their IQs are impaired. The mean is lowered considerably, but that's through introduction of extra variables into the (approximate) sum.

If you don't take that into account and assume that only the mean of the distribution has changed, you get entirely invalid results at the high range due to how rapidly the normal distribution falls off far from the mean (as the exponential of a negative square). For example, if you were to calculate the number of some rare geniuses in the reference population (say, 300 million with a mean of 100 and a standard deviation of 15), and in the world population assuming some lower mean and the same standard deviation, then for a sufficiently rare "genius" you'll get a smaller number of geniuses in the whole world than in that one reference population (which is ridiculous).

edit: which you can see by noting that this with c smaller than b grows as x grows (i.e. ratio of prevalences between two populations grows with distance from the mean).
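The formula originally linked there is not reproduced in this copy. One way to see the claim, assuming two normal densities with means $b$ and $c$ (with $c < b$) and a common standard deviation $\sigma$, is to take their ratio:

```latex
\frac{\exp\!\left(-\dfrac{(x-b)^2}{2\sigma^2}\right)}
     {\exp\!\left(-\dfrac{(x-c)^2}{2\sigma^2}\right)}
= \exp\!\left(\frac{(x-c)^2 - (x-b)^2}{2\sigma^2}\right)
= \exp\!\left(\frac{(b-c)\,(2x - b - c)}{2\sigma^2}\right)
```

Since $b - c > 0$, the exponent is linear and increasing in $x$, so the ratio grows without bound as $x$ moves further above both means.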

The example I'd give here is India, where you have lots of mostly distinct ethnic groups, and so it's reasonable to expect that the true distribution is a mixture of Gaussians. Knowing the Indian average national IQ would totally mislead you on the number of Parsis with IQs of 120 or above, if all you knew about Parsis was that they lived in India.

(It's not clear to me that malnourishment leads to multiple modes, rather than just decreasing the mean while probably increasing the variance, because I think damage due to malnourishment is linear, and it's probably the case that many different levels of severity of malnourishment are roughly equally well represented.)

the true distribution is a mixture of Gaussians

In the limit, the mixture of Gaussians is a Gaussian.

It's not clear to me that malnourishment leads to multiple modes, rather than just decreasing the mean while probably increasing the variance

Theoretically, malnourishment (given that only a part of the population suffers from it) should lead to a negatively skewed distribution. And yes, with a lower mean and higher variance.

In the limit, the mixture of Gaussians is a Gaussian.

Nope. The *sum* of Gaussian random variables is a Gaussian random variable, but a mixture Gaussian model is a very different thing. (In particular, mixture Gaussians are useful for modeling because their components are easy to deal with, but if you have infinite mixtures you can faithfully represent an arbitrary distribution.)
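To make the sum-versus-mixture distinction concrete, here is a small simulation sketch. The component means of -3 and 3, the fair-coin mixing weight, and the fixed seed are arbitrary choices for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
N = 20_000

# Sum: each sample adds one draw from N(-3, 1) to one draw from N(3, 1).
# The result is a single Gaussian, N(0, 2), peaked at zero.
sums = [rng.gauss(-3, 1) + rng.gauss(3, 1) for _ in range(N)]

# Mixture: each sample comes from N(-3, 1) or N(3, 1), chosen by a fair coin.
# The result is bimodal, with very little mass near zero.
mixture = [rng.gauss(-3 if rng.random() < 0.5 else 3, 1) for _ in range(N)]

def share_near_zero(samples, width=1.0):
    """Fraction of samples with absolute value below `width`."""
    return sum(abs(x) < width for x in samples) / len(samples)

print(f"sum:     {share_near_zero(sums):.3f} of samples within 1 of zero")
print(f"mixture: {share_near_zero(mixture):.3f} of samples within 1 of zero")
```

The sum piles up around zero, while the mixture has a trough there, so the two cannot be the same distribution.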

Theoretically, malnourishment (given that only a part of the population suffers from it) should lead to a negatively skewed distribution.

Yep, I should have mentioned that also.

(It's not clear to me that malnourishment leads to multiple modes, rather than just decreasing the mean while probably increasing the variance, because I think damage due to malnourishment is linear, and it's probably the case that many different levels of severity of malnourishment are roughly equally well represented.)

Not everyone's malnourished, though - a significant number of people are into diminishing returns, nutrition wise. It's very nonlinear in the sense that as long as there's adequate nutrition, it plateaus - access to more nutrition does not improve anything.

Sorry, is your claim that IQ does not follow a normal distribution in the general population?

Sorry, is your claim that IQ does not follow a normal distribution in the general population?

It seems likely to me that this is actually the case, especially when you look at the tails, which is what he was discussing. The existence of things like Down's syndrome means that the lower part of the tail certainly doesn't look like you would expect from a solely additive model, and that might also be true at the upper end of the distribution.

(It's also much more likely to be the case if you want to use some other measure of intelligence which is scaled to be linear in predictive ability for some task, rather than designed to be a normal distribution.)

This should be straightforwardly testable by standard statistics.

Given the empirical distribution of IQ scores and given the estimated measurement error (which depends on the score -- scores in the tails are much less accurate) one should be able to come up with a probability that the empirical distribution was drawn from a particular normal.

Although I don't know if I'd want to include cases with clear brain damage (e.g. Down's) in the population for this purpose.

This should be straightforwardly testable by standard statistics.

Agreed.

Given the empirical distribution of IQ scores

If you have a source for one of these, I would love to see it. I haven't been able to find any, but I also haven't put on my "I'm affiliated with a research university" hat and emailed people asking for their data, so it might be available.

estimated measurement error (which depends on the score -- scores in the tails are much less accurate)

Agreed that this should be the case, but it's not clear to me how to estimate measurement error besides test-retest variability, which can be corrupted by learning effects unless you wait a significant time between tests. I think Project Talent only tested its subjects once, but unless you have something of that size which tests people during adulthood several times you're unlikely to get sufficient data to have a good estimate here.

This should be straightforwardly testable by standard statistics

Agreed.

That may require prohibitively large sample sizes, i.e. not be testable.

With regards to measuring g, and high IQs, you need to keep in mind regression towards the mean, which becomes fairly huge at the high range, even for fairly strongly correlated variables.

Another, more subtle issue is that proxies generally fare even worse far from the mean than you'd expect from regression alone. I.e. if you use grip strength as a proxy for how quickly someone runs a mile, that'll obviously work great for your average person, but at the very high range - professional athletes - you could obtain a negative correlation because athletes with super strong grip - weightlifters maybe? - aren't very good runners, and very good runners do not have extreme grip strength. It's not very surprising that folks like Chris Langan are at very best mediocre crackpots rather than super-Einsteins.

That may require prohibitively large sample sizes, i.e. not be testable.

At least for certain populations the sample sizes should be pretty large. Also a smaller-than-desired sample size doesn't mean it's not testable, all it means is that your confidence in the outcome will be lower.

proxies generally fare even worse far from the mean than you'd expect from regression alone

Yes, I agree. The tails are a problem in general, estimation in the tails gets very fuzzy very quickly.

Yes, I agree. The tails are a problem in general, estimation in the tails gets very fuzzy very quickly.

And it seems to me that having studied math complete with boring exercises could help with understanding of that somewhat... all too often you see people fail to even ballpark just *how much* the necessary application of regression towards the mean affects the rarity.
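As a ballpark of the kind meant here, the sketch below assumes the measured score and the underlying trait are bivariate normal with correlation `r`, so that the expected trait level given a measured z-score is `r` times that z-score. The value `r = 0.8` is a made-up illustration, not an estimate from any test.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def expected_true_z(measured_z: float, r: float) -> float:
    """Bivariate-normal regression: E[true z | measured z] = r * measured z."""
    return r * measured_z

measured_z = 4.0  # e.g. a measured 160 on a scale with mean 100 and SD 15
r = 0.8           # hypothetical correlation between score and trait

true_z = expected_true_z(measured_z, r)
naive_rarity = 1 / upper_tail(measured_z)  # rarity if the score is taken at face value
regressed_rarity = 1 / upper_tail(true_z)  # rarity of the expected trait level

print(f"expected true z-score: {true_z:.2f}")
print(f"naive rarity:     about 1 in {naive_rarity:,.0f}")
print(f"regressed rarity: about 1 in {regressed_rarity:,.0f}")
```

Even with a fairly strong correlation, the expected trait level is more than an order of magnitude less rare than the face-value score suggests.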

Now that I've started to think about it, the estimation of the measurement error might be a problem.

First we need to keep in mind the difference between precision and accuracy. Re-tests will only help with precision, obviously.

Moreover, given that we're trying to measure *g*, it happens to be unobservable. That makes estimates of accuracy somewhat iffy. Maybe it will help if you define *g* "originally", as the first principal component of a variety of IQ tests...

On the other hand, I think our measurement error estimates can afford to be guesstimates and as long as they are in the ballpark we shouldn't have too many problems.

As to the empirical datasets, I don't have time atm to go look for them, but didn't the US Army and such run large studies at some point? Theoretically the results should be in the public domain. We can also look at proxies (of the SAT/GRE/GMAT/LSAT/etc. kind), but, of course, these are only imperfect proxies.

In any population other than the one for which the test has been normed to follow a normal distribution with mean of 100 and standard deviation of 15, yes, results need not be normally distributed or to have a standard deviation of 15.

When discussing a population with a mean IQ other than 100, it is automatically implied that it is not the population that the test has been normed for.

Do you have any psychometric lit. pointers on cases where e.g. normal goodness of fit tests fail? Is this just standard knowledge in the field?

So, one of the known things is that standard deviation varies by race. For example, both the African American mean and variance are lower than the European American mean and variance.

To the best of my knowledge, few people have actually applied goodness of fit tests to IQ score distributions to check normality.

So, one of the known things is that standard deviation varies by race. For example, both the African American mean and variance are lower than the European American mean and variance.

I don't understand why this is relevant.

I don't understand why this is relevant.

Hm. When I read the great-grandparent earlier, I got the impression it would be helpful to corroborate this claim in the great-great-grandparent:

In any population other than the one for which the test has been normed to follow a normal distribution with mean of 100 and standard deviation of 15, yes, results need not be normally distributed or to have a standard deviation of 15.

Rereading the great-grandparent now, it's not clear to me why I got that impression. (I may have been thinking that the "general population," as it contains distinct subpopulations, will be at best a mixture Gaussian rather than a Gaussian.)

I do agree that private_messaging's claim- that the ratio we see at the tails doesn't seem to follow what would be predicted by the normal distribution- hinges on the right tail being fatter than what the normal distribution predicts. (The mixture Gaussian claim is irrelevant if you've split the general population up into subpopulations that are normally distributed, *unless* the low IQ group contains subpopulations, so it isn't normally distributed. There's some reason to believe this is true for African Americans, for example, if you don't separate out people by ancestry and recency of immigration.)

The data is sparse enough that I would not be surprised if this were the case, but I don't think anyone's directly investigated it, and a few of the investigations that hinge on the thickness of the tails (like Sex Differences in Mathematical Aptitude, which predicts female representation in elite math institutions by looking at the mean and variance of math SAT scores of large populations) seem to have worked well, which is evidence for normality.

Incidentally, is there even any empirical evidence that intelligence is normally distributed in any concrete sense?

I don't think any existing measure could be Gaussian with any sort of accuracy at the tail ends, because there you would need too large a sample size to norm the test, and generally the approximate Gaussian you get from many random additive factors deviates by huge factors from a true Gaussian at the tail ends. The bulk of the norming of a test comes from average people.

Ditto for correlations between IQ and anything. Bulk of reported correlation comes from near the mean.

I was very successful in my early mathematical education. I'd get As with ease, take exams early, enter mathematics competitions, etc. I had a deep understanding despite doing very little work because all the concepts seemed obvious.

I continued in the exact same way and my performance declined to the point where I was struggling to get Cs. I was now meeting concepts that were not intuitively obvious (eg. limits, proofs, complex numbers), and because of my previous success I had not developed any techniques to gain deep understanding of them. I lost all sense of enjoyment of mathematics and convinced myself that it didn't matter because a good CAS could do it all for me.

I have now started learning again, and there's one realization which has made a big difference. As a student I was always told to solve "problems". This is a terrible name and they should really be called "exercises". The questions are obviously not problems because the teacher has the answer right there in his book. If they are problems then the correct way to solve them is to copy somebody else.

Thinking about the questions as "exercises" makes it clear why you're supposed to solve them, and makes clear how much effort you should put into them. It's analogous to physical exercises -- you don't lift a weight just once and declare it solved, and you don't keep lifting the exact same weight when it becomes easy. I now take the exercises seriously and my understanding improves. I am starting to enjoy mathematics again. I wish somebody had explained this when I was a student.

(This discussion doesn't distinguish what could be called the rigorous and post-rigorous levels of skill, and so feels a little off (at least terminologically). At the rigorous level, which seems like what you are talking about, you know how the tools work, and can reassemble them to attack novel problems. At post-rigorous level, which seems like a better referent for "learning math deeply", you've sufficiently exercised intuitive mental models to offload most routine observations to System 1, freeing up conscious attention and allowing more ambitious intuitive inferences. Fluency as opposed to competence.)

Thanks Vladimir!

Why does my post give the impression of talking about the rigorous level?

You are opposing "learning math deeply" to rote memorization of brittle special cases, but the threshold of being able to work with standard tools (for e.g. understanding technical content of physics/statistics courses) is only rigorous level. Moving further requires additional practice/motivation, when you are already capable of using the tools, and that is not separately discussed in the post.

One (wo)man's brittle special case is another's generalization. There are many different levels of abstraction. One can be at the rigorous level on some dimensions and at the post-rigorous level on others. In the other direction, many things that once required post-rigorous thinking are now sufficiently codified so that they now require only rigorous thinking. There's not a well-defined body of "standard tools."

It's interesting that the comments on this post are split in terms of whether they interpret the focus to be on *math* or on *deeply*. It's also worth noting that the term "deeply" has many different connotations. Stephen Chew, whom you link to, is using deep learning in the sense of learning something by pondering its meaning and associations. But it's very much possible for an unsophisticated learner to learn something deeply in the Chewish sense without acquiring a conceptual understanding of it that has transferable value. For instance, one might "deep learn" the product rule for differentiation:

(fg)' = f'g + fg'

by saying "each function gets its turn with being differentiated, and then we add up the products." This is reasonably generalizable (for instance, it generalizes to products of more than two functions, and also to product-like settings in multivariable calculus) but it doesn't necessarily help with deep conceptual understanding of the rule. On the other hand, the somewhat deeper understanding of the product rule for differentiation using the chain rule for partial differentiation (see here) actually helps provide a deep sense of why the result is true.

Now, my example above in some sense disproves my claim. The reason being that the Chewish deep learning of the product rule: "each function gets its chance at being differentiated, and then we add up" -- is actually not that far off from the conceptually enlightening explanation based on the chain rule for partial differentiation. So perhaps it is true that attempting Chewish deep learning, without actually having a deep conceptual understanding, enables one to generally get quite close to the correct conceptual understanding.
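For readers who haven't seen the chain-rule explanation, it can be sketched as follows, where the auxiliary function $m(u, v) = uv$ is notation introduced here for illustration:

```latex
\frac{d}{dx}\, m\bigl(f(x), g(x)\bigr)
= \frac{\partial m}{\partial u}\, f'(x) + \frac{\partial m}{\partial v}\, g'(x)
= g(x)\, f'(x) + f(x)\, g'(x)
```

Each term holds one input of $m$ fixed while the other varies, which is the precise version of "each function gets its turn."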

I used a similar approach to learning chemistry at university level (undergraduate to PhD level, although my PhD drifted a bit from pure chemistry into computing and education). There were lots of situations where, to solve a problem, you needed the appropriate applied formula. Many (most?) students tried to memorise these formulae and the situations they applied in. I struggled to memorise them, so instead focused on how to derive the applied formula from a much smaller set of basic equations. Often there's a mental trick that makes it easier - e.g. to derive the equation for surface tension of a liquid, you think about what happens if you split a cylinder of liquid in half. I found it a lot easier to remember that sort of thing than an equation. (I've not used that particular equation since an exam in 1991 or 1992, but I can still vividly remember the mental model to derive it.)

I can't say that my approach was a better one in terms of getting good marks. When you have to answer specific questions - which you know will be drawn only from the set of situations you were taught - it's much easier and quicker to produce the appropriate equation from your memory and apply it. With my approach, you have to spend valuable exam time deriving the equation before you can apply it. (You rarely got marks for deriving the correct equation.)

However, it has been enormously useful for dealing with real world problems beyond the constrained world of exams. My basic approach - work out what I want to know, think of all the equations that might help, then do a bit of thinking and dimensional analysis to see how to get an equation to relate them - is applicable even when you've never seen the precise equation you need. Or, indeed, when nobody has ever seen it, which is often the situation when you're doing genesis research. It's also extremely helpful for understanding when your equation might not be appropriate, since you only get the equation when you've thought a fair bit about the situation you're trying to apply it to.

Also, in the real world these days, it's trivial to look up an equation. If you're doing any serious work that might require formulae, you'll almost always be sat at a computer with an internet connection. It might be marginally quicker to produce the equation from memory, but the generic skill of being able to look up the right equation is much more widely applicable. It's also, I suspect, less prone to trivial errors (e.g. misremembering a sign) for most people.

I've switched to this approach with maths over time. These days I wouldn't dream of doing any serious algebra or calculus by hand - much easier, quicker and less error-prone to stick it into Mathematica, Wolfram Alpha or whatever. Those systems know more tricky integrals than (almost?) any human. But I don't do any research in maths, I just use maths in my research.

This ready access to information and symbolic mathematics certainly wasn't the case when the people who taught me learned their basics - it was only becoming the case for some people when I learned them (early 90s). Luckily I was one of the early ones. Sadly, it appears to be taking a long time for the world of formal education to catch on to this shift in how knowledge work happens.

My intuition is that my focus on turning things into transferable knowledge makes it more transferable. My subjective experience is certainly that this generic approach is valuable across domains, and it feels that that's part of what's enabled me to work in many different domains. But I'm not sure I have convincing evidence that it's the case.

It definitely feels more personally satisfying to me, when I feel that I understand what I'm doing, rather than following a recipe, and have the skills to get to work on a totally-novel situation. But that's a personal preference that is not universally shared. Though my guess is that it'll be more widely shared among LWers than among the general population.

If you're in a position to choose between these approaches (learn deeply and how to apply vs memorising large amounts), I'd strongly recommend learning deeply and how to apply it. It may well not get you the best exam marks, but it'll set you up better for dealing with new situations, and that's a more valuable long-term skill.

**[deleted]** · 2013-09-20T09:26:43.771Z · score: 2 (2 votes)

Up until university, where I am now, I never actually had to think hard about the math problems I was presented with.

Last summer I had an epiphany in abstract algebra, and it has been hugely beneficial to see these structures everywhere in computer science. That is a lot of handy theorems you get for free.

I think that high-level pattern matching strategies are very valuable. Category theory, abstract algebra, etc.

I don't remember a period of my life where I *didn't* feel like I had a deep understanding of math, and so it's hard for me to separate out mathematical ability and cognitive ability.

I've also seen advice from a handful of places I respect to learn as much math as you can stand, because there often *is* transfer from mathematical topics to practical applications. This is much more true for engineers, physicists, and software developers than it is for people in other professions, but still suggests that the first negative consideration you raise is strong (unless it doesn't apply to you).

Reduced need for memorization (while learning math). When you understand math deeply, you see how many different mathematical problems are special cases of a single more general problem, so that in order to remember how to do all of the problems, it suffices to remember the solution to that more general problem.

I remember talking with a friend in high school about physics of electromagnetism. He had the poor fortune to take the non-calculus based version of physics, and so he had to memorize various geometries and the electric potentials they created. I was horrified- in calculus-based physics, we learned one law and then integrated as necessary.

I don't remember a period of my life where I didn't feel like I had a deep understanding of math, and so it's hard for me to separate out mathematical ability and cognitive ability.

I'd be interested in hearing more about your experience. A lot of smart people don't develop a deep understanding of math because *that's not how the subject is taught* and because they don't have the initiative to try to work things out themselves. With this in mind, to what do you attribute your success?

that's not how the subject is taught

Hope this isn't too off-topic, but I wonder if you have any ideas about why that is.

The main impediment to many far-mode thinkers learning hard (post-calculus) math is the drill and drudgery involved. If you're going to learn hard math, it seems you should, by all means, learn it deeply. That's not the obstacle. The obstacle is that to learn math deeply, you must first learn a lot of it by rote--at least the way it's taught.

In the far-distant past, when I was in school, learning elementary calculus meant rote drilling on techniques of solving integrals. Is this still the case? Is it inevitable, or is it the result of methods of education?

The main reason "smart people" avoid math isn't that they want to avoid depth; rather, they want to avoid what is, at least for some of them, drudgery. Math, more than any subject I know of, seems to require a very high level of sheer *diligence* to get to the point where you can start thinking about it deeply. Is this inevitable?

Hope this isn't too off-topic, but I wonder if you have any ideas about why that is.

I think that the point is that more people are capable of routine tasks than of conceptual understanding, and that educational institutions want lots of people to do well in math class on account of a desire for (the appearance of) egalitarianism.

In the far-distant past, when I was in school, learning elementary calculus meant rote drilling on techniques of solving integrals. Is this still the case?

What time period was this? (No need to answer if you'd prefer not to :-) )

Math, more than any subject I know of, seems to require a very high level of sheer diligence to get to the point where you can start thinking about it deeply. Is this inevitable?

Some diligence is necessary, but not as much as it appears based on standard pedagogy. I wish that I could substantiate this in a few lines. If you say something about what math you know/remember, I might be able to point you to some helpful references.

In the far-distant past, when I was in school, learning elementary calculus meant rote drilling on techniques of solving integrals. Is this still the case? Is it inevitable, or is it the result of methods of education?

Some degree of this is probably inevitable. Integration in particular has no closed solution (unlike differentiation), so there really is no one general method you can apply to all problems. All you can do is remember a bag of tricks. For differentiation, by contrast, a few general rules allow you to differentiate all elementary and trigonometric functions, and that's pretty much all you encounter in school.
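To illustrate how mechanical differentiation is, here is a toy symbolic differentiator. It is a sketch of the idea, not how any real computer algebra system is built; the nested-tuple expression format is invented for this example. The sum, product, and chain rules cover every expression built from the listed pieces, and no comparably small rule set exists for integration (for instance, exp(-x^2) has no elementary antiderivative).

```python
import math

# Expressions are nested tuples (a made-up format for this sketch):
# ("x",), ("const", c), ("add", a, b), ("mul", a, b), ("sin", a), ("cos", a), ("exp", a)

def diff(e):
    """Differentiate expression e with respect to x using a handful of fixed rules."""
    op = e[0]
    if op == "x":
        return ("const", 1.0)
    if op == "const":
        return ("const", 0.0)
    if op == "add":   # sum rule
        return ("add", diff(e[1]), diff(e[2]))
    if op == "mul":   # product rule
        return ("add", ("mul", diff(e[1]), e[2]), ("mul", e[1], diff(e[2])))
    if op == "sin":   # chain rule: cos(u) * u'
        return ("mul", ("cos", e[1]), diff(e[1]))
    if op == "cos":   # chain rule: -sin(u) * u'
        return ("mul", ("mul", ("const", -1.0), ("sin", e[1])), diff(e[1]))
    if op == "exp":   # chain rule: exp(u) * u'
        return ("mul", e, diff(e[1]))
    raise ValueError(f"unknown operation: {op}")

def evaluate(e, x):
    """Numerically evaluate expression e at the point x."""
    op = e[0]
    if op == "x":
        return x
    if op == "const":
        return e[1]
    if op == "add":
        return evaluate(e[1], x) + evaluate(e[2], x)
    if op == "mul":
        return evaluate(e[1], x) * evaluate(e[2], x)
    if op == "sin":
        return math.sin(evaluate(e[1], x))
    if op == "cos":
        return math.cos(evaluate(e[1], x))
    if op == "exp":
        return math.exp(evaluate(e[1], x))
    raise ValueError(f"unknown operation: {op}")

# Example: d/dx [ x * sin(exp(x)) ]
expr = ("mul", ("x",), ("sin", ("exp", ("x",))))
derivative = diff(expr)
print(evaluate(derivative, 0.5))
```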

With this in mind, to what do you attribute your success?

Well, looking back I have to attribute a lot of my *perception* of success to blindness, in the sense that 5-6 year old me thought he was a hot math talent because he knew about integers when the teacher was teaching the class about natural numbers. (I still remember raging against the claim that the right answer was "you can't subtract 3 from 2!" instead of "negative 1!") From what I can tell from looking at curricula online, that's ~5 years ahead of schedule but I'd interpret that as the curriculum putting it late (though, on reflection, that could be Dunning-Kruger).

I remember jumping ahead of (well, deeper than- below?) the curriculum frequently, and suspect that it had different causes in different circumstances. Rapid calculation is probably just high *g*, but rapid perception of concepts and connections probably has something to do with intuition or vision that I find difficult to articulate.

I've also never been particularly good at explaining *why* I know what I know with regards to math- from refusing to step through the algebra when I could solve a problem in my head, to avoiding college classes which were primarily about proving that methods worked (i.e. calculus the second time around) rather than introducing new methods. I have, through deliberate practice, gotten better at writing proofs in the last year or two, but still regularly come across simple theorems where I say "I know X is true, but don't know how to *show* X is true."

I do think I would have been *more* successful in a Moore method environment which is designed to teach a deep understanding of mathematics- it seems likely to me_now that me_past would have learned/wanted to care about rigor much earlier in that sort of environment, and would have kept pushing my math boundaries much more uniformly.

Deduction and analogy seem like largely different reasoning processes. I suspect that what you're describing is that by learning the notation and doing enough deductive arguments, the tasks begin to become intuitive, that is, they begin to become analogical and not deductive.

Deductive thinking is conscious, deliberative, and "slow." Analogical and intuitive thinking is unconscious, nondeliberative, and "fast." So you're probably right that by learning to relegate many mathematical tasks to analogical thinking, one increases their efficiency of learning across domains.

This means that as you jump across mathematical problems you start to see that one telescoping argument looks like another, or one proof by contradiction looks like another, just like your brain has assembled an otherwise arbitrary class "tree" by taking many samples of trees across many domains, building up some kind of conditional inference algorithm for recognizing "trees."

Perhaps this comes under "Reduced need for memorization" but when someone says "deeply" I assume they mean understanding the underlying principles - specifically understanding the limitations of the tools being used:

An extremely trivial example might be how often people in businesses communicate using measures of central tendency (mean) but almost never talk about spread (standard deviation). Yet the SD is as important as the mean.
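A tiny made-up illustration of why the spread matters: the two series below (invented numbers, say weekly sales for two products) have the same mean but would call for very different decisions.

```python
import statistics

# Invented figures for illustration only.
steady = [48, 49, 50, 51, 52]
volatile = [10, 30, 50, 70, 90]

steady_mean, steady_sd = statistics.mean(steady), statistics.stdev(steady)
volatile_mean, volatile_sd = statistics.mean(volatile), statistics.stdev(volatile)

print(f"steady:   mean {steady_mean}, sample SD {steady_sd:.2f}")
print(f"volatile: mean {volatile_mean}, sample SD {volatile_sd:.2f}")
```

Reporting only the mean makes the two look identical.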

Perhaps less trivial might be that the analysis of small samples (N < 50) often uses t-statistics. This kind of test *requires* that the underlying data be (approximately) normally distributed under your null hypothesis. There are lots of things where this can be reasonably assumed (or has been established); however, I've come across places where the assertion was at least questionable.

More advanced might be knowing which tools yield a conservative result. Bonferroni, for example, is a method for handling multiple comparisons. However, if the samples are related it can be overly conservative and actually yield a false negative.
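For reference, the Bonferroni correction itself is a one-liner: divide the significance level by the number of comparisons. The sketch below uses invented p-values and a hypothetical helper name.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and a reject/keep decision for each p-value."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]

# Invented p-values for illustration only.
p_values = [0.004, 0.020, 0.030, 0.049]
threshold, rejected = bonferroni(p_values)

print(f"per-test threshold: {threshold}")
print(f"rejected: {rejected}")
```

All four p-values are below 0.05 individually, but only the first survives the correction. If the four tests are strongly correlated, they behave more like one test than four, and the correction throws away power it did not need to.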