Dense Math Notation
post by JK_Ravenclaw · 2011-04-01T03:37:02.357Z · LW · GW · Legacy · 23 comments
I program, and am also presently working my way through some math books. I find that I often have to backtrack to look up pieces of notation like variables and operators. Unfortunately, this is very problematic. Greek and Latin letters give no indication of where they came from and are not usable search terms, even when you know the full context in which they appeared. Many authors have their own bits of idiosyncratic notation, often combinations of subscripting and line art generated by TeX macros. Since expanding equations out to their definitions is so difficult, I sometimes don't bother to investigate when one looks odd, which as you might expect leads to big trouble later when errors in understanding creep in.
This is not nearly as big a problem for me when reading code, however, because there every variable and nonstandard operator has a descriptive name wherever it's used, and documentation is never more than a few hotkeys away. The same thing could be done for math. Suppose you took a typical higher math book, and replaced every single-letter variable and operator with an appropriate identifier. For me, this would make it much more readable; I would gain a better understanding in less time. However, I don't know the effect size or how broadly this generalizes.
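To make this concrete, here is a rough sketch of what I mean, written as Haskell rather than TeX (the example and all the names are made up for illustration): the same Newton's-method update step, first with conventional single-letter names and then with descriptive, searchable identifiers.

-- Conventional notation, transliterated: x' = x - f(x) / f'(x)
newtonStep :: (Double -> Double) -> (Double -> Double) -> Double -> Double
newtonStep f f' x = x - f x / f' x

-- The same step with descriptive names; every occurrence of a name now
-- says what it is and can be found again with a text search.
newtonStepVerbose :: (Double -> Double)  -- the function whose root we want
                  -> (Double -> Double)  -- its derivative
                  -> Double              -- the current guess
                  -> Double              -- the improved guess
newtonStepVerbose function derivative currentGuess =
  currentGuess - function currentGuess / derivative currentGuess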
Do other people have this problem? Might this issue deter some people from studying math entirely? Has anyone tried the obvious controlled experiment? How about with an experimental group specifically of programmers?
23 comments
Comments sorted by top scores.
comment by SarahNibs (GuySrinivasan) · 2011-04-01T04:20:33.537Z · LW(p) · GW(p)
Here is Jason Dyer's recent attempt to make a proof of the Fundamental Theorem of Arithmetic that is actually nice to read. He's looking for feedback! Dyer's proof
Replies from: benelliott, Gray, prase, cousin_it, CronoDAS
↑ comment by benelliott · 2011-04-01T20:49:17.482Z · LW(p) · GW(p)
This seems to have gone a bit too far in the other direction; I felt like I was wasting my time reading a whole page for an insight that could have been conveyed in one line.
Replies from: jdyer
↑ comment by jdyer · 2011-11-14T14:28:16.631Z · LW(p) · GW(p)
(I wrote the Fundamental Theorem of Algebra post.)
I don't necessarily disagree; I wrote the original thing as a proof of concept and wasn't expecting it to be perfect first time out. However, I do have the extra notion -- given my design ideas presume things are being delivered electronically, as seems to be the case with most math papers these days -- that one could expand or condense different parts of the proof so the amount of detail would be selectable.
Also, regarding the fiddly details: some of the insights that can be conveyed in one line are written that way in the original proof, whereas some of the denser things that DID need expansion were originally mashed together into one line with cryptic variable names. This happens with math papers quite often because of the demands of rigor. One proof redesign I was working on but never posted (at least not yet; I should dig it out) was a proof that the set-theory definition of ordered pairs (x, y) actually works; it's head-slappingly obvious, but the demands of mathematics require every small detail to be accounted for.
Also also, a lot of this depends on personal preference. One commenter didn't like the extra emphasis of the indents-for-suppositions, but I've found them extremely helpful in reading proofs.
↑ comment by Gray · 2011-04-01T17:27:11.544Z · LW(p) · GW(p)
There's a link to this article that makes a lot of sense. But I get the overall sense that there is a trade-off between consistency, expressibility, and ease of reading. If you try to go too far on one factor, you're going to lose some of the others. It boils down to a presentation matter, and it depends on what purpose you are making the presentation for.
Replies from: folkTheory
↑ comment by folkTheory · 2011-04-01T17:49:51.612Z · LW(p) · GW(p)
Sure, but the current status quo is more a result of chance and mathematical fads over the centuries than of any concerted effort to find consistent, good symbols and notation (with Leibniz being an exception to this). There's really no reason to think we can't do better than the current state of mathematical notation.
Replies from: Gray
↑ comment by Gray · 2011-04-01T23:11:53.484Z · LW(p) · GW(p)
Well, I was hinting at this, but I think you should also consider the idea that form follows function. I think the function of mathematical notation for the sake of mathematics proper is for greater and greater abstraction, which involves ignoring any element not considered necessary or relevant to what is being proposed.
Those of us who are, instead, more interested in practical reason and wish to gain some mileage from the achievements of mathematics, are more likely to adopt notation more similar to programming languages, where we want to express relationships that are more grounded and more concrete.
There isn't any perfect mathematical notation, only notation that is most efficient for your particular usage. Like everything else, finding "good notation" is an economics problem.
comment by Risto_Saarelma · 2011-04-01T06:18:00.901Z · LW(p) · GW(p)
I do get the feeling that, for people who grew up programming computers, mathematics is made somewhat hard to approach both by notation that's optimized for writing on blackboards and by some unspoken cultural assumptions made by people who worked their way up to the material without the notion that a lot of it might get programmed into computers or be analogous to computer programs.
There might also be a bias in math for excessively clever analytical solutions to things for which a more verbose and straightforward algorithmic approach would do as well. A recent blog post compared an inscrutable closed-form formula for calculating the weekday of a date with a much more readable programming-language function.
I wouldn't take that as a very solid example, though: in a mathematical sense, the calendar system is a massively complex way to represent what is basically the number line. A lot of interesting math is about working with entirely new structures, instead of dealing with accidental complexity in established structures. The programming analogy starts to have trouble on two fronts here.
One problem is that new structures involve new formalisms, and programming languages are more about operating within an existing formalism. Diving into metamathematics by trying to keep things within one formal system when discussing another sounds much more headache-inducing than just doing the informal discussion of the new formal system that mathematicians do now.
The other problem is that the verbose variable names programming languages use, which are helpful when working with somewhat real-world phenomena, become much less useful once you dive sufficiently deep into abstraction. There might not be a better term for the operands of a suitably uncommon and abstract structure X than "things X operates on", so the single-letter names are as good as any. (You don't even have to go into strange theory for this; just look at the higher-order functions from any functional language, like map or foldl, which operate on sequences of values of any one type. Haskell calls the pieces x:xs, for "current x" and "the rest of the exes", which is pretty much all you can say about them from the context of the higher-order function.)
If someone wants to experiment with this, I'd like to see how programmers who know little math get along with The Haskell Road to Logic, Maths and Programming and maybe what programming-savvy physics students with some advanced math skills get out of Structure and Interpretation of Classical Mechanics.
Replies from: JK_Ravenclaw, jschulter, bogus
↑ comment by JK_Ravenclaw · 2011-04-01T12:58:57.806Z · LW(p) · GW(p)
There might not be a better term for the operands of a suitably uncommon and abstract structure X than "things X operates on", so the single-letter names are as good as any.
No, they aren't! You can't search to jump between the usages and definitions of a single-letter name, but you can jump between the usages and definitions of a full-word name, even if that name is blarghle.
Haskell calls the sequences x:xs for "current x" and "the rest of the exes", which is pretty much all you can say about them from the context of the higher-order function.
Haskell can get away with this because it has strict, well-defined scoping rules which ensure that the names x and xs never appear too far from their definitions, and there is an algorithm which text editors can implement to find those definitions. Math books do not have either of those benefits.
Replies from: LeibnizBasher, bogus
↑ comment by LeibnizBasher · 2011-07-24T20:57:01.762Z · LW(p) · GW(p)
Even worse than trying to search for single-letter variables that are defined somewhere in a mathematical text is trying to find the definitions of operators, if all you know is the squiggle used to denote that operator. For example, integrals are denoted by a slide-looking squiggle, so if you see one and don't know it's called an "integral", you can't look up what it means. If you do find a definition, the Wikipedia page you'll get describes integrals as "the signed area of region bound by (the function parameter's) graph", with a full page of explanation and links to 5 or 6 pages of supplementary explanation. Good luck translating that into code!
What you won't find is the 5-line program that shows you how to actually calculate an integral (for that, see this page from SICP). Mathematicians descend into maddening vaguery when trying to describe concepts that could easily be described by a very short computer program, because math notation (and therefore mathematical study itself) lacks an equivalent of the for loop. So instead, mathematicians think they're describing something so fundamental that it's ineffable-- Integrals, man! Integrals! Either you grok it, or you don't.
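For reference, the kind of program meant is on the order of this rough Haskell sketch of a definite integral by the midpoint rule (not the Scheme code from SICP itself, but the same idea):

-- Approximate the integral of f from a to b using slabs of width dx.
integral :: (Double -> Double) -> Double -> Double -> Double -> Double
integral f a b dx =
  dx * sum (map slabHeight (takeWhile (< b) (iterate (+ dx) a)))
  where
    slabHeight x = f (x + dx / 2)  -- evaluate f at the midpoint of each slab

-- For example, integral (\x -> x * x) 0 1 0.001 is approximately 1/3.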
↑ comment by bogus · 2011-04-01T13:22:04.432Z · LW(p) · GW(p)
Haskell can get away with this because it has strict, well-defined scoping rules which ensure that the names x and xs never appear too far from their definitions, and there is an algorithm which text editors can implement to find those definitions. Math books do not have either of those benefits.
Actually, they do, since x and xs are bound variables. Now, variable binding in mathematics is more complicated than it needs to be, but "forall x ", "exists x ", "f(x) =", "d / d x", "int ... d x", "sum_( i = 0 .. k)" ... are variable binding operators, which are quite comparable to the Haskell syntax binding x and xs in the definition of foldr.
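To make the parallel concrete, a summation such as sum_(i = 0 .. k) f(i) binds its index exactly the way a list comprehension does (a minimal sketch):

sumTo :: Int -> (Int -> Double) -> Double
sumTo k f = sum [ f i | i <- [0 .. k] ]
-- i exists only inside the brackets, just as the Sigma's i exists only under the Sigma.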
Haskell also has support for free variables bound by a closure, where the scoping rules are not so strict and well-defined. But I would expect Haskell programmers to use more readable names for these.
Replies from: Dan_Moore
↑ comment by jschulter · 2011-04-02T06:38:56.472Z · LW(p) · GW(p)
I'm taking note of the latter and adding it to my list of "books to read when I have time and motivation for independent education." The addition of Scheme to the application of mechanics does seem quite useful from what I can tell after a cursory look. And there's a nice bit more mathematical rigor than I had the luxury of in my physics classes. Overall, it looks like this text takes an approach that I'll like a lot, once I get to it.
For the record, I'm a physics and mathematics undergrad, graduating next May. My school's physics program recently decided to actually start making us apply the programming they had us learn; I might consider trying Scheme instead of C if I feel like it.
↑ comment by bogus · 2011-04-01T14:20:15.048Z · LW(p) · GW(p)
A recent blog post compared an inscrutable closed-form formula for calculating the weekday of a date with a much more readable programming-language function.
This is a good reference for the "clever, analytical" bias in math, but "programming language functions" are just as formulaic; they just lack the obfuscation of a thoroughly optimized closed-form solution. For instance, the "serial number of a day in the year" can be phrased as follows:
daynum(d, m, y) is a function on natural numbers which is defined if 1 <= m <= 12. We will define it by induction on m:
- If m = 1, then let daynum(d, m, y) = d
- Otherwise, let m' = m - 1 and l = monthLen(m', y). Then let daynum(d, m, y) = daynum(d + l, m', y).
Proof: see the blogpost.
The case analysis in monthLen can be expressed as is, whereas the leap-year predicate is a matter of style: neither of Van Emden's C translations matches intuition closely, but the following is reasonably intuitive and mathematically clear:
leap(y): A year y is a leap year when it is divisible by 4 and not divisible by 100, or else when it is divisible by 400.
In sum, there is no compelling reason for using C code here, except for it being easier to run on a computer: mathematical language can be just as expressive or more so.
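To see how direct the correspondence is, here is a rough Haskell transcription of the definitions above (the month-length table is filled in by me, not taken from the blog post, so treat it as a sketch):

-- leap(y): divisible by 4 and not by 100, or else divisible by 400.
leap :: Int -> Bool
leap y = (y `mod` 4 == 0 && y `mod` 100 /= 0) || y `mod` 400 == 0

-- Length of month m in year y (1 = January).
monthLen :: Int -> Int -> Int
monthLen 2 y = if leap y then 29 else 28
monthLen m _
  | m `elem` [4, 6, 9, 11] = 30
  | otherwise              = 31

-- daynum(d, m, y), by induction on m, exactly as defined above.
daynum :: Int -> Int -> Int -> Int
daynum d 1 _ = d
daynum d m y = daynum (d + monthLen (m - 1) y) (m - 1) y

For instance, daynum 1 3 2011 evaluates to 60, the serial number of March 1st in a non-leap year.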
Replies from: jimrandomh
↑ comment by jimrandomh · 2011-04-01T14:35:23.312Z · LW(p) · GW(p)
Code has a few other advantages, though they aren't actually being used in this case. You can type-check it, which does for some types of math what dimensional analysis does for physics. You can also bring it into automated theorem provers and verifiers, which is much harder for prose.
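A minimal sketch of the first point (a made-up example): wrap the raw numbers in distinct types and the compiler rejects a formula that mixes them up, much as dimensional analysis would.

newtype Meters  = Meters  Double
newtype Seconds = Seconds Double

speed :: Meters -> Seconds -> Double
speed (Meters d) (Seconds t) = d / t

-- speed (Seconds 10) (Meters 3)  -- rejected at compile time:
--                                   the arguments are the wrong way round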
comment by prase · 2011-04-01T13:11:52.051Z · LW(p) · GW(p)
It probably depends a lot on the context, but in most situations I would find multi-character variable names distracting. It may work well for short formulae, but try to write down the Taylor series for the hypergeometric 2F1 function, and it may end up like
HypG (firstpar, secondpar, thirdpar, variable) = Sum (i=0, inf, i++) ( variable^i Pochhammer (firstpar, i) Pochhammer (secondpar, i) / ( Pochhammer (thirdpar, i) * Factorial (i) ))
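For comparison, the standard compact form of the same series is 2F1(a, b; c; z) = sum_(n = 0 .. inf) (a)_n (b)_n / ((c)_n n!) * z^n, where (a)_n is the Pochhammer symbol.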
Well, I rather prefer the standard way. Incidentally, I once argued with a man who held an even more drastic stance on this, saying not only that all mathematical formulae should be replaced by algorithms, but also that raw TeX input is more readable than the processed output. And he wasn't even a programmer.
Replies from: jimrandomh
↑ comment by jimrandomh · 2011-04-01T14:16:40.007Z · LW(p) · GW(p)
I think we need to draw a distinction between variables that are used on the same line where they're bound, and variables that live longer. Naming things firstpar, secondpar, thirdpar is silly in this context, because if they had single-letter names, the names would still be right there on the same line where they're used. But I actually did look up Pochhammer there, and got a reminder of its definition as the first search result, and I couldn't have done that if the TeX symbol had been used instead.
comment by Alex Flint (alexflint) · 2011-04-02T13:33:46.976Z · LW(p) · GW(p)
Equations can be crafted so that their layout on the sheet of paper resonates with our brains' pattern matchers. Concepts like factorization, additivity, conditional independence can be embedded in the spatial layout of equations in a way that makes them immediate without the need to go over the equation with a fine-tooth comb. Unfortunately, this requires brevity, which can decrease parseability overall if not used judiciously.
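For instance, writing p(x, y, z) = p(x) p(y | x) p(z | x) lets the conditional independence of y and z given x be read straight off the layout, with no need to expand anything.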
comment by Giles · 2011-04-02T18:15:30.337Z · LW(p) · GW(p)
The main problem I have with mathematical notation right now is that I can't skim it. If I am reading a document with some math notation in, I tend to just skip past it and figure out what's going on from the surrounding text.
I can read it quickly and just see a bunch of apparently meaningless symbols. Or I can read it very, very slowly and carefully and figure out exactly what everything means and exactly what's going on. But there's nothing in between.
Computer code I find rather easier to skim, and natural language is much easier.
Is this a problem with the notation itself, or is it just that I work with computer code on a day-to-day basis and don't with maths, so that I've learnt how to skim code and spot the relevant patterns much more easily?
Replies from: Johnicholas
↑ comment by Johnicholas · 2011-04-03T01:23:30.363Z · LW(p) · GW(p)
Some mathematics (like some computer code) is skimmable, if you have your eyes trained for it (that's what a lot of programming and mathematics training is: learning to read).
However, other mathematics, including most mathematics that you are probably interested in reading (groundbreaking research papers, for example) is just as non-skimmable as deliberately terse code by a master programmer in a terse language with a very large library.
Master programmers (because they do not usually have page limits) generally are not as terse as they can be; mathematicians vary, but sometimes are considerably more terse than they "can" be - because they have human readers who can fill in gaps if they're relatively obvious.
comment by komponisto · 2011-04-01T04:17:22.475Z · LW(p) · GW(p)
Music theorists tend to use strings of capital letters as variables, which (at least to my non-programmer eye) gives the literature a computer-programming look.
For example, take this classic book by David Lewin, full of variables and operators with names like EQUIV, SAMETYPE, IFUNC, INJ, and so on.