Inversion of theorems into definitions when generalizing

post by riceissa · 2019-08-04T17:44:07.044Z · score: 24 (8 votes) · 3 comments

This post describes a pattern of abstraction that is common in mathematics, which I haven't seen described in explicit terms elsewhere. I would appreciate pointers to any existing discussions. Also, I would appreciate more examples of this phenomenon, as well as corrections and other insights!

Note on prerequisites: the opening example below assumes familiarity with linear algebra and plane geometry, so this post probably won't make much sense without at least some superficial knowledge of these subjects. The second part of the post gives further examples of the phenomenon. These examples are independent of one another, so if you haven't studied a particular subject, you can skip that example and move on to the ones you do understand.

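There is something peculiar about the dependency among the following concepts in math: the Pythagorean theorem, the law of cosines, angle, distance, and the inner product (dot product).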

In the Euclidean geometry of $\mathbb{R}^2$ (the plane) and $\mathbb{R}^3$ (three-dimensional space), one typically goes through a series of steps like this:

  1. Using the axioms of Euclidean geometry (in particular the parallel postulate), we prove the Pythagorean theorem.
  2. We take the right angle to have angle $\pi/2$ and calculate other angles in terms of this one.
  3. The Pythagorean theorem allows us to prove the law of cosines (there are many proofs of the law of cosines, but this is one way to do it).
  4. Now we make the Cartesian leap to analytic geometry, and start treating points as strings of numbers in some coordinate system. In particular, the Pythagorean theorem now gives us a formula for the distance between two points, and the law of cosines can also be restated in terms of coordinates.
  5. Playing around with the law of cosines (stated in terms of coordinates) yields the formula $v_1 w_1 + v_2 w_2 = \|v\| \|w\| \cos\theta$, where $v = (v_1, v_2)$ and $w = (w_1, w_2)$ are two vectors and $\theta$ is the angle between them (and similarly for three dimensions), which motivates us to define the dot product $v \cdot w$ (as being precisely this quantity).

In other words, we take angle and distance as primitive, and derive the inner product (which is the dot product in the case of Euclidean spaces).
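To make the last step concrete, here is a sketch of how the coordinate form of the law of cosines gives the dot product formula in the plane. The symbols $v = (v_1, v_2)$, $w = (w_1, w_2)$ and $\theta$ are my choice of notation for the two vectors and the angle between them. The triangle in question has vertices $0$, $v$ and $w$, so its side lengths are $\|v\|$, $\|w\|$ and $\|v - w\|$.

```latex
\begin{align*}
% Law of cosines for the triangle with vertices 0, v, w:
\|v - w\|^2 &= \|v\|^2 + \|w\|^2 - 2\,\|v\|\,\|w\|\cos\theta \\
% Left-hand side expanded with the coordinate distance formula:
(v_1 - w_1)^2 + (v_2 - w_2)^2
  &= (v_1^2 + v_2^2) + (w_1^2 + w_2^2) - 2\,(v_1 w_1 + v_2 w_2) \\
  &= \|v\|^2 + \|w\|^2 - 2\,(v_1 w_1 + v_2 w_2) \\
% Comparing the two expressions for \|v - w\|^2 and cancelling:
v_1 w_1 + v_2 w_2 &= \|v\|\,\|w\|\cos\theta
\end{align*}
```

The same computation goes through in three dimensions with a third coordinate added to each sum.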

But now, consider what we do in (abstract) linear algebra:

  1. We have a vector space, which is a structured space satisfying some funny axioms.
  2. We define an inner product $\langle v, w \rangle$ between two vectors $v$ and $w$, which again satisfies some funny properties.
  3. Using the inner product, we define the length of a vector $v$ as $\|v\| = \sqrt{\langle v, v \rangle}$, and the distance between two vectors $v$ and $w$ as $\|v - w\|$.
  4. Using the inner product, we define the angle between two non-zero vectors $v$ and $w$ as the unique number $\theta \in [0, \pi]$ satisfying $\langle v, w \rangle = \|v\| \|w\| \cos\theta$.
  5. Using these definitions of length and angle, we can now verify the Pythagorean theorem and law of cosines.

In other words, we have now taken the inner product as primitive, and derived angle, length, and distance from it.
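As a sketch of the verification in the last step, assume a real inner product space, so that the inner product is symmetric and bilinear. With length defined by $\|v\|^2 = \langle v, v \rangle$ and the angle $\theta$ defined by $\langle v, w \rangle = \|v\|\|w\|\cos\theta$ (notation chosen here for illustration), the law of cosines is now a one-line consequence of the definitions, and the Pythagorean theorem is the special case of orthogonal vectors.

```latex
\begin{align*}
% Expand using bilinearity and symmetry of the inner product:
\|v - w\|^2 &= \langle v - w,\, v - w \rangle \\
            &= \langle v, v \rangle - 2\,\langle v, w \rangle + \langle w, w \rangle \\
            &= \|v\|^2 + \|w\|^2 - 2\,\langle v, w \rangle \\
% Substitute the definition of the angle (law of cosines):
            &= \|v\|^2 + \|w\|^2 - 2\,\|v\|\,\|w\|\cos\theta \\
% Special case \langle v, w \rangle = 0, i.e. \theta = \pi/2 (Pythagorean theorem):
\|v - w\|^2 &= \|v\|^2 + \|w\|^2
\end{align*}
```

That the angle is well defined in the first place relies on the Cauchy–Schwarz inequality $|\langle v, w \rangle| \le \|v\|\|w\|$, which guarantees that $\langle v, w \rangle / (\|v\|\|w\|)$ lies in $[-1, 1]$.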

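Here is a shot at describing the general phenomenon: in the concrete case, a notion $A$ is primitive (defined on its own terms) and is used to prove a theorem $B$; in the generalized case, $B$ is instead defined axiomatically, and $A$ is defined in terms of $B$. In the example above, $A$ is angle and distance, and $B$ is the inner product.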

Here is a table that summarizes this process:

| Notion | Concrete case | Generalized case |
|---|---|---|
| $A$ | primitive; defined on its own terms | defined in terms of $B$ |
| $B$ | a theorem | defined axiomatically |

In what sense is this pattern of generalization "allowed"? I don't have a satisfying answer here, other than saying that generalizing in this particular way turned out to be useful and interesting. It seems to me that there is a large amount of trial-and-error and art involved in picking the correct theorem to use as the $B$ in the process. I will also say that explicitly verbalizing this process has made me more comfortable with inner product spaces (previously, I just had a vague feeling that "something is not right").

Here are some other examples of this sort of thing in math. In the following examples, the step of using $B$ to define $A$ does not take place (in this sense, the inner product case seems exceptional; I would greatly appreciate hearing about more examples like it).


Comments sorted by top scores.

comment by cousin_it · 2019-08-04T19:20:27.215Z · score: 25 (10 votes)

Russian mathematician V.I. Arnold had a semi-famous rant against taking this inversion too far. Example quote:

What is a group? Algebraists teach that this is supposedly a set with two operations that satisfy a load of easily-forgettable axioms. This definition provokes a natural protest: why would any sensible person need such pairs of operations? "Oh, curse this maths" - concludes the student (who, possibly, becomes the Minister for Science in the future).

We get a totally different situation if we start off not with the group but with the concept of a transformation (a one-to-one mapping of a set onto itself) as it was historically. A collection of transformations of a set is called a group if along with any two transformations it contains the result of their consecutive application and an inverse transformation along with every transformation.

This is all the definition there is. The so-called "axioms" are in fact just (obvious) properties of groups of transformations. What axiomatisators call "abstract groups" are just groups of transformations of various sets considered up to isomorphisms (which are one-to-one mappings preserving the operations). As Cayley proved, there are no "more abstract" groups in the world. So why do the algebraists keep on tormenting students with the abstract definition?

comment by romeostevensit · 2019-08-04T21:25:18.241Z · score: 6 (3 votes)

The 'art' in picking the correct theorem in B seems related to structural realism, i.e., figuring out where we are importing structure from, and how, as we port across representations.

comment by Pattern · 2019-08-05T02:21:22.674Z · score: 2 (2 votes)

Was this intended to gesture at this process:

1) Mathematics (Axioms -> Theorems), 2) Reverse Mathematics? (Theorems -> sets of axioms* from which they could be proved)

or this process:

1) See what may be proved in System A. 2) Create System B out of what was proved in System A, and prove things.

*made as small as possible