As a current Harvard math grad student, I think you should read many different easy books to learn a subject whenever possible, especially if you can find them for free. When you say you are mathematically able, it is unclear what level you are at. All of my favorite books for learning involve a huge number of exercises, and I recommend you do all of them instead of reading ahead.
For basic real analysis, my favorite book is Rosenlicht's Introduction to Analysis, but baby Rudin is pretty good too, and I recommend you flip back and forth between the two.
For learning math in general, I think real analysis is a poor place to start, but that may be personal preference because I have a more algebraic slant. I highly recommend books like Herstein's Abstract Algebra, Mathematical Circles: A Russian Experience, I.M. Gelfand's Trigonometry, and Robert Ash's Abstract Algebra: The Basic Graduate Year, mostly for the wealth of exercises. Some of these are books for small children and I think those are the best sort of books to first learn from.
Personally I think real analysis is an awkward way to learn mathematical proofs, and I agree discrete mathematics or elementary number theory is much better. I recommend picking up an Olympiad book for younger kids, like "Mathematical Circles, A Russian Experience."
I'm sure not only "elite" mathematicians intuit the interest of problems like the unsolvability of the quintic. That one can prove a construction impossible, the very concept of an invariant, is startling to the uninitiated. So many classic problems of this nature are held up as paradigms of beauty--the Konigsberg bridge problem, ruler and compass constructions of cube roots, the irrationality of sqrt(2), ...
I'm doing my math PhD at Harvard in the same area as Qiaochu. I was also heavily involved in artofproblemsolving and went to MathPath in 2003. Since 2003 I have hoped to stake out a manifest destiny in mathematics research.
Qiaochu and I performed similarly in Olympiad competitions, had similar performances in the same undergraduate program, and were both attracted to this website. However, I get the sense that he is driven quite a bit by geometry, or is at least not actively averse to it. Despite being a homotopy theorist, I find geometry awkward and unmotivated. I cannot form the "vivid" or "bright" images in my mind described in some other article on this website. Qiaochu is also far more social and active in online communities, such as this one and MathOverflow. I wonder about the impact of these differences on our grad school experiences.
Lately I've been feeling particularly incompetent mathematically, to the point that I question how much of a future I have in the subject. Therefore I quite often wonder what mathematical ability is all about, and I look forward to hearing if your perspective gels with my own.
I think it's very important in understanding your first Grothendieck quote to remember that Grothendieck was thrown into Cartan's seminar without the requisite training. He was discouraged enough to leave for another institution.
I apologize for the snippy remark, which was a product of my general frustrations with life at the moment.
I was trying to strongly stress the difference between (1) an abstract R-torsor (B-theory), and (2) R viewed as an R-torsor (your patch on A-theory).
Any R-torsor is isomorphic to R viewed as an R-torsor, but that isomorphism is not unique. My understanding is that physicists view such distinctions as useless pedantry, but mathematicians are, for better or worse, trained to respect them. I do not view an abstract R-torsor as having a basis that can be changed.
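To spell out the non-uniqueness, here is a minimal sketch in standard notation (my gloss, not part of the original exchange): every choice of base point t_0 in an R-torsor T gives an isomorphism with R viewed as an R-torsor, and any two such isomorphisms differ by a translation, so none of them is canonical.

```latex
\phi_{t_0}\colon T \xrightarrow{\;\sim\;} \mathbb{R}, \qquad
\phi_{t_0}(x) = x - t_0, \qquad
\phi_{t_1}(x) = \phi_{t_0}(x) - (t_1 - t_0) \quad \text{for all } t_0, t_1 \in T.
```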
You might be able to do it with some abstract nonsense. I think general machinery will prove that in categories such as that defined in the top answer of
there are terminal objects. I don't have time to really think it through though.
I have always heard the affine line defined as an R-torsor, and never seen an alternative characterization. I don't know the alternative axiomatization you are referring to. I would be interested to hear it and see if it does not secretly rely on a very similar and simpler axiomatization of (R,+) itself.
I do know how to characterize the affine line as a topological space without reference to the real numbers.
Torsors seem interesting from the point of view of Occam's razor because they have less structure but take more words to define.
I think that the distinction may be clarified by the mathematical notion of an affine line. I sense that you do not know much modern mathematics, but let me try to clarify the difference between affine and linear spaces.
The A-theorists are thinking in terms of a linear space, that is, an oriented vector space. To them, time is splayed out on a real number line, which has an origin (the present) and an orientation (a preferred future direction).
The B-theorists are thinking in terms of an affine line. An affine line is somewhat like the A-theorists' real line, but it doesn't have an origin. Instead, given two points a and b on the affine line, one can take their difference a-b and obtain a real number. The only defined operation is the taking of differences, and the notion of an affine line relies on a previously defined notion of the real line.
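Concretely, here is one standard axiomatization, stated as a sketch (my gloss, not from the original comment): an affine line is a nonempty set A with a difference map

```latex
-\;\colon A \times A \to \mathbb{R}, \qquad
(a - b) + (b - c) = a - c \quad \text{for all } a, b, c \in A,
```

such that for each fixed b the map a -> a - b is a bijection from A to R. Setting a = b = c gives a - a = 0, and it follows that a - b = -(b - a); but A carries no distinguished origin and no addition of points.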
I guess I always took the phrase "unreasonable effectiveness" to refer to the "coincidence" you mention in your reply. I'm not really sure you've gone far toward explaining this coincidence in your article. Just what is it that you think mathematicians have "pure curiosity" about? What does it mean to "perfect a tool for its own sake," and why do those perfections sometimes wind up having further practical use? As a pure mathematician, I never think about applying a tool to the real world, but I do think I'm working towards a very compressed understanding of tool making.
So what does "gone wild" mean? Your paragraph about this is not very charitable to the pure mathematician.
Say that mathematics is about generating compressed models of the world. How do we generate these models? Surely we will want to study (compress) our most powerful compression heuristics. Is that not what pure math is?
Computer scientists currently seem much more ready than homotopy theorists to adopt the language of homotopy type theory. It should be noted that there are many competing new languages for expressing the insights garnered by infinity groupoids. Though Voevodsky's language is the only one that has any connection to computers, the competing language of quasi-categories is more popular.
It is misleading to attribute that book solely to Voevodsky.
Imagine that we encounter a truly iid random sequence of 90%-likely propositions Q(0), Q(1), Q(2), ... Perhaps they are merely pseudorandom but impossibly complicated to reason about, or perhaps they represent some random external output that an agent observes. After observing a very large number of these Q(i), one might expect to place high probability on something like "About 90% of the next 10^100 Q(j) I haven't observed yet will be true," but there is unlikely to be any simple rule that describes the already observed Q(i). Do you think that the next 10^100 Q(j) will all individually be believed 90% likely to be true, or will the simpler-to-describe Q(j) receive closer to 50% probability?
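For contrast, here is what a crude exchangeable model would answer. Laplace's rule of succession (a toy calculation of mine, not something from the discussion) assigns the same probability of roughly 0.9 to every unobserved Q(j), regardless of whether j has a short description; the open question is whether a simplicity prior behaves the same way.

```python
# Laplace's rule of succession: put a uniform prior on the unknown iid
# truth rate p of the sequence Q(0), Q(1), ...  After observing k true
# propositions out of n, the predictive probability that any particular
# unobserved Q(j) is true is (k + 1) / (n + 2), independent of j.
n, k = 1_000_000, 900_000
predictive = (k + 1) / (n + 2)
print(f"P(Q(j) | observations) ~ {predictive:.4f}")  # ~ 0.9000
```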
The key here is that you are using a finite S. What do you do if S is infinite? More concretely, is your schema convergent if you grow your finite S by adding more and more statements? I believe we touch on such worries in the writeup.
Sorry if there is something fishy in the writeup :(. I could believe it, given how rushed I was writing it.
Suppose we consider not just a,~a,b,~b, and c,~c, but also statements q="exactly one of a,b,c is true" and ~q. Suppose now that we uniformly pick a truth value for a, then for q, then a logically consistent but otherwise random value for b, and finally a logically consistent but otherwise random value for c. Such an asymmetric situation could occur if b and c have high mu but a and q have small mu. In the worlds where we believe q, b and c are much more often disbelieved than a. I believe that basically captures the worries about Demski's scheme that Paul was having; maybe he will comment himself.
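A quick Monte Carlo sketch of that sampling order (my toy reconstruction, not workshop code) exhibits the asymmetry: conditioned on q, a comes out true about half the time, while b and c each come out true only about a quarter of the time.

```python
import random

def exactly_one(a, b, c):
    # q asserts "exactly one of a, b, c is true"; booleans sum as 0/1.
    return (a + b + c) == 1

def sample_world():
    """Pick a, then q, uniformly; then b and c uniformly among the
    logically consistent completions, in that order."""
    a = random.choice([True, False])
    q = random.choice([True, False])
    # Keep only values of b that still admit some consistent c.
    b_opts = [b for b in (True, False)
              if any(exactly_one(a, b, c) == q for c in (True, False))]
    b = random.choice(b_opts)
    c_opts = [c for c in (True, False) if exactly_one(a, b, c) == q]
    c = random.choice(c_opts)
    return a, q, b, c

worlds = [sample_world() for _ in range(200_000)]
q_worlds = [(a, b, c) for a, q, b, c in worlds if q]
n = len(q_worlds)
print("P(a | q) ~", sum(a for a, _, _ in q_worlds) / n)  # about 0.50
print("P(b | q) ~", sum(b for _, b, _ in q_worlds) / n)  # about 0.25
print("P(c | q) ~", sum(c for _, _, c in q_worlds) / n)  # about 0.25
```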
Does that clarify anything?
Hi Manfred, the link is in the "Theme 1" section of the original post. Let me know if it clarifies our worries about simplicity priors.
Hi Manfred, it might help us communicate if you read the report, though I'm not sure it's understandable since it was written quite rapidly :(. Thanks for sending your stream of consciousness and I look forward to hearing more of it. You seem quite convinced that a simplicity prior exhibits reasonable behavior, but I am less so. A lot of the workshop involved understanding potential pitfalls of simplicity priors. Basically, some pathologies appear to arise from the fact that a very large pseudorandom number with a short description is treated very differently from a truly random large number with no short description.
It is difficult to analyze simplicity priors since they are so intractably complicated. But do you believe that, after updating on the first 1 million values of a random, 90%-true sequence of propositions P(0), P(1), ..., P(1 million), P(x) will be assigned similar truth probabilities both when x admits a short description and when x admits only a complicated description?
Hi Manfred, your post indeed touches on many of the same issues we were considering at the workshop. If I understand it correctly, you are worried about the robot correctly learning universal generalizations. For example, if P(x) is true for all odd x and false for all even x, you want your robot to correctly guess that after observing P(0), P(1), P(2), ..., P(1 million). In fact, most current proposals, including Demski's, are capable of learning these universal generalizations. Our current programs are able to correctly learn rules like "P(x) is true for all odd x," or even "P(x) is true for 90% of x between 0 and 10 million." You could call this step "generalization." There are many interesting things to say about generalization, including your post and Will Sawin's Pi1-Pi2 Theorem.

The problem Paul brought to our attention at this workshop, however, was not generalization but "specialization." If you believe that "P(x) is true for all odd x," it's rather easy to say what you believe about P(994543). But it's surprisingly subtle to understand what you should believe about P(994543) given only that "P(x) is true for 90% of x between 0 and 10 million," particularly when P(994543) is part of a different sequence of propositions Q(x) about which you have different statistical beliefs. Properly handling the "specialization" step was the focus of the workshop, and I'd be interested to hear if you have any thoughts about the subtleties brought up in the write-up.
A kind of counter-example to your claim is the following: http://www.math.rutgers.edu/~zeilberg/GT.html It is an automated reasoning system for Euclidean geometry. Starting from literally nothing, it derived all of the geometric propositions in Euclid's Elements in a matter of seconds. Then it proceeded to produce a number of geometric theorems of human interest that were never noticed by any previous Euclidean geometers, classical or modern.
This is simply to point out that there are some fields of math that are classically very hard but computers find trivial. Another example is the verification of random algebraic identities by brute force.
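As a toy illustration of that brute-force approach (my example, not from the comment), one can test a polynomial identity at random points; by the Schwartz-Zippel lemma, a nonzero polynomial of total degree d vanishes at a uniformly random point of a size-S range with probability at most d/S, so agreement at a few random points over a huge range is overwhelming evidence of identity.

```python
import random

# Verify (x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3 at random points.
def lhs(x, y):
    return (x + y) ** 3

def rhs(x, y):
    return x**3 + 3 * x**2 * y + 3 * x * y**2 + y**3

S = 10**9  # sampling range; failure probability per trial is at most 3/S
for _ in range(20):
    x, y = random.randrange(S), random.randrange(S)
    assert lhs(x, y) == rhs(x, y)
print("identity verified at 20 random points")
```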
On the other hand, large fields of mathematics have not yet come to be dominated by computers. For those I think this paper is a good introduction to some state-of-the-art, machine-learning-based techniques: http://arxiv.org/abs/1108.3446 One can see from the paper that there is plenty of room for machine learning techniques to be ported from fields like speech and vision.
Progress in machine learning in vision and speech has recently been driven by the existence of huge training data-sets. It is only within the last few years that truly large databases of human-made proofs in things like set theory or group theory have been formalized. I think that future progress will come as these databases continue to grow.
I agree that the derivation of (4) from (3) in the paper is unclear. The negation of a=b>=c is that a≠b or b<c.
I do not think the situation is as simple as you claim it to be. Consider that a category is more general than a monoid (a monoid is just a category with a single object), yet there are many interesting theorems about categories.
As far as foundations for mathematical logic go, anyone interested in such things should be made aware of the recent invention of univalent type theory. This can be seen as a logic which is inherently higher-order, but it also has many other desirable properties. See for instance this recent blog post: http://golem.ph.utexas.edu/category/2013/01/from_set_theory_to_type_theory.html#more
That univalent type theory is only a few years old is a sign we are not close to fully understanding what foundational logic is most convenient. For example, one might hope for a fully directed homotopy type theory, which I don't doubt will appear a few years down the line.
There are many types of math, with differing sorts of value, but I can say a little about the sort of math I find moving.
I agree with you. For the most part, applied souls dream up their advances and make them without relying on the mathematical machine. They invent the math they need to describe their ideas. Or perhaps they use a little of the pure mathematician's machine, but quickly develop it in ways that are more important to their work than the previous mathematical meanderings.
I think you underestimate the role of mathematics as the grand expositor. It is the tortoise that trails forever behind the hare of applied science. It takes the insights of applications, of calculus for example, and digests them. It reworks them, understands them, connects them, rigorizes them.
In your mind the work of mathematics is not useful because a mathematician does not make truly new applied advances. But a mathematician invents and connects notations to ease the traversal, the learning, and most importantly the storage in working memory of past insights.
What is the purpose of a category? An operad? A type theory? A vector bundle? The digit 0? When these languages were introduced, it could always be claimed they were worthless because the old languages could express the same content as these new languages. But somehow the new language makes it easier to conceptualize and think about the old ideas; it increases the working human RAM.
And what of the poor student? He who must learn so many subjects is grateful when it is realized that many of those subjects are in fact the same: http://arxiv.org/abs/0903.0340. Mathematics digests theories and rewrites them as branches of a common base. It makes it possible to learn more insights quickly and to communicate them to the next generation.
So young applied scientists, perhaps generations later, benefit by more compactly and elegantly understanding the insights of their forebears. Then, the mathematician dreams, they are freer to envision the next great ideas: http://arxiv.org/abs/1109.0955
So why the mathematician's focus on solving specific problems? Why so much energy to characterize finite groups? It is not that these problems are important. It is that they serve as testbeds for new languages, for new characterizations of old insights. The problems of pure math are invented as challenges to understand an old applied language, not to invent a new one.