The Math Learning Experiment

post by Qiaochu_Yuan · 2018-03-21T21:59:04.682Z · score: 124 (35 votes) · LW · GW · 33 comments

Last Sunday I and a small group of volunteers ran an experiment. We got a bunch of CFAR alumni together and asked half of them to tutor the other half in math (in a broad sense, so including computer science) while paying close attention to what was happening in the minds of the tutees, focusing on issues like where the tutees were confused, where they were frustrated, where they were anxious or afraid, etc.

I had a few motivations in trying this. One was a sense that most rationalists could benefit from learning more math on the margin. Another was a sense of there being various obstacles standing in the way of that, for example an identity of "not being a math person" or various flavors of math anxiety (in addition to the fact that it's a hassle / expensive to find a math tutor, and that people have more urgent things to do), and wanting to see the extent to which those obstacles could be debugged at scale. A third was to explore running events at all.

The fourth and most important motivation for me came out of thoughts I've been having since reading Inadequate Equilibria. There's a cluster of people, including but not limited to Eliezer, Critch, and Nate, who (according to me) have what I internally call "trustworthy inside views," another name for which might be the ability to reliably generate useful gears models, and act based on them. This is the thing they do instead of using modest epistemology; it's the thing that allowed Eliezer to write HPMoR, among many other things. And what all of the people who seem to me to have this ability have in common is that they all have strong backgrounds in a technical subject like math, physics, or computer science (in addition to something else, this isn't sufficient). Probably there are other ways to acquire this ability, but acquiring it through learning technical subjects in a particular way seemed promising, and I wanted to see if I could help others make any progress in that direction at all, or at least understand the obstacles in the way of that.


I learned less about inside view than I was hoping. My plan was to run around observing all of the tutor-tutee pairs, but it was harder to do this productively than I expected: I kept feeling like I was missing important conversational context and not knowing how to model the tutee because of that. We tutored for 3 rounds of 30-40 minutes each, and that wasn't long enough; I think to really start approaching the inside view skill would require longer rounds and also particular skills on the part of the tutor that I didn't have the time to attempt to transfer to all of the tutors. I think if I want to learn more about the inside view skill in the future I should try to tutor someone myself, and for longer periods of time; I'll probably try this next.

An interesting point that came up after the 1st round is the unusual role that definitions play in mathematics, as well as the strange way mathematicians typically treat them in textbooks or similar. It's easy to have the experience of reading a new chapter of a textbook, absorbing all the definitions, understanding all the proofs, but still feeling like the wool was pulled over your eyes in some sense. (General topology can really strongly trigger this feeling.) Textbooks generally write down definitions as if there is nothing or very little that needs to be explained in doing so. In fact mathematicians have put a lot of work into writing down the "right" definitions (which is something like writing down the "right" ontology), but this work is basically invisible - I have never seen any textbook seriously discuss it, and it only comes up briefly in discussions of history of mathematics. People don't even talk about it in graduate school, despite the fact that graduate students are expected to be able to generate new mathematics as their job. At no point in a standard math education are students explicitly given the opportunity to practice approaching new mathematical territory with no definitions to help them orient towards it, and coming up with useful definitions on their own.

I consider it an important feature of my approach to mathematics, which feels related to the inside view skill, that I consistently get frustrated at definitions that I don't understand how to reinvent instead of taking them as given. A large part of my math blogging is about motivating definitions. Sometimes it would take me years between my first exposure to and frustration at a definition and the time that I finally had a satisfying motivation for it; for chain complexes it took something like 4 years, and the satisfying motivation is much more complicated to explain than the definition. (For an example that hasn't finished yet, I am still frustrated about entropy, even after writing this post, which clarified a lot of things for me.)


As far as the logistics of tutoring went, the tutor-tutee numbers worked out even on the first 2 rounds, which was remarkable, and on the 3rd round we had some extra tutees. I asked them to be "meta" for a tutor-tutee pair - keeping track of where they were in the discussion, slowing down the tutor if the tutee looked confused, etc. - and people reported that this was extremely helpful. This jibes with activities we've been running at CFAR workshops around people talking in groups of 3 in this way (2 people having some sort of conversation and the 3rd person doing various kinds of meta for them) and I'd like to try doing things this way by default if we run something like this again (the next thing I might try is zeroing in on teaching people how to prove things).

I also had forgotten to plan for some things logistically (making sure I had enough hands to set up the space we were using easily, deciding in advance whether to order in lunch or ask people to go out), but they were magically taken care of by my volunteers, to whom I'm very grateful. In the future I'll try to set more time aside earlier in planning for Murphyjitsu.

Overall we at least ended up having a reasonably fun social time, and some people picked up some math in a way that was probably good. I expect the primary impact of this event on the tutees will be to help them feel more like learning math is a thing they can actually do, which is great. The tutors may have learned something about tutoring, which I'm also happy with.

33 comments

Comments sorted by top scores.

comment by Davidmanheim · 2018-03-23T18:07:43.124Z · score: 41 (11 votes) · LW · GW

ARGH! OK, this seems like a really great idea, and I'm thrilled you did it. On the other hand, I feel like LessWrongers fail to optimize tons of effort by not collecting results when they do this class of experiment, so the people involved, and everyone else, fail to gain as much as they could. Obviously you won't get a large sample, and the evidence is weak, but it's REALLY valuable for you and for others considering running similar events.

If I were to specify what types of data I'd hope could be collected:

  • Ask mentees whether they feel they learned something interesting or worthwhile in each session. What was it, specifically? (Both regarding the topic, AND what they gained.) How useful was the session to them, on a scale from 1-10, where 1 is not worth their time, 3 is would consider doing it again, 7 is would certainly attend if it was done again, and 10 is would be willing to pay for similar events. Ask what they would want to change if it were done again.
  • Ask mentors whether they thought their mentee learned something interesting or worthwhile. Ask them to predict what their mentee would say on the above scale. Ask whether the mentors gained anything, and if so, what. Ask what they would want to change if it were done again.

These are all things you could do via a Google survey, and you could tell the participants the results would be made public. You don't need to have the datasets matched to figure out who mentored whom, but it might be kind of nice.

Maybe you could still do this, via an emailed survey? If you'd like me to set up the Google form, I'd be thrilled to do so!

comment by Qiaochu_Yuan · 2018-03-26T19:28:08.115Z · score: 11 (3 votes) · LW · GW

Yep, sounds right. Just didn't think of it. I expect response rates to be really low to an emailed survey at this point and am not excited about doing it, unfortunately. If I had thought of it I would've had people fill out the survey at the end of the event.

comment by alkjash · 2018-03-21T23:20:31.279Z · score: 30 (7 votes) · LW · GW

Cool experiment, I hope you run more things like this!

Re: definitions, this is a serious struggle I have with the way math is presented. The definitions and notations we have now have been simplified, condensed, and iterated upon for hundreds of years, and what sticks is a matter of practical utility for mathematicians. To present the product of this process as a fait accompli without explanation or motivation is really misleading.

I mean, look how long it took to come up with the modern definition of the group! Even after it was named, for the longest time, mathematicians only considered permutation groups.

I would lay some of the blame at the feet of Platonism. Mathematical Platonism, the idea that "mathematics is out there in idea-space and we just discover it" is at best a useful metaphor or fake framework and has been taken way too far. It implies that definitions are somehow floating in thought-space and the mental move to discover them is to look really hard. In practice, the construction of definitions is more like engineering: there's some weird thing that's there and we'll try to build a vessel that captures its shape as closely as possible.

Note: I tried verifying the received wisdom that "most mathematicians are Platonists" and found the answer to be much murkier than I expected. I would still say that mathematics, the way it is presented, suggests that things like chain complexes and schemes are "things out there in reality" instead of "definitions progressively engineered by people to fit reality."

comment by Qiaochu_Yuan · 2018-03-21T23:25:45.284Z · score: 14 (3 votes) · LW · GW

> The definitions and notations we have now have been simplified, condensed, and iterated upon for hundreds of years, and what sticks is a matter of practical utility for mathematicians. To present the product of this process as a fait accompli without explanation or motivation is really misleading.

Yeah, this was one of my main motivations for asking this MathOverflow question.

comment by Chan Bae (chan-bae) · 2018-03-21T22:57:39.104Z · score: 22 (5 votes) · LW · GW

> for chain complexes it took something like 4 years, and the satisfying motivation is much more complicated to explain than the definition.

You can't just say that and not tell us what the motivation is! Is it on your blog somewhere?

comment by Qiaochu_Yuan · 2018-03-21T23:07:58.811Z · score: 15 (3 votes) · LW · GW

Unfortunately, no; it's not a short explanation and I never worked up the motivation to write it up in full. The closest thing I have is this MathOverflow answer. The starting point is the Dold-Kan correspondence but the conceptual meat of the story is about spectra, which loosely speaking are "abelian ∞-groups." To really buy the story you have to first buy a story about natively caring about homotopy types which I also haven't written up, but which also shows up in various MO answers of mine (among other places, e.g. the nLab).

Anyway, this is why it took so long; I had to learn and buy a bunch of homotopy theory before I was really satisfied.

comment by An1lam · 2018-03-21T23:46:11.552Z · score: 21 (6 votes) · LW · GW

Awesome idea! I'm curious about some of the details of the event if you'd be willing to share, partly because I've thought of trying something similar for programming stuff:

  • What math topics did you cover and at what level of detail did you cover them?
  • Was the instruction purely informational (one person explaining to another) or were there also problems involved (tutor posing a problem and giving the tutee time to think it through and try to solve it)?
  • Were there people who consider themselves "bad at math" or "not math people" included? Did their views change at all after the event?
comment by Dacyn · 2018-03-23T19:28:39.128Z · score: 16 (4 votes) · LW · GW

In my session I told the tutee to try to prove something and gave hints when he got stuck / warned him when he was going in a wrong direction. For me the most challenging part was coming up with the topic since I wanted to find something that the tutee would have a good chance of figuring out without having to guess a "trick" -- in the end I just said "OK the trick is you just draw this one line here, now you have to analyze the diagram to see how it proves the Pythagorean theorem".

comment by Qiaochu_Yuan · 2018-03-22T16:38:12.274Z · score: 8 (2 votes) · LW · GW

I mostly don't know what was covered because I didn't get to see most of the tutor-tutee pairs; they decided between themselves what to cover. Topics I saw included the Euclidean algorithm (slightly disguised), NP-completeness, the Pythagorean theorem, proofs in linear algebra, and some other stuff I'm forgetting. Detail is whatever people could get through in 30-40 minutes. Tutors used their discretion about how much to do things like problems / exercises.

Yes, I think there were people who considered themselves "bad at math" there, but I didn't really ask questions about this in particular. Hopefully if any of those people are reading they can chime in.

comment by Lanrian · 2018-03-24T09:50:06.381Z · score: 18 (4 votes) · LW · GW

> There's a cluster of people, including but not limited to Eliezer, Critch, and Nate, who (according to me) have what I internally call "trustworthy inside views," another name for which might be the ability to reliably generate useful gears models, and act based on them. This is the thing they do instead of using modest epistemology; it's the thing that allowed Eliezer to write HPMoR, among many other things. And what all of the people who seem to me to have this ability have in common is that they all have strong backgrounds in a technical subject like math, physics, or computer science (in addition to something else, this isn't sufficient).

What makes you think this is the result of the technical background rather than a selection effect (where the kind of people who are good at thinking choose to read technical subjects)?

comment by Qiaochu_Yuan · 2018-03-27T14:49:57.085Z · score: 12 (2 votes) · LW · GW

Good question. So I don't believe this very strongly at all, but it's the hypothesis that affords more to do compared to the selection hypothesis so it's more worth testing. One thing I could learn from running experiments like this repeatedly is that even doing quite a good job teaching math to most people won't cause them to acquire the inside view skill, which would be an important negative result.

It feels to me like the actual math I know is an important component of my inside view skill. More complicated gears models do in fact have components that resemble components of models in math, physics, etc. and it's in fact useful to know what kinds of components are out there. Also I think learning math in a particular way was very useful for training the skill in a particular way. But I would not be surprised if it turns out that the skill starts with a nucleus that was determined by some mostly hereditary personality trait that gets nurtured or not depending on exposure to things like math.

comment by ESRogs · 2018-03-24T22:26:25.103Z · score: 11 (2 votes) · LW · GW

I think it's almost certainly a selection effect. I was surprised by the apparent implication that it wasn't.

comment by habryka (habryka4) · 2018-03-24T22:44:06.225Z · score: 12 (2 votes) · LW · GW

I think that it is more of a causal effect than a selection effect (don't currently have super much time for writing all of it up, but wanted to highlight that while I do think selection effects play a role, there is a large causal effect here as well).

comment by ESRogs · 2018-03-25T10:47:47.433Z · score: 15 (4 votes) · LW · GW

I think the great filter for being this kind of person happens before you're 8 years old. Do you disagree with that?

To the extent that I am similar, I think I've been that way since I was a young child. I was more curious, more interested in numbers, and more pedantic than the other kids my age, at least by kindergarten. I'd be surprised if Eliezer, Critch, and Nate weren't pretty distinct from other kids by age 5.

comment by habryka (habryka4) · 2018-03-25T17:23:05.006Z · score: 11 (2 votes) · LW · GW

Hmm, yeah. I think there is a significant chance the great filter is afterwards.

comment by Raemon · 2018-03-25T20:58:03.036Z · score: 45 (8 votes) · LW · GW

FYI, just talked to Critch. According to him, when he was six years old he learned to read by reading an entire encyclopedia, and the first actual "book" he read was A Brief History of Time, which had a big impact on him. He was doing things like checking his understanding of reality at multiple layers of abstraction (e.g. a couch is made out of leather, which is made out of dead skin, which is made out of cells, etc.) and making sure the models were harmonious with each other sometime soon after.

When he was 4 he had clear memories of being 2 (and still does). In early teens he set out to become an idealized agent.

He says he thinks having the math-thing was necessary but probably not sufficient.

I think in his particular case, there was definitely a lot of unusual stuff going on. (i.e. he needed some amount of nurture in order for the stuff to take root, e.g. parents who bought him an encyclopedia and helped him learn math, but you clearly don't get the same outcome by giving those same inputs to an arbitrary child)

Eliezer's written up a lot of his early childhood stuff and AFAICT has a sort of similar thing going on (in his case the nurture stuff seems a bit more relevant, but again, lots of kids seem to get fairly similar nurture and not turn out the same)

What this all means depends a lot on "are you trying to find the best model-builders in the world" vs "are you trying to teach arbitrary pretty smart people how to model build."

I think the math thing is most likely quite important for learning how to model build, but it's not obvious in advance to me which sorts of people would be worth investing the effort for it.

comment by Qiaochu_Yuan · 2018-03-26T19:36:34.646Z · score: 11 (2 votes) · LW · GW

Thanks for writing this up!

So, as a contrast, I claim I also have the ability I'm describing, and I was unusually good at math as a kid (increasingly more so as I got older) and just very smart overall but that was about it. I was in a gifted program in elementary school and a different gifted program in middle school; they did a good job developing my ability to think but I didn't really use it on anything except math and schoolwork. The thing that unlocked me was my first CFAR workshop, which is when it first occurred to me that I could enjoy using the parts of my brain I'd developed to think about math to think about anything at all.

comment by G Gordon Worley III (gworley) · 2018-03-22T02:05:22.999Z · score: 15 (3 votes) · LW · GW

> At no point in a standard math education are students explicitly given the opportunity to practice approaching new mathematical territory with no definitions to help them orient towards it, and coming up with useful definitions on their own.

I feel lucky in that I got some experience with this, though none of it came as part of my "standard" education.

I remember when starting analysis I felt really lost on the basics of calculus like I didn't know why we were doing anything. So I worked through Serge Lang's "A First Course in Calculus" and it fixed me right up, but only because it was a book that was very careful to work through the intuitions behind why you do things. When you get the intuitions down you can't get confused the way you can if it's all just symbol manipulation.

I got more of this when I did two semesters of directed, independent topology study. The prof and I just sat down and worked through proofs until I could explain them forwards and backwards and work out examples easily because I really grokked the basics. Same sort of thing when I switched into graph theory and was doing original research.

By contrast I'm still confused by a lot of stuff in measure theory and PDE and other things because I didn't get a great grounding on the intuitions from classes and didn't have time to go back and get it.

I will say that when I was in computer science the courses did a much better job of training intuitions about the constructs you were working with. My best guess is that there was something of an engineering mindset in that department that was missing from the math department.

comment by Qiaochu_Yuan · 2018-03-22T16:44:02.623Z · score: 22 (4 votes) · LW · GW

> I will say that when I was in computer science the courses did a much better job of training intuitions about the constructs you were working with. My best guess is that there was something of an engineering mindset in that department that was missing from the math department.

Yeah, it's very curious that in some sense the best math textbook I ever read is a computer science textbook (Sipser's Introduction to the Theory of Computation). Among other things, before his proofs he generally gives proof sketches describing the high-level ideas and conceptual moves in the proof, which is great and way more people should be doing it.

comment by habryka (habryka4) · 2018-03-23T18:45:50.807Z · score: 17 (3 votes) · LW · GW

My experience has also been that the Computer Science department at Berkeley is much better at teaching Math than the Math department.

comment by sarahconstantin · 2018-03-25T03:27:17.679Z · score: 50 (11 votes) · LW · GW

This matches my experience.

I think academic math has a problem where it's more culturally valorized to *be really smart* than to teach well, to the point that effective communication actually gets stigmatized as catering too much to dumb people.

Having left academic math, I am no longer terrified of revealing my stupidity, so I can now admit that I learned intro probability theory from a class in the operations research department (that used an actual syllabus and lecture notes! unlike most math classes!), that I learned more about solving ODEs from my economics classes than from my ODEs class, that I only grokked Fourier analysis when I revisited it in a signal processing context, and that my favorite introduction to representation theory is Shlomo Sternberg's *Group Theory and Physics.*

Concrete examples are easier for some people to learn from!

comment by totallybogus · 2018-03-25T06:31:45.520Z · score: 14 (3 votes) · LW · GW

> I think academic math has a problem where it’s more culturally valorized to be really smart than to teach well

I don't think that's the issue exactly. My guess is that academic math has a culture of teaching something quite different from what most applied practitioners actually want. The culture is to focus really hard on how you reliably prove new results, and to get as quickly as possible to the frontier of things that are still a subject of research and aren't quite "done" just yet. Under this POV, focusing on detailed explanations about existing knowledge, even really effective ones, might just be a waste of time and effort that's better spent elsewhere!

comment by Said Achmiz (SaidAchmiz) · 2018-03-23T19:17:45.802Z · score: 31 (6 votes) · LW · GW

True story:

One semester in high school, I and a bunch of my classmates were taking a math course, where we were learning linear algebra, and also a computer science course (where we were learning C).

Our math teacher had spent a solid two or three weeks on matrices—with most of that time taken up by matrix multiplication. None of us were getting it. We were all smart kids (this was, like, a specialized / magnet / whatever school), and generally reasonably good at math, but this was eluding us; most of the class managed to learn the procedure by rote well enough to pass the tests, but we didn’t really grasp the concept. Eventually, after weeks of frustration, we had more or less “learned” how to multiply matrices sufficiently well that the teacher decided to move on to the next topic in the syllabus.

Not long afterwards, our CS teacher mentioned offhandedly that our next assignment would require us to write matrix multiplication code (it had to do with graphics). As an afterthought, he turned to the class and asked, “Oh, you guys learned this in your math classes, right?”—clearly expecting to get a chorus of “yes”s. When instead his question was greeted by an awkward silence and some “uhh…” and “umm…”, he looked surprised for a moment, then went “Ok, look…”, and started sketching something on the whiteboard.

Five minutes later, the entire class erupted into a collective “ohhhhh!!!”. And that was that.

I’ve had a number of other experiences like this, though that one was the most memorable. So, yes, “computer science departments/instructors teach math better than math departments/instructors” seems to be a trend.

comment by Kaj_Sotala · 2018-03-24T12:45:25.571Z · score: 34 (7 votes) · LW · GW

This reminds me of a story that a CS major friend told me: she and a bunch of others had run into sum notation earlier in some math classes, but hadn't quite understood how it should be interpreted... until they were taking a CS class, where the TA noticed that they seemed to be confused about it.

The TA was like, "well, you guys know for loops, right?"

Them: "... yes ..."

The TA: "Okay, so if you've got $\sum_{n=1}^{5} n$ for example, then you could read that as `x = 0; for (int n = 1; n <= 5; n++) { x = x + n; } return x;`"

Them: "OOOOOOOH"
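
For anyone who wants to try that reading out, here is a minimal Python sketch of the loop the TA described. The function name `sum_to` is made up for illustration; the point is only that "add up n for n from 1 to 5" and the accumulator loop are the same thing, and that both agree with the built-in `sum`.

```python
def sum_to(upper: int) -> int:
    """Add up n for n = 1..upper, mirroring the for-loop reading of sum notation."""
    x = 0
    for n in range(1, upper + 1):  # range excludes its end, so stop at upper + 1
        x = x + n
    return x


print(sum_to(5))         # 15
print(sum(range(1, 6)))  # 15
```

The lower and upper limits of the sum become the loop bounds, and the expression after the sigma becomes whatever gets added to `x` on each pass.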

comment by Nisan · 2018-03-24T22:48:29.109Z · score: 17 (4 votes) · LW · GW

I heard a similar story about when Paul Sally visited a grade school classroom. He asked the students what they were learning, and they said "Adding fractions. It's really hard, you have to find the greatest common denominator...." Sally said "Forget about that, just multiply the numerator of each fraction by the denominator of the other and add them, and that's your numerator." The students loved this, and called it the Sally method.

comment by Luke A Somers · 2018-03-27T15:48:08.596Z · score: 2 (1 votes) · LW · GW

That does not always produce a reduced fraction, of course. In order to do that, you need to go find a GCF just like before... but I agree, that should be presented as an *optimization* after teaching the basic idea.
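
A rough Python sketch of the two steps, kept separate so the reduction really is an optional extra. The function names are hypothetical, and taking the product of the two denominators as the new denominator is the standard completion of the rule rather than something spelled out in the story; `math.gcd` is from the standard library.

```python
from math import gcd


def add_unreduced(a, b, c, d):
    """Return a/b + c/d as (numerator, denominator) by cross-multiplying, without reducing."""
    return a * d + c * b, b * d


def reduce_fraction(num, den):
    """Divide out the greatest common factor (assumes den is nonzero)."""
    g = gcd(num, den)
    return num // g, den // g


num, den = add_unreduced(1, 6, 1, 3)  # 1/6 + 1/3
print(num, den)                       # 9 18
print(reduce_fraction(num, den))      # (1, 2)
```

The unreduced answer is already correct; the second function only tidies it up.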

comment by Nisan · 2018-03-24T17:25:34.492Z · score: 7 (2 votes) · LW · GW

Cool, do you remember what the 5-minute explanation was?

comment by Vaniver · 2018-03-24T19:11:18.275Z · score: 22 (5 votes) · LW · GW

If I had to guess, it would be something like Kaj's example in the sibling comment--they were doing summations instead of loops, and they hadn't seen the graphical arrangement of matrices that makes the multiplication obvious. ("Okay, wait, why are we using this index and that index?" -> "You put matrix A here, matrix B there, you vector-multiply this row and this column and that creates this cell, and then you do that for all the cells.") If you look at the Wikipedia page, imagine a class that only did the definition section and not the illustration section.
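
The row-times-column picture can also be written as three nested loops. This is only a sketch of the idea guessed at above, not the CS teacher's actual explanation; `matmul` is a name made up for the example, and the matrices are plain Python lists of rows.

```python
def matmul(a, b):
    """Multiply matrices given as lists of rows.

    Cell (i, j) of the result is row i of a multiplied element-wise
    with column j of b, then summed.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("number of columns of a must equal number of rows of b")
    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):            # pick a row of a
        for j in range(cols):        # pick a column of b
            for k in range(inner):   # walk along the row and down the column together
                result[i][j] += a[i][k] * b[k][j]
    return result


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

The two outer loops are "do that for all the cells"; the innermost loop is the summation from the textbook definition.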

comment by Said Achmiz (SaidAchmiz) · 2018-03-25T00:35:04.358Z · score: 12 (3 votes) · LW · GW

Yes, it was a graphical explanation, I do remember that. (The one you describe sounds plausible, at least.)

comment by Said Achmiz (SaidAchmiz) · 2018-03-24T18:41:59.462Z · score: 9 (2 votes) · LW · GW

Oh gosh, no, sorry; I wish I did. This was many, many years ago, and the last time I had to write matrix code was also (a smaller number, but still) many years ago.

comment by Benito · 2018-03-21T23:52:06.518Z · score: 14 (3 votes) · LW · GW

Yay rationality experiments and reports!