How to teach things well

post by Neel Nanda (neel-nanda-1) · 2020-08-28T16:44:27.817Z · 12 comments

This is a link post for


  How to teach
  Teaching 1-on-1
  Teaching Maths
  Teaching Applied Rationality/Life Advice

(This is a post from a daily blogging project, on my thoughts about good teaching techniques, which I thought might interest LessWrong readers.)


This is a blog post on how to teach things well. I’ll mostly focus on forms of teaching that involve preparation and structure, like talks and tutoring, but these ideas transfer pretty broadly. I think teaching and explaining ideas is an incredibly important skill, and one that most people aren’t great at. I’ve spent a lot of time practicing teaching, and I think I’ve found a bunch of important ideas and approaches that work well. I’m giving a talk next week, so I’ll initially focus on how to give good talks, while trying to outline the underlying concepts and high-level ideas of teaching. I’ll then discuss how these transfer to contexts like tutoring, and to teaching maths or applied rationality specifically, the main areas where I have actual teaching experience.

Note: I mostly care about teaching concepts and ideas, and teaching things to people who genuinely want to learn and be there, so my advice will focus accordingly.

I think it’s useful to think about good teaching even if you don’t intend to spend much time teaching: learning and teaching are flip sides of the same process. I’ve found that even when I’m in the role of a student, understanding what good teaching looks like can often make up for a lot of a bad teacher’s shortcomings!


The key insight of this post is that good teaching requires you to be deliberate and keep the purpose in mind: learning is a process of information compression. When you’re learning something new, you essentially receive a stream of new information. But human cognition doesn’t work by just storing a flood of information as is. The student takes in the information stream, extracts the important ideas, converts them into concepts, and stores those in their mind. This distinction is key, because it shows that the job of a teacher is not to give the student information; it’s to get the student to understand the right concepts. Conveying information is only useful as a means to this end.

In practice, it often works to just give a stream of information! Good students have learned the skill of taking streams of information and converting them into concepts. Often this happens implicitly: the student absorbs and memorises a lot of data, and over time this forms into concepts and ideas in their head automatically. But this is a major amount of cognitive labour, and a good teacher will try to do as much of it as possible, to let the student focus their cognitive labour on the important things.

My underlying model here is that we all have a web of concepts in our minds, our knowledge graph: the collection of all the concepts we understand, all of our existing knowledge and intuitions, connected together. You have learned something when you can convert it into concepts and connect it to your existing understanding. This means not just understanding the concept itself, but understanding where it fits into the bigger picture, when to use it, etc.

The final part there is key: if the student leaves with a good understanding of the ideas in the abstract, but no idea when to think about them again, it’s no better than if they’d learned nothing at all. We call on our knowledge when something related triggers it, so for a lesson to be useful, you need to build those connections and triggers in the student’s mind.

A key distinction to bear in mind is between legible and tacit ideas. A legible idea is something concrete that can easily be put into unambiguous words, eg how to do integration by parts. Tacit knowledge, by contrast, is fuzzier and more intuitive, eg recognising the kinds of integrals where you’d use integration by parts in the first place: essentially, the intuitions you want the student to have. The distinction matters because legible knowledge is much easier to convey, but often your goal is to convey tacit knowledge (at least, it should be!). There’s a lot of skill to conveying tacit knowledge well, and to making it as legible as possible without losing key nuance, and different techniques work better for the two kinds. A lot of my issues with the Cambridge maths course stem from its extreme focus on legible knowledge over tacit knowledge, the underlying intuitions and motivations.

How to teach

There are two key problems in teaching that any good teaching advice must account for:

Here are some of the most important tools I have for addressing these problems:

Teaching 1-on-1

Practicing tutoring and explaining things one on one can often be more valuable! I think a great use of time for most students is to do tutoring - it’s pretty fun, you get paid decently, and you get way better at explaining ideas. And the ability to explain an idea clearly in a conversation is an amazingly applicable skill - I use this all the time in daily life.

The main difference is that it’s a lot easier to get the student to be active, and much easier to adapt the pace and difficulty well. Essentially, invert all of the ideas in my post on how to learn from conversation.

Teaching Maths

Teaching Applied Rationality/Life Advice


If you’re planning on teaching something in the future, I hope these thoughts were useful! But even if not, I think these skills transfer excellently to explaining things in everyday life, and thinking about teaching can make you a much more effective learner.

I find that often, as a student, I can help the teacher be more effective by asking the right questions: asking them which information is the most important, checking that my understanding is correct by paraphrasing it back, and asking them for the motivations and higher-level picture. The feeling of “something not fitting into my knowledge graph well” can be made into a pretty visceral one. And recognising the student habits that hinder learning, like being passive instead of active, or not trying to do information compression themselves, helps me notice when I fall into those habits myself!



comment by Liron · 2020-08-29T18:24:41.595Z

My favorite part was the advice to highlight what’s important, and it helped that you applied your own advice by highlighting that the most important part of your lesson is the advice to highlight the most important part of your lesson.

I’ve previously attempted to elaborate on why examples are helpful for teaching.

comment by johnswentworth · 2020-08-29T19:18:23.967Z

I'm always a bit frustrated when people talk about a "knowledge graph"; the concept seems obviously useful, but also obviously incomplete. What precisely are the nodes and edges in the graph? What are the type signatures of these things?

I was thinking about this over breakfast. Here are some guesses.

One simple model is that each "node" in the graph is essentially a trigger-action plan. There's a small pattern-matcher, for example a pattern which recognizes root-finding problems with quadratic functions. When the pattern matches something, it triggers a bunch of possible connections - e.g. one connection might be a pointer to the quadratic formula, another might be a connection to polynomial factorization, etc. Each of those is itself either another node (e.g. the quadratic formula node) or, in the base case, a simple action to take (e.g. writing some symbols on paper).

In this model, teaching involves a few different pieces:

  • Creation of the node itself: just giving it a name and emphasizing its importance can help.
  • Refining the pattern - e.g. practice recognizing root-finding problems with quadratic functions. Examples are probably the best tool here.
  • Installing the "downstream" pointers to other concepts, and "upstream" pointers from other concepts to this one. "Downstream" pointers would be things like "here's a list of tricks you can use to solve this sort of problem", "upstream" pointers would be things like "this is itself a root-finding method, so look for quadratics when you need to solve equations, and also use other equation-solving tools like adding a number to both sides".
  • Giving weight to upstream/downstream concepts - i.e. indicating which connections are more/less important, so they're properly prioritized in the list of "actions" triggered when a pattern is detected.
  • Building the habit of actually checking for the pattern, and actually triggering the "actions" when the pattern is matched. I.e. practice, preferably on a fairly wide variety of problems to minimize "out-of-distribution"-style failures.
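One way to make the "type signature" of such a node concrete is a small Python sketch. Everything in it is an illustrative assumption, not a claim about how cognition actually works: `Node`, `fire`, the substring-based pattern-matchers, the numeric weights (standing in for the prioritization bullet above) and the `max_depth` guard against cyclic pointers are all hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Union

# Base-case action: something concrete to do, here just a description of a step.
Action = Callable[[str], str]


@dataclass
class Node:
    """A trigger-action plan: a pattern-matcher plus weighted outgoing pointers."""
    name: str
    matches: Callable[[str], bool]  # the pattern-matcher for this node
    # (weight, target) pairs; a target is another Node or a base-case Action.
    connections: list[tuple[float, Union[Node, Action]]] = field(default_factory=list)

    def fire(self, problem: str, depth: int = 0, max_depth: int = 5) -> list[str]:
        """If the pattern matches, follow connections from highest to lowest weight."""
        if depth > max_depth or not self.matches(problem):
            return []
        indent = "  " * depth
        steps = [f"{indent}{self.name} triggered"]
        for _, target in sorted(self.connections, key=lambda c: c[0], reverse=True):
            if isinstance(target, Node):
                steps.extend(target.fire(problem, depth + 1, max_depth))
            else:
                steps.append(f"{indent}  {target(problem)}")
        return steps


quadratic_formula = Node(
    "quadratic formula",
    matches=lambda p: "x^2" in p,
    connections=[(1.0, lambda p: "write x = (-b +/- sqrt(b^2 - 4ac)) / (2a)")],
)
factorization = Node(
    "polynomial factorization",
    matches=lambda p: "x^2" in p,
    connections=[(1.0, lambda p: "look for two numbers with the right sum and product")],
)
quadratic_roots = Node(
    "root-finding for quadratics",
    matches=lambda p: "x^2" in p and "= 0" in p,
    connections=[(0.9, quadratic_formula), (0.5, factorization)],
)

if __name__ == "__main__":
    for line in quadratic_roots.fire("x^2 - 5x + 6 = 0"):
        print(line)
```

On `"x^2 - 5x + 6 = 0"` the root-finding node triggers, then the quadratic-formula node fires before the factorization node because of its higher weight; a linear problem like `"2x + 1 = 0"` triggers nothing. Here the weights only change the order in which connections fire; pruning low-weight connections would be another reasonable design choice.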

So that's one model.

That model seems to capture a lot of useful things about procedural knowledge graphs, but it seems like there's a separate kind of knowledge graph for world-models. The part above is analogous to a program (it guides what-to-do), whereas a world-modelling knowledge graph would be analogous to the contents of a database; it's the data structure on which the procedural knowledge graph operates. My current best model for a world-modelling knowledge graph is a causal model recursively built out of reusable sub-models.

Teaching components of the world-modelling knowledge graph would involve somewhat different pieces:

  • We'd typically be teaching some prototypical submodel, a building block to use in many different places in the world-model. For instance, in introductory physics these submodels would be things like "masses" and "inclined planes" and "masses on inclined planes".
  • Teaching the submodel itself means walking through the components of the little causal subgraph the model specifies - e.g. how masses on inclined planes behave.
  • The submodel will have some pattern-matcher associated with it, for recognizing components of the real world to which the submodel applies. This means examples, to practice recognizing e.g. "masses" and "inclined planes" and "masses on inclined planes".
  • The submodel will itself have submodels, and these are the pointers out to other nodes. E.g. if there's a submodel for the prototypical mass-on-inclined-plane problem, then it should have a pointer to a "point mass" submodel. Here, the real key is the connections which are not present - e.g. the point-mass submodel doesn't care about the shape of the object in question, it's just approximated as a point.
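A minimal Python sketch of the world-modelling version, under loudly stated assumptions: a submodel is represented only by the object attributes it reads plus a list of child submodels, and its "pattern-matcher" just checks that those attributes are present. `Submodel`, `view`, `applies_to` and `force_along_incline` are hypothetical names, and the physics assumes a frictionless block sliding without rotation, where the force along the slope is m * g * sin(theta).

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

G = 9.81  # standard gravity in m/s^2 (assumed constant)


@dataclass
class Submodel:
    """A reusable building block of a world-model (illustrative only)."""
    name: str
    # Attributes of a real-world thing that this submodel reads. Anything not
    # listed (e.g. "shape" for a point mass) is deliberately ignored.
    reads: frozenset[str]
    # Pointers out to the submodels this one is built from.
    parts: list[Submodel] = field(default_factory=list)

    def applies_to(self, thing: dict) -> bool:
        """Crude pattern-matcher: every attribute this submodel needs is present."""
        return self.reads.issubset(thing.keys()) and all(
            p.applies_to(thing) for p in self.parts
        )

    def view(self, thing: dict) -> dict:
        """Project a thing onto only the attributes this submodel (and its parts) read."""
        keys = set(self.reads)
        for p in self.parts:
            keys.update(p.view(thing))  # iterating a dict yields its keys
        return {k: thing[k] for k in keys}


point_mass = Submodel("point mass", frozenset({"mass"}))
inclined_plane = Submodel("inclined plane", frozenset({"angle_deg"}))
mass_on_incline = Submodel(
    "mass on inclined plane", frozenset(), [point_mass, inclined_plane]
)


def force_along_incline(thing: dict) -> float:
    """Net force along a frictionless slope: m * g * sin(theta), in newtons."""
    v = mass_on_incline.view(thing)
    return v["mass"] * G * math.sin(math.radians(v["angle_deg"]))


crate = {"mass": 10.0, "angle_deg": 30.0, "shape": "cube", "colour": "red"}
ball = dict(crate, shape="sphere")  # same mass and angle, different shape
```

The missing connections do the work here: because the point-mass submodel never reads `shape`, the cube-shaped `crate` and the spherical `ball` get the same force (about 49 N), and `view` simply drops `shape` and `colour`.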

It feels like there should be a clean way to unify the procedural and world-modelling knowledge graphs. I'm not sure what it is. I'm sure somebody will argue that it's all procedural and the world-modelling is just embedded in a bunch of procedures, but I'm not convinced; it sure feels like there's a graph of data on which the program operates. I could see it working the opposite way, though... maybe it's all world-modelling, and part of the world-model is something like "model of the best way to solve this problem", and our "procedural" behavior is actually just prediction on that part of the world model (sort of predictive-processing-esque).

comment by Lanrian · 2020-08-31T18:35:27.892Z

There is some research on knowledge graphs as a data structure, and as a tool in AI. Wikipedia has an overview and a bunch of references.

comment by Gunnar_Zarncke · 2020-08-31T18:04:45.205Z

Thank you for your comprehensive post. It makes a lot of good points and does a good job of relating them to well-known terminology here. But I miss sources: teaching and learning can have counterintuitive effects, and we should take that into account.

A good overview of what is known about the effectiveness of teaching methods (though mostly at the school level) is John Hattie's Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement.

comment by Gunnar_Zarncke · 2020-08-31T18:05:56.098Z

Interestingly, I just read a thread about Project Follow Through, which found, counterintuitively, that Direct Instruction is effective (effect size 0.58), yet it is almost never used.

The main source is Theory of Instruction: Principles and Applications.

comment by Mathisco · 2020-08-29T13:43:47.008Z

As I grow older I spend more and more time teaching. I concur with all the points in this post. Sadly, it contained no diagrams.

Diagrams are truly awesome. Great diagrams are absolutely amazing. High level summary diagrams are the best. I spend most of my time at work now drawing and explaining diagrams.

comment by romeostevensit · 2020-08-29T09:12:37.586Z

Too many great points to call out. Love this post. From reviewing some pedagogy research, my own model for teaching a particular topic was:

  • Pattern break, starting with a vivid hook that gets people's attention by being surprising in some way
  • Several examples from which the pattern can be inferred
  • Drawing the student's attention back and forth between the common elements of the examples
  • Creating a toy example of the core concept that has moving parts the student can then move themselves to see how other parts move (conceptually)
  • Anchoring the new set of intuitions with a succinct anchor phrase or image that ideally has conceptual hooks into the relevant problem domains so that the concept automatically gets triggered in the situations in which it is useful

This is all much easier said than done, but is a good skeleton for when you really really want people to get something.

comment by Neel Nanda (neel-nanda-1) · 2020-08-29T10:00:49.433Z


Anchoring the new set of intuitions with a succinct anchor phrase or image that ideally has conceptual hooks into the relevant problem domains so that the concept automatically gets triggered in the situations in which it is useful

Strongly agreed, I've been very pleasantly surprised by how valuable this approach is. I think having clear labels for important intuitions is one of the really valuable things I've gotten from the rationalist community. When writing blog posts, I try fairly hard to give clear labels to the key ideas and to put them in bold.

Creating a toy example of the core concept that has moving parts the student can then move themselves to see how other parts move (conceptually)

I'd be curious to see any examples of this you have in mind. I'm super excited about this as a form of learning, but struggle to imagine a specific example for anything I've tried teaching. This seems better suited to 1-on-1 tutoring than to larger groups or talks, I think?

comment by Liron · 2020-08-29T18:32:32.406Z

Re examples of toy examples with moving parts:

Andy Grove’s classic book High Output Management starts with the example of a diner that has to produce breakfasts with cooked eggs, and keeps referring to it to teach management concepts.

Minute Physics introduces a “Spacetime Globe” to visualize spacetime (the way a globe visualizes the Earth’s surface) and refers to it often starting at 3:25 in this video:

comment by romeostevensit · 2020-08-29T15:31:18.087Z

It's more scalable with remote learning, where each student can access an animation with sliders that they can move themselves. IME this is extremely valuable for helping math concepts click. The intuitions get tuned by directly seeing how some output varies with an input. Otherwise you have to work through several dimensions manually: what happens if we vary this vs if we vary that, etc.

comment by FallibleDan · 2020-09-01T15:08:49.582Z

Thank you for your outline and pearls. Getting more skillful at framing, as you point out, is a key mindset. The framing of teaching depends on the learner's various states (current abilities in the subject domain(s); physical, social and emotional states, etc.) and the learner's context. Teaching requires that the teacher adjust to the learner's current states and the learner's context, and select the appropriate frames.

One perhaps obvious frame is to think of teaching as "that which enables learning." What enables learning?

Imprinting on the body, including, as pointed out, by doing. This is sometimes called "getting the learner's skin in the game." The body is a human's interface with what is, so of course learning relies on bodies. Repetition is a particularly powerful way of getting the body's attention: "Oh, I guess this isn't just a one-off - I keep coming across this experience so I guess I'd better adapt to it." Examples: athletic or musical performance training, doing problem sets in engineering, etc. The body's strategy (including the brain's) is that, for the long run (literally), an efficient response to repeated experiences is to hypertrophy muscles and neural pathways.

Example: paraphrasing, as pointed out, is a way to check whether the learner is keeping up, to see whether the student "follows." The phrase "Do you follow me?" uses the language of the body.

Engaging affective valences (joy, fear, longing, satisfaction, appreciation of beauty (e.g., maths concepts are often beautiful)). An appropriate emotional valence is crucial for long-term memory.

Engaging social or intrasocial valences: how can one belong, join, nurture or protect? 99.99% of the human operating system can be regarded as "mammalian," but, like water to a fish, its ubiquity makes it invisible to us. Yet who optimally trains or learns in a social vacuum, or executes or performs in one? Huge stadia, social media platforms, and the fact that we love to hear and tell stories are more obvious testimony to the importance of social valences.

  • e.g., working in a group (including see one, do one, teach one); getting students to "pair and share" as a way to anchor student mindsets into a learning mode
  • e.g., working with a future self, an idealized or shadow self, or with an in-dwelling parent, child, friend, advocate or mentor

Copying the best, or what has been honed over time, by linking with culture. Think of culture as the ancient apperceptive mass of humanity's experience and learning. Engage with culture, e.g., by looking up both the recent etymology (in English, most words have a Germanic or Latin origin; scientific or technical words may, in addition, have a Greek origin) and the more ancient one (Proto-Indo-European; the payoffs in PIE are often massive) of any new word and every key concept.

In sum, teaching is often more successful if it has actionable 'relevance' (recent etymology of relevant = "apropos;" ancient etymology = "that which lightens [a burden]"). Learning is easier if the learner, or the learner's body, senses that something is useful or unburdening to him or her or to his or her "group," especially in ways in which the body (including emotion) or culture (especially language) have already provided hooks to latch on to.

comment by Peterson Yook (peterson-yook) · 2020-08-29T04:20:47.798Z

While delivering knowledge is a major part of education, I just want to mention that what happens after the teaching ends is very important, because most students stop learning about a subject at that point. When I was young, I joined a science summer camp and had a fun and challenging time, but after camp my life hadn't changed at all. Looking back today, I think I might have become a different person if I had kept learning even after there were no teachers around.

Finding outside resources, planning research, or engaging with a community can be valuable habits for students. Questioning the big picture and real-life applications also helps students learn about other subjects.